Extract Tables

Type: lib.pdfplumber.ExtractTables

Namespace: lib.pdfplumber

Description

Extract tables from a PDF file into dataframes.

Keywords: pdf, tables, dataframe, extract

Use cases:
- Extract tabular data from PDF documents
- Convert PDF tables to structured data formats
- Process PDF tables for analysis
- Import PDF reports into data analysis pipelines

Properties

Property	Type	Description	Default
pdf	`document`	The PDF document to extract tables from	`{'type': 'document', 'uri': '', 'asset_id': None, 'data': None}`
start_page	`int`	First page to extract tables from (0-based, None for first page)	`0`
end_page	`int`	Last page to extract tables from (0-based, None for last page)	`4`
table_settings	`Dict[Any, Any]`	Settings for table extraction algorithm	`{'vertical_strategy': 'text', 'horizontal_strategy': 'text', 'snap_tolerance': 3, 'join_tolerance': 3, 'edge_min_length': 3, 'min_words_vertical': 3, 'min_words_horizontal': 1, 'keep_blank_chars': False, 'text_tolerance': 3, 'text_x_tolerance': 3, 'text_y_tolerance': 3}`
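
The page-range semantics above (0-based indices, an inclusive end_page, None meaning the first or last page) can be sketched in plain Python. This is only an illustration: `resolve_page_range` is a hypothetical helper, not part of the node or of pdfplumber, and clamping out-of-range values to the document length is an assumption rather than documented behaviour.

```python
from typing import Optional


def resolve_page_range(
    num_pages: int,
    start_page: Optional[int] = 0,
    end_page: Optional[int] = 4,
) -> range:
    """Return the 0-based page indices selected by start_page and end_page.

    Hypothetical helper: None means "first page" for start_page and
    "last page" for end_page. end_page is treated as inclusive.
    """
    first = 0 if start_page is None else max(start_page, 0)
    # Assumption: an end_page beyond the document is clamped to the last page.
    last = num_pages - 1 if end_page is None else min(end_page, num_pages - 1)
    return range(first, last + 1)


# With the defaults (0 and 4), a 10-page document yields its first five pages.
print(list(resolve_page_range(10)))
```

Under these assumptions, a document shorter than five pages is processed in full with the default settings, and a start_page greater than end_page selects no pages.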

Outputs

Output	Type	Description
output	`List[dataframe]`	The extracted tables, one dataframe per table
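
As a rough sketch of how a list of dataframes like this could be produced, the code below calls pdfplumber directly and wraps each table in a pandas DataFrame. It is not the node's actual implementation. The function names `split_header` and `extract_tables` are hypothetical, treating the first row of each table as the header is an assumption, and only a subset of the default table_settings is passed.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = List[Optional[str]]


def split_header(rows: List[Row]) -> Tuple[Row, List[Row]]:
    """Split a pdfplumber-style table (a list of rows) into header and body.

    Assumption: the first row holds the column names.
    """
    if not rows:
        return [], []
    return rows[0], rows[1:]


def extract_tables(
    path: str,
    start_page: int = 0,
    end_page: int = 4,
    table_settings: Optional[Dict[str, Any]] = None,
) -> list:
    """Return one pandas DataFrame per table found on the selected pages."""
    # Third-party imports are kept local so split_header works without them.
    import pandas as pd
    import pdfplumber

    # Subset of the defaults listed in the Properties table.
    settings = table_settings or {
        "vertical_strategy": "text",
        "horizontal_strategy": "text",
        "snap_tolerance": 3,
        "join_tolerance": 3,
    }
    frames = []
    with pdfplumber.open(path) as pdf:
        # end_page is inclusive; slicing stops at the end of the document.
        for page in pdf.pages[start_page : end_page + 1]:
            for rows in page.extract_tables(table_settings=settings):
                header, body = split_header(rows)
                frames.append(pd.DataFrame(body, columns=header))
    return frames
```

Calling `extract_tables("report.pdf")` (the file name is a placeholder) would return a list with one dataframe per detected table on pages 0 through 4. Tables without a real header row would need different handling than this sketch provides.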

Metadata

Browse other nodes in the lib.pdfplumber namespace.
