Type: lib.pdfplumber.ExtractTables
Namespace: lib.pdfplumber
Description
Extract tables from a PDF file into dataframes.
Tags: pdf, tables, dataframe, extract
Use cases:
- Extract tabular data from PDF documents
- Convert PDF tables to structured data formats
- Process PDF tables for analysis
- Import PDF reports into data analysis pipelines
Properties
| Property | Type | Description | Default |
|---|---|---|---|
| document | | The PDF document to extract tables from | {'type': 'document', 'uri': '', 'asset_id': None, 'data': None} |
| start_page | int | First page to extract tables from (0-based, None for first page) | 0 |
| end_page | int | Last page to extract tables from (0-based, None for last page) | 4 |
| table_settings | Dict[Any, Any] | Settings for the table extraction algorithm | {'vertical_strategy': 'text', 'horizontal_strategy': 'text', 'snap_tolerance': 3, 'join_tolerance': 3, 'edge_min_length': 3, 'min_words_vertical': 3, 'min_words_horizontal': 1, 'keep_blank_chars': False, 'text_tolerance': 3, 'text_x_tolerance': 3, 'text_y_tolerance': 3} |
Outputs
| Output | Type | Description |
|---|---|---|
| output | List[dataframe] | |
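
For orientation, here is a minimal sketch of how the properties above might map onto direct pdfplumber and pandas calls. It is not the node's actual implementation. The function names `resolve_page_range` and `extract_tables` are hypothetical, and two behaviours are assumptions: `end_page` is treated as inclusive, and each table is turned into a dataframe without promoting any row to a header.

```python
from typing import Any, Dict, List, Optional


def resolve_page_range(
    start_page: Optional[int], end_page: Optional[int], page_count: int
) -> range:
    """Turn 0-based page bounds into page indices (hypothetical helper).

    Assumption: end_page is inclusive, and None means "first" or "last" page.
    """
    first = 0 if start_page is None else max(start_page, 0)
    last = page_count - 1 if end_page is None else min(end_page, page_count - 1)
    return range(first, last + 1)


def extract_tables(
    path: str,
    start_page: Optional[int] = 0,
    end_page: Optional[int] = 4,
    table_settings: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Return one pandas DataFrame per table found on the selected pages."""
    # Imported lazily so the page-range helper works without these packages.
    import pandas as pd
    import pdfplumber

    frames = []
    with pdfplumber.open(path) as pdf:
        for index in resolve_page_range(start_page, end_page, len(pdf.pages)):
            page = pdf.pages[index]
            # Each table is a list of rows; each row is a list of cell strings.
            for rows in page.extract_tables(table_settings=table_settings or {}):
                frames.append(pd.DataFrame(rows))
    return frames
```

A call such as `extract_tables("report.pdf", table_settings={"vertical_strategy": "text", "horizontal_strategy": "text"})` would then return a list of dataframes for the first five pages, where `report.pdf` is a placeholder path. Which `table_settings` keys are accepted depends on the installed pdfplumber version, so the sketch passes the dictionary through unchanged.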
Metadata
Related Nodes
Browse other nodes in the lib.pdfplumber namespace.