Type: lib.beautifulsoup.HTMLToText
Namespace: lib.beautifulsoup
Description
Converts HTML to plain text by removing tags and decoding entities using BeautifulSoup.
Tags: html, text, convert
Use cases:
- Cleaning HTML content for text analysis
- Extracting readable content from web pages
- Preparing HTML data for natural language processing
Properties
| Property | Type | Description | Default |
|---|---|---|---|
| text | str | `` | |
| preserve_linebreaks | bool | Convert block-level elements to newlines | True |
Outputs
| Output | Type | Description |
|---|---|---|
| output | str | Plain text extracted from the input HTML |
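
The node's implementation is not shown on this page. As a rough sketch of how the `text` and `preserve_linebreaks` properties might map onto BeautifulSoup calls, the example below uses a hypothetical helper `html_to_text`. The list of block-level tags and the whitespace cleanup are assumptions made for illustration, not the node's documented behavior.

```python
from bs4 import BeautifulSoup

# Assumed set of block-level tags; the node may use a different list.
BLOCK_TAGS = ["p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"]


def html_to_text(text: str, preserve_linebreaks: bool = True) -> str:
    """Hypothetical helper: strip tags and decode entities from HTML."""
    soup = BeautifulSoup(text, "html.parser")

    if not preserve_linebreaks:
        # Join all text fragments with single spaces, ignoring structure.
        return soup.get_text(separator=" ", strip=True)

    # Turn explicit line breaks into newline characters.
    for br in soup.find_all("br"):
        br.replace_with("\n")

    # Terminate each block-level element with a newline.
    for tag in soup.find_all(BLOCK_TAGS):
        tag.append("\n")

    # Drop blank lines and surrounding whitespace left by the markup.
    lines = (line.strip() for line in soup.get_text().splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    html = "<p>Hello &amp; welcome</p><p>Second<br>line</p>"
    print(html_to_text(html))
    print(html_to_text(html, preserve_linebreaks=False))
```

With `preserve_linebreaks` left at its default, each paragraph and `<br>` in the sample ends up on its own line; with it disabled, the same fragments are joined by single spaces.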