Type: lib.beautifulsoup.WebsiteContentExtractor
Namespace: lib.beautifulsoup
Description
Extract the main content from a website's HTML, removing navigation, ads, and other non-essential elements.
Tags: scrape, web scraping, content extraction, text analysis
Use cases:
- Clean web content for further analysis
- Extract article text from news websites
- Prepare web content for summarization
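The following is a minimal sketch of the filtering described above. It is not the node's actual implementation. It assumes that "non-essential elements" means common boilerplate tags (`nav`, `header`, `footer`, `aside`, `script`, `style`, and similar) and that an `<article>` or `<main>` container, when present, holds the main content. To stay dependency-free it swaps in Python's standard-library `html.parser` for BeautifulSoup. The tag lists and the name `extract_main_content` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as non-essential; the node's real rules may differ.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form", "iframe"}
# Hypothetical containers preferred as "main content" when present.
MAIN_TAGS = {"article", "main"}
# Void elements never have an end tag, so they must not touch the depth counters.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _MainContentParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0              # > 0 while inside a skipped element
        self.main_depth = 0              # > 0 while inside <article>/<main>
        self.all_text: list[str] = []    # text outside skipped elements
        self.main_text: list[str] = []   # subset of all_text inside main containers

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag in VOID_TAGS:
            return
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in MAIN_TAGS:
            self.main_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in MAIN_TAGS and self.main_depth:
            self.main_depth -= 1

    def handle_data(self, data):
        # Drop text (including script/style bodies) inside skipped elements.
        if self.skip_depth:
            return
        text = data.strip()
        if not text:
            return
        self.all_text.append(text)
        if self.main_depth:
            self.main_text.append(text)


def extract_main_content(html_content: str = "") -> str:
    """Return the page's main text, one text node per line (hypothetical helper)."""
    parser = _MainContentParser()
    parser.feed(html_content)
    parser.close()
    # Prefer text from <article>/<main>; fall back to all non-boilerplate text.
    chunks = parser.main_text or parser.all_text
    return "\n".join(chunks)


if __name__ == "__main__":
    sample = "<nav>Menu</nav><article><h1>Title</h1><p>Body text.</p></article>"
    print(extract_main_content(sample))  # prints "Title" and "Body text." on separate lines
```

Depth counters keep the sketch short, but an unclosed skipped tag (for example a `<header>` with no end tag) would hide the rest of the page. A parsed tree, such as BeautifulSoup's, where each unwanted element is removed with `Tag.decompose()`, handles malformed markup more robustly.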
Properties
| Property | Type | Description | Default |
|---|---|---|---|
| html_content | str | The raw HTML content of the website. | `""` (empty string) |
Outputs
| Output | Type | Description |
|---|---|---|
| output | str | The extracted main content of the page. |