Content Scraping
Configure how ChattyBox crawls your website and indexes content for AI responses.
Scraping Modes
ChattyBox supports three scraping modes:
1. Homepage Only
Scrapes just your homepage. Best for simple landing pages or when you want minimal content indexed.
2. Sitemap Mode (Recommended)
Provide a sitemap URL and ChattyBox will scrape all pages listed in it. This is the most reliable way to ensure all your pages are indexed.
https://example.com/sitemap.xml
3. Manual URLs
Specify exact URLs to scrape, one per line. Use this when you want precise control over which pages are indexed.
https://example.com/pricing
https://example.com/features
https://example.com/about
https://example.com/faq
Page Limits
The number of pages you can scrape depends on your plan:
| Plan | Pages per Site |
|---|---|
| Free | 10 |
| Starter | 50 |
| Pro | 100 |
| Business | 250 |
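Sitemap mode and Manual URLs both produce a list of pages, and that list counts against the limits above. The Python sketch below is not ChattyBox's implementation. It reads the `<loc>` entries from a standard sitemaps.org `<urlset>`, parses a one-per-line manual list, and splits the result at a plan's limit. The docs do not say which pages are kept when a sitemap lists more than the limit, so the sketch assumes the first N in document order. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Pages per site for each plan, copied from the table above.
PAGE_LIMITS = {"Free": 10, "Starter": 50, "Pro": 100, "Business": 250}

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return each <url><loc> value in a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def urls_from_manual_list(text: str) -> list[str]:
    """Parse a one-URL-per-line list, ignoring blank lines and surrounding spaces."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def apply_page_limit(urls: list[str], plan: str) -> tuple[list[str], list[str]]:
    """Split URLs into (kept, over_limit); ASSUMES the first N URLs are the ones kept."""
    limit = PAGE_LIMITS[plan]
    return urls[:limit], urls[limit:]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/pricing</loc></url>"
        "</urlset>"
    )
    kept, over = apply_page_limit(urls_from_sitemap(sitemap), "Free")
    print(f"{len(kept)} page(s) within the limit, {len(over)} over it")
```

The sketch does not follow `<sitemapindex>` files that point to further sitemaps. If your site uses one, count the pages across all child sitemaps before comparing against your plan.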
Auto Re-scraping
Keep your chatbot up to date by enabling automatic re-scraping. The frequency depends on your plan:
| Plan | Refresh Frequency |
|---|---|
| Free | Weekly |
| Starter | Daily |
| Pro | Hourly |
| Business | Every 15 minutes |
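To reason about how stale indexed content can get, the table above maps to fixed intervals. This Python sketch uses a hypothetical scheduler that measures each interval from the previous scrape. ChattyBox may schedule differently (for example, at fixed clock times), so treat the helpers as an approximation, not the product's behavior.

```python
from datetime import datetime, timedelta, timezone

# Refresh frequency per plan, from the table above.
REFRESH_INTERVALS = {
    "Free": timedelta(weeks=1),
    "Starter": timedelta(days=1),
    "Pro": timedelta(hours=1),
    "Business": timedelta(minutes=15),
}


def next_rescrape(last_scraped: datetime, plan: str) -> datetime:
    """When the next automatic re-scrape would be due, ASSUMING intervals run from the last scrape."""
    return last_scraped + REFRESH_INTERVALS[plan]


def is_due(last_scraped: datetime, plan: str, now: datetime) -> bool:
    """True once the plan's interval has fully elapsed since the last scrape."""
    return now >= next_rescrape(last_scraped, plan)


def runs_per_day(plan: str) -> float:
    """Average automatic re-scrapes per 24 hours (timedelta / timedelta gives a float)."""
    return timedelta(days=1) / REFRESH_INTERVALS[plan]


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_rescrape(last, "Pro"))  # 2024-01-01 13:00:00+00:00
    print(runs_per_day("Business"))    # 96.0
```

For example, Business works out to 96 refreshes a day (4 per hour × 24 hours), while on Free a content change can take up to a week to appear unless you re-scrape manually.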
Content Extraction
ChattyBox extracts:
- Page title - The `<title>` tag
- Main content - Text from `<main>`, `<article>`, or `<body>`
- Headings - All `<h1>` through `<h6>` tags
- Paragraphs - All `<p>` content
- Lists - `<ul>` and `<ol>` items
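The extraction list above can be approximated with Python's standard `html.parser`. This is a rough sketch, not ChattyBox's extractor: it collects the `<title>` text plus heading, paragraph, and list-item blocks, and it assumes a precedence of `<main>` first, then `<article>`, then the whole `<body>` (the docs list the three but do not state an order). It also assumes well-formed markup with explicit closing tags; text sitting directly in a `<div>` is not captured.

```python
from html.parser import HTMLParser
from typing import Optional

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADINGS | {"p", "li"}
CONTAINERS = ("main", "article")


class ContentExtractor(HTMLParser):
    """Collect <title> text plus heading, paragraph and list-item blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.blocks = []  # (tag, text, frozenset of enclosing <main>/<article> tags)
        self._in_title = False
        self._open_containers = []
        self._current: Optional[tuple] = None  # (tag, list of text pieces)

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in CONTAINERS:
            self._open_containers.append(tag)
        elif tag in BLOCK_TAGS and self._current is None:
            self._current = (tag, [])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in CONTAINERS and tag in self._open_containers:
            # Close the innermost open container with this name.
            last = len(self._open_containers) - 1 - self._open_containers[::-1].index(tag)
            del self._open_containers[last]
        elif self._current is not None and tag == self._current[0]:
            text = " ".join("".join(self._current[1]).split())  # collapse whitespace
            if text:
                self.blocks.append((tag, text, frozenset(self._open_containers)))
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:
            self._current[1].append(data)


def extract(html: str) -> dict:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    # ASSUMED precedence: blocks inside <main>, else inside <article>, else all of <body>.
    for container in CONTAINERS:
        chosen = [b for b in parser.blocks if container in b[2]]
        if chosen:
            break
    else:
        chosen = parser.blocks
    return {
        "title": parser.title.strip(),
        "headings": [text for tag, text, _ in chosen if tag in HEADINGS],
        "paragraphs": [text for tag, text, _ in chosen if tag == "p"],
        "list_items": [text for tag, text, _ in chosen if tag == "li"],
    }
```

Running `extract()` on a saved copy of one of your pages gives a quick, approximate preview of which text is likely to be picked up.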
Ignored Content
ChattyBox automatically ignores:
- Navigation menus
- Footer content
- Scripts and styles
- Hidden elements
- Cookie banners
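The ignore rules above can be approximated with a small filter. The Python sketch below is an illustration, not ChattyBox's actual logic. It skips text inside `<nav>`, `<footer>`, `<script>`, and `<style>` elements, elements marked `hidden` or `aria-hidden="true"` or styled inline with `display:none` / `visibility:hidden`, and anything whose `id` or `class` mentions "cookie". Treating `<nav>` as "navigation menus" and the cookie check are heuristics assumed here; real pages may hide content through stylesheets this cannot see. It also assumes well-formed markup.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "footer", "script", "style"}
# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _should_skip(tag, attrs):
    """Heuristic check for elements whose text should be ignored."""
    a = {name: (value or "") for name, value in attrs}
    if tag in SKIP_TAGS:
        return True
    if "hidden" in a or a.get("aria-hidden") == "true":
        return True
    style = a.get("style", "").replace(" ", "").lower()
    if "display:none" in style or "visibility:hidden" in style:
        return True
    # ASSUMED cookie-banner heuristic: an id or class containing "cookie".
    marker = (a.get("id", "") + " " + a.get("class", "")).lower()
    return "cookie" in marker


class VisibleTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside an ignored element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth or _should_skip(tag, attrs):
            self._skip_depth += 1  # count nesting so we know when the ignored element ends

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks
```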
Troubleshooting
Pages not being scraped?
- Check that the URL is publicly accessible
- Ensure your `robots.txt` allows our crawler
- Verify the page isn't behind authentication
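To check the `robots.txt` point yourself, Python's standard `urllib.robotparser` can evaluate your rules for a given user agent. The crawler name below is hypothetical, since this page does not name ChattyBox's user agent; substitute the real token if you know it, or check the `*` rules that apply to every crawler.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL user-agent token; this page does not name ChattyBox's crawler.
CRAWLER_AGENT = "ChattyBoxBot"


def crawler_allowed(robots_txt: str, url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if these robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # mark the rules as loaded so can_fetch() trusts them
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(crawler_allowed(rules, "https://example.com/pricing"))        # True
    print(crawler_allowed(rules, "https://example.com/private/plans"))  # False
```

Against a live site, `RobotFileParser("https://example.com/robots.txt")` followed by `read()` fetches and parses the file instead of using a string.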
Content missing from responses?
- Re-scrape the page to get the latest content
- Check how the content is loaded: ChattyBox renders JavaScript, but confirm the content actually appears once the page finishes loading
- Ensure the content is in the main body, not in iframes
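To find content that lives in iframes rather than in the page body, you can list every `<iframe>` and its `src`. This is a small diagnostic sketch using Python's standard `html.parser`; it only locates iframes and makes no claim about how ChattyBox handles them beyond the advice above.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Record the src of every <iframe>; an empty string means no src attribute."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.sources.append(dict(attrs).get("src") or "")


def find_iframes(html):
    finder = IframeFinder()
    finder.feed(html)
    finder.close()
    return finder.sources
```

If an important answer shows up in this list, moving that text into the page's own markup is the straightforward way to make it part of the main body.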