Content Scraping

Configure how ChattyBox crawls your website and indexes content for AI responses.

Scraping Modes

ChattyBox supports three scraping modes:

1. Homepage Only

Scrapes just your homepage. Best for simple landing pages or when you want minimal content indexed.

2. Sitemap Mode (Recommended)

Provide a sitemap URL and ChattyBox will scrape the pages listed in it, up to your plan's page limit. This is the most reliable way to ensure all your pages are indexed.

https://example.com/sitemap.xml
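
Before submitting a sitemap, it can help to see which pages it actually lists. The following is a minimal sketch that assumes a standard sitemaps.org `<urlset>` file. The function name `list_sitemap_urls` and the sample XML are illustrative and are not part of ChattyBox.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text: str) -> list[str]:
    """Return the page URLs found in a sitemap's <url><loc> entries."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap content (not a real site).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

urls = list_sitemap_urls(SAMPLE)
print(len(urls), "pages listed")
for url in urls:
    print(url)
```

Comparing the number of listed pages with your plan's page limit shows whether the whole sitemap can be indexed.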

3. Manual URLs

Specify exact URLs to scrape, one per line. Use this when you want precise control over which pages are indexed.

https://example.com/pricing
https://example.com/features
https://example.com/about
https://example.com/faq
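
If you build the URL list from another source, a quick local check can catch malformed entries before you paste them in. This sketch assumes only the "one URL per line" format described above. The helper `parse_url_list` is hypothetical, and ChattyBox's own validation rules may differ.

```python
from urllib.parse import urlparse


def parse_url_list(text: str) -> tuple[list[str], list[str]]:
    """Split text into (valid, rejected) URLs, one entry per non-empty line.

    A URL counts as valid here if it has an http(s) scheme and a host.
    Duplicate valid URLs are kept only once, in first-seen order.
    """
    valid: list[str] = []
    rejected: list[str] = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            if url not in valid:
                valid.append(url)
        else:
            rejected.append(url)
    return valid, rejected


valid, rejected = parse_url_list(
    "https://example.com/pricing\n"
    "\n"
    "example.com/about\n"
    "https://example.com/pricing\n"
)
print("valid:", valid)
print("rejected:", rejected)
```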

Page Limits

The number of pages you can scrape depends on your plan:

Plan        Pages per Site
Free        10
Starter     50
Pro         100
Business    250

Auto Re-scraping

Keep your chatbot up to date by enabling automatic re-scraping. The frequency depends on your plan:

Plan        Refresh Frequency
Free        Weekly
Starter     Daily
Pro         Hourly
Business    Every 15 minutes

Content Extraction

ChattyBox extracts:

  • Page title - The <title> tag
  • Main content - Text from <main>, <article>, or <body>
  • Headings - All <h1> through <h6> tags
  • Paragraphs - All <p> content
  • Lists - <ul> and <ol> items

Ignored Content

ChattyBox automatically ignores:

  • Navigation menus
  • Footer content
  • Scripts and styles
  • Hidden elements
  • Cookie banners
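
To preview roughly what might be indexed from a page, the extraction and ignore rules above can be approximated locally. This is a simplified sketch and not ChattyBox's actual extractor. It assumes well-formed HTML with explicit closing tags, treats only the `hidden` attribute as "hidden", and does not attempt to detect cookie banners. The class name `ContentExtractor` is illustrative.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "footer", "script", "style"}  # ignored subtrees
KEEP_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}
# Void elements never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []        # (tag, text) pairs in document order
        self._skip_depth = 0   # > 0 while inside an ignored subtree
        self._current = None   # kept tag currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        hidden = any(name == "hidden" for name, _ in attrs)
        if self._skip_depth or tag in SKIP_TAGS or hidden:
            self._skip_depth += 1
        elif tag in KEEP_TAGS and self._current is None:
            self._current, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.items.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if not self._skip_depth and self._current is not None:
            self._buffer.append(data)


# Illustrative page (not a real site).
SAMPLE = """<html><head><title>Pricing</title><style>p{color:red}</style></head>
<body><nav><ul><li>Home</li></ul></nav>
<main><h1>Plans</h1><p>Pick a <b>plan</b>.</p>
<ul><li>Free</li><li>Pro</li></ul><div hidden><p>Draft</p></div></main>
<footer><p>Copyright</p></footer></body></html>"""

extractor = ContentExtractor()
extractor.feed(SAMPLE)
extractor.close()
for tag, text in extractor.items:
    print(tag, text)
```

In this sample, the navigation link, the stylesheet, the hidden draft paragraph, and the footer are skipped, while the title, heading, paragraph, and list items are kept.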

Troubleshooting

Pages not being scraped?

  • Check that the URL is publicly accessible
  • Ensure your robots.txt allows our crawler
  • Verify the page isn't behind authentication

Content missing from responses?

  • Re-scrape the page to get the latest content
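  • Check whether the content is loaded via JavaScript (we render JS, so it is normally captured, but content that appears only after user interaction may be missed)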
  • Ensure the content is in the main body, not in iframes