Self-cleaning Web Crawler based on changes in our sitemap
To enhance our AI agent's capabilities, we need an automated, self-cleaning web crawler. It should run on a configurable schedule, automatically adding new URLs from our sitemap and removing URLs the sitemap no longer lists. This would ensure our AI agent always works from the most up-to-date information, improving its efficiency and accuracy.
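For illustration, here is a minimal sketch of the add/remove sync in Python. The sitemap URL and the `index` object with its `list_urls`/`add_url`/`remove_url` methods are hypothetical placeholders, not an existing API; scheduling (e.g., a daily cron job) is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # hypothetical sitemap location
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def fetch_sitemap_urls(sitemap_url: str) -> set[str]:
    """Download the sitemap and return the set of <loc> URLs it lists."""
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)
    return {loc.text.strip() for loc in tree.iter(f"{{{SITEMAP_NS}}}loc") if loc.text}

def sync_index(index, sitemap_url: str = SITEMAP_URL) -> None:
    """Diff the sitemap against the indexed URLs: crawl what is new,
    drop what the sitemap no longer lists (the self-cleaning step)."""
    current = fetch_sitemap_urls(sitemap_url)
    known = set(index.list_urls())  # hypothetical: URLs already indexed
    for url in current - known:
        index.add_url(url)          # hypothetical: crawl and index the new page
    for url in known - current:
        index.remove_url(url)       # hypothetical: remove the stale page
```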
Fleur Nouwens
Hey, thanks for your feedback! Following up with a few questions:
- What frequency do you envision for the web crawler (e.g., daily, weekly)?
- Are there specific types of URLs or content the web crawler should prioritize or exclude?
- How should the web crawler handle URLs that are temporarily unavailable or return errors?