Automated and self-cleaning WebCrawler
Fireflies
To enhance our AI agent's capabilities, we need an automated, self-cleaning WebCrawler. It should run on a set schedule, automatically adding new URLs and removing any that are no longer present in the sitemap. This would ensure the AI agent always has the most up-to-date information, improving its efficiency and accuracy.
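For reference, a minimal sketch of the first half of that behavior, assuming the site publishes a standard sitemap.xml; the function name `fetch_sitemap_urls` is hypothetical and not part of any existing product API:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def fetch_sitemap_urls(sitemap_url: str) -> set[str]:
    """Download a sitemap.xml and return the set of <loc> URLs it lists."""
    with urllib.request.urlopen(sitemap_url) as response:
        tree = ET.fromstring(response.read())
    # Each <url><loc>...</loc></url> entry names one page on the site.
    return {loc.text.strip() for loc in tree.iter(f"{SITEMAP_NS}loc") if loc.text}
```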
Wouter Rosekrans
It would be nice if it were possible to set the frequency at which this update runs. For our situation, it would be ideal if the web crawler could compare the sitemap from the most recent crawl with the current sitemap and then process only the changes: removing pages that have disappeared from the sitemap from the knowledge base, and adding new pages.
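The incremental update described here boils down to a set difference between the two sitemaps. A small illustrative sketch, with hypothetical names and example data, assuming the URL set from the previous crawl is persisted between runs:

```python
def diff_sitemaps(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Compare the previous crawl's sitemap with the current one.

    Returns (to_add, to_remove): pages that are new on the site and should
    be crawled into the knowledge base, and pages gone from the sitemap
    whose documents should be removed.
    """
    to_add = current - previous     # in the new sitemap, not yet crawled
    to_remove = previous - current  # crawled before, no longer on the site
    return to_add, to_remove

# Hypothetical usage; previous_urls would be loaded from the last run's state.
previous_urls = {"https://example.com/a", "https://example.com/b"}
current_urls = {"https://example.com/b", "https://example.com/c"}
to_add, to_remove = diff_sitemaps(previous_urls, current_urls)
# to_add == {"https://example.com/c"}, to_remove == {"https://example.com/a"}
```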
Fleur Nouwens
Hey Fireflies, thanks for your feedback! Following up on this:
- What specific frequency do you envision for the WebCrawler to run (e.g., daily, weekly)?
- Are there any specific types of URLs or content that should be prioritized or excluded by the WebCrawler?
- How should the WebCrawler handle URLs that are temporarily unavailable or return errors?