For businesses that have outgrown the hobbyist or R&D stages.
Adds high availability, error recovery, and horizontal scaling.
Installation on Kubernetes is relatively straightforward, and example manifests are supplied. External dependencies can be run as single instances to get you up and running quickly, or offloaded to external services for high availability.
Performance is substantially improved by replacing SQLite with PostgreSQL as an external database, and by adding Redis as an in-memory cache and RabbitMQ as a clusterable message broker.
Reliability is greatly improved: redundancy and external services provide high availability, and the crawling, processing, and streaming queues support error recovery.
Scalability makes large crawls possible, with horizontal scaling for when a single computer is not enough, concurrent crawls, and multi-process workloads.
Configuration options include custom user agents and custom rate limits. Out-of-the-box robots.txt compliance respects site owners' wishes. Cancelling a crawl clears its queues.
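As a rough illustration of how a custom user agent, a rate limit, and robots.txt compliance fit together, here is a minimal Python sketch using only the standard library. It is not Isoxya's implementation or configuration format; the user agent string, the default delay, and the robots.txt content are hypothetical values chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "ExampleCrawler/1.0"  # hypothetical custom user agent
DEFAULT_DELAY = 1.0  # hypothetical custom rate limit: seconds between requests


def make_policy(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt policy from raw text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str) -> bool:
    """Return True if the site owner permits this user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


def delay_for(parser: RobotFileParser) -> float:
    """Use the slower of our own rate limit and the site's Crawl-delay, if any."""
    crawl_delay = parser.crawl_delay(USER_AGENT)
    if crawl_delay is None:
        return DEFAULT_DELAY
    return max(DEFAULT_DELAY, float(crawl_delay))


policy = make_policy(ROBOTS_TXT)
```

In this sketch, a crawler would call `allowed(policy, url)` before each request and sleep for `delay_for(policy)` seconds between requests to the same site.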
Efficiency is tunable using configurable limits on the maximum number of pages and the maximum depth. External link validation checks sites in parallel. List crawls support large sets of URLs.
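To show what page and depth limits mean in practice, the following Python sketch runs a breadth-first crawl over a small in-memory link graph. It is an illustration only, not Isoxya's code: the `LINKS` graph, the `crawl` function, and the `max_pages` and `max_depth` parameter names are all hypothetical.

```python
from collections import deque

# Hypothetical link graph standing in for pages fetched from a site.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/a/2": [],
    "/b/1": [],
    "/a/1/x": [],
}


def crawl(start: str, max_pages: int, max_depth: int) -> list[str]:
    """Breadth-first crawl that stops at max_pages pages or max_depth link hops."""
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

For example, `crawl("/", max_pages=100, max_depth=1)` visits only the start page and the pages it links to directly, while a low `max_pages` cuts the crawl short regardless of depth.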
Containers are controlled by dynamic resource management, facilitating multi-channel scalable crawlers, scalable processors, and scalable streamers.
Support and custom development are available from Isoxya's creator, who has years of experience as a DevOps engineer, programmer, and data architect, and was formerly VP of Engineering at an SEO web crawling company.