🛸 next-generation crawling system
Isoxya is a web crawler and data processing system, the result of years of research into next-generation crawling. It is designed to process websites with tens of millions of pages, and to extract and transform that data in myriad ways, including streaming it into Elasticsearch.
🌱 coming soon
A limited version of Isoxya is planned to launch in 2020, with more features available commercially at a later date. It’s already begun crawling sites on the public internet. Subscribe to Nic Williams’s newsletter to keep up to date with development and the launch.
Isoxya’s spiders would like to visit you! Even before Isoxya launches, some data is available—for free! To invite some spiders to your web, or to request access to data if Isoxya’s spiders have already found you, get in touch.
crawling as a service
You concentrate on your core product; Isoxya concentrates on processing the data and streaming it to you.
Distributed across multiple computers; designed for close to 24/7 operation, with automated error recovery and backlog queues.
Crawls typically start and begin streaming data within seconds; no ‘crawl finalisation’ stage; analyse data immediately.
Tested with sites with millions of pages; designed to scale to sites with tens of millions of pages.
Supports many-tiny-site workloads; able to process tiny sites end-to-end within seconds, cost-effectively.
Not just an SEO crawler: multi-industry, multi-purpose; spellchecking, data mining, machine learning…