Isoxya

🛸 next-generation crawling system

Isoxya is a web crawler and data processing system, representing years of research into building a next-generation web crawler. It can process websites with tens of millions of pages, and extract and transform that data in myriad ways, including streaming data into Elasticsearch.

🌱 coming soon

A limited version of Isoxya is planned to launch in 2020, with more features available commercially at a later date. It has already begun crawling sites on the public internet. Subscribe to Nic Williams’s newsletter to keep up to date with the development and launch process.

🕷️ spiders!

Isoxya’s spiders would like to visit you! Even before Isoxya launches, some data is available—for free! To invite some spiders to your web, or to request access to data if Isoxya’s spiders have already found you, get in touch.

benefits

crawling as a service

You concentrate on your core product; Isoxya concentrates on processing the data and streaming it to you.

scalable

Runs across multiple computers; designed for close to 24/7 operation, with automated error recovery and backlog queues.

fast

Crawls typically start and begin streaming data within seconds; there is no ‘crawl finalisation’ stage, so you can analyse data immediately.

large crawls

Tested on sites with millions of pages; designed to scale to sites with tens of millions of pages.

tiny crawls

Supports many-tiny-site workloads; able to process tiny sites end-to-end within seconds, cost-effectively.

flexible

Not just an SEO crawler: multi-industry and multi-purpose, with uses including spellchecking, data mining, machine learning…

latest posts