Isoxya plugin: Crawler HTML
Isoxya plugin: Crawler HTML is an open-source (BSD 3-Clause) processor plugin for the Isoxya web crawler. This plugin uses the Isoxya 2 JSON interfaces to provide a core run loop for the crawling engine: it receives the data for each page after the request completes, parses it as static HTML, constructs URL metadata, and responds with a set of outbound URLs.
{
  "data": {
    "status": 200,
    "method": "GET",
    "header": {
      "Vary": "Accept-Encoding",
      "Content-Type": "text/html; charset=UTF-8",
      "Content-Encoding": "gzip",
      "Etag": "\"3147526947+gzip\"",
      "Expires": "Thu, 04 Feb 2021 12:54:27 GMT",
      "Age": "596804",
      "Last-Modified": "Thu, 17 Oct 2019 07:18:26 GMT",
      "Date": "Thu, 28 Jan 2021 12:54:27 GMT",
      "Server": "ECS (bsa/EB16)",
      "Content-Length": "648",
      "Cache-Control": "max-age=604800",
      "X-Cache": "HIT"
    },
    "err": null,
    "duration": {
      "denominator": 1000000000,
      "numerator": 97712189
    }
  },
  "urls": [
    "https://www.iana.org/domains/example"
  ]
}
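The link-extraction step described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the plugin's actual implementation: the names `LinkCollector`, `outbound_urls`, and `duration_seconds` are hypothetical, as is the base URL in the usage note. The sketch collects `href` values from `<a>` tags, resolves them against the page URL, and drops fragments. It also converts the `duration` field of the sample into seconds, assuming that the field is a fraction of a second expressed as numerator over denominator.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects absolute, fragment-free URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, then drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.urls:
                    self.urls.append(absolute)


def outbound_urls(base_url, html):
    """Returns the outbound URLs found in a static HTML document, in order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.urls


def duration_seconds(duration):
    """Converts a {"numerator": n, "denominator": d} object to seconds.

    Assumption: the value is a fraction of a second, as the sample suggests.
    """
    return duration["numerator"] / duration["denominator"]
```

For example, calling `outbound_urls("http://example.com/", page_html)` on a page containing a single link to `https://www.iana.org/domains/example` would yield a one-element list like the `urls` array in the sample, and `duration_seconds` applied to the sample's `duration` gives roughly 0.098 seconds.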
Isoxya plugin: Elasticsearch
Isoxya plugin: Elasticsearch is an open-source (BSD 3-Clause) streamer plugin for the Isoxya web crawler. This plugin uses the Isoxya 2 JSON interfaces to stream data into an Elasticsearch cluster, so that crawl data can be queried with the usual features of Elasticsearch and Kibana.
{
  "took": 460,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "isoxya.1df9ee03-3c25-4ea6-9276-d4c7a58de332.2021-01-28",
        "_type": "_doc",
        "_id": "1efe7f46f5048ce328233f8591efd14dc0b58275b3d90e3ae8339b8f4b6883f6.1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
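The sample above has the shape of an Elasticsearch bulk response: a top-level `errors` flag and one entry per action in `items`. A minimal sketch of how a consumer might check such a response for failures follows. The function name `failed_bulk_items` is hypothetical, and this is not how the plugin itself handles responses. It treats any per-item `status` of 300 or above as a failure.

```python
def failed_bulk_items(response):
    """Returns (action, _id, status) for every failed item in a bulk response."""
    failures = []
    # When "errors" is false, every action in the batch succeeded.
    if not response.get("errors"):
        return failures
    for item in response.get("items", []):
        # Each item has a single key naming the action, such as "index".
        for action, result in item.items():
            if result.get("status", 0) >= 300:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures
```

Applied to the sample, where `errors` is `false` and the single item has status 201, the function returns an empty list.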
Isoxya plugin: Spellchecker
Isoxya plugin: Spellchecker is an open-source (BSD 3-Clause) processor plugin for the Isoxya web crawler. This plugin uses the Isoxya 2 JSON interfaces to spellcheck entire websites, even those with millions of pages.
[
  {
    "paragraph": "Global heating is increesing droughts,",
    "results": [
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": [
          "increasing",
          "screening",
          "resining",
          "cresting",
          "resisting"
        ],
        "word": "increesing"
      }
    ]
  }
]
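One way to use results like the sample above is to apply the first suggestion for each misspelled word. The sketch below is an illustration only, and the function name `apply_first_suggestions` is hypothetical. It assumes that `offset` is 1-based, which is consistent with the sample ("increesing" starts at the 19th character of the paragraph), and it skips any result whose word is not found at that position.

```python
def apply_first_suggestions(entry):
    """Replaces each misspelled word in entry["paragraph"] with its first suggestion."""
    text = entry["paragraph"]
    # Work from right to left so that earlier offsets stay valid after a replacement.
    for result in sorted(entry["results"], key=lambda r: r["offset"], reverse=True):
        if result["correct"] or not result["suggestions"]:
            continue
        start = result["offset"] - 1  # assumption: offsets are 1-based
        end = start + len(result["word"])
        if text[start:end] != result["word"]:
            continue  # the offset convention did not match; leave the text untouched
        text = text[:start] + result["suggestions"][0] + text[end:]
    return text
```

For the sample entry, this returns "Global heating is increasing droughts,". In practice the first suggestion is not always the right one, so a review step before applying corrections is advisable.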