Isoxya plugin: Crawler HTML

Isoxya plugin: Crawler HTML is an open-source (BSD 3-Clause) processor plugin for Isoxya web crawler. This plugin uses Isoxya 2 JSON interfaces to provide a core run loop for the crawling engine, receiving data for each page post-request, parsing it as static HTML, constructing URL metadata, and responding with a set of outbound URLs.

view or fork on GitHub download from Docker Hub

{
  "data": {
    "status": 200,
    "method": "GET",
    "header": {
      "Vary": "Accept-Encoding",
      "Content-Type": "text/html; charset=UTF-8",
      "Content-Encoding": "gzip",
      "Etag": "\"3147526947+gzip\"",
      "Expires": "Thu, 04 Feb 2021 12:54:27 GMT",
      "Age": "596804",
      "Last-Modified": "Thu, 17 Oct 2019 07:18:26 GMT",
      "Date": "Thu, 28 Jan 2021 12:54:27 GMT",
      "Server": "ECS (bsa/EB16)",
      "Content-Length": "648",
      "Cache-Control": "max-age=604800",
      "X-Cache": "HIT"
    },
    "err": null,
    "duration": {
      "denominator": 1000000000,
      "numerator": 97712189
    }
  },
  "urls": [
    "https://www.iana.org/domains/example"
  ]
}

Isoxya plugin: Elasticsearch

Isoxya plugin: Elasticsearch is an open-source (BSD 3-Clause) streamer plugin for Isoxya web crawler. This plugin uses Isoxya 2 JSON interfaces to stream data into an Elasticsearch cluster, making it possible to query using all the normal features provided by Elasticsearch and Kibana.

view or fork on GitHub download from Docker Hub

{
  "took": 460,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "isoxya.1df9ee03-3c25-4ea6-9276-d4c7a58de332.2021-01-28",
        "_type": "_doc",
        "_id": "1efe7f46f5048ce328233f8591efd14dc0b58275b3d90e3ae8339b8f4b6883f6.1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}

Isoxya plugin: Spellchecker

Isoxya plugin: Spellchecker is an open-source (BSD 3-Clause) processor plugin for Isoxya web crawler. This plugin uses Isoxya 2 JSON interfaces to provide spellchecking capabilities to entire websites, even if they have millions of pages.

view or fork on GitHub download from Docker Hub

[
  {
    "paragraph": "Global heating is increesing droughts,",
    "results": [
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": [
          "increasing",
          "screening",
          "resining",
          "cresting",
          "resisting"
        ],
        "word": "increesing"
      }
    ]
  }
]