Isoxya plugin: Crawler HTML

Isoxya plugin: Crawler HTML is an open-source (BSD 3-Clause) processor plugin for the Isoxya web crawler. It uses the Isoxya 2 JSON interfaces to provide the core run loop of the crawling engine: for each page, it receives the response data after the request has been made, parses the body as static HTML, constructs URL metadata, and responds with the set of outbound URLs.

View or fork on GitHub | Download from Docker Hub

This is an open-source preview of Isoxya 2, not yet released.

{
  "data": {
    "header": {
      "Accept-Ranges": "bytes",
      "Age": "453125",
      "Cache-Control": "max-age=604800",
      "Content-Encoding": "gzip",
      "Content-Length": "648",
      "Content-Type": "text/html; charset=UTF-8",
      "Date": "Wed, 30 Sep 2020 14:43:18 GMT",
      "Etag": "\"3147526947\"",
      "Expires": "Wed, 07 Oct 2020 14:43:18 GMT",
      "Last-Modified": "Thu, 17 Oct 2019 07:18:26 GMT",
      "Server": "ECS (dcb/7EA3)",
      "Vary": "Accept-Encoding",
      "X-Cache": "HIT"
    },
    "status_code": 200
  },
  "urls": [
    "https://www.iana.org/domains/example"
  ]
}
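
The link-extraction step described above can be sketched in Python as follows. This is a minimal illustration, not the plugin's actual implementation: the names `LinkCollector`, `outbound_urls`, and `build_response` are hypothetical, the output shape is inferred from the sample above, and the choices to follow only `<a href>` links, drop fragments, and keep only HTTP(S) URLs are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_urls(page_url, html):
    """Resolve links against the page URL; drop fragments and duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    seen, urls = set(), []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        # Assumption: only HTTP(S) links are worth crawling.
        if urlsplit(absolute).scheme in ("http", "https") and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


def build_response(page_url, status_code, header, html):
    """Assemble a dict shaped like the sample above (shape is an assumption)."""
    return {
        "data": {"header": header, "status_code": status_code},
        "urls": outbound_urls(page_url, html),
    }
```

Calling `build_response` with a page whose only link points to `https://www.iana.org/domains/example` yields a `urls` list containing just that URL, which matches the structure of the sample.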

Isoxya plugin: Elasticsearch

Isoxya plugin: Elasticsearch is an open-source (BSD 3-Clause) streamer plugin for the Isoxya web crawler. It uses the Isoxya 2 JSON interfaces to stream crawl data into an Elasticsearch cluster, where it can be queried with the usual features of Elasticsearch and Kibana.

View or fork on GitHub | Download from Docker Hub

This is an open-source preview of Isoxya 2, not yet released.

{
  "took": 149,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "isoxya.8717f181-0948-4103-81bb-9df141ea2b79.2020-10-01",
        "_type": "_doc",
        "_id": "aaa1a3e08960c796a6db9ce35e181585039c139e4e1362aee291591b612b6fb3.1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
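
The sample above has the shape of an Elasticsearch bulk response: a top-level `errors` flag and one entry per action in `items`, each with its own `status`. As a rough sketch of how a caller might check such a response, the hypothetical helper `failed_items` below lists every item whose status is outside the 2xx range. The trimmed `sample` and the shortened `_id` values are placeholders for illustration only.

```python
def failed_items(bulk_response):
    """Return (action, _id, status) for every item that did not succeed."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            status = result.get("status", 0)
            if not 200 <= status < 300:
                failures.append((action, result.get("_id"), status))
    return failures


# Trimmed-down version of the sample response, with a placeholder _id.
sample = {
    "took": 149,
    "errors": False,
    "items": [
        {"index": {"_id": "example-id.1", "result": "created", "status": 201}}
    ],
}

if sample["errors"] or failed_items(sample):
    print("some documents were not indexed:", failed_items(sample))
else:
    print("all documents indexed")
```

Checking per-item statuses as well as the `errors` flag is a defensive choice here; for the sample above both checks agree that nothing failed.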

Isoxya plugin: Spellchecker

Isoxya plugin: Spellchecker is an open-source (BSD 3-Clause) processor plugin for the Isoxya web crawler. It uses the Isoxya 2 JSON interfaces to spellcheck entire websites, even those with millions of pages.

View or fork on GitHub | Download from Docker Hub

This is an open-source preview of Isoxya 2, not yet released.

[
  {
    "paragraph": "Global heating is increesing droughts,",
    "results": [
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": [
          "increasing",
          "screening",
          "resining",
          "cresting",
          "resisting"
        ],
        "word": "increesing"
      }
    ]
  }
]
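
A consumer of this output might collect the flagged words and their suggestions along these lines. This is a sketch under stated assumptions: the helper `misspellings` is hypothetical, and the treatment of `offset` as a 1-based character position is inferred from the single sample above rather than from any specification, so the code falls back to a plain text search if that assumption does not hold.

```python
def misspellings(report):
    """List each word flagged as incorrect, with its position and suggestions."""
    found = []
    for entry in report:
        paragraph = entry["paragraph"]
        for result in entry["results"]:
            if result["correct"]:
                continue
            word = result["word"]
            # Assumption inferred from the sample: offset is 1-based.
            start = result["offset"] - 1
            if paragraph[start:start + len(word)] != word:
                # Fall back to searching the paragraph (-1 if not found).
                start = paragraph.find(word)
            found.append(
                {"word": word, "start": start, "suggestions": result["suggestions"]}
            )
    return found


# The sample output above, reproduced as a Python literal.
report = [
    {
        "paragraph": "Global heating is increesing droughts,",
        "results": [
            {
                "correct": False,
                "offset": 19,
                "status": "miss",
                "suggestions": ["increasing", "screening", "resining", "cresting", "resisting"],
                "word": "increesing",
            }
        ],
    }
]

for miss in misspellings(report):
    print(miss["word"], "->", miss["suggestions"][0])
```

Taking the first suggestion as the most likely correction is a convenience for the example; the sample does not state how suggestions are ordered.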