Linked Data RDF Mining

The EasyMiner system now uses the EasyMiner-Rdf module for mining rules from RDF graphs. This module is built on the rdfrules library and can load RDF graphs in the N-Triples and Turtle formats. In datasets created according to the RDF specification, we can easily identify data types, schemas and relations between objects, and we can extend the data with other public datasets thanks to the linked data paradigm. These features allow us to discover more knowledge from RDF data than from the typical input formats for rule mining (such as transactions or tables). The EasyMiner-Rdf package offers several operations for RDF data processing and rule mining.

The EasyMiner-Rdf module uses the AMIE+ algorithm for mining rules from RDF data. Like other well-known algorithms (Apriori, FP-Growth), it uses standard measures (support, confidence, head coverage, etc.) as minimum thresholds for pruning the search space. It requires the whole dataset to be loaded in memory as a set of hash maps; therefore, it has considerable memory requirements, especially for larger datasets. The rdfrules library also implements several extensions of the AMIE+ algorithm, such as rule patterns and constraints applied during mining, and additional interest measures. The algorithm mines rules in the form of Horn clauses, where a rule has exactly one atom on the right side and several atoms on the left side joined by logical conjunction, for example:


    (?x <hasChild> ?c) ^ (?y <hasChild> ?c) => (?x <isMarriedTo> ?y)
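
To make the measures concrete, the following minimal sketch computes support, head coverage and confidence for the rule above on a tiny hand-made dataset. It assumes the usual AMIE formulation (support counts the distinct (?x, ?y) pairs that satisfy both body and head, head coverage divides it by the number of facts with the head predicate, confidence divides it by the number of pairs that satisfy the body). The triples and the function name `rule_measures` are made up for illustration, and the rdfrules library may differ in details.

```python
from collections import defaultdict

# Toy dataset (illustrative only): (subject, predicate, object)
TRIPLES = [
    ("anna", "hasChild", "carl"),
    ("bob", "hasChild", "carl"),
    ("anna", "isMarriedTo", "bob"),
    ("bob", "isMarriedTo", "anna"),
    ("dana", "hasChild", "eve"),
    ("frank", "hasChild", "eve"),
]


def rule_measures(triples):
    """Measures for (?x hasChild ?c) ^ (?y hasChild ?c) => (?x isMarriedTo ?y)."""
    parents_of = defaultdict(set)  # child -> set of parents
    married = set()                # (x, y) pairs stated by the head predicate
    for subject, predicate, obj in triples:
        if predicate == "hasChild":
            parents_of[obj].add(subject)
        elif predicate == "isMarriedTo":
            married.add((subject, obj))

    # Every (x, y) pair sharing at least one child satisfies the body.
    # Pairs with x == y are kept, because the rule does not forbid them.
    body_pairs = {
        (x, y)
        for parents in parents_of.values()
        for x in parents
        for y in parents
    }

    support = len(body_pairs & married)
    head_size = len(married)
    return {
        "support": support,
        "head_size": head_size,
        "body_size": len(body_pairs),
        "head_coverage": support / head_size if head_size else 0.0,
        "confidence": support / len(body_pairs) if body_pairs else 0.0,
    }


measures = rule_measures(TRIPLES)
print(measures)
```

On this toy data the rule has support 2, head coverage 1.0 and confidence 0.25. The confidence is low because the body is also satisfied by dana and frank, who are not stated to be married, and by pairs where ?x and ?y are the same person.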

Please note that the EasyMiner-Rdf module is still under development and its releases are intended for experimental purposes only.

Integration with the EasyMiner system

Currently, the EasyMiner-Rdf module is an experimental version, but it is already integrated into the EasyMiner-Miner module as a remote service (see the developer part). This version uses only the basic functions of the rdfrules library: we can only mine rules from RDF data with the AMIE+ algorithm (without extensions), using thresholds on the basic interest measures. For now, other operations (such as dataset transformations, preprocessing and rule postprocessing) are not available. We plan to add further functions as soon as real user needs arise. The RESTful operations are described in detail in the Swagger doc; a short description follows:

RDF mining init HTTP request

Url: <easyminer-miner-address>/<task-id>
Method: POST
Content-Type: multipart/form-data
Accept: application/json
Parameters:
- name: amie
- timeout: integer, maximum running time of the task in minutes (default: 10)
- format:
- body: the dataset in N-Triples or Turtle format, UTF-8 encoded. Set the Content-Type of this body part to application/n-triples or text/turtle
- min-headsize: integer, range: $[1, \infty)$, default: 100
- min-head-coverage: double, range: $[0, 1]$, default: 0.05
- min-confidence: double, range: $[0, 1]$, default: not-set
- max-rule-length: integer, range: $[1, 5]$, default: 3
- topk: integer, range: $[1, \infty)$, default: not-set
- instances: enum; not-set = rules without instances, all = rules with instances, objects = rules with instances only in the object position; default: not-set
- duplicit-predicates: not-set = without duplicate predicates, set = duplicate predicates allowed within one rule; default: not-set
Response codes:
- 202: task has been accepted
- 400: invalid task parameters
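
The init request described above could be assembled roughly as follows. This is a sketch under assumptions, not the official client: it assumes that every parameter is sent as a separate multipart/form-data field, the base address `http://localhost:8080` and the task id `task-1` are placeholders, and the helper names are hypothetical. The `format` parameter is omitted because its values are not listed here; consult the Swagger doc for it.

```python
import urllib.request
import uuid


def build_multipart(fields, dataset, dataset_type="application/n-triples"):
    """Encode plain form fields plus the dataset as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    # The dataset goes into the "body" part with its own Content-Type.
    lines.append(delimiter)
    lines.append(b'Content-Disposition: form-data; name="body"')
    lines.append(f"Content-Type: {dataset_type}".encode("utf-8"))
    lines.append(b"")
    lines.append(dataset.encode("utf-8"))
    lines.append(delimiter + b"--")
    lines.append(b"")
    return boundary, b"\r\n".join(lines)


def build_init_request(base_url, task_id, fields, dataset):
    """Create (but do not send) the POST request that starts a mining task."""
    boundary, payload = build_multipart(fields, dataset)
    return urllib.request.Request(
        url=f"{base_url}/{task_id}",
        data=payload,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
    )


# Parameter values below are the documented defaults plus an explicit confidence.
fields = {
    "name": "amie",
    "timeout": 10,
    "min-headsize": 100,
    "min-head-coverage": 0.05,
    "min-confidence": 0.5,
    "max-rule-length": 3,
}
dataset = "<http://example.org/anna> <http://example.org/hasChild> <http://example.org/carl> .\n"
request = build_init_request("http://localhost:8080", "task-1", fields, dataset)

# Sending requires a running service, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # 202 is expected when the task is accepted
```

The request is built without being sent so that the body can be inspected first; parameters left out of `fields` fall back to the defaults listed above.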

RDF mining status HTTP request

Url: <easyminer-miner-address>/<task-id>
Method: GET
Accept: application/json
Response codes:
- 404: task does not exist
- 202: task is still in progress; it returns log messages.
- 500: error during mining; it returns an error message.
- 200: task has been successfully completed; it returns a rules list in the JSON format.
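
A client typically polls the status endpoint until the task leaves the 202 state. The sketch below shows one possible loop built on the response codes listed above. The function names, the polling interval and the simulated responses are assumptions made for illustration, and the real shape of the returned rules list is defined by the service, not by this example.

```python
import time
import urllib.error
import urllib.request

# Meaning of the documented response codes of the status request.
STATES = {200: "done", 202: "running", 404: "missing", 500: "failed"}


def http_fetch(url):
    """Perform one status request and return (status code, response text)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx codes; convert them back to a plain result.
        return error.code, error.read().decode("utf-8")


def wait_for_rules(fetch, interval=1.0, max_polls=600, sleep=time.sleep):
    """Call fetch() repeatedly until the task completes, fails or disappears."""
    for _ in range(max_polls):
        code, payload = fetch()
        state = STATES.get(code, "unexpected")
        if state == "done":
            return payload
        if state != "running":
            raise RuntimeError(f"mining task {state} (HTTP {code}): {payload}")
        sleep(interval)
    raise TimeoutError("mining task did not finish within the polling limit")


# Offline demonstration with simulated responses instead of a live service.
simulated = iter([
    (202, "loading dataset"),
    (202, "mining in progress"),
    (200, '["placeholder rule"]'),
])
result = wait_for_rules(lambda: next(simulated), sleep=lambda seconds: None)
print(result)

# Against a real service the call might look like this (placeholder address):
# rules = wait_for_rules(lambda: http_fetch("http://localhost:8080/task-1"))
```

Passing `fetch` and `sleep` as arguments keeps the loop testable without network access or real delays.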

You can use a simple demo tool and try the service now!