EasyMiner: easy association rule mining, classification and anomaly detection

API Tutorial

Do you want to use the full EasyMiner functionality in your own project, or simply automate the running of tasks that you have tuned in the user interface? Try our REST API.

The main, top-level API offering the full functionality of EasyMiner is provided by the integration component EasyMinerCenter. If you want to use the APIs of the individual services, you can do that as well; this is documented in the For Developers section.

Requirements

To use the EasyMiner API, you need access to a configured EasyMiner system. You can use your own installation or our demo server.

On the selected EasyMiner server, go to the user interface and create a new user account. Then click on your user image (your photo or a generic user icon) in the top right corner of the screen, follow the link "Show my profile" and copy your API key. The URL of the profile page is: <easyminercenter-url>/em/user/details

The API key has to be sent with every request to identify the user account.
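In the examples below, the key is passed as the apiKey query parameter of each call. A minimal sketch with placeholder values:

import requests

API_URL = "https://<easyminercenter-url>/api"  # base URL of the API endpoint
API_KEY = "YOUR_API_KEY"                       # the key copied from your profile page

# example: read the detail of an existing data source (placeholder id 1)
r = requests.get(API_URL + "/datasources/1?apiKey=" + API_KEY, headers={"Accept": "application/json"})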

API documentation

The API is fully documented using Swagger documentation. The API endpoint and the API documentation are available at: <easyminercenter-url>/api

You can browse the Swagger documentation on our demo server.

Data mining using API - usage examples

Jupyter notebooks

For a simple start with the REST API, we recommend trying the new, fully commented examples in the form of Jupyter notebooks. These examples are based on the ESIF dataset from the UCI repository. You can find them in the GitHub repository:

Rule mining

A comprehensive, commented example is available on GitHub - see the commented code...

Python client

For evaluation purposes, we prepared a comprehensive benchmarking use case covering 40 datasets from the UCI repository. This benchmarking suite is written in Python. The project is stored in the KIZI/EasyMiner-Evaluation GitHub repository.

A part of this project is a Python client for the EasyMiner REST API.

Simple example
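The steps below use the Python requests library and assume a small setup block along the following lines. All values are placeholders to adapt to your own data; the column lists and thresholds are not part of the API, they only drive the example script.

import json
import time
import urllib.parse

import requests

API_URL = "https://<easyminercenter-url>/api"   # API endpoint of your EasyMinerCenter instance
API_KEY = "YOUR_API_KEY"                        # API key from your user profile

CSV_FILE = "data.csv"            # placeholder: path to the uploaded CSV dataset
CSV_SEPARATOR = ","              # column separator used in the CSV file
CSV_ENCODING = "utf8"            # encoding of the CSV file

ANTECEDENT_COLUMNS = []          # empty list = use all fields not present in the consequent
CONSEQUENT_COLUMNS = ["class"]   # placeholder: name(s) of the target column(s)

MIN_CONFIDENCE = 0.7             # placeholder threshold for the confidence measure
MIN_SUPPORT = 0.05               # placeholder threshold for the support measure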

1. Upload data in CSV

headers = {"Accept": "application/json"}
files = {("file", open(CSV_FILE, 'rb'))}
r = requests.post(API_URL + '/datasources?separator=' + urllib.parse.quote(CSV_SEPARATOR) + '&encoding=' + CSV_ENCODING + '&type=limited&apiKey=' + API_KEY, files=files, headers=headers)
datasource_id = r.json()["id"]
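The response describes the created data source as JSON; before reading its id, it may be worth verifying that the upload succeeded, for example:

r.raise_for_status()  # raises an exception for any 4xx/5xx response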

2. Create miner

headers = {'Content-Type': 'application/json', "Accept": "application/json"}
json_data = json.dumps({"name": "TEST MINER", "type": "cloud", "datasourceId": datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]

3. Preprocess data – generate attributes from the data fields stored in the data source

The user defines the preprocessing for each data field. It is also possible to generate multiple attributes from one data field (see the sketch after the code below).

headers = {'Content-Type': 'application/json', "Accept": "application/json"}
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']
attributes_columns_map = {}
for col in datasource_columns:
    column = col["name"]
    json_data = json.dumps(
        {"miner": miner_id, "name": column, "columnName": column, "specialPreprocessing": "eachOne"})
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
    if r.status_code != 201:
        break  # an error occurred
    attributes_columns_map[column] = r.json()['name']  # map of created attributes (based on the existing data fields)
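As mentioned above, several attributes can be derived from the same data field by calling the /attributes endpoint repeatedly with the same columnName but a different attribute name (and typically a different preprocessing setting; the available preprocessing options are described in the Swagger documentation). A minimal sketch reusing the payload shape from the loop above, with a hypothetical column name:

# hypothetical example: derive a second attribute from the data field "age"
json_data = json.dumps({"miner": miner_id, "name": "age (duplicate)", "columnName": "age",
                        "specialPreprocessing": "eachOne"})
r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())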

4. Define association rule mining task

Define the attributes for the antecedent and consequent parts of the association rules. Each attribute can be configured either to appear with any value or to be constrained to a single fixed value.

This step also entails the definition of threshold values for the interest measures (confidence, support, lift).

# define data mining task
antecedent = []
consequent = []

# prepare antecedent pattern
if len(ANTECEDENT_COLUMNS):
    # add to antecedent only fields defined in the constant
    for column in ANTECEDENT_COLUMNS:
        antecedent.append({"attribute": attributes_columns_map[column]})
else:
    # add to antecedent all fields not used in consequent
    for (column, attribute_name) in attributes_columns_map.items():
        if not(column in CONSEQUENT_COLUMNS):
            antecedent.append({"attribute": attribute_name})

# prepare consequent pattern
for column in CONSEQUENT_COLUMNS:
    consequent.append({"attribute": attributes_columns_map[column]})

json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {
                                "name": "CONF",
                                "value": MIN_CONFIDENCE
                            },
                            {
                                "name": "SUPP",
                                "value": MIN_SUPPORT
                            }
                        ],
                        "antecedent": antecedent,
                        "consequent": consequent
                        })
# define new data mining task
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code:" + str(r.status_code))
task_id = str(r.json()["id"])
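The task above uses only confidence and support thresholds. Additional interest measures such as lift can be added to the IMs list in the same way; a sketch, assuming the measure identifier LIFT (the exact names of the supported measures are listed in the Swagger documentation):

MIN_LIFT = 1.2  # placeholder threshold; "LIFT" as the identifier is an assumption

interest_measures = [
    {"name": "CONF", "value": MIN_CONFIDENCE},
    {"name": "SUPP", "value": MIN_SUPPORT},
    {"name": "LIFT", "value": MIN_LIFT}
]
# interest_measures would then be passed as the "IMs" value in the task definition above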

5. Execute the mining task

r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task failed executing")
        break
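Because the same polling pattern is used again for anomaly detection below, it can be wrapped into a small helper function; a sketch based only on the loop above:

def wait_for_task(state_url):
    # poll the given .../state URL once per second until the task is solved or failed
    while True:
        time.sleep(1)
        r = requests.get(state_url, headers={"Accept": "application/json"})
        task_state = r.json()["state"]
        print("task_state:" + task_state)
        if task_state in ("solved", "failed"):
            return task_state

# usage:
# wait_for_task(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY)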

6. Export the results (in PMML AssociationModel, GUHA PMML or simple JSON)

# export rules in JSON format
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
task_rules = r.json()

# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text

# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text
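The exported PMML documents are plain XML strings, so they can be stored or processed with standard tools; for example, saving the GUHA PMML export to a local file:

with open("task_" + task_id + "_guha.pmml", "w", encoding="utf-8") as f:
    f.write(guha_pmml)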

Anomaly detection

All the steps up to 3. Preprocess data are the same as in the association rule mining example above.

4. Define outlier detection mining task

headers = {'Content-Type': 'application/json', "Accept": "application/json"}
json_data = json.dumps({"miner": miner_id, "minSupport": MIN_SUPPORT})
r = requests.post(API_URL + "/outliers-tasks?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
outlier_task_id = str(r.json()["id"])  # converted to string for use in the request URLs below

5. Execute the anomaly detection task

r = requests.get(API_URL + "/outliers-tasks/" + outlier_task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/outliers-tasks/" + outlier_task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task failed executing")
        break

6. Read the results

offset = 0
limit = 10
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/outliers-tasks/' + outlier_task_id + '/outliers?apiKey=' + API_KEY + '&offset=' + str(offset) + '&limit=' + str(limit), headers=headers)
outliers = r.json()['outlier']
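The offset and limit parameters page through the detected outliers. Assuming the service returns an empty list once the offset is past the last outlier, a simple paging sketch reusing the request above could look like this:

# fetch the outliers page by page (assumption: an empty batch marks the end of the results)
all_outliers = []
offset = 0
limit = 10
while True:
    r = requests.get(API_URL + '/outliers-tasks/' + outlier_task_id + '/outliers?apiKey=' + API_KEY
                     + '&offset=' + str(offset) + '&limit=' + str(limit), headers=headers)
    batch = r.json()['outlier']
    if not batch:
        break
    all_outliers.extend(batch)
    offset += limit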