API Tutorial
Do you want to use the full EasyMiner functionality in your own project, or simply automate the running of tasks that you have tuned in the user interface? Try our REST API.
The main, top-level API offering the full functionality of EasyMiner is provided by the integration component EasyMinerCenter. If you want to use the APIs of the individual services directly, you can do that as well; they are documented in the For Developers section.
Requirements
To use the EasyMiner API, you need access to a configured EasyMiner system. You can use your own installation or our demo server.
On the selected EasyMiner server, go to the user interface and create a new user account. Then click on your user image (your photo or generic user icon) in the top right corner of the screen, use the link "Show my profile" and copy your API key. The URL of the profile page is: <easyminercenter-url>/em/user/details
The API key has to be sent with every request to identify the user account.
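For illustration, this is a minimal sketch of how the key travels with a request; it uses the task-state call from the mining example below, and the URL, key and task id are placeholder values you need to replace:

import requests

# both values below are placeholders - use your server URL and the key copied from your profile
API_URL = "<easyminercenter-url>/api"
API_KEY = "your-api-key"

# every API call appends the key as the apiKey query parameter (the task id 1 is purely illustrative)
r = requests.get(API_URL + "/tasks/1/state?apiKey=" + API_KEY, headers={"Accept": "application/json"})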
API documentation
The API is fully documented using Swagger documentation. The API endpoint and the API documentation are available at: <easyminercenter-url>/api
You can have a look at the Swagger documentation on our demo server.
Data mining using the API - usage examples
Jupyter notebooks
For a simple start with the REST API, we recommend trying the new, fully commented examples in the form of Jupyter notebooks. These examples are based on the ESIF dataset from the UCI repository. You can find them in the GitHub repository.
Rule mining
A comprehensive, commented example is available on GitHub - see the commented code...
Python client
For evaluation purposes, we prepared a comprehensive benchmarking use case covering 40 datasets from the UCI repository. This benchmarking suite is written in Python and is stored in the KIZI/EasyMiner-Evaluation GitHub repository.
A part of this project is a Python client for the EasyMiner REST API.
Simple example
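The snippets below are plain Python using the requests library. They rely on a few imports and configuration constants; the constant names come from the snippets themselves, but the values shown here are only illustrative placeholders that you should adapt to your server and data:

import json
import time
import urllib.parse

import requests

# configuration - placeholder values, adapt them to your installation and dataset
API_URL = "<easyminercenter-url>/api"   # API endpoint (see the Swagger documentation)
API_KEY = "your-api-key"                # copied from your user profile
CSV_FILE = "data.csv"                   # path to the CSV file to upload
CSV_SEPARATOR = ","                     # column separator used in the CSV file
CSV_ENCODING = "utf-8"                  # encoding of the CSV file
ANTECEDENT_COLUMNS = []                 # empty = use all data fields not present in the consequent
CONSEQUENT_COLUMNS = ["class"]          # data fields allowed in the consequent (illustrative)
MIN_CONFIDENCE = 0.7                    # minimal confidence threshold (illustrative)
MIN_SUPPORT = 0.1                       # minimal support threshold (illustrative)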
1. Upload data in CSV
headers = {"Accept": "application/json"} files = {("file", open(CSV_FILE, 'rb'))} r = requests.post(API_URL + '/datasources?separator=' + urllib.parse.quote(CSV_SEPARATOR) + '&encoding=' + CSV_ENCODING + '&type=limited&apiKey=' + API_KEY, files=files, headers=headers) datasource_id = r.json()["id"]
2. Create miner
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
# the miner is bound to the previously uploaded data source
json_data = json.dumps({"name": "TEST MINER", "type": "cloud", "datasourceId": datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]
3. Preprocess data – generate attributes from the data fields stored in the data source
The user defines the preprocessing for each data field. It is also possible to generate multiple attributes from one data field.
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
# read the list of data fields (columns) available in the data source
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']

attributes_columns_map = {}
for col in datasource_columns:
    column = col["name"]
    # create one attribute per data field, using the "eachOne" special preprocessing
    json_data = json.dumps({"miner": miner_id, "name": column, "columnName": column, "specialPreprocessing": "eachOne"})
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
    if r.status_code != 201:
        break  # an error occurred
    attributes_columns_map[column] = r.json()['name']  # map of created attributes (based on the existing data fields)
4. Define association rule mining task
Define the attributes for the antecedent and consequent parts of the association rules. Each attribute can be configured either to appear with any value or to be constrained to a single fixed value.
This step also includes the definition of threshold values for the interest measures (confidence, support, lift).
# define data mining task
antecedent = []
consequent = []

# prepare antecedent pattern
if len(ANTECEDENT_COLUMNS):
    # add to antecedent only fields defined in the constant
    for column in ANTECEDENT_COLUMNS:
        antecedent.append({"attribute": attributes_columns_map[column]})
else:
    # add to antecedent all fields not used in consequent
    for (column, attribute_name) in attributes_columns_map.items():
        if not (column in CONSEQUENT_COLUMNS):
            antecedent.append({"attribute": attribute_name})

# prepare consequent pattern
for column in CONSEQUENT_COLUMNS:
    consequent.append({"attribute": attributes_columns_map[column]})

json_data = json.dumps({
    "miner": miner_id,
    "name": "Test task",
    "limitHits": 1000,
    "IMs": [
        {"name": "CONF", "value": MIN_CONFIDENCE},
        {"name": "SUPP", "value": MIN_SUPPORT}
    ],
    "antecedent": antecedent,
    "consequent": consequent
})

# define new data mining task
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code:" + str(r.status_code))
task_id = str(r.json()["id"])
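The example above sets thresholds only for confidence and support. If you also want to constrain lift, a third item can be added to the IMs array; note that the identifier "LIFT" and the threshold value used here are our assumptions - check the list of supported interest measures in the Swagger documentation:

# a sketch of an IMs definition that also constrains lift
# ("LIFT" is an assumed identifier - verify the supported IM names in the Swagger documentation)
interest_measures = [
    {"name": "CONF", "value": MIN_CONFIDENCE},
    {"name": "SUPP", "value": MIN_SUPPORT},
    {"name": "LIFT", "value": 1.5}  # illustrative lift threshold
]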
5. Execute the mining task
r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers) while True: time.sleep(1) # check state r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers) task_state = r.json()["state"] print("task_state:" + task_state) if task_state == "solved": break if task_state == "failed": print("task failed executing") break
6. Export the results (in PMML AssociationModel, GUHA PMML or simple JSON)
# export rules in JSON format
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
task_rules = r.json()

# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text

# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text
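The PMML exports are returned as XML strings, so you can simply write them to files for further processing (the file names below are only placeholders):

# save the exported models to files (file names are illustrative)
with open("rules-associationmodel.pmml", "w", encoding="utf-8") as f:
    f.write(pmml)
with open("rules-guha.pmml", "w", encoding="utf-8") as f:
    f.write(guha_pmml)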
Anomaly detection
Follow the same steps as above up to step 3 (Preprocess data).
4. Define outlier detection mining task
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
# the outlier detection task is defined only by the miner and a minimal support threshold
json_data = json.dumps({"miner": miner_id, "minSupport": MIN_SUPPORT})
r = requests.post(API_URL + "/outliers-tasks?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
outlier_task_id = str(r.json()["id"])
5. Execute the anomaly detection task
r = requests.get(API_URL + "/outliers-tasks/" + outlier_task_id + "/start?apiKey=" + API_KEY, headers=headers) while True: time.sleep(1) # check state r = requests.get(API_URL + "/outliers-tasks/" + outlier_task_id + "/state?apiKey=" + API_KEY, headers=headers) task_state = r.json()["state"] print("task_state:" + task_state) if task_state == "solved": break if task_state == "failed": print("task failed executing") break
6. Read the results
# read the first page of detected outliers
offset = 0
limit = 10
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/outliers-tasks/' + outlier_task_id + '/outliers?apiKey=' + API_KEY + '&offset=' + str(offset) + '&limit=' + str(limit), headers=headers)
outliers = r.json()['outlier']
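The offset and limit parameters page through the results. A minimal sketch of collecting all outliers page by page could look like this (the stop condition - a page shorter than the limit - is our assumption about the endpoint's behaviour):

# page through the detected outliers (page size and stop condition are assumptions)
all_outliers = []
offset = 0
limit = 100
headers = {"Accept": "application/json"}
while True:
    r = requests.get(API_URL + '/outliers-tasks/' + outlier_task_id + '/outliers?apiKey=' + API_KEY
                     + '&offset=' + str(offset) + '&limit=' + str(limit), headers=headers)
    page = r.json().get('outlier', [])
    all_outliers.extend(page)
    if len(page) < limit:
        break  # last page reached
    offset += limit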