EasyMiner: easy association rule mining, classification and anomaly detection

LISp-Miner backend

LISp-Miner is the original (now alternative) mining backend for EasyMiner.

LISp-Miner is an academic pattern mining system that has been developed since the mid-1990s. The theoretical foundation of the system is the GUHA method and the observational calculi. LISp-Miner is primarily a desktop Windows application, but it can also be run on Linux using Wine.

EasyMiner - LISp-Miner integration

EasyMiner interacts with LISp-Miner through its LM-Connect component, a web application that exposes the functionality of LISp-Miner through a REST API. This component is no longer developed or maintained, so the integration between the current version of EasyMiner and LISp-Miner is currently broken.

Unique features of EasyMiner with LISp-Miner backend

  • Negation on attributes
  • Disjunction between attributes
  • Subpatterns allow scoping of logical connectives
  • 18 interest measures, which can be freely combined
  • Mines directly on multivalued attributes, no need to create "items"
  • Dynamic binning operators
  • PMML-based import and export
  • Computing grid support

Description of individual features

Negation on attributes

Negation is a very concise way to tell the miner to focus on rules not containing a specific value or a set of values. This specific value might be hard set by the user, or left to be determined automatically. In the latter case, the binning wildcards can be used to automatically merge multiple values.

Negation

Disjunction between attributes

EasyMiner allows disjunction to be used when defining rule parts (antecedent, consequent). The disjunction connective can be placed between attributes.

Subpatterns - scoping for logical connectives

Disjunction can also be applied to a subpattern only. To create a subpattern, first hover with the mouse over each attribute that should form the subpattern and, in the floating menu that appears, click on the star symbol. After you have starred all the attributes, click on "Group marked fields". Note that there must be at least three attributes in the given rule part (antecedent, consequent) for subpatterns to be enabled. Subpatterns cannot be nested, and individual subpatterns are always connected by conjunction.
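The scoping rules above can be illustrated with a minimal Python sketch. It is not EasyMiner or LISp-Miner code: the class names `Literal` and `Subpattern`, the function `rule_part_holds`, and the attributes `District`, `Salary` and `Age` are hypothetical. The sketch only assumes what the text states: connectives apply inside a subpattern, subpatterns are not nested, and subpatterns are joined by conjunction.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence


@dataclass(frozen=True)
class Literal:
    """An attribute with a set of values, optionally negated (hypothetical model)."""
    attribute: str
    values: FrozenSet[str]
    negated: bool = False

    def holds(self, row: Dict[str, str]) -> bool:
        hit = row[self.attribute] in self.values
        return not hit if self.negated else hit


@dataclass(frozen=True)
class Subpattern:
    """A flat group of literals joined by one connective: "and" or "or"."""
    literals: Sequence[Literal]
    connective: str = "and"

    def holds(self, row: Dict[str, str]) -> bool:
        results = [lit.holds(row) for lit in self.literals]
        return any(results) if self.connective == "or" else all(results)


def rule_part_holds(subpatterns: Sequence[Subpattern], row: Dict[str, str]) -> bool:
    """Subpatterns are never nested and are always joined by conjunction."""
    return all(sp.holds(row) for sp in subpatterns)


# (District(Benesov) OR Salary(high)) AND NOT Age(young)
antecedent = [
    Subpattern(
        (Literal("District", frozenset({"Benesov"})),
         Literal("Salary", frozenset({"high"}))),
        connective="or",
    ),
    Subpattern((Literal("Age", frozenset({"young"}), negated=True),)),
]

row = {"District": "Benesov", "Salary": "low", "Age": "old"}
print(rule_part_holds(antecedent, row))
```

The disjunction is confined to the first group, so a row must still satisfy the negated `Age` literal regardless of which branch of the disjunction matched.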

18 interest measures, which can be freely combined

Commonly used interest measures: Confidence, Support, Lift, Fisher, Chi-Square

There are also additional interest measures coming from the GUHA theory: Double Founded Implication, Founded Equivalence, Lower Critical Implication, Upper Critical Implication, Lower Critical Equivalence, Upper Critical Equivalence, Double Lower Critical Implication, Double Upper Critical Implication

Frequencies from the four-field contingency table can also be used as interest measures: a-frequency, b-frequency, c-frequency, d-frequency, r-frequency, s-frequency, k-frequency, l-frequency.

             Succedent  ¬Succedent
Antecedent   a          b           r
¬Antecedent  c          d           s
             k          l

Four-field contingency table
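To make the relation between the table and the interest measures concrete, here is a minimal Python sketch. It assumes that r, s, k and l are the row and column totals, as the table layout suggests, and it uses the textbook definitions of support, confidence and lift together with the definitions of double founded implication and founded equivalence usually given in the GUHA literature. The function name `interest_measures` and the example counts are made up, and the exact formulas used by LISp-Miner may differ in detail.

```python
def interest_measures(a, b, c, d):
    """Compute a few measures from the four-field table frequencies."""
    n = a + b + c + d      # number of all rows
    r, s = a + b, c + d    # row totals (antecedent, not antecedent)
    k, l = a + c, b + d    # column totals (succedent, not succedent)
    confidence = a / r
    return {
        "r": r, "s": s, "k": k, "l": l,
        "support": a / n,
        "confidence": confidence,
        # lift compares confidence with the overall share of the succedent
        "lift": confidence / (k / n),
        "double_founded_implication": a / (a + b + c),
        "founded_equivalence": (a + d) / n,
    }


# Hypothetical counts, for illustration only
m = interest_measures(a=30, b=10, c=20, d=40)
for name, value in m.items():
    print(name, value)
```

With these made-up counts the confidence is 0.75 and the lift is 1.5, meaning the succedent is one and a half times as frequent among rows matching the antecedent as in the whole data set.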

 


Mines directly on multivalued attributes, no need to create "items"

Binning wildcards are used to allow multivalued attributes. The individual values are connected with disjunction.

E.g. District(Benesov, Bruntal) is equivalent to District(Benesov) or District(Bruntal). Such a result is produced e.g. by setting the Subset with max length 2 binning wildcard on the attribute District in the Task setting.
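The equivalence can be checked on a toy data set. The following Python sketch is only an illustration: the helper `covers` and the rows (including the value Praha) are hypothetical, not part of EasyMiner.

```python
# Hypothetical rows, for illustration only
rows = [
    {"District": "Benesov"},
    {"District": "Bruntal"},
    {"District": "Praha"},
    {"District": "Benesov"},
]


def covers(row, attribute, values):
    """True if the row's value of the attribute is one of the given values."""
    return row[attribute] in values


# District(Benesov, Bruntal) as a single multivalued item
merged = [covers(r, "District", {"Benesov", "Bruntal"}) for r in rows]

# District(Benesov) or District(Bruntal) as an explicit disjunction
disjunction = [
    covers(r, "District", {"Benesov"}) or covers(r, "District", {"Bruntal"})
    for r in rows
]

assert merged == disjunction
print(sum(merged), "of", len(rows), "rows are covered")
```

Because the two values of one attribute are mutually exclusive within a row, the merged item covers exactly the rows covered by either single value.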

Dynamic binning operators

To allow attributes that have many values with low support to be used directly in the mining task without special preprocessing, EasyMiner offers a unique feature, binning wildcards, which group fine-grained values on the fly and thus produce ‘items’ with higher support.

 

  • Subset 1-1 wildcard ("Simple wildcard") is the default added when a new attribute is dragged to the task pane. This wildcard tells the miner to generate as many ‘items’ as there are values of the attribute. This is similar to other association rule mining systems that support multiple attributes.
  • Subset wildcard with max length n>1 instructs the miner to dynamically merge up to n values into one ‘item’ during mining.
  • Interval wildcard with max length n>1 instructs the miner to dynamically merge up to n consecutive values into one ‘item’ during mining.
  • Cyclical interval wildcard: same as interval, but the first and last values of the value range are treated as consecutive
  • Left cut with max length n>1: up to n lowest values in the attribute range are merged. This is useful for involving only extremely low attribute values.
  • Right cut with max length n>1: up to n highest values in the attribute range are merged. This is useful for involving only extremely high attribute values.
  • Cut with max length n>1: combines the functionality provided by left cut and right cut
  • One category: Adding a Fixed value attribute to the mining setting allows the user to limit the search space only to rules containing a selected attribute-value pair.
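
The candidate ‘items’ that the wildcards above describe can be enumerated with a short Python sketch. It is only an illustration of the definitions in the list, not the LISp-Miner implementation. All function names and the example values are hypothetical, and the sketch assumes that the minimum length is 1 and that the values are given in their natural order.

```python
from itertools import combinations


def subsets(values, max_len):
    """Subset wildcard: any combination of up to max_len values."""
    return [c for k in range(1, max_len + 1) for c in combinations(values, k)]


def intervals(values, max_len):
    """Interval wildcard: up to max_len consecutive values."""
    n = len(values)
    return [tuple(values[i:i + k])
            for k in range(1, max_len + 1)
            for i in range(n - k + 1)]


def cyclical_intervals(values, max_len):
    """Like intervals, but the last value is followed by the first one."""
    n = len(values)
    out = []
    for k in range(1, min(max_len, n) + 1):
        starts = range(n) if k < n else range(1)  # the full cycle only once
        out.extend(tuple(values[(i + j) % n] for j in range(k)) for i in starts)
    return out


def left_cuts(values, max_len):
    """Left cut: the k lowest values, for k up to max_len."""
    return [tuple(values[:k]) for k in range(1, min(max_len, len(values)) + 1)]


def right_cuts(values, max_len):
    """Right cut: the k highest values, for k up to max_len."""
    return [tuple(values[-k:]) for k in range(1, min(max_len, len(values)) + 1)]


def cuts(values, max_len):
    """Cut: left cuts and right cuts together, without duplicates."""
    merged = left_cuts(values, max_len) + right_cuts(values, max_len)
    return list(dict.fromkeys(merged))


quarters = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical ordered attribute values
print(len(subsets(quarters, 2)), len(intervals(quarters, 2)))
print(cyclical_intervals(quarters, 2)[-1])
print(cuts(quarters, 2))
```

The sketch also shows why the choice of wildcard affects the size of the search space: on the same attribute, the subset wildcard generates more candidates than the interval wildcard, and the cuts generate the fewest.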