Spark FP-Growth Backend

EasyMiner supports an alternative mining backend built on Apache Spark/Hadoop. As shown in the figure below, this backend replaces the default open-source mining backend, which is built on the R ecosystem.

The Spark backend is suited to larger datasets, which benefit from parallel computation distributed over multiple machines. It also uses the FP-Growth frequent pattern mining algorithm instead of Apriori. FP-Growth is generally considered faster than Apriori because it avoids generating and testing candidate itemsets over repeated passes through the data. For smaller datasets, however, Apriori with the R backend is recommended, as it provides faster response times.
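To illustrate what FP-Growth computes, here is a minimal single-machine sketch in Python. It is not the EasyMiner or Spark implementation, and it does no distributed computation. The names `build_tree`, `fp_growth`, and `mine_frequent_itemsets`, as well as the sample baskets, are made up for this example. The sketch compresses the transactions into a prefix tree (the FP-tree) and then grows frequent itemsets recursively from conditional pattern bases, without generating candidates.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, a count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs.

    Returns a header table (item -> list of tree nodes holding that item)
    and the support counts of the items that reach min_count.
    """
    support = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            support[item] += weight
    frequent = {i: c for i, c in support.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first, so that
        # transactions share prefixes in the tree.
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    header, frequent = build_tree(weighted_transactions, min_count)
    result = {}
    for item, count in frequent.items():
        itemset = suffix | {item}
        result[itemset] = count
        # Conditional pattern base: the prefix path of every node that
        # holds `item`, weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_count, itemset))
    return result


def mine_frequent_itemsets(transactions, min_count):
    """Convenience wrapper: every transaction gets weight 1."""
    return fp_growth([(t, 1) for t in transactions], min_count)


# Hypothetical sample data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
found = mine_frequent_itemsets(baskets, min_count=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here `min_count` is an absolute support threshold (the number of transactions that must contain an itemset). Apriori would arrive at the same frequent itemsets for the same threshold; the two algorithms differ in how they search for them, not in what they find.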

The Spark backend is provided under a custom license. If you are interested in using it, please contact the EasyMiner team.

Information regarding the Spark-based backend is available on GitHub.