Getting Started with Spark

The MLeap Spark integration serializes Spark-trained ML pipelines to MLeap Bundles. MLeap also provides several extensions to Spark, including enhanced one-hot encoding, one-vs-rest models, and unary/binary math transformations.

Adding MLeap Spark to Your Project

MLeap Spark and its snapshots are hosted on Maven Central, so they can be pulled in from a Maven build file or SBT. MLeap is currently cross-compiled for Scala versions 2.10 and 2.11. We try to keep Scala compatibility in line with Spark.

Using SBT

libraryDependencies += "ml.combust.mleap" %% "mleap-spark" % "0.8.0"

To use MLeap extensions to Spark:

libraryDependencies += "ml.combust.mleap" %% "mleap-spark-extension" % "0.8.0"
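If you need both artifacts, the two SBT lines can be combined into a single build definition. The following is a minimal sketch, not a required layout: the `mleapVersion` value name and the exact `scalaVersion` patch release are assumptions chosen for illustration. The `%%` operator appends the Scala binary version to the artifact name, which is why the equivalent Maven artifact IDs carry an explicit `_2.11` suffix.

```scala
// build.sbt (sketch)

// Assumption: any 2.11.x release; pick the one that matches your Spark build.
scalaVersion := "2.11.8"

// Hypothetical helper value so both artifacts stay on the same version.
val mleapVersion = "0.8.0"

libraryDependencies ++= Seq(
  // %% resolves to mleap-spark_2.11 for the scalaVersion above
  "ml.combust.mleap" %% "mleap-spark" % mleapVersion,
  // Optional: MLeap extensions to Spark
  "ml.combust.mleap" %% "mleap-spark-extension" % mleapVersion
)
```

Keeping the version in one value avoids mixing MLeap releases across the two artifacts.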

Using Maven

<dependency>
  <groupId>ml.combust.mleap</groupId>
  <artifactId>mleap-spark_2.11</artifactId>
  <version>0.8.0</version>
</dependency>

To use MLeap extensions to Spark:

<dependency>
  <groupId>ml.combust.mleap</groupId>
  <artifactId>mleap-spark-extension_2.11</artifactId>
  <version>0.8.0</version>
</dependency>

  1. See the build instructions to build MLeap from source.
  2. See the core concepts for an overview of ML pipelines.
  3. See the Spark documentation to learn how to train ML pipelines in Spark.
  4. See the demo notebooks to learn how to use MLeap with PySpark to serialize your pipelines to Bundle.ML and score with MLeap.
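
For orientation, here is a rough sketch of what serializing a Spark-trained pipeline to an MLeap Bundle might look like once `mleap-spark` is on the classpath. It follows the usage pattern from MLeap's Spark examples as best understood for this release line, so check the calls against the version you depend on. The object name, column names, sample data, and output path are all hypothetical, and the sketch assumes the `resource` package (scala-arm) is available transitively through MLeap.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession
import resource._

// Hypothetical example object; all names and data are illustrative only.
object ExportPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mleap-export-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up dataset used for fitting the pipeline.
    val df = Seq(("a", 1.0), ("b", 2.0), ("a", 3.0)).toDF("category", "value")

    val indexer = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("category_index")
    val assembler = new VectorAssembler()
      .setInputCols(Array("category_index", "value"))
      .setOutputCol("features")

    // Train the pipeline in Spark as usual.
    val model = new Pipeline().setStages(Array(indexer, assembler)).fit(df)

    // MLeap uses a transformed dataset to capture the schema during export.
    val sbc = SparkBundleContext().withDataset(model.transform(df))

    // Assumed output location; the target file should not already exist.
    for (bundle <- managed(BundleFile("jar:file:/tmp/example-pipeline.zip"))) {
      model.writeBundle.save(bundle)(sbc).get
    }

    spark.stop()
  }
}
```

The resulting zip file is the bundle that the MLeap runtime can later load for scoring without a Spark context.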
