Getting Started with PySpark
MLeap PySpark integration provides serialization of PySpark-trained ML pipelines to MLeap Bundles. MLeap also provides several extensions to Spark, including enhanced one-hot encoding and one-vs-rest models. Unlike the MLeap Spark integration, the MLeap PySpark integration does not yet support the Spark Extensions transformers.
Adding MLeap Spark to Your Project
Before adding MLeap PySpark to your project, you first have to compile and add MLeap Spark.
MLeap PySpark is available in the combust/mleap GitHub repository, in the python package.
To add MLeap to your PySpark project, clone the git repo, add the mleap/python path, and import mleap.pyspark:
git clone git@github.com:combust/mleap.git
Then, in your Python environment, run:
import sys
sys.path.append('<git directory>/mleap/python')
import mleap.pyspark
Note: the import of mleap.pyspark needs to happen before any other PySpark libraries are imported.
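The path setup and the import-order note can be sketched as a small guard. This is only an illustration, not part of MLeap: the helper names `add_mleap_python_path` and `loaded_pyspark_modules` are hypothetical, and treating "already present in `sys.modules`" as the sign that PySpark was imported too early is an assumption about what the note means. The `import mleap.pyspark` line is left commented out so the sketch runs without a local clone.

```python
import os
import sys


def add_mleap_python_path(repo_dir):
    """Append <repo_dir>/python to sys.path once and return the appended path.

    Hypothetical helper; repo_dir is the directory created by `git clone`.
    """
    python_dir = os.path.join(os.path.abspath(repo_dir), "python")
    if python_dir not in sys.path:
        sys.path.append(python_dir)
    return python_dir


def loaded_pyspark_modules():
    """Return the names of pyspark modules that have already been imported."""
    return sorted(
        name for name in sys.modules
        if name == "pyspark" or name.startswith("pyspark.")
    )


# Placeholder taken from the text above; substitute your own clone location.
mleap_python_dir = add_mleap_python_path("<git directory>/mleap")

# Assumption: any pyspark module found here was imported too early.
already_loaded = loaded_pyspark_modules()
if already_loaded:
    print("Warning: PySpark modules imported before mleap.pyspark:", already_loaded[:5])

# With the path in place, the import from the text above would follow:
# import mleap.pyspark
```

Running the check at the top of a script or notebook, before any `pyspark` import, makes an ordering mistake visible early rather than when serialization is first attempted.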
Note: If you are working from a notebook environment, be sure to take a look at the instructions on how to set up MLeap PySpark with:
pip support for PySpark is coming soon.
To use MLeap extensions to PySpark: