Real-life Machine Learning (ML) workloads typically require more than training and predicting: data often needs to be pre-processed and post-processed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm, but a sequence of algorithms that will collaborate in delivering predictions from raw data. In this session, we’ll first show you how to use Apache Spark MLlib to build ML pipelines, and we’ll discuss scaling options when datasets grow huge. We’ll then show how to how implement inference pipelines on Amazon SageMaker, using Apache Spark, Scikit-learn, as well as ML algorithms implemented by Amazon.