Ai & Data Analytics 2018 - Azure Databricks for data scientist

  1. AI & Data Analytics, Madrid, 18 October 2018. Databricks for Data Scientists. Alberto Diaz Martin
  2. Alberto Diaz Martin - @adiazcan. Alberto Diaz has more than 15 years of experience in the IT industry, all of them working with Microsoft technologies. He is currently Chief Technology Innovation Officer at ENCAMINA, where he leads software development with Microsoft technology, and a member of the management team. In the community, he organizes and speaks at the most relevant Microsoft conferences in Spain, where he is a leading voice on SharePoint, Office 365, and Azure. The author of several books and of articles in professional magazines and blogs, in 2013 he joined the management team of CompartiMOSS, a digital magazine on Microsoft technologies. He has been a Microsoft MVP since 2011, a recognition he has renewed for the seventh consecutive year. He describes himself as a geek, a smartphone lover, and a developer. Founder of TenerifeDev, a .NET user group in Tenerife, and coordinator of SUGES (the SharePoint User Group of Spain).
  3. Challenges for Data Scientists: infrastructure management; data exploration and visualization at scale; time to value, from model iterations to intelligence; integrating with various ML tools to stitch a solution together; operationalizing ML models to integrate them into applications.
  4. Machine Learning on Azure. Sophisticated pretrained models, to simplify solution development: Vision, Speech, Language, Search, … Popular frameworks, to build advanced deep learning solutions: TensorFlow, Keras, PyTorch, ONNX. Productive services, to empower data science and development teams: Azure Databricks, Azure Machine Learning, Machine Learning VMs. Powerful infrastructure, to accelerate deep learning. Flexible deployment, to deploy and manage models on intelligent cloud and edge: on-premises, cloud, edge.
  5. Recommended architecture to build e2e ML solutions. Ingest: batch data through Azure Data Factory, streaming data through Azure Event Hubs. Store: Azure Data Lake Storage. Prep and train: Azure Databricks, Azure Machine Learning service. Serve: Cosmos DB and SQL DB (operational databases); Azure SQL Data Warehouse, Azure Analysis Services, and Power BI (ad-hoc analysis); Azure Kubernetes Service (model serving) for apps.
  6. What is Azure Databricks? A fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure. Best of Databricks, best of Microsoft: designed in collaboration with the founders of Apache Spark; one-click setup and streamlined workflows; an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts; native integration with Azure services (Power BI, SQL DW, Cosmos DB, ADLS, Azure Storage, Azure Data Factory, Azure AD, Event Hub, IoT Hub, HDInsight Kafka, SQL DB); enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs).
  7. Azure Databricks (architecture diagram). Data sources: cloud storage, data warehouses, Hadoop storage, IoT / streaming data. Collaborative workspace for the data engineer, data scientist, and business analyst. Deploy production jobs and workflows: multi-stage pipelines, job scheduler, notification and logs. Optimized Databricks Runtime engine: Apache Spark, Databricks I/O, high concurrency. Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses. Enhance productivity; build on a secure and trusted cloud; scale without limits.
  8. Infinite Scale, Lower Cost, Zero Management
  9. Your Language, Your Data (Anywhere), Your Format. Code in your favorite language: SQL, Python, Scala, and R support. Read and write data from/to multiple sources: file systems, object stores, HDFS, databases, pub-sub systems, and others; optimized for Azure Blob Storage, ADLS, SQL DW, Event Hubs, and Cosmos DB. File formats: CSV, JSON, Parquet, Text, ORC, XML, and more.
  10. Demo Azure Databricks
  11. Prep & train, in three stages: (A) collect and prepare data, (B) train and evaluate the model, (C) operationalize and manage. Services involved: Azure Data Factory, Azure Databricks, Azure ML Services.
  12. Collect and prepare all of your data at scale. Ingest (Azure Data Factory), connect to data from any source: integrate with all of your data sources; create hybrid pipelines; orchestrate in a code-free environment. Store (Azure Blob Storage), scale without limits: build in the language of your choice; leverage scale-out topology; scale compute and storage separately. Understand and transform (Azure Databricks), leverage best-in-class analytics capabilities: leverage open source technologies; collaborate within teams; use ML on batch streams.
  13. Demo EDA (exploratory-data-analysis) with Azure Databricks
  14. Train and evaluate machine learning models. Scale compute resources to meet your needs: easily scale up or scale out; autoscale on serverless infrastructure; leverage commodity hardware. Quickly determine the right model for your data: determine the best algorithm; tune hyperparameters to optimize models; rapidly prototype in agile environments. Simplify model development: collaborate in interactive workspaces; access a library of battle-tested models; automate job execution. Building blocks: automated ML capabilities (Azure ML Services, Automated ML); scale-out cluster infrastructure (Azure Databricks); machine learning tools (Azure Databricks).
  15. Spark Machine Learning (ML) overview: enables parallel, distributed ML for large datasets on Spark clusters. Offers a set of parallelized machine learning algorithms (see next slide). Supports model selection (hyperparameter tuning) using cross-validation and train-validation split. Supports Java, Scala, or Python apps using the DataFrame-based API (as of Spark 2.0). Benefits include: a uniform API across ML algorithms and across multiple languages; ML pipelines (combining multiple algorithms into a single pipeline); optimizations through Tungsten and Catalyst. Spark MLlib comes pre-installed on Azure Databricks. Supported third-party libraries include H2O Sparkling Water, scikit-learn, and XGBoost.
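The model selection idea on slide 15 (hyperparameter tuning with cross-validation) can be sketched without a cluster. What follows is a minimal, library-free Python illustration, not Spark code: the toy data, the one-weight ridge model, and the helper names (`k_fold_indices`, `fit_ridge_1d`, `cross_validate`) are all assumptions introduced here. In Spark ML the same roles are played by `ParamGridBuilder` and `CrossValidator` from `pyspark.ml.tuning`.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds over range(n)."""
    fold_size = n // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder rows.
        stop = n if fold == k - 1 else start + fold_size
        valid = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, valid


def fit_ridge_1d(xs, ys, reg):
    """Fit y ~ w * x (no intercept) with an L2 penalty `reg`; returns w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + reg)


def mse(w, xs, ys):
    """Mean squared error of the one-weight model on (xs, ys)."""
    return mean((w * x - y) ** 2 for x, y in zip(xs, ys))


def cross_validate(xs, ys, reg, k=3):
    """Average validation MSE of one hyperparameter value over k folds."""
    scores = []
    for train, valid in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], reg)
        scores.append(mse(w, [xs[i] for i in valid], [ys[i] for i in valid]))
    return mean(scores)


# Hypothetical toy data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]

# The "parameter grid": candidate regularization strengths.
param_grid = [0.0, 1.0, 10.0]
cv_scores = {reg: cross_validate(xs, ys, reg) for reg in param_grid}
best_reg = min(cv_scores, key=cv_scores.get)
print(best_reg, cv_scores)
```

Each grid value is scored only on rows the model did not see during fitting, which is the property that makes the comparison between hyperparameter values fair.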
  16. Why use Azure Databricks for machine learning? A complete platform in one: data ingestion, exploration, transformation, featurization, model building, model tuning, and even model serving. No need to copy data out of the platform to do ML on it. Data scientists like the platform's ease of use. Deep learning algorithms are now available. Productionization features are built in.
  17. ML Pipelines
  18. ML Pipelines (diagram): several data sources feed feature extraction; the extracted features pass through feature transforms 1, 2, and 3; models 1 and 2 are trained and combined in an ensemble; the result is evaluated.
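The pipeline diagram on slide 18 chains feature transforms, model training, and an ensemble step. As a rough sketch of that structure, the plain-Python code below mimics the fit/transform contract that Spark ML's `Pipeline` exposes. It is not the Spark API: the `Scale`, `Center`, and `Pipeline` classes, the two toy "models", and the sample rows are all hypothetical.

```python
class Scale:
    """Stateless stage: multiplies every feature by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, rows):
        return self  # nothing to learn

    def transform(self, rows):
        return [[v * self.factor for v in row] for row in rows]


class Center:
    """Stateful stage: learns column means in fit, subtracts them in transform."""

    def fit(self, rows):
        n = len(rows)
        self.means = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[v - m for v, m in zip(row, self.means)] for row in rows]


class Pipeline:
    """Runs stages in order; each stage is fitted on the previous stage's output."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def ensemble_mean(predictions_per_model):
    """Average the predictions of several models, row by row."""
    return [sum(p) / len(p) for p in zip(*predictions_per_model)]


# Hypothetical training rows with two features each.
train_rows = [[1.0, 10.0], [3.0, 30.0]]
pipeline = Pipeline([Scale(2.0), Center()]).fit(train_rows)
features = pipeline.transform(train_rows)

# Two toy stand-ins for "model 1" and "model 2" in the diagram.
model_1 = lambda row: sum(row)
model_2 = lambda row: row[0]
predictions = ensemble_mean([
    [model_1(r) for r in features],
    [model_2(r) for r in features],
])
print(features, predictions)
```

Because the fitted pipeline keeps the statistics learned during `fit`, applying `pipeline.transform` to new rows reuses the training-time means rather than recomputing them, which is the main reason to bundle transforms and models into one object.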
  19. Demo – Spark ML Pipeline
  20. Operationalize and manage models with ease. Proactively manage model performance: identify and promote your best models; capture model telemetry; retrain models with APIs. Deploy models closer to your data: deploy models anywhere; scale out to containers; infuse intelligence into the IoT edge. Bring models to life quickly: build and deploy models in minutes; iterate quickly on serverless infrastructure; easily change environments. Flow: train and evaluate models (Azure Databricks); model management, experimentation, and run history (Azure ML Services); containers (AKS, ACI, IoT Edge, Docker).
  21. ML Export. ML Model Export allows you to export models and full ML pipelines. The exported models can then be imported into Spark and non-Spark platforms for scoring and prediction. It is targeted at low-latency, lightweight ML-powered applications. We recommend using MLeap, an open source solution for ML model export that works well in Azure Databricks.
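The export-then-score idea on slide 21 can be illustrated with a deliberately simple stand-in. The sketch below serializes a linear model to JSON and rebuilds a scoring function from it using only the standard library. The JSON layout and the `export_model` / `load_scorer` names are hypothetical; this is not MLeap's bundle format or API, only the general pattern of scoring outside the training environment.

```python
import json


def export_model(weights, intercept):
    """Serialize a linear model to a JSON string (hypothetical format)."""
    return json.dumps({"type": "linear", "weights": weights, "intercept": intercept})


def load_scorer(payload):
    """Rebuild a scoring function from the exported JSON.

    Only the json module is needed, so the scorer can run in a
    lightweight service with no training framework installed.
    """
    spec = json.loads(payload)
    if spec["type"] != "linear":
        raise ValueError(f"unsupported model type: {spec['type']}")
    weights, intercept = spec["weights"], spec["intercept"]

    def score(features):
        if len(features) != len(weights):
            raise ValueError("feature vector has the wrong length")
        return intercept + sum(w * x for w, x in zip(weights, features))

    return score


# Export on the training side, then load and score on the serving side.
payload = export_model([0.5, -1.0], 2.0)
score = load_scorer(payload)
prediction = score([4.0, 1.0])
print(prediction)
```

Keeping the serving side free of heavy dependencies is what makes this pattern suitable for the low-latency applications the slide mentions.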
  22. Build and deploy deep learning models. Scale compute resources in any environment: choose VMs for your modeling needs; process video using GPU-based VMs. Quickly evaluate and identify the right model: run experiments in parallel; provision resources automatically. Streamline AI development efforts: leverage popular deep learning toolkits; develop in your language of choice. Services shown: Azure ML Services, Azure Databricks (notebooks and scale-out clusters), Batch AI (scale-out clusters). Toolkits: Microsoft Cognitive Toolkit, Keras, TensorFlow, PyTorch.
  23. Demo - MLeap
  24. Azure Databricks for deep learning modeling (tools, infrastructure, frameworks). Leverage powerful GPU-enabled VMs pre-configured for deep neural network training. Use HorovodEstimator via a native runtime to build deep learning models with a few lines of code. Full Python and Scala support for transfer learning on images. Automatically store metadata in Azure Database with geo-replication for fault tolerance. Use built-in hyperparameter tuning via Spark MLlib to quickly optimize the model. Simultaneously collaborate within notebook environments to streamline model development. Load images natively in Spark DataFrames to automatically decode them for manipulation at scale, with distributed DNN training on Spark. Improve performance 10x-100x over traditional Spark deployments with an optimized environment. Seamlessly use TensorFlow, Microsoft Cognitive Toolkit, Caffe2, Keras, and more. Ready-to-use clusters with Azure Databricks Runtime for ML.
  25. Deep Learning. Azure Databricks supports and integrates with a number of deep learning libraries and frameworks to make it easy to build and deploy deep learning applications. Supported libraries and frameworks include Microsoft Cognitive Toolkit (CNTK; an article explains how to install CNTK on Azure Databricks), TensorFlowOnSpark, and BigDL. It also offers Spark Deep Learning Pipelines, a suite of tools for working with and processing images with deep learning, using transfer learning. The suite includes high-level APIs for common aspects of deep learning so they can be done efficiently in a few lines of code: distributed hyperparameter tuning and transfer learning.
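Distributed hyperparameter tuning, mentioned on slide 25, amounts to evaluating independent candidate settings in parallel and keeping the best one. The sketch below shows that shape on a single machine, with threads standing in for cluster workers. It is an illustration under stated assumptions, not the Deep Learning Pipelines API: `validation_loss` is a hypothetical stand-in for a real training run, and the candidate learning rates are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def validation_loss(learning_rate):
    """Hypothetical stand-in for one training run.

    Runs 50 gradient-descent steps on f(w) = (w - 3)^2 starting from
    w = 0 and returns the final loss for the given learning rate.
    """
    w = 0.0
    for _ in range(50):
        w -= learning_rate * 2 * (w - 3.0)
    return (w - 3.0) ** 2


# Candidate hyperparameter values; each one is an independent task.
candidates = [0.001, 0.01, 0.1, 0.5]

# Threads stand in for cluster workers; map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(validation_loss, candidates))

# Pick the candidate with the smallest loss.
best_loss, best_rate = min(zip(losses, candidates))
print(best_rate, best_loss)
```

Because the runs share no state, the same structure carries over when each task is shipped to a different worker instead of a different thread.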
  26. Azure Databricks: a fast, easy, and collaborative Apache Spark™-based analytics platform, built with your needs in mind. Features: role-based access controls, effortless autoscaling, live collaboration, enterprise-grade SLAs, best-in-class notebooks, simple job scheduling, seamless integration with the Azure portfolio. Increase productivity, build on a secure, trusted cloud, and scale without limits.
  27. Thank you