Productive Machine Learning and Deep Learning Projects
Machine Learning (ML) and Deep Learning (DL), known collectively as Artificial Intelligence, are no longer luxuries but necessities for companies that want to remain relevant in today's market. Data-driven organizations that invest in ML and DL projects can create and deploy models that generate predictions in real time. Even more exciting, these real-time predictions allow organizations to trigger actions automatically, which ultimately improves the bottom line. However, many organizations struggle to turn ML and DL projects into models that improve performance. This talk focuses on how companies can enable data science platforms so that data engineers, data scientists, and business analysts can quickly explore data, create and test ML and DL models, and deploy them to staging and production environments regardless of the language or framework used by the team and organization.
Greg Werner, CEO & Founder, 3Blades.io at MLconf ATL 2017
1. Data Science with Teams
Improve the efficiency of your data science teams
with platforms that enhance collaboration and
flexibility
2. Agenda
● Some Background
● Goals
● Data Science Project Teams
● Challenges
● Some Solutions
● Conclusions
3. Background
Integration experience with Oil & Gas, Financial, Insurance and Retail industries in
multiple geographies
What did these customers have in common? All had data science teams that
worked in silos
Difficulties when taking a data science course
8. Data Science Teams - The New Way
Data Scientist
Finance Manager
Accountant
Tax and Compliance
Treasury
Data Gurus:
- Analytics
- Data Engineers
- Business Intelligence
- Compliance
IT Manager
10. I Want GPUs - And I Just Want Them to Work
Work-arounds for the NVIDIA Docker wrapper:
- nvidia-docker run -d -p 8888:8888 tensorflow/tensorflow:latest-gpu
OR
- docker run -ti --rm `curl -s http://localhost:3476/docker/cli` tensorflow/tensorflow:latest-gpu
OR
- docker run -ti --rm --volume-driver=nvidia-docker --volume=nvidia_driver_375.82:/usr/local/nvidia:ro --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 nvidia/cuda nvidia-smi
11. The Need for DevOps Chops
The old way: a Docker container on an EC2 instance behind a reverse proxy with a static upstream location name ($$$)
The new way: a Docker container on an EC2 instance, registered by Registrator, behind a reverse proxy kept up to date with consul-template ($)
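The "new way" can be sketched with a minimal consul-template fragment (the service name `notebook` and the file names are hypothetical, not from the talk): Registrator publishes each container into Consul, and consul-template regenerates the proxy's upstream list whenever membership changes.

```text
# notebooks.ctmpl — illustrative template for an nginx upstream block
upstream notebooks {
{{ range service "notebook" }}
  server {{ .Address }}:{{ .Port }};
{{ end }}
}

# Illustrative invocation: render the template and reload nginx on change
# consul-template -template "notebooks.ctmpl:/etc/nginx/conf.d/notebooks.conf:nginx -s reload"
```

This removes the need to hand-edit a static upstream list every time a container is launched or retired.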
15. Solutions
● Provide flexibility with the tools that data scientists use for exploratory data analysis and visualizations
● One central source for project files with support for version control
● Share visualizations from EDA
● Train and save Machine Learning and Deep Learning models with multiple frameworks, from within the same project
● Streamline deployment pipelines
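As an illustrative sketch of the "train and save" step (not the platform's actual API, and the toy linear model stands in for any framework-specific model object): train a model, serialize it, and restore it as a staging or production service would.

```python
import pickle

# Toy "model": one-feature linear regression fit in closed form.
# Stands in for any framework-specific model object (sketch only).
def fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return {"slope": slope, "intercept": my - slope * mx}

def predict(model, x):
    return model["intercept"] + model["slope"] * x

model = fit([1, 2, 3, 4], [2, 4, 6, 8])   # data follows y = 2x
blob = pickle.dumps(model)                # serialize for a shared model store
restored = pickle.loads(blob)             # e.g. loaded by the staging service
print(predict(restored, 5))               # → 10.0
```

The point is the round trip: the same serialized artifact moves unchanged from the project workspace to staging and production.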
Talking points:
Siloed data initiatives are a common denominator
Data scientists were segregated from the rest of the organization
Tooling was disparate
Initially, the need to streamline Jupyter Notebook deployments for a class of students came up after many students complained about the time and effort required to install specific dependencies for their tasks. Package managers were not enough: users also needed an integrated, consistent, and reliable environment in which to complete their assignments using Jupyter Notebooks. We also noticed that companies, in general, did not provide a homogeneous environment for their data science teams. This led to many headaches but was considered business as usual.
Talking points:
Issues encountered in the education vertical were common across industry, i.e. too much time spent on configuration
Basic ROI calculations justified the implementation and support of a data science hub
The data science platform "a-ha" moment came when pitching a solution to consolidate project workspace environments for different people across different organizations, in particular for Exploratory Data Analysis (EDA). Educational institutions are usually constrained by budget; however, after we provided ROI numbers on how much time and effort Teaching Assistants (TAs) spent providing technical support to their users, the decision to implement a data science platform was a no-brainer. Nevertheless, we suspected that enterprises (SMBs and large companies alike) were encountering the same challenges, exacerbated by the fact that more personas were involved within data science and analytics teams.
Talking points:
Disparate teams
Data scientists siloed from the rest of the organization
Ultimate goal is to automate certain processes within the organization
Automation helps improve the top and bottom lines, improves competitiveness
Organizations struggle to become ‘data driven’. What does that mean? Data driven organizations are those that wish to use the data they have available to improve insights and allow their business to become more competitive. Assuming the organization has successfully consolidated their data into central data warehouses or data lakes, and assuming this data is defined with standard schemas, data science and data analytics teams have the power to analyze the data, obtain valuable insights and start improving the agility of their organizations with ‘prescriptive analytics’ and ‘predictive analytics’. Prescriptive analytics involves creating machine learning and deep learning models that automate certain business processes, such as:
Automatically tagging images with classification labels (cat or not a cat)
Automatically scoring a customer with the probability that they will churn
Recommending value-added products to increase the checkout dollar amount at an e-commerce site
Flagging email as spam or not spam
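The churn example above can be sketched in a few lines. This is a minimal illustration, not a trained model: the feature names and weights are made up, and a real system would learn them from historical data.

```python
import math

# Hypothetical churn scorer: feature names and weights are invented
# for illustration, not learned from real customer data.
WEIGHTS = {"months_inactive": 0.8, "support_tickets": 0.4, "tenure_years": -0.6}
BIAS = -1.0

def churn_probability(customer):
    # Weighted sum of features, squashed to [0, 1] by the logistic function
    z = BIAS + sum(WEIGHTS[k] * customer[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

p = churn_probability(
    {"months_inactive": 3, "support_tickets": 2, "tenure_years": 4}
)
print(round(p, 3))
```

A downstream process can then trigger an action (for example, a retention offer) whenever the probability crosses a threshold, which is exactly the "predictions trigger actions" loop described above.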
However, organizations have struggled to integrate data scientists into their businesses. Data science teams that just "do the math" and create visualizations on an organization's data sets do not provide much value in and of themselves. Likewise, a machine learning model that automatically recommends a product that is not strategic to the organization does not provide much value.
Talking points:
Dashboards democratize data so that team members can quickly absorb meaningful insights and key performance indicators.
Exploratory data analysis (EDA) and model creation/deployment not really a part of the picture.
Traditional Business Intelligence tools have been around for years. Some tools offer specific integrations with a variety of data sources and allow users to quickly create rich, interactive visualizations of their data. SQL, a language made popular by relational databases, remains a very popular language for analytics. New developments help accelerate the time from data source to dashboard with in-memory calculations, GPU-powered databases, and more.
Big data tools, such as Hadoop and Spark, allow users to create dashboards from large data sets. However, BI tools traditionally rely on structured data, and traditional dashboards don't account for how machine learning and deep learning models are created and deployed.
Just a review of a Data Scientist’s skill set.
Talking points:
Organizations realize they need to automate their processes and that automation must come from real time analysis of data points
The deliverable is not just a BI dashboard anymore, the deliverable is a deployable machine learning and deep learning model
Embedding a data science team member into the group increases value
As mentioned, historically data science teams have been isolated from the rest of the organization.
Successful data-driven organizations embed their data scientists into various business groups. For example, data extraction and loading into a warehouse table are done by engineering teams; however, a data science liaison embedded within a department or relevant company-wide project can help data engineers improve the schema definition for the data being exposed, which can save valuable time during the exploratory data analysis phase. Data engineers can use their favorite Extract, Transform, and Load (ETL) tools to create tables that remove not-a-number (NaN) rows, drop irrelevant columns such as database primary and foreign keys (PK/FKs), and so on.
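That kind of ETL cleanup can be sketched with pandas (the column names and data here are hypothetical): drop the surrogate-key column and the rows containing NaN values before exposing a "clean" table.

```python
import pandas as pd
import numpy as np

# Hypothetical raw extract: "id" is a database surrogate key,
# irrelevant for exploratory data analysis.
raw = pd.DataFrame({
    "id": [101, 102, 103],
    "spend": [250.0, np.nan, 90.0],
    "region": ["ATL", "NYC", None],
})

clean = (
    raw.drop(columns=["id"])   # drop PK/FK-style columns
       .dropna()               # drop rows with NaN/missing values
       .reset_index(drop=True)
)
print(len(clean))  # only the fully populated row survives
```

Delivering `clean` rather than `raw` is what lets data scientists work self-service without re-doing the same hygiene in every notebook.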
Inversely, the data scientist can help the person telling the data story (who could be anyone in the group, including herself) understand which features are relevant and how certain normalizations were performed, without delving into the technical details: "This was the only customer that bought a widget in Atlanta, so the attributes for this person were adjusted so as not to skew the dataset in their favor."
Talking points:
Move from predictive to prescriptive analytics
Deliver a machine learning or deep learning model that will allow organizations to automate processes
Visualizations are still important, but used for telling a data story for EDA and also for visualizing how models are behaving in real time
Predictive analytics looks at the historical trends in data to provide insights. Organization members are then tasked to optimize processes to improve organizational results based on trends. However, companies need to automate tasks (remove the human from the actual task execution) based on certain indicators. In this case, visualizations are used in EDA to better understand the data with the goal of creating and deploying machine learning and deep learning models that can automate certain organization processes.
Talking points:
Support data source imports from multiple sources
EDA needed as first step to build and deliver artifacts to automate business processes. Artifacts in this context are machine learning and deep learning models.
Data engineers and DevOps need access to data science hub to streamline their own processes
Traditional teams use Excel spreadsheets, among other tools, and fly files back and forth via email, chat applications, or external project management solutions. Even when all users work within shared environments such as Google Docs or Office 365, teams have no way of sharing all files and tools within one common environment, particularly for exploratory data analysis, since viewing and editing files within these environments are constrained to a certain set of file formats. Moreover, certain organizations and individuals prefer one language over another. For example, a data science team supporting the Finance department may lean toward the R programming language, while the data science team supporting the marketing department may lean toward the Python ecosystem. In both cases, users may use multiple tools for one language: some individuals may prefer RStudio for R, and others may prefer using R with Jupyter Notebooks. Server management is important to optimize compute resources.
A central source for project files eases compliance requirements. Usually, data engineers (whether due to security requirements or simply because they don't want to surface multiple schemas/formats to different users) would rather deliver the data product to a "clean" table so that data scientists can do their work using self-service approaches. Having a centrally managed set of files for specific projects also helps keep things organized when different users are accessing project files, so version control becomes important as well.