SlideShare ist ein Scribd-Unternehmen logo
1 von 39
CONTEXT
SEMANTICS!
Danny : isBrotherOf : Nezih
food cart : uses : bicycles
Frank : isFriendsWith : Mohit
Frank : isFriendsWith : Ted
Frank : likes : bicycles
Frank : likes : food carts
Ivy : isFriendsWith : Kushal
Ivy : isFriendsWith : Ted
Ivy : likes : bicycles
Ivy : likes : food carts
Kushal : isFriendsWith : Mohit
Kushal : isFriendsWith : Nezih
Nezih : is FriendsWith : Ted
Ted : likes : bicycles
This model... ... infers this interest.
Ted Kushal
Mohit
Danny
Ivy
Frank
Nezih
friends
friends
friends
brothers
friends
friends
friends
friends
Food
Cart
likes
likes likesBicycles
likes likes
likes
uses
Likes?
Virtuous cycle of data
CLOUD
Richer data to
analyze
CLIENTS
Richer data
from devices
Richer
user experiences
INTELLIGENT
SYSTEMS
SEMANTIC INFORMATION
IS FUEL FOR THE CYCLE
1985 1995 2005 2015
enterprise
NoSQL
Docs
+
Semantics
RDF
WIDESPREAD
MACHINE LEARNING
ON THIS
IMAGINE THE POSSIBILITIES
Graph centrality
High
Program
Importance
(Centrality)
Low
Graph of
channel
viewing
behavior
Current popular
surfing patterns
SH002463130000 EP005544723744
Changes in surfing
behavior may predict
customer churn.
Preference and Similarity Recommendations
User
Movie
1.7MM Nodes
23.9MM Edges
similar cast
prefers
similar
topic
userId: A0A22A5
title: The Godfather
genre: Crime drama
cast: [M. Brando, Al Pacino]
title: Scarface
genre: Crime drama
cast: [Al Pacino, M. Pfeiffer]
title: The Departed
genre: Crime drama
cast: [L. DiCaprio, M. Damon]
weight=11.8
weight=0.67
weight=0.03
weight=14.98
Min-cost path search
10
URL Ground-Truth Data
IP/Domain Reputations
420MM Records
74.5MM Nodes
185MM Edges
URL
Domain
IP Address
Calculation of priors
LBP Messaging
Loopy Belief Propagation on the (semantic) web
84.231.82.93
86.39.155.137
forum.vsichko.com
hermansonskok.se
euskzzbz.nonetheups.com
keesenbep.spaces.live.com
Loopy Belief Propagation on the (semantic) web
A yoga
ball
graph.
Really!?!
You may actually need this
• When the problem is an information
network
• When a graph is a natural way of
expressing the algorithm
• When you want to study specific
relationships
• When you want faster machine learning
or solvers on sparse data
shortest path
central
influence
sub networks
triangle count
But there are challenges.
Handling all that
data.
Finding people good at both handling all
that data and data analysis.
Putting exploratory work into production
fast enough to keep up with the
competition.
14
Congratulations
! You
are a
data scientist!
It’s a demanding job
Ingest &
Clean
Engineer
Features
Structure
Model
Train
Model
Query &
Analyze
Learn
Visualize
Skills shortage at
intersection of
systems
engineering and
data analysis
Painful data
ingestion and
preparation
Workflows that are not designed
with loopbacks in mind
Few tools for analyzing
semantics at scale
Composing
pipeline is DIY
Decomposing
the “data
scientist”
Source: 2013 Report from Accenture Institute for High Performance
IMAGINE A PLATFORM FOR DATA SCIENTISTS
DOCS + SEMANTICS + MACHINE LEARNING
Ease-of-use: Making big data familiar
Python
R
Dataflow
GUI
...
Datacenter / CloudNetworkClient
BIG
DATA
API
Connec
tManag
e
Secure
Analyzedistributed and parallel
Manag
eSecure
Connec
t
Analyzelocal
Query
Big Data Java/Scala/C++
Computational Frameworks
Big Data Algorithms
Cluster Workload Mgmt
Cluster Storage
Machine Learning & Statistics
Data WranglingAnalyst
Skills
The
Other
Skills
Delivering it
FILESYSTEMS AND NOSQL STORAGE
HW PLATFORM
APACHE HADOOP APACHE SPARK
DATA WRANGLING
MACHINE LEARNING AND
STATISTICS
Graphical
Algorithms
Classical
Algorithms
Graph
Construction Tools
Useful String
Manipulation
Useful Math
Operators
BIG DATA API
DATA SCIENCE SERVER (Query and Scripting)
Intel Analytics Toolkit
A UNIFIED DOCUMENT + SEMANTIC STORE
The Ask
Approach Algorithm Category Applications/Use Cases
Loopy Belief Propagation (LBP) Structured Prediction Personalized recs, image de-noising
Label Propagation Structured Prediction Personalized recommendations
Alternating Least Squares (ALS) Collaborative Filtering Recommenders
Conjugate Gradient Descent (CGD) Collaborative Filtering Recommenders
Connected Components Graph Analytics Network manipulation, image
analysis
Latent Dirichlet Allocation (LDA) Topic Modeling Document Clustering
Structure Attribute Clustering Network analysis, consumer seg
K-Truss Clustering Social network analysis
KNN* Clustering Recommenders
Logistic Regression* Classification Fraud detection
Random Forest* Classification Fraud detection, consumer seg
Generalized Linear Model (Binomial,
Poisson)
Non-linear Curve Fitting Forecasting, pricing, market mix
models
Association Rule Mining Data Mining Market basket analysis,
recommenders
Frequent Pattern Mining* Data Mining Pattern Recognition
Bringing a full spectrum of possibilities
Graph
21
Article Tagging Problem
• Articles are tagged by experts with MeSH terms, drawn
from a hierarchical controlled vocabulary of 55,000
keywords
• Process is resource-intensive – can we automate it?
• Categorize articles into a hierarchy that matches the
same categorization from the MeSH controlled
vocabulary
Hierarchy Level
Article Count
Demo: Graph Analytics For Medical Journal
Analysis
INGEST
&
CLEAN
ENGINEER
FEATURES
STRUCTURE
GRAPH
QUERY &
ANALYZE
LEARN
VISUALIZE
PARSE AND
EXTRACT
WORDS
CREATE
ARTICLE/
WORD LIST
BUILD GRAPH
QUERY/
VISUALIZE DATA
DETECT
CLUSTERS
USING LDA
• Medline™ XML
• MeSH Ontology XML
• Create list of unique
words
• Stemming and
lemmatization
• Index word list
• Transform articles
into list of article/word
pairs
• Extract vertices
• Assign id columns to
vertex property
• Assign year and
count edge
properties
• Gremlin query for
each visual
• Python web server
and other libraries
• Select
optimization
parameters
• Invoke LDA
The Playbook?
PARSE AND
EXTRACT
WORDS
CREATE
ARTICLE/
WORD LIST
BUILD
GRAPH
QUERY/
VISUALIZE
DATA
DETECT
CLUSTERS
USING LDA
Parse Prepare graph data Basic analysis Run LDA
INSIGHTFUL
RESULT
This never happens!
The Real Playbook
PARSE AND
EXTRACT
WORDS
CREATE
ARTICLE/
WORD LIST
BUILD
GRAPH
QUERY/
VISUALIZE
DATA
DETECT
CLUSTERS
USING LDA
Parse
Correct mistake
Prepare graph data
Correct schema mistake
Correct aggregation mistake
Data validation
Correct dataset mistake
Guess LDA settings
Tune and re-run
Detect bias in dataset
WE NEED THE AGILITY OF INTERACTIVE SCRIPTING
AND
THE
BRAINS AND BRAWN OF
SCALABLE GRAPH ANALYTICS
Build Frame
28
Build Graph
29
Query
Vertices
30
LDA with 3 Topics
LDA with 5
Topics
LDA with 7 Topics
Query Vertices Again – Now with ML
Properties
34
Following Analysis
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
Wakefulness
Sleep
Animals
Electroencephalography
Circadian Rhythm
Arousal
Sleep Stages
REM
Mental Recall
Attention
Rats
Child
Evoked Potentials
Aged
Schizophrenia
Ocular
Conditioning
Infant
Psychophysics
Dreams
Top MeSH terms that predict which category an article will be assigned
Reimagining 2014
New partnerships in big data
Contributions to the open source community
The Intel Analytics Toolkit – COMING SOON
SEMANTICS + MACHINE LEARNING
TOGETHER AT LAST!
INTERESTED IN THE INTEL ANALYTICS
TOOLKIT?
THEODORE.L.WILLKE@INTEL
.COM
Legal Disclaimers
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without
notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across
different processor families. Go to: http://www.intel.com/products/processor_number
Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate
from published specifications. Current characterized errata are available on request.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor
(VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may
not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer
system with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-
compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit
http://www.intel.com/technology/security
Intel, Intel Xeon, Intel Atom, Intel Xeon Phi, Intel Itanium, the Intel Itanium logo, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Other names and brands may be claimed as the property of others.
Copyright © 2013, Intel Corporation. All rights reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk
 
Predicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCAPredicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCASri Ambati
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile DataDataKitchen
 
Developing Highly Instrumented Applications with Minimal Effort
Developing Highly Instrumented Applications with Minimal EffortDeveloping Highly Instrumented Applications with Minimal Effort
Developing Highly Instrumented Applications with Minimal EffortTim Hobson
 
Nicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchNicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchMeetupDataScienceRoma
 
Customer Presentation - Financial Services Organization
Customer Presentation - Financial Services OrganizationCustomer Presentation - Financial Services Organization
Customer Presentation - Financial Services OrganizationSplunk
 
The Critical Missing Component in the Production ML Stack
The Critical Missing Component in the Production ML StackThe Critical Missing Component in the Production ML Stack
The Critical Missing Component in the Production ML StackDatabricks
 
Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Ophir Cohen
 
Join 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart CachingJoin 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart CachingLooker
 
Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersTao Xie
 
Getting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout SessionGetting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout SessionSplunk
 
Advanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout SessionAdvanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout SessionSplunk
 
Big Data Testing
Big Data TestingBig Data Testing
Big Data TestingQA InfoTech
 
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsDr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsWork-Bench
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
Got Open Source?
Got Open Source?Got Open Source?
Got Open Source?Quest
 
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...Work-Bench
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsIlkay Altintas, Ph.D.
 
Fast Data Intelligence in the IoT - real-time data analytics with Spark
Fast Data Intelligence in the IoT - real-time data analytics with SparkFast Data Intelligence in the IoT - real-time data analytics with Spark
Fast Data Intelligence in the IoT - real-time data analytics with SparkBas Geerdink
 

Was ist angesagt? (20)

Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search Dojo
 
Predicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCAPredicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCA
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile Data
 
Developing Highly Instrumented Applications with Minimal Effort
Developing Highly Instrumented Applications with Minimal EffortDeveloping Highly Instrumented Applications with Minimal Effort
Developing Highly Instrumented Applications with Minimal Effort
 
Nicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchNicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in Elasticsearch
 
Customer Presentation - Financial Services Organization
Customer Presentation - Financial Services OrganizationCustomer Presentation - Financial Services Organization
Customer Presentation - Financial Services Organization
 
The Critical Missing Component in the Production ML Stack
The Critical Missing Component in the Production ML StackThe Critical Missing Component in the Production ML Stack
The Critical Missing Component in the Production ML Stack
 
Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013
 
Join 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart CachingJoin 2017_Deep Dive_Smart Caching
Join 2017_Deep Dive_Smart Caching
 
Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that Matters
 
Getting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout SessionGetting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout Session
 
Advanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout SessionAdvanced Use Cases for Analytics Breakout Session
Advanced Use Cases for Analytics Breakout Session
 
Big Data Testing
Big Data TestingBig Data Testing
Big Data Testing
 
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsDr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
Got Open Source?
Got Open Source?Got Open Source?
Got Open Source?
 
Data Testing
Data TestingData Testing
Data Testing
 
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
 
Fast Data Intelligence in the IoT - real-time data analytics with Spark
Fast Data Intelligence in the IoT - real-time data analytics with SparkFast Data Intelligence in the IoT - real-time data analytics with Spark
Fast Data Intelligence in the IoT - real-time data analytics with Spark
 

Ähnlich wie MLconf NYC Ted Willke

What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
How Will Going Virtual Impact Your Search Performance?
How Will Going Virtual Impact Your Search Performance?How Will Going Virtual Impact Your Search Performance?
How Will Going Virtual Impact Your Search Performance?IdeaEng
 
Alten calsoft labs analytics service offerings
Alten calsoft labs   analytics service offeringsAlten calsoft labs   analytics service offerings
Alten calsoft labs analytics service offeringsSandeep Vyas
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
Datapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT WorkshopDatapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT WorkshopAmazon Web Services
 
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam GreenAI Guild
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric IntroductionJames Serra
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLPaco Nathan
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Microsoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenMicrosoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenGoDataDriven
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data IntegrationsPat Patterson
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software EngineeringMiroslaw Staron
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIAmazon Web Services
 

Ähnlich wie MLconf NYC Ted Willke (20)

What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
Machine Learning at the Edge
Machine Learning at the EdgeMachine Learning at the Edge
Machine Learning at the Edge
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
How Will Going Virtual Impact Your Search Performance?
How Will Going Virtual Impact Your Search Performance?How Will Going Virtual Impact Your Search Performance?
How Will Going Virtual Impact Your Search Performance?
 
Alten calsoft labs analytics service offerings
Alten calsoft labs   analytics service offeringsAlten calsoft labs   analytics service offerings
Alten calsoft labs analytics service offerings
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Datapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT WorkshopDatapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT Workshop
 
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
DataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT WorkshopDataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT Workshop
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Microsoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenMicrosoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDriven
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 

Mehr von MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Mehr von MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Kürzlich hochgeladen

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 

Kürzlich hochgeladen (20)

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 

MLconf NYC Ted Willke

  • 2. Danny : isBrotherOf : Nezih food cart : uses : bicycles Frank : isFriendsWith : Mohit Frank : isFriendsWith : Ted Frank : likes : bicycles Frank : likes : food carts Ivy : isFriendsWith : Kushal Ivy : isFriendsWith : Ted Ivy : likes : bicycles Ivy : likes : food carts Kushal : isFriendsWith : Mohit Kushal : isFriendsWith : Nezih Nezih : is FriendsWith : Ted Ted : likes : bicycles
  • 3. This model... ... infers this interest. Ted Kushal Mohit Danny Ivy Frank Nezih friends friends friends brothers friends friends friends friends Food Cart likes likes likesBicycles likes likes likes uses Likes?
  • 4. Virtuous cycle of data CLOUD Richer data to analyze CLIENTS Richer data from devices Richer user experiences INTELLIGENT SYSTEMS
  • 6. 1985 1995 2005 2015 enterprise NoSQL Docs + Semantics RDF WIDESPREAD MACHINE LEARNING ON THIS
  • 8. Graph centrality High Program Importance (Centrality) Low Graph of channel viewing behavior Current popular surfing patterns SH002463130000 EP005544723744 Changes in surfing behavior may predict customer churn.
  • 9. Preference and Similarity Recommendations User Movie 1.7MM Nodes 23.9MM Edges similar cast prefers similar topic userId: A0A22A5 title: The Godfather genre: Crime drama cast: [M. Brando, Al Pacino] title: Scarface genre: Crime drama cast: [Al Pacino, M. Pfeiffer] title: The Departed genre: Crime drama cast: [L. DiCaprio, M. Damon] weight=11.8 weight=0.67 weight=0.03 weight=14.98 Min-cost path search
  • 10. 10 URL Ground-Truth Data IP/Domain Reputations 420MM Records 74.5MM Nodes 185MM Edges URL Domain IP Address Calculation of priors LBP Messaging Loopy Belief Propagation on the (semantic) web 84.231.82.93 86.39.155.137 forum.vsichko.com hermansonskok.se euskzzbz.nonetheups.com keesenbep.spaces.live.com
  • 11. Loopy Belief Propagation on the (semantic) web
  • 13. You may actually need this • When the problem is an information network • When a graph is a natural way of expressing the algorithm • When you want to study specific relationships • When you want faster machine learning or solvers on sparse data shortest path central influence sub networks triangle count
  • 14. But there are challenges. Handling all that data. Finding people good at both handling all that data and data analysis. Putting exploratory work into production fast enough to keep up with the competition. 14
  • 16. It’s a demanding job Ingest & Clean Engineer Features Structure Model Train Model Query & Analyze Learn Visualize Skills shortage at intersection of systems engineering and data analysis Painful data ingestion and preparation Workflows that are not designed with loopbacks in mind Few tools for analyzing semantics at scale Composing pipeline is DIY
  • 17. Decomposing the “data scientist” Source: 2013 Report from Accenture Institute for High Performance
  • 18. IMAGINE A PLATFORM FOR DATA SCIENTISTS DOCS + SEMANTICS + MACHINE LEARNING
  • 19. Ease-of-use: Making big data familiar Python R Dataflow GUI ... Datacenter / CloudNetworkClient BIG DATA API Connec tManag e Secure Analyzedistributed and parallel Manag eSecure Connec t Analyzelocal Query Big Data Java/Scala/C++ Computational Frameworks Big Data Algorithms Cluster Workload Mgmt Cluster Storage Machine Learning & Statistics Data WranglingAnalyst Skills The Other Skills
  • 20. Delivering it FILESYSTEMS AND NOSQL STORAGE HW PLATFORM APACHE HADOOP APACHE SPARK DATA WRANGLING MACHINE LEARNING AND STATISTICS Graphical Algorithms Classical Algorithms Graph Construction Tools Useful String Manipulation Useful Math Operators BIG DATA API DATA SCIENCE SERVER (Query and Scripting) Intel Analytics Toolkit A UNIFIED DOCUMENT + SEMANTIC STORE The Ask
  • 21. Approach Algorithm Category Applications/Use Cases Loopy Belief Propagation (LBP) Structured Prediction Personalized recs, image de-noising Label Propagation Structured Prediction Personalized recommendations Alternating Least Squares (ALS) Collaborative Filtering Recommenders Conjugate Gradient Descent (CGD) Collaborative Filtering Recommenders Connected Components Graph Analytics Network manipulation, image analysis Latent Dirichlet Allocation (LDA) Topic Modeling Document Clustering Structure Attribute Clustering Network analysis, consumer seg K-Truss Clustering Social network analysis KNN* Clustering Recommenders Logistic Regression* Classification Fraud detection Random Forest* Classification Fraud detection, consumer seg Generalized Linear Model (Binomial, Poisson) Non-linear Curve Fitting Forecasting, pricing, market mix models Association Rule Mining Data Mining Market basket analysis, recommenders Frequent Pattern Mining* Data Mining Pattern Recognition Bringing a full spectrum of possibilities Graph 21
  • 22. Article Tagging Problem • Articles are tagged by experts with MeSH terms, drawn from a hierarchical controlled vocabulary of 55,000 keywords • Process is resource-intensive – can we automate it? • Categorize articles into a hierarchy that matches the same categorization from the MeSH controlled vocabulary
  • 24. Demo: Graph Analytics For Medical Journal Analysis INGEST & CLEAN ENGINEER FEATURES STRUCTURE GRAPH QUERY & ANALYZE LEARN VISUALIZE PARSE AND EXTRACT WORDS CREATE ARTICLE/ WORD LIST BUILD GRAPH QUERY/ VISUALIZE DATA DETECT CLUSTERS USING LDA • Medline™ XML • MeSH Ontology XML • Create list of unique words • Stemming and lemmatization • Index word list • Transform articles into list of article/word pairs • Extract vertices • Assign id columns to vertex property • Assign year and count edge properties • Gremlin query for each visual • Python web server and other libraries • Select optimization parameters • Invoke LDA
  • 25. The Playbook? PARSE AND EXTRACT WORDS CREATE ARTICLE/ WORD LIST BUILD GRAPH QUERY/ VISUALIZE DATA DETECT CLUSTERS USING LDA Parse Prepare graph data Basic analysis Run LDA INSIGHTFUL RESULT This never happens!
  • 26. The Real Playbook PARSE AND EXTRACT WORDS CREATE ARTICLE/ WORD LIST BUILD GRAPH QUERY/ VISUALIZE DATA DETECT CLUSTERS USING LDA Parse Correct mistake Prepare graph data Correct schema mistake Correct aggregation mistake Data validation Correct dataset mistake Guess LDA settings Tune and re-run Detect bias in dataset
  • 27. WE NEED THE AGILITY OF INTERACTIVE SCRIPTING AND THE BRAINS AND BRAWN OF SCALABLE GRAPH ANALYTICS
  • 31. LDA with 3 Topics
  • 33. LDA with 7 Topics
  • 34. Query Vertices Again – Now with ML Properties 34
  • 35. Following Analysis 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 Wakefulness Sleep Animals Electroencephalography Circadian Rhythm Arousal Sleep Stages REM Mental Recall Attention Rats Child Evoked Potentials Aged Schizophrenia Ocular Conditioning Infant Psychophysics Dreams Top MeSH terms that predict which category an article will be assigned
  • 36. Reimagining 2014 New partnerships in big data Contributions to the open source community The Intel Analytics Toolkit – COMING SOON SEMANTICS + MACHINE LEARNING TOGETHER AT LAST!
  • 37. INTERESTED IN THE INTEL ANALYTICS TOOLKIT? THEODORE.L.WILLKE@INTEL .COM
  • 38.
  • 39. Legal Disclaimers All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT- compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit http://www.intel.com/technology/security Intel, Intel Xeon, Intel Atom, Intel Xeon Phi, Intel Itanium, the Intel Itanium logo, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. Copyright © 2013, Intel Corporation. All rights reserved.