Optimizing the Data Supply Chain for Data Science
61 Broadway Suite 1105
New York, NY 10006
info@vital.ai
http://www.vital.ai
Marc Hadfield
CEO, Vital A.I.
about: vital ai
Software Applications:

Artificial Intelligence,
Machine Learning,
Data Science.
Software Vendor & Consulting Services
agenda
• Data Models
• How A.I., Data Science, & Data Governance relate
• Data Supply Chain & the Data Product
• Problem: the “Telephone Game” across the DSC
• Architecture Transition from Data Warehouse to DSC
• Data Models and DSC; a Framework for Solutions
• Examples
• Collaboration & Visualization
note: general methodology, with some specific
examples from Vital AI implementations.
takeaways:
• The Data Supply Chain is a supply
chain to deliver Data Products
• Data Models can capture the implicit
meaning of data (and that is the goal!)
• Data Models can help negotiate the
implicit differences across the DSC
• Data Models offer a means to
collaborate on data standards
(meaning) across the DSC partners
about data models:
Semantic Models
big data:
volume, velocity,
variety, veracity
variety: data models
“Product”: different meaning in
Manufacturing vs Retail context
Healthcare, same entity: “Patient”,
“InsuredPerson”, “BillableEntity”
example:
Class: Person

Property: birthday
Standardized Unique Global Identifier (URI)
data type: date

relationship with property: age

allowed range of values (can’t be born in the future) 

typical (average/expected) value…

(Birthdays in Wikipedia vs Customer Database)
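The property description above can be sketched as a small declarative model. This is only an illustration in Python, not VitalSigns output: the `PropertyDef` class, the `age_on` helper, and the example URI are hypothetical names introduced here, and the fixed `today` value is an assumption so the example is reproducible.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PropertyDef:
    """Hypothetical property definition: identifier, data type, allowed range."""
    uri: str                      # globally unique identifier for the property
    data_type: type               # expected Python type of the value
    min_value: Optional[date] = None
    max_value: Optional[date] = None

    def validate(self, value) -> list:
        """Return a list of violations (empty if the value is valid)."""
        if not isinstance(value, self.data_type):
            return [f"{self.uri}: expected {self.data_type.__name__}"]
        errors = []
        if self.min_value is not None and value < self.min_value:
            errors.append(f"{self.uri}: {value} is before {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            errors.append(f"{self.uri}: {value} is after {self.max_value}")
        return errors


def age_on(birthday: date, today: date) -> int:
    """Derive the related 'age' property from 'birthday'."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


# Assumed reference date and example URI; a real model publishes its own identifiers.
today = date(2015, 6, 1)
birthday_prop = PropertyDef(
    uri="http://example.org/model#birthday",
    data_type=date,
    max_value=today,  # can't be born in the future
)
```

A "typical value" check (for instance, flagging birthdays far from the expected distribution for a customer database) could be layered on top in the same way, but is left out of this sketch.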
about: vital ai tech
Vital AI Development Kit (VDK)
VitalSigns — Data Modeling &
Code Generation
VitalService — Common API for
Databases, Machine Learning,
Apache Spark, Data Transforms
about: vital ai tech
VitalService
Query
Executable
Query
Query Generator
Common Query API:

Relational DB (SQL)

Graph DB (Sparql)

Key/Value Store

NOSQL DB

Document DB

Apache Spark

Hive (Hadoop)

Predictive Models (a query for an unknown value)

Goal: Build A.I. applications across variety of
infrastructure with consistent API & Models.
example data:
Message --hasSender--> Person (Sender)
Message --hasRecipient--> Person (Recipient)
example “MetaQL” query:
GRAPH {
    value segments: ["mydata"]
    ARC {
        node_constraint { Message.class }
        constraint { "?person1 != ?person2" }
        ARC_AND {
            ARC {
                edge_constraint { Edge_hasSender.class }
                node_constraint {
                    Person.props().emailAddress.equalTo("john@example.org")
                }
                node_constraint { Person.class }
                node_provides { "person1 = URI" }
            }
            ARC {
                edge_constraint { Edge_hasRecipient.class }
                node_constraint { Person.class }
                node_provides { "person2 = URI" }
            }
        }
    }
}
“Person” may have subtypes, like Student or Employee.
a.i. and data quality
data models &
machine learning:
using the meaning of classes and
properties, automatically
generate predictive models.
predictive models features:

birthday, zip code, …
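One way to picture how property meaning could drive feature generation is the sketch below. It is an assumption-laden illustration, not the Vital AI implementation: the `PERSON_MODEL` dictionary, the semantic type labels, and the `to_features` function are hypothetical.

```python
from datetime import date

# Assumed model metadata: property name -> semantic type.
PERSON_MODEL = {"birthday": "date", "zipCode": "category"}


def to_features(record: dict, model: dict, today: date) -> dict:
    """Turn a record into numeric features using the property types in the model."""
    features = {}
    for name, semantic_type in model.items():
        value = record.get(name)
        if value is None:
            continue  # missing values are simply skipped in this sketch
        if semantic_type == "date":
            # Dates become an elapsed-time feature (whole days since the date).
            features[f"{name}_days_ago"] = (today - value).days
        elif semantic_type == "category":
            # Categories become a one-hot style indicator.
            features[f"{name}={value}"] = 1
    return features
```

The point of the sketch is that the transformation is chosen from the model's description of each property rather than hand-written per dataset.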
data governance =

defining the meaning of data = 

feature (pre)engineering
critical aspect of data science
Progression of Analytics:
where a.i. happens
Progression of Analytics:
Garbage In = Garbage Out = Bad A.I.
data governance
required for Good A.I.
one more point on

data governance…
think outside the box

(data warehouse)
data governance: data in motion
vs.
inside data warehouse
outside data warehouse
supply chain
supply chain:
product
data supply chain:
data product
Retail Recommendations…



Shipping/Logistics Optimization…



Compliance, Auditing, Security, Fraud Detection…

data product:
why data supply chain?
Partner DW Your DW
"No matter who you are, most of the smartest people work for someone else." - Bill Joy
why data supply chain?
Partner DW Your DW
"No matter who you are, most of the smartest data works for someone else." - Bill Joy (revised: "people work" becomes "data works")
data supply chain
Partner DW
Your DW
why not ETL?
Partner DW
Extract…
not quite as expected…
Transform…
a bit extreme…
Load…
a bit messy…
Clean…
a lot of manual effort…
… your imported data
Your DW
Partner DW
Why?
what goes wrong?
telephone game…
You
Partner
Model “A”
Model “B”
Implicit Model
Resolution:
Make explicit the implicit.

Align Data Models.
Reason:

Implicit assumptions in the data.

ETL can’t see the forest for the trees.

(or it’s very difficult with missing
assumptions)
Example: Internet of Things
Predictive Analytics
“Nest for Office Buildings”
Office Tower with Building
Management System (BMS)
containing 100,000 monitored
points (temperature, energy
usage of chiller, fan speed, etc.)
with significant missing data,
errors, and noise. Reconciliation
of data to produce predictive
models to minimize energy usage.
Rules for data correctness.
Sensor Data Validation:
Source data had temperature values of “0” (zero)
which meant either the temperature was 0 degrees
or that the sensor had an error.

Data Model “knows” that it’s rarely 0 degrees in
July (such a value lies many standard deviations from
the expected mean), and that a 0 reading on a day in
December can be compared to weather data for reasonableness.
If Data Model also knows the maintenance schedule
for the sensors, then it “knows” when to expect 0
error values and can exclude them.
Missing Maintenance Assumptions.

Fill in secondary (weather) data for validation.
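The validation rules described above might look roughly like the following sketch. It assumes an outdoor air temperature sensor so that weather data is a fair comparison; the `Reading` class, the `classify_reading` function, the label names, and the 15-degree tolerance are all hypothetical choices for illustration, not values from the project.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Reading:
    sensor_id: str
    timestamp: datetime
    temperature: float   # degrees, as reported by the BMS


def classify_reading(
    reading: Reading,
    maintenance_windows: List[Tuple[datetime, datetime]],
    outdoor_temperature: Optional[float],
    tolerance: float = 15.0,
) -> str:
    """Label a reading 'ok', 'maintenance', or 'suspect'.

    The tolerance is an illustrative assumption.
    """
    # Rule 1: a zero during a known maintenance window is an expected error value.
    if reading.temperature == 0:
        for start, end in maintenance_windows:
            if start <= reading.timestamp <= end:
                return "maintenance"
    # Rule 2: compare with secondary (weather) data for reasonableness.
    if outdoor_temperature is not None:
        if abs(reading.temperature - outdoor_temperature) > tolerance:
            return "suspect"
    return "ok"
```

With these rules a zero in July against warm weather is flagged, while a zero on a cold December day passes as a plausible real temperature.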
how did we get here?
Architecture Review:
a quick step back…
What is a Data Supply Chain
architecture?
“traditional” data warehouse:
ETL within the organization.

Data Governance across the organization.
DW
tech co. “agile” data warehouse:
storage
compute
HDFS
Spark
DataSets
Jobs
Batch/Streaming

Build Predictive Models

Realtime: Spark/Storm
hadoop cluster
enterprise: data lake
storage
compute
HDFS
Spark
X(save $)
“Data Swamp”
aside: Data Lake
better analogy: Scriptorium
library,

manuscript copying,
& book distribution.
but not as pithy as “Lake”…
tech co. microservices (micro-SOA):
storage
compute
service
“Composed” App
external:
social data,
weather API
independent clusters,

local data expertise
optimize development
processes, scale up.
microservices example:
Amazon: product search uses
170 independent microservices
including services for predicting
customer characteristics, getting
product images, etc.
http://www.infoworld.com/article/2903144/application-development/how-to-succeed-with-microservices-architecture.html
Netflix similar architecture
Data Supply Chain:
storage
compute
service
Data Product
“ETL”
Owner “A” Owner “B”
optimize development
processes, scale up.
independent clusters,

local data, ownership
Interaction Points:
Data Product
service
compute
ETL
Owner “A” Owner “B”
Data Lineage: Cloudera Navigator
…within a Data Warehouse
trace back jobs that
produced every data field.
Data Supply Chain
with Provenance:
include provenance data directly
in imported dataset.
use in rules to interpret the data.
entity-123 | hasSource | datasource-A
entity-123 | name | “John Doe”
Data Warehouse B
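As a rough sketch of how provenance carried inside the dataset can feed interpretation rules: the first two triples below mirror the slide, while the second entity, the `AUTHORITATIVE_SOURCES` set, the `needsReview` flag, and both function names are assumptions made up for this example.

```python
from collections import defaultdict

# Triples as (subject, predicate, object); the first two mirror the slide.
triples = [
    ("entity-123", "hasSource", "datasource-A"),
    ("entity-123", "name", "John Doe"),
    ("entity-456", "hasSource", "datasource-B"),   # assumed extra record
    ("entity-456", "name", "J. Doe"),
]

# Assumed rule input: which sources the importing owner treats as authoritative.
AUTHORITATIVE_SOURCES = {"datasource-A"}


def group_by_subject(rows):
    """Collect predicate/object pairs per subject."""
    entities = defaultdict(dict)
    for subject, predicate, obj in rows:
        entities[subject][predicate] = obj
    return dict(entities)


def interpret(entity):
    """Attach a review flag based on the provenance carried with the record."""
    source = entity.get("hasSource")
    needs_review = source not in AUTHORITATIVE_SOURCES
    return {**entity, "needsReview": needs_review}


entities = {uri: interpret(props) for uri, props in group_by_subject(triples).items()}
```

Because the source travels with each entity, the receiving side can apply different rules per source without consulting a separate lineage system.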
Interaction Points: Data Models
Data Product
service
compute
ETL
Data Models: Gatekeepers & Transform
Owner “A” Owner “B”
Data Supply Chain
using Models:
storage
compute
service
Data Product
ETL
Owner “A” Owner “B”
Model

Server
Data Models: focus of
data governance
Semantic Data Models:
Make explicit the meaning of data
Transformation and Validation Rules
leverage the Model and Meaning.

Such Rules may be packaged with the
Model, and managed together.
Protect against implicit assumptions
Example: Financial Services
A B C
Service Provider
Reconciliation of Corporate Structure across thousands of
organizations. Compliance Rules barring communication
between “researchers” and “traders”.

Rules to infer if “Mary” is a “researcher” or “trader”.

Conflicting concepts of “Branch-Office”, “Direct-Report”,
etc. across the globe.
Example: Hospital Group
A B C
Data Analytics
Reconciliation across
Patient Records,
Insurance, & Billing
for Patient Predictive
Analytics.

Rules for identity:
“same person”
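A "same person" rule can be as simple or as elaborate as the domain requires. The sketch below shows the simplest plausible form, assuming records from patient, insurance, and billing systems share a name and a birthday; `PersonRecord`, `normalize_name`, and `same_person` are hypothetical names, and a real rule set would likely weigh more evidence than this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonRecord:
    source: str        # e.g. "patient-records", "billing"
    full_name: str
    birthday: date


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(name.lower().split())


def same_person(a: PersonRecord, b: PersonRecord) -> bool:
    """Assumed identity rule: equal normalized name and equal birthday."""
    return (
        normalize_name(a.full_name) == normalize_name(b.full_name)
        and a.birthday == b.birthday
    )
```

Keeping the rule next to the model means every partner reconciling "Patient", "InsuredPerson", and "BillableEntity" applies the same definition of identity.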
Data Models:
OWL: Semantic Ontology Model

(W3C Standard, Various Standards for Rules)
VitalSigns: Generate Code
validation, transformation, …
VitalSigns: Versioning, Dependencies, Exchange,
Storage, Change Management (Semantic “Diff”)
Example: Personally Identifiable Information
Data Governance determines that “Profession” and
“ZipCode” cannot be used together.

(Maybe a single “Dentist” in a small town…)

Within a single Data Warehouse we can bar these data
elements from being combined.

But:

Microservice A provides value of “Profession”

Microservice B provides value of “ZipCode”



How to enforce that these two microservices cannot be
combined?
Example: Personally Identifiable Information
Validation code enforcing data usage:
Person person123 = get_person_details("entity-123")

// this call works:
person123.profession = get_profession(person123)

// this call blocks because of data model validation:
// person123 already has the "profession" property
person123.zipcode = get_zipcode(person123)
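The blocking behavior in the pseudocode can be sketched with a model object that checks governance rules on every property assignment. This is an illustration in Python under stated assumptions, not generated VitalSigns code: `GovernedPerson`, `DataUsageError`, and the `FORBIDDEN_COMBINATIONS` rule format are hypothetical.

```python
class DataUsageError(Exception):
    """Raised when setting a property would violate a governance rule."""


class GovernedPerson:
    # Assumed governance rule: these properties may not coexist on one object.
    FORBIDDEN_COMBINATIONS = [frozenset({"profession", "zipcode"})]

    def __init__(self, uri):
        # Bypass __setattr__ for internal state.
        object.__setattr__(self, "uri", uri)
        object.__setattr__(self, "_props", {})

    def __setattr__(self, name, value):
        # Check the rule against the properties already set plus the new one.
        present = set(self._props) | {name}
        for combo in self.FORBIDDEN_COMBINATIONS:
            if combo <= present:
                raise DataUsageError(
                    f"{sorted(combo)} cannot be combined on {self.uri}"
                )
        self._props[name] = value

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for governed properties.
        props = self.__dict__.get("_props", {})
        if name in props:
            return props[name]
        raise AttributeError(name)
```

Because the check lives in the model object rather than in either microservice, it applies no matter which service supplied the value.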
Semantic Data Models: Gatekeepers
Externally Managed.

Active not Passive, more like “code”.

Defining what should exist, not cataloguing what exists.

Can decide when to be tolerant or strict.
Collaborative Conversations:
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Semantic
Model
Collaborative Conversations:
Business +
Domain Experts
Semantic
Model
Business +
Domain Experts
Semantic
Model
Partner A Partner B
Model Alignment
What Concepts to combine, not what Tables to combine (that comes later).
Authoring Tool: OWL IDE Protégé
Visualization: Semantic Data
Visualization: WebVOWL
in conclusion, takeaways:
• The Data Supply Chain is a supply
chain to deliver Data Products
• Data Models can capture the implicit
meaning of data (and that is the goal!)
• Data Models can help negotiate the
implicit differences across the DSC
• Data Models offer a means to
collaborate on data standards
(meaning) across the DSC partners
Questions?
Marc Hadfield
CEO, Vital A.I.
marc@vital.ai
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Recently uploaded (20)

Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

Optimizing the
 Data Supply Chain
 for Data Science

  • 1. Optimizing the
 Data Supply Chain
 for Data Science 61 Broadway Suite 1105 New York, NY 10006 info@vital.ai http://www.vital.ai Marc Hadfield CEO, Vital A.I.
  • 2. about: vital ai
 Software Applications: Artificial Intelligence, Machine Learning, Data Science.
 Software Vendor & Consulting Services
  • 3. agenda
 • Data Models
 • How A.I., Data Science, & Data Governance relate
 • Data Supply Chain & the Data Product
 • Problem: the “Telephone Game” across the DSC
 • Architecture Transition from Data Warehouse to DSC
 • Data Models and DSC; a Framework for Solutions
 • Examples
 • Collaboration & Visualization
 note: general methodology, with some specific examples from Vital AI implementations.
  • 4. takeaways:
 • The Data Supply Chain is a supply chain to deliver Data Products
 • Data Models can capture the implicit meaning of data (and that is the goal!)
 • Data Models can help negotiate the implicit differences across the DSC
 • Data Models offer a means to collaborate on data standards (meaning) across the DSC partners
  • 5. about data models: Semantic Models
  • 6. big data: volume, velocity, variety, veracity
 variety: data models
 “Product”: different meaning in Manufacturing vs Retail context
 Healthcare, same entity: “Patient”, “InsuredPerson”, “BillableEntity”
  • 7. example:
 Class: Person
 Property: birthday
 Standardized Unique Global Identifier (URI)
 data type: date
 relationship with property: age
 allowed range of values (can’t be born in the future)
 typical (average/expected) value…
 (Birthdays in Wikipedia vs Customer Database)
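 A minimal Python sketch of what slide 7 describes: a property with a global identifier, a date type, an allowed range, and a related property (age). The URI, class layout, and function names are illustrative assumptions, not the VitalSigns API.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical URI; the slides do not give the real identifier.
BIRTHDAY_URI = "http://example.org/ontology#hasBirthday"


@dataclass(frozen=True)
class Person:
    uri: str
    birthday: date  # data type: date


def validate_birthday(birthday: date, today: date) -> None:
    """Enforce the allowed range: nobody can be born in the future."""
    if birthday > today:
        raise ValueError(f"{BIRTHDAY_URI}: {birthday} is in the future")


def age_in_years(birthday: date, today: date) -> int:
    """Derive the related 'age' property from 'birthday'."""
    validate_birthday(birthday, today)
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)
```

 Passing `today` explicitly keeps the rule deterministic, so the same model check gives the same answer wherever in the supply chain it runs.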
  • 8. about: vital ai tech
 Vital AI Development Kit (VDK)
 VitalSigns: Data Modeling & Code Generation
 VitalService: Common API for Databases, Machine Learning, Apache Spark, Data Transforms
  • 9. about: vital ai tech
 VitalService: Query, Query Generator, Executable Query
 Common Query API:
 Relational DB (SQL)
 Graph DB (SPARQL)
 Key/Value Store
 NoSQL DB
 Document DB
 Apache Spark
 Hive (Hadoop)
 Predictive Models (a query for an unknown value)
 Goal: Build A.I. applications across a variety of infrastructure with a consistent API & Models.
  • 10. example data:
 Message hasSender Person:Sender
 Message hasRecipient Person:Recipient
  • 11. example “MetaQL” query:
 GRAPH {
   value segments: ["mydata"]
   ARC {
     node_constraint { Message.class }
     constraint { "?person1 != ?person2" }
     ARC_AND {
       ARC {
         edge_constraint { Edge_hasSender.class }
         node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
         node_constraint { Person.class }
         node_provides { "person1 = URI" }
       }
       ARC {
         edge_constraint { Edge_hasRecipient.class }
         node_constraint { Person.class }
         node_provides { "person2 = URI" }
       }
     }
   }
 }
 “Person” may have subtypes, like Student or Employee.
  • 12. a.i. and data quality
  • 13. data models & machine learning:
 using the meaning of classes and properties, automatically generate predictive models.
 predictive model features: birthday, zip code, …
  • 14. data governance = defining the meaning of data = feature (pre)engineering
 critical aspect of data science
  • 15. Progression of Analytics:
  • 16. Progression of Analytics: where a.i. happens
  • 17. Garbage In = Garbage Out = Bad A.I.
 data governance required for Good A.I.
  • 18. one more point on data governance…
 think outside the box (data warehouse)
  • 19. data governance: data in motion
 inside data warehouse vs. outside data warehouse
  • 20. supply chain
  • 21. supply chain: product
  • 22. data supply chain: data product
  • 23. data product:
 Retail Recommendations…
 Shipping/Logistics Optimization…
 Compliance, Auditing, Security, Fraud Detection…
  • 24. why data supply chain?
 Partner DW, Your DW
 “No matter who you are, most of the smartest people work for someone else.” (Bill Joy)
  • 25. why data supply chain?
 Partner DW, Your DW
 “No matter who you are, most of the smartest people data works for someone else.” (Bill Joy, revised)
  • 26. data supply chain
 Partner DW, Your DW
 why not ETL?
  • 27. Partner DW
  • 28. Extract… not quite as expected…
  • 29. Transform… a bit extreme…
  • 30. Load… a bit messy…
  • 31. Clean… a lot of manual effort…
  • 32. … your imported data
  • 33. Partner DW, Your DW
 Why?
  • 34. what goes wrong? telephone game…
  • 35. You, Partner, Model “A”, Model “B”, Implicit Model
  • 36. Resolution: Make explicit the implicit.
 Align Data Models.
 Reason:
 Implicit assumptions in the data.
 ETL can’t see the forest for the trees.
 (or it’s very difficult with missing assumptions)
  • 37. Example: Internet of Things
 Predictive Analytics: “Nest for Office Buildings”
 Office Tower with Building Management System (BMS) containing 100,000 monitored points (temperature, energy usage of chiller, fan speed, etc.) with significant missing data, errors, and noise.
 Reconciliation of data to produce predictive models to minimize energy usage.
 Rules for data correctness.
  • 38. Sensor Data Validation:
 Source data had temperature values of “0” (zero), which meant either that the temperature was 0 degrees or that the sensor had an error.
 Data Model “knows” that it’s rarely 0 degrees in July (many standard deviations from the typical value), and that a 0 reading on a day in December can be compared to weather data for reasonableness.
 If Data Model also knows the maintenance schedule for the sensors, then it “knows” when to expect 0 error values and exclude them.
 Missing Maintenance Assumptions.
 Fill in secondary (weather) data for validation.
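 The three checks on slide 38 (maintenance schedule, secondary weather data, distance from the typical value) can be sketched in Python as below. The function name, thresholds, and return labels are assumptions made for illustration; the slides do not show the actual rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MonthlyStats:
    mean: float   # typical temperature for the month
    stdev: float  # spread for the month, assumed to be > 0


def classify_zero_reading(
    timestamp: datetime,
    stats: MonthlyStats,
    weather_temp: Optional[float],
    maintenance_windows: Sequence[Tuple[datetime, datetime]],
    z_limit: float = 3.0,            # hypothetical outlier threshold
    weather_tolerance: float = 5.0,  # hypothetical, in degrees
) -> str:
    """Decide whether a reading of 0 is a real temperature or a sensor error."""
    # Rule 1: during scheduled maintenance, 0 is an expected error value.
    if any(start <= timestamp <= end for start, end in maintenance_windows):
        return "error: maintenance"
    # Rule 2: secondary weather data, when available, settles the question.
    if weather_temp is not None:
        if abs(weather_temp) <= weather_tolerance:
            return "valid"
        return "error: disagrees with weather"
    # Rule 3: otherwise fall back to how unusual 0 is for this month.
    z = abs(0.0 - stats.mean) / stats.stdev
    return "valid" if z <= z_limit else "error: outlier"
```

 With July statistics of mean 25 and standard deviation 4, a 0 reading with no weather data is 6.25 standard deviations out and is rejected; with December statistics of mean 1 and standard deviation 5, the same reading passes.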
  • 39. how did we get here?
 Architecture Review: a quick step back…
 What is a Data Supply Chain architecture?
  • 40. “traditional” data warehouse:
 DW
 ETL within the organization.
 Data Governance across the organization.
  • 41. tech co. “agile” data warehouse:
 hadoop cluster: storage (HDFS, DataSets), compute (Spark, Jobs)
 Batch/Streaming
 Build Predictive Models
 Realtime: Spark/Storm
  • 42. enterprise: data lake
 storage (HDFS), compute (Spark)
 X (save $)
 “Data Swamp”
  • 43. aside: Data Lake
 better analogy: Scriptorium (library, manuscript copying, & book distribution).
 but not as pithy as “Lake”…
  • 44. tech co. microservices (micro-SOA):
 storage, compute, service
 “Composed” App
 external: social data, weather API
 independent clusters, local data expertise
 optimize development processes, scale up.
  • 45. microservices example:
 Amazon: product search uses 170 independent microservices, including services for predicting customer characteristics, getting product images, etc.
 http://www.infoworld.com/article/2903144/application-development/how-to-succeed-with-microservices-architecture.html
 Netflix: similar architecture
  • 46. Data Supply Chain:
 storage, compute, service
 Data Product
 “ETL”
 Owner “A”, Owner “B”
 independent clusters, local data, ownership
 optimize development processes, scale up.
  • 47. Interaction Points:
 Data Product, service, compute, ETL
 Owner “A”, Owner “B”
  • 48. Data Lineage: Cloudera Navigator
 …within a Data Warehouse
 trace back jobs that produced every data field.
  • 49. Data Supply Chain with Provenance:
 include provenance data directly in imported dataset.
 use in rules to interpret the data.
 entity-123 | hasSource | datasource-A
 entity-123 | name | “John Doe”
 Data Warehouse B
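 As a rough Python illustration of using provenance in a rule: the two triples below are the ones from slide 49, while the trust table and function names are hypothetical and only show one way a rule might consult `hasSource`.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# The two triples from the slide, as they might arrive in Data Warehouse B.
imported: List[Triple] = [
    ("entity-123", "hasSource", "datasource-A"),
    ("entity-123", "name", "John Doe"),
]

# Hypothetical rule input: which sources are trusted for which properties.
TRUSTED_FOR = {"name": {"datasource-A"}}


def source_of(entity: str, triples: Iterable[Triple]) -> Optional[str]:
    """Return the provenance recorded for an entity, if any."""
    for subject, predicate, obj in triples:
        if subject == entity and predicate == "hasSource":
            return obj
    return None


def accepted_values(prop: str, triples: List[Triple]) -> List[Triple]:
    """Keep only values of `prop` whose entity comes from a trusted source."""
    trusted = TRUSTED_FOR.get(prop, set())
    return [
        t for t in triples
        if t[1] == prop and source_of(t[0], triples) in trusted
    ]
```

 Because the provenance travels inside the dataset, the receiving side can apply such a rule without access to the sender's lineage tooling.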
  • 50. Interaction Points: Data Models
 Data Product, service, compute, ETL
 Owner “A”, Owner “B”
 Data Models: Gatekeepers & Transform
  • 51. Data Supply Chain using Models:
 storage, compute, service
 Data Product, ETL
 Owner “A”, Owner “B”
 Model Server
 Data Models: focus of data governance
  • 52. Semantic Data Models:
 Make explicit the meaning of data
 Protect against implicit assumptions
 Transformation and Validation Rules leverage the Model and Meaning.
 Such Rules may be packaged with the Model, and managed together.
  • 53. Example: Financial Services
 A, B, C; Service Provider
 Reconciliation of Corporate Structure across thousands of organizations.
 Compliance Rules barring communication between “researchers” and “traders”.
 Rules to infer if “Mary” is a “researcher” or “trader”.
 Conflicting concepts of “Branch-Office”, “Direct-Report”, etc. across the Globe.
  • 54. Example: Hospital Group
 A, B, C; Data Analytics
 Reconciliation across Patient Records, Insurance, & Billing for Patient Predictive Analytics.
 Rules for identity: “same person”
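 One possible shape for a “same person” rule, sketched in Python. The matching criterion (normalized name plus birthday) and the record fields are assumptions chosen for illustration; the slides do not state which attributes the hospital group's rules actually use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Record:
    system: str      # e.g. "patient-records", "insurance", "billing"
    full_name: str
    birthday: date


def identity_key(record: Record) -> Tuple[str, date]:
    """Hypothetical identity rule: same normalized name and same birthday."""
    normalized = " ".join(record.full_name.lower().split())
    return (normalized, record.birthday)


def same_person(a: Record, b: Record) -> bool:
    """True when two records from different systems share an identity key."""
    return identity_key(a) == identity_key(b)
```

 Keeping the rule in one function that every system calls is the point: “Patient”, “InsuredPerson”, and “BillableEntity” records are then reconciled by the same explicit definition.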
  • 55. Data Models:
 OWL: Semantic Ontology Model (W3C Standard, Various Standards for Rules)
 VitalSigns: Generate Code (validation, transformation, …)
 VitalSigns: Versioning, Dependencies, Exchange, Storage, Change Management (Semantic “Diff”)
  • 56. Example: Personally Identifiable Information
 Data Governance determines that “Profession” and “ZipCode” cannot be used together.
 (Maybe a single “Dentist” in a small town…)
 Within a single Data Warehouse we can bar these data elements from being combined.
 But:
 Microservice A provides value of “Profession”
 Microservice B provides value of “ZipCode”
 How to enforce that the outputs of these two microservices cannot be combined?
  • 57. Example: Personally Identifiable Information
 Validation code enforcing data usage:
 Person person123 = get_person_details("entity-123")
 
 // this call works:
 person123.profession = get_profession(person123)
 
 // this call blocks because of data model validation
 // person123 already has "profession" property
 person123.zipcode = get_zipcode(person123)
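 A hedged Python sketch of how a model object could block the second assignment in the slide's pseudocode. The `EXCLUSIVE_GROUPS` table, the `DataUsageError` type, and this `Person` class are invented for illustration and are not the code VitalSigns generates.

```python
# Hypothetical governance rule: these properties may not be held together.
EXCLUSIVE_GROUPS = [{"profession", "zipcode"}]


class DataUsageError(Exception):
    """Raised when a property assignment violates a data usage rule."""


class Person:
    def __init__(self, uri: str) -> None:
        # Bypass the rule check for bookkeeping attributes.
        object.__setattr__(self, "uri", uri)
        object.__setattr__(self, "_props", {})

    def __setattr__(self, name: str, value: object) -> None:
        # Refuse the assignment if a conflicting property is already set.
        for group in EXCLUSIVE_GROUPS:
            if name in group:
                conflicts = (group - {name}) & set(self._props)
                if conflicts:
                    raise DataUsageError(
                        f"{name!r} cannot be combined with {sorted(conflicts)}"
                    )
        self._props[name] = value

    def __getattr__(self, name: str) -> object:
        # Only called when normal lookup fails, i.e. for model properties.
        try:
            return self._props[name]
        except KeyError:
            raise AttributeError(name) from None
```

 Usage mirrors the slide: `p = Person("entity-123")`, then `p.profession = "Dentist"` succeeds, and a following `p.zipcode = "10006"` raises `DataUsageError`. The check lives in the model object, so it applies no matter which microservice supplied each value.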
  • 58. Semantic Data Models:
 Gatekeepers
 Externally Managed.
 Active not Passive, more like “code”.
 Defining what should exist, not cataloguing what exists.
 Can decide when to be tolerant or strict.
  • 59. Collaborative Conversations:
 Infrastructure DevOps, Data Scientists, Business + Domain Experts, Developers
 Semantic Model
  • 60. Collaborative Conversations:
 Partner A: Business + Domain Experts, Semantic Model
 Partner B: Business + Domain Experts, Semantic Model
 Model Alignment
 What Concepts to combine, not what Tables to combine (that comes later).
  • 61. Authoring Tool: OWL IDE Protege
  • 62. Visualization: Semantic Data
  • 63. Visualization: WebVOWL
  • 64. in conclusion, takeaways:
 • The Data Supply Chain is a supply chain to deliver Data Products
 • Data Models can capture the implicit meaning of data (and that is the goal!)
 • Data Models can help negotiate the implicit differences across the DSC
 • Data Models offer a means to collaborate on data standards (meaning) across the DSC partners
  • 65. Questions?
 Marc Hadfield, CEO, Vital A.I.
 marc@vital.ai