2. A couple of metrics on Dataiku
+300% Dataiku DSS platform growth (users)
100+ Clients
80 Employees in London, SF, New York, Paris
Leaders in their industries
#1 Insurance Brand
#1 Pharma Brand
#1 US Construction Company
#1 Financial Information Company
#1 Flash Sales Company
#1 Car Sharing Company
#1 Parking Device Company
#1 Cosmetics Company
#3 CPG Company
Customer Analytics & Insights: churn, recommendation
Process Optimization: IoT, predictive maintenance, sales force optimization
3. Dataiku named a “visionary” in Gartner 2017 Magic Quadrant for Data Science Platforms
Dataiku made its debut on the 2017 Magic Quadrant as highest in Completeness of Vision
Gartner, Inc., Magic Quadrant for Data Science Platforms, Alexander Linden, Peter Krensky, Jim Hare, Carlie J. Idoine, Svetlana Sicular, Shubhangi Vashisth, 14 February 2017.
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Dataiku. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
5. Horizontal Collaboration vs. Vertical Collaboration
(Diagram comparing the two collaboration modes. Roles shown: Data Engineer, Data Analyst, Data Scientist, Business Leader, Line-of-business, Data Consumer.)
6. Conductor: Best tech, Best algos
Abstract away from technology
Designed in Post-Hadoop/Spark world
• Database Data: Vertica, Greenplum, Redshift, PostgreSQL, …
• Data Lake: Cassandra, HDFS, …
• File System Data: Host File System, Remote File System, …
• S3
• Run in Database: Enterprise SQL, Analytic SQL
• Run in Memory: Python, R, …
• Run in Cluster: Spark, Impala, Hive, …
• ML in Memory: Python Scikit-Learn, R, …
• Distributed ML: MLlib, H2O, …
7. Dataiku, For Production
• Development Zone: DESIGN Node, Dev DWH, Dev Hadoop / data lake
• Data Production Zone: AUTOMATION Node, Production DWH / Hadoop (reached via Deploy Workflow)
• Web Production Zone: SCORING Node, Production Databases, End Users (reached via Deploy Model)
• People shown: Business Analyst, Data Scientist, Web Developer, System Administrator, Database Administrator
8. Data Science Studio 4.0 – February 2017
• Native, Multi-user Security
   • Kerberos/Sentry/Ranger: Manage multiple user profiles in accordance with your Hadoop multi-user security policy.
• Spark Pipelines
   • Run consecutive Spark recipes in a single Spark job and avoid writing intermediate datasets, thus dramatically improving run-time performance.
• Interactive Dashboards
   • Allow for visibility at all stages of a project, not only at the reporting stage. Drag and drop to integrate graphs, datasets, KPIs, etc., and get up-to-date reports in one click.
• Notifications/Integration
   • Set up notifications on objects you want to watch. Use GitHub, Slack, or HipChat to keep relevant team members up to date on what goes on in specific projects.
• ML Enhancements
   • Hierarchical clustering, faster scoring via in-database operations, more visualizations for clustering interpretation, more time series analyses.
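The idea behind the Spark Pipelines item can be illustrated without Spark at all. The sketch below is plain Python, not DSS or Spark code: the step functions, the sample rows, and the helper names `run_step_by_step` and `run_as_pipeline` are all hypothetical. It only shows the general difference between materializing every intermediate dataset and chaining consecutive steps so that only the final output is materialized.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterator[Row]]


def keep_active(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical step 1: filter out inactive rows."""
    return (r for r in rows if r["active"])


def add_revenue(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical step 2: add a computed column."""
    return ({**r, "revenue": r["price"] * r["qty"]} for r in rows)


def run_step_by_step(rows: Iterable[Row], steps: List[Step]) -> Tuple[List[Row], int]:
    """Run each step on its own and materialize every output dataset."""
    current = list(rows)
    materialized = 0
    for step in steps:
        current = list(step(current))  # stands in for writing a dataset
        materialized += 1
    return current, materialized


def run_as_pipeline(rows: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Chain all steps lazily and materialize only the final output."""
    chained = reduce(lambda data, step: step(data), steps, iter(rows))
    return list(chained)


rows = [
    {"active": True, "price": 2.0, "qty": 3},
    {"active": False, "price": 5.0, "qty": 1},
]
steps = [keep_active, add_revenue]

step_result, n_materialized = run_step_by_step(rows, steps)
pipeline_result = run_as_pipeline(rows, steps)
```

Both runs produce the same rows; the step-by-step run materializes one dataset per step, while the chained run materializes only the last one. How much time this saves in a real Spark job depends on the data and the storage involved.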
10. Where is processing executed?
Recipe | In DSS memory | Streaming in DSS | In Hadoop cluster | In SQL Database
Visual Preparation (Design) | YES | N/A | N/A | N/A
Visual Preparation (Execution) | N/A | YES | YES | NO
Join / Group / Stack / Window… | N/A | YES (except Window) | YES | YES
Python recipe | YES | YES | YES (PySpark) | Custom code
R recipe | YES | NO | YES (SparkR) | NO
Spark-Scala recipe | NO | Fetching of non-HDFS DB | YES | NO
SQL recipe | N/A | N/A | YES (Hive, Impala, Pig, SparkSQL) | YES
Machine Learning (train) | YES (Scikit-learn, XGBoost) | N/A | YES (MLlib, Sparkling Water) | YES (Vertica ML)
Machine Learning (score) | N/A | YES (Scikit-learn, XGBoost) | YES (MLlib, Sparkling Water) | YES (Vertica ML)
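The matrix on this slide can also be queried programmatically. The sketch below encodes the cell values as read from the slide into a small Python lookup; it is not a DSS API, and the names `ENGINES`, `MATRIX`, `engines_for`, and `recipes_on` are hypothetical. A cell counts as supported only when it starts with "YES", so "Custom code" and "Fetching of non-HDFS DB" are treated as not directly supported, which is an assumption about how to read those cells.

```python
from typing import Dict, List, Tuple

# Column order as on the slide.
ENGINES: Tuple[str, ...] = (
    "In DSS memory",
    "Streaming in DSS",
    "In Hadoop cluster",
    "In SQL Database",
)

# Cell values transcribed from the slide; one tuple per recipe, in ENGINES order.
MATRIX: Dict[str, Tuple[str, str, str, str]] = {
    "Visual Preparation (Design)": ("YES", "N/A", "N/A", "N/A"),
    "Visual Preparation (Execution)": ("N/A", "YES", "YES", "NO"),
    "Join / Group / Stack / Window": ("N/A", "YES (except Window)", "YES", "YES"),
    "Python recipe": ("YES", "YES", "YES (PySpark)", "Custom code"),
    "R recipe": ("YES", "NO", "YES (SparkR)", "NO"),
    "Spark-Scala recipe": ("NO", "Fetching of non-HDFS DB", "YES", "NO"),
    "SQL recipe": ("N/A", "N/A", "YES (Hive, Impala, Pig, SparkSQL)", "YES"),
    "Machine Learning (train)": (
        "YES (Scikit-learn, XGBoost)", "N/A",
        "YES (MLlib, Sparkling Water)", "YES (Vertica ML)",
    ),
    "Machine Learning (score)": (
        "N/A", "YES (Scikit-learn, XGBoost)",
        "YES (MLlib, Sparkling Water)", "YES (Vertica ML)",
    ),
}


def engines_for(recipe: str) -> List[str]:
    """Return the engines whose cell for this recipe starts with YES."""
    cells = MATRIX[recipe]
    return [engine for engine, cell in zip(ENGINES, cells) if cell.startswith("YES")]


def recipes_on(engine: str) -> List[str]:
    """Return the recipes whose cell for this engine starts with YES."""
    col = ENGINES.index(engine)
    return [recipe for recipe, cells in MATRIX.items() if cells[col].startswith("YES")]
```

For instance, `engines_for("R recipe")` lists the two columns marked YES for R, and `recipes_on("In SQL Database")` lists the recipes that can be pushed down to the database.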