Big and Fast Data Strategy 2017
Jonathan Raspaud
AVP - Big Data Architecture
February, 2017
© Antuit 2016 Proprietary & Confidential; Not for circulation 2
Executive Summary
2017 Data Landscape
Vision
Strategy
Roadmap
Key Initiatives
High Level Architecture
High Level Data Flow
Data Validity Vendor Comparison
About Jonathan Raspaud:
Career timeline, 1997 to 2017 (year markers shown: 1997, 1998, 1999, 2000, 2006, 2011, 2012, 2017)
Roles, most recent first:
• AVP - Big Data Architecture
• Senior Principal Data Architect
• Mobility Practice Lead
• Manager Business Intelligence
• Data Warehouse Engineer
• Software Engineer
• Software Engineer
Employer named: Teamlog
Education: IAE Grenoble, Master of Science in Management of Information Systems
2017 Data Landscape (1): The Four V’s
Data Volume: Billions of Rows
Data Validity: Format, Process
Data Velocity: Real time, Streaming
Data Variety: Structured, Semi-Structured, Unstructured
Example sources shown: Weblogs, Clickstreams, IoT, Text, Call Center, Chat, Social, Sensors, Markets, Networks, Transportation
2017 Data Landscape (2): Legacy RDBMS Databases are poor at:
• Scalability
• Fast Streaming Data
• Unstructured Data
• Schema Flexibility
• Search
2017 Data Landscape (3): MPP/Column-Store Databases:
The Good:
• SQL based, wide capability with BI tools
• Good Performance
• Full support for aggregation and ad hoc filtering
The Bad:
• Need to move the data from operational systems
• Data loses Freshness
• Ultimate scale limitations
• Hard to adapt schema
• Can be expensive
2017 Data Landscape (4): Hadoop:
The Good:
• Distributed storage and processing of massive data sets
• Low-cost clusters built from commodity hardware
The Bad:
• SQL interfaces are improving but still not speed-of-thought
2017 Data Landscape (5): NoSQL Databases:
The Good:
• Storage and retrieval of data which is modeled in means other than the tabular relations used in RDBMS
• More and more application developers choose NoSQL Databases as operational databases
• Scalability, schema-less flexibility, and fast response time for short-request queries
The Bad:
• Traditional BI tools lack native compatibility
• Not optimized for analytic queries
• Some don’t support aggregation or ad hoc filtering on arbitrary field
2017 Data Landscape (6): Search Databases:
The Good:
• Using a search index technology is a great way to enable access to big data in the enterprise
• Deliver fast access to unstructured or semi-structured information: blog posts and comments, customer product reviews, machine logs, JSON scripts…
• Very effective with structured data too
The Bad:
• Lacks SQL interface – traditional BI tools incompatibility
• Native APIs required to access data
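To make the "search index" idea concrete, here is a minimal sketch of an inverted index, the data structure most search engines are built on. It is an illustration only, not how any particular search database is implemented; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents (a review, a machine log line, a blog post).
docs = {
    "review-1": "Great battery life, fast shipping",
    "log-7": "ERROR disk full on node 12",
    "post-3": "Fast data needs fast storage",
}
index = build_index(docs)
print(sorted(search(index, "fast")))
```

A lookup touches only the entries for the query tokens rather than scanning every document, which is where the fast access comes from. It is also a native API call rather than SQL, which matches the limitation listed under "The Bad".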
2017 Data Landscape (7): Cloud Big Data Stores:
The Good:
• Storing massive amounts of data in the cloud
• Low cost
• Easy to manage
• Range of storage options: file system, SQL database, Hadoop, Spark…
The Bad:
• Traditional BI tools lack performance optimized native integration
2017 Data Landscape (8): Fast Data:
The Good:
• Fast inserts/updates
• Fast analytics
The Bad:
• Traditional BI tools lack integration
• Traditional BI tools are not architected for streaming data
• Limited or Lacks SQL interface
2017 Data Landscape (9): Conclusion
• Legacy BI not designed for Modern Data:
• Hard to use: designed in an age of specialized skills
– Focus on the power user
– Complicated workbench interfaces
– Require SQL coding quickly
• Cannot Scale: deployed on desktops or monolithic servers
– Limited user scalability
– Poor performance
– Not built for embedding in other applications
• Performance Problems: designed for relational data only
– Loss of functionality
– Poor performance
– Limited data scalability
Modern Big and Fast Data Platform Requirements: 5 V’s
Requirements per data dimension:
Volume
1. Immediate visualization & interaction regardless of size of data
2. Don’t move or copy data
Variety
1. Support a broad range of modern sources without lock-in
2. Blend multi-source data on-the-fly
3. Extensible data connectors for different types of data
Velocity
1. Support fast data (streaming)
2. Integrate streaming & historical data in a single view
Veracity
1. Master Data Management
2. Definitions
Value
1. Business Insight, Monetization, Optimization, New Customers
Vision (Example):
“Business Insights at the Speed of Light”.
Strategy (Example):
• Speed is our main strategic asset,
• Spark is the engine that powers all our data initiatives,
• Set the context and get out of the way,
• Build Proof of Concepts ready for Production,
• Public Cloud only,
• Leverage Key Vendors as needed: Paxata, Cloudera, ZoomData, Google,
Amazon…
Roadmap (Example):
[Timeline figure, Q1 2017 to Q1 2018.
Tracks: Insights, Infrastructure, Ingestion, Big BI, Strategy, Procurement.
Items shown: Lambda Architecture; Deskside; People; WorkDay; Oracle Financial; ServiceNow; Human Resource; Telecom TEM; From BI To Big Data; IoT; Real Time; Data Science Training; EDL; Mobile BI; Data Science; Real Time; Self Healing; AI Aware; Transportation; Real Time ML; ZoomData; PrestoDB; Paxata; IBM DS Platform.]
Enterprise Data Lake – Ingestion (Example):
Timeline columns: Q1, Q2, Q3, Q4

Data Ingestion
• Snapchat
Other Source Systems
• Billz
• Workday
“Near Real Time” Update (Spark batch)
• Instagram
More than once per day update
• Pinterest

Data Ingestion
• Facebook ✅
• Twitter ✅
• Pinterest ✅
• Youtube ✅
• Instagram ✅
• DCM ✅
Other Source Systems
• Adobe Analytics
• Salesforce Marketing
“Near Real Time” Update (Spark batch)
• Facebook

Data Ingestion
• LinkedIn ✅
• Google Maps ✅
• Waze
Other Source Systems
• GSA
• Salesforce ✅
“Near Real Time” Update (Spark batch)
• Youtube ✅

Data Ingestion
• Wikipedia
• STAT
Real Time Update (Spark Streaming)
• Twitter
Enterprise Data Lake – Infrastructure (Example):
Timeline columns: Q1, Q2, Q3, Q4

Scalable Database for Data Marts
• RedShift vs. BigQuery
Security
• Kerberos authentication
• Configure External Authentication for Cloudera Manager using AD.
Cluster Scaling
DB migration for Hive Metastore.
Configure high availability for Hive.

Scalable Database for Big BI Data Marts
• RedShift vs. BigQuery
Configuration Database
Kafka Cluster

Cloudera Upgrade ✅
Disaster Recovery ✅
Configuration Database ✅
Kafka Cluster
• (Test Cluster complete Sprint 190 ✅)
Subnet Migration
Cluster resource upgrade – scaled out ✅

Security
• Configure Sentry in Production cluster
Configure external database for Cloudera Manager
Hue DB migration to External Database
Key Initiatives (Example):
Focus on high impact/high dollar,
Machine Learning/Deep Learning,
Big BI,
Big MDM,
High Level Streaming Architecture (Example):
[Architecture diagram. Labels shown: Device Events; Real Time Pipeline; Batch Pipeline; Big and Fast Data Stream and Data Store; Grid; Pivot; Data Visualization & Reporting.]
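The diagram pairs a real-time pipeline with a batch pipeline, and the roadmap names a Lambda Architecture. As a rough sketch of that pattern (an assumption about how the two pipelines relate, not the deck's actual implementation), the code below recomputes a batch view from stored events, keeps an incrementally updated real-time view, and merges the two at query time. All names and the sample device events are hypothetical, and plain Python stands in for Spark.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch pipeline: recompute per-device counts from all stored events."""
    return Counter(event["device"] for event in historical_events)


class RealTimeView:
    """Real-time pipeline: per-device counts updated as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["device"]] += 1


def merged_view(batch_counts, realtime_counts):
    """Serving layer: a single view that adds both sets of counts."""
    return batch_counts + realtime_counts


# Hypothetical device events: three already stored, two arriving live.
historical = [
    {"device": "sensor-a"},
    {"device": "sensor-a"},
    {"device": "sensor-b"},
]
realtime = RealTimeView()
for event in [{"device": "sensor-b"}, {"device": "sensor-c"}]:
    realtime.on_event(event)

view = merged_view(batch_view(historical), realtime.counts)
print(dict(view))
```

The sketch assumes the real-time view only holds events that the batch view has not yet absorbed; a production system would need to reset or window the real-time counts after each batch run to avoid double counting.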
High Level Data Flow (Example):
[Data flow diagram. Stage labels shown: Data Sources; Ingestion; Big Data Store; Big BI; Data Visualization and Exploration; Data Driven Decision.
Inputs: Relational Data (CSV); Schema Free Nested Data (JSON).
Access: SQL, Interactive Reporting; Tableau, PowerBI, Looker over ODBC/JDBC.]
The Enterprise Data Lake is the one source of truth for all reports
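As a small illustration of this flow, the sketch below loads relational CSV data and nested JSON data into one store and answers a reporting question with a single SQL query. It uses Python's built-in `sqlite3` as a stand-in for the big data store; the table names, fields, and sample records are hypothetical and not taken from the deck.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inputs, invented for illustration only.
CSV_TEXT = "customer_id,region\n1,EMEA\n2,APAC\n"
JSON_LINES = [
    '{"customer_id": 1, "order": {"total": 40.0}}',
    '{"customer_id": 1, "order": {"total": 10.0}}',
    '{"customer_id": 2, "order": {"total": 25.0}}',
]


def load(conn):
    """Ingest the CSV rows and the flattened JSON documents into two tables."""
    conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    rows = csv.DictReader(io.StringIO(CSV_TEXT))
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(int(row["customer_id"]), row["region"]) for row in rows],
    )
    docs = (json.loads(line) for line in JSON_LINES)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        # Flatten the nested "order" object into a plain column.
        [(doc["customer_id"], doc["order"]["total"]) for doc in docs],
    )


def revenue_by_region(conn):
    """Blend both sources with one SQL query, as a BI tool would."""
    query = """
        SELECT c.region, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o USING (customer_id)
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load(conn)
print(revenue_by_region(conn))
```

In the deck's architecture the same kind of query would reach the data lake through ODBC or JDBC from a tool such as Tableau, PowerBI, or Looker instead of a local SQLite connection.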
Big and Fast Data Validity: Vendor Comparison

Alteryx
Primary user: Technical data developer
Strengths:
• Data integration
• Data mapping
• Advanced analytics
Weaknesses:
• Data cleansing
• Data manipulation
• Ease of use
Analysis: Alteryx is a full stack BI tool, and it includes a layer of data integration capabilities. Introducing another BI tool (in addition to Tableau, Qlik, Excel) is not ideal, particularly since it would only be able to address data migration use cases. It overlaps with Snaplogic which Yahoo! already owns.

Paxata
Primary user: Non-technical business analyst
Strengths:
• Data integration and quality
• Comprehensive governance model
• Centralized collaboration workbench
• No coding, scripting required
Weaknesses:
• Limited enrichment today
Analysis: Paxata has the most robust capabilities to address the broadest set of data preparation use cases. Their model for data governance is far above anything else on the market. They appear to also ingest the widest range of data sources and have the ability to scale to a billion rows. True enterprise capabilities for security and scale.

Trifacta
Primary user: Technical data scientist
Strengths:
• Visualization
• Batch processing
Weaknesses:
• Only works with information loaded into Hadoop
• Only works with samples of data
• Feedback is not in real time
• Minimal data quality capabilities
Analysis: Trifacta is not a good fit for our users since they are all business analysts and it is very complex to make changes. Also, the information for these use cases is coming from multiple data sources, many of which are not Hadoop. Trifacta does not have the data quality capabilities needed for the broadest number of use cases.

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 

Big and fast data strategy 2017 jr

  • 1. Big and Fast Data Strategy 2017. Jonathan Raspaud, AVP - Big Data Architecture. February 2017
  • 2. Executive Summary: 2017 Data Landscape; Vision; Strategy; Roadmap; Key Initiatives; High Level Architecture; High Level Data Flow; Data Validity Vendor Comparison
  • 3. About Jonathan Raspaud (career timeline figure; years shown: 1998, 2000, 2006, 2011, 2012, 2017): AVP-Big Data Architecture; Senior Principal Data Architect; Mobility Practice Lead; Manager Business Intelligence; Datawarehouse Engineer; Software Engineer; Software Engineer; Teamlog (1999); IAE Grenoble, Master of Science in Management of Information Systems (1997)
  • 4. 2017 Data Landscape (1): The Four V’s
      • Data Volume: Billions of Rows
      • Data Validity: Format, Process
      • Data Velocity: Real time, Streaming
      • Data Variety: Structured, Semi-Structured, Unstructured
      • Example sources shown: Weblogs, Clickstreams, IoT, Text, Call Center, Chat, Social, Sensors, Markets, Networks, Transportation
  • 5. 2017 Data Landscape (2): Legacy RDBMS Databases are poor at: Scalability; Fast Streaming Data; Unstructured Data; Schema Flexibility; Search
  • 6. 2017 Data Landscape (3): MPP/Column-Store Databases
      • The Good: SQL based, wide compatibility with BI tools; Good performance; Full support for aggregation and ad hoc filtering
      • The Bad: Need to move the data from operational systems; Data loses freshness; Ultimate scale limitations; Hard to adapt schema; Can be expensive
  • 7. 2017 Data Landscape (4): Hadoop
      • The Good: Distributed storage and processing of massive data sets; Low-cost clusters built from commodity hardware
      • The Bad: SQL interfaces are improving but still not speed-of-thought
  • 8. 2017 Data Landscape (5): NoSQL Databases
      • The Good: Storage and retrieval of data modeled by means other than the tabular relations used in RDBMS; More and more application developers choose NoSQL databases as operational databases; Scalability, schema-less flexibility, and fast response time for short-request queries
      • The Bad: Traditional BI tools lack native compatibility; Not optimized for analytic queries; Some don’t support aggregation or ad hoc filtering on arbitrary fields
  • 9. 2017 Data Landscape (6): Search Databases
      • The Good: Search index technology is a great way to enable access to big data in the enterprise; Delivers fast access to unstructured or semi-structured information (blog posts and comments, customer product reviews, machine logs, JSON scripts…); Very effective with structured data too
      • The Bad: Lacks a SQL interface, so traditional BI tools are incompatible; Native APIs required to access data
  • 10. 2017 Data Landscape (7): Cloud Big Data Stores
      • The Good: Storing massive amounts of data in the cloud; Low cost; Easy to manage; Range of storage options: file system, SQL database, Hadoop, Spark…
      • The Bad: Traditional BI tools lack performance-optimized native integration
  • 11. 2017 Data Landscape (8): Fast Data
      • The Good: Fast inserts/updates; Fast analytics
      • The Bad: Traditional BI tools lack integration; Traditional BI tools are not architected for streaming data; Limited or no SQL interface
  • 12. 2017 Data Landscape (9): Conclusion
      • Legacy BI not designed for Modern Data:
      • Hard to use: designed in an age of specialized skills
          – Focus on the power user
          – Complicated workbench interfaces
          – Require SQL coding quickly
      • Cannot Scale: deployed on desktops or monolithic servers
          – Limited user scalability
          – Poor performance
          – Not built for embedding in other applications
      • Performance Problems: designed for relational data only
          – Loss of functionality
          – Poor performance
          – Limited data scalability
  • 13. Modern Big and Fast Data Platform Requirements: 5 V’s (Data / Requirement)
      • Volume: 1. Immediate visualization & interaction regardless of size of data; 2. Don’t move or copy data
      • Variety: 1. Support a broad range of modern sources without lock-in; 2. Blend multi-source data on-the-fly; 3. Extensible data connectors for different types of data
      • Velocity: 1. Support fast data (streaming); 2. Integrate streaming & historical data in a single view
      • Veracity: 1. Master Data Management; 2. Definitions
      • Value: 1. Business Insight, Monetization, Optimization, New Customers
  • 14. Vision (Example): “Business Insights at the Speed of Light”.
  • 15. Strategy (Example):
      • Speed is our main strategic asset
      • Spark is the engine that powers all our data initiatives
      • Set the context and get out of the way
      • Build Proof of Concepts ready for Production
      • Public Cloud only
      • Leverage Key Vendors as needed: Paxata, Cloudera, ZoomData, Google, Amazon…
  • 16. Roadmap (Example): Insights Infrastructure Ingestion Big BI Strategy Procurement Q2 Q3 2017 Q1 Lambda Architecture Deskside People WorkDay Oracle Financial ServiceNow Human Resource Q4 2018 Telecom TEM From BI To Big Data IOT Real Time Data Science Training EDL Mobile BI Q1 Data Science Real Time Self Healing AI Aware Transportation Real Time ML ZoomData PrestoDB Paxata IBM DS Platform
  • 17. Enterprise Data Lake – Ingestion (Example). Timeline columns: Q1, Q2, Q3, Q4. Groups in transcript order:
      • Data Ingestion: Snapchat. Other Source Systems: Billz, Workday. “Near Real Time” Update (Spark batch): Instagram. More than once per day update: Pinterest
      • Data Ingestion: Facebook ✅, Twitter ✅, Pinterest ✅, Youtube ✅, Instagram ✅, DCM ✅. Other Source Systems: Adobe Analytics, Salesforce Marketing. Near Real Time Update (Spark Batch): Facebook
      • Data Ingestion: LinkedIn ✅, Google Maps ✅, Waze. Other Source Systems: GSA, Salesforce ✅. “Near Real Time” Update (Spark batch): Youtube ✅
      • Data Ingestion: Wikipedia, STAT. Real Time Update (Spark Streaming): Twitter
  • 18. Enterprise Data Lake – Infrastructure (Example). Timeline columns: Q1, Q2, Q3, Q4. Items in transcript order: Scalable Database for Data Marts (RedShift vs. BigQuery); Security (Kerberos authentication; Configure External Authentication for Cloudera Manager using AD); Cluster Scaling; DB migration for Hive Metastore; Configure high availability for Hive; Scalable Database for Big BI Data Marts (RedShift vs. BigQuery); Configuration Database; Kafka Cluster; Cloudera Upgrade ✅; Disaster Recovery ✅; Configuration Database ✅; Kafka Cluster (Test Cluster complete Sprint 190 ✅); Subnet Migration; Cluster resource upgrade (scaled out) ✅; Security (Configure Sentry in Production cluster); Configure external database for Cloudera Manager; Hue DB migration to External Database
  • 19. Key Initiatives (Example): Focus on high impact/high dollar; Machine Learning/Deep Learning; Big BI; Big MDM
  • 20. High Level Streaming Architecture (Example). Diagram labels: Device Events; Real Time Pipeline; Batch Pipeline; Big and Fast Data Stream and Data Store; Data Visualization & Reporting; Grid; Pivot
  • 21. High Level Data Flow (Example). Diagram labels: Data Sources; Relational Data (CSV); Schema Free Nested Data (JSON); Ingestion; Big Data Store; Big BI; SQL Interactive Reporting; ODBC; JDBC; Tableau, PowerBI, Looker; Data Visualization and Exploration; Data Driven Decision. Caption: The Enterprise Data Lake is the one source of truth for all reports
  • 22. Big and Fast Data Validity: Vendor Comparison
      • Alteryx
          – Primary user: Technical data developer
          – Strengths: Data integration; Data mapping; Advanced analytics
          – Weaknesses: Data cleansing; Data manipulation; Ease of use
          – Analysis: Alteryx is a full stack BI tool, and it includes a layer of data integration capabilities. Introducing another BI tool (in addition to Tableau, Qlik, Excel) is not ideal, particularly since it would only be able to address data migration use cases. It overlaps with Snaplogic, which Yahoo! already owns.
      • Paxata
          – Primary user: Non-technical business analyst
          – Strengths: Data integration and quality; Comprehensive governance model; Centralized collaboration workbench; No coding or scripting required
          – Weaknesses: Limited enrichment today
          – Analysis: Paxata has the most robust capabilities to address the broadest set of data preparation use cases. Their model for data governance is far above anything else on the market. They also appear to ingest the widest range of data sources and can scale to a billion rows. True enterprise capabilities for security and scale.
      • Trifacta
          – Primary user: Technical data scientist
          – Strengths: Visualization; Batch processing
          – Weaknesses: Only works with information loaded into Hadoop; Only works with samples of data; Feedback is not in real time; Minimal data quality capabilities
          – Analysis: Trifacta is not a good fit for our users since they are all business analysts and it is very complex to make changes. Also, the information for these use cases comes from multiple data sources, many of which are not Hadoop. Trifacta does not have the data quality capabilities needed for the broadest number of use cases.
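
The streaming architecture on slide 20 pairs a real-time pipeline with a batch pipeline, and slide 13 asks for streaming and historical data in a single view. The following is a minimal sketch of that idea in plain Python. It is not Spark (which the deck names as its engine) and is not taken from the deck; the event shape, the device names, and the function names `batch_view`, `realtime_view`, and `serve` are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def batch_view(historical_events):
    """Recompute per-device event counts from the full historical store."""
    return Counter(event["device"] for event in historical_events)


def realtime_view(recent_events):
    """Incrementally count events that arrived after the last batch run."""
    view = Counter()
    for event in recent_events:
        view[event["device"]] += 1
    return view


def serve(batch, realtime):
    """Merge both views so one query sees historical and streaming data."""
    return batch + realtime


# Hypothetical device events: three already in the data store, two still in flight.
historical = [
    {"device": "sensor-a"},
    {"device": "sensor-a"},
    {"device": "sensor-b"},
]
recent = [
    {"device": "sensor-a"},
    {"device": "sensor-c"},
]

merged = serve(batch_view(historical), realtime_view(recent))
print(dict(merged))
```

In this sketch the batch view can be rebuilt from scratch at any time, while the real-time view only covers events since the last rebuild; the merge at query time is what gives the single view. A production version would presumably replace the two functions with a Spark batch job and a Spark Streaming job, as the ingestion slides suggest.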