Big and fast data strategy 2017 jr
Jonathan Raspaud
Big and fast data strategy 2017
Data & Analytics
1.
Big and Fast Data Strategy 2017
Jonathan Raspaud, AVP - Big Data Architecture
February 2017
2.
© Antuit 2016
Proprietary & Confidential; Not for circulation
Executive Summary
• 2017 Data Landscape
• Vision
• Strategy
• Roadmap
• Key Initiatives
• High Level Architecture
• High Level Data Flow
• Data Validity: Vendor Comparison
3.
About Jonathan Raspaud:
Timeline markers: 1998, 2000, 2006, 2011, 2012, 2017
Roles (in slide order):
• AVP - Big Data Architecture
• Senior Principal Data Architect
• Mobility Practice Lead
• Manager, Business Intelligence
• Data Warehouse Engineer
• Software Engineer
• Software Engineer, Teamlog, 1999
Education: IAE Grenoble, Master of Science in Management of Information Systems, 1997
4.
2017 Data Landscape (1): The Four V’s
• Data Volume: Billions of Rows
• Data Validity: Format, Process
• Data Velocity: Real time, Streaming
• Data Variety: Structured, Semi-Structured, Unstructured
Sources pictured: Weblogs, Clickstreams, IoT, Text, Call Center, Chat, Social, Sensors, Markets, Networks, Transportation
5.
2017 Data Landscape (2): Legacy RDBMS
Databases are poor at:
• Scalability
• Fast Streaming Data
• Unstructured Data
• Schema Flexibility
• Search
6.
2017 Data Landscape (3): MPP/Column-Store Databases:
The Good:
• SQL based, wide capability with BI tools
• Good Performance
• Full support for aggregation and ad hoc filtering
The Bad:
• Need to move the data from operational systems
• Data loses Freshness
• Ultimate scale limitations
• Hard to adapt schema
• Can be expensive
7.
2017 Data Landscape (4): Hadoop:
The Good:
• Distributed storage and processing of massive data sets
• Low-cost clusters built from commodity hardware
The Bad:
• SQL interfaces are improving but still not speed-of-thought
8.
2017 Data Landscape (5): NoSQL Databases:
The Good:
• Storage and retrieval of data which is modeled in means other than the tabular relations used in RDBMS
• More and more application developers choose NoSQL Databases as operational databases
• Scalability, schema-less flexibility, and fast response time for short-request queries
The Bad:
• Traditional BI tools lack native compatibility
• Not optimized for analytic queries
• Some don’t support aggregation or ad hoc filtering on arbitrary fields
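To make the last point concrete, here is a minimal sketch of what "aggregation or ad hoc filtering on arbitrary fields" means for schema-less records. It uses plain Python dictionaries rather than any particular NoSQL product; the documents, field names, and helper functions are all hypothetical and only illustrate the kind of query a BI tool would expect the store to answer.

```python
from collections import defaultdict

# Hypothetical schema-less documents: fields vary from record to record.
documents = [
    {"id": 1, "channel": "chat", "region": "EU", "duration_s": 120},
    {"id": 2, "channel": "call", "region": "US", "duration_s": 300},
    {"id": 3, "channel": "chat", "region": "US"},  # no duration recorded
    {"id": 4, "channel": "chat", "region": "EU", "duration_s": 60},
]


def ad_hoc_filter(docs, field, value):
    """Keep documents that have `field` set to `value`; a missing field never matches."""
    return [d for d in docs if field in d and d[field] == value]


def aggregate_sum(docs, group_field, value_field):
    """Sum `value_field` per `group_field`, skipping documents that lack either field."""
    totals = defaultdict(int)
    for d in docs:
        if group_field in d and value_field in d:
            totals[d[group_field]] += d[value_field]
    return dict(totals)


chat_docs = ad_hoc_filter(documents, "channel", "chat")
duration_by_region = aggregate_sum(documents, "region", "duration_s")
```

Because no schema is enforced, every helper has to decide what a missing field means. A store that offers no server-side equivalent of these two operations pushes that work, and a full scan, onto the client.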
9.
2017 Data Landscape (6): Search Databases:
The Good:
• Using a search index technology is a great way to enable access to big data in the enterprise
• Deliver fast access to unstructured or semi-structured information: blog posts and comments, customer product reviews, machine logs, JSON scripts…
• Very effective with structured data too
The Bad:
• Lacks SQL interface – traditional BI tools incompatibility
• Native APIs required to access data
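As an illustration of why a search index gives fast access to unstructured text, the sketch below builds a toy inverted index in plain Python. It is not the API of any search product; the sample documents and the function names `tokenize`, `build_index`, and `search` are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mix of reviews, logs, and posts.
docs = {
    "review-1": "Battery life is great, screen is dim",
    "log-7": "ERROR disk full on node 12",
    "post-3": "Great screen and great battery",
}
index = build_index(docs)
hits = search(index, "great battery")
```

A lookup touches only the posting sets for the query terms instead of scanning every document. The query is a function call rather than SQL, which is the "native APIs required" drawback listed on the slide.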
10.
2017 Data Landscape (7): Cloud Big Data Stores:
The Good:
• Storing massive amounts of data in the cloud
• Low cost
• Easy to manage
• Range of storage options: file system, SQL database, Hadoop, Spark…
The Bad:
• Traditional BI tools lack performance optimized native integration
11.
2017 Data Landscape (8): Fast Data:
The Good:
• Fast inserts/updates
• Fast analytics
The Bad:
• Traditional BI tools lack integration
• Traditional BI tools are not architected for streaming data
• Limited or no SQL interface
12.
2017 Data Landscape (9): Conclusion
Legacy BI is not designed for modern data:
• Hard to use: designed in an age of specialized skills
– Focus on the power user
– Complicated workbench interfaces
– Require SQL coding quickly
• Cannot Scale: deployed on desktops or monolithic servers
– Limited user scalability
– Poor performance
– Not built for embedding in other applications
• Performance Problems: designed for relational data only
– Loss of functionality
– Poor performance
– Limited data scalability
13.
Modern Big and Fast Data Platform Requirements: 5 V’s
Volume:
  1. Immediate visualization & interaction regardless of size of data
  2. Don’t move or copy data
Variety:
  1. Support a broad range of modern sources without lock-in
  2. Blend multi-source data on-the-fly
  3. Extensible data connectors for different types of data
Velocity:
  1. Support fast data (streaming)
  2. Integrate streaming & historical data in a single view
Veracity:
  1. Master Data Management
  2. Definitions
Value:
  1. Business Insight, Monetization, Optimization, New Customers
14.
Vision (Example): “Business Insights at the Speed of Light”.
15.
Strategy (Example):
• Speed is our main strategic asset
• Spark is the engine that powers all our data initiatives
• Set the context and get out of the way
• Build Proof of Concepts ready for Production
• Public Cloud only
• Leverage Key Vendors as needed: Paxata, Cloudera, ZoomData, Google, Amazon…
16.
Roadmap (Example):
[Roadmap chart. Timeline: 2017 Q1, Q2, Q3, Q4; 2018 Q1.]
Labels: Insights, Infrastructure, Ingestion, Big BI, Strategy, Procurement
Items shown: Lambda Architecture, Deskside, People, WorkDay, Oracle Financial, ServiceNow, Human Resource, Telecom TEM, From BI To Big Data, IoT, Real Time, Data Science Training, EDL, Mobile BI, Data Science, Real Time, Self Healing, AI Aware, Transportation, Real Time ML, ZoomData, PrestoDB, Paxata, IBM DS Platform
17.
Enterprise Data Lake – Ingestion (Example):
Columns: Q1, Q2, Q3, Q4

Data Ingestion
• Snapchat
Other Source Systems
• Billz
• Workday
“Near Real Time” Update (Spark batch)
• Instagram
More than once per day update
• Pinterest

Data Ingestion
• Facebook ✅
• Twitter ✅
• Pinterest ✅
• Youtube ✅
• Instagram ✅
• DCM ✅
Other Source Systems
• Adobe Analytics
• Salesforce Marketing
“Near Real Time” Update (Spark batch)
• Facebook

Data Ingestion
• LinkedIn ✅
• Google Maps ✅
• Waze
Other Source Systems
• GSA
• Salesforce ✅
“Near Real Time” Update (Spark batch)
• Youtube ✅

Data Ingestion
• Wikipedia
• STAT
Real Time Update (Spark Streaming)
• Twitter
18.
Enterprise Data Lake – Infrastructure (Example):
Columns: Q1, Q2, Q3, Q4

Scalable Database for Data Marts
• RedShift vs. BigQuery
Security
• Kerberos authentication
• Configure External Authentication for Cloudera Manager using AD
Cluster Scaling
DB migration for Hive Metastore
Configure high availability for Hive

Scalable Database for Big BI Data Marts
• RedShift vs. BigQuery
Configuration Database
Kafka Cluster

Cloudera Upgrade ✅
Disaster Recovery ✅
Configuration Database ✅
Kafka Cluster
• (Test Cluster complete Sprint 190 ✅)
Subnet Migration
Cluster resource upgrade – scaled out ✅

Security
• Configure Sentry in Production cluster
Configure external database for Cloudera Manager
Hue DB migration to External Database
19.
Key Initiatives (Example):
• Focus on high impact/high dollar
• Machine Learning/Deep Learning
• Big BI
• Big MDM
20.
High Level Streaming Architecture (Example):
[Architecture diagram. Labels: Device Events; Real Time Pipeline; Batch Pipeline; Big and Fast Data Stream and Data Store; Data Visualization & Reporting; Grid; Pivot]
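The slide shows device events feeding both a real-time pipeline and a batch pipeline. One common way to serve both to a report is a Lambda-style merge (the roadmap names "Lambda Architecture"): a batch view recomputed from stored history plus an incremental view of events since the last batch run, combined at query time. The sketch below is a plain-Python illustration under that assumption, not the deck's implementation; `build_batch_view`, `RealTimeView`, and `query` are hypothetical names.

```python
from collections import Counter


def build_batch_view(historical_events):
    """Batch pipeline: recompute per-device event counts from all stored events."""
    return Counter(e["device"] for e in historical_events)


class RealTimeView:
    """Real-time pipeline: count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["device"]] += 1


def query(batch_view, realtime_view, device):
    """Serving step: one answer that combines historical and streaming counts."""
    return batch_view[device] + realtime_view.counts[device]


# Hypothetical device events.
historical = [{"device": "a"}, {"device": "a"}, {"device": "b"}]
batch_view = build_batch_view(historical)

realtime_view = RealTimeView()
for event in [{"device": "a"}, {"device": "c"}]:
    realtime_view.on_event(event)

total_a = query(batch_view, realtime_view, "a")
```

When the next batch run absorbs the recent events, the real-time view is reset so that nothing is counted twice. That bookkeeping is omitted here.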
21.
High Level Data Flow (Example):
[Data flow diagram. Labels: Data Sources; Data Driven Decision; Data Visualization and Exploration; Ingestion; Big Data Store; Big BI]
• The Enterprise Data Lake is the one source of truth for all reports
• SQL Interactive Reporting
• Input formats: Relational Data (CSV), Schema Free Nested Data (JSON)
• BI tools: Tableau, PowerBI, Looker, connected over ODBC/JDBC
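As a small illustration of the step from "Schema Free Nested Data (JSON)" to "SQL Interactive Reporting", the sketch below flattens nested JSON records into a relational table and runs an aggregate query over it. It uses Python's built-in `json` and `sqlite3` modules as stand-ins for the data lake and its SQL engine; the sample records, the `orders` table, and the `flatten` helper are assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical nested JSON records, one per line, as an ingestion job might receive them.
raw_events = [
    '{"order_id": 1, "customer": {"name": "Ada", "country": "FR"}, "total": 40.0}',
    '{"order_id": 2, "customer": {"name": "Bo"}, "total": 15.5}',
    '{"order_id": 3, "customer": {"name": "Cy", "country": "FR"}, "total": 10.0}',
]


def flatten(record, prefix=""):
    """Turn nested objects into flat column names such as customer_country."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


columns = ["order_id", "customer_name", "customer_country", "total"]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, customer_name TEXT, customer_country TEXT, total REAL)"
)
for line in raw_events:
    flat = flatten(json.loads(line))
    # Fields absent from a record become NULL in the table.
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [flat.get(c) for c in columns],
    )

# The kind of query a BI tool would issue over ODBC/JDBC.
rows = conn.execute(
    "SELECT customer_country, SUM(total) FROM orders "
    "GROUP BY customer_country ORDER BY customer_country"
).fetchall()
```

Fixing the column list up front is the design choice that makes the data queryable with plain SQL. Fields outside that list are dropped in this sketch, so a real pipeline would need a policy for schema drift.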
22.
Big and Fast Data Validity: Vendor Comparison

Vendor: Alteryx
Primary user: Technical data developer
Strengths:
• Data integration
• Data mapping
• Advanced analytics
Weaknesses:
• Data cleansing
• Data manipulation
• Ease of use
Analysis: Alteryx is a full stack BI tool, and it includes a layer of data integration capabilities. Introducing another BI tool (in addition to Tableau, Qlik, Excel) is not ideal, particularly since it would only be able to address data migration use cases. It overlaps with SnapLogic, which Yahoo! already owns.

Vendor: Paxata
Primary user: Non-technical business analyst
Strengths:
• Data integration and quality
• Comprehensive governance model
• Centralized collaboration workbench
• No coding, scripting required
Weaknesses:
• Limited enrichment today
Analysis: Paxata has the most robust capabilities to address the broadest set of data preparation use cases. Their model for data governance is far above anything else on the market. They also appear to ingest the widest range of data sources and have the ability to scale to a billion rows. True enterprise capabilities for security and scale.

Vendor: Trifacta
Primary user: Technical data scientist
Strengths:
• Visualization
• Batch processing
Weaknesses:
• Only works with information loaded into Hadoop
• Only works with samples of data
• Feedback is not in real time
• Minimal data quality capabilities
Analysis: Trifacta is not a good fit for our users since they are all business analysts and it is very complex to make changes. Also, the information for these use cases comes from multiple data sources, many of which are not Hadoop. Trifacta does not have the data quality capabilities needed for the broadest number of use cases.