Big Data Analytics:
      Beyond Beer and Diapers
      2012/2/22
      Kai Zhao @Teradata
      kingaim@gmail.com



  by Kai Zhao 2011.12
Disclaimer:
Any views or opinions presented in this article are solely those of the author and do NOT necessarily represent those of Teradata or other companies.
Content

Background:
    Traditional Business Intelligence (BI)
    What is Big Data
    What is Big Data Analytics
    Big Data Analytics: State of the Art
Big Data Analytics Technology Stack
    ETL/ELT/ETLT(Demo)
    MPP Data Warehouse
    Map Reduce
    NoSQL
    Web Service
    Data Analytics
    Data Visualization
    BI Tools(Demo)
Big Data Analytics Platform Architecture
Cloud computing is surging, business intelligence is on the rise, and Big Data analytics is imperative.

Shared-nothing Massively Parallel Processing(MPP)
Petabyte Scaling
In-database Analytics
Traditional Business Intelligence (BI)
What is Big Data
Volume: The increase in data volumes within
enterprise systems is caused by transaction volumes
and other traditional data types, as well as by new
types of data. Too much volume is a storage issue,
but too much data is also a massive analysis issue.

Variety: IT leaders have always had an issue
translating large volumes of transactional
information into decisions — now there are more
types of information to analyze — mainly coming
from social media and mobile (context-aware).
Variety includes tabular data (databases),
hierarchical data, documents, e-mail, metering data,
video, still images, audio, stock ticker data, financial
transactions and more.

Velocity: This involves streams of data, structured
record creation, and availability for access and
delivery. Velocity means both how fast data is being
produced and how fast the data must be processed
to meet demand.
What is Big Data (cont.)
Broadly speaking, Big Data is generated by a number of sources, including:
Social Networking and Media: There are currently over 700 million Facebook users, 250 million Twitter users and 156
million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points, whether
structured, semi-structured or unstructured, sometimes called Data Exhaust.
Mobile Devices: There are over 5 billion mobile phones in use worldwide. Each call, text and instant message is
logged as data. Mobile devices, particularly smart phones and tablets, also make it easier to use social media and use
other data-generating applications. Mobile devices also collect and transmit location data.
Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including
countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards,
credit agencies and others.
Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart
energy meters and temperature sensors, all create semi-structured log data that record every action.
What is Big Data Analytics

See Video
    Big Data
    Visualization
Big Data Analytics: State of the Art

Acquisitions and Investments
Big Data Vendors and Their Products
Forrester Report
Gartner Report
Acquisitions and Investments

 Acquirer   Acquiree (est. date)   Date of acq.   Deal
 Teradata   Aster Data (2005)      2011.3.3       $0.263 billion
 HP         Vertica (2005)         2011.2.14      $1.2 billion
 IBM        Netezza (2000)         2010.11.11     $1.7 billion
 EMC        Greenplum (2003)       2010.7.6       $0.1~0.15 billion
 SAP        Sybase                 2010.5.12      $5.8 billion
 Summary: Traditional data warehouse vendors need Big Data analytics technology.


 Investee      Investment
 Cloudera      $76 million
 MapR          $29 million
 Hortonworks   $50 million
 Datameer      $10 million
 Summary: New Big Data analytics startups.

                                               Source: http://www.leiphone.com/why-2012-the-year-of-hadoop.html
Big Data Vendors and Their Products




                        Source: http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond
Forrester Report
Hype Cycle




             Source: Gartner
Gartner Report: Hype Cycle 2011




                                  Source: Gartner
Big Data Analytics Technology Stack




Data Import

 Data Storage

   Data Computing

     Data Analytics

       XXX as a Service
ETL/ELT/ETLT


Extract – The process by which data is extracted from the data source
Transform – The transformation of the source data into a format relevant to the solution
Load – The loading of data into the warehouse
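The three stages above can be sketched as follows. This is a minimal illustration, not a production pipeline: the CSV content, the `store`, `item` and `amount` columns, the `sales` table and every function name are assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or an operational database.
SOURCE_CSV = """store,item,amount
north,beer,12.50
north,diapers,20.00
south,beer,7.25
south,unknown,
"""


def extract(text):
    """Extract: read raw rows from the data source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: keep only rows with an amount and convert types for the target schema."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # drop rows that are not relevant to the solution
        cleaned.append((row["store"].upper(), row["item"], float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the stages in ETL order: transformation happens before the load."""
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_etl(warehouse)
    query = "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    print(warehouse.execute(query).fetchall())
```

Note that the incomplete row never reaches the warehouse: in ETL, only data shaped for the target design is loaded.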



This approach to data warehouse development is the traditional and widely accepted approach.
The following diagram illustrates each of the individual stages in the process.
ETL






                                  Source: Robert J Davenport ETL vs ELT A Subjective View
ETL
Strengths
      Development Time
      Designing from the output backwards ensures that only data relevant to the solution is extracted and processed,
      potentially reducing development, extract, and processing overhead; and therefore time.
      Targeted data
      Due to the targeted nature of the load process, the warehouse contains only data relevant to the presentation.
      Administration Overhead
      Reduced warehouse content simplifies the security regime implemented and hence the administration overhead.
      Tools Availability
      The large number of tools that implement ETL provides flexibility of approach and the opportunity to
      identify the most appropriate tool. The proliferation of tools has led to a competitive functionality war, which
      often results in loss of maintainability.
Weaknesses
      Flexibility
      Targeting only relevant data for output means that any future requirement that needs data not
      included in the original design must be added to the ETL routines. Because the routines are tightly
      dependent on one another, this often leads to fundamental redesign and redevelopment, which
      increases the time and costs involved.
      Hardware
      Most third party tools utilize their own engine to implement the ETL process. Regardless of the size of the
      solution this can necessitate the investment in additional hardware to implement the tool’s ETL engine.
      Skills Investment
      The use of third party tools to implement ETL processes compels the learning of new scripting languages.
      Learning Curve
      Implementing a third-party tool that uses unfamiliar processes and languages brings the learning curve that is
      implicit in any technology new to an organization, and lack of experience can often lead teams down blind
      alleys in its use.
ELT


Whilst this approach to the implementation of a warehouse appears on the surface to be
similar to ETL, it differs in a number of significant ways.
The following diagram illustrates the process.
ELT
Strengths
Project Management
Being able to split the warehouse process into specific, isolated tasks enables a project to be designed on a smaller
task basis, so the project can be broken down into manageable chunks.
Flexible & Future Proof
In general, in an ELT implementation all data from the sources are loaded into the warehouse as part of the extract and
load process. This, combined with the isolation of the transformation process, means that future requirements can easily
be incorporated into the warehouse structure.
Risk minimization
Removing the close interdependencies between each stage of the warehouse build process enables the development
process to be isolated, and the individual process design can thus also be isolated. This provides an excellent platform for
change, maintenance and management.
Utilize Existing Hardware
In implementing ELT as a warehouse build process, the inherent tools provided with the database engine can be used.
Alternatively, the vast majority of the third party ELT tools available employ the use of the database engine’s capability
and hence the ELT process is run on the same hardware as the database engine underpinning the data warehouse, using
the existing hardware deployed.
Utilize Existing Skill sets
By using the functionality provided by the database engine, the existing investments in database skills are re-used to
develop the warehouse.
Weaknesses
Against the Norm
ELT is an emergent approach to data warehouse design and development. Whilst it has proven itself many times over
through its abundant use in implementations throughout the world, it does require a change in mentality and design
approach against traditional methods. To get the best from an ELT approach requires an open mind.
Tools Availability
Being an emergent technology approach, ELT suffers from a limited availability of tools.
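The ELT flow described above, in which all source data is loaded first and the database engine performs the transformation, can be sketched as follows. This is only an illustration under stated assumptions: the raw rows, the `staging_sales` and `sales_by_store` tables and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse engine.

```python
import sqlite3

# Hypothetical raw records, loaded without any cleaning.
RAW_ROWS = [
    ("north", "beer", "12.50"),
    ("north", "diapers", "20.00"),
    ("south", "beer", "7.25"),
    ("south", "unknown", ""),
]


def extract_and_load(conn, rows):
    """Extract and load: copy all source data into a staging table as-is."""
    conn.execute("CREATE TABLE staging_sales (store TEXT, item TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", rows)


def transform_in_database(conn):
    """Transform: the database engine builds the presentation table with SQL."""
    conn.execute(
        """
        CREATE TABLE sales_by_store AS
        SELECT UPPER(store) AS store, SUM(CAST(amount AS REAL)) AS total
        FROM staging_sales
        WHERE amount <> ''
        GROUP BY UPPER(store)
        """
    )


def run_elt(conn, rows=RAW_ROWS):
    """Run the stages in ELT order: the load happens before the transformation."""
    extract_and_load(conn, rows)
    transform_in_database(conn)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_elt(warehouse)
    print(warehouse.execute("SELECT store, total FROM sales_by_store ORDER BY store").fetchall())
```

Because the staging table keeps every raw row, including the incomplete one, a later requirement can be met by adding another SQL transformation rather than changing the extract step.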
ETL Demo - Kettle


  Demo of Pentaho Kettle.
Map Reduce: Hadoop




                                                Comparing with MPP Data Warehouse.


            Source: http://www.capgemini.com/technology-blog/2012/01/what-is-hadoop/
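For readers unfamiliar with the Map Reduce programming model that Hadoop implements, the following is a minimal in-process sketch using word count as the example. It does not use Hadoop's API and does not distribute any work; the function names and the explicit `shuffle` step are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["beer and diapers", "Beer and data"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes; here they run in one process purely to show the data flow.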
Map Reduce: Hadoop


    [Figure: Hadoop and the categories around it. Labels: Professional Service; Enterprise-grade Distribution;
    Database OLTP; Subscription Service; Cluster Management; Data Integration with Hadoop; EDW; BI;
    Hadoop replacements: Teradata Aster/MongoDB.]
MPP Data Warehouse



    Comparing MPP Data Warehouse with Hadoop stack.
    Draw a picture.
NoSQL
NoSQL/SQL/NewSQL
    Relational
        Analytics (OLAP), SQL MPP: Teradata, IBM Netezza, EMC Greenplum, HP Vertica, Teradata Aster, VectorWise
        Operational (OLTP): Oracle, IBM DB2, SQL Server, MySQL, PostgreSQL, Ingres, Sybase, EnterpriseDB
        Cloud Service: Amazon RDS, SQL Azure
    Non-Relational
        Analytics (OLAP): Hadoop
        NoSQL
            KeyValue: BDB, Voldemort, Tokyo Cabinet, Redis
            Graph: Neo4j
            Document: CouchDB, MongoDB
            Columnar: HBase, Cassandra
            Cloud Service: Amazon SimpleDB
    Data Grid/Cache: Memcached
Web Service



   There are a lot of Web Services.
Data Analytics



    A lot of…..
Data Visualization: It is VERY IMPORTANT for Attracting Users




                             Source: 打破陈规-数据及信息的可视化 (Breaking Convention: Visualizing Data and Information), 向怡宁
Data Visualization: It is VERY IMPORTANT for Competing




                            Source: 打破陈规-数据及信息的可视化 (Breaking Convention: Visualizing Data and Information), 向怡宁
Data Visualization: It is VERY IMPORTANT for User Experience




                            Source: 打破陈规-数据及信息的可视化 (Breaking Convention: Visualizing Data and Information), 向怡宁
BI Tools

BI Tools fall into three categories:
Query Tools
     A query tool is software set up for users to ask questions about the data. The user can
     search for patterns or details.
Multidimensional Analysis Tools
     A multidimensional analysis tool, also called Online Analytical Processing (OLAP),
      is software that allows the user to view the same data from different aspects.
     E.g.: Business Objects, Hyperion, Cognos, MicroStrategy, Pentaho, Microsoft Analysis Services
     and Palo OLAP Server.
Data Mining Tools
     A data mining tool is software that is automated to search data, seeking out ways that
     the data correlates to other data.
     E.g.: SPSS Clementine, Weka3, R and Apache Mahout.
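To make the data mining category concrete, here is a minimal sketch of the kind of correlation search such a tool automates, using the beer-and-diapers example from the title. The baskets are invented sample data, and `pair_stats` is a hypothetical helper, not part of any of the tools listed above. Support is the share of baskets containing both items; confidence is the share of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Invented sample transactions, for illustration only.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
]


def pair_stats(baskets):
    """Return support and confidence for every ordered item pair (a -> b)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Sort each basket so that a pair is always counted under the same key.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        stats[(a, b)] = {"support": support, "confidence": count / item_counts[a]}
        stats[(b, a)] = {"support": support, "confidence": count / item_counts[b]}
    return stats


if __name__ == "__main__":
    for pair, values in sorted(pair_stats(BASKETS).items()):
        print(pair, values)
```

Real data mining tools add pruning, larger item sets and statistical tests; this sketch only counts pairs exhaustively.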
BI Tools List




                Source: BI Tool Survey 2012 http://www.businessintelligencetoolbox.com/list-of-business-intelligence-bi-tools/
BI Tools: Gartner Evaluation




                               Business intelligence (BI) platforms
                               enable all types of users – from IT staff to
                               consultants to business users – to build
                               applications that help organizations learn
                               about and understand their business
BI Demo – JasperSoft iReport



   Demo Session: JasperSoft iReport
Big Data Analytics Platform Architecture
Any Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewNagaraj Yerram
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lakeCapgemini
 
Teradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTeradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTIBCO Spotfire
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overviewjdijcks
 
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...DataWorks Summit
 
2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data ServicesJeffrey T. Pollock
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsSeeling Cheung
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Hadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryHadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryDataWorks Summit
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityCloudera, Inc.
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupScott Mitchell
 
Third Nature - Open Source Data Warehousing
Third Nature - Open Source Data WarehousingThird Nature - Open Source Data Warehousing
Third Nature - Open Source Data Warehousingmark madsen
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHitachi Vantara
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
 
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems
 

Was ist angesagt? (20)

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
 
Teradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTeradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made Easy
 
Etl elt simplified
Etl elt simplifiedEtl elt simplified
Etl elt simplified
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...
 
2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data Services
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
VenkatSubbaReddy_Resume
VenkatSubbaReddy_ResumeVenkatSubbaReddy_Resume
VenkatSubbaReddy_Resume
 
Hadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryHadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom Industry
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricity
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
Hadoop dev 01
Hadoop dev 01Hadoop dev 01
Hadoop dev 01
 
Third Nature - Open Source Data Warehousing
Third Nature - Open Source Data WarehousingThird Nature - Open Source Data Warehousing
Third Nature - Open Source Data Warehousing
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & Analytics
 

Andere mochten auch

MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FSMongoDB
 
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMorning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMongoDB
 
BigFoot: Big Data For Every Organization
BigFoot: Big Data For Every OrganizationBigFoot: Big Data For Every Organization
BigFoot: Big Data For Every OrganizationMatteo Dell'Amico
 
Technology Entrepreneurship Venture Lab 2012 beer buddy app
Technology Entrepreneurship Venture Lab 2012   beer buddy appTechnology Entrepreneurship Venture Lab 2012   beer buddy app
Technology Entrepreneurship Venture Lab 2012 beer buddy appdoc2005
 
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...MongoDB
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Kai Zhao
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)Uwe Printz
 
Pp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewingPp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewingLucas Abrantes
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and OptimizationMongoDB
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQLRTigger
 
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalakeKylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalakeKai Zhao
 
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2Kai Zhao
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTKai Zhao
 
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...Jonathan Gray
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingJason S
 
Visualising Data with Code
Visualising Data with CodeVisualising Data with Code
Visualising Data with CodeRi Liu
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data Srinath Perera
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionDeloitte United States
 

Andere mochten auch (20)

MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
 
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMorning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
 
BigFoot: Big Data For Every Organization
BigFoot: Big Data For Every OrganizationBigFoot: Big Data For Every Organization
BigFoot: Big Data For Every Organization
 
Technology Entrepreneurship Venture Lab 2012 beer buddy app
Technology Entrepreneurship Venture Lab 2012   beer buddy appTechnology Entrepreneurship Venture Lab 2012   beer buddy app
Technology Entrepreneurship Venture Lab 2012 beer buddy app
 
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Pp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewingPp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewing
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
Beer industry
Beer industry Beer industry
Beer industry
 
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalakeKylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
 
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
 
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Visualising Data with Code
Visualising Data with CodeVisualising Data with Code
Visualising Data with Code
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolution
 

Ähnlich wie Big data analytics beyond beer and diapers

What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?RTTS
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?HEXANIKA
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdfBOSupport
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsRhonda Cetnar
 
MODERN DATA PIPELINE
MODERN DATA PIPELINEMODERN DATA PIPELINE
MODERN DATA PIPELINEIRJET Journal
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business IntelligenceDavid Portnoy
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and howbobosenthil
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineeringNovita Sari
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesCindy Irby
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonJeffrey T. Pollock
 

Big data analytics beyond beer and diapers

  • 1. Big Data Analytics: Beyond Beer and Diapers 2012/2/22 Kai Zhao @Teradata kingaim@gmail.com by Kai Zhao 2011.12 Disclaimer: Any views or opinions presented in this article are solely those of the author and do NOT necessarily represent those of Teradata or other companies.
  • 2. Content Background: Traditional Business Intelligent(BI) What is Big Data What is Big Data Analytics Big Data Analytics: State of the Art Big Data Analytics Technology Stack ETL/ELT/ETLT(Demo) MPP Data Warehouse Map Reduce NoSQL Web Service Data Analytics Data Visualization BI Tools(Demo) Big Data Analytics Platform Architecture
  • 3. Cloud computing is surging, business intelligence is on the rise, and big data analytics is imperative: it is time for BIG DATA. Shared-nothing Massively Parallel Processing (MPP); Petabyte Scaling; In-database Analytics
  • 5. What is Big Data Volume: The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue. Variety: IT leaders have always had an issue translating large volumes of transactional information into decisions — now there are more types of information to analyze — mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more. Velocity: This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
  • 6. What is Big Data (cont.) Broadly speaking, Big Data is generated by a number of sources, including:
      Social Networking and Media: There are currently over 700 million Facebook users, 250 million Twitter users and 156 million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points (structured, semi-structured and unstructured), sometimes called Data Exhaust.
      Mobile Devices: There are over 5 billion mobile phones in use worldwide. Each call, text and instant message is logged as data. Mobile devices, particularly smart phones and tablets, also make it easier to use social media and other data-generating applications. Mobile devices also collect and transmit location data.
      Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards, credit agencies and others.
      Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart energy meters and temperature sensors, all create semi-structured log data that record every action.
  • 7. What is Big Data Analytics See Video Big Data Visualization
  • 8. Big Data Analytics: State of the Art Acquisitions and Investments Big Data Vendors and Their Products Forrester Report Gartner Report
  • 9. Acquisitions and Investments
      Acquirer   Acquiree (est.)     Date of acq.   Deal
      Teradata   AsterData (2005)    2011.3.3       $0.263 billion
      HP         Vertica (2005)      2011.2.14      $1.2 billion
      IBM        Netezza (2000)      2010.11.11     $1.7 billion
      EMC        Greenplum (2003)    2010.7.6       $0.1~0.15 billion
      SAP        Sybase              2010.5.12      $0.58 billion
      Summary: Traditional data warehouse vendors need Big Data analytics technology.

      Investee      Investment
      Cloudera      $76 million
      MapR          $29 million
      Hortonworks   $50 million
      Datameer      $10 million
      Summary: New Big Data analytics startups.
      Source: http://www.leiphone.com/why-2012-the-year-of-hadoop.html
  • 10. Big Data Vendors and Their Products Source: http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond
  • 12. Hype Cycle Source: Gartner
  • 13. Gartner Report: Hype Cycle 2011 Source: Gartner
  • 14. Big Data Analytics Technology Stack Data Import Data Storage Data Computing Data Analytics XXX as a Service
  • 15. ETL/ELT/ETLT
      Extract – The process by which data is extracted from the data source
      Transform – The transformation of the source data into a format relevant to the solution
      Load – The loading of data into the warehouse
      This approach to data warehouse development is the traditional and widely accepted approach. The following diagram illustrates each of the individual stages in the process.
  • 16. ETL This approach to data warehouse development is the traditional and widely accepted approach. The following diagram illustrates each of the individual stages in the process. Source: Robert J Davenport ETL vs ELT A Subjective View
  • 17. ETL
      Strengths
        Development Time: Designing from the output backwards ensures that only data relevant to the solution is extracted and processed, potentially reducing development, extract, and processing overhead, and therefore time.
        Targeted Data: Due to the targeted nature of the load process, the warehouse contains only data relevant to the presentation.
        Administration Overhead: Reduced warehouse content simplifies the security regime implemented and hence the administration overhead.
        Tools Availability: The large number of tools available that implement ETL provides flexibility of approach and the opportunity to identify the most appropriate tool. The proliferation of tools has led to a competitive functionality war, which often results in loss of maintainability.
      Weaknesses
        Flexibility: Targeting only relevant data for output means that any future requirements that need data not included in the original design will have to be added to the ETL routines. Because the routines developed depend tightly on one another, this often leads to a need for fundamental re-design and development, which increases the time and costs involved.
        Hardware: Most third party tools utilize their own engine to implement the ETL process. Regardless of the size of the solution, this can necessitate investment in additional hardware to run the tool's ETL engine.
        Skills Investment: The use of third party tools to implement ETL processes compels the learning of new scripting languages.
        Learning Curve: Implementing a third party tool that uses foreign processes and languages brings the learning curve implicit in all technologies new to an organization, and lack of experience can often lead to following blind alleys in their use.
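The three ETL stages can be illustrated with a minimal sketch, assuming a tiny in-memory CSV as the source and SQLite standing in for the warehouse. The table name, column names and sample rows are hypothetical and do not come from the slides. The point it shows is that transformation happens outside the database, so only targeted, cleaned data is loaded.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be an operational system.
SOURCE_CSV = """order_id,customer,amount,currency
1,alice,10.50,USD
2,bob,,USD
3,carol,7.25,USD
"""


def extract(text):
    """Extract: read raw rows from the data source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: keep only the fields the report needs and drop incomplete rows."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record, never reaches the warehouse
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows):
    """Load: write the already-transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the stages in ETL order: extract, then transform, then load."""
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_etl(warehouse)
    print(warehouse.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
```

Note that the `currency` column is discarded during the transform step. That mirrors the flexibility weakness above: a later requirement that needs it means changing the ETL routine itself.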
  • 18. ELT Whilst this approach to the implementation of a warehouse appears on the surface to be similar to ETL, it differs in a number of significant ways. The following diagram illustrates the process.
  • 19. ELT
      Strengths
        Project Management: Being able to split the warehouse process into specific and isolated tasks enables a project to be designed on a smaller task basis, so the project can be broken down into manageable chunks.
        Flexible & Future Proof: In general, in an ELT implementation all data from the sources are loaded into the warehouse as part of the extract and load process. This, combined with the isolation of the transformation process, means that future requirements can easily be incorporated into the warehouse structure.
        Risk Minimization: Removing the close interdependencies between each stage of the warehouse build process enables the development process to be isolated, and the individual process design can thus also be isolated. This provides an excellent platform for change, maintenance and management.
        Utilize Existing Hardware: In implementing ELT as a warehouse build process, the inherent tools provided with the database engine can be used. Alternatively, the vast majority of the third party ELT tools available use the database engine's capability, so the ELT process runs on the same hardware as the database engine underpinning the data warehouse, using the existing hardware deployed.
        Utilize Existing Skill Sets: By using the functionality provided by the database engine, the existing investments in database skills are re-used to develop the warehouse.
      Weaknesses
        Against the Norm: ELT is an emergent approach to data warehouse design and development. Whilst it has proven itself many times over through its abundant use in implementations throughout the world, it does require a change in mentality and design approach against traditional methods. To get the best from an ELT approach requires an open mind.
        Tools Availability: Being an emergent technology approach, ELT suffers from a limited availability of tools.
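For contrast with ETL, here is a minimal ELT sketch under the same assumptions: SQLite stands in for the warehouse, and the table names and sample rows are hypothetical. Raw data is loaded untouched into a staging table, and the transformation is then expressed in SQL and executed by the database engine itself.

```python
import sqlite3

# Hypothetical raw source rows, loaded as-is (everything kept as text).
RAW_ROWS = [
    ("1", "alice", "10.50"),
    ("2", "bob", ""),
    ("3", "carol", "7.25"),
]


def extract_and_load(conn, rows):
    """Extract + Load: copy every source row into a staging table unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_orders "
        "(order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def transform_in_database(conn):
    """Transform: build the reporting table with SQL run by the database engine."""
    conn.execute("DROP TABLE IF EXISTS sales")
    conn.execute(
        """
        CREATE TABLE sales AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               customer,
               CAST(amount AS REAL) AS amount
        FROM staging_orders
        WHERE amount <> ''
        """
    )
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    extract_and_load(warehouse, RAW_ROWS)
    transform_in_database(warehouse)
    print(warehouse.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
```

Because the staging table still holds every source row, a new requirement can be met by adding another SQL transformation without touching the load step, which is the flexibility argument made above.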
  • 20. ETL Demo - Kettle Demo of Pentaho Kettle.
  • 21. Map Reduce: Hadoop Comparing with MPP Data Warehouse. Source: http://www.capgemini.com/technology-blog/2012/01/what-is-hadoop/
  • 22. Map Reduce: Hadoop [Figure: Hadoop offerings. Labels: Professional Service; Enterprise-grade Distribution; Hadoop Subscription Service; Database OLTP replacements: Teradata Aster/MongoDB; Hadoop Cluster Management; Data Integration with Hadoop; EDW; BI]
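The slides name Map Reduce without showing the programming model, so the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) using word counting as the example. It is not Hadoop code: the function names are hypothetical, and a real cluster would run the map and reduce calls in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["beer and diapers", "beer and data"]))
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what lets a framework distribute them across machines.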
  • 23. MPP Data Warehouse Comparing MPP Data Warehouse with Hadoop stack. Draw a picture.
  • 24. NoSQL
  • 25. NoSQL/SQL/NewSQL [Figure: database landscape arranged by Non-Relational vs. Relational and Analytics (OLAP) vs. Operational (OLTP). Category labels: SQL MPP, NoSQL (Key-Value, Graph, Document, Columnar), Cloud Service, Data Grid/Cache. Products shown: Teradata, IBM Netezza, EMC Greenplum, HP Vertica, Hadoop, Teradata Aster, VectorWise, Oracle, IBM DB2, SQL Server, MongoDB, Neo4j, Amazon RDS, SQL Azure, BDB, SimpleDB, Voldemort, Tokyo Cabinet, CouchDB, HBase, MySQL, PostgreSQL, Ingres, Sybase, EnterpriseDB, Cassandra, Redis, Memcached]
  • 26. Web Service There are a lot of Web Services.
  • 27. Data Analytics A lot of…..
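  • 28. Data Visualization: It is VERY IMPORTANT to Attract Users Source: 打破陈规-数据及信息的可视化 (Breaking the Rules: Visualization of Data and Information), 向怡宁 (Xiang Yining)
  • 29. Data Visualization: It is VERY IMPORTANT to Compete Source: 打破陈规-数据及信息的可视化 (Breaking the Rules: Visualization of Data and Information), 向怡宁 (Xiang Yining)
  • 30. Data Visualization: It is VERY IMPORTANT to User Experience Source: 打破陈规-数据及信息的可视化 (Breaking the Rules: Visualization of Data and Information), 向怡宁 (Xiang Yining)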
  • 31. BI Tools
      BI Tools fall into three categories:
      Query Tools: A query tool is software set up for users to ask questions about the data. The user can search for patterns or details.
      Multidimensional Analysis Tools: A multidimensional analysis tool, also called Online Analytical Processing (OLAP), is software that allows the user to view the same data from different aspects. E.g.: Business Objects, Hyperion, Cognos, MicroStrategy, Pentaho, Microsoft Analysis Services and Palo OLAP Server.
      Data Mining Tools: A data mining tool is software that automatically searches data, seeking out ways that the data correlates to other data. E.g.: SPSS Clementine, Weka3, R and Apache Mahout.
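To make "view the same data from different aspects" concrete, here is a minimal sketch of a roll-up over a small fact table in plain Python. The sample rows, the dimension names and the `roll_up` helper are all hypothetical. The OLAP products listed above do this at much larger scale and with pre-built cubes.

```python
from collections import defaultdict

# Hypothetical fact table: each row has three dimensions and one measure.
SALES = [
    {"region": "North", "product": "beer", "quarter": "Q1", "amount": 120.0},
    {"region": "North", "product": "diapers", "quarter": "Q1", "amount": 80.0},
    {"region": "South", "product": "beer", "quarter": "Q1", "amount": 100.0},
    {"region": "South", "product": "beer", "quarter": "Q2", "amount": 50.0},
]


def roll_up(facts, dimensions, measure="amount"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[dimension] for dimension in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


if __name__ == "__main__":
    # The same facts viewed along three different aspects.
    print(roll_up(SALES, ["region"]))
    print(roll_up(SALES, ["product"]))
    print(roll_up(SALES, ["region", "quarter"]))
```

Changing only the `dimensions` argument changes the view without changing the underlying data, which is the defining property of multidimensional analysis.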
  • 32. BI Tools List Source: BI Tool Survey 2012 http://www.businessintelligencetoolbox.com/list-of-business-intelligence-bi-tools/
  • 33. BI Tools: Gartner Evaluation Business intelligence (BI) platforms enable all types of users – from IT staff to consultants to business users – to build applications that help organizations learn about and understand their business
  • 34. BI Demo – JasperSoft iReport Demo Session: JasperSoft iReport
  • 35. Big Data Analytics Platform Architecture