Beyond the buzz – what does “Big Data” mean to your organization?
Attila Barta, Ph.D.
Head of Architecture at Private Client Group and BMO Insurance
BIG DATA WORLD CANADA 2013
Introduction to this presentation
•This presentation covers the following topics:
There is more in Big Data than Hadoop.
To understand the Big Data buzz, one has to go to the beginnings and understand the forces
that brought Big Data to life.
Is Big Data another buzzword like Semantic Web, Web 2.0 or Cloud?
Where are Canadian companies on Big Data in comparison with the world?
What a reference Big Data architecture looks like.
Big Data at BMO Financial Group.
The road ahead, what needs to be done.
•Note: this presentation reflects the opinions of the author alone and by no means those of BMO Financial Group.
Big Data – How we got here
•In a 2001 research report[1] Gartner analyst Doug Laney defined data growth challenges and opportunities as
being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and
variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs"
model for describing Big Data[2]. (source Wikipedia).
•What was happening in 2001? Three major trends:
 Sloan Digital Sky Survey began collecting astronomical data in 2000 at a rate of 200GB/night – volume
 Sensor networks (web of things) and streaming databases (Message Oriented Middleware) – velocity
 Semi-structured databases, XML native databases beside object-oriented, relational databases – variety
•What happened after 2001?
 Rise of search engines and portals - Yahoo and Google:
• Problem: how to store and query (cheaply) in real time large amounts of (semi-structured) data.
• Answer: Hadoop on commodity Linux farms.
 Memory got cheaper – in-memory data grids.
 Rise of Social Media – petabytes in pictures, unstructured and semi-structured data.
 Increased computational power and large memory – visual analytics.
Big Data – Definitions and Examples
•In 2012, Gartner updated its definition as follows: "Big data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable enhanced decision making, insight discovery
and process optimization"[3].
•In 2012 IDC defined Big Data technologies as “a new generation of technologies and architectures designed
to extract value economically from very large volumes of a wide variety of data by enabling high-velocity
capture, discovery, and/or analysis”[4].
•In 2012 Forrester characterized Big Data as “increases in data volume, velocity, variety, and variability”[5].
•Big Data Characteristics:
1. Data Volume: data size in order of petabytes.
• Example: Facebook announced on June 13, 2012 that they had reached 100 PB of data. On
November 8, 2012 they announced that their warehouse grows by half a PB per day.
2. Data Velocity: real-time processing of streaming data, including real-time analytics.
• Example: a jet engine generates 20 TB of data per hour that has to be processed in near real time.
3. Data Variety: structured, semi-structured, text, images, video, audio, etc.
• Example: 80% of enterprise data is unstructured. YouTube: 500 TB of video uploaded per year.
4. Data Variability: data flows can be inconsistent, with periodic peaks.
• Example: blogs commenting on the new BlackBerry 10; stock market data that reacts to market events.
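To put the volume and velocity examples on a common scale, the quoted figures can be converted into sustained bytes per second. This is only a back-of-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a perfectly even data flow, neither of which is stated on the slide, and the function name is illustrative.

```python
# Back-of-envelope conversion of the quoted rates into bytes per second.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15


def bytes_per_second(amount_bytes: float, period_seconds: float) -> float:
    """Average sustained rate for an amount of data spread over a period."""
    return amount_bytes / period_seconds


jet_engine = bytes_per_second(20 * TB, 3600)          # 20 TB per hour
warehouse_growth = bytes_per_second(0.5 * PB, 86400)  # 0.5 PB per day

print(f"Jet engine:       {jet_engine / 10**9:.2f} GB/s")
print(f"Warehouse growth: {warehouse_growth / 10**9:.2f} GB/s")
```

Under these assumptions both examples work out to a sustained rate of roughly 5.6 to 5.8 GB per second, which suggests why storing first and analysing later is not always practical.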
Big Data – In Canada, where are we?
•In December 2012 IDC published a study of Big Data in Canada [4], surveying 75 businesses with over
250MM in revenue. The conclusions of the survey are sobering:
 Less than one tenth of the respondents were familiar with Hadoop (the Big Data framework), and only slightly
more were familiar with in-memory data grids and in-memory analytics.
 Only half of Canadian organizations already work with Big Data, in comparison with more than three quarters
worldwide.
 The majority of Canadian companies use mainly internally produced data, with less than a quarter of
Canadian organizations using data from non-traditional sources such as social media, web data, RFID tags
and GPS.
 Big Data strategies are delegated to mid-level management, while world-class companies integrate
technology decisions at the executive level.
Big Data – What are we missing in Canada?
•McKinsey Global Institute published “Big Data: The next frontier for innovation, competition and productivity”
in May 2011. In the sectors that they examined, they estimated opportunities of hundreds of billions yearly in
savings or new business from unleashing the potential of Big Data [6].
•Immediate Big Data business opportunities:
 Transparent omni-channel information environment – an evolution of multi-channel characterized by a
seamless approach to the consumer experience through all available interaction channels.
 Sentiment analysis – data from social media enables organizations to perceive and analyze client
sentiment in order to better tailor marketing campaigns, products and services.
 Predictive models – based on real-time data streams, determine the likelihood of churn and take pre-emptive
actions for customer retention.
 Social technologies – not only understand the client holistically (the 360-degree view), but also understand the
client's network of family, friends and peers in order to build the client 720-degree view.
 Location data – better understand behaviour and make better offers based on location.
 Operational improvement – RFID and sensor networks allow retailers to get insights into demand and
better manage inventory and supply chains.
Big Data – Reference Architecture
•Typical architectures for Big Data address the following capabilities:
1. Real-time complex event processing (including sense and response).
2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID).
3. Parallel processing/fast loading, typically based on Hadoop.
4. High-performance query systems based on in-memory data architectures.
5. Advanced analytics, e.g. visual analytics, columnar databases.
[Figure: reference Big Data architecture, a variant of the Forrester architecture [5]. Components shown: Event Mgmt.; Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Data Management (non-relational DBMS, relational DBMS, distributed file system, in-memory data grid); Infrastructure Services; Virtual Infrastructure Workload Management. Annotations: data stream; massive load via parallel processing; shared-nothing hardware, massively parallel; commodity hardware, own or rent.]
Big Data – at BMO Financial Group
[Figure: the Big Data architecture at BMO Financial Group. Components shown: Event Mgmt.; Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Data Management (non-relational DBMS, relational DBMS, distributed file system, in-memory data grid); Infrastructure Services; Virtual Infrastructure Workload Management. Technologies and labels annotated on the diagram: Client Omni-Channel Interactions; Tableau, SAS; Spotfire, HANA; Tibco BusinessEvents; Tibco ActiveSpaces, HANA; Sybase IQ; PaaS, IaaS.]
•Big Data is a work in progress at BMO Financial Group, with some areas more advanced than others:
 Event management and in-memory data grids are state of the art.
 Advanced analytics are in transition to mature.
 Infrastructure virtualization is in progress.
 Hadoop infrastructure not in scope yet.
 Non-relational capability is in its infancy.
Legend (colour-coded in the original diagram): Operational; Proof of Concept.
Note: the vendor list is by no means exhaustive; these are some of the technologies in use or in PoC.
Big Data – Capabilities at BMO Financial Group
•How the reference Big Data capabilities are reflected at BMO Financial Group:
1. Real-time complex event processing (including sense and response):
• Built a state-of-the-art omni-channel sense and response capability based on a Tibco stack.
• Deployed a real-time in-bound lead management capability in 2011 that generated a significant increase
in up-sell and cross-sell – major new revenue for the Retail Bank.
2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID):
• Data volumes are manageable within the current infrastructure.
• Location data is currently available and is planned to be harvested.
• There are plans to use social media data for sentiment analysis.
3. Parallel processing/fast loading, typically based on Hadoop:
• Not planned; the current ETL investment is performing well.
4. High-performance query architecture based on in-memory data architectures:
• Running a state-of-the-art in-memory data grid for real-time event processing as well as for the client
360-degree view.
• Currently evaluating in-memory data grids for real-time risk management as well as several regulatory
requirements, like Anti-Money-Laundering and Client Risk Management.
5. Advanced analytics, e.g. visual analytics, columnar databases:
• Several advanced analytics tools are in use, such as Tableau and Sybase IQ, while Tibco Spotfire,
HANA and others are currently being evaluated.
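The sense-and-response idea in capability 1 can be illustrated with a minimal sketch of a complex-event rule. This is not BMO's implementation and does not use any Tibco API; the `Event` fields, the channel and event names, and the `SenseAndRespond` class are all hypothetical, and events are assumed to arrive in timestamp order. The rule "senses" a client showing the same interest on several channels within a time window and "responds" by emitting a lead.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    client_id: str
    channel: str      # hypothetical values: "web", "call_centre", "branch"
    kind: str         # hypothetical values: "mortgage_interest", "balance_check"
    timestamp: float  # seconds; events are assumed to arrive in time order


class SenseAndRespond:
    """Emit a lead when one client produces events of a given kind on at
    least `min_channels` distinct channels within `window_seconds`."""

    def __init__(self, kind: str, window_seconds: float, min_channels: int):
        self.kind = kind
        self.window = window_seconds
        self.min_channels = min_channels
        self.recent = defaultdict(deque)  # client_id -> recent matching events

    def on_event(self, event: Event) -> Optional[str]:
        if event.kind != self.kind:
            return None
        queue = self.recent[event.client_id]
        queue.append(event)
        # Drop events that have fallen out of the time window.
        while queue and event.timestamp - queue[0].timestamp > self.window:
            queue.popleft()
        channels = {e.channel for e in queue}
        if len(channels) >= self.min_channels:
            queue.clear()  # avoid firing again on the same events
            return f"lead:{event.client_id}:{self.kind}"
        return None


rule = SenseAndRespond(kind="mortgage_interest", window_seconds=3600, min_channels=2)
stream = [
    Event("c1", "web", "mortgage_interest", 0),
    Event("c2", "web", "balance_check", 10),
    Event("c1", "call_centre", "mortgage_interest", 1200),
    Event("c1", "web", "mortgage_interest", 9000),
]
responses = [r for r in (rule.on_event(e) for e in stream) if r]
print(responses)
```

In this sketch only client `c1` triggers a lead, once, when the second channel is seen inside the one-hour window. A production event-processing engine would additionally deal with out-of-order events, persistence and many concurrent rules.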
Big Data – Impact on Enterprise Information Management
•Is the traditional MDM redundant?
 By no means; while there are in-memory MDM implementations, it makes more sense to keep the current
investment and load only subsets of MDM data into in-memory databases, e.g. the client 360-degree view or any
other data elements needed for event management, sense and response or other capabilities.
•What will happen with the current EDW?
 Not much; transactional data will still be an important source for BI. However, the full power of parallel
query processing and the parallelism built into hardware should be harvested.
 EDWs should be augmented with social data, location data, either directly or via service providers in order
to provide the foundation for sentiment analysis and predictive modeling.
•Are ETL tools done?
 It depends. This is the sweet spot where vendors are pitching Hadoop. However, is your enterprise ready for
Hadoop? Are you ready to move to commodity hardware? Do you have the skills for both commodity
hardware and Hadoop?
•Time to retire current BI tools (e.g. Cognos, Business Objects, etc.)?
 Definitely not; continue to use the current management reports and dashboards.
 Educate business on the new visual analytic tools and let them decide the way forward.
 Educate business on the new BI capabilities enabled by in-memory databases.
•However, be aware of the new competitor that is building its Information Management from scratch and, with
the proper Big Data technology, might compromise your established business advantage!
Big Data – Organizational challenges
•What needs to be done:
 In Big Data initiatives business leaders have to take the initiative. The new role of the CIO team is to
educate business on Big Data and its opportunities, rather than defining and leading initiatives.
 CIOs have to take a holistic approach to Big Data by considering all Big Data capabilities and define
strategies accordingly, instead of focusing on some capabilities like fast ETL loading for which Hadoop is a
quick fix.
 Adapt the Information Management Strategy to include behavioral oriented data, like social data, as well as
location and sensor data.
 Change the BI strategy towards commoditization and massively parallel processing.
 Big Data requires a new skill set for handling Hadoop environments as well as in-memory data and advanced
analytics. McKinsey predicts a shortage of more than a hundred thousand Big Data professionals in
the US alone [6].
•Last but not least:
 Big Data is an evolution of many technologies that have been around for the last decade or so. Although it has the
potential to be a technology disruptor, Big Data is rather an important augmentation to the current technologies, and
if used properly it can provide significant business benefits as well as competitive advantage.
Thank you for your time! Questions?
attila.barta@bmo.com
Appendix
1. References
2. Hadoop – a Definition
References
1. Laney, Douglas. "3D Data Management: Controlling Data Volume, Velocity and Variety". Gartner, 2001.
2. Beyer, Mark. "Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of
Data". Gartner, 2011.
3. Laney, Douglas. "The Importance of 'Big Data': A Definition". Gartner, 2012.
4. Wallis, Nigel. "Big Data in Canada: Challenging Complacency for Competitive Advantage". IDC, 2012.
5. Gogia, Sanchit. "The Big Deal About Big Data For Customer Engagement". Forrester, 2012.
6. Manyika, James, et al. "Big Data: The next frontier for innovation, competition and productivity". McKinsey
Global Institute, 2011.
Hadoop – a Definition
•Apache Hadoop is an open-source software framework that supports data-intensive distributed applications,
licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity
hardware. The Hadoop framework transparently provides both reliability and data motion to applications.
•Hadoop implements a computational paradigm named MapReduce, where the application is divided into many
small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition,
it provides a distributed file system that stores data on the compute nodes, providing very high aggregate
bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node
failures are automatically handled by the framework. It enables applications to work with thousands of
computation-independent computers and petabytes of data. Hadoop was derived from Google's MapReduce
and Google File System (GFS) papers.
•The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel,
MapReduce and Hadoop Distributed File System (HDFS), as well as a number of related projects –
including Apache Hive, Apache HBase, and others.
•Hadoop is written in the Java programming language and is a top-level Apache project being built and used
by a global community of contributors. Hadoop and its related projects (Hive, HBase, Zookeeper, and so on)
have many contributors from across the ecosystem. Though Java code is most common, any programming
language can be used with "streaming" to implement the "map" and "reduce" parts of the system.
Source: Wikipedia
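The MapReduce paradigm described above can be sketched in a few lines. The example below is a single-process simulation in plain Python, not Hadoop code and not the Hadoop API: the function names are illustrative, and the "shuffle" step stands in for the grouping that the framework performs between the map and reduce phases across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment: str):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would
    do between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    # Each fragment is mapped independently, so on a real cluster this step
    # could run, or be re-run after a failure, on any node.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


fragments = ["big data is big", "data is everywhere"]
counts = word_count(fragments)
print(counts)
```

Because each map call depends only on its own fragment and each reduce call only on one key, the work can be split into the small, independently re-executable pieces that the definition refers to.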

Weitere ähnliche Inhalte

Was ist angesagt?

Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessTeradata Aster
 
Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...
Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...
Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...Denodo
 
Data-Centric Business Transformation Using Knowledge Graphs
Data-Centric Business Transformation Using Knowledge GraphsData-Centric Business Transformation Using Knowledge Graphs
Data-Centric Business Transformation Using Knowledge GraphsAlan Morrison
 
Business Analytics & Big Data Trends and Predictions 2014 - 2015
Business Analytics & Big Data Trends and Predictions 2014 - 2015Business Analytics & Big Data Trends and Predictions 2014 - 2015
Business Analytics & Big Data Trends and Predictions 2014 - 2015Brad Culbert
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Matt Stubbs
 
How to Create and Manage a Successful Analytics Organization
How to Create and Manage a Successful Analytics OrganizationHow to Create and Manage a Successful Analytics Organization
How to Create and Manage a Successful Analytics OrganizationDATAVERSITY
 
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...Neo4j
 
Death of the Dashboard
Death of the DashboardDeath of the Dashboard
Death of the DashboardDATAVERSITY
 
ADV Slides: Data Curation for Artificial Intelligence Strategies
ADV Slides: Data Curation for Artificial Intelligence StrategiesADV Slides: Data Curation for Artificial Intelligence Strategies
ADV Slides: Data Curation for Artificial Intelligence StrategiesDATAVERSITY
 
Data-driven Banking: Managing the Digital Transformation
Data-driven Banking: Managing the Digital TransformationData-driven Banking: Managing the Digital Transformation
Data-driven Banking: Managing the Digital TransformationLindaWatson19
 
Digital Transformation: How to Build an Analytics-Driven Culture
Digital Transformation: How to Build an Analytics-Driven CultureDigital Transformation: How to Build an Analytics-Driven Culture
Digital Transformation: How to Build an Analytics-Driven CultureAlexander Loth
 
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategySustainableEnergyAut
 
Financial Markets Data & Analytics Led Transformation
Financial Markets Data & Analytics Led TransformationFinancial Markets Data & Analytics Led Transformation
Financial Markets Data & Analytics Led TransformationGianpaolo Zampol
 
Data-centric market status, case studies and outlook
Data-centric market status, case studies and outlookData-centric market status, case studies and outlook
Data-centric market status, case studies and outlookAlan Morrison
 
Data centric business and knowledge graph trends
Data centric business and knowledge graph trendsData centric business and knowledge graph trends
Data centric business and knowledge graph trendsAlan Morrison
 
Analytics driving innovation and efficiency in Banking
Analytics driving innovation and efficiency in BankingAnalytics driving innovation and efficiency in Banking
Analytics driving innovation and efficiency in BankingGianpaolo Zampol
 
Big Data Impact on Purchasing and SCM - PASIA World Conference Discussion
Big Data Impact on Purchasing and SCM - PASIA World Conference DiscussionBig Data Impact on Purchasing and SCM - PASIA World Conference Discussion
Big Data Impact on Purchasing and SCM - PASIA World Conference DiscussionBill Kohnen
 
Future of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren RavnFuture of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren RavnIBM Danmark
 

Was ist angesagt? (20)

Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...
Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...
Accelerating Data-Driven Enterprise Transformation in Banking, Financial Serv...
 
Data-Centric Business Transformation Using Knowledge Graphs
Data-Centric Business Transformation Using Knowledge GraphsData-Centric Business Transformation Using Knowledge Graphs
Data-Centric Business Transformation Using Knowledge Graphs
 
Business Analytics & Big Data Trends and Predictions 2014 - 2015
Business Analytics & Big Data Trends and Predictions 2014 - 2015Business Analytics & Big Data Trends and Predictions 2014 - 2015
Business Analytics & Big Data Trends and Predictions 2014 - 2015
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
 
How to Create and Manage a Successful Analytics Organization
How to Create and Manage a Successful Analytics OrganizationHow to Create and Manage a Successful Analytics Organization
How to Create and Manage a Successful Analytics Organization
 
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
 
Death of the Dashboard
Death of the DashboardDeath of the Dashboard
Death of the Dashboard
 
ADV Slides: Data Curation for Artificial Intelligence Strategies
ADV Slides: Data Curation for Artificial Intelligence StrategiesADV Slides: Data Curation for Artificial Intelligence Strategies
ADV Slides: Data Curation for Artificial Intelligence Strategies
 
Data-driven Banking: Managing the Digital Transformation
Data-driven Banking: Managing the Digital TransformationData-driven Banking: Managing the Digital Transformation
Data-driven Banking: Managing the Digital Transformation
 
Digital Transformation: How to Build an Analytics-Driven Culture
Digital Transformation: How to Build an Analytics-Driven CultureDigital Transformation: How to Build an Analytics-Driven Culture
Digital Transformation: How to Build an Analytics-Driven Culture
 
Fraud and Risk in Big Data
Fraud and Risk in Big DataFraud and Risk in Big Data
Fraud and Risk in Big Data
 
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data Strategy
 
Financial Markets Data & Analytics Led Transformation
Financial Markets Data & Analytics Led TransformationFinancial Markets Data & Analytics Led Transformation
Financial Markets Data & Analytics Led Transformation
 
Data-centric market status, case studies and outlook
Data-centric market status, case studies and outlookData-centric market status, case studies and outlook
Data-centric market status, case studies and outlook
 
Data centric business and knowledge graph trends
Data centric business and knowledge graph trendsData centric business and knowledge graph trends
Data centric business and knowledge graph trends
 
Big data basics
Big data basicsBig data basics
Big data basics
 
Analytics driving innovation and efficiency in Banking
Analytics driving innovation and efficiency in BankingAnalytics driving innovation and efficiency in Banking
Analytics driving innovation and efficiency in Banking
 
Big Data Impact on Purchasing and SCM - PASIA World Conference Discussion
Big Data Impact on Purchasing and SCM - PASIA World Conference DiscussionBig Data Impact on Purchasing and SCM - PASIA World Conference Discussion
Big Data Impact on Purchasing and SCM - PASIA World Conference Discussion
 
Future of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren RavnFuture of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren Ravn
 

Andere mochten auch

Nuevas formas de comunicacion (ntic's)
Nuevas formas de comunicacion (ntic's)Nuevas formas de comunicacion (ntic's)
Nuevas formas de comunicacion (ntic's)Santilore
 
Advocaat bij arbeidsconflicten
Advocaat bij arbeidsconflictenAdvocaat bij arbeidsconflicten
Advocaat bij arbeidsconflictenJos Kaldenhoven
 
Herramientas web2
Herramientas web2Herramientas web2
Herramientas web2Luis Duque
 
Tecnologías de la información.
Tecnologías de la información. Tecnologías de la información.
Tecnologías de la información. Mateo Vasquez
 
Senior Project Speech
Senior Project SpeechSenior Project Speech
Senior Project SpeechBechtel524
 
CV english finall 16_17 (1)
CV english finall 16_17 (1)CV english finall 16_17 (1)
CV english finall 16_17 (1)Blerim Zeqiri
 
Latest PPT.pptx
Latest PPT.pptxLatest PPT.pptx
Latest PPT.pptxDreamMalar
 
인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀
인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀
인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀진화 장
 
Utanhússklæðningar
UtanhússklæðningarUtanhússklæðningar
UtanhússklæðningarGisliG
 
SocialBro: Gestiona y Analiza tu comunidad en Twitter
SocialBro: Gestiona y Analiza tu comunidad en TwitterSocialBro: Gestiona y Analiza tu comunidad en Twitter
SocialBro: Gestiona y Analiza tu comunidad en TwitterLeticia Polese
 
¿Qué hacer en caso de quemaduras?
¿Qué hacer en caso de quemaduras?¿Qué hacer en caso de quemaduras?
¿Qué hacer en caso de quemaduras?P G
 

Andere mochten auch (20)

Nuevas formas de comunicacion (ntic's)
Nuevas formas de comunicacion (ntic's)Nuevas formas de comunicacion (ntic's)
Nuevas formas de comunicacion (ntic's)
 
Advocaat bij arbeidsconflicten
Advocaat bij arbeidsconflictenAdvocaat bij arbeidsconflicten
Advocaat bij arbeidsconflicten
 
Herramientas web2
Herramientas web2Herramientas web2
Herramientas web2
 
Tecnologías de la información.
Tecnologías de la información. Tecnologías de la información.
Tecnologías de la información.
 
Pandemias
PandemiasPandemias
Pandemias
 
Senior Project Speech
Senior Project SpeechSenior Project Speech
Senior Project Speech
 
Presentación1
Presentación1Presentación1
Presentación1
 
CV english finall 16_17 (1)
CV english finall 16_17 (1)CV english finall 16_17 (1)
CV english finall 16_17 (1)
 
Jose maria
Jose mariaJose maria
Jose maria
 
Latest PPT.pptx
Latest PPT.pptxLatest PPT.pptx
Latest PPT.pptx
 
Quem quer ser milionario blogger
Quem quer ser milionario bloggerQuem quer ser milionario blogger
Quem quer ser milionario blogger
 
Video juegos
Video juegosVideo juegos
Video juegos
 
Genesis
GenesisGenesis
Genesis
 
인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀
인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀
인터렉티브디자인2_ㅅㅏ용ㅈㅏㅌ ㅔ 스 트_고생하지않길바래 팀
 
Utanhússklæðningar
UtanhússklæðningarUtanhússklæðningar
Utanhússklæðningar
 
O Sermão da Montanha
O Sermão da MontanhaO Sermão da Montanha
O Sermão da Montanha
 
Tecnicas para la comunicacion
Tecnicas para la comunicacionTecnicas para la comunicacion
Tecnicas para la comunicacion
 
Adão, Eva e o pecado original!
Adão, Eva e o pecado original!Adão, Eva e o pecado original!
Adão, Eva e o pecado original!
 
SocialBro: Gestiona y Analiza tu comunidad en Twitter
SocialBro: Gestiona y Analiza tu comunidad en TwitterSocialBro: Gestiona y Analiza tu comunidad en Twitter
SocialBro: Gestiona y Analiza tu comunidad en Twitter
 
¿Qué hacer en caso de quemaduras?
¿Qué hacer en caso de quemaduras?¿Qué hacer en caso de quemaduras?
¿Qué hacer en caso de quemaduras?
 

Ähnlich wie What_BigData_means_to_your_organization

BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptxPentaTech
 
Let's make money from big data!
Let's make money from big data! Let's make money from big data!
Let's make money from big data! B Spot
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxdickonsondorris
 
Die Big Data Fabric als Enabler für Machine Learning & AI
Die Big Data Fabric als Enabler für Machine Learning & AIDie Big Data Fabric als Enabler für Machine Learning & AI
Die Big Data Fabric als Enabler für Machine Learning & AIDenodo
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataSpringPeople
 
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the SameDAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the SameDATAVERSITY
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxVaishnavGhadge1
 
Big data seminor
Big data seminorBig data seminor
Big data seminorberasrujana
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAkshata Humbe
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataPrakalp Agarwal
 

Ähnlich wie What_BigData_means_to_your_organization (20)

Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analytics
 
Big data Analytics
Big data Analytics Big data Analytics
Big data Analytics
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptx
 
Let's make money from big data!
Let's make money from big data! Let's make money from big data!
Let's make money from big data!
 
Big data
Big dataBig data
Big data
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 
Die Big Data Fabric als Enabler für Machine Learning & AI
Die Big Data Fabric als Enabler für Machine Learning & AIDie Big Data Fabric als Enabler für Machine Learning & AI
Die Big Data Fabric als Enabler für Machine Learning & AI
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the SameDAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
 
Big data seminor
Big data seminorBig data seminor
Big data seminor
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG Data
 
Bi orientations
Bi orientationsBi orientations
Bi orientations
 

What_BigData_means_to_your_organization

  • 1. Beyond the buzz – what does “Big Data” mean to your organization? Attila Barta, Ph.D. Head of Architecture at Private Client Group and BMO Insurance
  • 2. 1BIG DATA WORLD CANADA 2013 Introduction to this presentation •This presentation covers the following topics: There is more in Big Data than Hadoop. To understand the Big Data buzz, one has to go to the beginnings and understand the forces that brought Big Data to life. Is Big Data another buzz world like Semantic Web, Web 2.0 or Cloud? Where are Canadian companies on Big Data in comparison with the World? How a reference Big Data architecture looks like. Big Data at BMO Financial Group. The road ahead, what needs to be done. •Note: this presentation reflects the opinions of the author alone and by no means of BMO Financial Group.
  • 3. 2BIG DATA WORLD CANADA 2013 Big Data – How we got here •In a 2001 research report[1] Gartner analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs" model for describing Big Data[2]. (source Wikipedia). •What was happening in 2001? Three major trends:  Sloan Digital Sky Survey began collecting astronomical data in 2000 at a rate of 200GB/night – volume  Sensor networks (web of things) and streaming databases (Message Oriented Middleware) – velocity  Semi-structured databases, XML native databases beside object-oriented, relational databases – variety •What happened after 2001?  Rise of search engines and portals - Yahoo and Google: • Problem: how to store and query (cheaply) in real time large amounts of (semi-structured) data. • Answer: Hadoop on commodity Linux farms.  Memory got cheaper – in-memory data grids.  Rise of Social Media – petabytes in pictures, unstructured and semi-structured data.  Increased computational power and large memory – visual analytics.
  • 4. Big Data – Definitions and Examples
      • In 2012, Gartner updated its definition as follows: "Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" [3].
      • In 2012, IDC defined Big Data technologies as “a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis” [4].
      • In 2012, Forrester characterized Big Data as “increases in data volume, velocity, variety, and variability” [5].
      • Big Data characteristics:
          1. Data Volume: data size in the order of petabytes.
              Example: on June 13, 2012, Facebook announced that they had reached 100 PB of data. On November 8, 2012, they announced that their warehouse grows by half a PB per day.
          2. Data Velocity: real-time processing of streaming data, including real-time analytics.
              Example: a jet engine generates 20TB of data per hour that has to be processed in near real time.
          3. Data Variety: structured, semi-structured, text, images, video, audio, etc.
              Example: 80% of enterprise data is unstructured. YouTube: 500TB of video uploaded per year.
          4. Data Variability: data flows can be inconsistent, with periodic peaks.
              Example: blogs commenting on the new BlackBerry 10; stock market data that reacts to market events.
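To put the volume and velocity examples above on a common scale, the quoted figures can be converted to sustained throughput. This is a rough sketch only: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a perfectly even data flow, neither of which the slide states, and the helper name `bytes_per_second` is made up for illustration.

```python
# Assumption: decimal (SI) units; the slide does not say which convention it uses.
TB = 10**12
PB = 10**15
GB = 10**9


def bytes_per_second(total_bytes: float, period_seconds: float) -> float:
    """Average throughput, assuming the data arrives at a constant rate."""
    return total_bytes / period_seconds


# Velocity example from the slide: a jet engine producing 20 TB per hour.
jet_engine = bytes_per_second(20 * TB, 60 * 60)

# Volume example from the slide: a warehouse growing by half a PB per day.
warehouse_growth = bytes_per_second(0.5 * PB, 24 * 60 * 60)

print(f"Jet engine:       {jet_engine / GB:.2f} GB/s")
print(f"Warehouse growth: {warehouse_growth / GB:.2f} GB/s")
```

Under these assumptions both examples work out to a sustained rate of several gigabytes per second, which is why they are cited as cases that need new forms of processing.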
  • 5. Big Data – In Canada, where are we?
      • In December 2012, IDC published a study of Big Data in Canada [4], surveying 75 businesses with over 250MM in revenue. The conclusions of the survey are sobering:
          - Less than one tenth of the respondents were familiar with Hadoop (the Big Data framework), and slightly more were familiar with in-memory data grids and in-memory analytics.
          - Only half of Canadian organizations already work with Big Data, in comparison with more than three quarters worldwide.
          - The majority of Canadian companies use mainly internally produced data, with less than a quarter of Canadian organizations using data from non-traditional sources such as social media web data, RFID tags and GPS.
          - Big Data strategies are delegated to mid-level management, while world-class companies integrate technology decisions at the executive level.
  • 6. Big Data – What are we missing in Canada?
      • McKinsey Global Institute published “Big Data: The next frontier for innovation, competition and productivity” in May 2011. In the sectors that they examined, they estimated opportunities of hundreds of billions yearly in savings or new business from unleashing the potential of Big Data [6].
      • Immediate Big Data business opportunities:
          - Transparent omni-channel information environment – an evolution of multi-channel, characterized by a seamless approach to the consumer experience through all available interaction channels.
          - Sentiment analysis – data from social media enables organizations to perceive and analyze client sentiment in order to better tailor marketing campaigns, products and services.
          - Predictive models – based on real-time data streams, determine the likelihood of churn and take pre-emptive action for customer retention.
          - Social technologies – not only understand the client holistically (the 360-degree view), but also understand the client's network of family, friends and peers in order to build the client 720-degree view.
          - Location data – better understand behaviour; make better offers based on location.
          - Operational improvement – RFID and sensor networks allow retailers to gain insight into demand and better manage inventory and supply chains.
  • 7. Big Data – Reference Architecture
      • Typical architectures for Big Data address the following capabilities:
          1. Real-time complex event processing (including sense and response).
          2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID).
          3. Parallel processing/fast loading, typically based on Hadoop.
          4. High-performance query systems based on in-memory data architectures.
          5. Advanced analytics, e.g. visual analytics, columnar databases.
      • [Figure: a variant of the Forrester architecture [5]. Diagram labels: Virtual Infrastructure; Workload Management; Infrastructure Services; Event Mgmt.; Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Non-relational DBMS; Data Management; Relational DBMS; Distributed File System; In-Memory Data Grid. Annotations: shared-nothing hardware, massively parallel; commodity, own or rent; massive load via parallel processing; data stream.]
  • 8. Big Data – at BMO Financial Group
      • [Figure: the reference architecture annotated with technologies at BMO Financial Group. Diagram labels: Virtual Infrastructure; Workload Management; Infrastructure Services; Event Mgmt.; Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Non-relational DBMS; Data Management; Relational DBMS; Distributed File System; In-Memory Data Grid. Annotations: Client Omni-Channel Interactions; Tableau, SAS, Spotfire, HANA; Tibco BusinessEvents; Tibco ActiveSpaces, HANA; Sybase IQ; PaaS, IaaS. Legend: Operational; Proof of Concept.]
      • Big Data is a work in progress at BMO Financial Group, with some areas more advanced than others:
          - Event management and in-memory data grids are state of the art.
          - Advanced analytics are in transition to mature.
          - Infrastructure virtualization is in progress.
          - Hadoop infrastructure is not in scope yet.
          - Non-relational capability is in its infancy.
      • Note: the vendor list is by no means exhaustive; these are some of the technologies in use or in PoC.
  • 9. Big Data – Capabilities at BMO Financial Group
      • How the reference Big Data capabilities are reflected at BMO Financial Group:
          1. Real-time complex event processing (including sense and response):
              - Built a state-of-the-art omni-channel sense and response capability based on a Tibco stack.
              - Deployed a real-time in-bound lead management capability in 2011 that generated a significant increase in up-sell and cross-sell – major new revenue for the Retail Bank.
          2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID):
              - Data volumes are manageable within the current infrastructure.
              - Location data is currently available and there are plans to harvest it.
              - There are plans to use social media data for sentiment analysis.
          3. Parallel processing/fast loading, typically based on Hadoop:
              - Not planned; the current ETL investment is performing well.
          4. High-performance query architecture based on in-memory data architectures:
              - Running a state-of-the-art in-memory data grid for real-time event processing as well as for the client 360-degree view.
              - Currently evaluating in-memory data grids for real-time risk management as well as several regulatory requirements, like Anti-Money-Laundering and Client Risk Management.
          5. Advanced analytics, e.g. visual analytics, columnar databases:
              - Several advanced analytics tools are in use, such as Tableau and Sybase IQ; Tibco Spotfire, HANA and others are currently being evaluated.
  • 10. Big Data – Impact on Enterprise Information Management
      • Is traditional MDM redundant?
          - By no means. While there are in-memory MDM implementations, it makes more sense to keep the current investment and load only subsets of MDM data into in-memory databases, e.g. the client 360-degree view or any other data elements needed for event management, sense and response or other capabilities.
      • What will happen with the current EDW?
          - Not much; transactional data will still be an important source for BI. However, the full power of parallel query processing and the parallelism built into hardware should be harvested.
          - EDWs should be augmented with social data and location data, either directly or via service providers, in order to provide the foundation for sentiment analysis and predictive modeling.
      • Are ETL tools done?
          - It depends. This is the sweet spot where vendors are pitching Hadoop. However, is your enterprise ready for Hadoop? Are you ready to move to commodity hardware? Do you have the skills for both commodity hardware and Hadoop?
      • Time to retire current BI tools (e.g. Cognos, Business Objects, etc.)?
          - Definitely not; continue to use the current management reports and dashboards.
          - Educate the business on the new visual analytics tools and let them decide the way forward.
          - Educate the business on the new BI capabilities enabled by in-memory databases.
      • However, be aware of the new competitor that is building its Information Management from scratch and, with the proper Big Data technology, might compromise your established business advantage!
  • 11. Big Data – Organizational challenges
      • What needs to be done:
          - In Big Data initiatives, business leaders have to take the initiative. The new role of the CIO team is to educate the business in Big Data and its opportunities, rather than to define and lead initiatives.
          - CIOs have to take a holistic approach to Big Data by considering all Big Data capabilities and defining strategies accordingly, instead of focusing on individual capabilities such as fast ETL loading, for which Hadoop is a quick fix.
          - Adapt the Information Management Strategy to include behaviour-oriented data, like social data, as well as location and sensor data.
          - Change the BI strategy towards commoditization and massively parallel processing.
          - Big Data requires a new skill set for handling Hadoop environments as well as in-memory data and advanced analytics. McKinsey predicts a shortage of more than a hundred thousand Big Data professionals in the US alone [6].
      • Last but not least:
          - Big Data is an evolution of many technologies that have been around for the last decade or so. Although it has the potential to be a technology disruptor, Big Data is rather an important augmentation of the current technologies; if used properly, it can provide significant business benefits as well as competitive advantage.
  • 12. Thank you for your time! Questions? attila.barta@bmo.com
  • 13. Appendix: 1. References; 2. Hadoop – a Definition
  • 14. References
      1. Laney, Douglas. “3D Data Management: Controlling Data Volume, Velocity and Variety”, Gartner, 2001.
      2. Beyer, Mark. “Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data”, Gartner, 2011.
      3. Laney, Douglas. “The Importance of 'Big Data': A Definition”, Gartner, 2012.
      4. Wallis, Nigel. “Big Data in Canada: Challenging Complacency for Competitive Advantage”, IDC, 2012.
      5. Gogia, Sanchit. “The Big Deal About Big Data For Customer Engagement”, Forrester, 2012.
      6. Manyika, James, et al. “Big Data: The next frontier for innovation, competition and productivity”, McKinsey Global Institute, 2011.
  • 15. Hadoop – a Definition
      • Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. The Hadoop framework transparently provides both reliability and data motion to applications.
      • Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
      • The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel, MapReduce and Hadoop Distributed File System (HDFS), as well as a number of related projects – including Apache Hive, Apache HBase, and others.
      • Hadoop is written in the Java programming language and is a top-level Apache project being built and used by a global community of contributors. Hadoop and its related projects (Hive, HBase, Zookeeper, and so on) have many contributors from across the ecosystem. Though Java code is most common, any programming language can be used with "streaming" to implement the "map" and "reduce" parts of the system.
      • Source: Wikipedia
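The MapReduce paradigm described above can be illustrated with a minimal, single-process word count. This is a sketch of the idea only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration, and a real cluster would run the map and reduce fragments on different nodes and re-run them on failure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line is an independent fragment of work, so the map calls
    # could run in any order and on any node.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data in motion"])
print(counts)
```

Because each map call depends only on its own input line and each reduce call only on one key's values, the work can be split across many machines, which is the property the definition above relies on.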