De-Mystifying Big Data
Prasad Mavuduri
American Institute of Big Data Professionals
RIGHT FOCUS AND ON TARGET
Agenda
Analyze & Define
• Progression of Analytics
• The new phenomenon - Big Data
• Big Data Defined
Technology Discussion
• Big Data Technology – Hadoop
• Big Data – Big Savings – Hadoop
Use Cases
• What can we solve with Big Data – example
• What is next? Where are the opportunities?
Progression of Analytics
Structured – Known Data
Traditional – ETL, Data Marts, DW, RDBMS
Growth – Normal
Incremental – Archive
Less Cross-Functional Integration
More Tactical than Strategic
Sizes – GBs to TBs
Data Architects vs. Functional
So Far…
The new phenomenon - Big Data
Growing pains?!
Big Data?!
Is it just data?
The new phenomenon - Big Data
1. No to “fit-for-all”, yes to “fit-for-purpose”
2. Proliferation of data sources – variety of data
3. Proliferation of data volume
4. Demand for speed (velocity) of data
5. Demand for high value and accuracy (veracity) of information
6. Massively parallel processing
7. Commodity servers vs. specialized servers
DATA-DRIVEN BUSINESS is THE SMART BUSINESS
Big Data Definition
• High volume of data, growing by more than 50% every year
• High-speed streaming, machine-generated data, etc.
• Different data sources: data inside the enterprise and external data around the enterprise data
• Data collected takes up huge storage (typically 100 TB or more), where an RDBMS is inefficient
[Diagram labels: Value, Variety, Volume, Velocity, Veracity; Meaningful]
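The "more than 50% every year" growth figure on this slide compounds quickly. The short sketch below works through that arithmetic. The 10 TB starting size is a hypothetical value chosen only for illustration, and the function names are not from the deck.

```python
import math


def projected_size(start_tb: float, annual_growth: float, years: int) -> float:
    """Size in TB after `years` of compound growth at `annual_growth` (0.5 = 50%)."""
    return start_tb * (1.0 + annual_growth) ** years


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> float:
    """Years needed for `start_tb` to reach `target_tb` at a fixed annual growth rate."""
    return math.log(target_tb / start_tb) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Assumption for illustration: 10 TB today, growing 50% per year.
    for year in range(7):
        print(f"year {year}: {projected_size(10.0, 0.50, year):.1f} TB")
    # Time to cross the slide's 100 TB mark from the assumed 10 TB start.
    print(f"{years_to_reach(10.0, 100.0, 0.50):.2f} years to reach 100 TB")
```

Under these assumptions a 10 TB store crosses the 100 TB mark in a little under six years, which is one way to read the slide's point that incremental archiving stops being enough.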
Big Data Definition
Big Data is the new art and science, using Massively Parallel Processing (MPP) technology, of the collection, storage, processing, distribution, and analysis of data with any of the attributes (high volume, high velocity, high variety) to extract high value and greater accuracy (veracity).
IBM says Big Data means:
1. Volume (Terabytes -> Zettabytes)
2. Variety (Structured -> Semi-structured -> Unstructured)
3. Velocity (Batch -> Streaming Data)
Big Data Technologies – Typical Stack
Big Data Infrastructure
Data Manipulation & Management
Data Analysis & Mining
Predictive & Prescriptive Analysis
Process Automation & Decision Support Systems
Big Data Stack
Big Data Technologies – SMAQ
Query – User-friendly Analytics
1. Pig (simple query language)
2. Hive (similar to SQL)
3. Cascading (workflow)
4. Mahout (machine learning)
5. ZooKeeper (coordination service)
MapReduce – Data Distribution & Management across nodes in Batch Mode
1. Hadoop MapReduce
2. Alternatives – BashReduce, Disco Project, Spark, GraphLab (C&M), Storm, HPCC (LexisNexis)
Storage – Distributed Non-Relational
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System
SMAQ Stack
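To make the MapReduce layer of the SMAQ stack more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using the classic word-count example. It does not use the Hadoop API. The function names are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big savings"]))
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, the work can be split across commodity servers, which is the property the batch-mode layer relies on.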
Big Data – Big Savings – Economics
ROI on Big Data Approach (with Hadoop)
TCO for 1 TB of data:
$37,000 – Traditional RDBMS
$2,000 – Hadoop
Source: American Institute for Analytics
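The per-terabyte figures above ($37,000 for a traditional RDBMS against $2,000 for Hadoop) can be turned into a quick comparison. The sketch below assumes, purely for illustration, that cost scales linearly with data size; the slide does not say so, and the 100 TB example size is likewise an assumption.

```python
# Per-terabyte TCO figures quoted on the slide (source: American Institute for Analytics).
RDBMS_COST_PER_TB = 37_000
HADOOP_COST_PER_TB = 2_000


def tco_comparison(terabytes: float) -> dict:
    """Compare total cost for `terabytes` of data, assuming linear scaling (an assumption)."""
    rdbms = terabytes * RDBMS_COST_PER_TB
    hadoop = terabytes * HADOOP_COST_PER_TB
    return {
        "rdbms": rdbms,
        "hadoop": hadoop,
        "savings": rdbms - hadoop,
        "ratio": rdbms / hadoop,
    }


if __name__ == "__main__":
    # Hypothetical data size of 100 TB.
    for name, value in tco_comparison(100).items():
        print(f"{name}: {value:,.1f}")
```

On these figures the RDBMS cost is 18.5 times the Hadoop cost at any size, since both scale by the same factor in this simplified model.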
Where is the market on Big Data
Infrastructure / Framework / Analytics software
Horizontal Solutions like EDW etc.
• Healthcare
• Retail Industry
• Government / Public Sector
• Education & Human Capital
• Health Sciences / Genomics
• Telecommunications / Services
• Energy & Utilities
• E-Commerce / Marketing
• Media & Entertainment
[Chart: Big Data Market in $B, 2010–2015, y-axis 0 to 20, annotated "Current State". Source: IDC 2011]
Integrated Big Data Implementation - Architecture
Coexistence of Big Data with existing EDW
Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data
Connectors / Adapters
Platforms: Big Data / Hadoop etc., Existing EDW
Analytics: Prescriptive, Predictive, Reporting, OLAP, Modeling
Pure Big Data Implementation - Architecture
Pure Big Data
Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data
Connectors / Adapters
Platform: Big Data / Hadoop etc.
Analytics: Prescriptive, Predictive, Reporting, OLAP, Modeling
Barriers
• Disruption to existing analytics?!
• Roadmap / methodology
• Certainty of costs
Hadoop / Big Table can replace traditional EDWs!
Big Data Landscape
Big Data Landscape
Applied BIG Data
BIG Data Opportunities
Some gaps & opportunities
• Real-time analysis (possibly using SAP HANA etc.)
• User interface (UI) frameworks
• App development for Big Data on the cloud (multi-tenancy)
• Security & data governance
• Cross-application integration
• Industry standards
AIBDP – Contribution to Big Data
Our Big Data Strategy at a glance
Business Focus
• Identify data needs
• Identify business issues
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data and cross-functional requirements
Stakeholder Focus
• Identify the stakeholders
• Align best practices with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, reports, dashboards, and predictive & prescriptive analysis to be delivered
Technology Focus
• Synergies in current technology
• Take stock of existing “technology assets” towards Big Data
• Assess your current capabilities and architecture
• Identify the resources and minimize “specialties” to exploit synergies with the existing resource pool
• Lay out a development methodology to streamline delivery
Process Focus
• Establish clear data flows
• Identify the Data Governance execution process – people, processes, mechanisms
• Design the process to be more business-focused than IT-focused
• Clearly establish measures to achieve accuracy, repeatability, agility, and accountability (reconcilability)
Our Execution Approach – AGILE methodology
Agile approach to reduce risks
• Close coordination between the customer and the developer
• Small incremental steps make testing easier and more manageable, and avoid surprises
• Early recovery from expectation mismatch
• Clarity on design understanding and regular communication with the user
• Early warning about risks through regular status reports
• Full knowledge transfer
Thank You!
Please contact us for any enquiries at:
Prasad Mavuduri
prasad@aibdp.org
408 828 9909
Q & A

  • 1. De-Mystifying Big Data Prasad Mavuduri American Institute of Big Data Professionals
  • 2. RIGHTFOCUSANDONTARGET Agenda Analyze & Define • Progression of Analytics • The new phenomenon - Big Data • Big Data Defined Technology Discussion • Big Data Technology – Hadoop • Big Data – Big Savings – Hadoop Use Cases • What can we solve with Big Data – example • What is next ? Where are the opportunities
  • 3. RIGHTFOCUSANDONTARGET Progression of Analytics Structured – Known Data Traditional – ETL, Data Marts, DW, RDBMS Growth – Normal Incremental – Archive Less Cross Functional Integration More Tactical than Strategic Sizes GBs to TBs Data Architects vs. Functional So Far…..
  • 4. RIGHTFOCUSANDONTARGET The new phenomenon - Big Data Growing Pains ??!!! Big Data ?!!! Is it just data ?
  • 5. RIGHTFOCUSANDONTARGET The new phenomenon - Big Data 1. No to “fit-for-all” but Yes to “fit-for-purpose” 2. Proliferation of data sources – variety of data 3. Proliferation of volume of data 4. The demand for the speed (velocity) of data 5. Demand for high value & accuracy ( veracity) of info 6. Massive Parallel processing 7. Commodity servers vs. Specialized servers DATA DRIVEN BUSINESS is THE SMART BUSINESS
  • 6. RIGHTFOCUSANDONTARGET Big Data Definition • High volume of data which is growing every year more than 50 % every year • High Speed Streaming, Machine generated data etc • Different Data sources In-the- enterprise and external data around the enterprise data • Data collected taking huge memory (typically 100 TB or more) where RDBMS is inefficient Value Variety VolumeVelocity VERACITY Meaningful
  • 7. RIGHTFOCUSANDONTARGET Big Data Definition VERACITY Big Data is the new art and science, using Massive Parallel Processing (MPP) technology, of collection, storage, processing, distribution, and analysis of data with any of the attributes – high volume, high velocity, high variety to extract high value and greater accuracy (veracity). IBM Says, BIG DATA means 1.Volume (Terabytes --‐> Zettabytes) 2. Variety (Structured --‐> Semi--‐structured --‐> Unstructured) 3. Velocity (Batch --‐> Streaming Data)
  • 8. RIGHTFOCUSANDONTARGET Big Data Technologies – Typical Stack Big Data Infrastructure Data Manipulation & Management Data Analysis & Mining Predictive & Prescriptive Analysis Process Automation& Decision Support Systems Big Data Stack
  • 9. RIGHTFOCUSANDONTARGET Big Data Technologies – SMAQ User-friendly Analytics 1. PIG ( simple Query Language), 2. HIVE ( Similar to SQL) 3. Cascading ( Workflow) 4. Mahout ( Machine Learning) 5. Zookeeper (Coordination Service) Data Distribution & Management across nodes in Batch Mode 1. Hadoop MapReduce 2. Alternative – BashReduce, Disco Project, Spark, GraphLab (C&M), Strom, HPCC (LexisNexis) Distributed Non-Relational 1. HBase ( columnar DB) 2. HDFS – Hadoop Distributed File System Query Map Reduce Storage SMAQ Stack
  • 10. RIGHTFOCUSANDONTARGET Big Data – Big Savings – Economics ROI on Big Data Approach (with Hadoop) Source : American Institute for Analytics 1TB of RDBMS TCO $37,000 - Traditional RDBMS $2,000 only !!!! Hadoop Source :American Institute for Analytics
  • 11. RIGHTFOCUSANDONTARGET Where is the market on Big Data Infrastructure / Framework / Analytics software Horizontal Solutions like EDW etc HealthCare RetailIndustry Government/ Publicsector Education& HumanCapital HealthSciences /Genomics Telecommunicat ions/Services Energy& Utilities E-Commerce/ Marketing Media& Entertainment Source: IDC 2011 0 5 10 15 20 2010 2011 2012 2013 2014 2015 Big Data Market In $B Current State
  • 12. RIGHTFOCUSANDONTARGET Web Logs Images & Videos Social Media Documents Structured Data Big Data / Hadoop etc. Existing EDW Prescriptive Predictive Reporting OLAP Modeling Integrated Big data Implementation - Architecture Coexistence of Big Data with existing EDW Connectors / Adapters
  • 13. Pure Big Data Implementation: Architecture. Data sources: web logs, images & videos, social media, documents, structured data. Components: connectors / adapters, Big Data / Hadoop etc. Outputs: prescriptive, predictive, reporting, OLAP, modeling. Barriers: disruption to existing analytics?; roadmap / methodology; certainty of costs. Hadoop / Big Table can replace traditional EDWs!
  • 17. Big Data Opportunities: some gaps & opportunities • Real-time analysis (maybe use SAP HANA etc.) • User interface (UI) frameworks • App development • Big Data on cloud (multi-tenancy) • Security & data governance • Cross-application integration • Industry standards
  • 19. Our Big Data Strategy at a Glance. Business Focus: identify data needs; identify business issues; lay out data dependencies between functions; resolve competing priorities; clearly lay out the levels of data and cross-functional requirements. Stakeholder Focus: identify the stakeholders; align best practices with the project; plan out the objectives, scope, and timelines; identify the KPIs, reports, dashboards, and predictive & prescriptive analysis to be delivered. Technology Focus: synergies in current technology; take stock of existing “technology assets” towards Big Data; assess your current capabilities and architecture; identify the resources and minimize “specialties” to exploit synergies with the existing resource pool; lay out a development methodology to streamline delivery. Process Focus: establish clear data flows; identify the data governance execution process (people, processes, mechanisms); design the process to be more business focused than IT focused; clearly establish measures to achieve accuracy, repeatability, agility, and accountability (reconcilability)
  • 20. Our Execution Approach: Agile methodology. Agile approach to reduce risks • Close coordination between the customer and the developer • Small incremental steps make testing easier and more manageable, and avoid surprises • Early recovery from expectation mismatch • Clarity on design understanding and regular communication with the user • Early warning about risks through regular status reports • Full knowledge transfer
  • 21. Thank You! Please contact us for any enquiries at: Prasad Mavuduri, prasad@aibdp.org, 408 828 9909. Q & A
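Slide 9 names MapReduce as the batch layer of the SMAQ stack. The sketch below illustrates the map, shuffle, and reduce steps with a word count in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in one process, whereas a cluster would spread the map and reduce calls across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (single process, illustration only)."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big savings"]))
```

Because each map call sees only one record and each reduce call sees only one key, both steps can in principle run in parallel on separate machines, which is the property the slide's batch layer relies on.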

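The per-terabyte figures on slide 10 ($37,000 for a traditional RDBMS, $2,000 for Hadoop) can be turned into a rough comparison for a given data size. The sketch below assumes cost scales linearly with terabytes, which the slide does not state; the function name `tco_comparison` and the 100 TB input are illustrative only.

```python
from typing import Dict

# Per-terabyte TCO figures as quoted on slide 10 (source: American Institute for Analytics).
RDBMS_TCO_PER_TB = 37_000
HADOOP_TCO_PER_TB = 2_000


def tco_comparison(terabytes: float) -> Dict[str, float]:
    """Compare total cost for a data size, ASSUMING cost grows linearly per TB."""
    rdbms = terabytes * RDBMS_TCO_PER_TB
    hadoop = terabytes * HADOOP_TCO_PER_TB
    return {
        "rdbms": rdbms,
        "hadoop": hadoop,
        "savings": rdbms - hadoop,
        "ratio": rdbms / hadoop,
    }


if __name__ == "__main__":
    # 100 TB is an illustrative size, not a figure from slide 10.
    print(tco_comparison(100))
```

Under the linear assumption the ratio stays at 18.5 regardless of size; real deployments would also have fixed costs that this sketch ignores.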
Editor's Notes

  1. Progression of Analytics 3 minutes The new phenomenon - Big Data 4 minutes Big Data Defined 3 minutes 2 minutes Where is the Technology 5 minutes What can we solve with Big Data (example case studies) 5 minutes What is next? Where are the opportunities? 10 minutes
  2. Internal information: known questions and answers; known structures, structured data types, known volumes, mostly transactional data. Master data is very well defined. Storage: typical data warehouses and data marts using batch processing and traditional ETL, and relational databases. Data growth is incremental, with regular archival. Just reporting and a little bit of mining, mostly descriptive; predictive analysis is very light. Cross-functional integration of data is very limited and very structured around customers, services & products, logistics, etc. Functional and technical responsibilities are very clearly demarcated: mostly data engineers / architects at the back end supporting business analysts / users. Most of the reports are just a measurement of tactics, supporting the strategy more than inducing one. Data sizes are in the gigabyte and terabyte range; the approach becomes inefficient and costly beyond a certain size.
  3. Narrow and focused business missions: not “fit-for-all” but “fit-for-purpose”. The need to discover more: facts, relationships, indicators, patterns, trends, and pointers that probably could not be discovered before, by cross-integrating data from various sources. The need to capture and store data, not just collect it. Proliferation of data sources (variety of data): multi-dimensional data, streaming data, geospatial data, social networking data, internal data (RDBMS), video and image data, text data (logs etc.), time series data, genomics. Proliferation of the volume of data (now at petabytes and above): Internet / intranet, social networks (Facebook and Twitter), mobile devices, smart home devices, smart systems (utilities etc.), media and entertainment. The demand for speed (velocity) in how data is collected, understood, processed, and distributed: accessibility (where, when, who, and how), time value (real time or not), increased speeds of consumption, increased speeds of data generation. Demand for high value and accuracy (veracity) of information. Advent of massively parallel processing technology: availability of open source and packaged technologies such as Hadoop / MapReduce. Affordability of infrastructure: commodity servers vs. specialized servers. Hadoop enables a computing solution that is: Scalable: new nodes can be added as needed, without needing to change data formats, how data is loaded, how jobs are written, or the applications on top. Cost effective: Hadoop brings massively parallel computing to commodity servers; the result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data. Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources; data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide. Fault tolerant: when you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
  4. The word of the hour is “SMART”! Smart business means a targeted value proposition. Businesses are under pressure to maximize their investments (a focused approach, not a one-size-fits-all methodology). Targeted value proposition: targeted advertisement, tailored menu, focused initiatives, individualized attention, personalized messaging, efficient governance, greater accuracy.
  5. Targeted advertisement, tailored menu, focused initiatives, individualized attention, personalized messaging, efficient governance, greater accuracy. Businesses want to gain competitive advantage by being able to take action based on timely, relevant, complete, and accurate information, rather than one-size-fits-all solutions. The immense volume, variety, and velocity of data produced today contains new information, facts, relationships, indicators, and pointers that either could not be practically discovered in the past or simply did not exist before.
  7. The market has just started picking up. There are large gaps in vertical solutions. The biggest gap is in Big Data services. Hardware and software components seem to be available already.
  8. Adapting to real-time analysis (maybe use HANA); development of industry standards; development of a universal schema for metadata and cataloging; tools to support security & data governance; support for cloud-ification (multi-tenancy); support for data lineage; framework for cross-application integration; support for testing; automated & configurable monitoring and management console; user interface (UI) frameworks.
  9. Business Focus: identify data needs for strategic business functions; identify business issues that need to be solved by Big Data; lay out data dependencies between functions; resolve competing priorities; clearly lay out the levels of data and cross-functional requirements. Technology Focus: identify the right technology to align with the current landscape for synergies in technology; take stock of existing “technology assets” towards Big Data; assess your current capabilities and architecture to support your goals, and select the deployment strategy that best fits your Big Data questions; identify the resources and minimize “specialties” to exploit synergies with the existing resource pool; lay out a development methodology to streamline delivery. Stakeholder Focus: clearly identify the stakeholders at all levels of data consumption; present best practices and align them with the project; plan out the objectives, scope, and timelines; identify the KPIs, reports, dashboards, and predictive & prescriptive analysis to be delivered. Process Focus: establish clear data flows from collection of data to consumption of data; identify the data governance execution process (people, processes, mechanisms); design the process to be more business focused than IT focused; clearly establish measures to achieve accuracy, repeatability, agility, and accountability (reconcilability).
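Note 3 describes Hadoop as fault tolerant: when a node is lost, work is redirected to another location of the data. The sketch below illustrates that idea only in outline, assuming each block is stored on several nodes. The names (`run_on_replica`, `replicas`, `live_nodes`) are hypothetical, and this is not how Hadoop's scheduler is implemented.

```python
from typing import Callable, Dict, List, Set


def run_on_replica(
    block_id: str,
    replicas: Dict[str, List[str]],
    live_nodes: Set[str],
    task: Callable[[str, str], str],
) -> str:
    """Run `task` on the first live node that holds a copy of the block.

    `replicas` maps a block id to the nodes storing a copy (hypothetical layout).
    """
    for node in replicas[block_id]:
        if node in live_nodes:
            return task(node, block_id)
    # Every copy is on a failed node, so the work cannot be redirected.
    raise RuntimeError(f"no live replica for block {block_id}")


if __name__ == "__main__":
    replicas = {"b1": ["n1", "n2", "n3"]}
    live_nodes = {"n2", "n3"}  # n1 has failed in this illustration
    print(run_on_replica("b1", replicas, live_nodes, lambda node, block: f"{block}@{node}"))
```

The design point is that redundancy in storage is what makes redirection possible: the task can only move to another node because that node already holds a copy of the same data.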