Bigdata

Explains Big Data in 10 slides. It separates the hype from the reality, showing what is new and what already existed before. No big numbers, no big claims: just the plain, simple truth. The "red pill".

Big Data in 10
What’s real and what’s fluff
Abhishek Pamecha
Mar-2013

What is Big Data
• It is all about data
– Not about “how much”
– But about correlations and increased reach

BigData Architecture
It influences or changes your
• Data source choices
• Data storing choices
• Data analyzing/mining approaches

It helps
• Address highly focused use cases
• Correlate more data sources
• Address scale and fault-tolerance issues

Caution!

BigData is not a “substitute” for existing warehousing practices.
It complements existing practices.

Architectures – Data sources
• Traditional DW
  – Production DB
  – Dictionaries
  – ETL/ELT pipelines
  – External data marts

• BigData adds
  – Log files
  – Social graphs
  – Streaming data

Architectures – Data Storage
• Traditional DW
  – Production DB
    • Flatten hierarchies
    • Resolved references
  – ROLAP or MOLAP databases
    • Star schema
    • Materialized views
    • Virtual data marts
    • Partitioned tables
  – Still relational

• BigData adds
  – Distributed file storage
  – Distributed hash maps
  – Columnar representations
  – Graph databases
  – Document collections
  – Other NoSQL variants
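To make the "columnar representations" entry concrete, here is a minimal sketch in Python of the same records held row-wise and column-wise. The sample orders and the helper name `to_columns` are assumptions made for illustration, not part of the original slides, and the sketch assumes every record has the same fields.

```python
# Hypothetical sample data: three order records stored row by row.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column reads only that column's values,
# instead of walking every field of every row.
total_amount = sum(columns["amount"])
```

The design point is only the layout: a query that needs a few columns can skip the rest, which is one reason columnar stores suit analytic scans.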

Architectures – Analytic approaches
• Traditional DW
  – Production DB
    • Flatten hierarchies
    • Resolved references
  – ROLAP databases
    • Star schema
      – Multidimensional queries
    • Materialized views
      – Ad hoc explorations on subsets
    • Still relational
    • Virtual data marts
      – Ad hoc explorations on subsets
    • Partitioned tables

• BigData adds
  – Distributed file storage
    • Map reduce frameworks and chaining
      – Pre-generate results
  – Distributed hash maps
    • Single key predominant
  – Columnar representations
    • Extracts select columns per row
  – Graph databases
    • Navigate links
  – Document collections
    • Simplified schemas
  – Other NoSQL approaches
    • Stream pattern matching and pipelining
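As a rough illustration of "map reduce frameworks and chaining" and "pre-generate results", the sketch below runs two chained map/reduce passes in memory with plain Python. It is not tied to any real framework: `run_job`, the mapper and reducer names, and the sample log lines are all hypothetical, and a real system would distribute the shuffle step across machines.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory map, shuffle (group by key), and reduce pass."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort keys so the output order is deterministic.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def map_views(line):
    """Job 1 mapper: emit one count per 'user page' log line."""
    user, page = line.split()
    yield f"{user} {page}", 1


def map_user(pair):
    """Job 2 mapper: emit one count per distinct (user, page) pair."""
    key, _count = pair
    user, _page = key.split()
    yield user, 1


def sum_values(key, values):
    """Reducer shared by both jobs: sum the values for a key."""
    return key, sum(values)


# Hypothetical log lines in the form "user page".
log_lines = ["alice /home", "bob /home", "alice /home", "alice /cart"]

# Job 1: views per (user, page). Job 2 consumes job 1's output (chaining)
# to pre-generate the number of distinct pages seen by each user.
views = run_job(log_lines, map_views, sum_values)
pages_per_user = dict(run_job(views, map_user, sum_values))
```

The result of the second job can be stored and served by a single-key lookup, which is how pre-generated results and distributed hash maps tend to fit together.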

Big Data Architectures
Pros and Cons
• Pros

– Incorporate low-value and social data in analysis
– Extend analysis reach to unstructured data
– Correlate across data sources on the same platform
– Very strong in their sweet spots.
– Efficiency in terms of
• data movement volume,
• scale
• fault tolerance and
• responsiveness.

• Cons

– Not relational: gives up some of the relational advantages
• Joins
• Aggregations, etc.
– Few standards, so solutions are not portable
– Less support from end-user tools and applications [though growing]
– Not a replacement for the DW, just an extension of it
– Not suited to every class of use case; each approach has its sweet spots
– Heterogeneous setup in development and operations
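To show what giving up joins can mean in practice, here is a minimal sketch of a join done in application code over a key-value collection and a list of events. The data, field names, and the function `join_events_with_users` are assumptions made for this example; in a relational database the same result would come from a single JOIN clause.

```python
# Hypothetical key-value collection: user id -> user document.
users = {"u1": {"name": "Ana"}, "u2": {"name": "Raj"}}

# Hypothetical event records that reference users by id.
events = [
    {"user_id": "u1", "action": "login"},
    {"user_id": "u3", "action": "login"},
    {"user_id": "u2", "action": "purchase"},
]


def join_events_with_users(events, users):
    """Inner join done by hand: one key lookup per event."""
    joined = []
    for event in events:
        user = users.get(event["user_id"])
        if user is None:
            continue  # no matching user, so the event is dropped
        joined.append({**event, "name": user["name"]})
    return joined


joined = join_events_with_users(events, users)
```

The lookup itself is cheap, but the join logic, its error handling, and its consistency guarantees now live in the application rather than in the store.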

Challenges
• Architectural
– “Big” data management
– Data consistency
– Read heavy or write heavy
– Scaling
– Distributed deployment

• Functional
– Data quality
– Problem set choice

• Organizational
– Data-backed decisions
– Going overboard
– SLAs and operations management
– Data Privacy
