BigData
Shankar Radhakrishnan, July 2011
Big Data in the News
- Potential savings in US health care: $300 billion/year
- Potential savings in the European public sector: €250 billion/year
- Retail operating margins: up to 60% potential increase
Sources: McKinsey Global Institute

Topics
- What do we collect today?
- DBMS Landscape
- The Disconnect
- The Need
- What is BigData?
  - Characteristics
  - Approach
  - Architectural Requirements
  - Techniques
  - Challenges
  - Solutions
  - Issues
- Deep Dive: Practical Approaches to Big Data
  - Hadoop
  - Aster Data

What do we collect?
- In 2010, people stored enough data to fill 60,000 Libraries of Congress (the LoC collected 235 TB in April 2011)
- YouTube receives 24 hours of video every minute
- 5 billion mobile phones were in use in 2010
- Tesco (British retailer) collects 1.5 billion pieces of information to adjust prices and promotions
- Amazon.com: 30% of sales come from its recommendation engine
- Placecast, Mobclix: track-and-target systems deliver contextual promotions
- A Boeing jet engine produces 20 TB/hour of data, which engineers examine in real time to make improvements
Sources: Forrester, The Economist, McKinsey Global Institute

Collect More
- Business operations: transactions, registers, gateways
- Customer information: CRM
- Product information: barcodes, RFID
- Web pages: web repositories
- Unstructured information: social media
- Signals: mobile, GPS, geospatial

DBMS Solutions
Legacy
- Faster retrieval
- Efficient storage
- Divide and access
- Data consolidation
- Broader tables
- Access all as a row
- Fine-grained access
- Security
- Rules and policies
Problems
- Data growth (when storage cost is not an issue)
- Scalability issues
- Performance issues
- New types of requirements
- Deciding what to analyze, when, and how
- Cost of a change in the subject area to analyze

The Disconnect
Old DBMS vs.:
- New data types/structures
- New volume
- New analysis
- Data retention
- Data element striping
- Data infrastructure
- One DB platform for all

The Need
- A system that can handle high-volume data
- Performs complex operations
- Scalable
- Robust
- Highly available
- Fault tolerant
- Economical
- A new approach

Big Data
“Tools and techniques to manage different types of data, in high volume, at high velocity, with varied requirements to mine them”
Characteristics
- Size: scale up and scale out; terabytes, petabytes …
- Structure: structured; unstructured (audio, video, text, geospatial); schema-less structures
- Stream: a torrent of real-time information
- Operation: massively parallel processing (MPP)
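Massively parallel processing in a shared-nothing setting splits the data so that each worker aggregates only its own partition, and a coordinator merges the partial results. The sketch below is a single-machine illustration under stated assumptions: the function names are hypothetical, threads stand in for cluster nodes (in CPython they do not give true CPU parallelism), and hash partitioning is one common choice, not the only one.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_workers):
    """Hash-partition (key, value) records so each worker owns a disjoint slice."""
    parts = [[] for _ in range(num_workers)]
    for key, value in records:
        parts[hash(key) % num_workers].append((key, value))
    return parts


def local_aggregate(part):
    """Sum values per key using only this worker's slice (shared nothing)."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def mpp_sum_by_key(records, num_workers=4):
    """Scatter records to workers, aggregate locally, then merge at a coordinator."""
    parts = partition(records, num_workers)
    # Threads are a stand-in for separate nodes in this hypothetical sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_aggregate, parts))
    result = Counter()
    for partial in partials:
        result.update(partial)  # keys are disjoint across partitions, so this is a union
    return dict(result)


sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3)]
print(sorted(mpp_sum_by_key(sales).items()))  # [('east', 3), ('north', 17), ('south', 5)]
```

Because every key lands in exactly one partition, the merge step never has to reconcile two partial totals for the same key; adding workers only changes how the slices are cut.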
Approach
- Hardware: commodity hardware, appliances, dynamic scaling, fault tolerant, highly available, no constraints on storage
- Cloud: virtual environment, storage
- Processing models: in-memory, in-database
- Interfaces/adapters
- Workload management
- Distributed data processing
- Software frameworks: Hadoop, MapReduce, Vrije, BOOM, Bloom
- Open source and proprietary

Architectural Requirements
- Integration framework
- Development framework
- Management framework
- Modeling framework
- Processing framework
- Data management framework

Challenges
- Volumetric analysis
- Complexity
- Streaming data/real-time data
- Network topology
- Infrastructure
- Pattern-based strategy

Techniques
- Controlled and variate testing
- Mining
- Machine learning
- Natural language processing (NLP)
- Cohort analysis
- Network or path analysis
- Predictive models
- Crowdsourcing
- Regression models
- Sentiment analysis
- Signal processing
- Spatial analytics
- Visualization
- Time-series analysis

Solutions
- IBM: InfoSphere BigInsights, Streams
- Teradata/Aster Data: nCluster, SQL-MR
- Frameworks: Hadoop, MapReduce, Infobright*, Splunk, Cloudera*
- Cassandra
- NoSQL, NewSQL
- Google's Bigtable
- Appliances: Teradata, Netezza (IBM)
- Columnar databases: Vertica (HP), ParAccel
- Managed services available

Issues
- Latency
- Faultiness
- Accuracy
- ACID: atomicity, consistency, isolation, durability
- Setup cost
- Development cost
- Cost-to-fly
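The ACID item is easiest to see in a small relational example. The sketch below uses Python's built-in `sqlite3` module to show atomicity: a transfer either applies both updates or neither. The table, the account names, and the `transfer` and `balances` helpers are hypothetical; the point is the guarantee that several NoSQL stores relax in exchange for scale.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # undoes the debit above
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = make_demo_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 50}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 50}
```

The failed transfer leaves no half-applied debit behind, which is exactly what a store without multi-row transactions cannot promise on its own.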
Deep Dive: Hadoop

- Top-level Apache project
- Open-source software framework, written in Java
- Inspired by Google's papers on MapReduce (MR), the Google File System (GFS), and Bigtable
- Originally developed to support Apache Nutch
- Designed for:
  - Large-scale data processing
  - Batch processing
  - Sophisticated analysis
  - Structured and unstructured data
- A DB architect's Hadoop: "Heck Another Darn Obscure Open-source Project"

Why Hadoop?
- Runs on commodity hardware
- Portable across heterogeneous hardware and software platforms
- Shared-nothing architecture
- Scale hardware whenever you want
- The system compensates for hardware scaling and hardware issues (if any)
- Runs large-scale, high-volume data processes
- Scales well with complex analysis jobs
- (Hardware) "Failure is an option"
- Ideal for consolidating data from both new and legacy data sources
- Highly integrable
- Value to the business
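"Failure is an option" refers to Hadoop treating node failures as routine: a failed task is rescheduled on another node rather than failing the whole job. The following is a minimal simulation of that idea, not Hadoop's scheduler or API; `run_with_retries`, `count_words`, the node names, and the attempt limit of 4 are all assumptions for illustration.

```python
def run_with_retries(task, nodes, max_attempts=4):
    """Run task(node) on successive nodes until one attempt succeeds.

    A failed attempt (any exception) is retried on the next node in the list.
    Raises RuntimeError once max_attempts attempts have all failed.
    """
    failures = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]
        try:
            return task(node)
        except Exception as exc:  # simulated node crash or disk failure
            failures.append(f"{node}: {exc}")
    raise RuntimeError(f"task failed after {max_attempts} attempts: {failures}")


BROKEN_NODES = {"node1", "node2"}  # hypothetical failed machines


def count_words(node):
    """Hypothetical task: fails on broken nodes, succeeds elsewhere."""
    if node in BROKEN_NODES:
        raise OSError("disk failure")
    return f"word count finished on {node}"


print(run_with_retries(count_words, ["node1", "node2", "node3"]))
# word count finished on node3
```

The job survives two dead machines without any special handling in the task itself; the retry logic lives entirely in the framework layer.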
Hadoop Ecosystem
- HDFS: Hadoop Distributed File System
- MapReduce: software framework for clustered, distributed data processing
- ZooKeeper: coordination service for distributed applications
- Avro: data serialization
- Chukwa: data collection system for monitoring distributed systems
- HBase: data storage for large distributed tables
- Hive: data warehouse
- Pig: high-level query language
- Scribe: log collection
- UDF: user-defined functions

Hadoop Flow (Example)
[Diagram: example data flow among web servers, Scribe, network storage, Oracle and MySQL databases, a Hadoop/Hive data warehouse (DWH), apps, and feeds]

HDFS
Hadoop Distributed File System
- Master/slave architecture
- Runs on commodity hardware
- Fault tolerant
- Handles large volumes of data
- Provides high-throughput, streaming data access
- Simple file coherency model (write once, read many)
- Portable to heterogeneous hardware and software
- Robust: handles disk failures, replication, and re-replication
- Performs cluster rebalancing and data integrity checks
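The "data integrity checks" item can be pictured as checksumming: HDFS keeps CRC-based checksums for the data it stores and verifies them when the data is read. The sketch below uses `zlib.crc32` over fixed-size chunks as a stand-in; the 512-byte chunk size and the helper names are assumptions for this example, not HDFS's own code.

```python
import zlib

CHUNK_SIZE = 512  # bytes covered by each checksum (an assumption for this sketch)


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Compute one CRC-32 per fixed-size chunk when the block is written."""
    return [zlib.crc32(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def find_corrupt_chunks(data: bytes, expected: list[int], chunk_size: int = CHUNK_SIZE) -> list[int]:
    """On read, recompute checksums and return the indexes of chunks that no longer match."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("block length changed since checksums were computed")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


block = bytes(range(256)) * 8           # 2,048 bytes of sample data, i.e. 4 chunks
stored = chunk_checksums(block)
damaged = bytearray(block)
damaged[1030] ^= 0xFF                   # flip bits inside chunk 2 (bytes 1024-1535)
print(find_corrupt_chunks(bytes(damaged), stored))  # [2]
```

Per-chunk checksums localize the damage, so a reader can fall back to another replica of the block instead of discarding the whole file.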
HDFS Architecture
- Name node: maps data blocks to data nodes
- Data node: handles data blocks (stores them and serves read and write requests)
- Replication: each block is stored on several data nodes
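The division of labor above can be sketched as two steps: split a file into fixed-size blocks, then record which data nodes hold each block's replicas (the map the name node keeps). This is a simplified model, not HDFS code: the 64 MB block size and replication factor of 3 are assumptions that match common defaults of this period but may differ on a given cluster, the helper names are hypothetical, and round-robin placement replaces HDFS's rack-aware policy.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size (64 MB)
REPLICATION = 3                # assumed replication factor


def plan_blocks(file_size, block_size=BLOCK_SIZE):
    """Split a file into (offset, length) blocks; the last block may be shorter."""
    return [(off, min(block_size, file_size - off)) for off in range(0, file_size, block_size)]


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Map each block to `replication` distinct data nodes (round-robin, not rack-aware)."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    n = len(data_nodes)
    return {b: [data_nodes[(b + r) % n] for r in range(replication)] for b in range(num_blocks)}


def re_replicate(placement, failed_node, data_nodes):
    """Replace a failed node's replicas with live nodes that do not already hold the block."""
    live = [node for node in data_nodes if node != failed_node]
    repaired = {}
    for block, holders in placement.items():
        kept = [node for node in holders if node != failed_node]
        spares = [node for node in live if node not in kept]
        if len(kept) < len(holders) and spares:
            kept.append(spares[0])
        repaired[block] = kept
    return repaired


MB = 1024 * 1024
blocks = plan_blocks(150 * MB)
print([(off // MB, length // MB) for off, length in blocks])  # [(0, 64), (64, 64), (128, 22)]
nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(len(blocks), nodes)
print(placement)  # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
print(re_replicate(placement, "dn3", nodes))
# {0: ['dn1', 'dn2', 'dn4'], 1: ['dn2', 'dn4', 'dn1'], 2: ['dn4', 'dn1', 'dn2']}
```

After `dn3` fails, every block is back to three replicas on distinct live nodes, which is the re-replication behavior listed on the HDFS slide.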
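Mapper Function
  cat *  | grep <pattern> | sort    | uniq -c | cat > file
  input  | map            | shuffle | reduce  | output
  (grep plays the role of the map step)

Reduce Function
  cat *  | grep <pattern> | sort    | uniq -c | cat > file
  input  | map            | shuffle | reduce  | output
  (uniq -c plays the role of the reduce step)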
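The pipeline analogy maps one-to-one onto three functions. The sketch below runs on a single machine in plain Python and is not the Hadoop Java API; `map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`, and the sample log lines are hypothetical. In a real cluster, the shuffle also partitions keys across many reducers.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines, pattern):
    """Like `grep <pattern>`: keep matching lines and emit (line, 1) pairs."""
    for line in lines:
        if pattern in line:
            yield line, 1


def shuffle_phase(pairs):
    """Like `sort`: bring equal keys together, then group their values."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Like `uniq -c`: add up the values for each key."""
    for key, values in grouped:
        yield key, sum(values)


def run_job(lines, pattern):
    """input | map | shuffle | reduce | output"""
    return dict(reduce_phase(shuffle_phase(map_phase(lines, pattern))))


log = ["GET /index", "POST /login", "GET /index", "GET /about"]
print(run_job(log, "GET"))  # {'GET /about': 1, 'GET /index': 2}
```

As with `uniq -c`, the reduce step only works because the shuffle has already placed identical keys next to each other.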
Who uses Hadoop?
Deep Dive: Aster Data

Weitere ähnliche Inhalte

Was ist angesagt?

Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big DataLewis Crawford
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview pptVIKAS KATARE
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
Introduction of big data unit 1
Introduction of big data unit 1Introduction of big data unit 1
Introduction of big data unit 1RojaT4
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014Stratebi
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 

Was ist angesagt? (20)

Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big Data
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Big data 101
Big data 101Big data 101
Big data 101
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Introduction of big data unit 1
Introduction of big data unit 1Introduction of big data unit 1
Introduction of big data unit 1
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 

Andere mochten auch

Rural urban partnerships - An integrated approach to economic development, by...
Rural urban partnerships - An integrated approach to economic development, by...Rural urban partnerships - An integrated approach to economic development, by...
Rural urban partnerships - An integrated approach to economic development, by...OECD Governance
 
Group 3 week 10 presentation
Group 3 week 10 presentationGroup 3 week 10 presentation
Group 3 week 10 presentationjessieawy
 
Rural urban partnership for economic development
Rural urban partnership for economic developmentRural urban partnership for economic development
Rural urban partnership for economic developmentOECD Governance
 
Rural-urban Partnerships and Quality of Life. OECD
Rural-urban Partnerships and Quality of Life. OECD Rural-urban Partnerships and Quality of Life. OECD
Rural-urban Partnerships and Quality of Life. OECD OECD Governance
 
Adriana Allen: A PERIscope on the PERI-urban
Adriana Allen: A PERIscope on the PERI-urbanAdriana Allen: A PERIscope on the PERI-urban
Adriana Allen: A PERIscope on the PERI-urbanSTEPS Centre
 
Transformationcoaching16 jan-16
Transformationcoaching16 jan-16Transformationcoaching16 jan-16
Transformationcoaching16 jan-16Ghazali Md. Noor
 
TASK Resilient Coders (1)
TASK Resilient Coders (1)TASK Resilient Coders (1)
TASK Resilient Coders (1)Kamala Loscocco
 
ONTAP - Paddling Techniques Part 2
ONTAP - Paddling Techniques Part 2ONTAP - Paddling Techniques Part 2
ONTAP - Paddling Techniques Part 2WRDSB
 
Ms. Jordan First Day of School
Ms. Jordan First Day of SchoolMs. Jordan First Day of School
Ms. Jordan First Day of SchoolCierca Jordan
 
Regional Outlook 2016 - Policy Highlights
Regional Outlook 2016 - Policy HighlightsRegional Outlook 2016 - Policy Highlights
Regional Outlook 2016 - Policy HighlightsOECD Governance
 
OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016
OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016
OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016OECD Governance
 
Cv davide rota_ita
Cv davide rota_itaCv davide rota_ita
Cv davide rota_itaDavide Rota
 

Andere mochten auch (20)

Rural urban partnerships - An integrated approach to economic development, by...
Rural urban partnerships - An integrated approach to economic development, by...Rural urban partnerships - An integrated approach to economic development, by...
Rural urban partnerships - An integrated approach to economic development, by...
 
Group 3 week 10 presentation
Group 3 week 10 presentationGroup 3 week 10 presentation
Group 3 week 10 presentation
 
Rural urban partnership for economic development
Rural urban partnership for economic developmentRural urban partnership for economic development
Rural urban partnership for economic development
 
Rural-urban Partnerships and Quality of Life. OECD
Rural-urban Partnerships and Quality of Life. OECD Rural-urban Partnerships and Quality of Life. OECD
Rural-urban Partnerships and Quality of Life. OECD
 
Adriana Allen: A PERIscope on the PERI-urban
Adriana Allen: A PERIscope on the PERI-urbanAdriana Allen: A PERIscope on the PERI-urban
Adriana Allen: A PERIscope on the PERI-urban
 
Rural urban linkages and public private partnership [compatibility mode]
Rural urban linkages and  public private partnership [compatibility mode]Rural urban linkages and  public private partnership [compatibility mode]
Rural urban linkages and public private partnership [compatibility mode]
 
Integrating Rural Urban Linkages for Regional Development in the Province of ...
Integrating Rural Urban Linkages for Regional Development in the Province of ...Integrating Rural Urban Linkages for Regional Development in the Province of ...
Integrating Rural Urban Linkages for Regional Development in the Province of ...
 
SSD
SSDSSD
SSD
 
Baby Jaws!
Baby Jaws!Baby Jaws!
Baby Jaws!
 
sport tourism
sport tourismsport tourism
sport tourism
 
Transformationcoaching16 jan-16
Transformationcoaching16 jan-16Transformationcoaching16 jan-16
Transformationcoaching16 jan-16
 
TASK Resilient Coders (1)
TASK Resilient Coders (1)TASK Resilient Coders (1)
TASK Resilient Coders (1)
 
Rural Urban Relationship
Rural Urban RelationshipRural Urban Relationship
Rural Urban Relationship
 
ONTAP - Paddling Techniques Part 2
ONTAP - Paddling Techniques Part 2ONTAP - Paddling Techniques Part 2
ONTAP - Paddling Techniques Part 2
 
Ms. Jordan First Day of School
Ms. Jordan First Day of SchoolMs. Jordan First Day of School
Ms. Jordan First Day of School
 
sensor jarak
sensor jaraksensor jarak
sensor jarak
 
Lampu otomatis
Lampu otomatisLampu otomatis
Lampu otomatis
 
Regional Outlook 2016 - Policy Highlights
Regional Outlook 2016 - Policy HighlightsRegional Outlook 2016 - Policy Highlights
Regional Outlook 2016 - Policy Highlights
 
OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016
OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016
OECD Regional Outlook 2016 - Presentation, Brussels, Belgium 11 October 2016
 
Cv davide rota_ita
Cv davide rota_itaCv davide rota_ita
Cv davide rota_ita
 

Ähnlich wie Bigdata

Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big DataFrank Kienle
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
WaterlooHiveTalk
WaterlooHiveTalkWaterlooHiveTalk
WaterlooHiveTalknzhang
 
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
2013  International Conference on Knowledge, Innovation and Enterprise Presen...2013  International Conference on Knowledge, Innovation and Enterprise Presen...
2013 International Conference on Knowledge, Innovation and Enterprise Presen...oj08
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampSpotle.ai
 
Hadoop - A big data initiative
Hadoop - A big data initiativeHadoop - A big data initiative
Hadoop - A big data initiativeMansi Mehra
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 

Ähnlich wie Bigdata (20)

Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
The future of Big Data tooling
The future of Big Data toolingThe future of Big Data tooling
The future of Big Data tooling
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
WaterlooHiveTalk
WaterlooHiveTalkWaterlooHiveTalk
WaterlooHiveTalk
 
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
2013  International Conference on Knowledge, Innovation and Enterprise Presen...2013  International Conference on Knowledge, Innovation and Enterprise Presen...
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
Hadoop - A big data initiative
Hadoop - A big data initiativeHadoop - A big data initiative
Hadoop - A big data initiative
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Hdf5
Hdf5Hdf5
Hdf5
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 

Kürzlich hochgeladen

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Kürzlich hochgeladen (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Bigdata

  • 2. Big Data in the News Savings American Health-Care: $300 Billion/Year European Public Sector: €250 Billion/Year Productivity Margins: 60% increase Sources: McKinsey Global Institute
  • 3. Topics. What do we collect today? DBMS landscape, the disconnect, the need. What is BigData? Characteristics, approach, architectural requirements, techniques, challenges, solutions, issues. Deep dive into practical approaches to Big Data: Hadoop, Aster Data.
  • 4. What do we collect? In 2010, people stored enough data to fill 60,000 Libraries of Congress (the LoC had collected 235 TB by April 2011). YouTube receives 24 hours of video every minute. 5 billion mobile phones were in use in 2010. Tesco (British retailer) collects 1.5 billion pieces of information to adjust prices and promotions. Amazon.com: 30% of sales come from its recommendation engine. Placecast, Mobclix: track-and-target systems drive contextual promotions. A Boeing jet engine produces 20 TB of data per hour, which engineers examine in real time to make improvements. Sources: Forrester, The Economist, McKinsey Global Institute.
  • 5. Collect More. Business operations: transactions, registers, gateways. Customer information: CRM. Product information: barcodes, RFID. Web pages: web repositories. Unstructured information: social media, signals. Mobile: GPS, geospatial.
  • 6. DBMS Solutions. Legacy: faster retrieval, efficient storage, divide and access, data consolidation, broader tables, access all as a row, fine-grain access, security rules and policies. Problems: data growth when storage cost is not an issue; scalability issues; performance issues; new types of requirements; deciding what to analyze, when and how; the cost of a change in the subject area to analyze.
  • 7. The Disconnect. Old DBMS vs. new data types/structures; old DBMS vs. new volume; old DBMS vs. new analysis; old DBMS vs. data retention; old DBMS vs. data element striping; old DBMS vs. data infrastructure; old DBMS vs. one DB platform for all.
  • 8. The Need. A system that can handle high-volume data and perform complex operations, and that is scalable, robust, highly available, fault tolerant and economical: a new approach.
  • 9. Big Data. “Tools and techniques to manage different types of data, in high volume, in high velocity, with varied requirements to mine them.” Characteristics. Size: scale up and scale out (terabytes, petabytes, …). Structure: structured; unstructured (audio, video, text, geospatial); schema-less structures. Stream: a torrent of real-time information. Operation: massively parallel processing (MPP).
  • 10. Approach. Hardware: commodity hardware, appliances, dynamic scaling, fault tolerance, high availability, no constraints on storage. Cloud: virtual environment, storage. Processing models: in-memory, in-database. Interfaces/adapters. Workload management. Distributed data processing. Software frameworks: Hadoop, MapReduce, Vrije, BOOM, Bloom. Open source and proprietary.
  • 11. Architectural Requirements: integration framework, development framework, management framework, modeling framework, processing framework, data management framework.
  • 12. Challenges: volumetric analysis, complexity, streaming/real-time data, network topology, infrastructure, pattern-based strategy.
  • 13. Techniques: controlled and variate testing, data mining, machine learning, natural language processing (NLP), cohort analysis, network or path analysis, predictive models, crowdsourcing, regression models, sentiment analysis, signal processing, spatial analytics, visualization, time-series analysis.
  • 14. Solutions. IBM: InfoSphere BigInsights, Streams. Teradata/Aster Data: nCluster, SQL-MR. Frameworks: Hadoop MapReduce, Infobright*, Splunk, Cloudera*, Cassandra. NoSQL, NewSQL: Google’s Bigtable. Appliances: Teradata, Netezza (IBM). Columnar databases: Vertica (HP), ParAccel. Managed services available.
  • 15. Issues: latency, faultiness, accuracy, ACID (atomicity, consistency, isolation, durability), setup cost, development cost, cost-to-fly.
  • 17. Hadoop. Top-level Apache project. Open-source software framework written in Java. Inspired by Google’s white papers on Map/Reduce (MR), the Google File System (GFS) and Bigtable. Originally developed to support Apache Nutch. Designed for large-scale data processing: for batch processing, for sophisticated analysis, and to deal with structured and unstructured data. The DB architect’s Hadoop: "Heck Another Darn Obscure Open-source Project".
  • 18. Why Hadoop? Runs on commodity hardware. Portable across heterogeneous hardware and software platforms. Shared-nothing architecture. Scale the hardware whenever you want; the system compensates for hardware scaling and issues (if any). Runs large-scale, high-volume data processes. Scales well with complex analysis jobs. (Hardware) “Failure is an option.” Ideal for consolidating data from both new and legacy data sources. Highly integrable. Value to the business.
  • 19. Hadoop Ecosystem. HDFS: Hadoop Distributed File System. Map/Reduce: software framework for clustered, distributed data processing. ZooKeeper: coordination service for distributed applications. Avro: data serialization. Chukwa: data collection system for monitoring distributed systems. HBase: data storage for large distributed tables. Hive: data warehouse. Pig: high-level query language. Scribe: log collection. UDF: user-defined functions.
  • 20. Hadoop Flow (Example). Components: network storage, web servers, Scribe, Oracle, MySQL, Hadoop, Hive DWH, MySQL, Oracle, apps, feeds.
  • 21. HDFS (Hadoop Distributed File System). Master/slave architecture. Runs on commodity hardware. Fault tolerant. Handles large volumes of data. Provides high throughput. Streaming data access. Simple file coherency model. Portable to heterogeneous hardware and software. Robust: handles disk failures and replication (and re-replication). Performs cluster rebalancing and data integrity checks.
  • 26. Mapper Function. Unix-pipeline analogy: cat * | grep | sort | uniq -c | cat > file corresponds to input | map | shuffle | reduce | output, with grep in the mapper’s position.
  • 27. Reduce Function. Unix-pipeline analogy: cat * | grep | sort | uniq -c | cat > file corresponds to input | map | shuffle | reduce | output, with uniq -c in the reducer’s position.
  • 30. Aster Data. Now part of Teradata. Massively parallel SQL layer on MR (MapReduce). In-database analytics. Appliance vs. software-stack model. Cloud options. nPath and statistical options. Data integration.
  • 32. Thank You. "You either scale to where your customer base takes you or you die." Jim Starkey, founder and CTO, NimbusDB. "Our philosophy is to build infrastructure using the best tools available for the job and we are constantly evaluating better ways to do things when and where it matters." Facebook. "In any year we probably generate more data than the Walt Disney Co. did in the first 80 years of existence." Bud Albers, Disney.
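Slide 9 names massively parallel processing (MPP) as the operating model, and slide 18 cites a shared-nothing architecture. A minimal sketch of that idea, under stated assumptions: records are hash-partitioned so that each key lives on exactly one node, every node aggregates only its own partition, and a coordinator merges the partial results. The function names (partition, local_aggregate, mpp_sum) are hypothetical, and threads stand in for cluster nodes. This is an illustration of the pattern, not the API of any product named in the deck.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(records, num_nodes):
    """Hash-partition (key, value) records so that each key lands on exactly one node."""
    parts = [[] for _ in range(num_nodes)]
    for key, value in records:
        # crc32 is stable across runs, unlike Python's salted str hash()
        node = crc32(key.encode("utf-8")) % num_nodes
        parts[node].append((key, value))
    return parts


def local_aggregate(part):
    """Runs on one 'node': sums values per key using only local data (shared nothing)."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def mpp_sum(records, num_nodes=4):
    """Scatter records, aggregate each partition in parallel, then gather and merge."""
    parts = partition(records, num_nodes)
    # Threads stand in for cluster nodes in this sketch
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, parts))
    # Keys never span partitions, so the coordinator's merge is a simple union
    result = {}
    for partial in partials:
        result.update(partial)
    return result


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
print(sorted(mpp_sum(sales).items()))  # [('east', 17), ('north', 3), ('west', 5)]
```

Because partitioning by key guarantees that no two nodes hold the same key, no node needs another node's data until the final merge, which is what lets this style of system scale out by adding machines.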
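Slide 21 lists the HDFS behaviours that matter most for fault tolerance: a master that tracks where blocks live, replication across slave nodes, re-replication after a disk or node failure, and data integrity checks. The sketch below models those ideas in a single process, assuming a replication factor of 3 (HDFS's default) and a tiny block size for readability. All names are hypothetical; placement is simple round-robin (real HDFS placement is rack-aware), and one CRC32 per block stands in for the checksums HDFS keeps.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: bytes
    checksum: int   # CRC32 of data, recorded when the block is written
    replicas: list  # names of the datanodes holding a copy


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(block_index: int, datanodes: list, replication: int) -> list:
    """Round-robin placement on distinct nodes (a simplification of HDFS's policy)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    return [datanodes[(block_index + r) % len(datanodes)] for r in range(replication)]


def write_file(data: bytes, datanodes: list, block_size: int, replication: int = 3) -> list:
    """Return the master's view of the file: one Block record per chunk."""
    return [
        Block(i, chunk, zlib.crc32(chunk), place_replicas(i, datanodes, replication))
        for i, chunk in enumerate(split_into_blocks(data, block_size))
    ]


def verify(block: Block) -> bool:
    """Integrity check: recompute the checksum and compare it with the stored one."""
    return zlib.crc32(block.data) == block.checksum


def handle_node_failure(blocks: list, failed: str, datanodes: list, replication: int = 3) -> None:
    """Drop the failed node and re-replicate under-replicated blocks onto live nodes."""
    live = [n for n in datanodes if n != failed]
    for block in blocks:
        block.replicas = [n for n in block.replicas if n != failed]
        for node in live:
            if len(block.replicas) >= replication:
                break
            if node not in block.replicas:
                block.replicas.append(node)


nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = write_file(b"abcdefghij", nodes, block_size=4)
handle_node_failure(blocks, "dn2", nodes)
print([b.replicas for b in blocks])
# [['dn1', 'dn3', 'dn4'], ['dn3', 'dn4', 'dn1'], ['dn3', 'dn4', 'dn1']]
```

Keeping every block on several distinct nodes is what lets the system treat hardware failure as routine: losing dn2 only triggers copying from surviving replicas, and no data is lost.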
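Slide 26 puts grep in the mapper's position of the pipeline cat * | grep | sort | uniq -c | cat > file. A minimal sketch of that mapper, assuming each input line is one record: it keeps only matching lines and emits each one as a (key, 1) pair for the later shuffle and reduce stages. In Hadoop, many mapper instances run in parallel on separate input splits; this single function only illustrates the per-record logic. The name grep_mapper is hypothetical, and Python's regular-expression syntax differs from grep's basic regular expressions.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_mapper(lines: Iterable[str], pattern: str) -> Iterator[Tuple[str, int]]:
    """Map step: pass through only lines matching `pattern` (the grep stage)
    and emit each one as a (key, 1) pair for the shuffle and reduce stages."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            yield line, 1


log = ["ERROR disk full\n", "INFO job started\n", "ERROR disk full\n"]
print(list(grep_mapper(log, r"^ERROR")))
# [('ERROR disk full', 1), ('ERROR disk full', 1)]
```

The mapper keeps no state between records, so any number of copies can run side by side on different parts of the input.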
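Slide 27 completes the same analogy: sort plays the shuffle, which brings identical keys together, and uniq -c plays the reducer, which counts each run of identical keys. The sketch below runs all three phases in one process so the correspondence is easy to check. The function names are hypothetical, and a real Hadoop job would distribute the map and reduce phases across the cluster and perform the shuffle over the network.

```python
import re
from itertools import groupby
from operator import itemgetter


def map_phase(lines, pattern):
    """grep: keep matching lines, emitting (line, 1) pairs."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            yield line, 1


def shuffle_phase(pairs):
    """sort: bring identical keys next to each other."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """uniq -c: collapse each run of identical keys into (key, count)."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines, pattern):
    """input | map | shuffle | reduce | output, run in a single process."""
    return list(reduce_phase(shuffle_phase(map_phase(lines, pattern))))


log = ["ERROR disk full", "INFO job started", "ERROR net down", "ERROR disk full"]
for key, count in run_job(log, r"^ERROR"):
    print(count, key)  # same counts as grep ^ERROR | sort | uniq -c
```

The shuffle is what makes the reducer simple: itertools.groupby only merges adjacent equal keys, exactly as uniq only collapses adjacent duplicate lines, so both depend on the sort that precedes them.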
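Slide 30 mentions Aster Data's nPath, and slide 13 lists network or path analysis. As we understand it, nPath matches a pattern against the ordered rows of each partition, for example each user's clickstream ordered by time. The sketch below imitates that idea in plain Python under stated assumptions: each page maps to a one-character symbol, a user's time-ordered pages become a string, and a regular expression is matched against it. It does not use Aster's SQL-MR syntax; npath_like, the symbol table and the sample events are all hypothetical.

```python
import re
from collections import defaultdict


def npath_like(events, symbols, pattern):
    """Return the users whose time-ordered path matches `pattern`.

    events:  iterable of (user_id, timestamp, page) rows
    symbols: maps a page name to a one-character symbol; unmapped pages become "_"
    pattern: a Python regular expression over those symbols
    """
    # Partition rows by user (comparable to PARTITION BY user)
    paths = defaultdict(list)
    for user, ts, page in events:
        paths[user].append((ts, page))

    regex = re.compile(pattern)
    matched = []
    for user in sorted(paths):
        # Order each partition by time (comparable to ORDER BY ts), then encode it
        ordered = sorted(paths[user])
        encoded = "".join(symbols.get(page, "_") for _, page in ordered)
        if regex.search(encoded):
            matched.append(user)
    return matched


events = [
    ("u1", 1, "home"), ("u1", 2, "search"), ("u1", 3, "checkout"),
    ("u2", 1, "home"), ("u2", 2, "checkout"),
    ("u3", 2, "checkout"), ("u3", 1, "search"),
]
symbols = {"home": "H", "search": "S", "checkout": "C"}
print(npath_like(events, symbols, r"HS+C"))  # ['u1']: home, one or more searches, checkout
print(npath_like(events, symbols, r"H.*C"))  # ['u1', 'u2']: home, later checkout
```

Encoding each ordered path as a string lets an ordinary regular-expression engine express sequence questions ("searched before buying", "bought without searching") that are awkward to write as row-at-a-time SQL.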