Blackvard Management Consulting
Introduction to Big Data & Hadoop
Copyright © Blackvard Management Consulting – All rights reserved www.blackvard.com
Agenda
What Will Be Covered:
1. What Is Big Data?
2. Business Intelligence
3. Big Data Analytics
4. Existing Database Technology
5. What is Hadoop?
6. Data Warehouse Appliances vs. Hadoop
7. Hadoop & SAP HANA
8. Business Use Cases
Is “Big Data” Simply “Too Much Data?”
• Is the term “Big Data” just about “big”?
• Big Data is often called the “new black gold,” with a lot of undiscovered insights.
http://dilbert.com/strip/2006-11-11
3 Vs of Big Data
Volume
- Tera- and Petabytes
- Transactions
- Tables, files
Variety
- Structured
- Semistructured
- Unstructured
Velocity
- Batch
- (Near) Real-Time
- Streams
Source: Philip Russom: BIG DATA ANALYTICS – TDWI Best Practice Report
The term “BIG DATA” is defined by the steadily increasing VARIETY, VOLUME, and processing VELOCITY of available data.
Big Data is about 3 “V’s”:
Volume: massive amounts of data to process
Velocity: the speed at which the data comes into the system
Variety: the range of data structures (structured, semi-structured, unstructured) keeps increasing
Big Data Defined
VARIETY:
Most data is unstructured.
[Figure: data sources arranged by structure (structured vs. unstructured) and origin (internal vs. external)]
Structured data: partner data, reference data, CRM, ERP, production, finance, HR, procurement, machine sensor data, etc.
Unstructured data: documents, email, contact center calls, presentations, security images, medical scans
External content: social media content, channel content
Tools shown: traditional BI, BI + data connections, search, ECM, social media monitoring tools
Business Intelligence & Variety
In Business Intelligence (BI) systems, data is mostly internal & structured.
Social media content, digitalization, and global supply chains shift the requirements toward supporting a broader variety of data structures.
Business Intelligence is the set of techniques and tools required for the transformation of raw data into meaningful and useful information for business analysis purposes.
Analytical databases
• Software-only analytical platforms
• Mostly massively parallel processing (MPP), columnar, and in-memory databases
Analytical appliances
• Tightly integrated hardware-software combinations
• Analytical bundles: standalone software + hardware combinations
Analytical services
• Systems are hosted in an off-site environment or public cloud
File-based analytical systems
• Hadoop
• NoSQL (although it is not file-based in the usual sense)
Big Data Analytics
Big Data Analytics Platforms can be classified into four major categories:
1) Analytical Databases
2) Analytical Appliances
3) Analytical Services
4) File-based analytical systems (main focus)
Several platforms embrace existing database technologies in order to optimize
analytical applications on large data volumes.
Technology: Massively parallel processing (MPP)
Description: Row-based databases designed to scale out on a cluster of commodity servers. Also known as “shared-nothing” architecture.
Vendor / Product: Teradata Active Data Warehouse, Greenplum (EMC), Microsoft Parallel Data Warehouse, Aster Data (Teradata), Kognitio

Technology: Columnar databases
Description: DBMS that store data in columns, not rows. Support high data compression and analytical query performance.
Vendor / Product: Sybase IQ (SAP), ParAccel, Infobright, Vertica (HP), 1010data

Technology: Analytical appliances
Description: Pre-configured hardware-software systems.
Vendor / Product: Netezza (IBM), Teradata Appliances, Oracle Exadata, Greenplum Data Computing Appliance (EMC)

Technology: In-memory databases
Description: Systems load data into memory to execute complex queries.
Vendor / Product: SAP HANA, Cognos TM1 (IBM), QlikView, Membase

Technology: Distributed file-based systems
Description: Systems designed for storing, manipulating and querying large volumes of unstructured and semi-structured data.
Vendor / Product: Hadoop (Apache, Cloudera, MapR, IBM, Hortonworks), Apache Hive, Apache Pig

Technology: Analytical services (Cloud)
Description: Analytical platforms delivered as hosted or public-cloud-based services.
Vendor / Product: 1010data, Kognitio

Technology: Nonrelational (NoSQL)
Description: Nonrelational databases optimized for querying unstructured and structured data.
Vendor / Product: MongoDB, Apache Cassandra, Apache HBase

Technology: Complex Event Processing (CEP)
Description: Systems optimized for calculation and correlation of large volumes of discrete events and application of conditions.
Vendor / Product: IBM, Tibco, Streambase, Sybase (Aleri), Informatica
Source: Wayne Eckerson: BIG DATA ANALYTICS: PROFILING THE USE OF ANALYTICAL PLATFORMS IN USER ORGANIZATIONS
Existing Database Technology
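To make the difference between row-based and columnar storage more concrete, here is a minimal sketch in Python. It is not how any of the listed products work internally; the sample data and the helper names (ROWS, to_columns, run_length_encode, column_sum) are assumptions made up for illustration. It shows why storing values column by column helps with compression (repeated values sit next to each other) and with analytical queries (a sum only has to touch one column).

```python
from itertools import groupby

# Hypothetical sample data in row layout: one record per order.
ROWS = [
    {"order_id": 1, "country": "DE", "amount": 120.0},
    {"order_id": 2, "country": "DE", "amount": 80.0},
    {"order_id": 3, "country": "DE", "amount": 50.0},
    {"order_id": 4, "country": "US", "amount": 200.0},
    {"order_id": 5, "country": "US", "amount": 50.0},
]


def to_columns(rows):
    """Turn a list of row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def column_sum(columns, name):
    """An analytical query that reads a single column only."""
    return sum(columns[name])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(run_length_encode(cols["country"]))
    print(column_sum(cols, "amount"))
```

In the row layout, the same sum would have to read every full record, including fields the query does not need.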
History
• Google published a paper describing the MapReduce algorithm for processing large amounts of data
• Doug Cutting, who worked at Yahoo, read that paper and initiated Hadoop
• Hadoop was the name of his son's yellow toy elephant
• Hadoop became an Apache top-level project, which is supported by Facebook, IBM & Yahoo, among others
Facts
• Open source project
• Written in Java
• Optimized to handle:
  • Massive amounts of data through parallelism
  • Inexpensive commodity hardware
  • A variety of data (structured, unstructured, semi-structured)
• Great performance (on large data volumes)
• Reliability provided through replication
• Not for OLTP, not for OLAP, good for Big Data (1)
(1)
OLTP: Online Transaction Processing (CRM, ERP)
OLAP: Online Analytical Processing (Data Mining, complex queries over multidimensional data)
What is Hadoop?
Hadoop consists mainly of two components:
Hadoop Distributed Filesystem (HDFS)
• HDFS stores data on several nodes in the cluster, with the goal of providing greater bandwidth across the cluster as well as higher reliability.
Hadoop MapReduce
• It is a computational paradigm called Map/Reduce, which takes an application and divides it into multiple fragments of work, each of which can be executed on any node in the cluster.
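The storage idea behind HDFS, splitting a file into blocks and keeping copies of each block on several nodes, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the HDFS implementation: the function names, the tiny block size, and the simple round-robin placement are all made up for illustration (real HDFS uses much larger blocks and its own placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the file content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_blocks(placement: dict, failed_nodes: set) -> list:
    """Blocks that still have at least one replica on a healthy node."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes)
    print(placement)
    # Losing a single node leaves every block readable from another replica.
    print(readable_blocks(placement, failed_nodes={"node2"}))
```

Because every block lives on more than one node, a single node failure does not make the file unavailable, which is the reliability-through-replication point made on the facts slide.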
http://mohamednabeel.blogspot.de/2011/03/starting-sub-sandwitch-business.html
[Figure: File1.txt is split into Block A, Block B and Block C; copies of the blocks are distributed across Data Node 1, Data Node 2, Data Node 3 and Data Node 4]
[Figure: MapReduce example with shapes]
MAP: Give every shape the value of 1
SORT: Sort the shapes
REDUCE: For each shape type, count the values
Hadoop Core
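The shape-counting example (map, sort, reduce) can be sketched in plain Python. This is a single-process illustration of the paradigm only, not Hadoop's MapReduce API; the function names and the sample shapes are assumptions chosen for this sketch. In a real cluster, the map and reduce steps would run in parallel on many nodes.

```python
from itertools import groupby


def map_phase(shapes):
    """MAP: give every shape the value of 1."""
    return [(shape, 1) for shape in shapes]


def sort_phase(pairs):
    """SORT: bring equal shapes next to each other."""
    return sorted(pairs, key=lambda pair: pair[0])


def reduce_phase(sorted_pairs):
    """REDUCE: for each shape type, add up the values."""
    return {
        shape: sum(value for _, value in group)
        for shape, group in groupby(sorted_pairs, key=lambda pair: pair[0])
    }


def count_shapes(shapes):
    """Run the three phases one after another."""
    return reduce_phase(sort_phase(map_phase(shapes)))


if __name__ == "__main__":
    sample = ["circle", "square", "circle", "triangle", "circle", "square"]
    print(count_shapes(sample))
```

The sort step matters here because `groupby` only groups neighbouring items, much as the reduce step in MapReduce relies on all values for one key arriving together.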
Data Warehouse Appliances
▪ Expensive dedicated HW
▪ Built for performance
▪ Designed for high volumes (e.g., tens of TB)
▪ High availability
▪ Initially developed using Relational Database Systems like
Oracle, IBM DB2
▪ Designed for modeled and structured data
▪ Business As Usual ways to design, build and deliver
▪ Teradata, Exadata, Netezza, HANA, ... are examples
Hadoop Infrastructure
▪ Uses commodity PCs
▪ Built for extreme scalability
▪ Designed for extreme volumes (10s of PB and more)
▪ Very high availability
▪ Initially developed for web ranking
▪ Hadoop = Data is distributed over many machines
▪ MapReduce = Computing is distributed and executed
where data is (grid solution)
Data Warehouse Appliances vs. Hadoop
“Classical” Data Warehouse (DWH) appliances differ from a Hadoop infrastructure in both their technical basis and the way they are used. This does not mean that DWH appliances are now irrelevant; rather, a combination of both is the basis for being future-ready.
• Data import/export (Flume, Sqoop)
• Libraries, algorithms (Mahout, LZO compression)
• Tools: monitoring, user experience (Hue, Ambari, White Elephant)
• Data stores (HBase, HCatalog)
• Workflow management, job scheduling (Oozie, Cascading)
• Data querying (Hive, Pig, Impala, Drill)
• Cluster provisioning & management (Whirr)
• … many more
The Hadoop ecosystem uses several tools to solve individual tasks. For example, Sqoop and Flume are used to import and export data from/into Hadoop, and Hive is used as a data querying tool.
Most of these tools are bundled into distributions such as Cloudera, Pivotal or Hortonworks to reduce the management overhead for customers. Again, a combination of DWH appliances and Hadoop is the basis for being future-ready.
Hadoop Provides Rich Ecosystems For Tasks
Reporting, Dashboarding: How did our business run?
Ad-hoc Analysis: Why did our business run in this way?
Predictive Analytics: What opportunities and risks do we see in our business?
Data Exploration: Which data describes my business?
Customers Get In Touch w/ Big Data
Customers get in touch with Big Data through: visualization
Visualization
The three types of visualization are as follows:
Reporting, Dashboarding
• Only historical data
• Find answers to the questions:
  • How did our business run over the last X periods?
  • How well did it run?
• Dashboards focus on management visualization
Ad-hoc Analysis
• Find answers to the questions:
  • Why did our business run in this way?
  • What were the key points?
  • Can we find obvious “gaps” in our business?
• Few or no pre-defined reports
• Visualization of data and correlation is important
Predictive Analytics
• Find answers to the questions:
  • What opportunities and risks do we see in our business?
  • How can we classify our customers?
  • How will sales be in the next two weeks?
• Based on predictive algorithms
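As a small illustration of the predictive question "How will sales be in the next two weeks?", the sketch below fits a straight-line trend to past sales and extends it. This is only one very simple predictive algorithm, chosen here as an assumption for illustration; the function names and the sample figures are made up, and real predictive analytics would typically use richer models and validation.

```python
def fit_trend(values):
    """Least-squares straight line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods):
    """Extend the fitted trend line by the given number of future periods."""
    slope, intercept = fit_trend(values)
    n = len(values)
    return [intercept + slope * (n + step) for step in range(periods)]


if __name__ == "__main__":
    weekly_sales = [10, 12, 14, 16]  # hypothetical historical data
    print(forecast(weekly_sales, periods=2))
```

Reporting and ad-hoc analysis would look only at `weekly_sales` itself; the predictive step is the extrapolation beyond the last observed period.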
Leverage The Power Of Hadoop w/ HANA®
SAP promotes Hadoop as THE solution to improve business performance in real time and to leverage the power of Big Data (1).
HANA® (High Performance ANalytic Appliance) is an SAP product which allows for rapid analysis of large amounts of data in real time.
Using Hadoop with HANA® allows users to take advantage of powerful in-memory analysis, as well as to gain insights from previously untapped data (machine sensors, geo-information, social media, etc.) and mine the new black gold (2).
1) http://www.sap.com/solution/big-data/software/platform.html
2) http://www.wired.com/2013/02/is-big-data-the-new-black-gold/
[Figure: sources (existing: ERP, CRM, logs; emerging: sensors, geo, unstructured), the data system (HANA), and applications (non-SAP, enterprise applications, mobile)]
SAP HANA® & Hadoop Integration
Hadoop can be integrated into an SAP HANA® system to complement the power of in-memory computing and the flexibility of SAP HANA® with easy-to-use, cost-efficient storage.
4 Main Uses For Hadoop With SAP HANA®
• Data Analytics: Mining data held in Hadoop for business intelligence & analytics.
• Flexible Data Store: Using Hadoop as a flexible store of data captured from multiple sources, including SAP and non-SAP software, enterprise software & externally sourced data.
• Simple Database: Using Hadoop as a simple database for storing & retrieving data in very large data sets.
• Processing Engine: Using the computation engine in Hadoop to execute business logic or other business processes.
Telecommunications: Data traffic, retail patterns, geo-location data...
Utilities: Smart meter, consumer behavior, network loads.
Cities: People movement, emissions, produce flows, demographics.
Transportation: Product flow, route optimization, hazard location.
Business Use Cases Across All Sectors
Have Additional Questions?
Want To Set Up A Consultation?
Email: info@blackvard.com
Require A Consultation?
• Technical project lead and ABAP architect responsible for quality, technical scope and budget in a global roll-out of SAP Logistics applications (SAP LE / LO)
• Conducted multiple SAP ABAP and SAP HANA® trainings for various US companies
• Implemented a standard SAP software solution for Spend Management within SAP AG & ARIBA (annual spend volume EUR 3 billion) which can be used in all SAP systems
• Improved claims management using SAP FS-CM, which generates annual savings of EUR 15 million for a large German public healthcare organization
• Implemented a global solution for procurement processes at BMW AG using SAP SRM / B2B
• Blueprinting and implementation of SAP software for banking credit cancellations for VOLKSWAGEN
Key Achievements of Blackvard Management Consulting in Previous Projects
What We’ve Accomplished
Blackvard Management Consultants
Short Bio:
Lukas M. Dietzsch is managing director at Blackvard Management Consulting, LLC. He holds a Master’s degree in Information Technology and is an experienced IT solution architect and project lead.
His strong background in adapting to requirements and standards in different industries and on various platforms is a valuable asset for Blackvard customers.
He is repeatedly commended by customers for driving efficient solutions for complex problems in globally distributed team environments and meeting tough deadlines.
For further information please visit:
www.blackvard.com
Lukas M. Dietzsch
lukas@blackvard.com
Managing Director
An overview of current and previous customers:
Customers That Recommend Blackvard

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...Dataconomy Media
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storagehybrid cloud
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitecturePerficient, Inc.
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data DiscoveryHarald Erb
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 

Was ist angesagt? (20)

Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 

Andere mochten auch

SAP Persistence - Creating Source Code Automatically
SAP Persistence - Creating Source Code AutomaticallySAP Persistence - Creating Source Code Automatically
SAP Persistence - Creating Source Code AutomaticallyBlackvard
 
Scrum vs Kanban
Scrum vs KanbanScrum vs Kanban
Scrum vs KanbanBlackvard
 
Structuring An ABAP Report In An Optimal Way
Structuring An ABAP Report In An Optimal WayStructuring An ABAP Report In An Optimal Way
Structuring An ABAP Report In An Optimal WayBlackvard
 
Introduction Into SAP Fiori
Introduction Into SAP FioriIntroduction Into SAP Fiori
Introduction Into SAP FioriBlackvard
 
Agile Software Development with Scrum – Introduction
Agile Software Development with Scrum – IntroductionAgile Software Development with Scrum – Introduction
Agile Software Development with Scrum – IntroductionBlackvard
 
Predictive Analytics 3.1 – Adding a Dataset & Visualization
Predictive Analytics 3.1 – Adding a Dataset & VisualizationPredictive Analytics 3.1 – Adding a Dataset & Visualization
Predictive Analytics 3.1 – Adding a Dataset & VisualizationBlackvard
 
HANA XS Web Service
HANA XS Web ServiceHANA XS Web Service
HANA XS Web ServiceBlackvard
 
How to Create "Hello, World!" in Fiori
How to Create "Hello, World!" in FioriHow to Create "Hello, World!" in Fiori
How to Create "Hello, World!" in FioriBlackvard
 
Introduction to Design Thinking
Introduction to Design ThinkingIntroduction to Design Thinking
Introduction to Design ThinkingBlackvard
 
Consuming Data With HANA XS
Consuming Data With HANA XSConsuming Data With HANA XS
Consuming Data With HANA XSBlackvard
 

Andere mochten auch (10)

SAP Persistence - Creating Source Code Automatically
SAP Persistence - Creating Source Code AutomaticallySAP Persistence - Creating Source Code Automatically
SAP Persistence - Creating Source Code Automatically
 
Scrum vs Kanban
Scrum vs KanbanScrum vs Kanban
Scrum vs Kanban
 
Structuring An ABAP Report In An Optimal Way
Structuring An ABAP Report In An Optimal WayStructuring An ABAP Report In An Optimal Way
Structuring An ABAP Report In An Optimal Way
 
Introduction Into SAP Fiori
Introduction Into SAP FioriIntroduction Into SAP Fiori
Introduction Into SAP Fiori
 
Agile Software Development with Scrum – Introduction
Agile Software Development with Scrum – IntroductionAgile Software Development with Scrum – Introduction
Agile Software Development with Scrum – Introduction
 
Predictive Analytics 3.1 – Adding a Dataset & Visualization
Predictive Analytics 3.1 – Adding a Dataset & VisualizationPredictive Analytics 3.1 – Adding a Dataset & Visualization
Predictive Analytics 3.1 – Adding a Dataset & Visualization
 
HANA XS Web Service
HANA XS Web ServiceHANA XS Web Service
HANA XS Web Service
 
How to Create "Hello, World!" in Fiori
How to Create "Hello, World!" in FioriHow to Create "Hello, World!" in Fiori
How to Create "Hello, World!" in Fiori
 
Introduction to Design Thinking
Introduction to Design ThinkingIntroduction to Design Thinking
Introduction to Design Thinking
 
Consuming Data With HANA XS
Consuming Data With HANA XSConsuming Data With HANA XS
Consuming Data With HANA XS
 

Ähnlich wie Introduction To Big Data & Hadoop

Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Joan Novino
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Vantara
 

Ähnlich wie Introduction To Big Data & Hadoop (20)

Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 

Introduction To Big Data & Hadoop

  • 1. Blackvard Management Consulting Introduction to Big Data & Hadoop Copyright © Blackvard Management Consulting – All rights reserved www.blackvard.com
  • 2. Agenda: What Will Be Covered: 1. What Is Big Data? 2. Business Intelligence 3. Big Data Analytics 4. Existing Database Technology 5. What is Hadoop? 6. Data Warehouse Appliances vs. Hadoop 7. Hadoop & SAP HANA 8. Business Use Cases
  • 3. Is “Big Data” Simply “Too Much Data?” Is the term “Big Data” just about “big?” Big Data is often called the “new black gold”, with a lot of undiscovered insights. http://dilbert.com/strip/2006-11-11
  • 4. Big Data Defined. The 3 Vs of Big Data: Volume (tera- and petabytes; transactions; tables, files), Variety (structured, semistructured, unstructured), Velocity (batch, (near) real-time, streams). Source: Philip Russom: BIG DATA ANALYTICS – TDWI Best Practice Report. The term “BIG DATA” is defined by the steadily increasing need for VARIETY, VOLUME, and processing VELOCITY of available data. Big Data is about 3 “V’s”: Volume: massive amounts of data to process. Velocity: the speed at which the data comes into the system. Variety: the variety of structuredness increases.
  • 5. Business Intelligence & Variety. VARIETY: Most data is unstructured. [Quadrant diagram: structured vs. unstructured, internal vs. external. Data examples: partner data, reference data, CRM, ERP, production, finance, HR, procurement, machine sensor data, etc.; documents, email, contact center calls, presentations, security images, medical scans; social media content, channel content. Tools: traditional BI; BI + data connections; search, ECM; social media monitoring tools.] In Business Intelligence (BI) systems, data is mostly internal and structured. With the rise of social media content, digitalization, and global supply chains, requirements shift toward supporting a broadening variety of structuredness. Business Intelligence is the set of techniques and tools required for the transformation of raw data into meaningful and useful information for business analysis purposes.
  • 6. Big Data Analytics. Big Data Analytics platforms can be classified into four major categories:
      1) Analytical databases: software-only analytical platforms; mostly massively parallel processing (MPP), columnar, and in-memory databases
      2) Analytical appliances: tightly integrated hardware-software combinations; analytical bundles (standalone SW + HW combinations)
      3) Analytical services: systems are hosted in an off-site environment or public cloud
      4) File-based analytical systems (main focus): Hadoop; NoSQL (although it is not file-based in the common sense)
  • 7. Existing Database Technology. Several platforms embrace existing database technologies in order to optimize analytical applications on large data volumes.
      Technology | Description | Vendor / Product
      Massively parallel processing (MPP) | Row-based databases designed to scale out on a cluster of commodity servers. Also known as “shared-nothing” architecture | Teradata Active Data Warehouse, Greenplum (EMC), Microsoft Parallel Data Warehouse, Aster Data (Teradata), Kognitio
      Columnar databases | DBMS that store data in columns, not rows. Support high data compression and analytical query performance | Sybase IQ (SAP), ParAccel, Infobright, Vertica (HP), 1010data
      Analytical appliances | Pre-configured hardware-software systems | Netezza (IBM), Teradata Appliances, Oracle Exadata, Greenplum Data Computing Appliance (EMC)
      In-memory databases | Systems load data into memory to execute complex queries | SAP HANA, Cognos TM1 (IBM), QlikView, Membase
      Distributed file-based systems | Systems designed for storing, manipulating, and querying large volumes of unstructured and semi-structured data | Hadoop (Apache, Cloudera, MapR, IBM, Hortonworks), Apache Hive, Apache Pig
      Analytical services (cloud) | Analytical platforms delivered as hosted or public-cloud-based services | 1010data, Kognitio
      Nonrelational (NoSQL) | Nonrelational databases optimized for querying unstructured and structured data | MongoDB, Apache Cassandra, Apache HBase
      Complex Event Processing (CEP) | Systems optimized for calculation and correlation of large volumes of discrete events and application of conditions | IBM, Tibco, Streambase, Sybase (Aleri), Informatica
      Source: Wayne Eckerson: BIG DATA ANALYTICS: PROFILING THE USE OF ANALYTICAL PLATFORMS IN USER ORGANIZATIONS
  • 8. What is Hadoop?
      History:
      - Google published a paper describing a MapReduce algorithm for processing large amounts of data
      - Doug Cutting, who worked at Yahoo, read that paper and initiated Hadoop
      - Hadoop was the name of his son's yellow toy elephant
      - Hadoop became an Apache top-level project, which is supported by Facebook, IBM, and Yahoo, among others
      Facts:
      - Open source project
      - Written in Java
      - Optimized to handle massive amounts of data through parallelism, using inexpensive commodity hardware, and a variety of data (structured, unstructured, semi-structured)
      - Great performance (on large data volumes)
      - Reliability provided through replication
      - Not for OLTP, not for OLAP, good for Big Data (1)
      (1) OLTP: Online Transaction Processing (CRM, ERP). OLAP: Online Analytical Processing (data mining, complex queries over multidimensional data)
  • 9. Hadoop Core. Hadoop consists mainly of two components:
      - Hadoop Distributed Filesystem (HDFS): HDFS stores data on several nodes in the cluster, with the goal of providing greater bandwidth across the cluster as well as higher reliability. [Diagram: File1.txt is split into Block A, Block B, and Block C, and copies of the blocks are spread across Data Nodes 1 to 4.]
      - Hadoop MapReduce: a computational paradigm called Map/Reduce, which takes an application and divides it into multiple fragments of work, each of which can be executed on any node in the cluster. [Diagram: MAP: give every shape the value of 1. SORT: sort the shapes. REDUCE: for each shape type, count the values.] http://mohamednabeel.blogspot.de/2011/03/starting-sub-sandwitch-business.html
  • 10. Data Warehouse Appliances vs. Hadoop. “Classical” Data Warehouse (DWH) appliances differ from a Hadoop infrastructure in their technical basis and in how they are used. This does not mean that DWH appliances are now irrelevant; rather, a combination of both is the basis for being future ready.
      Data Warehouse Appliances:
      - Expensive dedicated HW
      - Built for performance
      - Designed for high volumes (e.g., tens of TB)
      - High availability
      - Initially developed using relational database systems like Oracle, IBM DB2
      - Designed for modeled and structured data
      - Business-as-usual ways to design, build, and deliver
      - Teradata, Exadata, Netezza, HANA, ... are examples
      Hadoop Infrastructure:
      - Uses commodity PCs
      - Built for extreme scalability
      - Designed for extreme volumes (tens of PB and more)
      - Very high availability
      - Initially developed for web ranking
      - Hadoop = data is distributed over many machines
      - MapReduce = computing is distributed and executed where the data is (grid solution)
  • 11. Hadoop Provides Rich Ecosystems For Tasks. The Hadoop ecosystem uses several tools to solve individual tasks. For example, Sqoop and Flume are used to import and export data from/into Hadoop, and Hive is used as a data querying tool. Most of these tools are combined into distributions such as Cloudera, Pivotal, or Hortonworks to reduce the management overhead for customers. Again, a combination of both is the basis for being future ready.
      - Data import/export (Flume, Sqoop)
      - Libraries, algorithms (Mahout, LZO compression)
      - Tools for monitoring and user experience (Hue, Ambari, White Elephant)
      - Data stores (HBase, HCatalog)
      - Workflow management, job scheduling (Oozie, Cascading)
      - Data querying (Hive, Pig, Impala, Drill)
      - Cluster provisioning & management (Whirr)
      - … many more
  • 12. Customers Get In Touch w/ Big Data. Customers get in touch with Big Data through visualization: Reporting, Dashboarding (How did our business run?); Ad-hoc Analysis (Why did our business run in this way?); Predictive Analytics (What chances and risks in business do we see?); Data Exploration (Which data describes my business?).
  • 13. Visualization. The three types of visualization are as follows:
      - Reporting, Dashboarding: only historical data; find answers to the questions: How did our business run the last X periods? How well did it run? Dashboards focus on management visualization.
      - Ad-hoc Analysis: find answers to the questions: Why did our business run in this way? What were the key points? Can we find obvious “gaps” in our business? No or fewer pre-defined reports; visualization of data and correlation is important.
      - Predictive Analytics: find answers to the questions: What chances and risks in business do we see? How can we classify our customers? How will sales be in the next two weeks? Based on predictive algorithms.
  • 14. Leverage The Power Of Hadoop w/ HANA®. SAP promotes (1) Hadoop as THE solution to improve business performance in real time and to leverage the power of Big Data. HANA® (High Performance ANalytic Appliance) is an SAP product that allows for rapid analysis of large amounts of data in real time. Using Hadoop with HANA® allows users to take advantage of powerful in-memory analysis, gain insights from undiscovered data (machine sensors, geo-information, social media, etc.), and mine the new black gold (2). 1) http://www.sap.com/solution/big-data/software/platform.html 2) http://www.wired.com/2013/02/is-big-data-the-new-black-gold/
  • 15. SAP HANA® & Hadoop Integration. Hadoop can be integrated into an SAP HANA® system to extend the power of in-memory computing and the flexibility of SAP HANA® with easy-to-use, cost-efficient storage. [Diagram labels: Existing Sources (ERP, CRM, Logs); Emerging Sources (Sensors, Geo, Unstructured); Sources; Data System; HANA; Applications; NON-SAP; Enterprise Applications; Mobile.]
  • 16. 4 Main Uses For Hadoop With SAP HANA®. [Diagram: the sources, data system, and applications landscape from the previous slide, annotated with the numbers 1 to 4.]
      1. Data Analytics: mining data held in Hadoop for business intelligence & analytics.
      2. Flexible Data Store: using Hadoop as a flexible store of data captured from multiple sources, including SAP and non-SAP software, enterprise software & externally sourced data.
      3. Simple Database: using Hadoop as a simple database for storing & retrieving data in very large data sets.
      4. Processing Engine: using the computation engine in Hadoop to execute business logic or other business processes.
  • 17. Business Use Cases Across All Sectors. Telecommunications: data traffic, retail patterns, geo-location data... Utilities: smart meter, consumer behavior, network loads. Cities: people movement, emissions, produce flows, demographics. Transportation: product flow, route optimization, hazard location.
  • 18. Require A Consultation? Have Additional Questions? Want To Set Up A Consultation? Email: info@blackvard.com
  • 19. What We’ve Accomplished. Key Achievements of Blackvard Management Consulting in Previous Projects:
      - Technical project lead and ABAP architect responsible for quality in technical scope and budget in a global roll-out of SAP Logistics applications (SAP LE / LO)
      - Conducted multiple SAP ABAP and SAP HANA® trainings for various US companies
      - Implemented a standard SAP software solution for Spend Management within SAP AG & ARIBA (annual spend volume of EUR 3 billion), which can be used in all SAP systems
      - Improved claims management using SAP FS-CM, generating annual savings of EUR 15 million for a large German public healthcare organization
      - Implemented a global solution for procurement processes at BMW AG using SAP SRM / B2B
      - Blueprinting and implementation of SAP software for banking credit cancellations for VOLKSWAGEN
  • 20. Lukas M. Dietzsch, Managing Director, Blackvard Management Consultants, lukas@blackvard.com. Short Bio: Lukas M. Dietzsch is managing director at Blackvard Management Consulting, LLC. He holds a Master’s degree in Information Technology and is an experienced IT solution architect and project lead. His strong background in adapting to requirements and standards in different industries and on various platforms is a valuable asset for Blackvard customers. He is repeatedly commended by customers for driving efficient solutions to complex problems in globally distributed team environments and for meeting tough deadlines. For further information please visit: www.blackvard.com
  • 21. Customers That Recommend Blackvard. An overview of current and previous customers.
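
The Map/Reduce flow described on slide 9 (give every shape the value 1, sort the shapes, then count the values for each shape type) can be sketched in a few lines of Python. This is only a minimal single-process illustration of the idea, not Hadoop's actual API: the function names and the sample input are assumptions made for this example. In a real Hadoop cluster, the map and reduce steps would run in parallel on the nodes that hold the data blocks.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(shapes):
    """Map: give every shape the value 1."""
    return [(shape, 1) for shape in shapes]


def sort_phase(pairs):
    """Sort: order the pairs by shape so equal shapes end up adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Reduce: for each shape type, add up the values."""
    return {
        shape: sum(value for _, value in group)
        for shape, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def count_shapes(shapes):
    """Run the three phases one after another (hypothetical helper)."""
    return reduce_phase(sort_phase(map_phase(shapes)))


if __name__ == "__main__":
    # Hypothetical input; the shape names are not taken from the slide.
    sample = ["circle", "square", "triangle", "circle", "square", "circle"]
    print(count_shapes(sample))  # {'circle': 3, 'square': 2, 'triangle': 1}
```

Splitting the work into three small functions mirrors the slide: each phase only needs the output of the previous one, which is what allows a framework such as Hadoop to distribute the map and reduce work across nodes.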

Editor's Notes

  1. Is the term “Big Data” just about “big”? Big Data is often called the “new black gold”, with a lot of undiscovered insights.
  2. Big Data is about 3 “V’s”: Volume: the massive amount of data to handle. Velocity: the speed at which the data comes into the system. Variety: the variety of structuredness increases.
  3. In traditional Business Intelligence (BI) systems, data is mostly internal and structured. With the rise of social media content, digitalization, and global supply chains, requirements shift toward supporting a broadening variety of structuredness. Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes.
  4. Big Data Analytics platforms can be classified into four major categories: 1) Analytical Databases, 2) Analytical Appliances, 3) Analytical Services, 4) File-based analytical systems. The focus of these slides is on 4) File-based analytical systems.
  5. “Classical” Data Warehouse appliances differ from a Hadoop infrastructure in their technical basis and in how they are used. That does not mean DWH appliances are no longer needed; a combination of both is the basis for being future ready.
  6. The Hadoop ecosystem uses several tools to solve individual tasks. For example, Sqoop and Flume import and export data from/into Hadoop, and Hive serves as a data querying tool. Most of these tools are combined into distributions such as Cloudera, Pivotal, or Hortonworks to reduce the management overhead for customers.
  7. Hadoop can be integrated into an SAP HANA system to extend the power of in-memory computing and the flexibility of SAP HANA with easy-to-use, cost-efficient storage.
  8. 1) Data analytics: mining data held in Hadoop for business intelligence and analytics. 2) Flexible data store: using Hadoop as a flexible store of data captured from multiple sources, including SAP and non-SAP software, enterprise software, and externally sourced data. 3) Simple database: using Hadoop as a simple database for storing and retrieving data in very large data sets. 4) Processing engine: using the computation engine in Hadoop to execute business logic or some other process.