Big Data and You
May 2015 Edition
Objectives
This document is designed to introduce Big Data
and Analytics. Rather than a deep-dive technical
paper or a detailed product portfolio, it is a
friendly educational presentation (easy and quick
to read) for specialists, architects, PMs and
managers (*). It has one simple goal (though a
complex and time-consuming exercise): if you read
this paper, you learn something, and you then want
more details to become an expert. Yes, You can
Big Data
Table of Contents
1. Introduction
2. Definition
3. BI principles
4. Chronology
5. Hadoop I
6. Hadoop II
7. Hadoop Ecosystem
8. BI vs Big Data
9. Hadoop patterns
10. Hadoop Market
11. BD&A vendors
12. Competition
13. In Memory
14. Streams
15. BigInsights
16. Architecture
17. Positioning
18. Why Power?
19. Contacts
20. New!
Introduction
2012 was the year of the big data marketing buzz, 2013 of big
data technical enablement, and 2014 of the first big data projects.
European customers are now deploying big data (and still
analytics) projects at scale. It is time to become an expert,
to guide our customers, talk with the Big Data ecosystem
and fill the Big Data skills gap.
(*) This paper does not claim to be exhaustive on the Big Data subject, nor is it intended to recommend a precise architecture to architects,
performance and technical details to specialists, or a marketing campaign. It assumes little or no prior knowledge of Big Data.
Author # Christophe.menichetti@fr.ibm.com
IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com
What is Data Analysis? Why Analyse Data?
Analysis of data is a process of inspecting, cleaning, transforming,
and modelling data with the goal of discovering useful information,
suggesting conclusions, and supporting decision-making.
Data analysis has multiple facets and approaches, encompassing
diverse techniques under a variety of names, in different business,
science, and social science domains, such as:
Business Intelligence / Analytics
Data Mining / Predictive Tools
Big Data
Data Integration / Data Visualisation
And so on …
IT technologies and computer science are evolving. Yesterday,
when IBM, Honeywell, Sperry, ICL, Xerox, Digital or Olivetti were
the IT leaders, CPU and memory were the key differentiators.
Today, when IBM, Google, SAP and Oracle are the IT leaders, the
ultimate differentiator is being able to make more informed
choices with confidence, to anticipate and shape business
outcomes.
As company and industry leaders, you absolutely need deeper
insight from your information to beat your competitors:
• Which customers are thinking of leaving?
• Which transactions are fraudulent?
• Which life-threatening conditions can be detected in time to intervene?
Let’s make it simpler – An example
Analytics = transforming
data into (sexy)
information to make
(intelligent) decisions
Weather forecast: you have to decide
which boots to take to go to Paris.
You are not an expert at all (temperature,
pressure, cyclone = RAW data), but you
can decide based on the weather map
(report/analysis)
!message : Data is the new oil requiring Mining, Refining and Delivering
Definition
What is Business Intelligence ?
Business analytics (BA) refers to the skills, technologies and
practices for continuous, iterative exploration and investigation of
past business performance to gain insight and drive business
planning.
Business analytics focuses on developing new insights and
understanding of business performance based on data and
statistical methods.
In contrast, business intelligence (BI) traditionally focuses on
using a consistent set of metrics to both measure past
performance and guide business planning, which is also based on
data and statistical methods.
What is Big Data?
Big Data is a broad term for data sets so large or complex that they
are difficult (or too expensive) to process using traditional data
processing applications. Challenges include analysis, capture, curation,
search, sharing, storage, transfer, visualization, and information
privacy.
!message : Big Data creates new opportunities to extend Analytics for higher value
4th V: Value
5th V: Veracity
OLTP versus OLAP
BI reference Architecture
Reporting solutions
Display data in either a synthesized or a
detailed view that is easy for the end user
to understand (data mining: discovering
interesting or useful patterns and
relationships in large volumes of
data, analyzing the past to predict
the future)
Data warehouse
Central database in which data are
stored and can be restructured to
answer business needs.
ETL (Extract, Transform, Load)
Unifies data from heterogeneous data
sources (extracting the useful data)
and consolidates them into a single
destination database (cleansing and
modifying the data according to the
desired output)
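The ETL steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources (a CSV export and a list of web-shop records), their field names, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "customer,country,revenue\nAlice,fr,100.5\nBob,FR,\nCarol,de,80\n"

# Hypothetical source 2: web-shop records with different field names.
SHOP_ROWS = [{"name": "Dave", "ctry": "DE", "amount": "40"}]


def extract():
    """Extract: read rows from both heterogeneous sources."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm_rows, SHOP_ROWS


def transform(crm_rows, shop_rows):
    """Transform: unify field names, cleanse values, drop unusable rows."""
    unified = []
    for row in crm_rows:
        if row["revenue"]:  # cleansing: skip rows without a revenue value
            unified.append((row["customer"], row["country"].upper(), float(row["revenue"])))
    for row in shop_rows:
        unified.append((row["name"], row["ctry"].upper(), float(row["amount"])))
    return unified


def load(rows):
    """Load: consolidate everything into one destination table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, country TEXT, revenue REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def run_etl():
    crm_rows, shop_rows = extract()
    return load(transform(crm_rows, shop_rows))
```

Once loaded, the consolidated table can be queried like any warehouse table, for example with `SELECT country, SUM(revenue) FROM sales GROUP BY country`.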
Good to know!
People very often associate BI with reporting/data mining tools, because this is the “visible” part of the iceberg. But this is a
misnomer: BI refers to the full set of tools, such as reporting, data warehouse and ETL. For your information, ~70% of the costs and
effort in BI projects goes into the data warehouse, the most important (but hidden) part of the “iceberg”.
Star Schema
Optimized for SQL read requests. Fact
table (metrics of the reports) in the middle,
surrounded by dimension tables (axes of the reports)
= On-Line Analytical Processing (OLAP)
3NF Schema
Third normal form: optimized for flexibility and storage
space savings = On-Line Transactional
Processing (OLTP)
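To make the star schema concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_date`, `dim_product`) and the sample figures are invented for illustration; a real warehouse would have many more dimensions and rows.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: one fact table surrounded by two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        -- Fact table: one row per sale, the metric plus a key to each dimension.
        CREATE TABLE fact_sales  (date_id INTEGER, product_id INTEGER, amount REAL);

        INSERT INTO dim_date VALUES (1, 2014), (2, 2015);
        INSERT INTO dim_product VALUES (10, 'boots'), (20, 'umbrellas');
        INSERT INTO fact_sales VALUES
            (1, 10, 50.0), (2, 10, 70.0), (2, 20, 30.0), (2, 20, 20.0);
    """)
    return conn


def sales_by_year_and_category(conn):
    """Typical OLAP read: aggregate the fact metric along two dimensions."""
    return conn.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_id = f.date_id
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """).fetchall()
```

Every report query has the same shape: join the fact table to the dimensions it needs, then group by the dimension attributes. That regularity is what makes the layout read-friendly.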
How does Analytics work ? What does OLAP mean ?
!message : BI/Analytics is the way to transform raw data into decision/information
First steps - 1950s
IBM Journal article: "A Business Intelligence System" (Hans Peter Luhn)
Birth of the term “Business intelligence”
First tools for automatic methods, providing alert services (for scientists)
1970
First MIS solutions – Management Information System
Static, inflexible
No analysis features
1980
First EIS software – Executive Information System
More sophisticated MIS: simulations, reports, forecasts
1990
The BI concept is officially formalized by Howard Dresner, Gartner Group analyst
Birth of Business Performance Management (BPM / EPM)
2005 – 2010
Strong BI market consolidation – major IT acquisitions
Oracle acquired Siebel (Report – 6B$), Hyperion (EPM – 4B$), Sunopsis (ETL – 1B$)
SAP acquired Business Objects (Report – 7B$), Sybase (DW – 6B$), Fuzi (ETL)
IBM bought Cognos (Report – 5B$), Netezza (DW – 2B$), Ascential (ETL – 1B$)
Yahoo and Google faced terrible performance issues with the DW architecture – need
to rethink the data analysis approach – birth of Hadoop
2012 and beyond
Birth of Big Data
A little bit of history ?
!message : Analytics has evolved from business initiative to business imperative
Why Hadoop ?
1. Performance issue: consider that over the past decade:
- CPU speed has increased 8 to 10 times
- DRAM speed has increased 7 to 9 times
- Network speed has increased 100 times
- Bus speed has increased 8 to 10 times
- Hard disk drive speed has increased ONLY 1.2 times
2. NoSQL: Not Only SQL
A mechanism for storage and retrieval of data that is modeled by means other than
the tabular relations used in relational databases.
Motivations for this approach include simplicity of design, horizontal
scaling, finer control over availability and, most importantly, COST
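As a rough illustration of "non-tabular model plus horizontal scaling", the sketch below stores schema-free documents by key and spreads them across several shards with a hash of the key. `ShardedStore` is a hypothetical toy class, not the API of any real NoSQL product; in a real system each shard would live on a separate machine.

```python
import hashlib


class ShardedStore:
    """Toy key/document store partitioned across in-memory shards."""

    def __init__(self, n_shards):
        # Each dict stands in for one node of a cluster.
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the document.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, document):
        # Documents are free-form dicts: no fixed table schema is required.
        self._shard_for(key)[key] = document

    def get(self, key):
        return self._shard_for(key).get(key)
```

Two documents with different fields can sit side by side, for example `store.put("user:1", {"name": "Alice", "tags": ["vip"]})` and `store.put("user:2", {"name": "Bob", "last_login": "2015-05-01"})`. Adding capacity means adding shards rather than buying a bigger server.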
!message : Hadoop meets the need for new scalable architectures providing business
efficiency and flexibility over the existing relational data model
How does it work ?
Apache Hadoop is a set of algorithms (an open-source software framework written in Java) for distributed storage and distributed processing of very large data sets
(Big Data) on computer clusters built from commodity hardware.
The core of Apache Hadoop consists of a storage part (Hadoop Distributed File System (HDFS)) and a processing part (MapReduce). Hadoop splits files into large
blocks and distributes the blocks amongst the nodes in the cluster. To process the data, Hadoop Map/Reduce transfers code (specifically Jar files) to nodes that
have the required data, which the nodes then process in parallel.
This approach takes advantage of data locality to allow the data to be processed faster and more efficiently via distributed processing than by using a more
conventional supercomputer architecture that relies on a parallel file system where computation and data are connected via high-speed networking
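The map/shuffle/reduce flow can be illustrated with the classic word count. The sketch below is plain single-process Python, not Hadoop code: the function names are illustrative, and each input string stands in for one HDFS block. On a real cluster, the map step would run in parallel on the nodes that hold each block.

```python
from collections import defaultdict


def map_phase(block):
    """Mapper: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key across every mapper output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(blocks):
    # Each map_phase call is independent, which is what lets Hadoop
    # ship the code to the data and run the mappers in parallel.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))
```

For example, `word_count(["big data and you", "big data big value"])` counts "big" three times and "data" twice.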
Would you like to sound like an expert?
HDFS default replication: 3x. HDFS default block size: 128 MB. HDFS sits on top of a native Linux filesystem (ext4, ext3). Slave nodes: HDFS
(= data node), MapReduce (= task tracker). Master nodes: HDFS (= name node), MR (= job tracker). The secondary name node performs periodic checkpoints of the name node metadata; it is not a hot standby.
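The block size and replication defaults quoted above translate directly into storage arithmetic. The helper below is a back-of-the-envelope sketch (the function name is hypothetical) and ignores metadata overhead; it assumes the last block only occupies the space it actually needs.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size quoted above
REPLICATION = 3      # HDFS default replication factor quoted above


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how a single file is split and replicated in HDFS."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                              # logical blocks of the file
        "block_replicas": blocks * replication,        # physical copies on data nodes
        "raw_storage_mb": file_size_mb * replication,  # disk consumed cluster-wide
    }
```

A 1 GB file (1024 MB) is therefore split into 8 blocks, stored as 24 block replicas, and consumes about 3 GB of raw disk across the cluster.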
!message : Volume and Variety challenges have led to the creation of new data
processing : Map Reduce and HDFS
YARN (“Hadoop 2”) decouples resource management and scheduling
capabilities from MapReduce, enabling Hadoop to support more varied processing
approaches/applications (interactive SQL, real-time streaming, batch processing)
Flume was created to allow you to
flow data from a source into your
Hadoop® environment.
ZooKeeper provides a centralized
infrastructure and services that
enable synchronization across a
cluster. ZooKeeper maintains
common objects needed in large
cluster environments like
configuration information,
hierarchical naming space …
HBase is a column-oriented
database management system
that runs on top of HDFS. It is well
suited for sparse data sets, which
are common in many big data use
cases
Some folks at Facebook developed
Hive™, allowing SQL developers to
write Hive Query Language (HQL)
statements that are similar to
standard SQL statements
Oozie simplifies workflow and
coordination between jobs. It
provides users with the ability to
define actions and dependencies
between actions.
Pig, initially developed at Yahoo!,
allows people to focus more on
analyzing large data sets and spend
less time writing mapper and
reducer programs.
Sqoop is a connectivity tool for
moving data from non-Hadoop
data stores – such as relational
databases and data warehouses –
into Hadoop
Mahout takes the most popular data mining algorithms
for performing clustering, regression testing and
statistical modeling and implements them using the
Map Reduce model
Ambari is a web-based set
of tools for deploying,
administering and
monitoring Apache Hadoop
clusters
!message : The HDFS file system is not restricted to MapReduce jobs. It can be used
for other applications, many of which are under development at Apache
Different Approaches
Don’t get us wrong: there is no bad approach or
good approach, and there is no magical approach.
There are different approaches, for different
needs and results.
With the BI approach, Business Users determine what
question to ask (business hypothesis) and the IT team
structures the data (specific data selected into the
data warehouse) to answer the question.
With the Big Data approach, IT delivers a platform
(with all the data) that enables creative discovery, and
Business Users explore what questions could be
asked.
Different Architectures
BI architecture: the application server and the database
server are separated, with the network in the
middle. Data has to go through the network.
Big Data architecture: the analysis program runs
where the data is. Functions have to go through
the network. This is highly scalable and flexible by
design.
Different Objectives
Hadoop is one of the multiple facets of Big Data.
This facet (Hadoop) is designed to run huge
(Volume) “read” batches, at very low cost,
on unstructured data (Variety)
!message : Do not compare apples and oranges : you (still) need both
Technical Hadoop Patterns
Big Data Exploration
Find, visualize, understand all big data to improve decision making
Enhanced 360° View of the Customer
Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
Operations Analysis
Analyze a variety of machine data for improved business results
Data Warehouse Augmentation
Integrate big data and data warehouse capabilities to increase operational efficiency
Security/Intelligence Extension
Lower risk, detect fraud and monitor cyber security in real time
Big Data Business Use Cases
Keep in Mind
The term Big Data is a bit of a misnomer. Big Data does not
refer only to huge volumes of data or to Hadoop: there are
many other patterns, using streams or in-memory solutions
!message : Big Data Analytics are applied Across all Industries, different use cases
Hadoop has been most rapidly adopted by the government, banking, finance, IT and ITES, and insurance sectors.
Geographical analysis of the market seems to suggest that North America is the leading revenue-generating market and will continue to
remain so till 2020.
Hadoop hardware-based solution providers have been the highest receivers of venture capital funding. Recent times have witnessed a steep
demand for real-time, operational analytics.
!message : In the 1990s, better-performing hardware was the differentiator for companies
to compete. Nowadays big data is the key competitive differentiator
Sources: Hortonworks study (2014); Wikibon figures (2013)
The market for Big Data &
Analytics solutions has
exploded
The race is hot and complex:
• Every vendor is jumping in
• Alternatives from everywhere
• Startups proliferate
• Partnerships
No other vendor has what IBM
has:
– Software / Hardware
– Services / Research
– Cloud, Mobile, Social
Yet just having ‘everything’
does not make for a market
leader
Based primarily on the 2012 Wikibon report/forecast http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017
!message : The race is hot, Every vendor is jumping in, Alternatives from everywhere,
Startups proliferate, how do we differentiate in such a crowded market?
4 major distributions of Hadoop have spawned ecosystems of partners developing data
management and analytic solutions for Big Data
!message : IBM is a global Big Data and Analytics leader, with the industry’s most comprehensive,
enterprise-class solutions and the broadest portfolio
In-Memory - good timing for an old idea
Largely driven by the big data phenomenon, in-memory computing is a powerful,
transformative IT trend to meet high-performance analytics expectations and
data visualization needs. An in-memory solution should not be confused with a
conventional DBMS storing data in disk blocks cached in memory.
“In-memory” database technology has been around for over a decade.
Traditionally, in-memory technology was used in a limited number of operational
application workloads (FSS trading, telco billing, HPC, embedded devices), but
2011 was an inflection point, with increased focus and ‘push’ by SAP.
With an in-memory database, all information is initially loaded into memory. This
can reduce or eliminate the need for optimized databases, indexes, aggregates and the design of
cubes and star schemas.
Column Based Technology
The arrival of column-centric databases, which store similar information
together, allowed data to be stored more efficiently, with greater compression
and faster read access, reducing the amount of memory needed to
perform a query and increasing processing speed. That is why column-based
technology is very often associated with in-memory technology.
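The column-based idea can be sketched in plain Python. The example pivots a handful of row records into columns, applies run-length encoding (one simple compression scheme, chosen here for illustration and not necessarily what any given product uses), and shows that an aggregate only needs to touch one column. The data and function names are invented.

```python
from itertools import groupby

# Row-oriented records: (country, product, amount).
ROWS = [
    ("FR", "boots", 50),
    ("FR", "boots", 70),
    ("FR", "umbrella", 30),
    ("DE", "boots", 20),
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(column):
    """Compress runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def column_sum(columns, index):
    """An aggregate reads only the single column it needs."""
    return sum(columns[index])
```

Because similar values sit next to each other in a column, the country column above shrinks from four entries to two pairs, and summing the amounts never reads the country or product values at all.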
Volume: as users and data increase, the RAM needed also increases = hardware costs
Velocity: real-time analytics, operational analytics
!message : Big Data analytics can benefit from these very large in memory
Systems for velocity (since Memory has become cheaper)
• Deal with terabytes of data each second
• Work with application, sensor and internet data, video/audio
• Deliver insight in microseconds to analytical applications
• Support complex scenarios using C++ or Java code
Streams is tailor-made for companies that need to process huge volumes of data from non-traditional sources,
need results very, very quickly, and want them integrated with existing analytics investments
• Stream computing is a different paradigm. The traditional way (shown on the left) accesses data using
queries that pull the data from a data storage device such as a data warehouse or database, which is still
valid for many requirements
• The new stream computing paradigm brings data to the query: data is pushed or flows through the
analytics. This is required for many new use cases in big data
• Here is a little more on how Streams works and what you can do with it
• Each square represents an operator. The data passes (input stream) through each
operator, where some action is performed on the data (output stream)
• You can fuse data from multiple streams, modify it, annotate it, perform an analytics
operation on it, or classify it.
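The operator idea can be mimicked with Python generators. This is a conceptual sketch only, not IBM Streams code: the operator names (`fuse`, `annotate`, `classify`), the event fields and the threshold are all assumptions made for illustration. Each operator consumes an input stream and yields an output stream, so events flow through the pipeline one at a time instead of being stored and queried later.

```python
import heapq


def annotate(stream, source_name):
    """Operator: tag each event with the name of the stream it came from."""
    for event in stream:
        yield {**event, "source": source_name}


def fuse(*streams):
    """Operator: merge several time-ordered streams into one, by timestamp."""
    return heapq.merge(*streams, key=lambda event: event["ts"])


def classify(stream, threshold):
    """Operator: label each event as it flows past."""
    for event in stream:
        label = "hot" if event["value"] > threshold else "normal"
        yield {**event, "label": label}


def run_pipeline(sensor_a, sensor_b, threshold=30):
    """Wire the operators together: annotate -> fuse -> classify."""
    fused = fuse(annotate(iter(sensor_a), "a"), annotate(iter(sensor_b), "b"))
    return classify(fused, threshold)
```

Nothing is computed until a consumer pulls from the final stream, for instance with `for event in run_pipeline(a, b): ...`, which mirrors the "data flows through the analytics" picture.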
!message : Velocity challenges have led to the creation of a new data computing paradigm
and solution: streaming, to deliver effective real time within microseconds
Hadoop is an open-source implementation and, although it is very well maintained and does the “job”, relying on it
implies a risk for companies. As with Linux, major IT companies provide Hadoop distributions.
IBM took Hadoop and ruggedized it for enterprises, adding enterprise features such as
performance, resilience and IBM experience (BigSheets, BigSQL, GPFS…) while remaining 100% compliant with open
standards. We call it BigInsights, running on x86, Power Systems and Mainframe (Linux).
2 editions: Basic Edition (100% open source, free) and Enterprise Edition
BigSheets – a big data visualization capability that enables end users to collect, explore and
uncover actionable insights through a commonly understood spreadsheet experience
(drag and drop, clicks, without any Java or Hadoop skills)
Adaptive MapReduce – an already proven product from Platform Computing (HPC acquisition),
rewriting the MapReduce paradigm in C++ (no garbage collection, faster memory management),
allowing:
• Optimized shuffle, map sort
• Separate resource management and job scheduling
• Shared memory leveraged across JVMs, eliminating data movement
BigSQL – SQL on Hadoop is challenging (wide variety of data, MR is batch oriented). BigSQL provides
native, fully compliant SQL access to data stored in BigInsights, real JDBC/ODBC drivers, and
optimization based on a Massively Parallel Processing (MPP) architecture, drawing on DB2 experience
Spectrum Scale – GPFS FPO (File Placement Optimizer) is a scalable, high-performance and highly
reliable product with 20+ years of experience, and has many advantages over HDFS:
• POSIX compliant
• No single point of failure
• Multi-tenant
• HA/DR solutions
IBM BigInsights for Apache Hadoop v4 has
just been released, based on the ODP initiative
Version 3.0 – Enterprise Edition
!message : IBM Hadoop strategy : better analytics tooling that is easier to use +
commitment to Hadoop open source (ODP initiative)
How are leading companies transforming their data and analytics environment
to take advantage of Big Data and provide faster, better insights at reduced
costs within their existing Enterprise Data Warehouses ?
!message : The foundational schematic to bring analytics to all stages in the data
lifecycle can be overlaid with specific products that provide the functions
Systems of Record
Structured data from operational systems
Systems of Engagement
Data that “connects” companies with their customers, partners and employees
Systems of Insight
Diverse data types that combine structured and unstructured data for business insight
In Memory, Hadoop, EDW, Appliance
!message : Transformational benefit / business outcomes come from integration of
new data sources with traditional corporate data to find new insights
Important to keep in mind
Big Data (BigInsights, Cognos, SPSS, …) can run on IBM System z. Customers can take advantage of co-locating business data and OLAP
data, managing high-speed transactions and complex queries for real-time operational analytics on a single integrated platform, and benefit
from the performance, resiliency and quality of service of the IBM mainframe for critical businesses, as many banking and insurance customers do.
!message : The infrastructure is a foundational piece to IBM’s perspective of
delivering capabilities and offerings for BD&A
Hadoop is Linux – Linux is Power
Hadoop is cheap – Power is cheap
Hadoop ecosystem – PowerLinux market acceptance
Power advantages for Big Data
Linux on Power runs the same commands as Linux on x86, and versions are released on the same date
Linux on Power accounts for 17.6% of the top 500 most powerful Linux systems (with 5 in the top 10)
POWER8 increases its performance, reliability and availability lead over Intel, as an alternative to Intel
The OpenPOWER Foundation brings rapid innovation to the Power platform for open Linux
Little-endian support makes porting Linux on x86 applications even easier
The POWER8 design point is big data (more threads, more cache, more bandwidth, CAPI …)
The Intel design point is multiple markets (smartphone, tablet, desktop PC, servers …)
IBM Big Data Resources
WW Competency Centers
Big Data Analytics Links
Web sites
ibm.com/Hadoop
Information Management Acceleration Zone
PowerLinux Big Data
IBM communities
IBM Systems Big Data and Analytics
BDSC practitioner wiki
IBM Analytics Global
Big Data& Analytics Clients References
IBM developerWorks
https://www.ibm.com/developerworks/analytics/
Please, please help us improve this document: if you have any comments or ideas, feel free to send an email
http://bigdatauniversity.com/
http://wikibon.org/wiki/v/Category:Big_Data
http://en.wikipedia.org/wiki/Apache_Hadoop
http://www.slideshare.net/search/slideshow?searchfrom=header&q=big+data
[INFO] Based on 3 years of experience with big data projects, and after many weeks of intensive work compiling several
presentations given to customers or at conferences and synthesizing concepts, this educational paper aims to
clarify some of the concepts and solutions around Big Data, in order to better understand the related challenges and
opportunities. There may still be (many) typing errors, mistakes, misleading words and missing concepts, so please be kind.
NEW AND/OR HOT !!! OPEN DATA PLATFORM
What is Open Data Platform (ODP)?
> It is an open-source, non-profit entity, focused on and committed to evolving the current state of
the platform, and delivering a Foundation-certified, packaged and tested Reference Distribution
Why Open Data Platform (ODP)?
> The current ecosystem is challenged and slowed by fragmented and duplicated efforts. The ODP
Core will take the guesswork out of the process and accelerate many use cases by running on a
common platform, freeing up enterprises and ecosystem vendors to focus on building business-driven
applications.
Where to position ODP vs Apache?
> ODP supports the Apache (ASF) mission
> ASF provides a governance model around
individual projects without looking at the ecosystem
> ODP aims to provide a vendor-led, consistent
packaging model for core Apache components as
an ecosystem
Why is IBM involved in ODP?
> Strong history of leadership in open source & standards: IBM has always been a believer in
standardization of interfaces to components of IT and application infrastructure (SQL, Eclipse,
OpenPower …)
> Supports our commitment to open source currency in all future releases
> Accelerates IBM innovation within Hadoop & surrounding applications
> Expecting Hortonworks, Pivotal distribution adoption on PowerLinux
!message : ODP is clearly a major and strategic choice in the Open community to accelerate
Hadoop adoption and grow the BigInsights and PowerLinux ecosystem / ISV
!message : IBM fundamental cloud strategy : Complete cloud offering, mixed between
control and simplicity.
NEW AND /OR HOT !!! Big Data/Analytics and Cloud
Customer Data Center (On-Premises)
Cloud Data Center (Off-Premises)
SIMPLICITY
CONTROL
PureData for analytics
DB2 BLU
Infosphere Biginsights
Cloudant
DashDB
Softlayer
Cloudant
DashDB
Cloudant:
Distributed NoSQL “Data Layer”, powering Web, mobile, & IoT since 2009
Available as a fully-managed DBaaS, managed by you on-premises, or hybrid
Transactional JSON “document” database
Spreads data across data centers & devices
Ideal for apps that require:
> Massive, elastic scalability
> High availability
> Geo-location services
> Full-text search
> Occasionally connected users
dashDB:
Data warehouse and analytics as a service on the cloud
dashDB keeps data warehouse infrastructure out of your way, allowing you to benefit from:
• Next Generation In-Memory
• Columnar
• SIMD Hardware Acceleration
• Actionable Compression
• Support for OLAP SQL extensions
• Connect common 3rd party BI tools
!message : Spark is positioned as a fast and general engine for Big Data. It
generalizes the MapReduce model and is (perhaps) poised to replace MapReduce
NEW AND/OR HOT !!! SPARK
Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based
MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a
cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms.
Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports standalone (native Spark cluster), Hadoop YARN, or Apache
Mesos. For distributed storage, Spark can interface with a wide variety, including Hadoop Distributed File System (HDFS), Cassandra, OpenStack Swift, and Amazon S3.
Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file
system can be used instead; in this scenario, Spark is running on a single machine with one executor per CPU core.
Spark had over 465 contributors in 2014, making it the most active project in the Apache Software Foundation and among Big Data open source projects
!message : From an application point of view, the data lake challenge is to be a unique
and unified data repository, queryable like a black box
NEW AND /OR HOT !!! DATA LAKE ARCHITECTURE
IDC in late 2014 stated “By 2017 unified data platform architecture will
become the foundation of BDA strategy. The unification will occur
across information management, analysis, and search technology.”
• A data reservoir is a data lake that provides data to an
organization for a variety of analytics processing, including:
- Discovery and exploration of data
- Simple ad hoc analytics
- Complex analysis for business decisions
- Reporting
- Real-time analytics
• It is possible to deploy analytics into the data reservoir to
generate additional insight from the data loaded into the data
reservoir.
• A data reservoir manages shared repositories of information for
analytical purposes.
• Each data reservoir repository is optimized for a particular type
of processing:
- Real-time analytics, deep analytics (such as data mining), exploratory
analytics, OLAP, reporting, …
Example – Creating a logical warehouse
Information virtualization hides the complexities of where the
data is located. Here different repositories are being used to
host different workloads, but this complexity is hidden by the
information virtualization layer.
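The logical warehouse idea can be sketched as a thin routing layer in Python. Both classes below are hypothetical and exist only to illustrate the principle: callers name the workload they want and never learn which physical repository answered.

```python
class Repository:
    """One physical store in the reservoir, tuned for a single workload."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def select(self, predicate):
        return [record for record in self.records if predicate(record)]


class VirtualizationLayer:
    """Single entry point that hides where each workload's data lives."""

    def __init__(self):
        self._routes = {}

    def register(self, workload, repository):
        # The workload-to-repository mapping is known only to this layer.
        self._routes[workload] = repository

    def query(self, workload, predicate=lambda record: True):
        # Callers pass a workload name and a filter; routing is invisible to them.
        return self._routes[workload].select(predicate)
```

A repository can be moved or replaced by changing one `register` call, without touching any of the applications that query through the layer.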

Big Data and You: An Introduction

  • 1. Big Data and You 2015 May Edition Objectives This document is designed to introduce big Data and Analytics . Instead of being deep dive technical paper or product portfolio details, friendly educational presentation (easily and quickly read) for specialists, architects, PMs and managers (*). One simple goal (but complex and time consuming exercise): is you read this paper, you learn something and then you would like to get more details to become an expert. Yes, You can Big Data Table of Contents 1. Introduction 2. Definition 3. BI principles 4. Chronology 5. Hadoop I 6. Hadoop II 7. Hadoop Ecosystem 8. BI vs Big Data 9. Hadoop patterns 10. Hadoop Market Introduction 2012 was the big data marketing buzz, 2013 was the big data technical enablement, 2014 was the big data projects. Now European customers are massively deploying big data (and still analytics) projects. It is time to become an expert to guide our customers and talk with Big Data ecosystem to fill the Big Data skills gap (*) This paper doesn’t pretend to be exhaustive on the Big Data subject, nor it is intended to recommend precise and specific architecture for architects, recommend performance and technical details for specialists or marketing campaign. It doesn’t assume, or require any (or few) knowledge of Big Data 11. BD&A vendors 12. Competition 13. In Memory 14. Streams 15. BigInsights 16. Architecture 17. Positioning 18. Why Power ? 19. Contacts 20. New ! Author # Christophe.menichetti@fr.ibm.com
  • 2. # 1 IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com Introduction Definition What is Data Analysis ? Why Analysing Data ? Analysis of data is a process of inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains, such as : Business Intelligence/Analytics Data Mining / predictive Tools Big Data Data integration/ Data visualisation And so on … IT technologies and computer sciences are evolving. Yesterday, when IBM, Honeywell, Sperry, ICL, Xerox,Digital or Olivetti were the IT leaders, CPU and Memory were the key differentiators. Today, when IBM, Google,SAP, Oracle are the IT leaders, the ultimate differentiator is being able to make more informed choices with confidence, to anticipate and shape business outcomes. As company and industry leaders, you absolutely need deeper insight from their information, to beat your competitors : • Which customers are thinking of leaving? • Which transactions are fraudulent? • Detect life-threatening conditions in time to intervene Let’s make it simpler – An example Analytics = transforming data into (sexy) information to make (intelligent) decision Weather Forecast : You should decide which boot you’ll take to go to Paris. You are not expert at all (temperature, pressure, cyclone = RAW data) but you can decide based on weather map (report/analysis) !message : Data is the new oil requiring Mining, Refining and Delivering BI Principles Chronology Hadoop I Hadoop II Big data and You
  • 3. # 2 IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com Definition What is Business Intelligence ? Business analytics (BA) refers to the skills, technologies, practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning. Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence (BI) traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods Big Data is a broad term for data sets so large or complex that they are difficult (or too expensive) to process using traditional data processing applications. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. What is Big Data ? !message : Big Data creates new opportunities to extend Analytics for higher value BI Principles Hadoop I Hadoop IIIntroduction Hadoop Ecosystem Big data and You 4th V: Value 5th V: Veracity For more information/technical details, feel free to contact us
  • 4. OLTP versus OLAP # 3 IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com BI reference Architecture Reporting solutions display data in a either synthesized or detailed view, easy to understand for the end user (data mining: discovering Interesting/useful patterns /relationships in large volumes of data – analyzing the past to predict the future) Data warehouse central database in which data are stored and can be restructured to answer Business needs. ETL Unifies data from heterogeneous data sources (extracting the useful data) Consolidates them into a unique destination database (cleansing, modifying the data according to the desired output) Good to know ! People, very often, associate BI with reporting/data mining tool, because this is the “visible” part of the iceberg. But This is an misnomer, BI refers to the full set of tools, such as Reporting, Data warehouse and ETL. For your information, ~70% of the costs and efforts in BI projects is about the data warehouse, the most important (but hidden) part of the “iceberg”. Star Schema Optimized for SQL read requests. Fact table (metrics of the reports) in the middle, surrounded by dimension tables (Y axis) = On Line Analytical Processing (OLAP) 3NF Schema Optimized for flexibility and storage space savings = On Line Transactional Processing (OLTP) How does Analytics work ? What does OLAP mean ? !message : BI/Analytics is the way to transform raw data into decision/information Definition BI Principles Hadoop IHadoop IIuction Hadoop EcosystemChronology BI vs B Big data and YouAny Analytics Projects/ questions ? Do not hesitate to contact us
  • 5. First steps - early1950 IBM newspaper : Article " A Business Intelligence System" (Hans Peter Luhn) Birth of the wording “Business intelligence” First tools for automatic methods, providing alert services (for scientists) 1970 First MIS solutions – Management Information System Static, non flexible No analysis features 1980 First EIS software – Executive Information System More sophisticated MIS: simulations, report, forecast, 1990 BI concepts, is officially formalized by Howard Dresner, Gartner Group analyst Birth of Business Performance Management (BPM / EPM) 2005 – 2010 BI market strong consolidation – big major IT acquisitions Oracle acquired Siebel (Report - 6B$), Hyperion (EPM- 4B$), Sunopsis (ETL- 1 B$) SAP acquired Business Objects (Report – 7B$), Sysbase (DW – 6B$), Fuzi (ETL), IBM bought Cognos (Report – 5B$), Netezza (DW – 2B$), Ascential (ETL – 1B$) - Yahoo and Google faced terrible performance issues with DW architecture – Need of rethinking data analysis approach – birth of Hadoop 2012 and + Birth of Big data # 4 IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com A little bit of history ? !message : Analytics has evolved from business initiative to business imperative Definition BI Principles Hadoop I Hadoop IIHadoop EcosystemChronology BI vs BigData Hadoop Big data and You
  • 6. IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com Why Hadoop ? 1 2 Performance issue : Consider that over the past decade : - CPU speed performance has increased 8 to 10 times - DRAM speed performance has increased 7 to 9 times - Network speed performance has increased 100 times - Bus speed performance has increased 8 to 10 times - Hard disk drive speed performance has increased ONLY 1.2 times NoSQL: Not Only SQL Mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.  Motivations for this approach include simplicity of design, horizontal scaling, finer control over availability and most importantly COST !message : Hadoop meets the need of new scalable architectures providing a business Efficiency and flexibility over the existing relational data model ciples Hadoop I Hadoop II Hadoop EcosystemChronology BI vs BigData Hadoop Pattern Hadoop Market # 5 Big data and YouWould like to bench/test ? Go to MOP Client Center
  • 7. IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com How does it work ? Apache Hadoop is a set of algorithms (an open-source software framework written in Java) for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. The core of Apache Hadoop consists of a storage part (Hadoop Distributed File System (HDFS)) and a processing part (MapReduce). Hadoop splits files into large blocks and distributes the blocks amongst the nodes in the cluster. To process the data, Hadoop Map/Reduce transfers code (specifically Jar files) to nodes that have the required data, which the nodes then process in parallel. This approach takes advantage of data locality to allow the data to be processed faster and more efficiently via distributed processing than by using a more conventional supercomputer architecture that relies on a parallel file system where computation and data are connected via high-speed networking Would like to appear like an expert ? HDFS default replication : 3 x, HDFS default blocks size = 128 MB, HDFS sits on top of a native Linux filesytem (ext4, ext3), Slave nodes : HDFS (= data node), MapReduce (= task tracker) , Master nodes : HDFS (= name node), MR (= job tracker), secondary name node is for High Availability !message : Volume and Variety challenges have led to the creation of new data processing : Map Reduce and HDFS Hadoop I Hadoop II Hadoop EcosystemChronology BI vs BigData Hadoop Pattern Hadoop Market BD&A # 6 Big data and YouWould like briefing ? Go to MOP Client Center
  • 8. YARN, “the hadoop 2 “ decouples MapReduce's resource management and scheduling capabilities, enabling Hadoop to support more varied processing approaches/applications (interactive SQL, real-time streaming, batch processing) # 7 IBM Montpellier Client Center Christophe.menichetti@fr.ibm.com Flume was created to allow you to flow data from a source into your Hadoop® environment. ZooKeeper provides a centralized infrastructure and services that enable synchronization across a cluster. ZooKeeper maintains common objects needed in large cluster environments like configuration information, hierarchical naming space … HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data sets, which are common in many big data use cases Some folks at Facebook developed Hive™, allowing SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements Oozie simplifies workflow and coordina¬tion between jobs. It provides users with the ability to define actions and dependencies between actions. Pig initially developed at Yahoo! allows people to focus more on analyzing large data sets and spend less time having to write mapper and reducer programs. Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop Mahout takes the most popular data mining algorithms for performing clustering, regression testing and statistical modeling and implements them using the Map Reduce model Ambari is a web-based set of tools for deploying, administering and monitoring Apache Hadoop clusters !message : The HDFS file system is not restricted to MapReduce jobs. It can be used for other applications, many of which are under development at Apache Hadoop II Hadoop Ecosystem BI vs BigData Hadoop Pattern Hadoop Market BD&A Vendors Competition Big data and You
  • 9. Different Approaches
    Don't get us wrong: there is no good or bad approach, and no magical one. There are different approaches for different needs and results. With the BI approach, business users decide what question to ask (the business hypothesis) and the IT team structures the data (specific data selected into a data warehouse) to answer that question. With the Big Data approach, IT delivers a platform holding all the data to enable creative discovery, and business users explore what questions could be asked.
    Different Architectures
    BI architecture: the application server and the database server are separated, with the network in the middle, so the data has to travel across the network. Big Data architecture: the analysis program runs where the data is, so it is the functions that travel across the network. This is highly scalable and flexible by design.
    Different Objectives
    Hadoop is one of the many facets of Big Data. It is designed to run huge (Volume) read-oriented batch jobs on unstructured data (Variety) at very low cost.
    !message : Do not compare apples and oranges: you will (still) need both
    For more information or technical details, feel free to contact us
  • 10. Technical Hadoop Patterns
      • Big Data Exploration: find, visualize and understand all big data to improve decision making
      • Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
      • Operations Analysis: analyze a variety of machine data for improved business results
      • Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
      • Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
    Big Data Business Use Cases
    Keep in Mind
    The term Big Data is a bit of a misnomer. Big data does not refer only to huge volumes of data or to Hadoop; there are many other patterns using streams or in-memory solutions.
    !message : Big Data Analytics is applied across all industries, with different use cases
  • 11. Hadoop has been most rapidly adopted by the government, banking, finance, IT and ITES, and insurance sectors.
    Geographical analysis of the market suggests that North America is the leading revenue-generating market and will remain so until 2020.
    Hadoop hardware-based solution providers have been the highest receivers of venture capital funding. Recent times have witnessed a steep demand for real-time, operational analytics.
    !message : In the 1990s, new high-performance hardware was the differentiator for companies to compete. Nowadays big data is the key competitive differentiator
    Sources: Hortonworks study (2014); Wikibon figures (2013)
  • 12. The market for Big Data & Analytics solutions has exploded
    The race is hot and complex:
      • Every vendor is jumping in
      • Alternatives from everywhere
      • Startups proliferate
      • Partnerships
    No other vendor has what IBM has:
      • Software / Hardware
      • Services / Research
      • Cloud, Mobile, Social
    Yet just having "everything" does not make a market leader.
    Based primarily on the 2012 Wikibon report/forecast: http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017
    !message : The race is hot, every vendor is jumping in, alternatives come from everywhere, startups proliferate. How do we differentiate in such a crowded market?
    Any competitive big data questions? Feel free to contact us
  • 13. Four major distributions of Hadoop have spawned ecosystems of partners developing data management and analytics solutions for Big Data.
    !message : IBM is a global Big Data and Analytics leader, with the industry's most comprehensive, enterprise-class solutions and the broadest portfolio
  • 14. In-Memory: good timing for an old idea
    Largely driven by the big data phenomenon, in-memory computing is a powerful, transformative IT trend for meeting high-performance analytics expectations and data visualization needs. An in-memory solution should not be confused with a conventional DBMS that stores data in disk blocks cached in memory.
    In-memory database technology has been around for over a decade. Traditionally it was used in a limited number of operational workloads (FSS trading, telco billing, HPC, embedded devices), but 2011 was an inflection point, with increased focus and "push" by SAP.
    With an in-memory database, all information is initially loaded into memory. This largely removes the need for optimized databases, indexes, aggregates, and the design of cubes and star schemas.
    Column-Based Technology
    The arrival of column-centric databases, which store similar information together, allowed data to be stored more efficiently, with greater compression and faster read access. This reduces the amount of memory needed to perform a query and increases processing speed. That is why column-based technology is very often associated with in-memory technology.
      • Volume: as users and data increase, the RAM needed also increases, and so do hardware costs
      • Velocity: real-time analytics, operational analytics
    !message : Big Data analytics can benefit from these very large in-memory systems for velocity (since memory has become cheaper)
    Do you need a Big Data Analytics briefing? Come to us in MOP
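    The link between column-based storage, compression and faster reads can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how any particular in-memory database is implemented: the sample records, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding as the compression scheme are all illustrative.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


if __name__ == "__main__":
    # Hypothetical sales records, stored row by row.
    rows = [
        {"customer": "A", "country": "FR", "amount": 120.0},
        {"customer": "B", "country": "FR", "amount": 80.0},
        {"customer": "C", "country": "FR", "amount": 50.0},
        {"customer": "D", "country": "DE", "amount": 200.0},
        {"customer": "E", "country": "DE", "amount": 10.0},
    ]
    columns = to_columns(rows)
    # An aggregate only needs to scan one column, not every full record.
    print(sum(columns["amount"]))
    # Similar values sit next to each other, so they compress well.
    print(run_length_encode(columns["country"]))
```

    Because a query such as a total only touches the `amount` column, less memory has to be read, and because values of the same kind are adjacent, simple encodings shrink them further.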
  • 15. Streams
      • Deal with terabytes of data each second
      • Work with application, sensor and internet data, video and audio
      • Deliver insight in microseconds to analytical applications
      • Support complex scenarios using C++ or Java code
    Streams is tailor-made for companies that need to process huge volumes of data from non-traditional sources, need results very quickly, and want them integrated with existing analytics investments.
      • Stream computing is a different paradigm. The left of the slide shows the traditional way of accessing data: queries pull the data from a data store such as a data warehouse or database. This is still valid for many requirements.
      • The new stream computing paradigm brings the data to the query: data is pushed, or flows, through the analytics. This is required for many new big data use cases.
      • How Streams works: each square on the slide represents an operator. The data (input stream) passes through each operator, which performs some action on it and emits the result (output stream).
      • You can fuse data from multiple streams, modify it, annotate it, classify it, or perform an analytics operation on it.
    !message : Velocity challenges have led to the creation of a new data computing paradigm and solution: streaming, which brings effective real time at microsecond latency
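    The operator model described on this slide (fuse, annotate, classify, with data flowing from one operator to the next) can be sketched with Python generators. This is only an analogy and does not use the IBM Streams API: the operator names `fuse`, `annotate` and `classify`, the event fields `ts` and `value`, the unit label and the threshold are all hypothetical.

```python
import heapq


def fuse(*streams):
    """Fuse several timestamp-ordered streams into a single ordered stream."""
    return heapq.merge(*streams, key=lambda event: event["ts"])


def annotate(stream, unit="C"):
    """Operator: add a unit field to each event as it flows through."""
    for event in stream:
        yield {**event, "unit": unit}


def classify(stream, threshold):
    """Operator: label each event according to a simple threshold."""
    for event in stream:
        label = "alert" if event["value"] > threshold else "normal"
        yield {**event, "label": label}


def pipeline(sensor_a, sensor_b, threshold=30.0):
    """Chain the operators: data is pushed through them one event at a time."""
    return classify(annotate(fuse(sensor_a, sensor_b)), threshold)


if __name__ == "__main__":
    sensor_a = [{"ts": 1, "value": 20.0}, {"ts": 3, "value": 35.0}]
    sensor_b = [{"ts": 2, "value": 31.0}]
    for event in pipeline(sensor_a, sensor_b):
        print(event)
```

    Nothing is stored and then queried here: each event is handled as it arrives and handed straight to the next operator, which is the "data flows through the analytics" idea in miniature.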
  • 16. BigInsights
    Hadoop is an open-source implementation. Although it is well maintained and does the job, relying on it can imply a risk for companies. As with Linux, major IT companies provide Hadoop distributions. IBM took Hadoop and ruggedized it for enterprises, adding enterprise features such as performance, resilience and IBM experience (BigSheets, BigSQL, GPFS, …) while staying 100% compliant with the open standards. We call it BigInsights. It runs on x86, Power Systems and Mainframe (Linux), and comes in 2 editions: Basic Edition (100% open source, free) and Enterprise Edition.
      • BigSheets: a big data visualization capability that lets end users collect, explore and uncover actionable insights through a familiar spreadsheet experience (drag and drop, clicks, no Java or Hadoop skills needed).
      • Adaptive MapReduce: an already proven product from Platform Computing (HPC acquisition) that reimplements the MapReduce paradigm in C++ (no garbage collection, faster memory management), allowing:
          • Optimized shuffle and map sort
          • Separation of resource management from job scheduling
          • Shared memory across JVMs, eliminating data movement
      • BigSQL: SQL on Hadoop is challenging (wide variety of data, MapReduce is batch oriented). BigSQL provides native, fully compliant SQL access to data stored in BigInsights, real JDBC/ODBC drivers, and optimization based on a Massively Parallel Processing (MPP) architecture drawn from DB2 experience.
      • Spectrum Scale: GPFS FPO (File Placement Optimizer) is a scalable, high-performance and highly reliable product with 20+ years of experience. It has many advantages over HDFS:
          • POSIX compliant
          • No single point of failure
          • Multi-tenant
          • HA/DR solutions
    IBM BigInsights for Apache Hadoop v4 has just been released, based on the ODP initiative.
    Version 3.0 - Enterprise Edition
    !message : IBM Hadoop strategy: better analytics tooling that is easier to use + commitment to Hadoop open source (ODP initiative)
  • 17. How are leading companies transforming their data and analytics environment to take advantage of Big Data and provide faster, better insights at reduced cost within their existing Enterprise Data Warehouses?
    !message : The foundational schematic to bring analytics to all stages in the data lifecycle can be overlaid with specific products that provide the functions
    Need customer enablement? Education? Send us an email
  • 18. Systems of Record, Systems of Engagement, Systems of Insight
      • Systems of Record: structured data from operational systems
      • Systems of Engagement: data that "connects" companies with their customers, partners and employees
      • Systems of Insight: diverse data types that combine structured and unstructured data for business insight
    Diagram labels: In Memory, Hadoop, EDW Appliance
    !message : Transformational benefits and business outcomes come from integrating new data sources with traditional corporate data to find new insights
    Need an architecture workshop? Sizing? Send us an email
  • 19. Important to keep in mind
    Big Data (BigInsights, Cognos, SPSS, …) can run on IBM System z. Customers can take advantage of co-locating business data and OLAP data, managing high-speed transactions and complex queries for real-time operational analytics on a single integrated platform. They also benefit from the performance, resiliency and quality of service of the IBM Mainframe for critical businesses, as many banking and insurance customers do.
    Why Power?
      • Hadoop is Linux, Linux is Power
      • Hadoop is cheap, Power is cheap
      • Hadoop ecosystem, PowerLinux market acceptance
    Power advantages for Big Data
      • Linux on Power runs the same commands as Linux on x86, and versions are released on the same dates
      • Linux on Power accounts for 17.6% of the top 500 most powerful Linux systems (with 5 in the top 10)
      • POWER8 extends the performance, reliability and availability lead over Intel, and is an alternative to Intel
      • The OpenPOWER Foundation brings rapid innovation to the Power platform for open Linux
      • Little-endian support makes porting Linux on x86 applications even easier
      • The POWER8 design point is big data (more threads, more cache, more bandwidth, CAPI, …)
      • The Intel design point is multiple markets (smartphones, tablets, desktop PCs, servers, …)
    !message : The infrastructure is a foundational piece of IBM's perspective on delivering capabilities and offerings for BD&A
    Feel free to contact the MOP PowerLinux center for more details
  • 20. IBM Big Data Resources
    WW Competency Centers, Big Data Analytics Links
      • Web sites: ibm.com/Hadoop, Information Management Acceleration Zone, PowerLinux Big Data
      • IBM communities: IBM Systems Big Data and Analytics, BDSC practitioner wiki, IBM Analytics
      • Global Big Data & Analytics Client References
      • IBM developerWorks: https://www.ibm.com/developerworks/analytics/
      • http://bigdatauniversity.com/
      • http://wikibon.org/wiki/v/Category:Big_Data
      • http://en.wikipedia.org/wiki/Apache_Hadoop
      • http://www.slideshare.net/search/slideshow?searchfrom=header&q=big+data
    Please help us improve this document: if you have any comments or ideas, feel free to send an email.
    [INFO] This educational paper is based on 3 years of experience in big data projects and many weeks of intensive work compiling and synthesizing presentations given to customers and at conferences. Its objective is to clarify some of the concepts and solutions around Big Data, in order to better understand the related challenges and opportunities. There may be (many) typing errors, mistakes, misleading words and missing concepts, so please be kind.
    If we cannot help you directly, we'll point you to the right person
  • 21. NEW AND/OR HOT !!! OPEN DATA PLATFORM
    What is Open Data Platform (ODP)?
      > It is an open-source, non-profit entity, focused on and committed to evolving the current state of the platform, and to delivering a Foundation-certified, packaged and tested Reference Distribution
    Where to position ODP vs Apache?
      > ODP supports the Apache (ASF) mission
      > ASF provides a governance model around individual projects, without looking at the ecosystem
      > ODP aims to provide a vendor-led, consistent packaging model for core Apache components as an ecosystem
    Why Open Data Platform (ODP)?
      > The current ecosystem is challenged and slowed by fragmented and duplicated efforts. The ODP Core will take the guesswork out of the process and accelerate many use cases by running on a common platform, freeing up enterprises and ecosystem vendors to focus on building business-driven applications.
    Why is IBM involved in ODP?
      > Strong history of leadership in open source and standards: IBM has always been a believer in standardizing the interfaces to components of IT and application infrastructure (SQL, Eclipse, OpenPOWER, …)
      > Supports our commitment to open source currency in all future releases
      > Accelerates IBM innovation within Hadoop and surrounding applications
      > Expecting Hortonworks and Pivotal distribution adoption on PowerLinux
    !message : ODP is clearly a major and strategic choice in the open community to accelerate Hadoop adoption and grow the BigInsights and PowerLinux ecosystem / ISVs
  • 22. NEW AND/OR HOT !!! Big Data/Analytics and Cloud
    Diagram labels: Customer Data Center (On-Premises), Cloud Data Center (Off-Premises), SIMPLICITY, CONTROL, PureData for Analytics, DB2 BLU, InfoSphere BigInsights, Cloudant, dashDB, SoftLayer
    Cloudant
      • Distributed NoSQL "data layer", powering web, mobile and IoT since 2009
      • Available as a fully managed DBaaS, managed by you on-premises, or hybrid
      • Transactional JSON "document" database
      • Spreads data across data centers and devices
      • Ideal for apps that require:
          > Massive, elastic scalability
          > High availability
          > Geo-location services
          > Full-text search
          > Occasionally connected users
    dashDB
    Data warehouse and analytics as a service on the cloud. dashDB keeps data warehouse infrastructure out of your way, allowing you to benefit from:
      • Next-generation in-memory
      • Columnar
      • SIMD hardware acceleration
      • Actionable compression
      • Support for OLAP SQL extensions
      • Connections to common third-party BI tools
    !message : IBM's fundamental cloud strategy: a complete cloud offering that balances control and simplicity
  • 23. NEW AND/OR HOT !!! SPARK
    Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms.
    Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports standalone (native Spark cluster), Hadoop YARN, or Apache Mesos. For distributed storage, Spark can interface with a wide variety of systems, including the Hadoop Distributed File System (HDFS), Cassandra, OpenStack Swift, and Amazon S3. Spark also supports a pseudo-distributed local mode, usually used only for development or testing, where distributed storage is not required and the local file system can be used instead; in this scenario, Spark runs on a single machine with one executor per CPU core.
    Spark had over 465 contributors in 2014, making it the most active project in the Apache Software Foundation and among Big Data open source projects.
    !message : Spark is positioned as a fast and general engine for Big Data. It generalizes the MapReduce model and could be poised to replace MapReduce
  • 24. NEW AND/OR HOT !!! DATA LAKE ARCHITECTURE
    IDC stated in late 2014: "By 2017 unified data platform architecture will become the foundation of BDA strategy. The unification will occur across information management, analysis, and search technology."
      • A data reservoir is a data lake that provides data to an organization for a variety of analytics processing, including:
          • Discovery and exploration of data
          • Simple ad hoc analytics
          • Complex analysis for business decisions
          • Reporting
          • Real-time analytics
      • It is possible to deploy analytics into the data reservoir to generate additional insight from the data loaded into it.
      • A data reservoir manages shared repositories of information for analytical purposes.
      • Each data reservoir repository is optimized for a particular type of processing: real-time analytics, deep analytics (such as data mining), exploratory analytics, OLAP, reporting, …
    Example: creating a logical warehouse
    Information virtualization hides the complexity of where the data is located. Here, different repositories host different workloads, but this complexity is hidden by the information virtualization layer.
    !message : From the application point of view, the data lake challenge is to be a single, unified data repository, queryable like a black box