2. Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product
direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise,
or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract. The development, release, and
timing of any future features or functionality described for our products remains at our sole
discretion.
Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will experience
will vary depending upon many factors, including considerations such as the amount of
multiprogramming in the user’s job stream, the I/O configuration, the storage configuration,
and the workload processed. Therefore, no assurance can be given that an individual user
will achieve results similar to those stated here.
3. IBM Montpellier Client Center
Our Client Center partners with clients to meet their IT infrastructure goals and improve their overall business by demonstrating the capabilities of IBM solutions.
Talk & Teach:
• System Briefings
• Software Briefings
• Demonstrations
• Industry Showcases
• BP, ISV & CSI Support
• New Technology Introduction
Design:
• Smarter Computing Design: Energy, Cities, Cloud, Water, Business Resilience
• Enterprise Architecture Design
• z Key Workload Initiatives
• Advanced Technical Skills
• ISV Solution Centers: SAP, Oracle, Siebel
• WW Financial Services CoE
Prove:
• Benchmarks & Proofs of Concept: PureSystems, System z, Power Systems, HPC, System x & Blade, Storage
• Solution Testing
• WW GDPS Solution Testing
• Software zTEC
4. Innovation Lab – Resources & Skills
Smarter Cities innovation through collaborative R&D projects supported by CAS France, in partnership with Labs & Clients (funded by governments or the European Commission), and through client projects.
Big Data / BAO & Smarter Cities offerings:
• Customer briefings & workshops: architecture, design session, PoC
• Presales technical support: RFP, sizing, pilot support, architecture
• Showcases
Smarter Energy & Cities: innovation aimed at improving energy consumption through the use of IT and BAO, with universities and companies.
Montpellier Water Management CoE: use of numerical simulation (HPC) for water management, such as Deep Thunder from IBM Research, first implemented in IBM Europe by our IBM Montpellier team.
Team:
• Xavier Vasques: Manager
• Virginie Radisson: Business Leader
• Marie Angèle Grilli: Project Manager
• Olivier Hess: CTO
• Christophe Menichetti: BAO/Big Data Specialist
• Elsa Fabres: Data Analytics & IOC Specialist
• Colin Dumontier: HPC/Water Specialist
• Romain Chailan: PhD Student
• Saniya Ben Hassen: BAO IT Architect
• Jean-Philippe Durney: BAO/Big Data IT Architect
• Denis Gras: Smarter Cities IT Architect
Promote and develop innovative assets around Data
applied to Smarter Planet/Cities issues in order to
engage customer and collaborative R&D projects
5. AGENDA
Big Data Challenges
> Why is interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
8. Our data rich world is exploding…
• Twitter processes 7 TBs of data every day
• Facebook processes 10 TBs of data every day
• World Data Centre for Climate keeps 220 TBs of Web data and 9 PBs of auxiliary supporting data
• Capital market data volumes grew 1,750%, 2003-06
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 900 million GPS devices sold annually by 2013
• 2 billion people on the Web by 2011
• 76 million smart meters in 2009… 200M by 2014
• Data sources shown: IT logs & transactions; text, blogs, weblogs
9. The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity
of data, in context, beyond what was previously possible.
Variety: Manage the complexity of multiple relational and non-relational data types and schemas
Velocity: Streaming data and large volume data movement
Volume: Scale from terabytes to zettabytes (1B TBs)
10. Bring Together a Large Volume and Variety of Data to Find New Insights
• Multi-channel customer sentiment and experience analysis
• Detect life-threatening conditions at hospitals in time to intervene
• Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
• Make risk decisions based on real-time transactional data
• Identify criminals and threats from disparate video, audio, and data feeds
11. AGENDA
Big Data Challenges
> Why is interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
12. Big Data: why is it possible now?
Traditional approach: Data to Function
• Application server and Database server are separate
• Data can be on multiple servers
• Analysis Program can run on multiple Application servers
• The network is still in the middle
• Data have to go through the network
Flow: User request -> Application server -> Query Data -> Database server -> Return Data -> Process Data -> Send result
Big Data approach: Function to Data
• Analysis Program runs where the data are: on the Data Nodes
• Only the Analysis Program has to go through the network
• Analysis Program needs to be MapReduce aware
• Highly scalable: 1000s of nodes, petabytes and more
Flow: User request -> Master node -> Query & send function to Data nodes -> Process on Data nodes -> Send -> Consolidate result
13. Big Data: why is it possible now?
Example: How many hours does Clint Eastwood appear on screen across all the movies he has made?
• All movies need to be parsed to find Clint's face
• Traditional approach (Data to Function): all movies are sent through the network
• Big Data approach (Function to Data): only the Analysis Program and Clint's picture are sent through the network
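The "function to data" idea can be sketched in plain Java. This is a toy, single-process illustration under stated assumptions, not Hadoop code: the names `FunctionToDataSketch`, `DataNode`, `runLocally` and `countMatches`, and the sample values, are hypothetical and invented for this sketch. Each node keeps its records locally; the master ships a small function to every node and only the per-node results travel back.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy illustration of "function to data" (hypothetical names, no Hadoop involved).
class FunctionToDataSketch {

    // A node that owns a slice of the data and never ships it elsewhere.
    static class DataNode {
        private final List<Integer> localRecords;

        DataNode(List<Integer> localRecords) {
            this.localRecords = localRecords;
        }

        // The analysis function travels to the node; only its small result leaves.
        <R> R runLocally(Function<List<Integer>, R> analysis) {
            return analysis.apply(localRecords);
        }
    }

    // The master sends the same function to every node and consolidates the partial results.
    static long countMatches(List<DataNode> nodes, int wanted) {
        Function<List<Integer>, Long> analysis =
                records -> records.stream().filter(r -> r == wanted).count();
        long total = 0;
        for (DataNode node : nodes) {
            total += node.runLocally(analysis);
        }
        return total;
    }

    public static void main(String[] args) {
        List<DataNode> cluster = Arrays.asList(
                new DataNode(Arrays.asList(1, 7, 7, 3)),
                new DataNode(Arrays.asList(7, 2)),
                new DataNode(Arrays.asList(5, 5, 5)));
        System.out.println(countMatches(cluster, 7)); // prints 3
    }
}
```

In the Clint Eastwood example, the records would be movies and the function a face matcher; what crosses the network is the function and one count per node, not the movies.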
14. Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users determine what question to ask
• IT structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business Users explore what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization
15. AGENDA
Big Data Challenges
> Why is interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
16. IBM Big Data platform
Platform layers:
• Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
• Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse
• Information Integration & Governance
• Cloud | Mobile | Security
Capabilities and products:
• Analyze structured Big Data (Cognos BI Reporting, SPSS Analytics): create reports on BigInsights, analyze in Streams
• Analyze unstructured Big Data (Content Analytics): content index for contextual collaborative insights
• Unlock Big Data (InfoSphere Data Explorer): gather, extract and explore data using best-of-breed visualization
• Index Big Data (Data Explorer, Content Analytics): content index for contextual collaborative insights
• Analyze raw data (InfoSphere BigInsights, InfoSphere Streams (RT))
• Reduce costs with Hadoop (Platform Computing, GPFS): cost-effectively analyze petabytes of structured and unstructured information
• Simplify your warehouse (PureData Analytics, PureData Operational Analytics): deliver deep insight with advanced in-database analytics and operational analytics
• Analyze streaming data (InfoSphere Streams): analyze streaming data and large data bursts for real-time insights
• Speed time to value with analytic and application accelerators (Accelerators)
• Manage Big Data (Guardium, Information Server): govern data quality and manage information lifecycle
17. AGENDA
Big Data Challenges
> Why is interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
InfoSphere BigInsights
18. What’s so Special About Open Source Hadoop?
Building blocks:
• Storage: distributed, reliable, commodity gear
• MapReduce: parallel programming, fault tolerant
Properties:
• Scalable: new nodes can be added on the fly
• Affordable: massively parallel computing on commodity servers
• Flexible: Hadoop is schema-less and can absorb any type of data
• Fault tolerant: through the MapReduce software framework
19. Basic Hadoop principles: HDFS and MapReduce
Hadoop Distributed File System = HDFS : where Hadoop stores the data
– This file system spans all the nodes in a cluster
Hadoop computation model
– Data stored in a distributed file system spanning many inexpensive computers
– Bring function to the data
– Distribute application to the compute resources where the data is stored
– Scalable to thousands of nodes and petabytes of data
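MapReduce Application (word count; both classes are static nested classes of the job class):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public static class TokenizerMapper
    extends Mapper<Object,Text,Text,IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  public void map(Object key, Text val, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(val.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);  // emit (word, 1) for every token
    }
  }
}
public static class IntSumReducer
    extends Reducer<Text,IntWritable,Text,IntWritable> {
  private IntWritable result = new IntWritable();
  public void reduce(Text key, Iterable<IntWritable> val, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : val) {
      sum += v.get();  // add up the counts for this word
    }
    result.set(sum);
    context.write(key, result);
  }
}

How the application runs on the Hadoop Data Nodes:
1. Map Phase (break job into small parts): distribute map tasks to cluster
2. Shuffle (transfer interim output for final processing)
3. Reduce Phase (boil all output down to a single result set)
Result Set: return a single result set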
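The three phases named on this slide can be imitated in memory with the Java standard library alone. The following is a minimal sketch, assuming a word-count job; it does not use the Hadoop API, and the class and method names (`MapReducePhasesSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical. In a real cluster each phase runs in parallel across nodes; here they run one after another in a single process.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory imitation of the three phases; it does not use the Hadoop API.
class MapReducePhasesSketch {

    // 1. Map phase: each input split is turned into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // 2. Shuffle: group the interim pairs by key so that each key reaches one reducer.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // 3. Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    // Runs the three phases in order over a list of input splits.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> interim = new ArrayList<>();
        for (String split : splits) {
            interim.addAll(map(split));
        }
        return reduce(shuffle(interim));
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, in=1, insights=1, motion=1}
        System.out.println(wordCount(Arrays.asList("big data big insights", "data in motion")));
    }
}
```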
20. InfoSphere BigInsights
Platform for volume, variety, velocity (V3)
• Enhanced Hadoop foundation
• Analytics for V3: text analytics & tooling
• Usability: Web console, integrated install, spreadsheet-style analysis tool, ready-made “apps”
• Enterprise class: storage, security, cluster management
• Integration: connectivity to databases, SPSS, Cognos, Unica, Coremetrics, Streams, DataStage
Editions (plotted by breadth of capabilities and enterprise class):
• Basic Edition (free download): Apache Hadoop, integrated install, online InfoCenter, BigData Univ.
• Enterprise Edition (licensed): business process accelerators (“Apps”), text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, integrated Web-based console, flexible job scheduler, performance enhancements, Eclipse-based tooling, LDAP authentication, connectivity to DB2, Netezza, JDBC ....
21. InfoSphere BigInsights – A Full Hadoop Stack
Open Source Components IBM Specific Components
22. Vestas optimizes capital investments based on 3 Petabytes of information.
Capabilities Utilized:
InfoSphere BigInsights
InfoSphere Warehouse
• Model the weather to optimize placement of turbines, maximizing power generation and longevity.
• Reduce time required to identify placement of turbine from weeks to hours.
• Incorporate 3 PB of structured and semi-structured information flows.
• Data volume expected to grow to 6 PB.
23. AGENDA
Big Data Challenges
> Why is interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
InfoSphere Streams
24. IBM InfoSphere Streams for companies who need to…
• Deal with terabytes of data each second
• Work with application, sensor and internet data, video/audio
• Deliver insight in microseconds to analytical applications
• Support complex scenarios using C++ or Java code
• Integrate with existing analytics & data warehousing investments
Diagram: traditional / non-traditional data sources feed powerful analytics (millions of events per second, microsecond latency) for real-time delivery.
Example uses: ICU monitoring, environment monitoring, algo trading, Telco churn prediction, cyber security, smart grid, government / law enforcement.
25. Stream Computing – Analyze Data in Motion
Traditional Computing:
• Historical fact finding
• Find and analyze information stored on disk
• Batch paradigm, pull model
• Query-driven: submits queries to static data
Stream Computing:
• Current fact finding
• Analyze data in motion, before it is stored
• Low latency paradigm, push model
• Data-driven: bring the data to the query
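The pull and push models contrasted on this slide can be illustrated with a small Java sketch. It is a toy under stated assumptions, not InfoSphere Streams code: `PushVersusPullSketch`, `queryStored`, `ContinuousQuery` and `onReading` are hypothetical names, and the readings are made-up sample values. In the pull version the data is stored first and a query scans it; in the push version the query is registered once and each arriving reading is handed to it without being stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Toy contrast of the two models (hypothetical names, sample data only).
class PushVersusPullSketch {

    // Pull model: data is stored first, then a query scans the stored data.
    static List<Integer> queryStored(List<Integer> storedReadings, int threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int reading : storedReadings) {
            if (reading > threshold) {
                hits.add(reading);
            }
        }
        return hits;
    }

    // Push model: the query is registered once and every arriving reading is pushed to it.
    static class ContinuousQuery {
        private final int threshold;
        private final Consumer<Integer> onHit;

        ContinuousQuery(int threshold, Consumer<Integer> onHit) {
            this.threshold = threshold;
            this.onHit = onHit;
        }

        // Called once per arriving reading; the query itself keeps no copy of the data.
        void onReading(int reading) {
            if (reading > threshold) {
                onHit.accept(reading);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> readings = Arrays.asList(72, 95, 80, 101);

        System.out.println("pull: " + queryStored(readings, 90));

        ContinuousQuery query = new ContinuousQuery(90, r -> System.out.println("push: " + r));
        for (int reading : readings) { // stands in for a live feed
            query.onReading(reading);
        }
    }
}
```

Both versions find the same readings; the difference is when the work happens: after storage in the pull model, on arrival in the push model.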
26. Big Data in Real-Time with InfoSphere Streams
Filter / Sample
Modify Annotate
Fuse
Classify
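Some of the operator types named on this slide (filter, modify, annotate, classify) can be sketched as a chain over an in-memory list using `java.util.stream`. This is only an illustration under stated assumptions: it is not the InfoSphere Streams API, the `Reading` class, sensor names, locations and the 100-degree threshold are hypothetical, and the fuse operator (merging several feeds) is left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy operator chain on an in-memory list; not the InfoSphere Streams API.
class StreamOperatorsSketch {

    // One tuple of the feed: a sensor id and a temperature value.
    static class Reading {
        final String sensorId;
        final double value;

        Reading(String sensorId, double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    static List<String> process(List<Reading> feed, Map<String, String> locations) {
        return feed.stream()
                // Filter: discard readings from sensors that have no known location.
                .filter(r -> locations.containsKey(r.sensorId))
                // Modify: convert Celsius to Fahrenheit.
                .map(r -> new Reading(r.sensorId, r.value * 9 / 5 + 32))
                // Annotate and classify: attach the location and a label.
                .map(r -> locations.get(r.sensorId) + ":" + (r.value > 100 ? "HOT" : "NORMAL"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> locations = new HashMap<>();
        locations.put("s1", "roof");
        locations.put("s2", "basement");
        List<Reading> feed = Arrays.asList(
                new Reading("s1", 40.0), new Reading("s9", 10.0), new Reading("s2", 20.0));
        System.out.println(process(feed, locations)); // prints [roof:HOT, basement:NORMAL]
    }
}
```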
27. Asian Telco reduces billing costs and improves customer satisfaction
Capabilities:
Stream Computing
Analytic Accelerators
• Real-time mediation and analysis of 5B CDRs per day
• Data processing time reduced from 12 hrs to 1 min
• Hardware cost reduced to 1/8th
• Proactively address issues (e.g. dropped calls) impacting customer satisfaction.
28. Most Use Cases Combine Technologies
• InfoSphere BigInsights (Volume, Variety): combination of non-traditional / internet data with traditional data
• InfoSphere Streams (Velocity): Streams filters incoming data (in-motion data)
• IBM Data Warehouse: traditional data warehouse (persistent data)
• Reuse analytic models
29. Big Data Patterns
Common Big Data and Warehouse patterns
• Separate unstructured & structured analysis: separate App / BI, Visualization and Exploration stacks, one on BigInsights (unstructured data) and one on the Warehouse (structured data)
• Common analysis of structured and unstructured data: one App / BI, Visualization / Exploration layer over BigInsights (unstructured data) and the Warehouse (structured data)
• Warehouse and BigInsights partitioning: App / BI, Visualization and Exploration over the Warehouse and BigInsights (structured data)
• Warehouse batch offload: App / BI, Visualization and Exploration over BigInsights and the Warehouse (structured data)
30. Big Data Patterns
Common Big Data and Warehouse patterns
• In motion, at rest analysis with BigInsights: Real time App / BI and Analytic App / BI; components: Streams (streaming data), BigInsights, Warehouse
• In motion and at rest applications: Real time App / BI; components: Streams (streaming data), BigInsights, Warehouse
• In motion, at rest analysis of structured and unstructured data: Real time App / BI and Analytic App / BI; components: Streams (streaming data), BigInsights (unstructured data), Warehouse (structured data)
• In motion, structured at rest analysis: Real time App / BI and Analytic App / BI; components: Streams (streaming data), BigInsights, Warehouse (structured data)
31. AGENDA
Big Data Challenges
> Why is interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
32. Big Data Use Cases and customer outcomes
Findings from the research collaboration of IBM Institute for Business Value and Saïd Business School, University of Oxford
Big data objectives:
• Customer-centric outcomes
• Operational optimization
• Risk / financial management
• New business model
• Employee collaboration
Top functional objectives identified by organizations with active big data pilots or implementations. Responses have been weighted and aggregated.
Big data sources:
Respondents were asked which data sources are currently being collected and analyzed as part of active big data efforts within their organization.
33. Operations / Performance Data is Exploding
A typical enterprise with 5000 servers, running 125 applications across 2 to 3 data centers, generates in excess of 1.3 TB of data per day.
• Only 3% of the data generated is operations-oriented metric data.
• 97% is made up of unstructured / semi-structured data.
• Workloads are running on heterogeneous platforms.
Data ratio: Metric Data 3%, Unstructured Data 97%
34. Log Analysis: Problem Characteristics
• Several thousand log files collected daily, data collected over several years
• Sources: Infrastructure (Servers, Networks, Storage), Middleware (App Server, Web Server, Database Server, Messaging Server), Apps
• Value in collocating and co-analyzing the above data
• Millions of files, petabytes of data in total, terabytes produced per day
• The relationships between logs (links shown in the diagram) have to be discovered
• A large percentage of storage in an enterprise is for log data
Analysis of log data has many challenges:
• Collection and parsing of data
• Interpretation of logs
• SMEs flooded with common bugs
• Lack of a joined-up view
• Reactive rather than proactive
Example (diagram: load balancer, two app servers, replicated database): one replica stops responding, causing a fraction of database calls to time out, which leads to intermittent failures in the application.
35. Central Lab Platform – Before
Scattered infrastructures for hands-on classes result in high costs and roadblocks to business transformation.
36. Central Lab Platform – After
The scattered infrastructures were transformed into a
centralized consolidated hands-on Cloud Platform
37. Central Lab Platform Cloud Architecture
[Architecture diagram] Stages: Self-serve portal -> Service request -> Service provisioning -> Dynamic infrastructure
Components shown:
• Users: Class Manager, Teacher & Students, Management (Internet access, VPN)
• CLP Cloud Management front-end: Web Portal (Planning, Reporting, Invoicing, Reservation)
• CLP Application engine, Setup manager
• Shared CLP resources
• TA, TA DB (daily repl.), CLP DB
• CLP TPM: Workflows & Scripts
39. Big Data Project Trends & Directions
Two major front-end objectives to demonstrate Big Data benefits
Navigating Enterprise Information: “Leverage Big Data Business Value”
• 360° Operational View: to accelerate incident resolution
• 360° Business View: to provide metrics and insight
– Cloud Data Center utilization: Data Center business view
– Training Labs: Data Center's customer business view
Predictive Incident Alerting: “Act Proactively on Incident”
• Create predictive models based on log history to alert before an incident occurs
• Reduce the number of incident tickets
40. How the support team works today: many applications, dispersed information
41. Navigating Enterprise Information: 360° Operational View
[Screenshot: search portal showing the 360° operational view for ticket 153494]
Search tabs: Power System, System X, System Z, Storage, Software. Query: 153494 (query expanded), top 76 results.
Ticket 153494 (creator Nick Yabut, assignee Jean-Philippe Durney, status Open, priority 2, course code AN14GB, class H65X): need to rebuild LPAR2 for this course (sys5442_lpar2), but can't log into the class NIM server nim151. It appears to be off-line and is not showing on the managed system.
Panels shown alongside the ticket:
• Documentation facets: Lab Setup Guide (4), Courses Exercises (3), Production documentation (10), Best Practice (3), Citrix (3), Provisioning (10), Storage (4), TSM (3), Overview (4), Processes (5), Tech Choices (12), How To (15)
• Global Status (Service: Warn / Error / Down / Up): Citrix 0/0/0/10, Network 0/0/0/180, Storage 0/0/0/170, Master servers 2/0/0/4, Nim 4/0/56/87, TPM 2/0/0/1
• Open Tickets chart by severity (Sev 1, Sev 2, Sev 3) at h-24, h-12, h-6, h-1
• Course schedule and contact details
• Lotus Notes messages related to course AN14
• Related tickets (153301, 150000, 60906, 59861), all closed, priority 2
• Access facets (top 11 results): HMC (4), TPM HMC (3), Course HMC (1), NIM (2: nim_master, nim151), Storage (2), CLP Servers (3), Citrix (1), Admin Tools (5)
42. Navigating Enterprise Information: 360° Business View
Show Metrics
• Number of running courses versus number of logged-in students (typically extracted through Big Data log analysis)
• Cumulative time usage per course/session
• Server and storage usage
• Electricity consumption
Create Correlated Views
• Electricity consumption versus number of courses running
• Consolidated view per Global Training Partner
Analyze Operations
• Number of tickets per course brand, per course, per geo
• Average resolution time per incident type
• Top 10 incidents by frequency
• Top 10 incidents per geo
• Top 10 courses per geo
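Two of the operational metrics listed above, average resolution time per incident type and top incidents by frequency, can be sketched with Java collectors. This is a minimal illustration under stated assumptions: the `Ticket` fields, the incident type names and the hour values are hypothetical sample data, not figures from the Central Lab Platform.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy ticket metrics on made-up data (all names and values are hypothetical).
class TicketMetricsSketch {

    static class Ticket {
        final String incidentType;
        final double hoursToResolve;

        Ticket(String incidentType, double hoursToResolve) {
            this.incidentType = incidentType;
            this.hoursToResolve = hoursToResolve;
        }
    }

    // Average resolution time per incident type.
    static Map<String, Double> averageResolutionHours(List<Ticket> tickets) {
        return tickets.stream().collect(Collectors.groupingBy(
                t -> t.incidentType, TreeMap::new,
                Collectors.averagingDouble(t -> t.hoursToResolve)));
    }

    // Top N incident types by frequency, most frequent first.
    static List<String> topIncidentTypes(List<Ticket> tickets, int n) {
        Map<String, Long> counts = tickets.stream().collect(Collectors.groupingBy(
                t -> t.incidentType, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Ticket> tickets = Arrays.asList(
                new Ticket("login", 1.0), new Ticket("login", 2.0), new Ticket("login", 3.0),
                new Ticket("storage", 4.0), new Ticket("storage", 6.0),
                new Ticket("network", 8.0));
        System.out.println(averageResolutionHours(tickets));
        System.out.println(topIncidentTypes(tickets, 2)); // prints [login, storage]
    }
}
```

The same grouping could be keyed by geo or course code to produce the other views in the list.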
43. To learn more
IBM Tivoli product to monitor and analyze machine logs:
> IBM Log Analytics
Download the presentation on the Pulse2013 site:
Session 1844: Problem Determination and Resolution in Minutes Using Unstructured Data Analytics
Martin O’Brien - Product Manager
Geetha Adinarayan - Client Best Practices Lead