1. © 2014 IBM Corporation
Open '14
Analyzing Big Data
Jeff Scheel
Chief Engineer
Linux on Power
June 2, 2014
scheel@us.ibm.com
2. © 2014 IBM Corporation2
Agenda
1. Getting started with Big Data
2. OpenPOWER Foundation
3. The future of Analytics
3. © 2014 IBM Corporation
Getting started with Big Data
4. © 2014 IBM Corporation4
Big Data is growing
and moving fast from
a variety of sources,
are you keeping up?
• 1 trillion connected
devices generate 2.5
quintillion bytes of data
per day
• 80% of the world’s data
today is unstructured
• 1 in 2 business leaders
don’t have access to
data they need
5. © 2014 IBM Corporation5
“Data is the new oil”
In its raw form, oil has little value. Once processed and refined, it helps power the world.
“Big Data has arrived at Seton
Health Care Family, fortunately
accompanied by an analytics tool
that will help deal with the
complexity of more than two
million patient contacts a year…”
“Data is the new oil.”
Clive Humby
“At the World Economic Forum
last month in Davos,
Switzerland, Big Data was a
marquee topic. A report by the
forum, “Big Data, Big Impact,”
declared data a new class of
economic asset, like currency or
gold.”
“Increasingly, businesses are
applying analytics to social
media such as Facebook and
Twitter, as well as to product
review websites, to try to
“understand where customers
are, what makes them tick and
what they want”, says Deepak
Advani, who heads IBM’s
predictive analytics group.”
“Companies are being inundated
with data—from information on
customer-buying habits to
supply-chain efficiency. But many
managers struggle to make
sense of the numbers.”
6. © 2014 IBM Corporation6
The challenge: handling the large Volume, Variety, Velocity, and
Veracity of data to find new insights and improve business outcomes
BI / Reporting Exploration /
Visualization
Functional
App
Industry
App
Predictive
Analytics
Content
Analytics
Analytic Applications
IBM Big Data Platform
Systems
Management
Application
Development
Visualization
& Discovery
Accelerators
Information Integration & Governance
Hadoop
System
Stream Computing Data Warehouse
MFG - Analyze & correlate
log records to improve
service and predict failures
Telco - Address
customer satisfaction,
Predict churn, and match
promotions in real time
Healthcare - Detect life-
threatening conditions at
hospitals in time to
intervene
Retail - Multi-channel
customer sentiment and
experience analysis
Financial Services - Make
risk decisions based on
real-time transactional
data
Law Enforcement -
Identify criminals and
threats from video, audio
feeds
7. © 2014 IBM Corporation7
Customers are deploying new infrastructure to leverage all data types
Data in
Motion
Data at
Rest
Data in
Many Forms
Information
Ingestion and
Operational
Information
Decision
Management
BI and Predictive
Analytics
Navigation
and Discovery
Intelligence
Analysis
Landing Area,
Analytics Zone
and Archive
Raw Data
Structured Data
Text Analytics
Data Mining
Entity Analytics
Machine Learning
Real-time
Analytics
Video/Audio
Network/Sensor
Entity Analytics
Predictive
Exploration,
Integrated
Warehouse, and
Mart Zones
Discovery
Deep
Reflection
Operational
Predictive Stream Processing
Data Integration
Master Data
Streams
Information Governance, Security and Business Continuity
Hadoop Infrastructure – currently being
deployed on commodity hardware
8. © 2014 IBM Corporation8
WATSON
Two new Watson-based products:
• Interactive Care Insights for Oncology
• The WellPoint Interactive Care Guide and
Interactive Care Reviewer
IBM and Red Hat
innovating in Healthcare
with Watson
• Watson's oncology education:
• 600,000 pieces of medical
evidence
• 2 million pages of text
• 25,000 training cases
• Watson can review
1.5 million patient records
faster than it takes most office
computers to boot up
9. © 2014 IBM Corporation9
Big Data implementation patterns
• Common analysis of structured & unstructured data
• Warehouse and BigInsights partitioning
• Warehouse batch offload
• Separate unstructured & structured analysis
[Diagram labels for each pattern: Warehouse, Hadoop, Structured, Unstructured, App / BI, Visualization, Exploration]
10. © 2014 IBM Corporation10
What the experts say
1. Seek project input from Sales,
Marketing, and Operations
teams
2. Select projects which are well-
defined and have quick ROI –
less than a year
3. Leverage your experiences
from data warehouse and
business intelligence projects
4. Avoid starting with “Big Bang”
Source: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=POL03133USEN
11. © 2014 IBM Corporation11
More ideas for starting
[Diagram: Separate unstructured & structured analysis. Labels: Existing BI Stack, New, Warehouse, Hadoop, App / BI, Visualization, Exploration]
Find a small problem to solve, e.g. an
internal phone directory, and start
“on-the-side”.
Locate relevant data and identify
pieces that are “in motion” or “at
rest”.
For data at rest, build open source Hadoop on your PowerLinux
system or try the InfoSphere BigInsights Basic Edition (no charge).
For data in motion, use the InfoSphere Streams trial download.
Reference the IBM Information Center for details on
how to import data into Hadoop and
how to write applications using Streams Studio.
Explore Datameer to visualize your Hadoop based Big Data
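The distinction between data "at rest" and data "in motion" can be illustrated with a small sketch. This is not Hadoop or InfoSphere Streams code; it is plain Python, and the function names and the sample phone-directory records are hypothetical, chosen only to show the two processing styles.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Counter:
    """Data at rest: the full data set is stored, so it is counted in one pass."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.lower().split())
    return counts


def streaming_word_count(stream: Iterable[str]) -> Iterator[Counter]:
    """Data in motion: emit an updated running count as each record arrives."""
    counts: Counter = Counter()
    for line in stream:
        counts.update(line.lower().split())
        yield counts.copy()


# Hypothetical directory entries used only for illustration.
directory = ["Alice Smith sales", "Bob Jones sales", "Carol Smith operations"]

at_rest = batch_word_count(directory)
in_motion = list(streaming_word_count(iter(directory)))
```

The batch function answers a question once over everything stored; the streaming function produces an answer after every record, which is the property that matters when results are needed while data is still arriving.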
12. © 2014 IBM Corporation12
PowerLinux jump start services facilitate starting with Big
Data Analytics
5 Day IBM Power Analytics
Services Jump Start
Includes:
• 5 days, on-site service offering
• Quick Analytics Assessment Workshop
• Software Installation
• Hands on education in getting started
• Evaluating the analytical approach for your
business that will make the biggest impact
• Quick sample application to consume
customer data
Reference Architecture Workshop
Why Jump Start Services for your
IBM Power Analytics solution?
• Learn how to optimally leverage IBM Power
System for Analytics
• Learn the benefits and reasoning of Big Data
• Learn how to gain business value from the
data you have
2 Day IBM Power Analytics
Services Jump Start
Includes:
• 2 days, on-site Big Data Analytics service
offering
• Software installation
• Hands on education in getting started
• Evaluating the analytical approach for your
business that will make the biggest impact
IBM Systems Lab Services & Training - Power Systems
Services for PowerLinux, AIX, and OS
Contact – Linda Hoben, Opportunity Manager, hoben@us.ibm.com
IBM Power Servers are an ideal
platform for streaming data and
performing analytic computations for
a multitude of applications.
Let us help make you successful!
13. © 2014 IBM Corporation13
IBM POWER has a strong history in transactional processing
workloads
[Chart: tpcC (left axis, 0 to 160,000) and $/tpcC (right axis, $0 to $120) by system]
System     tpcC       $/tpcC
S70        1,556      $109.00
S7A        2,845      $89.00
S80        5,669      $52.70
S85        9,200      $43.00
p690       12,602     $17.80
p690+      23,871     $8.31
p690++     32,046     $5.42
p5-595     50,164     $5.19
p5-595+    63,021     $2.97
P6 595     95,081     $2.81
P7 780     150,000    $0.69
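The trend in this chart can be summarized with two ratios. The sketch below assumes the chart values pair with the systems in the order they appear (S70 first, P7 780 last); that pairing is inferred from the slide layout, not stated in the source.

```python
# Values copied from the chart; pairing with systems by position is an assumption.
systems = ["S70", "S7A", "S80", "S85", "p690", "p690+", "p690++",
           "p5-595", "p5-595+", "P6 595", "P7 780"]
tpcc = [1556, 2845, 5669, 9200, 12602, 23871, 32046, 50164, 63021, 95081, 150000]
dollars_per_tpcc = [109.00, 89.00, 52.70, 43.00, 17.80, 8.31,
                    5.42, 5.19, 2.97, 2.81, 0.69]

# Ratio of last to first throughput, and of first to last price/performance.
throughput_gain = tpcc[-1] / tpcc[0]
cost_reduction = dollars_per_tpcc[0] / dollars_per_tpcc[-1]

print(f"{systems[0]} -> {systems[-1]}: "
      f"throughput x{throughput_gain:.1f}, $/tpcC reduced x{cost_reduction:.1f}")
```

Under that pairing, throughput grows by a bit more than 96 times from the first system to the last, while cost per unit of throughput falls by roughly 158 times.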
14. © 2014 IBM Corporation14
POWER8 Processor
Caches
• 512 KB SRAM L2 / core
• 96 MB eDRAM shared L3
• Up to 128 MB eDRAM L4
(off-chip)
Cores
• 12 cores (SMT8)
• 8 dispatch, 10 issue,
16 exec pipe
• 2X internal data
flows/queues
• Enhanced prefetching
• 64K data cache,
32K instruction cache
Accelerators
• Crypto & memory expansion
• Transactional Memory
• VMM assist
• Data Move / VM Mobility
Energy Management
• On-chip Power Management Micro-controller
• Integrated Per-core VRM
• Critical Path Monitors
Technology
• 22nm SOI, eDRAM, 15 ML, 650 mm²
Memory
• Up to 230 GB/s
sustained bandwidth
Bus Interfaces
• Durable open memory
attach interface
• Integrated PCIe Gen3
• SMP Interconnect
• CAPI (Coherent
Accelerator Processor
Interface)
ComputerWorld: To make the chip faster, IBM has
turned to a more advanced manufacturing process,
increased the clock speed and added more cache
memory, but perhaps the biggest change heralded
by the Power8 cannot be found in the specifications.
After years of restricting Power processors to its
servers, IBM is throwing open the gates and will be
licensing Power8 to third-party chip and component
makers.
The Register: the Power8 is so clearly engineered
for midrange and enterprise systems for running
applications on a giant shared memory space,
backed by lots of cores and threads. Power8 does
not belong in a smartphone unless you want one the
size of a shoebox that weighs 20 pounds. But it most
certainly does belong in a badass server, and
Power8 is by far one of the most elegant chips that
Big Blue has ever created, based on the initial specs.
PCWorld: With Power8, IBM has more than doubled
the sustained memory bandwidth from the Power7
and Power7+, to 230 GB/s, as well as I/O speed, to
48 GB/s. Put another way, Watson’s ability to look up
and respond to information has more than doubled
as well.
Microprocessor report: Called Power8, the new
chip delivers impressive numbers, doubling the
performance of its already powerful predecessor,
Power7+. Oracle currently leads in server-processor
performance, but IBM’s new chip will crush those
records. The Power8 specs are mind boggling.
Source: Hotchips presentation
15. © 2014 IBM Corporation15
POWER8 delivers 2.5x performance on Big Data / Hadoop
POWER8 reduces the number of servers by 60% based on the best x86 published Terasort
result
POWER8 S822L will deliver over 2x the
performance of the best published x86 system
… and continues to offer far superior RAS
POWER8 delivers 1.7X over HP on a
per-core normalized benchmark.
POWER8 exploits additional cores, more
threads, larger caches, memory bandwidth
Terasort is a popular benchmark to measure
the performance of a Hadoop solution
Sorts a large dataset (10 TB) in parallel
Exercises the MapReduce framework
and Hadoop Distributed File System
(HDFS)
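The parallel sort that Terasort performs can be sketched in a few lines. This is a single-process illustration of the general idea (sample keys, range-partition, sort each partition independently, concatenate), not the Hadoop implementation; the function name and parameters are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List


def terasort_sketch(records: List[int], num_partitions: int = 4,
                    sample_size: int = 100, seed: int = 0) -> List[int]:
    """Range-partitioned sort: each partition could be sorted on a separate node."""
    if not records:
        return []
    rng = random.Random(seed)
    # Sample the keys to pick split points that balance the partitions.
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    step = len(sample) / num_partitions
    splits = [sample[int(step * i)] for i in range(1, num_partitions)]
    # "Map" phase: route every record to the partition covering its key range.
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[bisect_right(splits, record)].append(record)
    # "Reduce" phase: sort each partition independently.
    sorted_parts = [sorted(part) for part in partitions]
    # Partitions cover increasing key ranges, so concatenation is globally sorted.
    return [record for part in sorted_parts for record in part]


data = [random.Random(1).randrange(10_000) for _ in range(1)] + \
       random.Random(2).sample(range(100_000), 1_000)
result = terasort_sketch(data)
```

Because the partitions cover disjoint, increasing key ranges, no merge step is needed after the per-partition sorts, which is what lets the work spread across many nodes.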
>2x>2x
[Chart: Relative System Performance, POWER8 vs. Cisco; POWER8 bar labeled 2.5x]
IBM Analytics Stack: IBM Power System S822L; 24 cores / 192 threads, POWER8; 3.0GHz, 512 GB memory, RHEL 6.5, InfoSphere BigInsights 3.0
Compared to a 16 Cores HP system
http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/le_tera.pdf
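The 60% server reduction follows from the 2.5x performance figure, and the 192 threads follow from the core count and SMT8. The sketch below shows the arithmetic; it assumes the workload divides evenly across servers and scales linearly, which the slide does not state, and the function names are hypothetical.

```python
def server_reduction(speedup: float) -> float:
    """Fraction of servers no longer needed if each server is `speedup` times faster.

    Assumes the work splits evenly across servers and scales linearly.
    """
    return 1.0 - 1.0 / speedup


def hardware_threads(cores: int, smt_ways: int) -> int:
    """Hardware threads available with simultaneous multithreading."""
    return cores * smt_ways


reduction = server_reduction(2.5)    # fraction of servers saved at 2.5x
threads = hardware_threads(24, 8)    # S822L: 24 cores, SMT8

print(f"Servers reduced by {reduction:.0%}; {threads} hardware threads")
```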
16. © 2014 IBM Corporation16
New IBM Power Systems based on POWER8 (1 & 2 Sockets)
Power Systems S812L
• 1-socket, 2U
• Linux Only
Power Systems S822L
• 2-socket, 2U
• Linux Only
Power Systems S822
• 2-socket, 2U
• All Operating Systems
Power Systems S814
• 1-socket, 4U
• All Operating Systems
Power Systems S824
• 2-socket, 4U
• All Operating Systems
Power Systems S824L
• 2-socket, 4U
• Linux Only
• SOD
17. © 2014 IBM Corporation
OpenPOWER Foundation – The emerging
ecosystem
18. 18 © OpenPOWER Foundation 2014
Industry trends
• The number of companies designing & building servers is
increasing
– Traditionally there have been few companies designing systems: HP, IBM, SUN, Dell,
etc.
– Today there are many more: Google, Microsoft, Facebook, Rackspace, Huawei,
Sugon, Inspur, etc.
– A fairly mature ecosystem including the Taiwanese ODMs is a key enabler of this
trend
• Numerous disruptive forces are impacting these custom
system designs and driving designers to consider new ways of
innovating
– Ability to handle rapid growth in Big Data & Analytics based solutions
– Choice and Innovation
– CPU SOC integration drives the need for chip development
• These trends create a need for a server targeted “chip-system-
software” ecosystem
– IBM has technology and a software stack ready to meet these needs
– IBM recognizes the need to work with partners to create this ecosystem
– IBM recognizes the need for choice and options in processor sourcing
19. 19 © OpenPOWER Foundation 2014
OpenPOWER Foundation Structure
OpenPOWER is an industry foundation based on the POWER architecture, enabling an Open
community for development and opportunity for member differentiation and growth
20. 20 © OpenPOWER Foundation 2014
Building collaboration and innovation at all levels
Welcoming new members in all areas of the ecosystem
100+ inquiries and numerous active dialogues underway
Boards/Systems
I/O, Storage, Acceleration
Chip/SOC
System/Software/Services
21. 21 © OpenPOWER Foundation 2014
OpenPOWER Proposed Ecosystem Enablement
xCAT
System Operating Environment Software Stack
A modern development environment is emerging
based on tools and services
Cloud
Software
Operating
System / KVM
Standard Operating
Environment
(System Mgmt)
Software
Power Open Source Software Stack Components
Existing
Open
Source
Software
Communities
Firmware
Hardware
New OSS
Community
OpenPOWER
Technology
OpenPOWER
Firmware
CAPP
PCIe
POWER8
CAPI over PCIe
“Standard POWER Products” – 2014
Hardware
“Custom POWER SoC” – Future
Customizable
Framework to Integrate
System IP on Chip
Industry IP License Model
Multiple Options to Design with POWER Technology Within OpenPOWER
22. © 2014 IBM Corporation22
Non-IBM POWER8 products
http://www.enterprisetech.com/2014/04/28/inside-google-tyan-power8-server-boards/
The Tyan reference (ATX) board,
SP010, measures 12” by 9.6”
➢ one single-chip module (SCM)
➢ four DDR3 memory slots
➢ four 6 Gb/sec SATA peripheral connectors
➢ two USB 3.0 ports
➢ two Gigabit Ethernet network interfaces
➢ keyboard and video
➢ intended for developers
The Google reference board
➢ two single-chip modules (SCM)
➢ four modified SATA ports
➢ Google use only
23. © 2014 IBM Corporation
The future of Analytics
24. © 2014 IBM Corporation24
The future of Analytics: An open approach
Open Platform for
Choice
25. 25 © OpenPOWER Foundation 2014
POWER8 CAPI
Custom
Hardware
Application
POWER8
CAPP
Coherence Bus
PSL
FPGA or ASIC
Customizable Hardware
Application Accelerator
• Specific system SW, middleware, or user application
• Written to durable interface provided by PSL
POWER8
PCIe Gen 3
Transport for encapsulated messages
Processor Service Layer (PSL)
• Present robust, durable interfaces to applications
• Offload complexity / content from CAPP
Virtual Addressing
• Accelerator can work with same memory addresses that the
processors use
• Pointers de-referenced same as the host application
• Removes OS & device driver overhead
Hardware Managed Cache Coherence
• Enables the accelerator to participate in “Locks” as a normal thread
Lowers Latency over IO communication model
Coherent Accelerator Processor Interface (CAPI)
26. © 2014 IBM Corporation26
Coherent Accelerator Processor Interface (CAPI) Overview
CAPP PCIe
POWER8 Processor
Typical I/O Model Flow:
DD Call → Copy or Pin Source Data → MMIO Notify Accelerator → Acceleration → Poll / Int Completion → Copy or Unpin Result Data → Ret. From DD Completion
Flow with a Coherent Model:
Shared Mem. Notify Accelerator → Acceleration → Shared Memory Completion
FPGA
Function0
Function1
Function2
Function n
CAPI
IBM Supplied POWER
Service Layer
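The difference between the two flows can be modeled conceptually. The sketch below is not CAPI or device-driver code: it is a plain Python simulation in which a byte buffer stands in for host memory, the step names are taken from the slide, and every function name is hypothetical.

```python
from typing import Callable, List, Tuple

IO_MODEL_STEPS = [
    "DD call", "copy or pin source data", "MMIO notify accelerator",
    "acceleration", "poll / interrupt completion",
    "copy or unpin result data", "return from DD completion",
]
COHERENT_MODEL_STEPS = [
    "shared memory notify accelerator", "acceleration",
    "shared memory completion",
]


def accelerate_in_place(buf: bytearray) -> None:
    """Stand-in for an accelerator function: upper-cases ASCII bytes in the buffer."""
    buf[:] = buf.upper()


def io_model_offload(data: bytearray,
                     accel: Callable[[bytearray], None]) -> Tuple[List[str], int]:
    """I/O model: data is staged into a separate buffer and copied back."""
    staging = bytearray(data)   # copy source data to the device buffer
    accel(staging)              # accelerator works on its own copy
    data[:] = staging           # copy result data back to the application
    return IO_MODEL_STEPS, 2


def coherent_model_offload(data: bytearray,
                           accel: Callable[[bytearray], None]) -> Tuple[List[str], int]:
    """Coherent model: the accelerator works on the application's own memory."""
    accel(data)                 # same addresses as the host, no staging copies
    return COHERENT_MODEL_STEPS, 0


a = bytearray(b"big data")
b = bytearray(b"big data")
io_steps, io_copies = io_model_offload(a, accelerate_in_place)
co_steps, co_copies = coherent_model_offload(b, accelerate_in_place)
```

Both paths produce the same result; the coherent path simply takes fewer steps and avoids the two buffer copies, which is the latency argument the slide makes.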
27. © 2014 IBM Corporation27
Example: Innovative “In-Memory” NoSQL/KVS Integrated Solution - via
POWER8 CAPI-attached Flash
WWW
10Gb Uplink
POWER8 Server
Flash Array w/ up
to 40TB
Differentiated NoSQL
(POWER8 + CAPI Flash)
Infrastructure Attributes
- 192 threads in 4U Server drawer
- 40 TB of memory based Flash per 4U Drawer
- Shared Memory & Cache for dynamic tuning
- Elimination of I/O and Network Overhead
- Cluster solution in a box
5X Cost Reduction with
equivalent performance
WWW
500GB Cache Node
500GB Cache Node
500GB Cache Node
500GB Cache Node
500GB Cache Node
500GB Cache Node
Backup Node
Load Balancer
Today’s NoSQL
in memory (x86)
10Gb Uplink
Infrastructure Requirements
- Large Distributed (Scale out)
- Large Memory per node
- Networking Bandwidth Needs
- Load Balancing
Power CAPI-attached Flash model for NoSQL offers dramatic (24:1) density advantage
29. © 2014 IBM Corporation29
For more information on Big Data / Analytics
● Sales kits
– PartnerWorld
– IBM internal
● Worldwide contacts
– Renato Loffreda-Mancinelli, World Wide Business Analytics
and Big Data Solutions on Power - Business Dev. Leader
(loffreda@us.ibm.com)
– Michael Tabron, Solution Offering Manager, Power Analytics
(tabron@us.ibm.com)
– Gina King, Solution Offering Manager, Big Data Analytics
(glking@us.ibm.com)
– Bob Friske, Marketing Manager (rfriske@us.ibm.com)
30. © 2014 IBM Corporation30
Q & A
Summary:
1. Getting started with Big Data is the
toughest part. Start simple, small,
and on the side.
2. The OpenPOWER Foundation
enables new systems and helps
support the emerging analytic
solutions around NoSQL
databases.
3. POWER8 technology like CAPI will
enable new solutions from IBM and
the OpenPOWER Foundation.
31. © 2014 IBM Corporation31
Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in
other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM
offerings available in your area.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions
on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give
you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY
10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives
only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or
guarantees either expressed or implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the
results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations
and conditions.
IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions
worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment
type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal
without notice.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are
dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this
document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-
available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document
should verify the applicable data for their specific environment.
Revised September 26, 2006
33. © 2014 IBM Corporation33
Where to find more information? http://openpowerfoundation.org/