Joerg Bienert, CTO of ParStream, gave a presentation on February 25, 2014 about Big Data for Business Users. He covered several use cases of current ParStream customers as well as ParStream's technology itself.
2. Big Data
“Every two days now we create as much information
as we did from the dawn of civilization up until 2003.”
Eric Schmidt, Ex Google CEO
Real Time
“85% of respondents say the issue is not about volume
but the ability to analyze and act on data in real time”
Cap Gemini Study on Big Data 2012
Fast Data
“It’s About Fast (not just Big) Data”
Karl Keirstead, BMO Capital Markets 2013
3. Real-time on Big Data becomes essential for survival of businesses
Fraud prevention
Algo trading
A/B-Testing
Campaign steering
Interactive Analytics
App analytics
Recommendation engine
Trading risk analytics
Algorithmic decisions
Network monitoring
Realtime
Network Data
Web Logs
M2M
Sensors
Shopping Cart
Programmatic ad-serving
Big Data
Twitter
Point of Sale Data
Stock Data
Logistics
Locations
Car Data
Financial TX
7. Immediate Answers & Availability
Chart: use cases plotted by response time (Answers) against lag time (Availability)
Response time axis (Answers): < 1..10 ms, 10..100 ms, 1 sec, 10 sec, 1 min, 10 min, 1 h
Lag time axis (Availability): Weekly, Daily, Hourly, Every minute, Every second
Zone labels: Automatic response systems, Interactive Analytics, Online Investigation, Post-mortem Analytics; Batch Import, Continuous Import, Real-Time
Use cases shown:
● Offer-Caches
● Ad-Serving
● Re-Targeting
● Trading analytics
● Recommendation / promotional items
● Smart Grids
● Guided Shopping
● SEO analytics
● Fraud detection
● Investment risk analytics
● Campaign Control
● Application monitoring
● Geo-spatial analytics
● Trend-Spotting
● Web-Analytics
● Geo-Steering
● Customer account analytics
● Revenue assurance
● Prepaid-accounts
● Customer churn rate reduction
8. USE CASES IN ALL INDUSTRIES
Many Applications, All Industries
eCommerce Services: Faceted Search, Web analytics, SEO analytics, Online Advertising
Social Networks: Ad serving, Profiling, Targeting
Telco: Customer attrition prevention, Network monitoring, Targeting, Prepaid account mgmt
Finance: Trend analysis, Fraud detection, Automatic trading, Risk analysis
Energy, Oil and Gas: Smart metering, Smart grids, Wind parks, Mining, Solar Panels
Many More: Production, Mining, M2M, Sensors, Genetics, Intelligence, Weather
9. Real-time Requires New Technology
1 Immediate Availability
2 Billion Records
3 Immediate Answers
4 Interactive Analytics
5 Geo-Distributed Processing
6 Low TCO
Diagram: Any Stream, Any Bus, Any File → Continuous Data Import → Realtime Big Data Engine → Ultra-fast Querying → Real-Time Monitoring, Real-Time Dashboarding, Interactive Analytics
10. Web-Analytics
etracker is a leading web-analytics and campaign steering company in Europe
Real-time web-analytics for 50,000 domains delivering 10 billion web-clicks
Continuous data import with maximum latency of 30 seconds
Complex interactive analytics for live segmentation of customer groups
< 2 sec query response time for > 100 concurrent interactive users
Campaign steering – moving ahead from trial and error to continuous multidimensional optimization
11. Gas Turbines
ParStream imports 500,000 sensor readings per sec
delivering real-time monitoring and long-term analytics
5,000 sensors are delivering
1,800,000,000 measurements per hour
ParStream immediately imports and
stores all sensor readings
Real-time monitoring with ParStream
ensures early issue identification
Long-term analytics for predictive
maintenance reduces downtime
Maintenance of gas turbines is a more
lucrative business than the initial build
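The import rate and the hourly volume quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures from the slide; the variable names are illustrative, and the per-sensor rate assumes readings are spread evenly across sensors, which the slide does not state.

```python
# Figures quoted on the slide
readings_per_second = 500_000
sensors = 5_000

# Convert the per-second import rate into an hourly volume
readings_per_hour = readings_per_second * 3600

# Average rate per sensor, assuming an even spread across sensors
# (an assumption made for this sketch, not stated on the slide)
readings_per_sensor_per_second = readings_per_second / sensors

print(f"{readings_per_hour:,} readings per hour")
print(f"{readings_per_sensor_per_second:.0f} readings per sensor per second")
```

Under that assumption, each sensor would report roughly 100 times per second.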
12. FMCG Retailer
ParStream extends usage of QlikView installation
from 400M to 6B records for interactive analytics
Customer is the leading retail chain in
Austria, a long-term QlikView customer
POS-data analytics is heavily used
for price negotiations with vendors
QlikView is easy to use and ultra fast
but limits data volume to 400M records
Limited volume, time range and
granularity of data hinders negotiations
ParStream extends usage of QlikView
from 2 weeks to 6 months of data
Further extension to 30 billion records
planned to cover 2.5 years of data
13. Telecom
End-to-end network monitoring at packet-level detail unveils bottlenecks unseen for decades
Continuous import with >1 million rows per second per node
Packet-level granularity delivers previously impossible insights
Field trial discovered a bottleneck nobody expected; a billion-dollar investment was saved
Decentralized architecture capturing, storing and analyzing data at source
Massive reduction in network traffic due to decentralized storage
Solution is a blueprint for Internet-of-Things use cases
Diagram labels: Network Analytics, NPI Analytics, Analytics, CRM/CEM Analytics, M2M Analytics; Federation Server; Ad-hoc integration; Cache; Logical data warehouse; NoSQL; Decentralized storage & analytics; Local NDC (x5)
14. SEO Analytics at Searchmetrics
Interactive domain traffic competitor report & analysis
Google Search: first 100 domains for 10 million keywords in 10 countries
10,000,000,000 domain keyword relations
Complex correlative SQL queries of many concurrent users
Diagram label: Application Server
• Keyword analysis of competitor domains
• Complex SQL queries in real time
• 7 TByte import
• 10 billion records
• < 1 sec response time
• Reduction from 150 to 4 servers
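The headline numbers on this slide fit together arithmetically. The sketch below multiplies out the Google Search figures and derives two rough ratios. The bytes-per-record figure assumes that the 7 TByte import and the 10 billion records describe the same data set and that a TByte is 10^12 bytes; neither is stated on the slide.

```python
# Figures quoted on the slide
domains_per_keyword = 100
keywords = 10_000_000
countries = 10

# Domain/keyword relations implied by the search coverage
relations = domains_per_keyword * keywords * countries

# Rough average record size (assumes decimal TByte and that import
# volume and record count refer to the same data; both are assumptions)
import_bytes = 7 * 10**12
bytes_per_record = import_bytes / relations

# Server consolidation ratio from "150 to 4 servers"
consolidation = 150 / 4

print(f"{relations:,} relations")
print(f"{bytes_per_record:.0f} bytes per record")
print(f"{consolidation:.1f}x fewer servers")
```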
15. Bio-Technology
INRA MetaGenoPolis (MGP) analyzes 17 billion
records interactively – growing 100x per year
INRA is the world leader in metagenomic research
Up to 50 million different bacteria are
identified per stool sample
Sample size will grow by 100x over
the next 12 months
Data volume will grow from 17 billion
to 2 trillion records
Researchers analyze correlation of
bacteria presence with illnesses
ParStream is used to interactively
discover and analyze correlations
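The projected volume follows from the stated growth factor. A minimal check, assuming the record count scales linearly with sample size (the slide implies this but does not say it):

```python
# Figures quoted on the slide
current_records = 17_000_000_000
growth_factor = 100

# Linear scaling with sample size is an assumption of this sketch
projected_records = current_records * growth_factor

print(f"{projected_records:,} records (~{projected_records / 1e12:.1f} trillion)")
```

The result, about 1.7 trillion, matches the slide's rounded figure of 2 trillion.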
16. Science: Climate Research
Detection of Hurricane Risk Areas
• Interactive Analytics of
weather simulation data
• Response time 0.1 sec
on 3 billion data records
• Multi-dimensional querying
on geo-location data
• Run complex queries In-Database
at very high speed
• No need for Cubes –
up-to-date & full granularity
• Continuously import
new data with low-latency
17. Faceted Search
Coface Services is the Innovation Leader
in reliable Business Information
Interactive guided selection process
delivers better conversion rate
Multi-lingual text search and
numeric-multiple-choice filters
15 billion data points
1,000 Coface columns
+10,000 Customer columns
>100 concurrent users
< 100 ms response time
19. Needs vs. Reality
You want: scales on big data and big streams. What you get: does not scale (traditional DBMS).
You want: sub-second queries, high-speed import. What you get: too slow (Hadoop, MapReduce).
You want: fully flexible, fully granular. What you get: inflexible (Cassandra, KVS).
20. ParStream Is Built For Fast Data
ParStream is the fastest real-time database for smart data
Continuous Import
Ultra-fast Querying
High Query Throughput
Billions of Records
Thousands of Columns
Unique combination of continuous high-speed import and ultra-fast query response times
21. Outstanding Technology with USP – high performance compressed index
Patented high performance compressed index - USP!
Built from scratch in C++
100 % own patented IP
Leading edge DB architecture
Massively parallel shared nothing cluster architecture
Optimized for standard hardware and many Linux distributions
Runs on single server, cluster and all clouds
Architecture diagram:
Front-End / Application / Tool
C++ UDF - API; SQL API / JDBC / ODBC
Real-Time Analytics Engine: In-Memory and Disk Technology; Massively Parallel Processing (MPP); High Performance Compressed Index (HPCI); Multi-Dimensional Partitioning; Shared Nothing Architecture; 3rd generation Columnar Storage
High Speed Loader with Low Latency
Data sources: Map-Reduce, RDBMS, Raw-Data
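To illustrate what multi-dimensional partitioning and a shared nothing design mean in general, here is a minimal sketch. It does not show ParStream's implementation. The partition keys, the sample distances and the function names are all hypothetical, and a thread pool stands in for independent cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions keyed by two dimensions (year, destination state).
# Each partition holds flight distances; the values are made up.
partitions = {
    (2011, "NY"): [1000, 1100],
    (2011, "CA"): [2400],
    (2012, "NY"): [950, 1050, 1200],
    (2012, "TX"): [800],
}

def partial_aggregate(rows):
    """Work done locally on one partition: (row count, sum of distances)."""
    return len(rows), sum(rows)

def query_total_distance(dest_state, years):
    # Partition pruning: skip every partition whose key cannot match.
    selected = [rows for (year, state), rows in partitions.items()
                if state == dest_state and year in years]
    # Each remaining partition is aggregated independently, sharing no state.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, selected))
    # Merge the partial results into the final answer.
    row_count = sum(count for count, _ in partials)
    total = sum(subtotal for _, subtotal in partials)
    return len(selected), row_count, total

print(query_total_distance("NY", range(2011, 2013)))
```

Only two of the four partitions are read for this query, and the per-partition work needs no coordination until the final merge.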
22. High Performance Compressed Index (HPCI)
Massive Performance Gain On Analytical Operations –
Major Technological Innovation and Differentiation
Standard index architecture
– High Memory Requirements
– High Load on CPUs
– Latency due to Decompression
– Not Suitable for Big Data
Superior ParStream index architecture
+ Immediate Query Processing
+ No Need for Decompression
+ Massively reduced memory + IO load
+ Ultra-high Throughput
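The slides do not describe how HPCI works internally. As a generic illustration of "no need for decompression", the sketch below combines two run-length-encoded bitmap indexes directly on their compressed form. It is a textbook-style technique chosen for illustration, not ParStream's patented design, and all names and sample data are hypothetical.

```python
from itertools import groupby

def rle_encode(bits):
    """Compress a sequence of 0/1 values into (bit, run_length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]

def rle_and(a, b):
    """AND two run-length-encoded bitmaps of equal length, run by run,
    without expanding either one back into individual bits."""
    result = []
    i = j = 0
    rem_a = a[0][1] if a else 0
    rem_b = b[0][1] if b else 0
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)          # overlap of the two current runs
        bit = a[i][0] & b[j][0]
        if result and result[-1][0] == bit:
            result[-1] = (bit, result[-1][1] + step)   # extend previous run
        else:
            result.append((bit, step))
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return result

def rle_count(runs):
    """Number of set bits, i.e. matching rows, read off the runs."""
    return sum(length for bit, length in runs if bit)

# One bit per row: does the row satisfy the predicate?
dest_is_ny   = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
origin_is_fl = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

matches = rle_and(rle_encode(dest_is_ny), rle_encode(origin_is_fl))
print(matches, rle_count(matches))
```

The work is proportional to the number of runs rather than the number of rows, which is where the memory and I/O savings of this family of indexes come from.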
24. Real-time Query Performance
Query response time, ParStream (PS) vs. RedShift (RS), bar chart by query number:
Q#   RS (ms)   PS (ms)   Factor
1    7797      264       29
2    8036      313       25
3    7949      381       20
4    7086      129       55
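The Factor values on this slide are the ratio of the RedShift time to the ParStream time. The sketch below recomputes them from the published timings; the dictionary layout and variable names are illustrative.

```python
# query number -> (RedShift ms, ParStream ms, factor printed on the slide)
benchmark = {
    1: (7797, 264, 29),
    2: (8036, 313, 25),
    3: (7949, 381, 20),
    4: (7086, 129, 55),
}

ratios = {}
for query, (rs_ms, ps_ms, slide_factor) in benchmark.items():
    ratios[query] = rs_ms / ps_ms
    print(f"Query {query}: {ratios[query]:.1f}x (slide: {slide_factor}x)")
```

The recomputed ratios agree with the slide to within one unit; the slide's rounding is not uniform across the four queries.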
QUERY
1
select count(distinct AirlineID) as airlines, count(distinct FlightNum) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'
2
select count(distinct AirlineID) as airlines, count(distinct FlightNum), sum(Distance) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'
3
select count(distinct AirlineID) as airlines, count(distinct FlightNum), count(distinct Distance), sum(Distance) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'
4
select max(TaxiIn), sum(DepDelayMinutes), min(TaxiIn), avg(ArrDelayMinutes) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'
Environment: Single EC2 XL node with 15 GB RAM, 2 TB disk on Amazon AWS.
OTP Data Set with about 150 Million records
Comparisons with leading analytical databases are available on request
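To see what query 1 computes, it can be run against a tiny stand-in table. The sketch below uses SQLite from Python's standard library rather than ParStream or RedShift. The table is a hypothetical cut-down version of the OTP schema with only the columns the query touches, and the rows and ID values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE otp (
        YearD INTEGER, Quarter INTEGER, DayOfWeek INTEGER,
        OriginState TEXT, DestState TEXT,
        AirlineID INTEGER, FlightNum INTEGER
    )
""")

# Invented sample rows:
# YearD, Quarter, DayOfWeek, OriginState, DestState, AirlineID, FlightNum
rows = [
    (2001, 3, 4, "FL", "NY", 19805, 100),  # matches
    (2005, 3, 4, "FL", "NY", 19805, 101),  # matches, same airline
    (2010, 3, 4, "FL", "NY", 20355, 100),  # matches, repeated flight number
    (2010, 1, 4, "FL", "NY", 20355, 300),  # filtered out: wrong quarter
    (1995, 3, 4, "FL", "NY", 19977, 400),  # filtered out: year out of range
    (2008, 3, 4, "CA", "NY", 19977, 500),  # filtered out: wrong origin
]
conn.executemany("INSERT INTO otp VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Query 1 from the slide, unchanged apart from line breaks
query_1 = """
    select count(distinct AirlineID) as airlines, count(distinct FlightNum) from otp
    where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3
      AND DayOfWeek=4 AND OriginState='FL'
"""
airlines, flight_numbers = conn.execute(query_1).fetchone()
print(airlines, flight_numbers)
```

Three rows pass the filter, covering two distinct airlines and two distinct flight numbers. All four benchmark queries share this filter and differ only in the aggregates they compute.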
25. ParStream – real-time demo
Try out the interactive ParStream demo on https://www.parstream.com/product/demos/
26. ParStream – The Company
• Founded 2008 in Cologne
• 50 employees in Cologne, Paris, Silicon Valley, Boston
• International Customers
• Running 24x7 in production for more than 3 years
• $ 15.6 M funding: Khosla Ventures (lead), Andy Bechtolsheim,
Crunchfund, Data Collective, Baker Capital, Tola Capital, and others