1. © 2014 MapR Technologies 1
Real Time and Big Data – It’s About Time
2. © 2014 MapR Technologies 2
What is Real Time
Event Occurs → Gain Insight → Take Action
Time Elapsed
3. © 2014 MapR Technologies 3
Time to Insight
Event Occurs → Gain Insight
NFS + Drill
Kafka + Camus + Drill
HBase/MapR-DB + Drill
Time to Ingest Data + Time to Iterate
4. © 2014 MapR Technologies 4
Real-time Data Exploration on newly ingested data via NFS
[Diagram: sources (relational, web server, application server) feed the MapR Distribution for Hadoop over NFS; a drillbit runs on each node; real-time analytics connects through ODBC.]
5. © 2014 MapR Technologies 5
Real-time Data Exploration on newly ingested streams via
Kafka and Camus
[Diagram: sources (log files, clickstreams, sensors, blogs, tweets, link data) feed a Kafka cluster; Camus loads the streams into the MapR Distribution for Hadoop; a drillbit runs on each node; real-time analytics connects through ODBC.]
6. © 2014 MapR Technologies 6
Real-time Data Exploration on Operational Data stored in
HBase/MapR-DB
[Diagram: an application server works against HBase on the MapR Distribution for Hadoop; each node runs HBase alongside a drillbit; real-time analytics connects through ODBC.]
7. © 2014 MapR Technologies 7
Apache Drill Brings Flexibility & Performance
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB to PB scale
• 1000s of users
• 1000s of nodes
Granular Security
• Authentication
• Row/column level controls
• Decentralized
8. © 2014 MapR Technologies 8
Omni-SQL (“SQL-on-Everything”)
Drill: Omni-SQL
“Whereas the other engines we're discussing here create a relational database
environment on top of Hadoop, Drill instead enables a SQL language interface to
data in numerous formats, without requiring a formal schema to be declared. This
enables plug-and-play discovery over a huge universe of data without
prerequisites and preparation. So while Drill uses SQL, and can connect to
Hadoop, calling it SQL-on-Hadoop kind of misses the point. A better name might
be SQL-on-Everything, with very low setup requirements.”
Andrew Brust
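As a rough illustration of the schema-free SQL access the quote describes, the sketch below builds a query that reads a raw JSON file in place, with no table declared first. It assumes the REST query endpoint found in later Drill releases (POST to /query.json on the web port); the `dfs` storage plugin name, the file path, and the helper `build_drill_query` are hypothetical and not taken from the slides.

```python
import json

def build_drill_query(path: str, columns: list[str], limit: int = 10) -> dict:
    """Build a request body for Drill's REST query endpoint (assumed: POST /query.json)."""
    # Drill addresses a file directly through a backtick-quoted path,
    # so no CREATE TABLE or schema declaration precedes the SELECT.
    select_list = ", ".join(columns)
    sql = f"SELECT {select_list} FROM dfs.`{path}` t LIMIT {limit}"
    return {"queryType": "SQL", "query": sql}

# Hypothetical file and fields: a nested field and the first array element.
payload = build_drill_query(
    "/data/people.json",
    ["t.`name`.`first`", "t.hobbies[0]"],
    limit=5,
)
print(json.dumps(payload, indent=2))
```

The payload is only constructed here, not sent; posting it to a running drillbit would need an HTTP client and a reachable cluster.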
9. © 2014 MapR Technologies 9
JSON Model, Columnar Speed
Plotted: JSON, BSON, Mongo, HBase, NoSQL, Parquet, Avro, CSV, TSV
Axes: Fixed schema ↔ Schema-less; Flat ↔ Complex
RDBMS/SQL-on-Hadoop table:
Name      Gender  Age
Michael   M       6
Jennifer  F       3
Apache Drill table:
{
  "name": {
    "first": "Michael",
    "last": "Smith"
  },
  "hobbies": ["ski", "soccer"],
  "district": "Los Altos"
}
{
  "name": {
    "first": "Jennifer",
    "last": "Gates"
  },
  "hobbies": ["sing"],
  "preschool": "CCLC"
}
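To make the contrast between the two table styles concrete, here is a minimal Python sketch that forces the two JSON records from this slide into a flat, fixed-schema table. The `flatten` helper and the dotted column names are illustrative assumptions, not Drill's internal representation.

```python
from typing import Any

# The two records shown on the slide.
records = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]

def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names; lists are kept as values."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat: dict[str, Any] = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, path))
    return flat

rows = [flatten(r) for r in records]
# A fixed schema needs the union of all columns, declared up front.
columns = sorted({c for row in rows for c in row})
# Fields a record does not carry become None (NULL) in the flat table.
table = [[row.get(c) for c in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

The flat form has to carry a `district` column for Jennifer and a `preschool` column for Michael, both empty, and the `hobbies` arrays still do not fit a scalar column; the nested form avoids both problems.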
10. © 2014 MapR Technologies 10
Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
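The idea of discovering a schema while reading self-describing data can be sketched in a few lines of Python. This is only an illustration of the concept, not how Drill implements it; the function `discover_schema` and the sample batch (including the third record) are hypothetical.

```python
from collections import defaultdict

def discover_schema(records):
    """Return {field: sorted type names seen}, built in one pass over the data."""
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}

# Hypothetical batch: a field appears late and another changes type.
batch = [
    {"name": "Michael", "age": 6},
    {"name": "Jennifer", "age": 3, "preschool": "CCLC"},
    {"name": "Sam", "age": "unknown"},
]

schema = discover_schema(batch)
print(schema)
```

Nothing is declared before the read: the new `preschool` field and the mixed-type `age` field are simply recorded as they are encountered, which is the evolving-schema case on the slide.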
11. © 2014 MapR Technologies 11
Drill’s Role in the Enterprise Data Architecture
Raw data
• JSON, CSV, ...
“Optimized” data
• Parquet, …
Centrally-structured data
• Schemas in Hive Metastore
Relational data
• Highly-structured data
Hive, Impala, Spark SQL
Oracle, Teradata
Exploration (known and unknown questions)
12. © 2014 MapR Technologies 12
Data Warehouse Augmentation with Drill
Retail Analytics
Augment existing expensive SQL analytics platform with Hadoop and Drill
OBJECTIVES
• Mine purchase data and compare consumer shopping habits
• Require internal SQL specialists to gain instant access to data at all times
CHALLENGES
• Currently process tens of TB on a traditional MPP DB
• Want to preserve instant access to data but at a lower price point
• Need a system that is reliable, does not lose data and is fast
• Must be able to leverage the SQL skill sets in the company
SOLUTION
• Apache Drill allows interactive analysis on large datasets with MapR as the underlying platform that meets scale, reliability and data protection needs
• SQL users did not have to learn Pig, HiveQL or any other language and continue to use Tableau on top of Drill
Business Impact Potential
• Hadoop and Drill dramatically reduce the price point to about $1,000 / TB
• MapR platform with Drill delivers reliability and performance for the end users
• Leverage existing BI and SQL skill-sets on Hadoop without retraining
13. © 2014 MapR Technologies 13
Real-time Action
Event Occurs → Take Action
14. © 2014 MapR Technologies 14
Real-time processing leading to instant action
[Diagram: MapR Distribution for Hadoop with HBase and a file system on each node; components shown: application servers, Kafka, stream processing, and batch (Spark, Drill); two ACTION outputs.]
15. © 2014 MapR Technologies 15
Stream Processing – Global MSSP
Sources (globally dispersed datacenters): sensor data, firewall logs, intrusion protection system logs, security appliance logs
1 million events/sec. Over 100 channels
MapR M7 Distribution for Hadoop
• Spark Streaming for known threats & aggregation
• Batch processing: Mahout, MLlib
• SQL queries and reporting: Drill, Impala
• Graph processing: GraphX & Titan
New threat footprint within 2-5 min
Closed-loop operations
Benefits:
• Unified platform for analytics
• Low operational costs
• Faster response times
• Better algorithms
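For a sense of scale, the figures on this slide can be turned into back-of-the-envelope numbers. The sketch below assumes events are spread evenly across channels and treats 100 as the channel count, although the slide may mean "more than 100", in which case the per-channel rate is an upper bound. The function `stream_volume` is a hypothetical helper.

```python
def stream_volume(events_per_sec: int, channels: int, window_minutes: float) -> dict:
    """Estimate the per-channel rate and the events accumulated in a detection window."""
    return {
        # Average rate per channel, assuming an even spread (an assumption).
        "per_channel_per_sec": events_per_sec / channels,
        # Events that arrive while a new threat footprint is being computed.
        "events_in_window": int(events_per_sec * window_minutes * 60),
    }

# Slide figures: 1 million events/sec, 100 channels, 2 to 5 minute window.
low = stream_volume(1_000_000, 100, 2)
high = stream_volume(1_000_000, 100, 5)
print(low)
print(high)
```

Under these assumptions a new threat footprint within 2-5 min is derived from roughly 120 to 300 million events.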
16. © 2014 MapR Technologies 16
Operations + Analytics = Real-time, Personalized Services
[Diagram: on the MapR Distribution for Hadoop, real-time operational applications (online transactions, fraud detection, personalized offers) sit alongside analytics (clickstream analysis, fraud investigation tool); shared artifacts are a fraud model and a recommendations table; users shown are a fraud investigator and an interactive marketer.]
17. © 2014 MapR Technologies 17
Q&A
Engage with us!
@mapr, maprtech, mapr-technologies
tshiran@mapr.com