More Related Content
Similar to A modern IoT data processing toolbox (20)
More from DataWorks Summit (20)
A modern IoT data processing toolbox
- 3. © 2015 MapR Technologies, confidential© 2015 MapR Technologies, confidential
IoT—a superset of the Internet
- 4. © 2015 MapR Technologies, confidential
IoT—a superset of the Internet
What is the IoT?
“The idea of an all-encompassing and ubiquitous network of devices to
facilitate co-ordination and communication between the devices
themselves as well as between the devices and human end-users. The
involved devices are typically constrained devices such as RFID
sensors, but may also more sophisticated ones like smartphones.”
- 5. © 2015 MapR Technologies, confidential
IoT—a superset of the Internet
thin fat
stationary
mobile
devices and their deployment
- 6. © 2015 MapR Technologies, confidential
The IoT landscapeinfrastructuredataapps
- 7. © 2015 MapR Technologies, confidential
New and Existing
Devices
IoT Gateways Network/Wireless
Services
Backend Systems
Orientation iot.eclipse.org
- 8. © 2015 MapR Technologies, confidential
Categorization & use cases
- 9. © 2015 MapR Technologies, confidential
Categorization & use cases: Personal IoT
Scope is on a single person, for example a smartphone equipped
with GPS sensor or a fitness device that measures the heart and
sharing this data with her GP. One of the fastest growing, rather
consumer-oriented areas of IoT.
Use cases and apps
• Quantified self
• Smart jackets
• Personal digital assistant
- 10. © 2015 MapR Technologies, confidential
Categorization & use cases: Group IoT
Focuses on a small group of people, for example a family in the
context of a smart home where the deployed sensors capture
temperature and lighting conditions for optimal comfort. One of the
most challenging areas and yet early days.
Use cases and apps
• Smart homes
• Proactive/predictive car maintenance
• Interactive tourism
- 11. © 2015 MapR Technologies, confidential
Categorization & use cases: Community IoT
Considers a large group of people, potentially tens of thousands,
usually in the context of public infrastructure, such as smart cities.
Some immature from a commercial POV but potentially promising
IoT area.
Use cases and apps
• Smart cities
• Health care (monitoring, trackers)
- 12. © 2015 MapR Technologies, confidential
Categorization & use cases: Industrial IoT
Scope can be either within an organization or between
organizations and/or individuals. This is arguably the most
established and mature part of IoT, see also M2M.
Use cases and apps
• Smart factory
• Retailer supply chain
• Agriculture
• Waste management
- 13. © 2015 MapR Technologies, confidential
IoT lends itself to ‘Big Data’ approach
Scaling out on commodity hardware,
in a schema-on-read fashion,
over community-defined interfaces
• Volume: store all incoming sensor data for historical references
• Variety: dozens of data formats in use in the IoT world, none is relational
• Velocity: many devices generate data at a high rate; usually data streams
- 14. © 2015 MapR Technologies, confidential© 2015 MapR Technologies, confidential
My IoT toolbox
- 15. © 2015 MapR Technologies, confidential
Apache Kafka
• A high-throughput, distributed,
persistent publish-subscribe
messaging system
• Originates from LinkedIn
• Typically used as buffer/ de-coupling
layer in online stream processing
kafka.apache.org
- 16. © 2015 MapR Technologies, confidential
Fluentd
Data collector for unified logging layer
www.fluentd.org
- 17. © 2015 MapR Technologies, confidential
Apache Storm
• Distributed, fault-tolerant stream-
processing platform
• Guaranteed message processing;
takes care of replaying messages
on failure
• Concepts: tuples, streams,
spouts, bolts, topologies
storm.apache.org
- 18. © 2015 MapR Technologies, confidential
Apache Spark
spark.apache.org
Continued innovation bringing new functionality, such as:
• Tachyon (Shared RDDs, off-heap solution)
• BlinkDB (approximate queries)
• SparkR (R wrapper for Spark)
Spark SQL
(SQL/HQL)
Spark Streaming
(stream processing)
MLlib
(machine learning)
Spark (core execution engine—RDDs)
GraphX
(graph processing)
Mesos
file system (local, MapR-FS, HDFS, S3) or data store (HBase, Elasticsearch, etc.)
YARNStandalone
- 19. © 2015 MapR Technologies, confidential
Apache HBase
• Distributed, column-oriented NoSQL database built on top of HDFS
• Based on Google’s BigTable technology, CP
• Scales to 1,000s of commodity servers, billions of rows/ PB of data
• Low-latency get/put operations
hbase.apache.org
- 20. © 2015 MapR Technologies, confidential
drill.apache.org
Apache Drill
• Interactive analysis at scale,
with global and local schema
• Support evolving NoSQL data structures
• Self-service BI; use with or without
Hadoop
mapr.com/blog/how-use-sql-hadoop-drill-rest-
json-nosql-and-hbase-simple-rest-client
- 21. © 2015 MapR Technologies, confidential© 2015 MapR Technologies, confidential
Time series data
- 22. © 2015 MapR Technologies, confidential
OpenTSDB
OpenTSDB is a distributed time series database on top of HBase,
enabling you …
• to store & index, as well as
• to query & plot
… metrics at scale.
opentsdb.net
- 23. © 2015 MapR Technologies, confidential
OpenTSDB: key concepts
data point: (timestamp, value) + metric + tag: key=value time series
(00:38, 56) mysql.com_delete schema=userdb
- 24. © 2015 MapR Technologies, confidential
read pathwrite path
OpenTSDB: high-level architecture
MapR-DB
HBase RPC: PUT, SCAN
TSD RPC
tcollector
tcollector
tcollector
app/metric
shell script
(alert, etc.)
TSD TSD TSD TSD
TSD RPC or HTTP
opentsdb.net/overview.html
- 25. © 2015 MapR Technologies, confidential
OpenTSDB with MapR
https://github.com/mapr-demos/opentsdb
message
queue
data points
users
tcollector MapR-DB
web app
buffering data for 1 hour in collector allows
1000x decrease in insertion rate
- 26. © 2015 MapR Technologies, confidential
OpenTSDB: interfacing
• HTTP API
• CLI (tsd, query, mkmetric, etc.)
• Java lib: asynchbase
• Improved collectors: scollector
• Dashboard: Grafana
- 27. © 2015 MapR Technologies© 2015 MapR Technologies
The Internet of Things architecture (iot-a)
- 28. © 2015 MapR Technologies
Key Requirements for an IoT Data Platform
• Deal with raw data natively
• Support a range of
workloads; streaming as
first-class citizen
• Ensure business continuity
• Provide secure and
privacy-aware operation
mapr.com/blog/key-requirements-iot-data-platform
- 29. © 2015 MapR Technologies, confidential
The IoT architecture (iot-a)
iot-a.info
MQ/SP
DFS
DB
input outputas-it-happens
outputinteractive
outputbatch
- 30. © 2015 MapR Technologies, confidential
Example iot-a
HDFS
HBase
input outputas-it-happens
outputinteractive
outputbatch
batch jobs
batch jobs
- 31. © 2015 MapR Technologies, confidential
A proof of concept from the automotive sector
- 32. © 2015 MapR Technologies, confidential
A proof of concept from the automotive sector
- 33. © 2015 MapR Technologies, confidential
Largest Biometric Database in the World
PEOPLE
1.2B
PEOPLE
uidai.gov.in/images/AadhaarTechnologyArchitecture_March2014.pdf
- 34. © 2015 MapR Technologies, confidential
Financial Services
Fraud detection
Personalized
offers
Fraud
investigation
tool
Fraud investigator
Fraud model
Recommendations
table
Clickstream
analysis
Online
transactions
MapR Distribution for Hadoop
Analytics
Interactive marketer
- 35. © 2015 MapR Technologies, confidential
Waste & Recycling Leader—Architecture
Truck
Truck
Truck
.
.
.
MapR
lat/lng
lat/lng
lat/lng
Online alerts
Batch processing
(MapReduce)
Tax
reduction
reporting
Shortest path graph
algorithm
(Titan)
Route
optimization
Real-time stream
processing
(Apache Storm)
- 36. © 2015 MapR Technologies, confidential
$50M$50M
in Free Training
- 37. © 2015 MapR Technologies
Q&A
@mhausenblas maprtech
mhausenblas@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies