2. DataTorrent in Hadoop Ecosystem
• Most powerful Hadoop platform for real-time stream computations
• Massive Real-Time Production Monitoring, Analytics, and Alerting
– Systems monitoring: Resource Utilization, Logs Analysis
– Predictive Maintenance, DoS Attack Detection, Launch Validation, etc.
3. DataTorrent Technology Stack
Malhar – Open Source Operators and Apps Library
(Apache v2 License)
• SLA
• Alerts
• Tools
• Web Services
• State Snapshot
• Security
• Scalability
• Fault Tolerance
• Partitioning
• Dynamic Modifications
StrAM (Stream Application Master)
4. DataTorrent’s Platform Differentiators
Extreme Scalability
• Automatically scale to changing loads
• Sub-second latency with linear scalability
• Complex monitoring applications with massive computations
Mission Critical
• Built-in stateful fault tolerance. 24/7 uptime guaranteed
• Predictive analysis and troubleshooting
• Update your application while it's running!
Hadoop-Native
• Runs on your existing Apache Hadoop cluster.
• Develop faster with our open-source framework.
• Integrate seamlessly with your existing monitoring stack.
5. Stream Processing
[Figure: Data Load, Streams 1–4, Windows 1–3]
• A Stream is a sequence of data events with a schema
• An Operator takes input streams and computes output streams
• An Application is a Directed Acyclic Graph (DAG) of operators connected by streams
• In-memory, asynchronous, distributed computations
• A Streaming Window is an atomic batch of sequential data events
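The definitions above can be sketched in a few lines of Python. This is a toy, single-process model for illustration only: `to_windows`, `Application`, `add_operator`, and `add_stream` are hypothetical names, not the DataTorrent or Malhar API, and the real platform runs operators asynchronously across a cluster rather than in one loop.

```python
from collections import defaultdict


def to_windows(events, size):
    """Cut a stream of events into streaming windows (atomic batches)."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, window
        yield batch


class Application:
    """Toy DAG: operators are nodes, streams are directed edges."""

    def __init__(self):
        self.operators = {}               # name -> function(window) -> output
        self.streams = defaultdict(list)  # upstream name -> downstream names

    def add_operator(self, name, fn):
        self.operators[name] = fn

    def add_stream(self, source, sink):
        self.streams[source].append(sink)

    def run(self, entry, windows):
        """Push each window through the DAG; collect outputs of leaf operators."""
        results = defaultdict(list)
        for window in windows:
            self._push(entry, window, results)
        return dict(results)

    def _push(self, name, window, results):
        output = self.operators[name](window)
        sinks = self.streams.get(name, [])
        if not sinks:  # leaf operator: record its per-window output
            results[name].append(output)
        for sink in sinks:
            self._push(sink, output, results)


# Made-up HTTP status events, processed in windows of three.
events = [{"status": s} for s in [200, 404, 200, 500, 404, 200, 200]]

app = Application()
app.add_operator("input", lambda w: w)
app.add_operator("errors", lambda w: [e for e in w if e["status"] >= 400])
app.add_operator("error_count", lambda w: len(w))
app.add_operator("total_count", lambda w: len(w))
app.add_stream("input", "errors")
app.add_stream("input", "total_count")
app.add_stream("errors", "error_count")

results = app.run("input", to_windows(events, size=3))
print(results)
```

Each window is handled as a unit, so every leaf operator emits exactly one result per window.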
8. Open Sourced Production Operations Application
Real-Time Dashboards and Actions
• DoS Attack
• Predictive maintenance of servers
• Pre and post Launch analysis
• 404 Response
• Root cause analysis for LAMP architecture
• Segmentation
– Geo Location
– Gender, Age
– Resource usage (urls)
– Etc.
• URL Analysis
– Response times
– Patterns
• Seamless integration into monitoring stacks
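As a rough illustration of two of the checks listed on this slide (DoS attack and 404 response), the sketch below inspects one window of web-log events. It is not taken from the open-sourced application: the function name `analyze_window`, the event fields `ip` and `status`, and both thresholds are assumptions chosen for the example.

```python
from collections import Counter


def analyze_window(events, max_requests_per_ip=100, max_404_ratio=0.2):
    """Return alerts for one window of log events (hypothetical schema)."""
    alerts = []

    # Possible DoS: a single client sends too many requests within the window.
    per_ip = Counter(e["ip"] for e in events)
    for ip, count in sorted(per_ip.items()):
        if count > max_requests_per_ip:
            alerts.append(("possible_dos", ip, count))

    # 404 monitoring: share of "not found" responses in the window.
    if events:
        not_found = sum(1 for e in events if e["status"] == 404)
        ratio = not_found / len(events)
        if ratio > max_404_ratio:
            alerts.append(("high_404_ratio", None, round(ratio, 2)))

    return alerts


# Window 1: one client dominates the traffic.
window_1 = [{"ip": "10.0.0.1", "status": 200}] * 4 + [{"ip": "10.0.0.2", "status": 404}]
# Window 2: every request returns 404.
window_2 = [{"ip": "10.0.0.3", "status": 404}, {"ip": "10.0.0.4", "status": 404}]

alerts_1 = analyze_window(window_1, max_requests_per_ip=3, max_404_ratio=0.5)
alerts_2 = analyze_window(window_2, max_requests_per_ip=3, max_404_ratio=0.5)
print(alerts_1)
print(alerts_2)
```

Because each window is evaluated on its own, an alert can be raised as soon as the window closes rather than after a batch job finishes.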
9. How to get Started?
• DataTorrent
– Try Sandbox (https://datatorrent.com)
– Free for small to medium enterprises: Contact us for details
• Malhar Open Source (Apache 2.0) project
– https://github.com/DataTorrent/Malhar
– malhar-users@googlegroups.com
• Applications available Jan 2014
– LogStream: Site Operations
– Map-Reduce Monitor
DataTorrent Inc.
3200 Patrick Henry, 2nd Fl
Santa Clara, CA 95054
info@datatorrent.com
www.datatorrent.com
Twitter.com/DataTorrent
Facebook.com/DataTorrent
10. Platform Capabilities
Scalable High Performance
• Throughput in Billions of Events/Sec
• Latency in Milliseconds
Powerful Tools
• GUI For Cluster Performance Monitoring
• GUI and Debuggers for Event Data
• Test Framework, Certification, Versioning
• CLI, Macros
Fault-Tolerance
• Node outage recovery with No State loss and No Message loss
• State Management
• Efficient State Checkpointing
Easy To Use
• Library of Operator Templates
• Focus On Business Logic
• Connectors to Current Tools
– HDFS, HBase, MySQL, ActiveMQ
• APIs for Tool Integrations
Adaptability
• Runtime Scaling and Resource Optimization
• Dynamic Application Modification
Native YARN Application
• Integrates with Hadoop 2.0 Distributions
– Apache, Cloudera, Hortonworks, MapR, Pivotal
• Co-Exists with Existing Batch Infrastructure
• Multi-Tenancy with Existing Hadoop Applications
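The fault-tolerance claims on this slide (state checkpointing, recovery without state or message loss) can be illustrated with a generic checkpoint-and-replay sketch. The slides do not describe the platform's actual mechanism, so this is only one common way to get that behavior: `CountingOperator`, `run`, `checkpoint_every`, and `fail_at` are hypothetical names, and the sketch assumes that windows processed since the last checkpoint can be replayed from upstream.

```python
import copy


class CountingOperator:
    """Stateful operator that counts keys across windows."""

    def __init__(self):
        self.counts = {}
        self.last_window = -1  # id of the last window fully applied

    def process(self, window_id, events):
        for key in events:
            self.counts[key] = self.counts.get(key, 0) + 1
        self.last_window = window_id

    def checkpoint(self):
        # Snapshot taken on a window boundary, so it is always consistent.
        return copy.deepcopy(
            {"counts": self.counts, "last_window": self.last_window}
        )

    @classmethod
    def restore(cls, snapshot):
        op = cls()
        op.counts = dict(snapshot["counts"])
        op.last_window = snapshot["last_window"]
        return op


def run(windows, checkpoint_every=2, fail_at=None):
    """Process windows in order; optionally simulate one crash at `fail_at`."""
    op = CountingOperator()
    snapshot = op.checkpoint()
    window_id = 0
    failed = False
    while window_id < len(windows):
        if window_id == fail_at and not failed:
            failed = True
            # In-memory state is lost: restore the last snapshot and replay
            # every window that came after it.
            op = CountingOperator.restore(snapshot)
            window_id = op.last_window + 1
            continue
        op.process(window_id, windows[window_id])
        if (window_id + 1) % checkpoint_every == 0:
            snapshot = op.checkpoint()
        window_id += 1
    return op.counts


windows = [["a", "b"], ["a"], ["c", "a"], ["b"], ["a"]]
clean = run(windows)
recovered = run(windows, fail_at=3)
print(clean, recovered)
```

Checkpointing only on window boundaries is what makes the replay exact: the restored state reflects complete windows only, so no event is counted twice or dropped.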