All about Big Data components and the best tools to ingest, process, store and visualize the data.
This is a keynote from the series "by Developer for Developers" powered by eSolutionsGrup.
3. BIG DATA CHALLENGES
3 Vs of Big Data
VOLUME
Terabytes
Records
Transactions
Tables, files
VELOCITY
Batch
Near time
Real time
Streaming
VARIETY
Structured
Unstructured
Semistructured
All of the above
4. BIG DATA PIPELINE
SOURCE
Structured
Semi-Structured
Unstructured
INGEST
Messaging
API/ODBC
ETL
Replication
PROCESS
Real Time
Batch
ANALYZE
Interactive
AI/ML
VISUALIZE
Web Dashboards
Mobile Devices
Web Services
STORE
Data Lake
Operational Data Store
5. BIG DATA POPULAR USE CASES
Fraud Detection
Security Intelligence
Price Optimization
Behavioral Analytics
Recommendation Engines
Social Media Analysis and Response
Internet of Things
Financial Trading
Improving Science and Research
Performance Optimisation
Improving Healthcare
6. “Big Data is data sets that are too large, complex and dynamic for any conventional data tools to capture, store, manage and analyze.”
14. DATA STORAGE CAP
Consistency
All the clients see the same data regardless of updates or deletes.
Availability
System continues to operate as expected even with node failures.
Partition Tolerance
System continues to operate as expected despite network or message failures.
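The trade-off between the three properties can be illustrated with a toy two-replica store. This is only a sketch under stated assumptions: the `TwoNodeStore` class and its "CP" and "AP" modes are hypothetical names invented for illustration, not the behaviour of any specific product. During a simulated network partition, the CP mode refuses writes to stay consistent, while the AP mode keeps accepting writes and lets the replicas diverge.

```python
class Replica:
    """One node holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store (hypothetical); mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept locally; replicas may now disagree.
            replica.data[key] = value
            return
        # No partition: the write reaches both replicas.
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)
```

With `partitioned = True`, a CP store raises on `write`, so both replicas still return the old value; an AP store accepts the write on one side, so a read from the other replica returns stale data until the partition heals.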
15. DATA STORAGE HDFS
Distributed File System
Master/Slave architecture
Provides file permissions and authentication
High fault-tolerance
Read/Write terabytes of data per second
Streaming data access
Replicates the data for durability
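Replication for durability can be sketched as follows. This is a simplified model, not HDFS code: files are split into fixed-size blocks and each block is copied to several DataNodes. The helper names are hypothetical, the block size is tiny for readability, and placement is plain round-robin, whereas real HDFS placement is rack-aware.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the replication factor")
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: dict, failed: set) -> bool:
    """A file stays readable while every block has one surviving replica."""
    return all(any(node not in failed for node in nodes)
               for nodes in placement.values())
```

With four nodes and a replication factor of 3, any two nodes can fail and every block still has a live copy; losing three nodes can make a block unreadable.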
16. DATA STORAGE HBASE
NoSQL database
Consistency and Partition Tolerance
No data types
Stores data in HDFS
Optimized for reads
Column-Oriented
Automatic sharding and load balancing
Master/Slave architecture
Supports aggregation
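Automatic sharding by row-key range can be sketched as below. This is a toy model, not the HBase client API: the `RegionMap` class and the server names are hypothetical. Each region owns a contiguous range of sorted row keys, and a lookup finds the region whose start key is the largest one not exceeding the row key.

```python
import bisect


class RegionMap:
    """Toy range-sharding table: sorted region start keys -> server name."""

    def __init__(self, first_server="rs1"):
        self.start_keys = [""]          # the first region starts at the empty key
        self.servers = [first_server]

    def split(self, split_key, new_server):
        """Create a new region starting at split_key, hosted on new_server."""
        i = bisect.bisect_left(self.start_keys, split_key)
        if i < len(self.start_keys) and self.start_keys[i] == split_key:
            raise ValueError("a region already starts at this key")
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)

    def locate(self, row_key):
        """Return the server whose region range contains row_key."""
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[i]
```

After `split("m", "rs2")`, keys below "m" stay on `rs1` and keys from "m" upward move to `rs2`, which is how a growing table can spread its load across servers.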
17. DATA STORAGE CASSANDRA
NoSQL database
Optimized for writes
No Single Point of Failure
Column-Oriented
Tunable Consistency
Ring architecture
Availability and Partition Tolerance
Scalable with large clusters
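The ring architecture and tunable consistency can be sketched roughly like this. It is an illustration under assumptions, not Cassandra's implementation: the `Ring` class is hypothetical, SHA-256 stands in for the real partitioner, and each node gets a single token. The `overlaps` helper encodes the usual rule that a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 as a stand-in hash)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> list:
        """Walk clockwise from the key's token and collect rf distinct nodes."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf
```

With a replication factor of 3, reading and writing at quorum (2 and 2) overlaps, while reading and writing a single replica (1 and 1) does not, trading consistency for lower latency.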
18. DATA STORAGE SOLR
Full-Text Search
Linear Scalability
Distributed Index
Schema / Schemaless
Auto Index Replication
Inverted Indexing
Auto Failover and Recovery
Sharding and Replication
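Inverted indexing, the structure behind full-text search, can be sketched in a few lines. This is a minimal in-memory illustration and not Solr or Lucene code: the function names are hypothetical, tokenization is a simple lowercase split on letters and digits, and queries are plain AND queries without ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters and digits as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because the index is keyed by term, a query only touches the posting sets of its own terms instead of scanning every document.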
19. DATA STORAGE REDIS
Persistence via Snapshot / Journal
Key-Value NoSQL database
In-memory data store
Keys can have expiry time
Master/Slave architecture
Publish / Subscribe system
Consistency and Partition Tolerance
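Key expiry in an in-memory key-value store can be sketched as follows. This is not the Redis client API: the `ExpiringStore` class is hypothetical, and it only models lazy expiry, where a key is checked and dropped when it is next read. The clock is injectable so the behaviour can be demonstrated without waiting.

```python
import time


class ExpiringStore:
    """Toy in-memory key-value store with optional per-key time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Store a value; ttl is in seconds, None means the key never expires."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: drop the key on access
            return None
        return value
```

A typical use is session data: a key set with `ttl=10` is readable for ten seconds and then disappears, while keys set without a TTL stay until overwritten.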
20. DATA STORAGE TITAN
Graph database
CAP according to backend storage
Geo, numeric range, full text search: ElasticSearch, Solr, Lucene
Support ACID and Eventual Consistency
Very large graphs
Storage backends: Cassandra, HBase, Oracle
Concurrent Transactions and Operational Graph Processing
Elastic and linear scalability
22. DATA PROCESSING
BATCH
Data arrives and is processed at certain intervals.
NEAR REAL-TIME
The time between when data arrives and is processed is very small (micro-batches).
REAL TIME
Data arrives and is processed in a continuous manner.
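The three processing styles can be contrasted on the same stream of events. This is a simplified sketch with hypothetical function names: the "processing" is just a sum, and micro-batches are cut by event count, whereas real engines usually cut them by time window.

```python
from itertools import islice


def batch(events):
    """Batch: process everything collected during the interval in one pass."""
    return [sum(events)]


def micro_batches(events, size):
    """Near real-time: process small groups of events shortly after arrival."""
    it = iter(events)
    while chunk := list(islice(it, size)):
        yield sum(chunk)


def streaming(events):
    """Real time: update the result continuously, one event at a time."""
    total = 0
    for event in events:
        total += event
        yield total
```

For the events `[1, 2, 3, 4, 5]`, batch processing produces one result at the end, micro-batching with size 2 produces a result per small group, and streaming produces an updated result after every event.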
23. DATA ANALYTICS
INTERACTIVE
Set of approaches to explore data, supporting exploration at the rate of human thought.
MACHINE LEARNING
Turning data into information using automated methods without direct human intervention.