Big Data Ecosystem

BIG DATA ECOSYSTEM
Lucian Neghina
Big Data & Cloud Computing
by Developers for Developers

1. Introduction
Let’s see what Big Data is

BIG DATA CHALLENGES
3 Vs of Big Data
VOLUME
● Terabytes
● Records
● Transactions
● Tables, files
VELOCITY
● Batch
● Near time
● Real time
● Streaming
VARIETY
● Structured
● Unstructured
● Semistructured
● All of the above

BIG DATA PIPELINE
Source -> Ingest -> Store -> Process -> Analyze -> Visualize
● Source: Structured, Semi-Structured, Unstructured
● Ingest: Messaging, API/ODBC, ETL, Replication
● Store: Data Lake, Operational Data Store
● Process: Real Time, Batch
● Analyze: Interactive, AI/ML
● Visualize: Web Dashboards, Mobile Devices, Web Services

BIG DATA POPULAR USE CASES
● Fraud Detection
● Security Intelligence
● Price Optimization
● Behavioral Analytics
● Recommendation Engines
● Social Media Analysis and Response
● Internet of Things
● Financial Trading
● Improving Science and Research
● Performance Optimisation
● Improving Healthcare

“Big Data is data sets that are too large, complex, and dynamic for any conventional data tools to capture, store, manage, and analyze.”

WHAT IS DATA INGESTION
Source systems -> Ingest / Collect -> Destination system
CATEGORIES OF DATA
● Data in motion
● Data at rest

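DATA INGESTION SQOOP
● Sqoop Import: RDBMS (PostgreSQL, Oracle, MySQL, ...) -> Sqoop job (parallel Map tasks) -> HDFS on the Hadoop Cluster
● Sqoop Export: HDFS on the Hadoop Cluster -> Sqoop job (parallel Map tasks) -> RDBMS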
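The parallel Map tasks in the Sqoop slide reflect how an import is parallelized: the source table is divided into key ranges and each map task copies one range. The following is a minimal Python sketch of that range splitting, in the spirit of Sqoop's --split-by and --num-mappers options. It assumes an integer key column; the function name split_ranges and the table name orders are hypothetical, and this is not Sqoop's actual implementation.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous
    slices, one per map task (illustrative only, not Sqoop's code)."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` slices get one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than rows: skip empty slices
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each slice becomes the bounded query that one map task would run.
# `orders` and `id` are hypothetical names used only for illustration.
slices = split_ranges(1, 100, 3)
for lo, hi in slices:
    print(f"SELECT * FROM orders WHERE id BETWEEN {lo} AND {hi}")
```

Because the slices do not overlap and together cover the full range, the map tasks can run independently and write their output files to HDFS in parallel.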

DATA INGESTION KAFKA
CONNECT
Data Source -> Kafka Connect -> Kafka Cluster -> Kafka Connect -> Data Sink
PRODUCER-CONSUMER
Producers -> Kafka Cluster -> Consumers
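The producer-consumer model can be illustrated without a running cluster. The sketch below is a pure-Python stand-in, not the Kafka client API: the Topic and Consumer classes are hypothetical and model only one idea, namely that a topic is an append-only log and each consumer tracks its own read offset, so slow and fast consumers do not interfere with each other.

```python
class Topic:
    """In-memory stand-in for a single-partition topic: an append-only log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def read(self, offset, max_messages=10):
        # Reading never removes messages; it only returns a slice of the log.
        return self._log[offset:offset + max_messages]


class Consumer:
    """Keeps its own offset, so consumers progress independently."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.topic.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


topic = Topic()
for i in range(3):
    topic.append(f"event-{i}")  # producer side

fast, slow = Consumer(topic), Consumer(topic)
fast_batch = fast.poll()                 # reads all three events
slow_batch = slow.poll(max_messages=1)   # reads only the first event
```

A later call to slow.poll() resumes from the slow consumer's own offset and returns the remaining events, which is the behavior the diagram's multiple consumers rely on.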

DATA INGESTION FLUME
External Source -> Flume Agent -> HDFS, File
Inside the Flume Agent, events flow from the Source through the channels to the sinks:
● Source -> Channel 1 -> Sink 1 -> HDFS
● Source -> Channel 2 -> Sink 2 -> File
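The agent layout on the Flume slide (one source, two channels, two sinks) can be sketched in a few lines of Python. This is an illustrative model, not Flume code or configuration: the Agent class is hypothetical, the channels are plain in-memory queues, and the source is assumed to copy every event to every channel, as a replicating fan-out would.

```python
from collections import deque


class Agent:
    """Toy agent: one source fanning out to one channel per sink."""

    def __init__(self, sinks):
        # `sinks` maps a name to a callable that delivers one event.
        self.sinks = sinks
        self.channels = {name: deque() for name in sinks}

    def source_receive(self, event):
        # Replicating fan-out: every channel buffers its own copy.
        for channel in self.channels.values():
            channel.append(event)

    def drain(self):
        # Each sink empties only its own channel, in arrival order.
        for name, deliver in self.sinks.items():
            channel = self.channels[name]
            while channel:
                deliver(channel.popleft())


hdfs_out, file_out = [], []  # lists standing in for HDFS and a local file
agent = Agent({"hdfs": hdfs_out.append, "file": file_out.append})
for line in ["log line 1", "log line 2"]:
    agent.source_receive(line)
agent.drain()
```

The channel acts as a buffer between the source and each sink, so a slow destination delays only its own channel rather than the source.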

DATA INGESTION NIFI
Edge Data
● IoT Devices: Client Libraries
● Mobile: Client Libraries
● Container: MiNiFi
● IoT Devices: Client Libraries -> Gateway: MiNiFi
Regional Center
● Server Cluster: NiFi, NiFi, NiFi
● Kafka, Storm, Others...
Core Data Center
● Server Cluster: NiFi, NiFi, NiFi
● Kafka, Spark

DATA STORAGE CAP
● Consistency: All the clients see the same data regardless of updates or deletes
● Availability: System continues to operate as expected even with node failures
● Partition Tolerance: System continues to operate as expected despite network or message failures

DATA STORAGE HDFS
● Distributed File System
● Master/Slave architecture
● Provides file permissions and authentication
● High fault-tolerance
● Read/Write terabytes of data per second
● Streaming data access
● Replicates the data for durability

DATA STORAGE HBASE
● NoSQL database
● Consistency and Partition Tolerance
● No data types
● Stores data in HDFS
● Optimized for reads
● Column-Oriented
● Automatic sharding and load balancing
● Supports aggregation

DATA STORAGE CASSANDRA
● NoSQL database
● Optimized for writes
● No single point of failure
● Column-Oriented
● Tunable Consistency
● Ring architecture
● Availability and Partition Tolerance
● Scalable with large clusters
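"Tunable Consistency" means the number of replicas that must acknowledge a read or a write is chosen per operation. A common rule of thumb is that reads see the latest write when the read and write replica counts together exceed the replication factor (R + W > N). The sketch below only does that arithmetic; the function names are hypothetical and it does not call any Cassandra driver.

```python
def quorum(replication_factor):
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


n = 3
q = quorum(n)
quorum_both = reads_see_latest_write(n, q, q)  # write QUORUM, read QUORUM
one_both = reads_see_latest_write(n, 1, 1)     # write ONE, read ONE
```

With three replicas, quorum reads and writes overlap on at least one replica, while reading and writing a single replica favors availability and latency at the cost of possibly stale reads.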

DATA STORAGE SOLR
● Full-Text Search
● Linear Scalability
● Distributed Index
● Schema / Schemaless
● Auto Index Replication
● Inverted Indexing
● Auto Failover and Recovery
● Sharding and Replication
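Inverted indexing is the data structure behind full-text search: instead of mapping documents to words, the index maps each term to the set of documents that contain it. Below is a minimal Python sketch of the idea, assuming whitespace tokenization and lowercase normalization. The function names are hypothetical, and real search engines add analysis, scoring, and on-disk storage that this sketch omits.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Kafka streams data",
    2: "Spark processes data in batch",
    3: "Kafka and Spark together",
}
index = build_inverted_index(docs)
hits = search(index, "kafka spark")  # only document 3 has both terms
```

A query touches only the posting sets of its own terms, which is why lookups stay fast as the number of documents grows.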

DATA STORAGE REDIS
● Key-Value NoSQL database
● In memory data store
● Persistence via Snapshot / Journal
● Keys can have expiry time
● Publish / Subscribe system
● Consistency and Partition Tolerance
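To make "keys can have expiry time" concrete, here is a small pure-Python key-value store with a time-to-live per key. It is a sketch of the concept only: the ExpiringStore class is hypothetical, it does not use a Redis client, and it expires keys only when they are read, which is a simplification.

```python
import time


class ExpiringStore:
    """In-memory key-value store where keys may carry a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._data = {}      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: remove on access
            return None
        return value


now = [0.0]  # fake clock, in seconds, advanced by hand
store = ExpiringStore(clock=lambda: now[0])
store.set("session:42", "alice", ttl=10)
before = store.get("session:42")  # still within the 10 second TTL
now[0] = 10.0
after = store.get("session:42")   # TTL has elapsed, key is gone
```

Expiring keys are what make an in-memory store practical for caches and sessions, since stale entries free their memory without explicit deletes.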

DATA STORAGE TITAN
● Graph database
● Very large graphs
● CAP according to backend storage
● Storage backends: Cassandra, HBase, Oracle
● Geo, numeric range, full text search: ElasticSearch, Solr, Lucene
● Support ACID and Eventual Consistency
● Concurrent Transactions and Operational Graph Processing
● Elastic and linear scalability

4. Process & Analyze
Big Data Component

DATA PROCESSING
BATCH
Data arrives and is processed at certain intervals.
NEAR REAL-TIME
The time between when data arrives and when it is processed is very small (micro-batches).
REAL TIME
Data arrives and is processed in a continuous manner.
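The three processing modes differ mainly in when a result becomes available. The Python sketch below applies the same computation (a sum) in each mode to make the difference visible. It is a toy illustration with hypothetical function names, not the API of any processing framework.

```python
def batch(events):
    """Batch: wait for the whole data set, then process it once."""
    return [sum(events)]


def micro_batches(events, size):
    """Near real-time: process small groups of events as they fill up."""
    return [sum(events[i:i + size]) for i in range(0, len(events), size)]


def streaming(events):
    """Real time: update the running result as each event arrives."""
    total = 0
    for event in events:
        total += event
        yield total


events = [1, 2, 3, 4, 5]
batch_result = batch(events)              # one result at the end
micro_result = micro_batches(events, 2)   # one result per group of two
stream_result = list(streaming(events))   # one result per event
```

Batch yields a single result after all data has arrived, micro-batching yields one result per small group, and streaming yields an updated result for every event.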

DATA ANALYTICS
INTERACTIVE
Set of approaches to explore data, supporting exploration at the rate of human thought.
MACHINE LEARNING
Turning data into information using automated methods without direct human intervention.

5. Visualization
Big Data Component

DATA VISUALIZATION
Audience: Business users; Data scientists, developers
Tools: Monitoring, Notebooks, Business Intelligence, Frameworks (D3.js, Chart.js, Google Charts)

Thank You!
@eSolutionsGrup
www.esolutions.ro
