Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
1. June 2015
OpenStack and BigData
Yaron Haviv - Founder & CTO
yaronh@iguaz.io Personal Blog: SDSBlog.com
2. Iguazu Falls (Brazil): 1,746 m³/sec flow, 82 m drop, 2,700 m wide, 275 discrete falls
Innovating storage and data management
to address Big Data applications’ challenges
5. Big Data Deployment Architectures Have Evolved to Address the 3 V’s
In-Memory Processing
State Checkpoints
Batch Processing
Data Sources
Ingestion
Raw Data
• Input Datasets
• Logs, Time Series
• Media, Video
Aggregated Data
• Files, Records, Counters
• Transactional Updates
Analytics, OLTP, Users
• Durable Buffer
• Inline processing
Temp Files
Data Lake
6. Big Data Deployment Architectures Have Evolved to Address the 3 V’s
In-Memory Processing
Batch Processing
Data Sources
Ingestion
Raw Data
• Input Datasets
• Logs, Time Series
• Media, Video
Aggregated Data
• Files, Records, Counters
• Transactional Updates
Analytics, OLTP, Users
• Durable Buffer
• Inline processing
Complex and Immature Stack, Resource Intensive, Long Integration
State Checkpoints
Temp Files
9. Hadoop & Virtualization
HDFS over Virtualized Hardware is 2x slower
Source: http://www.slideshare.net/yuzhidong/benchmarking-sahara-based-big-data-as-a-service-solutions
10. Rate of Unstructured Data Generation Grows Exponentially
4300% Faster Data Growth Rate by 2020
Storage must be elastic, dense, and highly efficient
11. How to Simplify Big Data Infrastructure?
[Diagram: three object storage nodes (Obj), each backed by three disks]
Low-cost
Endless Scalability and Global Distribution
Gather Data
Process Data (in VMs & Containers)
Consume Data
Shared Data Repository (Object Storage) e.g. Amazon S3, Swift
Home Grown Apps
What’s missing?
• Performance & Latency
• Application Integration
• Consistency
• Security & Policies
13. Recommended BigData Architecture With OpenStack
10/40GbE SDN Fabric
Shared Storage
Big Data Applications Running in Servers, VMs or Containers
Ingestion
Mobile Clients
Deployment, Job Scheduling, Orchestration, Monitoring
Network segmentation and provisioning, Firewall
Nova
Sahara
Neutron
S3, Swift, Manila, Cinder
KVM
Docker
File and Object Storage for Data
Block for KVM VM Disks
14. What is Manila?
• Multi-tenant file share as a service
• Like Cinder for files
• Integrated with Neutron
• Supported Protocols
– NFS, CIFS
– GPFS, Ceph, Gluster
– More to come
File Sharing with OpenStack Manila
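To make "file share as a service" concrete, here is a minimal sketch of the kind of request body a tenant might send to create a share. It only builds and prints the JSON and does not contact any service. The field names (share_proto, size, share_network_id) follow my understanding of the Manila shares API and should be checked against the API reference for your release; the helper build_share_request and the share name are hypothetical, and the sketch accepts only the NFS and CIFS protocols named above.

```python
import json

# Protocols accepted by this sketch (an assumption; the slide lists more backends).
SUPPORTED_PROTOCOLS = {"NFS", "CIFS"}


def build_share_request(name, size_gb, protocol="NFS", share_network_id=None):
    """Build a JSON-serializable body describing a new file share.

    Hypothetical helper: it validates the inputs and returns a dict,
    nothing is sent over the network.
    """
    proto = protocol.upper()
    if proto not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    if size_gb < 1:
        raise ValueError("size_gb must be at least 1")

    share = {"name": name, "size": size_gb, "share_proto": proto}
    # Tying the share to a tenant network is optional in this sketch.
    if share_network_id is not None:
        share["share_network_id"] = share_network_id
    return {"share": share}


body = build_share_request("hadoop-input", 100, protocol="nfs")
print(json.dumps(body, indent=2))
```

The request mirrors a Cinder volume request (a name, a size in GB), with the protocol taking the place of a block device type, which is the sense in which Manila is "like Cinder for files".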
15. OpenStack Sahara
• Automated deployment and management of Hadoop/Spark clusters
• Job Execution/tracking
• In/out Data access
19. Define Data Sources & Destinations
Input, Output, and Intermediate data can reside on shared file/object storage
• Simple data management
• Elastic Storage as a service model
• Data sharing across jobs and with external consumers/producers
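As an illustration of the idea, a job's inputs, intermediates, and outputs can be modeled as named URLs on shared storage. This is a plain-Python sketch, not the Sahara API: the DataSource class, the group_by_role helper, the set of schemes treated as shared, and all container and share names are assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# URL schemes treated as shared file/object storage in this sketch (an assumption).
SHARED_SCHEMES = {"swift", "s3", "manila"}
ROLES = ("input", "intermediate", "output")


@dataclass(frozen=True)
class DataSource:
    """A named location a job reads from or writes to (hypothetical model)."""
    name: str
    url: str
    role: str

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")

    @property
    def is_shared(self) -> bool:
        # True when the URL scheme points at shared storage rather than local disk.
        return urlparse(self.url).scheme in SHARED_SCHEMES


def group_by_role(sources):
    """Group source URLs by role so a job definition can reference them."""
    grouped = {role: [] for role in ROLES}
    for src in sources:
        grouped[src.role].append(src.url)
    return grouped


sources = [
    DataSource("raw-logs", "swift://logs/2015-06/raw", "input"),
    DataSource("scratch", "manila://share-1234/tmp", "intermediate"),
    DataSource("report", "swift://results/2015-06/summary", "output"),
]
job_plan = group_by_role(sources)
all_shared = all(s.is_shared for s in sources)
```

Because every location is a URL on shared storage, a second job or an external consumer can read the "report" output without copying it out of a cluster-local file system.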
21. Summary
• BigData Assumptions & Requirements Have Changed Dramatically
– Address Volume + Velocity + Variety, and real-time/interactive response
– Run over Virtualized Cloud Infrastructure
– Deliver availability, security and operational efficiency
• BigData Solutions must evolve to use
– Infinitely scalable and high-performance data lakes vs directly attached storage
– Docker, Network Virtualization, Automated Deployment and operation
• BigData is one of the key application categories for OpenStack
– Think twice before you lock your precious data in public clouds