1. RAPID PROTOTYPING FOR
BIG DATA WITH AWS
Tuesday, March 15, 2016
8 AM PST/4 PM BST/5 PM CEST
webinar webinar@softserveinc.com
2. SPEAKERS
Serge Haziyev
VP of Technology Services,
SoftServe
Taras Bachynskyy
Data Architect,
SoftServe
Vadim Astakhov
Solution Architect,
Amazon Web Services
Ariel Weil
VP of Marketing and Business
Development, Yottaa
4. TYPICAL BIG DATA CHALLENGES
Data Sources (ranging from structured to unstructured, low to high complexity):
• Archives
• Docs
• Business Apps
• Media
• Social Networks
• Public Web
• Data Storages
• Machine Log Data
• Sensor Data
Dimensions: Volume, Velocity, Variety, Complexity
Architecture Concerns:
• Scalability
• Performance
• Extensibility
• Data Quality
• Fault-Tolerance and Availability
• Security
• Cost
• Skills Availability
5. WHY IS PROTOTYPING IMPORTANT?
Typical signs to start prototyping:
• Requirements are uncertain
• Technologies are new
• No comparable system has been previously developed
• No full buy-in from the business
They said they didn’t
need a prototype
7. WHEN AND WHY TO PROTOTYPE?
Find more info at: “Strategic Prototyping for Developing Big Data Systems”,
IEEE Software, March-April, 2016
Prototypes along the project timeline (When?):
• Initial Architecture Analysis
• Rapid Horizontal Prototype
• Vertical Evolutionary Prototype
• PoC
• MVP
Goals (Why?):
• Identification of missing, conflicting, or ambiguous architectural requirements
• Creation of initial architecture design and selection of candidate technologies
• Confirmation of user interface requirements and system scope
• Demonstration version of the system to obtain buy-in from the business
• Integration of selected technologies
• Clarification of complex requirements
• Testing critical functionality and quality attribute scenarios
• Validation of technologies and scenarios that pose risks
• Getting early feedback from end users and updating the product accordingly
• Presentation of a working version at a trade show or customer event
• Evaluation of team progress and alignment
10. SIMPLIFY BIG DATA PROCESSING
Data → Ingest → Collect → Process → Analyze → Answers (over time)
11. AWS BIG DATA TECHNOLOGIES
Ingest: AWS Direct Connect, AWS Import/Export, VPN/Public Web
Store: S3, Amazon Kinesis, Glacier, DynamoDB
Process: EMR, Redshift, EC2
Automate: AWS Data Pipeline
12. BIG DATA PROCESSING
Collect and Store (data collection and storage): S3, Kinesis, DynamoDB, RDS (Aurora)
Process (event processing): AWS Lambda, KCL Apps
Process (data processing): EMR
Analyze (data analysis): Redshift, Machine Learning
Data → Answers
14. DATA CHARACTERISTICS: HOT, WARM, COLD
              Hot         Warm       Cold
Volume        MB–GB       GB–TB      PB
Item size     B–KB        KB–MB      KB–TB
Latency       ms          ms, sec    min, hrs
Durability    Low–High    High       Very High
Request rate  Very High   High       Low
Cost/GB       $$–$        $–¢¢       ¢
16. YOUR BIG DATA APPLICATION ON AWS
• Log4J
• EMR-Kinesis Connector
• Hive with Amazon S3
• Amazon Redshift parallel COPY from Amazon S3
• Amazon Kinesis processing state
18. YOTTAA CREATES AN ABSTRACTION LAYER ON TOP OF
INFRASTRUCTURE, APP & VISITOR BROWSER
19. YOTTAA’S PROXY-BASED SOLUTION SEES EVERY VISITOR
REQUEST & INFRASTRUCTURE RESPONSE
Components: Visitor Browser → YOTTAA Network (WAF, Asset Optimization) → Incumbent CDN → Primary Web (www) Domain and Resource Domain(s); 3rd Party WAF (if present) → 3rd Party Domain(s); Non-optimized Assets
20. REAL-TIME WEB ANALYTICS – LOB & IT USE CASES TO
DRIVE YOTTAA'S BUSINESS FORWARD
“The Business”
Customer Journey
• User experience
• Visitor Targeting
• Vendor Attribution
• Business Agility
IT & Operations
Service Levels
• Speed
• Scalability
• Security
• Standards
21. THE SOLUTION: IMPACTANALYTICS™ BIG DATA
ANALYTICS FOR ACTIONABLE INSIGHT
Complete Visibility
• Centralized log delivery & analytics
• Role-based access control
• Dual-factor authentication
• Account lockout
Actionable Insights
• Real-time traffic & threat analysis
• Event management
• In-line actions via Yottaa Portal
22. TECHNICAL SOLUTION
Architecture Drivers
▪ Volume (> 100 TB scale)
▪ Throughput (> 20K/sec)
▪ Performance (low latency)
▪ Exploratory analytics
▪ Near Real-time (5 sec latency)
▪ Historical view (5 years data)
Solution: Lambda Architecture
Combine different techniques:
▪ Stream (recent data) – hot data
▪ Batch (all data) – cold and warm data
Layers (handling Volume, Velocity, Variety):
▪ Batch Layer: Master Data → Batch Processing → Batch View
▪ Speed Layer: Stream Processing → Real-time View
▪ Serving Layer: serves Batch View and Real-time View
Input: Web Logs
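The layer split above can be sketched in miniature: the batch layer recomputes an aggregate over all master data, the speed layer aggregates only recent events, and the serving layer merges both views at query time. This is an illustrative sketch (counting events from web logs), not Yottaa's actual implementation; all names here are hypothetical.

```python
from collections import Counter

# All historical events (master dataset) -- batch layer input.
master_data = ["login", "purchase", "login", "view", "view", "login"]

# Events that arrived after the last batch run -- speed layer input.
recent_stream = ["view", "login", "purchase"]

def batch_view(events):
    """Batch layer: recompute an aggregate over the full master dataset."""
    return Counter(events)

def realtime_view(events):
    """Speed layer: incrementally aggregate only recent events."""
    return Counter(events)

def serving_layer_query(key, batch, realtime):
    """Serving layer: merge batch and real-time views at query time."""
    return batch.get(key, 0) + realtime.get(key, 0)

batch = batch_view(master_data)
realtime = realtime_view(recent_stream)
print(serving_layer_query("login", batch, realtime))  # 3 batch + 1 recent = 4
```

The key design point is that the expensive batch recomputation can lag by hours, while the cheap real-time view covers only the gap since the last batch run.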
In the modern world, data is produced with ever-increasing volume, velocity, and variety of formats. This data can be extremely valuable: it can be used to understand and track application or service behavior so that we can find errors or suboptimal user experience, and we can mine it for patterns and correlations to generate recommendations. Examples include e-commerce sites that analyze user access logs to provide product recommendations, and social networking or dating sites that recommend new friends or help find qualified soul mates, and so forth.
Also, consumers and businesses these days are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing of historical data.
We are also finding that as data creation becomes more real-time and continuous, there is a need to manage it at high speed.
To simplify big data processing, we present it as a data bus comprising various stages: ingest, store or collect, process, and finally analyze and present data for visualization.
The right technology for each stage is chosen based on criteria such as data volume and structure, as well as query latency, request rate, and item size.
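The data-bus stages can be sketched as a chain of simple functions. The stage names come from the slides; the function bodies below are illustrative placeholders, not any real AWS service's behavior.

```python
def ingest(raw_records):
    # Ingest: accept raw records from producers (files, APIs, agents).
    return [r.strip() for r in raw_records]

def store(records, warehouse):
    # Store/collect: persist records durably before processing.
    warehouse.extend(records)
    return warehouse

def process(warehouse):
    # Process: transform stored records (here, parse "key=value" pairs).
    return [dict([r.split("=")]) for r in warehouse]

def analyze(parsed):
    # Analyze: reduce processed records to an answer for visualization.
    return len(parsed)

warehouse = []
records = ingest([" user=alice \n", "user=bob"])
store(records, warehouse)
answer = analyze(process(warehouse))
print(answer)  # 2
```

In a real deployment each function would be a separate managed service (e.g. Kinesis for ingest, S3 for store, EMR for process), but the staged hand-off is the same.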
AWS delivers technologies to accommodate all of those processing stages. Here you can see the extensive portfolio offered to deal with various aspects of big data. But which services should you use, and why, when, and how?
The first question is usually: how can I move my data to AWS?
Once data starts moving into AWS, we can persist it in a number of storage services for further analysis. The Relational Database Service (RDS), the S3 object store, the Kinesis streaming storage solution, and the DynamoDB key-value store, plus the Hadoop file system on Elastic MapReduce and the Redshift warehouse, give you a wide range of options to persist data.
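As one concrete ingestion detail: Kinesis's PutRecords API accepts at most 500 records per call, so producers typically chunk their buffers before sending. A minimal sketch of that chunking logic follows; the boto3 call is shown commented out since it needs AWS credentials, and the stream name and `key_for` helper are hypothetical.

```python
def chunk(records, max_batch=500):
    """Split a record buffer into PutRecords-sized batches (max 500 each)."""
    return [records[i:i + max_batch] for i in range(0, len(records), max_batch)]

# With credentials configured, each batch would be sent roughly like:
# import boto3
# kinesis = boto3.client("kinesis")
# for batch in chunk(buffer):
#     kinesis.put_records(
#         StreamName="web-logs",  # hypothetical stream name
#         Records=[{"Data": r, "PartitionKey": key_for(r)} for r in batch],
#     )

batches = chunk([b"rec"] * 1200)
print([len(b) for b in batches])  # [500, 500, 200]
```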
But which data storage should be used, and when?
Here at Amazon, we don't believe there is one tool that can do everything; rather, if you use the right tools, you can build a highly configurable big data architecture to meet your specific needs.
AWS comes with a variety of services that provide customers with the right tool, optimized for data structure, query complexity, and other data characteristics such as access frequency patterns.
Here you can see AWS services grouped into four classes based on data structure and query complexity. The top two quadrants represent structured data and the bottom two represent unstructured data. At the same time, the left column groups services that are well optimized for simple query patterns, while the two groups on the right present services optimized for complex queries.
Another way to think about big data design is the access frequency pattern, which can be visualized as data temperature.
Data is labeled hot if it is accessed very frequently by customers, typically within a window of a second or a few seconds.
On the opposite side of the temperature scale we have cold data, which is typically archived data with a rare chance of being accessed, or data that can tolerate an hour's access delay.
Warm usually refers to data whose access pattern ranges from a few seconds to a few minutes.
Other parameters such as total data volume, item size, request rate, and query latency, as well as durability and cost, play an equally important role in building a highly configurable big data architecture that meets customer-specific needs.
Usually, for hot data we are talking about small objects of a few KB, with a total volume of a few GB at most, but with low expected query latency and a high request rate. When we are talking about cold data, it is usually large data volumes with a low request rate and processing response times within minutes if not hours.
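These rules of thumb can be expressed as a small classifier. The thresholds below are illustrative, taken loosely from the hot/warm/cold characteristics discussed here; they are not an AWS-defined formula, and the service suggestions in the comments are only the examples named in this talk.

```python
def data_temperature(latency_ms, requests_per_sec):
    """Classify data temperature from required latency and request rate.

    Thresholds are illustrative rules of thumb, not AWS-defined limits.
    """
    if latency_ms <= 50 and requests_per_sec >= 1000:
        return "hot"   # e.g. ElastiCache, DynamoDB
    if latency_ms <= 5_000:
        return "warm"  # e.g. RDS, CloudSearch, EMR/HDFS, S3
    return "cold"      # e.g. Glacier

print(data_temperature(10, 5000))       # hot: ms latency, very high rate
print(data_temperature(500, 50))        # warm: sub-second latency
print(data_temperature(3_600_000, 1))   # cold: can tolerate an hour
```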
This heat map combines the notion of data temperature with query latency, and summarizes the AWS solutions available in the context of data temperature, data volume, durability, request rate, processing latency, and pricing requirements.
ElastiCache and DynamoDB are a good fit for hot data, while RDS, CloudSearch, EMR/HDFS, and S3 provide options for warm data, and finally Glacier is the offering for cold data.
There are certain intersections in terms of latency, request rate, and data volume among ElastiCache, DynamoDB, and RDS, or among DynamoDB, RDS, and HDFS.
Thus our customers always have a few options to implement their solution.
Finally, to provide a complete toolset for big data problems, AWS provides processing applications called connectors, which can write to multiple data stores, and
processing frameworks such as Storm, Hive, Spark, etc., which can read from multiple data stores.
For the visualization tier, AWS works with many partners providing business intelligence platforms that can connect to AWS big data services through standard APIs.
That is the end of my presentation, and I thank you for your time.
20 years ago, IT/OPS managed as much of the application delivery chain as possible
Content was aggregated at the web server
Experiences were optimized using Application Delivery Controllers – hardware appliances in your datacenter
Threats were mitigated by hardware-based firewalls
And load balancers ensured scalability
All of these components were under your control and had ample opportunity to accelerate and secure applications and data.
[BUILD]
But today, content is aggregated in the browser. Consider some of the standard 3rd party components that together make up engaging, personalized experiences.
DISCO:
What are some of the 3rd party components that you folks include in your apps today?
Have you had challenges either adding the components you want, or ensuring an optimal experience with the components you have?
What are some of the things you’ve tried to fix that?
Have they worked?
What has it cost you?
[BUILD]
Not only has the aggregation point moved out to the browser, but web architectures have evolved to include more aaS solutions for your infrastructure, platform and software needs.
DISCO:
What ‘aaS’ components do you use today or are you planning to include?
What were your goals for using ‘aaS’ components?
How has that change impacted your business?
[GETTING TO THE YOTTAA POINT]
The industry is moving to a services-based model – if you've heard of SOA (service-oriented architecture), it's the way developers prefer to build modern applications, because it makes them far more efficient and capable of achieving far more.
However it also changes things:
Applications connect directly to the internet – they’re not managing connections and data via application delivery controllers and your firewall
Moving content aggregation to the browser also means that
ADCs have no access to optimize the application
And neither do CDNs
…because the BROWSER is requesting and rendering all of the content. ADCs and CDNs do not extend to the browser.
The ADC stops at your datacenter
And the CDN stops at the edge
[BUILD]
So Yottaa has built an app optimization platform that extends from your datacenter all the way to the user’s browser.
It was designed from the ground-up to work with legacy and modern cloud architectures
This means that we are completely platform, infrastructure and software agnostic – we have to be able to work with any networked solutions you have in place today
And, to enable developers, IT professionals, marketers and the businesses they support to remain agile and focused on the customer, we require no code change. Every Yottaa optimization is configuration-based and delivered in real-time via our cloud service.
SEGUE: the net effect is significant
We've been certified as a NetSuite "Built for NetSuite" (SuiteApp) technology partner and proven to accelerate eCommerce sites with
NO modification to NetSuite, which means we don’t require a cartridge
NO limitations to other components you might use on the NetSuite platform
And NO slowdowns or other dependencies because we require no code change