Achieve Big Data Analytic Platform with
Lambda Architecture on Cloud
SPN Infra. , Trend Micro
Scott Miao & SPN infra.
9/10/2016
Who am I
• Scott Miao
• RD, SPN, Trend Micro
• Hadoop ecosystem about 6 years
• AWS for BigData about 3 years
• Expertise in HDFS/MR/HBase/AWS EMR
• @takeshimiao
• @slideshare
Agenda
• Why go on Cloud
• Common Cloud Services in Trend
• Lambda Architecture on Cloud
• Serving Layer as-a Service
• What we learned
Why go on Cloud
Data volume increases 1.5 ~ 2x every year
Growth
becomes 2x
Return on Investment
• On traditional infra., we put a lot of effort into service operations
• On the Cloud, we can leverage its elasticity to automate our services
• More focus on innovation!
Time
Money
Revenue
Cost
Why AWS ?
AWS is a leader among IaaS platforms
Source: Gartner (May 2015)
https://www.gartner.com/doc/reprints?id=1-2G2O5FC&ct=150519&st=sb
AWS Evaluation
Cost acceptable
Functionalities satisfied
Performance satisfied
Common Cloud Services in Trend
ANALYTIC ENGINE + CLOUD STORAGE
Common Services on the Cloud
Cloud CI/CD
Common Auth
Analytic Engine
Cloud Storage
AE + CS
Analytic Engine
• Computation service for Trenders
• Based on AWS EMR
• Simple RESTful API calls
• Computing on demand
• Short-lived
• Long-running
• No operation effort
• Pay by computing resources
Cloud Storage
• Storage service for Trenders
• Based on AWS S3
• Simple RESTful API calls
• Share data with everyone in one place
• Metadata search for files
• No operation effort
• Pay by storage size used
Analytic Engine is a…
A common Big Data computation service on Cloud (AWS)
Major Features in a nutshell
[Diagram: AE exposes createCluster and submitJob; jobs run on EMR and exchange data with CS]
Input from
• CS path
• CS metadata search
• Pig UDFs support
Output to CS with metadata
Visibility (UIs)
• Cost visibility (AWS Cost Explorer)
• Client logs (SumoLogic)
• Cluster info. (Proxy Gateway)
• Fully HA
• Fully automated
• Auto recovery
Supported use cases
1. User creates a cluster
2. User can create as many clusters as he/she needs
3. User submits a job to a target cluster to run
4. AE delivers the job to a secondary cluster if the target cluster is down
5. Users from a different group are not allowed to submit jobs to the cluster(s)
6. Users from a different group are not allowed to delete the cluster
7. Only users from the same group are allowed to delete the cluster
8. User wants to know what their current cost is
9. User wants to troubleshoot his/her submitted job
10. User wants to observe his/her cluster status
1. User invokes submitJob
2. Auth service checks the user's credentials
3. AE knows the user name and group
4. AE matches the job and delivers it to the target cluster
5. AE pulls data from CS
6. Job runs on the target cluster
7. AE outputs the result to CS
8. AE sends a message to the SNS topic if the user specified one
Usecase#3 – User submits job to target cluster to run (1/4)
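[Diagram: users call submitJob on AE SaaS; the numbers 1 to 8 mark the steps listed above across the Auth Service, AE SaaS, EMR and Cloud Storage]
• Job: clusterCriteria: [['sched:adhoc', 'env:prod'], ['env:prod']]
• User: validUser is in the SPN group
• EMR cluster: group: SPN, tags: 'sched:routine', 'env:prod'
• EMR cluster: group: SPN, tags: 'sched:adhoc', 'env:prod'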
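The tag-based matching in this use case can be sketched roughly as follows. This is only an illustrative guess at the rule (criteria sets are tried in order, and a cluster qualifies when it is alive, belongs to the user's group and carries every tag in the set); it is not AE's or Genie's actual code, and `Cluster` and `match_cluster` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Cluster:
    name: str          # hypothetical identifier, for illustration only
    group: str         # owning user group, e.g. "SPN"
    tags: Set[str]     # e.g. {"sched:adhoc", "env:prod"}
    alive: bool = True


def match_cluster(user_group: str,
                  cluster_criteria: List[List[str]],
                  clusters: List[Cluster]) -> Optional[Cluster]:
    """Return the first live cluster that satisfies the earliest criteria set."""
    for wanted in cluster_criteria:          # criteria sets are tried in order
        for cluster in clusters:
            if (cluster.alive
                    and cluster.group == user_group
                    and set(wanted) <= cluster.tags):
                return cluster
    return None                              # no cluster qualifies


clusters = [
    Cluster("routine", "SPN", {"sched:routine", "env:prod"}),
    Cluster("adhoc", "SPN", {"sched:adhoc", "env:prod"}),
]
criteria = [["sched:adhoc", "env:prod"], ["env:prod"]]

target = match_cluster("SPN", criteria, clusters)
print(target.name if target else "no matching cluster")
```

Under these assumptions the second, looser criteria set acts as the fallback: if the adhoc cluster is marked as not alive, the same call returns the routine cluster instead.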
Usecase#3 – User submits job to target cluster to run (2/4)
• Sample payload of submitJob API
{
"clusterCriterias": [
{
"tags": [
"sched:adhoc",
"env:prod"
]
},
{
"tags": [
"env:prod"
]
}
],
"commandArgs": "$inputPaths $outputPaths",
// see below
Usecase#3 – User submits job to target cluster to run (3/4)
// see previous
"fileDependencies": "s3://path/to/my/main.sh,s3://path/to/my/test.pig",
"inputPaths": [
"cs://path/to/my/input/data"
// or you can use metadata search for input data
// "csq://first_entry_date:['2016-05-30T09:00:000Z','2016-05-30T09:01:000Z'}"
],
],
"name": "SubmitJob_pig_cs_to_cs_csq",
"outputPaths": [
"cs://path/to/my/output/result"
],
"tags": [
"env:my-test"
],
"notifyTo" : "arn:aws:sns:us-east-1:123456789123:my-sns"
}
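As a minimal sketch, the sample payload could be assembled programmatically like this. The field names are taken from the sample; the helper name `build_submit_job_payload` and its parameters are hypothetical, and no HTTP call is shown because the slides do not give the endpoint. Note that the `//` comments in the sample are annotations for the slides and would have to be left out of real JSON.

```python
import json


def build_submit_job_payload(name, input_paths, output_paths,
                             cluster_criterias, file_dependencies,
                             tags=(), notify_to=None):
    """Assemble a submitJob request body using the field names from the sample."""
    payload = {
        # one {"tags": [...]} object per criteria set, in priority order
        "clusterCriterias": [{"tags": list(t)} for t in cluster_criterias],
        "commandArgs": "$inputPaths $outputPaths",
        # the sample passes dependencies as one comma-separated string
        "fileDependencies": ",".join(file_dependencies),
        "inputPaths": list(input_paths),
        "name": name,
        "outputPaths": list(output_paths),
        "tags": list(tags),
    }
    if notify_to:                      # the SNS topic ARN is optional
        payload["notifyTo"] = notify_to
    return payload


payload = build_submit_job_payload(
    name="SubmitJob_pig_cs_to_cs_csq",
    input_paths=["cs://path/to/my/input/data"],
    output_paths=["cs://path/to/my/output/result"],
    cluster_criterias=[["sched:adhoc", "env:prod"], ["env:prod"]],
    file_dependencies=["s3://path/to/my/main.sh", "s3://path/to/my/test.pig"],
    tags=["env:my-test"],
    notify_to="arn:aws:sns:us-east-1:123456789123:my-sns",
)
body = json.dumps(payload, indent=2)   # request body as a JSON string
print(body)
```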
Usecase#3 – User submits job to target cluster to run (4/4)
• All existing job types used on-premise are supported
• Pure MR
• Pig and UDFs
• Hadoop streaming
– Python, Ruby, etc.
Usecase#8 – User wants to know what their current cost is (1/2)
• Billing & Cost management -> Cost Explorer -> Launch Cost Explorer
• Filtered by
• tags: "sys = ae" and "comp = emr" and "other = <your-cluster-name>"
• Group by Service
Usecase#8 – User wants to know what their current cost is (2/2) - Billing and Cost Analysis
• Attach tags to your AWS resources
Tag Key  Tag Value (sample)  Description
name     aesaas-s-11-api     *optional* for AWS Cost Explorer
stack    aesaas-s-11         *optional* for AWS Cost Explorer
service  aesaas              *optional* for AWS Cost Explorer
owner    spn                 *required* whose budget the bill falls under
env      prod|stg|dev        *required* environment type
sys      ae                  *required* the system name
comp     api-server|emr      *required* the subcomponent name
other    spn-stg             *optional* a free tag for other usage
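A tagging convention like this is easy to check before resources are created. The sketch below only encodes the required keys and the `env` values listed in the table; the function name `check_cost_tags` and the wording of the messages are assumptions made for illustration.

```python
REQUIRED_TAGS = {"owner", "env", "sys", "comp"}   # marked *required* in the table
ALLOWED_ENVS = {"prod", "stg", "dev"}             # sample values for "env"


def check_cost_tags(tags: dict) -> list:
    """Return a list of problems; an empty list means the tags look complete."""
    missing = sorted(REQUIRED_TAGS - set(tags))
    problems = [f"missing required tag: {key}" for key in missing]
    env = tags.get("env")
    if env is not None and env not in ALLOWED_ENVS:
        problems.append(f"unexpected env value: {env}")
    return problems


good = {"owner": "spn", "env": "prod", "sys": "ae", "comp": "emr",
        "other": "spn-stg"}
print(check_cost_tags(good))
print(check_cost_tags({"env": "qa"}))
```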
Why do we use AE instead of EMR directly?
• Abstraction
• Avoid lock-in
• Hide implementation details behind the scenes
• AWS EMR was not designed for long-running jobs
• >= AMI-3.1.1 – 256 ACTIVE or PENDING jobs (STEPs)
• < AMI-3.1.1 – 256 jobs in total
• Better integration with other common services
• Keep our hands off AWS-native code
• Centralized Authentication & Authorization
• Leverage our internal LDAP server
• No AWS tokens for users
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/AddingStepstoaJobFlow.html
Lambda Architecture on Cloud
Next Phase
Cloud Infra.
AE-v1.0
AE + CS (v1.1~)
Lambda arch.
What is Lambda (λ) Architecture
[Diagram: Lambda Architecture]
• Data Ingestion
• Batch Layer: Master Dataset, Batch Processing, Batch View
• Speed Layer: Streaming Processing, Real-Time View
• Serving Layer: Merged View, Data Access API
• Batch Layer as-a Service
• Serving Layer as-a Service
A data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods
https://en.wikipedia.org/wiki/Lambda_architecture
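As a toy illustration of how a serving layer can answer queries from a merged view, the sketch below combines a precomputed batch view with the increments held by the speed layer. It assumes both views are simple additive counters keyed by entity; the names and numbers are invented for the example and do not describe the actual system.

```python
from collections import Counter


def merged_view(batch_view, realtime_view):
    """Combine the batch view with the speed layer's increments, key by key."""
    merged = Counter(batch_view)
    merged.update(realtime_view)     # Counter.update adds counts
    return dict(merged)


batch_view = {"threat-a": 100, "threat-b": 40}   # computed from the master dataset
realtime_view = {"threat-b": 3, "threat-c": 1}   # events since the last batch run

print(merged_view(batch_view, realtime_view))
```

Once the next batch run has absorbed the recent events, the real-time view for that period can be discarded and the merge starts again from the fresh batch view.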
Serving Layer as-a Service
METADATA STORE
Goals
Help everyone to easily access metadata shared by several teams
• Access data in one place
• Avoid storage duplication
• Share immediately to all
• Provide unified intelligence
Common metadata storage for several services
• Abstract to hide infra & ops
• Customize for different needs
(on AWS)
Usecase
• Store all threat entities in one place from the moment they first appear
– Every team can leverage contributions from other teams at a very early stage
Features
Metadata Store Service
• Random Writes
• Bulk Writes
• Sync Query
• Async Query
• Automatic Provision
• Customizable Schema
• Unified Intelligence
• Threat Monitor
Borrow idea from Star Schema
• A schema design widely used in data warehousing
• Fact table: historical data – measurements or metrics for a specific event
• Dimension table: descriptive attributes – characteristics to describe and select the fact data
Basic Idea
• Refer to Star Schema design
– Fact table
• Put all records into this table (Single Source of Truth)
• Handles both random writes and bulk loads
• Fast random reads by rowkey
– Dimension table
• Fast and flexible info. discovery
• Get rowkey of records stored in Fact table
• Then retrieve records by rowkey
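The two-step lookup described above (find rowkeys through a dimension table, then fetch the records from the fact table) can be sketched with in-memory dictionaries. This is only a toy model under that assumption; the class and method names are hypothetical and no real storage engine is involved.

```python
from collections import defaultdict


class MetadataStore:
    """Toy fact/dimension store: the fact table holds records, dimensions hold rowkeys."""

    def __init__(self):
        self.fact = {}                       # rowkey -> full record
        self.dimension = defaultdict(set)    # (attribute, value) -> set of rowkeys

    def put(self, rowkey, record, indexed=()):
        """Write the record to the fact table and index the chosen attributes.

        Stale index entries left by overwrites are not cleaned up in this sketch.
        """
        self.fact[rowkey] = record
        for attr in indexed:
            if attr in record:
                self.dimension[(attr, record[attr])].add(rowkey)

    def get(self, rowkey):
        """Fast random read by rowkey."""
        return self.fact.get(rowkey)

    def search(self, attr, value):
        """Discover rowkeys via the dimension table, then read the fact table."""
        rowkeys = self.dimension.get((attr, value), set())
        return [self.fact[key] for key in sorted(rowkeys)]


store = MetadataStore()
store.put("k1", {"type": "url", "score": 9}, indexed=["type"])
store.put("k2", {"type": "file", "score": 2}, indexed=["type"])
store.put("k3", {"type": "url", "score": 5}, indexed=["type"])

print(store.search("type", "url"))
```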
Reference Implementation – Part 1
• This Star Schema concept can be fulfilled by different implementations
• A well-known one is HBase + Indexer + Solr
http://www.hadoopsphere.com/2013/11/the-evolving-hbase-ecosystem.html
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
Reference Implementation – Part 2
http://www.slideshare.net/AmazonWebServices/bdt310-big-data-architectural-patterns-and-best-practices-on-aws #p57
Dimension Tables Schema
• Dimension Tables Engine: Elasticsearch
• Dimension Tables Engine: MySQL (RDS)
• Dimension Tables Engine: DynamoDB
Propagate data to dimension storage
• Random Writes, Bulk Writes
• Fact Tables (DynamoDB)
• DynamoDB Streams
• Propagator
• Propagation Rules
• (Eventually Consistent)
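One way to picture the propagator is as a loop that applies propagation rules to each change record coming off the fact table's stream. The sketch below is an assumption-laden illustration: records are plain dictionaries rather than the real DynamoDB Streams record format, dimension stores are plain dictionaries, and the rule shape (`target`, `applies`, `project`) is invented for the example.

```python
def propagate(stream_records, rules, dimension_stores):
    """Apply every matching rule to every change record from the fact table."""
    for record in stream_records:
        for rule in rules:
            if rule["applies"](record):
                target = dimension_stores[rule["target"]]
                # store only the projected attributes, keyed by the fact rowkey
                target[record["rowkey"]] = rule["project"](record)


rules = [
    {
        "target": "by_type",                          # hypothetical dimension store
        "applies": lambda r: "type" in r,
        "project": lambda r: {"type": r["type"]},
    },
]
stores = {"by_type": {}}

changes = [{"rowkey": "k1", "type": "url", "score": 9}, {"rowkey": "k2"}]
propagate(changes, rules, stores)
print(stores)
```

Because the dimension stores are updated after the fact table write, readers of a dimension may briefly see older data, which matches the eventually consistent behavior noted on the slide.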
http://www.programmableweb.com/wp-content/open.graph-600x403.png
http://www.parorrey.com/wp-content/uploads/2012/01/facebook-graph-api.jpg
http://www.olily.com/cblog/wp-content/uploads/2013/11/%E6%97%85%E5%B1%9502.jpg
What we learned
FROM BIG DATA ON CLOUD
Pros & Cons
Aspects               IDC                                    AWS
Data Capacity         Limited by physical rack space         No limitation within a reasonable amount
Computation Capacity  Limited by physical rack space         No limitation within a reasonable amount
DevOps                Hard, due to physical machine/VM farm  Easy, because code is everything (CI/CD)
Scalability           Hard, due to physical machine/VM farm  Easy, relies on ELB and Auto Scaling groups from AWS
Pros & Cons
Aspects            IDC                                    AWS
Disaster Recovery  Hard, due to physical machine/VM farm  Easy, because code is everything
Data Location      Limited by IDC location                Various and easy, due to multiple AWS regions
Cost               Implied in Total Cost of Ownership     Acceptable cost with Cost Conscious Design
Some more details…
We Are Hiring !
Backup
AE SaaS Architecture Design
High Level Architecture Design
[Architecture diagram; recoverable labels:]
• IDC, SJC1, Oregon (us-west-2)
• AZa, AZb, AZc
• Private ELB, AE API servers, Multi-AZs Auto Scaling group
• workers, time-based Auto Scaling groups
• Eureka
• RDS
• EMR
• Cross-account S3 buckets
• Cloud Storage, Internet (HTTPS/HTTP Basic)
• Amazon SNS
• SPN VPC, services (peering)
• VPN, HTTPS
• CI slave
• Splunk forwarder, Splunk
What is Netflix Genie
• A practice from Netflix
• A Hadoop client to submit jobs to EMR
• Flexible data model design to accommodate different kinds of clusters
• Flexible job/cluster matching design (based on tags)
• Cloud characteristics built into the design
– e.g. auto-scaling, load balancing, etc.
• Its goal is plain and simple
• We use it as an internal component
https://github.com/Netflix/genie/wiki
What is Netflix Eureka
• Is a RESTful service
• Built by Netflix
• A critical component for Genie to do load balancing and failover
Genie
API API API
9/12/2016 Confidential | Copyright 2016 TrendMicro Inc. 49
AWS EMR (Elastic MapReduce)
http://www.slideshare.net/AmazonWebServices/amazon-elastic-mapreduce-deep-dive-and-best-practices-bdt404-aws-reinvent-2013
http://www.slideshare.net/AmazonWebServices/deep-dive-amazon-elastic-map-reduce?from_action=save
Lessons Learned on AWS details
Different types of Auto-scaling group
Columns: Service / Auto Scaling Group Type / Features / Provision / Deploy/Config Method
OpsWorks (Provision: CloudFormation, AWS::OpsWorks::Instance.AutoScalingType; Deploy/Config Method: chef recipe)
• 24/7
– manual creation/deletion
– configure one instance for one AZ
• time-based
– can specify time slot(s) in hour units, every day or on any day of the week
– configure one instance for one AZ
• load-based
– can specify CPU/MEM/workload avg. based on an OPS layer
– UP: when to increase instances
– Down: when to decrease instances
– No max./min. # of instances setting
– configure one instance for one AZ
EC2 (Provision: CloudFormation, AWS::AutoScaling::AutoScalingGroup, AWS::AutoScaling::LaunchConfiguration; Deploy/Config Method: user-data)
• can set max./min. for # of instances
• Multi-AZs support
ELB + Auto-Scaling Group
• ELB
– Health Check
• Determines the route for incoming requests
• Auto-Scaling Groups
– Monitor EC2 instances with CloudWatch
– If an EC2 instance is abnormal, terminate it and start a new one
• ELB + Auto-Scaling Group
– Automatically attach/detach EC2 instance(s) to/from the ELB when the Auto-Scaling Group launches/terminates them
http://docs.aws.amazon.com/autoscaling/latest/userguide/autoscaling-load-balancer.html
Auto Recovery based on Monit
• OpsWorks already uses Monit for Auto Recovery
– Leverage Monit on EC2
– We have on-premise practice with it
AZ1 AZ2
API
server
API
server
https://mmonit.com/monit/
Auto Scaling group
• Instance check by CloudWatch
• Process check by Monit
• No process – restart process
• Process health check failed – terminate EC2
• Terminate EC2 → Auto Scaling group launches new EC2
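The recovery rules above boil down to a small decision function. The sketch below models only that decision, with hypothetical names; in the setup described on the slide, Monit performs the checks and the Auto Scaling group replaces a terminated instance, neither of which is reproduced here.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_PROCESS = "restart process"
    TERMINATE_INSTANCE = "terminate EC2"


def recovery_action(process_running: bool, health_check_ok: bool) -> Action:
    """Pick the recovery step for one API server instance."""
    if not process_running:
        return Action.RESTART_PROCESS        # no process: restart it in place
    if not health_check_ok:
        return Action.TERMINATE_INSTANCE     # unhealthy: let the group replace it
    return Action.NONE


print(recovery_action(process_running=True, health_check_ok=False).value)
```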

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Amazon Web Services
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Spark Summit
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Amazon Web Services
 
Investing the Effects of Overcommitting YARN resources
Investing the Effects of Overcommitting YARN resourcesInvesting the Effects of Overcommitting YARN resources
Investing the Effects of Overcommitting YARN resourcesDataWorks Summit/Hadoop Summit
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...DataWorks Summit/Hadoop Summit
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Databricks
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
 
Apache Hadoop YARN State of the Union
Apache Hadoop YARN State of the UnionApache Hadoop YARN State of the Union
Apache Hadoop YARN State of the UnionWeiwei Yang
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangDatabricks
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on MesosJen Aman
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...Flink Forward
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Amazon Web Services
 
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS Amazon Web Services
 
Spark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingAmazon Web Services
 

Was ist angesagt? (20)

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Investing the Effects of Overcommitting YARN resources
Investing the Effects of Overcommitting YARN resourcesInvesting the Effects of Overcommitting YARN resources
Investing the Effects of Overcommitting YARN resources
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
 
Apache Hadoop YARN State of the Union
Apache Hadoop YARN State of the UnionApache Hadoop YARN State of the Union
Apache Hadoop YARN State of the Union
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
 
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
 
Spark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc Bourlier
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with Caching
 
AWS RDS Migration Tool
AWS RDS Migration Tool AWS RDS Migration Tool
AWS RDS Migration Tool
 

Andere mochten auch

Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Tin Ho
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Real time machine learning
Real time machine learningReal time machine learning
Real time machine learningVinoth Kannan
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedGuido Schmutz
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)Jing-Doo Wang
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learningojavajava
 
How to plan a hadoop cluster for testing and production environment
How to plan a hadoop cluster for testing and production environmentHow to plan a hadoop cluster for testing and production environment
How to plan a hadoop cluster for testing and production environmentAnna Yen
 
2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform Security2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform SecurityJazz Yao-Tsung Wang
 
A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)Trevor Lalish-Menagh
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Taiwan User Group
 
2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈晨揚 施
 
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Flink Taiwan User Group
 
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"DataStax Academy
 

Andere mochten auch (20)

Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Real time machine learning
Real time machine learningReal time machine learning
Real time machine learning
 
Arquitectura Lambda
Arquitectura LambdaArquitectura Lambda
Arquitectura Lambda
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms compared
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Big data philly_jug
Big data philly_jugBig data philly_jug
Big data philly_jug
 
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learning
 
How to plan a hadoop cluster for testing and production environment
How to plan a hadoop cluster for testing and production environmentHow to plan a hadoop cluster for testing and production environment
How to plan a hadoop cluster for testing and production environment
 
2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform Security2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform Security
 
A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
 
2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈
 
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
 
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
 
Apache spark meetup
Apache spark meetupApache spark meetup
Apache spark meetup
 

Achieve big data analytic platform with lambda architecture on cloud

  • 1. Achieve Big Data Analytic Platform with Lambda Architecture on Cloud • SPN Infra., Trend Micro • Scott Miao & SPN infra. • 9/10/2016
  • 2. Who am I • Scott Miao • RD, SPN, Trend Micro • Hadoop ecosystem about 6 years • AWS for BigData about 3 years • Expertise in HDFS/MR/HBase/AWS EMR • @takeshimiao • @slideshare
  • 3. Agenda • Why go on Cloud • Common Cloud Services in Trend • Lambda Architecture on Cloud • Serving Layer as-a Service • What we learned
  • 4. Why go on Cloud
  • 5. Data volume increases 1.5 ~ 2x every year Growth becomes 2x
  • 6. Return on Investment • On traditional infra., we put a lot of effort into service operations • On the Cloud, we can leverage its elasticity to automate our services • More focus on innovation! • Chart labels: Time, Money, Revenue, Cost
  • 8. AWS is a leader among IaaS platforms • Source: Gartner (May 2015), https://www.gartner.com/doc/reprints?id=1-2G2O5FC&ct=150519&st=sb
  • 9. AWS Evaluation Cost acceptable Functionalities satisfied Performance satisfied
  • 10. Common Cloud Services in Trend ANALYTIC ENGINE + CLOUD STORAGE
  • 11. Common Services on the Cloud Cloud CI/CD Common Auth Analytic Engine Cloud Storage
  • 12. AE + CS
    Analytic Engine
    • Computation service for Trenders
    • Based on AWS EMR
    • Simple RESTful API calls
    • Computing on demand
      – Short-lived
      – Long-running
    • No operation effort
    • Pay by computing resources
    Cloud Storage
    • Storage service for Trenders
    • Based on AWS S3
    • Simple RESTful API calls
    • Share data to all in one place
    • Metadata search for files
    • No operation effort
    • Pay by storage size used
  • 13. Analytic Engine is a… A common Big Data computation service on Cloud (AWS)
  • 14. Major Features in a nutshell
    • Diagram labels: AE, CS, EMR, submitJob, createCluster
    • Input from: cs path; cs metadata search; Pig UDFs support
    • Output to CS with metadata
    • Visibility (UIs): cost visibility (AWS Cost Explorer); client logs (SumoLogic); cluster info. (Proxy Gateway)
    • Fully HA • Fully automated • Auto recovery
  • 15. Supported use cases
    1. User creates a cluster
    2. User can create as many clusters as he/she needs
    3. User submits job to target cluster to run
    4. AE delivers job to secondary cluster if target cluster is down
    5. Users from a different group are not allowed to submit to the cluster(s)
    6. Users from a different group are not allowed to delete the cluster
    7. Only users from the same group are allowed to delete the cluster
    8. User wants to know what their current cost is
    9. User wants to troubleshoot his/her submitted job
    10. User wants to observe his/her cluster status
  • 16. Usecase#3 – User submits job to target cluster to run (1/4)
    1. User invokes submitJob
    2. Auth service checks the user's credentials
    3. AE knows the user name and group
    4. AE matches the job and delivers it to the target cluster
    5. AE pulls data from CS
    6. Job runs on the target cluster
    7. AE outputs the result to CS
    8. AE sends a message to the SNS topic if the user specified one
    Diagram labels: users, submitJob, AE SaaS, Auth Service ("validUser is SPN group"), EMR, Cloud Storage; clusterCriteria: [['sched:adhoc', 'env:prod'], ['env:prod']]; EMR clusters tagged (group:SPN, tag: 'sched:routine', 'env:prod') and (group:SPN, tag: 'sched:adhoc', 'env:prod')
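The tag-based matching in step 4 can be sketched in Python as follows. This is only an illustration under assumptions: the function name, the cluster names, and the "first criterion that matches wins" rule are hypothetical and are not taken from the AE source. The tags and group come from the diagram on this slide.

```python
def match_cluster(user_group, cluster_criteria, clusters):
    """Return the name of the first cluster that satisfies the earliest criterion.

    cluster_criteria is an ordered list of tag lists; a cluster matches a
    criterion when it carries every tag in that criterion. Only clusters that
    belong to the user's group are considered (assumed rule).
    """
    candidates = [c for c in clusters if c["group"] == user_group]
    for wanted in cluster_criteria:
        wanted_tags = set(wanted)
        for cluster in candidates:
            if wanted_tags <= set(cluster["tags"]):
                return cluster["name"]
    return None


# Hypothetical clusters carrying the tags shown on the slide.
clusters = [
    {"name": "routine-cluster", "group": "SPN", "tags": ["sched:routine", "env:prod"]},
    {"name": "adhoc-cluster", "group": "SPN", "tags": ["sched:adhoc", "env:prod"]},
]
criteria = [["sched:adhoc", "env:prod"], ["env:prod"]]

print(match_cluster("SPN", criteria, clusters))      # adhoc-cluster
print(match_cluster("SPN", criteria, clusters[:1]))  # routine-cluster (fallback)
```

The second call shows why the criteria are ordered: when no cluster satisfies the first, more specific criterion, the looser `env:prod` criterion still finds a cluster, which is one way to read use case 4 (delivery to a secondary cluster).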
  • 17. Usecase#3 – User submits job to target cluster to run (2/4) • Sample payload of submitJob API
      {
        "clusterCriterias": [
          { "tags": [ "sched:adhoc", "env:prod" ] },
          { "tags": [ "env:prod" ] }
        ],
        "commandArgs": "$inputPaths $outputPaths",
        // continued on the next slide
  • 18. Usecase#3 – User submits job to target cluster to run (3/4)
        // continued from the previous slide
        "fileDependencies": "s3://path/to/my/main.sh,s3://path/to/my/test.pig",
        "inputPaths": [
          "cs://path/to/my/input/data"
          // or you can use metadata search for input data
          // "csq://first_entry_date:['2016-05-30T09:00:000Z','2016-05-30T09:01:000Z'}"
        ],
        "name": "SubmitJob_pig_cs_to_cs_csq",
        "outputPaths": [ "cs://path/to/my/output/result" ],
        "tags": [ "env:my-test" ],
        "notifyTo": "arn:aws:sns:us-east-1:123456789123:my-sns"
      }
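A client could assemble this payload programmatically. The helper below is a minimal sketch that only builds and serializes the JSON body. The field names are copied from the sample payload above, while the function name and its parameters are assumptions made for illustration. The endpoint URL and authentication are not shown on the slides and are left out.

```python
import json


def build_submit_job_payload(name, input_paths, output_paths, file_dependencies,
                             cluster_criteria, tags=(), notify_to=None):
    """Build a submitJob request body shaped like the sample on the slides."""
    payload = {
        "clusterCriterias": [{"tags": list(t)} for t in cluster_criteria],
        "commandArgs": "$inputPaths $outputPaths",
        # The sample passes dependencies as one comma-separated string.
        "fileDependencies": ",".join(file_dependencies),
        "inputPaths": list(input_paths),
        "name": name,
        "outputPaths": list(output_paths),
        "tags": list(tags),
    }
    if notify_to is not None:
        payload["notifyTo"] = notify_to  # SNS topic ARN, optional
    return payload


payload = build_submit_job_payload(
    name="SubmitJob_pig_cs_to_cs_csq",
    input_paths=["cs://path/to/my/input/data"],
    output_paths=["cs://path/to/my/output/result"],
    file_dependencies=["s3://path/to/my/main.sh", "s3://path/to/my/test.pig"],
    cluster_criteria=[["sched:adhoc", "env:prod"], ["env:prod"]],
    tags=["env:my-test"],
    notify_to="arn:aws:sns:us-east-1:123456789123:my-sns",
)
body = json.dumps(payload, indent=2)  # strict JSON, without the slide's // comments
print(body)
```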
  • 19. Usecase#3 – User submits job to target cluster to run (4/4) • All existing job types used on-premises are supported • Pure MR • Pig and UDFs • Hadoop streaming – Python, Ruby, etc.
  • 20. Usecase#8 – User wants to know what their current cost is (1/2) • Billing & Cost Management -> Cost Explorer -> Launch Cost Explorer • Filtered by tags: "sys = ae" and "comp = emr" and "other = <your-cluster-name>" • Group by Service
  • 21. Usecase#8 – User wants to know what their current cost is (2/2) - Billing and Cost Analysis • Attach tags to your AWS resources
    Tag Key   Tag Value (sample)   Description
    name      aesaas-s-11-api      *optional* for AWS Cost Explorer
    stack     aesaas-s-11          *optional* for AWS Cost Explorer
    service   aesaas               *optional* for AWS Cost Explorer
    owner     spn                  *required* whose budget the bill falls under
    env       prod|stg|dev         *required* environment type
    sys       ae                   *required* the system name
    comp      api-server|emr       *required* the subcomponent name
    other     spn-stg              *optional* a free-form tag available for other uses
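A tagging convention like this is easy to check before resources are provisioned. The sketch below assumes the required keys and the allowed `env` values listed in the table. The function names are hypothetical, and the second helper simply mirrors the Cost Explorer filter from the previous slide.

```python
REQUIRED_TAGS = ("owner", "env", "sys", "comp")
ALLOWED_ENVS = ("prod", "stg", "dev")


def check_tags(tags):
    """Return a list of human-readable problems; an empty list means the tags are fine."""
    problems = [f"missing required tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("env")
    if env and env not in ALLOWED_ENVS:
        problems.append(f"unexpected env value: {env}")
    return problems


def matches_cost_filter(tags, cluster_name):
    """Mirror the Cost Explorer filter: sys = ae, comp = emr, other = <cluster name>."""
    return (tags.get("sys") == "ae"
            and tags.get("comp") == "emr"
            and tags.get("other") == cluster_name)


emr_tags = {"owner": "spn", "env": "stg", "sys": "ae", "comp": "emr", "other": "spn-stg"}
print(check_tags(emr_tags))                      # []
print(check_tags({"owner": "spn", "env": "qa"}))
print(matches_cost_filter(emr_tags, "spn-stg"))  # True
```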
  • 22. Why we use AE instead of EMR directly? • Abstraction • Avoid lock-in • Hide implementation details behind the scenes • AWS EMR was not designed for long-running jobs • >= AMI-3.1.1: 256 ACTIVE or PENDING jobs (STEPs) • < AMI-3.1.1: 256 jobs in total • Better integration with other common services • Keep our hands off AWS native code • Centralized Authentication & Authorization • Leverage our internal LDAP server • No AWS tokens for users • Reference: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/AddingStepstoaJobFlow.html
  • 24. Next Phase • Cloud Infra. • AE-v1.0 • AE + CS (v1.1~) • Lambda arch.
  • 25. What is Lambda (λ) Architecture
  • 26. A data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods (https://en.wikipedia.org/wiki/Lambda_architecture)
    Diagram labels: Data Ingestion, Batch Layer, Master Dataset, Batch Processing, Batch View, Speed Layer, Stream Processing, Real-Time View, Serving Layer, Merged View, Data Access API, Batch Layer as-a Service, Serving Layer as-a Service
  • 27. Serving Layer as-a Service METADATA STORE
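How a serving layer combines the two views depends on the data. As a minimal sketch, assume both views hold per-key counters, so the merged view is the batch count plus whatever the speed layer has seen since the last batch run. The function name and the sample keys are illustrative and do not come from the slides.

```python
def merge_views(batch_view, realtime_view):
    """Merge a batch view with a real-time view of additive counters."""
    merged = dict(batch_view)  # start from the (complete but stale) batch result
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta  # add recent increments
    return merged


batch_view = {"example.com": 120, "example.org": 45}  # computed by the batch layer
realtime_view = {"example.com": 3, "example.net": 1}  # computed by the speed layer
print(merge_views(batch_view, realtime_view))
# {'example.com': 123, 'example.org': 45, 'example.net': 1}
```

For non-additive data (for example "latest value wins"), the merge rule would differ. The point is only that queries are answered from both views together.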
  • 28. Goals
    • Help everyone to easily access metadata shared by several teams
      – Access data in one place
      – Avoid storage duplication
      – Share immediately to all
      – Provide unified intelligence
    • Common metadata storage for several services (on AWS)
      – Abstract to hide infra & ops
      – Customize for different needs
  • 29. Usecase • Store all threat entities in one place from the moment they first appear – Every team can leverage contributions from other teams at a very early stage
  • 30. Features (Metadata Store Service) • Random Writes • Bulk Writes • Sync Query • Async Query • Automatic Provision • Customizable Schema • Unified Intelligence • Threat Monitor
  • 31. Borrow idea from Star Schema • A schema design widely used in data warehousing • Fact table: historical data – measurements or metrics for a specific event • Dimension tables: descriptive attributes – characteristics to describe and select the fact data
  • 32. Basic Idea
    • Refer to Star Schema design
      – Fact table
        • Put all records into this table (Single Source of Truth)
        • Handles both random writes and bulk loads
        • Fast random reads by rowkey
      – Dimension table
        • Fast and flexible info. discovery
        • Get rowkey of records stored in Fact table
        • Then retrieve records by rowkey
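The two-step read path (discover rowkeys through a dimension table, then fetch records from the fact table) can be sketched with in-memory dictionaries. This is a toy model under assumptions: the class and method names are hypothetical, and plain dicts stand in for the real fact and dimension stores.

```python
from collections import defaultdict


class MetadataStore:
    """Toy fact table plus one inverted-index style dimension table."""

    def __init__(self, indexed_fields):
        self.indexed_fields = tuple(indexed_fields)
        self.fact = {}                     # rowkey -> full record (source of truth)
        self.dimension = defaultdict(set)  # (field, value) -> set of rowkeys

    def put(self, rowkey, record):
        old = self.fact.get(rowkey)
        if old is not None:
            self._unindex(rowkey, old)     # drop stale index entries on overwrite
        self.fact[rowkey] = dict(record)
        for field in self.indexed_fields:
            if field in record:
                self.dimension[(field, record[field])].add(rowkey)

    def _unindex(self, rowkey, record):
        for field in self.indexed_fields:
            if field in record:
                self.dimension[(field, record[field])].discard(rowkey)

    def get(self, rowkey):
        """Fast random read by rowkey, served by the fact table alone."""
        return self.fact.get(rowkey)

    def find(self, field, value):
        """Discover rowkeys via the dimension table, then read the fact table."""
        rowkeys = sorted(self.dimension.get((field, value), ()))
        return [self.fact[k] for k in rowkeys]


store = MetadataStore(indexed_fields=["type"])
store.put("k1", {"type": "url", "value": "a"})
store.put("k2", {"type": "file", "value": "b"})
print(store.find("type", "url"))  # [{'type': 'url', 'value': 'a'}]
```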
  • 33. Reference Implementation – Part 1 • This Star Schema concept can be fulfilled by different implementations • A well-known one is HBase + Indexer + Solr • http://www.hadoopsphere.com/2013/11/the-evolving-hbase-ecosystem.html • https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
  • 34. Reference Implementation – Part 2 • http://www.slideshare.net/AmazonWebServices/bdt310-big-data-architectural-patterns-and-best-practices-on-aws #p57
  • 35. Propagate data to dimension storage
    Diagram labels: Random Writes, Bulk Writes, Fact Tables (DynamoDB), DynamoDB Streams, Propagator, Propagation Rules, Dimension Tables Schema, Dimension Tables Engine: Elasticsearch, Dimension Tables Engine: MySQL (RDS), Dimension Tables Engine: DynamoDB, (Eventually Consistent)
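The propagation step can be sketched as a function that consumes stream records and applies per-store rules. The record layout (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage` with typed attribute values) follows the DynamoDB Streams record format. Everything else is an assumption for illustration: the `rowkey` attribute name, the rule format, and the plain dicts standing in for Elasticsearch, MySQL, or DynamoDB dimension stores.

```python
def propagate(stream_records, rules, dimension_stores):
    """Apply fact-table changes to each dimension store according to the rules.

    rules maps a store name to the list of attributes that store should index.
    """
    for record in stream_records:
        rowkey = record["dynamodb"]["Keys"]["rowkey"]["S"]
        if record["eventName"] == "REMOVE":
            for store in dimension_stores.values():
                store.pop(rowkey, None)
            continue
        # INSERT or MODIFY: flatten {"attr": {"S": "x"}} into {"attr": "x"}.
        image = record["dynamodb"]["NewImage"]
        flat = {name: next(iter(typed.values())) for name, typed in image.items()}
        for store_name, fields in rules.items():
            dimension_stores[store_name][rowkey] = {f: flat[f] for f in fields if f in flat}


rules = {"search": ["type"], "sql": ["type", "score"]}  # hypothetical propagation rules
stores = {"search": {}, "sql": {}}                      # stand-ins for real engines
insert = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {"rowkey": {"S": "k1"}},
        "NewImage": {"rowkey": {"S": "k1"}, "type": {"S": "url"}, "score": {"N": "90"}},
    },
}
propagate([insert], rules, stores)
print(stores)
```

Because the propagator runs after the write to the fact table has been acknowledged, the dimension stores lag slightly behind, which matches the "Eventually Consistent" label on the slide.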
  • 39. What we learned FROM BIG DATA ON CLOUD
  • 40. Pros & Cons
    Aspects                IDC                                         AWS
    Data Capacity          Limited by physical rack space              No limitation within a reasonable amount
    Computation Capacity   Limited by physical rack space              No limitation within a reasonable amount
    DevOps                 Hard, due to physical machines / VM farm    Easy, because code is everything (CI/CD)
    Scalability            Hard, due to physical machines / VM farm    Easy, relying on ELB and Auto Scaling groups from AWS
  • 41. Pros & Cons
    Aspects                IDC                                         AWS
    Disaster Recovery      Hard, due to physical machines / VM farm    Easy, because code is everything
    Data Location          Limited by IDC location                     Various and easy, thanks to multiple AWS regions
    Cost                   Implied in Total Cost of Ownership          Acceptable cost with Cost Conscious Design
    Some more details…
  • 46. High Level Architecture Design (diagram)
    Diagram labels: IDC, SJC1, CI slave, Splunk, VPN, Internet, SPN VPC, Oregon (us-west-2), AZa, AZb, AZc, Private ELB, AE API servers, Multi-AZs Auto Scaling group, RDS, Eureka, workers, Time based Auto Scaling group, EMR, services, peering, Splunk forwarder, Cross-account S3 buckets, Cloud Storage, Amazon SNS, HTTPS, HTTPS/HTTP Basic
  • 47. What is Netflix Genie • A practice from Netflix • A Hadoop client to submit jobs to EMR • Flexible data model design to accommodate different kinds of clusters • Flexible job/cluster matching design (based on tags) • Cloud characteristics built into the design – e.g. auto-scaling, load balancing, etc. • Its goal is plain & simple • We use it as an internal component • https://github.com/Netflix/genie/wiki
  • 48. What is Netflix Eureka • A RESTful service • Built by Netflix • A critical component for Genie to do load balancing and failover • Diagram labels: Genie, API (x3)
  • 49. AWS EMR (Elastic MapReduce) • 9/12/2016 Confidential | Copyright 2016 TrendMicro Inc.
  • 53. Lessons Learned on AWS details
  • 54. Different types of Auto-scaling group
    • OpsWorks (provision: CloudFormation, AWS::OpsWorks::Instance.AutoScalingType; deploy/config method: chef recipe)
      – 24/7: manual creation/deletion; configure one instance for one AZ
      – time-based: can specify time slot(s) in hour units, every day or on any day of the week; configure one instance for one AZ
      – load-based: can specify CPU/MEM/workload avg. thresholds based on an OpsWorks layer; UP: when to increase instances; Down: when to decrease instances; no max./min. # of instances setting; configure one instance for one AZ
    • EC2 (provision: CloudFormation, AWS::AutoScaling::AutoScalingGroup, AWS::AutoScaling::LaunchConfiguration; deploy/config method: user-data)
      – can set max./min. # of instances; Multi-AZs support
  • 55. ELB + Auto-Scaling Group • ELB – Health check – Determines the route for incoming requests • Auto-Scaling Groups – Monitor EC2 instances via CloudWatch – If an EC2 instance is abnormal, terminate it and start a new one • ELB + Auto-Scaling Group – EC2 instance(s) are automatically attached to/detached from the ELB when the Auto-Scaling Group launches/terminates them • http://docs.aws.amazon.com/autoscaling/latest/userguide/autoscaling-load-balancer.html
  • 56. Auto Recovery based on Monit • OpsWorks already uses Monit for Auto Recovery – Leverage Monit on EC2 – We already have experience with it on-premises • Instance check by CloudWatch • Process check by Monit • No process: restart the process • Process health check failed: terminate EC2 • Terminate EC2 -> Auto Scaling group launches a new EC2 • Diagram labels: AZ1, AZ2, API server (x2), Auto Scaling group • https://mmonit.com/monit/
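The recovery rules on this slide amount to a small decision table. The sketch below writes them as a pure function so the order of the checks is explicit. The function name and the action strings are hypothetical, and in the real setup the checks are performed by CloudWatch and Monit rather than by Python code.

```python
def recovery_action(instance_ok, process_running, health_check_ok):
    """Map the three checks from the slide to a recovery action."""
    if not instance_ok:
        # Instance-level failure seen by CloudWatch: replace the instance.
        return "terminate_instance"
    if not process_running:
        # Monit finds no process: a restart is the cheapest fix.
        return "restart_process"
    if not health_check_ok:
        # Process is up but unhealthy: terminate, so the Auto Scaling group
        # launches a fresh EC2 instance.
        return "terminate_instance"
    return "none"


print(recovery_action(True, False, False))  # restart_process
print(recovery_action(True, True, False))   # terminate_instance
print(recovery_action(True, True, True))    # none
```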

Editor's notes

  1. What’s our goal
  2. What’s our goal
  3. https://en.wikipedia.org/wiki/Star_schema
  4. Do not use CloudSearch. Aurora is good!
  5. MongoDB is excluded based on Chien’s suggestion.
  6. TAO: The Associations and Objects, a distributed Graph data store Unicorn: Graph-aware search system Graph API: interface to users