LOG DATA ANALYSIS PLATFORM
May, 2015
Agenda
1) User-Group Introduction
2) Problem Statement
3) Log Data Analysis System Overview
4) Task Analysis
5) Solution Architecture
6) Trade-off Analysis
7) Automation
8) Performance Testing
9) Outcome & Plans
PROBLEM STATEMENT
Demo Lab: Why did we start this project?
1) Increase Internal Experience
2) Create Reference Solution w/o NDA Limitations
3) Get Playground for Tests
4) Provide Demo Environment for Customers (using their data)
5) Decrease Time to Market (by introducing automation)
LOG DATA ANALYSIS PLATFORM: OVERVIEW
Log Data Analysis Platform Details
Key Facts:
• ~270-300 Web Servers
• Log Types: HTTPD Access logs, Error logs, Application Server Servlet, OS Service Logs
• ~500K events per minute
• 150GB of data per day
Technologies:
• Flume
• Hadoop/HDFS, MapReduce
• Hive, Impala
• Oozie
• Elasticsearch, Kibana 3
• Tableau Analytics platform
• Puppet + Vagrant
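The key facts above imply a few derived figures that are handy when reading the rest of the deck. The sketch below only does arithmetic on the numbers from the slide; it assumes decimal gigabytes (1 GB = 10^9 bytes), an even load across the day, and the upper end of the server range (300). The function and constant names are illustrative and not part of the platform.

```python
# Derived rates from the "Key Facts" slide.
# Assumptions (not stated in the slides): decimal GB, load spread evenly
# over 24 hours, 300 web servers (upper end of the ~270-300 range).
EVENTS_PER_MINUTE = 500_000
BYTES_PER_DAY = 150 * 10**9
WEB_SERVERS = 300

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def derived_rates(events_per_minute, bytes_per_day, servers):
    """Return per-second and per-event figures implied by the inputs."""
    events_per_day = events_per_minute * MINUTES_PER_DAY
    events_per_second = events_per_minute / 60
    return {
        "events_per_day": events_per_day,
        "events_per_second": events_per_second,
        "events_per_second_per_server": events_per_second / servers,
        "avg_event_bytes": bytes_per_day / events_per_day,
        "mb_per_second": bytes_per_day / SECONDS_PER_DAY / 10**6,
    }


if __name__ == "__main__":
    for name, value in derived_rates(
        EVENTS_PER_MINUTE, BYTES_PER_DAY, WEB_SERVERS
    ).items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the average event is roughly 208 bytes and the sustained ingest rate is under 2 MB per second, so event count matters more than raw bandwidth for this workload.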
Log Data Examples
Access log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Error log:
[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome
vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0
iostat:
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db) 07/09/2011
avg-cpu: %user %nice %system %iowait %steal %idle
5.68 0.00 0.52 2.03 0.00 91.76
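To make the access log example concrete, here is a minimal sketch of how one such line could be split into fields before it is stored or indexed. It assumes the line follows the Apache Common Log Format, as the sample does; the pattern and the function name parse_access_line are illustrative and are not taken from the platform itself.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Parse one access log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamp looks like 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no body bytes were sent
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    sample = (
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326'
    )
    print(parse_access_line(sample))
```

Lines that do not match (for example error log or vmstat output) return None, so each log type would need its own pattern.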
TASK ANALYSIS
Architecture Drivers: Use Cases
Architecture Drivers: Quality Attributes (1/3)
Architecture Drivers: Quality Attributes (2/3)
Architecture Drivers: Quality Attributes (3/3)
Architecture Drivers: Limitations
Demo Lab: Marketecture
SOLUTION ARCHITECTURE
Solution Architecture
[Diagram 1: three layers (Batch Layer, Serving Layer, Speed Layer). Components: Raw Data Storage, Data Stream, Precomputing, Static Batch Views, Ad-hoc Batch Views, Real-time Views, Static Views, Corporate BI Tool. Legend: layer boundary, data flow (with direction indicated), query flow.]
[Diagram 2: Apache HTTP Servers, Data Stream, Raw Data Storage, Pre-computing, Batch Views, Real-Time Processing and Aggregations, Real-Time Views, Dashboard/Search, BI Tool.]
• Avro as a Raw Data Storage file format
• Parquet as a Batch Views file format
• Star schema as a Batch Views data model
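The batch, speed, and serving layers named on this slide follow the pattern commonly called the Lambda architecture: a precomputed batch view covers everything up to the last batch run, a real-time view covers what has arrived since, and a query combines the two. The sketch below illustrates that idea with plain dictionaries of status-code counts. The function names and the choice of metric are assumptions made for illustration; the platform itself uses Hive/Impala and Elasticsearch for these roles.

```python
from collections import Counter


def precompute_batch_view(events):
    """Batch layer: count events per HTTP status over the raw data."""
    return dict(Counter(event["status"] for event in events))


def merge_views(batch_view, realtime_view):
    """Serving-side merge: add real-time counts on top of batch counts."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts per key
    return dict(merged)


if __name__ == "__main__":
    # Events already processed by the last batch run
    raw_events = [{"status": 200}, {"status": 200}, {"status": 404}]
    batch_view = precompute_batch_view(raw_events)

    # Counts accumulated by the speed layer since that run
    realtime_view = {200: 5, 500: 1}

    print(merge_views(batch_view, realtime_view))
```

Summing works here because counts are additive; metrics such as distinct visitors would need a different merge strategy.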
Architecture: Flume Topology
Batch ETL
TRADE-OFF ANALYSIS
Distribution Selection
Hive Stinger vs Impala
Compression Ratio
Access Speed
AUTOMATION
Automation (saves time and money)
80%: Development and Debugging (Local Development)
20%: F&P Testing, Demo (Cloud Development)
Command: vagrant up
Automation Process
Phase | Tool | Notes
VM Provisioning | Vagrant | Supports VirtualBox, VMware ESX, Amazon AWS.
VM Bootstrapping | Puppet | Installs Cloudera Manager, Cloudera Distribution Hadoop, Elasticsearch + Kibana, Flume, MicroStrategy, Log Generator. Creates the cluster using the Cloudera Manager API.
Configure ETL and BI | Puppet | Configures Flume, Oozie, Elasticsearch, Impala, Hive, MicroStrategy Dashboards.
Integration Tests | Puppet | Generates workload and ensures data goes through. Checks logs for errors. Calculates timing/throughput.
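The three checks listed for the integration test phase (data goes through, no errors in logs, timing/throughput) can be sketched as a single evaluation step. In the deck this phase is driven by Puppet; the Python below is only an illustration of the logic, and the function name check_run, its parameters, and the "ERROR"/"FATAL" markers are assumptions rather than details from the project.

```python
def check_run(events_sent, events_received, elapsed_seconds, log_lines):
    """Evaluate one integration test run.

    events_sent:     number of events the workload generator produced
    events_received: number of events found at the end of the pipeline
    elapsed_seconds: wall-clock duration of the run
    log_lines:       lines collected from component logs
    """
    # Assumption: problem lines are marked with ERROR or FATAL
    error_lines = [
        line for line in log_lines if "ERROR" in line or "FATAL" in line
    ]
    all_delivered = events_received == events_sent
    throughput = events_received / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return {
        "all_delivered": all_delivered,
        "error_lines": error_lines,
        "events_per_second": throughput,
        "passed": all_delivered and not error_lines,
    }


if __name__ == "__main__":
    report = check_run(
        events_sent=10_000,
        events_received=10_000,
        elapsed_seconds=4.0,
        log_lines=["INFO sink started", "INFO batch committed"],
    )
    print(report)
```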
PERFORMANCE TESTING
Log Generator
1 Thread can generate:
4200 events / second (File source)
5500 events / second (TCP source)
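The per-thread rates above can be compared with the production load quoted in the overview (~500K events per minute) to estimate how many generator threads a realistic test needs. The sketch assumes throughput scales linearly with thread count, which the slides do not claim; the function name threads_needed is illustrative.

```python
import math

# Per-thread generator rates from the slide (events per second)
FILE_SOURCE_RATE = 4200
TCP_SOURCE_RATE = 5500


def threads_needed(target_events_per_minute, events_per_second_per_thread):
    """Smallest thread count that reaches the target, assuming linear scaling."""
    target_per_second = target_events_per_minute / 60
    return math.ceil(target_per_second / events_per_second_per_thread)


if __name__ == "__main__":
    target = 500_000  # events per minute, from the platform overview
    print("File source threads:", threads_needed(target, FILE_SOURCE_RATE))
    print("TCP source threads:", threads_needed(target, TCP_SOURCE_RATE))
```

With these inputs both source types need two threads, since 500K events per minute is about 8,333 events per second.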
Accurate Sizing
[Figure: load levels of 20k/min, 50k/min, 100k/min, and 200k/min; label: Calculator!]
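The slides do not describe how the sizing calculator works, so the following is not that calculator. It is a rough sketch of the kind of arithmetic such a tool might start from, applied to the load levels shown in the figure. It assumes an average event size of about 208 bytes (150 GB per day divided by 500K events per minute, from the overview), the HDFS default replication factor of 3, and no compression; all names and the retention parameter are hypothetical.

```python
# Assumption: 150 GB/day at 500K events/min gives roughly 208 bytes per event
AVG_EVENT_BYTES = 208
# Assumption: HDFS default replication factor
HDFS_REPLICATION = 3

MINUTES_PER_DAY = 24 * 60


def daily_raw_gb(events_per_minute, avg_event_bytes=AVG_EVENT_BYTES):
    """Uncompressed raw volume per day in decimal GB."""
    return events_per_minute * MINUTES_PER_DAY * avg_event_bytes / 10**9


def stored_gb(events_per_minute, retention_days, replication=HDFS_REPLICATION):
    """Raw storage footprint for a retention window, including replication."""
    return daily_raw_gb(events_per_minute) * retention_days * replication


if __name__ == "__main__":
    for rate in (20_000, 50_000, 100_000, 200_000):
        print(
            f"{rate:>7} events/min: "
            f"{daily_raw_gb(rate):6.2f} GB/day raw, "
            f"{stored_gb(rate, retention_days=30):8.1f} GB for 30 days"
        )
```

A real sizing exercise would also account for compression, the batch views, Elasticsearch indexes, and CPU and memory, which is presumably why measured performance tests feed the calculator.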
OUTCOME & PLANS
Outcome
1) Demo lab, playground, testing platform (in 1 hour)
2) Sizing Calculator
3) Helped to win 3 new customers (one of them is really, really huge)
4) Strategic Partnership with Cloudera
5) Tons of experience and fun
Plans
1) Add support for other Hadoop Distributions (Hortonworks, MapR)
2) Make Project Open-Source
Thank You!
SoftServe US Office
One Congress Plaza,
111 Congress Avenue, Suite 2700
Austin, TX 78701
Tel: 512.516.8880
Contacts
Valentyn Kropov
vkrop@softserveinc.com
Tel: 866.687.3588 x4341

Weitere ähnliche Inhalte

Was ist angesagt?

Metail at Cambridge AWS User Group Main Meetup #3
Metail at Cambridge AWS User Group Main Meetup #3Metail at Cambridge AWS User Group Main Meetup #3
Metail at Cambridge AWS User Group Main Meetup #3Gareth Rogers
 
An Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginAn Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginMitsuhiro Tanda
 
ADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationMark Kromer
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIQAware GmbH
 
Sql to dax
Sql to daxSql to dax
Sql to daxAnnie Xu
 
Anomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark StreamingAnomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark StreamingKeira Zhou
 
Prometheus loves Grafana
Prometheus loves GrafanaPrometheus loves Grafana
Prometheus loves GrafanaTobias Schmidt
 
Apache NiFi: A Drag and Drop Approach
Apache NiFi: A Drag and Drop ApproachApache NiFi: A Drag and Drop Approach
Apache NiFi: A Drag and Drop ApproachCalculated Systems
 
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...Databricks
 
MLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using SparkMLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using SparkJen Aman
 
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowAutomatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowDatabricks
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackSylvain Wallez
 
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...Spark Summit
 
Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Mark Kromer
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMark Kromer
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationApache Apex
 

Was ist angesagt? (20)

Metail at Cambridge AWS User Group Main Meetup #3
Metail at Cambridge AWS User Group Main Meetup #3Metail at Cambridge AWS User Group Main Meetup #3
Metail at Cambridge AWS User Group Main Meetup #3
 
An Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginAn Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram Plugin
 
ADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview Migration
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AI
 
Sql to dax
Sql to daxSql to dax
Sql to dax
 
Module Owb Repositories
Module Owb RepositoriesModule Owb Repositories
Module Owb Repositories
 
Anomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark StreamingAnomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark Streaming
 
Prometheus loves Grafana
Prometheus loves GrafanaPrometheus loves Grafana
Prometheus loves Grafana
 
Apache NiFi: A Drag and Drop Approach
Apache NiFi: A Drag and Drop ApproachApache NiFi: A Drag and Drop Approach
Apache NiFi: A Drag and Drop Approach
 
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
 
MLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using SparkMLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using Spark
 
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowAutomatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stack
 
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
 
Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
A Short Presentation on Kafka
A Short Presentation on KafkaA Short Presentation on Kafka
A Short Presentation on Kafka
 

Andere mochten auch

Application Monitoring with WSO2 App Server
Application Monitoring with WSO2 App ServerApplication Monitoring with WSO2 App Server
Application Monitoring with WSO2 App ServerSagara Gunathunga
 
Designing Big Data Systems Like a Pro
Designing Big Data Systems Like a ProDesigning Big Data Systems Like a Pro
Designing Big Data Systems Like a ProSoftServe
 
Approaching Quality in Digital Era
Approaching Quality in Digital EraApproaching Quality in Digital Era
Approaching Quality in Digital EraSoftServe
 
Impala Performance Update
Impala Performance UpdateImpala Performance Update
Impala Performance UpdateCloudera, Inc.
 
Training Webinar: Effective Platform Server Monitoring
Training Webinar: Effective Platform Server MonitoringTraining Webinar: Effective Platform Server Monitoring
Training Webinar: Effective Platform Server MonitoringOutSystems
 
Well Log Interpretation and Petrophysical Analisis in [Autosaved]
Well Log Interpretation and Petrophysical Analisis in [Autosaved]Well Log Interpretation and Petrophysical Analisis in [Autosaved]
Well Log Interpretation and Petrophysical Analisis in [Autosaved]Ridho Nanda Pratama
 

Andere mochten auch (8)

Application Monitoring with WSO2 App Server
Application Monitoring with WSO2 App ServerApplication Monitoring with WSO2 App Server
Application Monitoring with WSO2 App Server
 
Designing Big Data Systems Like a Pro
Designing Big Data Systems Like a ProDesigning Big Data Systems Like a Pro
Designing Big Data Systems Like a Pro
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Approaching Quality in Digital Era
Approaching Quality in Digital EraApproaching Quality in Digital Era
Approaching Quality in Digital Era
 
Impala Performance Update
Impala Performance UpdateImpala Performance Update
Impala Performance Update
 
Training Webinar: Effective Platform Server Monitoring
Training Webinar: Effective Platform Server MonitoringTraining Webinar: Effective Platform Server Monitoring
Training Webinar: Effective Platform Server Monitoring
 
Well Log Interpretation and Petrophysical Analisis in [Autosaved]
Well Log Interpretation and Petrophysical Analisis in [Autosaved]Well Log Interpretation and Petrophysical Analisis in [Autosaved]
Well Log Interpretation and Petrophysical Analisis in [Autosaved]
 
Well log data processing
Well log data processingWell log data processing
Well log data processing
 

Ähnlich wie Log Data Analysis Platform by Valentin Kropov

Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementHao Chen
 
Server Monitoring (Scaling while bootstrapped)
Server Monitoring  (Scaling while bootstrapped)Server Monitoring  (Scaling while bootstrapped)
Server Monitoring (Scaling while bootstrapped)Ajibola Aiyedogbon
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User GroupWhat is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User GroupMaarten Balliauw
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureDatabricks
 
Cowboy Dating with Big Data or DWH Evolution in Action, Борис Трофимов
Cowboy Dating with Big Data or DWH Evolution in Action, Борис ТрофимовCowboy Dating with Big Data or DWH Evolution in Action, Борис Трофимов
Cowboy Dating with Big Data or DWH Evolution in Action, Борис ТрофимовSigma Software
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft Private Cloud
 
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...Amazon Web Services
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in RealtimeDataWorks Summit
 
Service quality monitoring system architecture
Service quality monitoring system architectureService quality monitoring system architecture
Service quality monitoring system architectureMatsuo Sawahashi
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Bquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureBquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureCarst Vaartjes
 
Cowboy dating with big data, Борис Трофімов
Cowboy dating with big data, Борис ТрофімовCowboy dating with big data, Борис Трофімов
Cowboy dating with big data, Борис ТрофімовSigma Software
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in ActionHao Chen
 
BizSpark Startup Night Windows Azure March 29, 2011
BizSpark Startup Night Windows Azure March 29, 2011BizSpark Startup Night Windows Azure March 29, 2011
BizSpark Startup Night Windows Azure March 29, 2011Spiffy
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 

Ähnlich wie Log Data Analysis Platform by Valentin Kropov (20)

Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture Evolvement
 
Server Monitoring (Scaling while bootstrapped)
Server Monitoring  (Scaling while bootstrapped)Server Monitoring  (Scaling while bootstrapped)
Server Monitoring (Scaling while bootstrapped)
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User GroupWhat is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices Architecture
 
Cowboy Dating with Big Data or DWH Evolution in Action, Борис Трофимов
Cowboy Dating with Big Data or DWH Evolution in Action, Борис ТрофимовCowboy Dating with Big Data or DWH Evolution in Action, Борис Трофимов
Cowboy Dating with Big Data or DWH Evolution in Action, Борис Трофимов
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
 
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 
Service quality monitoring system architecture
Service quality monitoring system architectureService quality monitoring system architecture
Service quality monitoring system architecture
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Bquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureBquery Reporting & Analytics Architecture
Bquery Reporting & Analytics Architecture
 
Cowboy dating with big data, Борис Трофімов
Cowboy dating with big data, Борис ТрофімовCowboy dating with big data, Борис Трофімов
Cowboy dating with big data, Борис Трофімов
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
 
BizSpark Startup Night Windows Azure March 29, 2011
BizSpark Startup Night Windows Azure March 29, 2011BizSpark Startup Night Windows Azure March 29, 2011
BizSpark Startup Night Windows Azure March 29, 2011
 
Evolving Architecture
Evolving ArchitectureEvolving Architecture
Evolving Architecture
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 

Mehr von SoftServe

Digital Product Security
Digital Product SecurityDigital Product Security
Digital Product SecuritySoftServe
 
Testing Tools and Tips
Testing Tools and TipsTesting Tools and Tips
Testing Tools and TipsSoftServe
 
Android Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, ToolsAndroid Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, ToolsSoftServe
 
Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...SoftServe
 
How to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps SolutionsHow to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps SolutionsSoftServe
 
Containerization: The DevOps Revolution
Containerization: The DevOps Revolution Containerization: The DevOps Revolution
Containerization: The DevOps Revolution SoftServe
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS SoftServe
 
Implementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowImplementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowSoftServe
 
Using AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and BeyondUsing AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and BeyondSoftServe
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseSoftServe
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Big Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for InnovationBig Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for InnovationSoftServe
 
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...SoftServe
 
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...SoftServe
 
Managing Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max MarkovManaging Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max MarkovSoftServe
 
How to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions SuccessfullyHow to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions SuccessfullySoftServe
 
Product Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman PavlyukProduct Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman PavlyukSoftServe
 
From Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym FedorovFrom Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym FedorovSoftServe
 
Why Ukraine? by Brian Borack, COO
Why Ukraine? by Brian Borack, COOWhy Ukraine? by Brian Borack, COO
Why Ukraine? by Brian Borack, COOSoftServe
 

Mehr von SoftServe (20)

Digital Product Security
Digital Product SecurityDigital Product Security
Digital Product Security
 
Testing Tools and Tips
Testing Tools and TipsTesting Tools and Tips
Testing Tools and Tips
 
Android Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, ToolsAndroid Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, Tools
 
Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...
 
How to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps SolutionsHow to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps Solutions
 
Containerization: The DevOps Revolution
Containerization: The DevOps Revolution Containerization: The DevOps Revolution
Containerization: The DevOps Revolution
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS
 
Implementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowImplementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should Know
 
Using AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and BeyondUsing AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and Beyond
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Big Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for InnovationBig Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for Innovation
 
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
 
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
 
Managing Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max MarkovManaging Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max Markov
 
How to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions SuccessfullyHow to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions Successfully
 
Product Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman PavlyukProduct Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
 
From Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym FedorovFrom Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym Fedorov
 
Why Ukraine? by Brian Borack, COO
Why Ukraine? by Brian Borack, COOWhy Ukraine? by Brian Borack, COO
Why Ukraine? by Brian Borack, COO
 

Log Data Analysis Platform by Valentin Kropov

  • 1. LOG DATA ANALYSIS PLATFORM May, 2015
  • 2. Agenda
        1) User-Group Introduction
        2) Problematic
        3) Log Data Analysis System Overview
        4) Task Analysis
        5) Solution Architecture
        6) Trade-off Analysis
        7) Automation
        8) Performance Testing
        9) Outcome & Plans
  • 4. Demo Lab: Why did we start this project?
        1) Increase internal experience
        2) Create a reference solution without NDA limitations
        3) Get a playground for tests
        4) Provide a demo environment for customers (using their data)
        5) Decrease time to market (by introducing automation)
  • 5. LOG DATA ANALYSIS PLATFORM: OVERVIEW
  • 6. Log Data Analysis Platform Details
      Key Facts:
        • ~270-300 web servers
        • Log types: HTTPD access logs, error logs, application server servlet logs, OS service logs
        • ~500K events per minute
        • 150 GB of data per day
      Technologies:
        • Flume
        • Hadoop/HDFS, MapReduce
        • Hive, Impala
        • Oozie
        • Elasticsearch, Kibana 3
        • Tableau Analytics platform
        • Puppet + Vagrant
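The key facts above imply an average event rate and an average event size. The short sketch below works both out. It assumes the quoted rate is sustained around the clock and that "GB" means decimal gigabytes; neither assumption is stated on the slide, and the function names are illustrative only.

```python
# Figures taken from the "Key Facts" slide.
EVENTS_PER_MINUTE = 500_000
BYTES_PER_DAY = 150 * 10**9   # assumption: decimal gigabytes
MINUTES_PER_DAY = 24 * 60


def events_per_second(events_per_minute: int) -> float:
    """Convert a per-minute event rate into a per-second rate."""
    return events_per_minute / 60


def average_event_bytes(bytes_per_day: int, events_per_minute: int) -> float:
    """Average size of one event, assuming the rate holds for the whole day."""
    return bytes_per_day / (events_per_minute * MINUTES_PER_DAY)


rate = events_per_second(EVENTS_PER_MINUTE)
size = average_event_bytes(BYTES_PER_DAY, EVENTS_PER_MINUTE)
print(f"{rate:.0f} events/s, about {size:.0f} bytes per event")
```

Under these assumptions the platform handles roughly 8,300 events per second at a little over 200 bytes each, which is in line with the length of a typical access-log line.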
  • 7. Log Data Examples
      Access log:
        127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
      Error log:
        [Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
        [Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome
      vmstat:
        procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
        r b swpd free buff cache si so bi bo in cs us sy id wa st
        0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0
      iostat:
        Linux 2.6.32-100.28.5.el6.x86_64 (dev-db) 07/09/2011
        avg-cpu: %user %nice %system %iowait %steal %idle
        5.68 0.00 0.52 2.03 0.00 91.76
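The access-log sample above follows the Apache Common Log Format. As a minimal sketch of how such a line could be turned into a structured record before loading, the snippet below uses a regular expression. The pattern, the field names, and the `parse_access_line` helper are illustrative assumptions and are not taken from the deck.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month abbreviations are parsed using the current locale (English assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_access_line(sample)
print(record["host"], record["user"], record["status"], record["size"])
```

Returning `None` for lines that do not match lets the caller count or quarantine malformed input instead of failing the whole batch.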
  • 10. Architecture Drivers: Quality Attributes (1/3)
  • 11. Architecture Drivers: Quality Attributes (2/3)
  • 12. Architecture Drivers: Quality Attributes (3/3)
  • 16. Solution Architecture
      [Diagram. Layers: Batch Layer, Serving Layer, Speed Layer. Components: Apache HTTP Servers, Data Stream, Raw Data Storage, Pre-computing, Batch Views (static and ad hoc), Static Views, Real-Time Processing and Aggregations, Real-Time Views, Dashboard/Search, Corporate BI Tool. Legend: layer boundary, data flow (with direction indicated), query flow.]
        • Avro as the Raw Data Storage file format
        • Parquet as the Batch Views file format
        • Star schema as the Batch Views data model
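The batch, serving, and speed layers named on this slide match what is commonly called the Lambda architecture. The toy sketch below illustrates the general idea only: a batch view is recomputed from all raw events, while a real-time view is updated incrementally as new events arrive. The function names and the status-code metric are hypothetical, and the final merge step is a generic illustration; the deck does not say whether the platform combines the two kinds of view or serves them to separate front ends.

```python
from collections import Counter


def build_batch_view(raw_events):
    """Batch layer: recompute status-code counts from all stored raw events."""
    return Counter(event["status"] for event in raw_events)


def update_realtime_view(view, event):
    """Speed layer: fold one newly arrived event into the running counts."""
    view[event["status"]] += 1


def merged_counts(batch_view, realtime_view):
    """Generic Lambda-style query: batch results plus events seen since."""
    return batch_view + realtime_view


# Events already in raw storage when the last batch job ran.
raw_storage = [{"status": 200}, {"status": 200}, {"status": 404}]
batch_view = build_batch_view(raw_storage)

# Events that arrived on the stream after that batch job.
realtime_view = Counter()
for event in [{"status": 200}, {"status": 500}]:
    update_realtime_view(realtime_view, event)

totals = merged_counts(batch_view, realtime_view)
print(dict(totals))
```

The batch view can always be rebuilt from raw storage, so an error in the incremental path is corrected at the next recomputation.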
  • 21. Hive Stinger vs Impala [Charts: Compression Ratio, Access Speed]
  • 23. Automation (saves time and money) [Chart: 80% / 20% split. Labels: Development and Debugging; F&P Testing, Demo; Local Development; Cloud Development]
  • 25. Automation Process
      Phase | Tool | Notes
      VM Provisioning | Vagrant | Supports VirtualBox, VMware ESX, Amazon AWS.
      VM Bootstrapping | Puppet | Installs Cloudera Manager, Cloudera Distribution Hadoop, Elasticsearch + Kibana, Flume, MicroStrategy, Log Generator. Creates the cluster using the Cloudera Manager API.
      Configure ETL and BI | Puppet | Configures Flume, Oozie, Elasticsearch, Impala, Hive, MicroStrategy dashboards.
      Integration Tests | Puppet | Generates workload and ensures data goes through. Checks logs for errors. Calculates timing/throughput.
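The integration-test phase above is described as checking that data goes through and calculating timing and throughput. A minimal sketch of such a check might look like the following. The function names, the pass criteria, and the sample figures are assumptions made for illustration; the deck does not show how its tests are implemented.

```python
from datetime import datetime


def throughput(events_delivered, started, finished):
    """Events per second delivered between two timestamps."""
    elapsed = (finished - started).total_seconds()
    if elapsed <= 0:
        raise ValueError("finished must be later than started")
    return events_delivered / elapsed


def check_run(events_sent, events_delivered, started, finished, min_rate):
    """Summarise one test run: was anything lost, and was it fast enough?"""
    rate = throughput(events_delivered, started, finished)
    return {
        "lossless": events_sent == events_delivered,
        "events_per_second": rate,
        "fast_enough": rate >= min_rate,
    }


# Hypothetical run: 42,000 events sent and delivered in 10 seconds.
result = check_run(
    events_sent=42_000,
    events_delivered=42_000,
    started=datetime(2015, 5, 1, 12, 0, 0),
    finished=datetime(2015, 5, 1, 12, 0, 10),
    min_rate=4_000,
)
print(result)
```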
  • 27. Log Generator
      One thread can generate:
        • 4200 events / second (File source)
        • 5500 events / second (TCP source)
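These per-thread rates can be compared with the production rate of ~500K events per minute quoted on the Key Facts slide. The sketch below estimates how many generator threads would be needed to reproduce that load. It assumes throughput scales linearly with the number of threads, which the deck does not claim, and the `threads_needed` helper is hypothetical.

```python
import math

# Per-thread rates from the slide, in events per second.
EVENTS_PER_THREAD = {"file": 4200, "tcp": 5500}
# Production rate from the Key Facts slide.
TARGET_EVENTS_PER_MINUTE = 500_000


def threads_needed(target_per_minute, per_thread_per_second):
    """Smallest thread count whose combined rate meets the target.

    Assumes linear scaling across threads (an idealisation).
    """
    target_per_second = target_per_minute / 60
    return math.ceil(target_per_second / per_thread_per_second)


plan = {
    source: threads_needed(TARGET_EVENTS_PER_MINUTE, rate)
    for source, rate in EVENTS_PER_THREAD.items()
}
print(plan)
```

Under that assumption, two threads of either source type are enough to match the quoted production rate.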
  • 30. Outcome
        1) Demo lab, playground, and testing platform (in 1 hour)
        2) Sizing calculator
        3) Helped win 3 new customers (one is really, really huge)
        4) Strategic partnership with Cloudera
        5) Tons of experience and fun
      Plans
        1) Add support for other Hadoop distributions (Hortonworks, MapR)
        2) Make the project open source
  • 31. Thank You!
      SoftServe US Office: One Congress Plaza, 111 Congress Avenue, Suite 2700, Austin, TX 78701. Tel: 512.516.8880
      Contact: Valentyn Kropov, vkrop@softserveinc.com, Tel: 866.687.3588 x4341