SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Data-Driven Development Era
and Its Technologies
Developers Summit 2015 Autumn (Oct 14, 2015)
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori
(@tagomoris)
Fluentd, Norikra, Hadoop, ...
Treasure Data, Inc.
HQ
Branch
http://www.treasuredata.com/
Main Topics around "Data"
• Data collection
• Storage
• Data processing
• Batch distributed processing
• Stream processing
• Machine Learning
• Near real-time query & Data lake
• Visualization
Data Analytics Flow
Collect Store Process Visualize
Data source
Reporting
Monitoring
Where before What
Using Services or Not
• Using services fully-managed:
• Google BigQuery & Dataflow
• Treasure Data services
• Using services self-managed:
• Amazon EMR & Redshift
• Google Cloud Dataproc
• Using your own environment & cluster
Using Services or Not
• Using services fully-managed:
• Google BigQuery & Dataflow
• Treasure Data services
• Using services self-managed:
• Amazon EMR & Redshift
• Google Cloud Dataproc
• Using your own environment & cluster
a bit more cost
extremely less efforts
fully controlled by self
extremely more efforts
less cost
less efforts
Using Services or Not:
"Use Services!"
To concentrate
DATA and Analytics,
NOT tools
Why should we use services?
• About distributed systems:
• hard to operate & upgrade
• impossible to "small-start"
• very hard to hire professional engineer
• Data Driven Development:
• collect/store data at first!
• consider output data at second!
• "before building your own environment"
Really? Are you TD guy?
• ...Really!
• But it requires very long discussions :P
• "スタートアップのデータ処理基盤、作るか、使うか"

http://tsuchinoko.dmmlabs.com/?p=1770
How to choose software/services
in
Data-Driven Development
"What" decides "How"
• Distributed systems are to solve problems
• There're many kind of data
• There're many problems
• Systems solve different problems from each other
• There are no "Silver bullet"!
What First, How Second
• What do you want to do?
• Reporting? Analytics? Recommendation? or ...
• What type of data you wan to process?
• Stored large log? Stream sensor data? or ...
• What is you need as result?
• CSV? Spreadsheet? Graph? DB Relation? or ...
How?(just for example)
• MapReduce, Tez
• Large batch jobs, big JOINs, high stability
• Spark
• Small/Middle batch jobs, machine learning
• Impala, Presto, Drill, Redshift, BigQuery
• Near-real-time search, small-to-large analytics
• Storm, Spark streaming
• Stream data conversion/aggregation
"Processing" is just a part
of whole dataflow!
Data Analytics Flow (again)
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Analytics Flow (again)
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Collection
• Data Driven Development -> collect at first!
• As batch: Data already exists as files
• Easily integrated with existing batch systems
• Sqoop, Embulk, ...
• As stream: Data just generated now
• Easily connected with monitoring systems
• Without burst network traffic
• Flume, Logstash, Fluentd, ...
Fluentd: Support Service
by SRA OSS
with Treasure Data
Released
TODAY!
Other Important Topics
• Storage: Performance, Availability, Schema management
• Apache Hadoop HDFS, Apache HBase, Amazon S3, Cloudera Kudu, ...
• Visualization: Functionality, Connectivity, Visibility
• Tableau, Pentaho, Many other enterprise products, ...
• Distributed Queues: Performance, Stability, Connectivity
• Apache Kafka, Amazon Kinesis, ...
Get Familiar with Options
NOT to Take Pains about Technology!
Concentrate
DATA and Analytics,
NOT tools.
Thanks!

Weitere ähnliche Inhalte

Was ist angesagt?

Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupWojciech Biela
 
Where Is My Data - ILTAM Session
Where Is My Data - ILTAM SessionWhere Is My Data - ILTAM Session
Where Is My Data - ILTAM SessionTamir Dresher
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataTreasure Data, Inc.
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapItai Yaffe
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 
Análisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic StackAnálisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic StackElasticsearch
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevAltinity Ltd
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElasticsearch
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方Yoshiyasu SAEKI
 
Using Elasticsearch for Analytics
Using Elasticsearch for AnalyticsUsing Elasticsearch for Analytics
Using Elasticsearch for AnalyticsVaidik Kapoor
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at TwitterBill Graham
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...Spark Summit
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONOutlyer
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 

Was ist angesagt? (20)

Presto
PrestoPresto
Presto
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
 
Where Is My Data - ILTAM Session
Where Is My Data - ILTAM SessionWhere Is My Data - ILTAM Session
Where Is My Data - ILTAM Session
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Análisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic StackAnálisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic Stack
 
Hadoop at Ebay
Hadoop at EbayHadoop at Ebay
Hadoop at Ebay
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
Using Elasticsearch for Analytics
Using Elasticsearch for AnalyticsUsing Elasticsearch for Analytics
Using Elasticsearch for Analytics
 
Using Embulk at Treasure Data
Using Embulk at Treasure DataUsing Embulk at Treasure Data
Using Embulk at Treasure Data
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 

Andere mochten auch

Stream processing in Mercari - Devsumi 2015 autumn LT
Stream processing in Mercari - Devsumi 2015 autumn LTStream processing in Mercari - Devsumi 2015 autumn LT
Stream processing in Mercari - Devsumi 2015 autumn LTMasahiro Nagano
 
データファースト開発
データファースト開発データファースト開発
データファースト開発Katsunori Kanda
 
失敗から学ぶ データ分析グループの チームマネジメント変遷
失敗から学ぶデータ分析グループのチームマネジメント変遷失敗から学ぶデータ分析グループのチームマネジメント変遷
失敗から学ぶ データ分析グループの チームマネジメント変遷Tokoroten Nakayama
 
hivemallを使って4日間で性別推定した話
hivemallを使って4日間で性別推定した話hivemallを使って4日間で性別推定した話
hivemallを使って4日間で性別推定した話eventdotsjp
 
Hivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupHivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupMakoto Yui
 
PGに簡単なゲームのやり方を学習させる Vol.1 - まずはQ学習を理解する
PGに簡単なゲームのやり方を学習させる Vol.1  - まずはQ学習を理解するPGに簡単なゲームのやり方を学習させる Vol.1  - まずはQ学習を理解する
PGに簡単なゲームのやり方を学習させる Vol.1 - まずはQ学習を理解する敬 松風敬
 
Henry Cipolla - Data Driven Development
Henry Cipolla - Data Driven DevelopmentHenry Cipolla - Data Driven Development
Henry Cipolla - Data Driven DevelopmentMassTLC
 
Sales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersSales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersTaxJar
 
(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive ApplicationsAmazon Web Services
 
同じサービスを ECSとOpsWorksで 運用してみた
同じサービスをECSとOpsWorksで運用してみた同じサービスをECSとOpsWorksで運用してみた
同じサービスを ECSとOpsWorksで 運用してみたJun Ichikawa
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWSAmazon Web Services
 
Improve Monitoring & Monetization of Your Mobile Apps
Improve Monitoring & Monetization of Your Mobile AppsImprove Monitoring & Monetization of Your Mobile Apps
Improve Monitoring & Monetization of Your Mobile AppsAmazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012Amazon Web Services
 
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...Amazon Web Services
 
TDD: Test Driven Development 첫번째 이야기
TDD: Test Driven Development 첫번째 이야기TDD: Test Driven Development 첫번째 이야기
TDD: Test Driven Development 첫번째 이야기Ji Heon Kim
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAmazon Web Services
 

Andere mochten auch (20)

Stream processing in Mercari - Devsumi 2015 autumn LT
Stream processing in Mercari - Devsumi 2015 autumn LTStream processing in Mercari - Devsumi 2015 autumn LT
Stream processing in Mercari - Devsumi 2015 autumn LT
 
データファースト開発
データファースト開発データファースト開発
データファースト開発
 
失敗から学ぶ データ分析グループの チームマネジメント変遷
失敗から学ぶデータ分析グループのチームマネジメント変遷失敗から学ぶデータ分析グループのチームマネジメント変遷
失敗から学ぶ データ分析グループの チームマネジメント変遷
 
hivemallを使って4日間で性別推定した話
hivemallを使って4日間で性別推定した話hivemallを使って4日間で性別推定した話
hivemallを使って4日間で性別推定した話
 
Hivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupHivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetup
 
Sano hmm 20150512
Sano hmm 20150512Sano hmm 20150512
Sano hmm 20150512
 
PGに簡単なゲームのやり方を学習させる Vol.1 - まずはQ学習を理解する
PGに簡単なゲームのやり方を学習させる Vol.1  - まずはQ学習を理解するPGに簡単なゲームのやり方を学習させる Vol.1  - まずはQ学習を理解する
PGに簡単なゲームのやり方を学習させる Vol.1 - まずはQ学習を理解する
 
Henry Cipolla - Data Driven Development
Henry Cipolla - Data Driven DevelopmentHenry Cipolla - Data Driven Development
Henry Cipolla - Data Driven Development
 
AURA: Aerial Unpaved Roads Assessment System Demonstration - Data Collection...
AURA: Aerial Unpaved Roads Assessment System Demonstration  - Data Collection...AURA: Aerial Unpaved Roads Assessment System Demonstration  - Data Collection...
AURA: Aerial Unpaved Roads Assessment System Demonstration - Data Collection...
 
Sales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersSales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA Sellers
 
(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications
 
同じサービスを ECSとOpsWorksで 運用してみた
同じサービスをECSとOpsWorksで運用してみた同じサービスをECSとOpsWorksで運用してみた
同じサービスを ECSとOpsWorksで 運用してみた
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS
 
Improve Monitoring & Monetization of Your Mobile Apps
Improve Monitoring & Monetization of Your Mobile AppsImprove Monitoring & Monetization of Your Mobile Apps
Improve Monitoring & Monetization of Your Mobile Apps
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
 
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
 
TDD: Test Driven Development 첫번째 이야기
TDD: Test Driven Development 첫번째 이야기TDD: Test Driven Development 첫번째 이야기
TDD: Test Driven Development 첫번째 이야기
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWS
 
黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading
黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading
黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading
 

Ähnlich wie Data-Driven Development Era and Its Technologies

Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructureSimon Belak
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxAIMLSEMINARS
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web developmentTung Nguyen
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final VersionJanani Eshwaran
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final VersionJanani Eshwaran
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceMercedes Coyle
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Accelerating analytics in a new era of data
Accelerating analytics in a new era of dataAccelerating analytics in a new era of data
Accelerating analytics in a new era of dataArnon Shimoni
 
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real WorldWSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real WorldWSO2
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWSCaserta
 

Ähnlich wie Data-Driven Development Era and Its Technologies (20)

Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Accelerating analytics in a new era of data
Accelerating analytics in a new era of dataAccelerating analytics in a new era of data
Accelerating analytics in a new era of data
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real WorldWSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 

Mehr von SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speedSATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsSATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of RubySATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script ConfusingSATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubySATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsSATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamSATOSHI TAGOMORI
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd SeasonSATOSHI TAGOMORI
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldSATOSHI TAGOMORI
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceSATOSHI TAGOMORI
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 

Mehr von SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 

Kürzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Data-Driven Development Era and Its Technologies

  • 1. Data-Driven Development Era and Its Technologies Developers Summit 2015 Autumn (Oct 14, 2015) Satoshi Tagomori (@tagomoris)
  • 2. Satoshi "Moris" Tagomori (@tagomoris) Fluentd, Norikra, Hadoop, ... Treasure Data, Inc.
  • 3.
  • 6. Main Topics around "Data" • Data collection • Storage • Data processing • Batch distributed processing • Stream processing • Machine Learning • Near real-time query & Data lake • Visualization
  • 7. Data Analytics Flow Collect Store Process Visualize Data source Reporting Monitoring
  • 9. Using Services or Not • Using services fully-managed: • Google BigQuery & Dataflow • Treasure Data services • Using services self-managed: • Amazon EMR & Redshift • Google Cloud Dataproc • Using your own environment & cluster
  • 10. Using Services or Not • Using services fully-managed: • Google BigQuery & Dataflow • Treasure Data services • Using services self-managed: • Amazon EMR & Redshift • Google Cloud Dataproc • Using your own environment & cluster a bit more cost extremely less efforts fully controlled by self extremely more efforts less cost less efforts
  • 11. Using Services or Not: "Use Services!" To concentrate DATA and Analytics, NOT tools
  • 12. Why should we use services? • About distributed systems: • hard to operate & upgrade • impossible to "small-start" • very hard to hire professional engineer • Data Driven Development: • collect/store data at first! • consider output data at second! • "before building your own environment"
  • 13. Really? Are you TD guy? • ...Really! • But it requires very long discussions :P • "スタートアップのデータ処理基盤、作るか、使うか"
 http://tsuchinoko.dmmlabs.com/?p=1770
  • 14. How to choose software/services in Data-Driven Development
  • 15. "What" decides "How" • Distributed systems are to solve problems • There're many kind of data • There're many problems • Systems solve different problems from each other • There are no "Silver bullet"!
  • 16. What First, How Second • What do you want to do? • Reporting? Analytics? Recommendation? or ... • What type of data you wan to process? • Stored large log? Stream sensor data? or ... • What is you need as result? • CSV? Spreadsheet? Graph? DB Relation? or ...
  • 17. How?(just for example) • MapReduce, Tez • Large batch jobs, big JOINs, high stability • Spark • Small/Middle batch jobs, machine learning • Impala, Presto, Drill, Redshift, BigQuery • Near-real-time search, small-to-large analytics • Storm, Spark streaming • Stream data conversion/aggregation
  • 18. "Processing" is just a part of whole dataflow!
  • 19. Data Analytics Flow (again) Collect Store Process Visualize Data source Reporting Monitoring
  • 20. Data Analytics Flow (again) Collect Store Process Visualize Data source Reporting Monitoring
  • 21. Data Collection • Data Driven Development -> collect at first! • As batch: Data already exists as files • Easily integrated with existing batch systems • Sqoop, Embulk, ... • As stream: Data just generated now • Easily connected with monitoring systems • Without burst network traffic • Flume, Logstash, Fluentd, ...
  • 22. Fluentd: Support Service by SRA OSS with Treasure Data Released TODAY!
  • 23. Other Important Topics • Storage: Performance, Availability, Schema management • Apache Hadoop HDFS, Apache HBase, Amazon S3, Cloudera Kudu, ... • Visualization: Functionality, Connectivity, Visibility • Tableau, Pentaho, Many other enterprise products, ... • Distributed Queues: Performance, Stability, Connectivity • Apache Kafka, Amazon Kinesis, ...
  • 24. Get Familiar with Options NOT to Take Pains about Technology!