Data-Driven Development Era and Its Technologies

SATOSHI TAGOMORI

Developers Summit 2015 Autumn, Data Tech

Satoshi "Moris" Tagomori
(@tagomoris)
Fluentd, Norikra, Hadoop, ...
Treasure Data, Inc.

Main Topics around "Data"
• Data collection
• Storage
• Data processing
• Batch distributed processing
• Stream processing
• Machine Learning
• Near real-time query & Data lake
• Visualization

Data Analytics Flow
[Figure: Data source → Collect → Store → Process → Visualize, with Reporting and Monitoring as outputs]
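
The four stages named in the flow above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the event format, the in-memory list used as storage, and every function name here are hypothetical, and none of it reflects how any particular product implements these stages.

```python
import json
from collections import Counter

# Hypothetical raw events standing in for a "data source".
RAW_EVENTS = [
    '{"path": "/home", "status": 200}',
    '{"path": "/cart", "status": 500}',
    '{"path": "/home", "status": 200}',
]


def collect(lines):
    """Collect: parse raw lines into records."""
    return [json.loads(line) for line in lines]


def store(records, storage):
    """Store: append records to storage (a plain list in this sketch)."""
    storage.extend(records)
    return storage


def process(storage):
    """Process: count requests per path."""
    return Counter(record["path"] for record in storage)


def visualize(counts):
    """Visualize: render a plain-text bar chart for reporting."""
    return [f"{path:<6} {'#' * n}" for path, n in sorted(counts.items())]


storage = []
report = visualize(process(store(collect(RAW_EVENTS), storage)))
print("\n".join(report))
```

Each stage only depends on the output of the previous one, which is why a stage can be swapped for a service without rewriting the rest.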

Using Services or Not
• Using fully-managed services:
  • Google BigQuery & Dataflow
  • Treasure Data services
• Using self-managed services:
  • Amazon EMR & Redshift
  • Google Cloud Dataproc
• Using your own environment & cluster

Using Services or Not:
"Use Services!"
To concentrate on DATA and Analytics, NOT tools

Why should we use services?
• About distributed systems:
  • hard to operate & upgrade
  • impossible to "start small"
  • very hard to hire professional engineers
• Data-Driven Development:
  • collect/store data first!
  • consider output data second!
  • "before building your own environment"

Really? Are you a TD (Treasure Data) guy?
• ...Really!
• But it requires very long discussions :P
• "Data processing infrastructure for startups: build it, or use it?" (article in Japanese)
  http://tsuchinoko.dmmlabs.com/?p=1770

How to choose software/services in Data-Driven Development

"What" decides "How"
• Distributed systems exist to solve problems
• There are many kinds of data
• There are many problems
• Each system solves a different problem
• There is no "silver bullet"!

What First, How Second
• What do you want to do?
  • Reporting? Analytics? Recommendation? or ...
• What type of data do you want to process?
  • Large stored logs? Streaming sensor data? or ...
• What do you need as the result?
  • CSV? Spreadsheet? Graph? DB relation? or ...

How? (just for example)
• MapReduce, Tez
  • Large batch jobs, big JOINs, high stability
• Spark
  • Small/medium batch jobs, machine learning
• Impala, Presto, Drill, Redshift, BigQuery
  • Near-real-time search, small-to-large analytics
• Storm, Spark Streaming
  • Stream data conversion/aggregation
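
The difference between a batch job and stream aggregation can be shown without any of the engines listed above. The sketch below is plain Python, not MapReduce, Spark, or Storm code. The sample events, the 60-second tumbling window, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical events: (timestamp in seconds, sensor id, value).
EVENTS = [
    (0, "a", 1.0),
    (30, "b", 2.0),
    (61, "a", 3.0),
    (90, "a", 5.0),
    (125, "b", 4.0),
]


def batch_sum(events):
    """Batch style: the data is already stored, so scan all of it in one job."""
    totals = defaultdict(float)
    for _, sensor, value in events:
        totals[sensor] += value
    return dict(totals)


def stream_window_sums(events, window_seconds=60):
    """Stream style: emit per-sensor sums each time a tumbling window closes.

    Assumes events arrive in timestamp order.
    """
    current_window = None
    totals = defaultdict(float)
    for timestamp, sensor, value in events:
        window = timestamp // window_seconds
        if current_window is not None and window != current_window:
            # The previous window is complete: emit it and start a new one.
            yield current_window * window_seconds, dict(totals)
            totals = defaultdict(float)
        current_window = window
        totals[sensor] += value
    if current_window is not None:
        # Emit the last, still-open window when the input ends.
        yield current_window * window_seconds, dict(totals)


print(batch_sum(EVENTS))
for window_start, sums in stream_window_sums(EVENTS):
    print(window_start, sums)
```

The batch function returns one answer after reading everything, while the stream function produces partial answers as data arrives. Which of the two you need is a "what" question, and it narrows the "how" considerably.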

"Processing" is just a part
of the whole dataflow!

Data Analytics Flow (again)
[Figure: Data source → Collect → Store → Process → Visualize, with Reporting and Monitoring as outputs]

Data Collection
• Data-Driven Development -> collect first!
• As batch: data already exists as files
  • Easily integrated with existing batch systems
  • Sqoop, Embulk, ...
• As stream: data is being generated right now
  • Easily connected with monitoring systems
  • Without bursts of network traffic
  • Flume, Logstash, Fluentd, ...
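
The two collection styles above can be contrasted in a short sketch. It is an illustration only and does not show how Embulk, Fluentd, or any other listed tool works internally. The `send` callable, the `chunk_size` parameter, and the class name are hypothetical.

```python
import io
import json


def collect_batch(file_obj):
    """Batch: the data already exists as a file, so load it in one pass."""
    return [json.loads(line) for line in file_obj if line.strip()]


class StreamCollector:
    """Stream: forward records as they are generated, in small chunks.

    Small, frequent chunks are what keep the network traffic from bursting.
    """

    def __init__(self, send, chunk_size=2):
        self.send = send              # hypothetical callable that ships one chunk
        self.chunk_size = chunk_size  # assumed flush threshold for this sketch
        self.buffer = []

    def emit(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


# Batch: read an existing "file" (an in-memory stand-in here).
log_file = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
batch_records = collect_batch(log_file)

# Stream: records are handed over one by one and shipped in chunks.
sent_chunks = []
collector = StreamCollector(send=sent_chunks.append, chunk_size=2)
for record in batch_records:
    collector.emit(record)
collector.flush()  # ship whatever is still buffered

print(batch_records)
print(sent_chunks)
```

A real collector would also need retries and durable buffering, which is a large part of the operating effort that a service or an existing tool takes off your hands.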

Fluentd: Support Service by SRA OSS with Treasure Data
Released TODAY!

Other Important Topics
• Storage: Performance, Availability, Schema management
  • Apache Hadoop HDFS, Apache HBase, Amazon S3, Cloudera Kudu, ...
• Visualization: Functionality, Connectivity, Visibility
  • Tableau, Pentaho, many other enterprise products, ...
• Distributed Queues: Performance, Stability, Connectivity
  • Apache Kafka, Amazon Kinesis, ...

Get familiar with the options,
so you do NOT have to take pains over technology!

Concentrate on DATA and Analytics, NOT tools.
Thanks!


Data-Driven Development Era and Its Technologies

1. Data-Driven Development Era and Its Technologies Developers Summit 2015 Autumn (Oct 14, 2015) Satoshi Tagomori (@tagomoris)

4. HQ Branch

5. http://www.treasuredata.com/

8. Where before What

10. Using Services or Not (cost and effort) • Fully-managed services: a bit more cost, far less effort • Self-managed services: less cost, less effort • Your own environment & cluster: fully controlled by yourself, far more effort

