SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Downloaden Sie, um offline zu lesen
Masahiro Nakagawa
Feb 7, 2015
dots. Summit 2015
Treasure Data

and OSS
Who are you?
> Masahiro Nakagawa
> github/twitter: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> I love OSS :)
> D language - Phobos committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of several meetups (Presto, DTM, etc)
> etc…
Company background
•  Founded 2011 in Mountain View, CA!
–  The first cloud service for the entire data
pipeline!
–  Including: Acquisition, Storage, & Analysis!
•  Provide a “Cloud Data Service”!
–  Fast Time to Value!
–  Cloud Flexibility and Economics!
–  Simple and Well Supported!
•  Treasure Data has over 100+ customers
in production!
–  Incl. Fortune 500 companies!
–  400k new records / second!
–  Almost 9 Trillion records loaded!
–  Variety of use cases and verticals!
The Treasure Data Team
Hiro Yoshikawa – CEO
Open source business veteran
Kaz Ohta – CTO
Founder of world’s largest Hadoop Group
Sada Furuhashi – Software Architect
MessagaPack / Fluentd Author
Notable Investors
Othman Laraki
Ex-VP of Growth at Twitter
Jerry Yang
Founder of Yahoo!
Yukihiro “Matz” Matusmoto
Creator of “Ruby” programming language
James Lindenbaum
Founder of Heroku
Sierra Ventures - Tim Guleri
Leading venture capital firm in Big Data
TD Service Architecture
Time to Value
Send query result 
Result Push
Acquire
 Analyze
Store
Plazma DB
Flexible, Scalable,
Columnar Storage
Web Log
App Log
Censor
CRM
ERP
RDBMS
Treasure Agent(Server)
SDK(JS, Android, iOS, Unity)
Streaming Collector
Batch /
Reliability
Ad-hoc /

Low latency
KPI$
KPI Dashboard
BI Tools
Other Products
RDBMS, Google Docs,
AWS S3, FTP Server, etc.
Metric Insights 
Tableau, 
Motion Board etc. 
POS
REST API
ODBC / JDBC
SQL, Pig 
Bulk Uploader
Embulk,

TD Toolbelt
SQL-based query
@AWS or @IDCF
Connectivity
Economy & Flexibility Simple & Supported
Data Acquisition
Log collecting in TD
> Treasure Agent
> Fluentd based log collector
> Embulk
> JavaScript SDK
> Mobile SDK (iOS, Android, Unity)
Structured logging	

!
Reliable forwarding	

!
Pluggable architecture
http://fluentd.org/
Fluentd
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Written in Ruby
> Gem based various plugins
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
Data Analytics Flow
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Analytics Flow
Store Process
Cloudera
Horton Works
Treasure Data
Collect Visualize
Tableau
Excel
R
easier & shorter time
???
Divide & Conquer & Retry
error retry
error retry retry
retry
Batch
Stream
Other stream
Core Plugins
> Divide & Conquer

> Buffering & Retrying

> Error handling

> Message routing

> Parallelism
> read / receive data
> from API, database,

command, etc…
> write / send data
> to API, database, alert,
graph, etc…
Architecture (v0.12 or later)
EngineInput
Filter Output
Buffer
> grep
> record_transfomer	

> …
> Forward	

> File tail	

> ...
> Forward	

> File	

> ...
Output
> File	

> Memory
not pluggable
FormatterParser
Before (M x N)
After (M + N)
or Embulk
Other Fluentd related OSS
> Treasure Agent
> https://github.com/treasure-data/omnibus-td-agent
> Fluentd Forwarder
> https://github.com/fluent/fluentd-forwarder
> Simple forwarder for Windows / Leaf node
> Fluentd UI
> https://github.com/fluent/fluentd-ui
> Management web UI
Other OSS products
> Scribed (C++)
> Developed by Facebook
> No maintained
> Apache Flume (Java)
> Mainly for Hadoop HDFS / HBase
> Logstash (JRuby)
> Mainly for Elasticsearch
Embulk
> Bulk Loader version of Fluentd
> Pluggable architecture
> JRuby, JVM languages (TBD)
> High performance parallel processing
> Share your script as a plugin
> https://github.com/embulk
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
HDFS
MySQL
Amazon S3
Embulk
CSV Files
SequenceFile
Salesforce.com
Elasticsearch
Cassandra
Hive
Redis
✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behaviour
✓ Idempotent retrying
Plugins Plugins
bulk load
Computing Framework
3 query engines in TD
> Hive (HiveQL, Batch)
> for ETL and large jobs
> Hivemall for machine learning
> Pig (Pig Latin, Batch)
> DataFu for data mining and statistics
> Presto (SQL, Short batch)
> for Ad hoc queries
Hadoop
> Distributed computing framework
> Consist of many components…













http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
http://nosqlessentials.com/
http://nosqlessentials.com/
> Low level framework for YARN applications
> New Query Engine
> Provide good IR for Hive, Pig and more
> Task and DAG based pipelining







Apache Tez
ProcessorInput Output
Task DAG
http://tez.apache.org/
Hive on MR vs. Hive on Tez
MapReduce Tez
http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/9
M
HDFS
R
R
M M
HDFS HDFS
R
M M
R
M M
R
M
R
M MM
M M
R
R
R
Avoid unnecessary HDFS write!
SELECT g1.x, g2.avg, g2.cnt

FROM (SELECT a.x AVERAGE(a.y) AS avg FROM a GROUP BY a.x) g1"
JOIN (SELECT b.x, COUNT(b.y) AS avg FROM b GROUP BY b.x) g2"
ON (g1.x = g2.x) ORDER BY avg;
GROUP b BY b.xGROUP a BY a.x
JOIN (a, b)
ORDER BY
GROUP BY x
GROUP BY a.x"
JOIN (a, b)
ORDER BY
Other OSS products
> Apache Spark
> Mainly for on-memory processing
> Spark ecosystem is now growing
> Apache Flink
> Mainly for iterative processing
> Microsoft’s Dryad
> This was premature for human being…
Presto
A distributed SQL query engine

for interactive data analisys

against GBs to PBs of data.
Presto overview
> Open sourced by Facebook
> http://prestodb.io/
> written in Java
> Built-in useful features
> Connectors
> Machine Learning
> Window function
> Approximate query
> etc…
> Used by Netflix, Dropbox, Treasure Data,
Qubole, Airbnb, LINE, GREE, Scaleout, etc
HDFS
Hive
PostgreSQL, etc.
Daily/Hourly Batch
Interactive query
Commercial

BI Tools
Batch analysis platform Visualization platform
Dashboard
HDFS
Hive
Daily/Hourly Batch
Interactive query
✓ Less scalable 
✓ Extra cost
Commercial

BI Tools
Dashboard
✓ More work to manage

2 platforms
✓ Can’t query against

“live” data directly
Batch analysis platform Visualization platform
PostgreSQL, etc.
HDFS
Hive Dashboard
Presto
PostgreSQL, etc.
Daily/Hourly Batch
HDFS
Hive
Dashboard
Daily/Hourly Batch
Interactive query
Interactive query
Presto
HDFS
Hive
Dashboard
Daily/Hourly Batch
Interactive query
Cassandra MySQL Commertial DBs
SQL on any data sets Commercial

BI Tools
✓ IBM Cognos

✓ Tableau

✓ ...
Data analysis platform
All stages are pipe-
lined
✓ No wait time
✓ No fault-tolerance
MapReduce vs. Presto
MapReduce Presto
map map
reduce reduce
task task
task task
task
task
memory-to-memory
data transfer
✓ No disk IO
✓ Data chunk must
fit in memory
task
disk
map map
reduce reduce
disk
disk
Write data

to disk
Wait between

stages
Other OSS products
> Cloudera Impala
> Mainly for HDFS / HBase
> Apache Drill
> More flexible architecture
> Apache Tajo
> For building data warehouse
Visualization
Hmm…
> There are no popular OSS products
> We don’t focus on developing
visualization tool for now
> Commercial BI tools are popular
> Tableau, Motion board and etc
> Maybe, next presentation talk about

this area deeply
Treasure Data resources
> https://github.com/treasure-data
> perfectqueue, perfectsched, etc
> https://sql.treasuredata.com/
> HiveQL syntax checker
> https://examples.treasuredata.com/
> Query catalog
http://blog.treasuredata.com/2014/11/26/12-open-source-
software-innovations-from-treasure-data-engineers/
Check: treasuredata.com
Cloud service for the entire data pipeline

Weitere ähnliche Inhalte

Was ist angesagt?

Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerTreasure Data, Inc.
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CAkbajda
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToSATOSHI TAGOMORI
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at TwitterBill Graham
 
ArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB Database
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsgagravarr
 
Apache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data pointsApache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data pointsKasper Sørensen
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine OverviewKunal Gupta
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSMax Neunhöffer
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon
 

Was ist angesagt? (20)

Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Software + Babies
Software + BabiesSoftware + Babies
Software + Babies
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
tdtechtalk20160330johan
tdtechtalk20160330johantdtechtalk20160330johan
tdtechtalk20160330johan
 
ArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQL
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needs
 
Apache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data pointsApache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data points
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Presto
PrestoPresto
Presto
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
 

Andere mochten auch

Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionDataWorks Summit
 
20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのかKatsunori Kanda
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataOfir Manor
 
20150207 dots ラクスルの開発体制
20150207 dots ラクスルの開発体制20150207 dots ラクスルの開発体制
20150207 dots ラクスルの開発体制Raksul Inc.
 
Gradleでビルドするandroid NDKアプリ
Gradleでビルドするandroid NDKアプリGradleでビルドするandroid NDKアプリ
Gradleでビルドするandroid NDKアプリHideyuki Kikuma
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSTreasure Data, Inc.
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezGw Liu
 
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)Naoki (Neo) SATO
 
SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法
SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法
SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法Hideyuki Takeuchi
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 
Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...
Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...
Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...MapR Technologies Japan
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)Treasure Data, Inc.
 
Hadoopカンファレンス20140707
Hadoopカンファレンス20140707Hadoopカンファレンス20140707
Hadoopカンファレンス20140707Recruit Technologies
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteAmr Awadallah
 
会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版
会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版
会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版Takuma Morikawa
 
[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기NAVER D2
 
ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!Nagato Kasaki
 

Andere mochten auch (20)

Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or question
 
20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
20150207 dots ラクスルの開発体制
20150207 dots ラクスルの開発体制20150207 dots ラクスルの開発体制
20150207 dots ラクスルの開発体制
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Gradleでビルドするandroid NDKアプリ
Gradleでビルドするandroid NDKアプリGradleでビルドするandroid NDKアプリ
Gradleでビルドするandroid NDKアプリ
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on Tez
 
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
 
SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法
SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法
SPEEDA/NewsPicksを支える価値を生み出す技術の選定手法
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 
Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...
Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...
Hadoop最新情報 - YARN, Omni, Drill, Impala, Shark, Vertica - MapR CTO Meetup 2014...
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
 
Hadoopカンファレンス20140707
Hadoopカンファレンス20140707Hadoopカンファレンス20140707
Hadoopカンファレンス20140707
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版
会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版
会員数180万人のマッチングサービスpairsの 急成長を支える技術基盤 ディレクターズカット版
 
[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기
 
ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!ゼロから始めるSparkSQL徹底活用!
ゼロから始めるSparkSQL徹底活用!
 

Ähnlich wie Treasure Data and OSS

SQL for Everything at CWT2014
SQL for Everything at CWT2014SQL for Everything at CWT2014
SQL for Everything at CWT2014N Masahiro
 
Fluentd and Embulk Game Server 4
Fluentd and Embulk Game Server 4Fluentd and Embulk Game Server 4
Fluentd and Embulk Game Server 4N Masahiro
 
Fluentd - RubyKansai 65
Fluentd - RubyKansai 65Fluentd - RubyKansai 65
Fluentd - RubyKansai 65N Masahiro
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collectorMuga Nishizawa
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamBrian Benz
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderSadayuki Furuhashi
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 

Ähnlich wie Treasure Data and OSS (20)

SQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in Taiwan
 
SQL for Everything at CWT2014
SQL for Everything at CWT2014SQL for Everything at CWT2014
SQL for Everything at CWT2014
 
Fluentd and Embulk Game Server 4
Fluentd and Embulk Game Server 4Fluentd and Embulk Game Server 4
Fluentd and Embulk Game Server 4
 
Fluentd - RubyKansai 65
Fluentd - RubyKansai 65Fluentd - RubyKansai 65
Fluentd - RubyKansai 65
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure team
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 

Mehr von N Masahiro

Fluentd Project Intro at Kubecon 2019 EU
Fluentd Project Intro at Kubecon 2019 EUFluentd Project Intro at Kubecon 2019 EU
Fluentd Project Intro at Kubecon 2019 EUN Masahiro
 
Fluentd v1 and future at techtalk
Fluentd v1 and future at techtalkFluentd v1 and future at techtalk
Fluentd v1 and future at techtalkN Masahiro
 
Fluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconFluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconN Masahiro
 
Fluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshellFluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshellN Masahiro
 
Fluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshellFluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshellN Masahiro
 
Presto changes
Presto changesPresto changes
Presto changesN Masahiro
 
Fluentd at HKOScon
Fluentd at HKOSconFluentd at HKOScon
Fluentd at HKOSconN Masahiro
 
Fluentd v0.14 Overview
Fluentd v0.14 OverviewFluentd v0.14 Overview
Fluentd v0.14 OverviewN Masahiro
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and KafkaN Masahiro
 
fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14N Masahiro
 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12N Masahiro
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Docker and Fluentd
Docker and FluentdDocker and Fluentd
Docker and FluentdN Masahiro
 
How to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdataHow to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdataN Masahiro
 
Fluentd v0.12 master guide
Fluentd v0.12 master guideFluentd v0.12 master guide
Fluentd v0.12 master guideN Masahiro
 
Fluentd Unified Logging Layer At Fossasia
Fluentd Unified Logging Layer At FossasiaFluentd Unified Logging Layer At Fossasia
Fluentd Unified Logging Layer At FossasiaN Masahiro
 
Fluentd - road to v1 -
Fluentd - road to v1 -Fluentd - road to v1 -
Fluentd - road to v1 -N Masahiro
 
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014N Masahiro
 
Can you say the same words even in oss
Can you say the same words even in ossCan you say the same words even in oss
Can you say the same words even in ossN Masahiro
 
I am learing the programming
I am learing the programmingI am learing the programming
I am learing the programmingN Masahiro
 

Mehr von N Masahiro (20)

Fluentd Project Intro at Kubecon 2019 EU
Fluentd Project Intro at Kubecon 2019 EUFluentd Project Intro at Kubecon 2019 EU
Fluentd Project Intro at Kubecon 2019 EU
 
Fluentd v1 and future at techtalk
Fluentd v1 and future at techtalkFluentd v1 and future at techtalk
Fluentd v1 and future at techtalk
 
Fluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconFluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at Kubecon
 
Fluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshellFluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshell
 
Fluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshellFluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshell
 
Presto changes
Presto changesPresto changes
Presto changes
 
Fluentd at HKOScon
Fluentd at HKOSconFluentd at HKOScon
Fluentd at HKOScon
 
Fluentd v0.14 Overview
Fluentd v0.14 OverviewFluentd v0.14 Overview
Fluentd v0.14 Overview
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and Kafka
 
fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14
 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Docker and Fluentd
Docker and FluentdDocker and Fluentd
Docker and Fluentd
 
How to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdataHow to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdata
 
Fluentd v0.12 master guide
Fluentd v0.12 master guideFluentd v0.12 master guide
Fluentd v0.12 master guide
 
Fluentd Unified Logging Layer At Fossasia
Fluentd Unified Logging Layer At FossasiaFluentd Unified Logging Layer At Fossasia
Fluentd Unified Logging Layer At Fossasia
 
Fluentd - road to v1 -
Fluentd - road to v1 -Fluentd - road to v1 -
Fluentd - road to v1 -
 
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014
 
Can you say the same words even in oss
Can you say the same words even in ossCan you say the same words even in oss
Can you say the same words even in oss
 
I am learing the programming
I am learing the programmingI am learing the programming
I am learing the programming
 

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Treasure Data and OSS

  • 1. Masahiro Nakagawa Feb 7, 2015 dots. Summit 2015 Treasure Data
 and OSS
  • 2. Who are you? > Masahiro Nakagawa > github/twitter: @repeatedly > Treasure Data, Inc. > Senior Software Engineer > Fluentd / td-agent developer > I love OSS :) > D language - Phobos committer > Fluentd - Main maintainer > MessagePack / RPC - D and Python (only RPC) > The organizer of several meetups (Presto, DTM, etc) > etc…
  • 3. Company background •  Founded 2011 in Mountain View, CA! –  The first cloud service for the entire data pipeline! –  Including: Acquisition, Storage, & Analysis! •  Provide a “Cloud Data Service”! –  Fast Time to Value! –  Cloud Flexibility and Economics! –  Simple and Well Supported! •  Treasure Data has over 100+ customers in production! –  Incl. Fortune 500 companies! –  400k new records / second! –  Almost 9 Trillion records loaded! –  Variety of use cases and verticals! The Treasure Data Team Hiro Yoshikawa – CEO Open source business veteran Kaz Ohta – CTO Founder of world’s largest Hadoop Group Sada Furuhashi – Software Architect MessagaPack / Fluentd Author Notable Investors Othman Laraki Ex-VP of Growth at Twitter Jerry Yang Founder of Yahoo! Yukihiro “Matz” Matusmoto Creator of “Ruby” programming language James Lindenbaum Founder of Heroku Sierra Ventures - Tim Guleri Leading venture capital firm in Big Data
  • 4. TD Service Architecture Time to Value Send query result Result Push Acquire Analyze Store Plazma DB Flexible, Scalable, Columnar Storage Web Log App Log Censor CRM ERP RDBMS Treasure Agent(Server) SDK(JS, Android, iOS, Unity) Streaming Collector Batch / Reliability Ad-hoc /
 Low latency KPI$ KPI Dashboard BI Tools Other Products RDBMS, Google Docs, AWS S3, FTP Server, etc. Metric Insights Tableau, Motion Board etc. POS REST API ODBC / JDBC SQL, Pig Bulk Uploader Embulk,
 TD Toolbelt SQL-based query @AWS or @IDCF Connectivity Economy & Flexibility Simple & Supported
  • 6. Log collecting in TD > Treasure Agent > Fluentd based log collector > Embulk > JavaScript SDK > Mobile SDK (iOS, Android, Unity)
  • 7. Structured logging ! Reliable forwarding ! Pluggable architecture http://fluentd.org/
  • 8. Fluentd > Data collector for unified logging layer > Streaming data transfer based on JSON > Written in Ruby > Gem based various plugins > http://www.fluentd.org/plugins > Working in production > http://www.fluentd.org/testimonials
  • 9. Data Analytics Flow Collect Store Process Visualize Data source Reporting Monitoring
  • 10. Data Analytics Flow Store Process Cloudera Horton Works Treasure Data Collect Visualize Tableau Excel R easier & shorter time ???
  • 11. Divide & Conquer & Retry error retry error retry retry retry Batch Stream Other stream
  • 12. Core Plugins > Divide & Conquer
 > Buffering & Retrying
 > Error handling
 > Message routing
 > Parallelism > read / receive data > from API, database,
 command, etc… > write / send data > to API, database, alert, graph, etc…
  • 13. Architecture (v0.12 or later) EngineInput Filter Output Buffer > grep > record_transfomer > … > Forward > File tail > ... > Forward > File > ... Output > File > Memory not pluggable FormatterParser
  • 15. After (M + N) or Embulk
  • 16. Other Fluentd related OSS > Treasure Agent > https://github.com/treasure-data/omnibus-td-agent > Fluentd Forwarder > https://github.com/fluent/fluentd-forwarder > Simple forwarder for Windows / Leaf node > Fluentd UI > https://github.com/fluent/fluentd-ui > Management web UI
  • 17. Other OSS products > Scribed (C++) > Developed by Facebook > No maintained > Apache Flume (Java) > Mainly for Hadoop HDFS / HBase > Logstash (JRuby) > Mainly for Elasticsearch
  • 18. Embulk > Bulk Loader version of Fluentd > Pluggable architecture > JRuby, JVM languages (TBD) > High performance parallel processing > Share your script as a plugin > https://github.com/embulk http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
  • 19. HDFS MySQL Amazon S3 Embulk CSV Files SequenceFile Salesforce.com Elasticsearch Cassandra Hive Redis ✓ Parallel execution ✓ Data validation ✓ Error recovery ✓ Deterministic behaviour ✓ Idempotent retrying Plugins Plugins bulk load
  • 21. 3 query engines in TD > Hive (HiveQL, Batch) > for ETL and large jobs > Hivemall for machine learning > Pig (Pig Latin, Batch) > DataFu for data mining and statistics > Presto (SQL, Short batch) > for Ad hoc queries
  • 22. Hadoop > Distributed computing framework > Consist of many components…
 
 
 
 
 
 
 http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
  • 25. > Low level framework for YARN applications > New Query Engine > Provide good IR for Hive, Pig and more > Task and DAG based pipelining
 
 
 
 Apache Tez ProcessorInput Output Task DAG http://tez.apache.org/
  • 26. Hive on MR vs. Hive on Tez MapReduce Tez http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/9 M HDFS R R M M HDFS HDFS R M M R M M R M R M MM M M R R R Avoid unnecessary HDFS write! SELECT g1.x, g2.avg, g2.cnt
 FROM (SELECT a.x AVERAGE(a.y) AS avg FROM a GROUP BY a.x) g1" JOIN (SELECT b.x, COUNT(b.y) AS avg FROM b GROUP BY b.x) g2" ON (g1.x = g2.x) ORDER BY avg; GROUP b BY b.xGROUP a BY a.x JOIN (a, b) ORDER BY GROUP BY x GROUP BY a.x" JOIN (a, b) ORDER BY
  • 27. Other OSS products > Apache Spark > Mainly for on-memory processing > Spark ecosystem is now growing > Apache Flink > Mainly for iterative processing > Microsoft’s Dryad > This was premature for human being…
  • 28. Presto A distributed SQL query engine
 for interactive data analisys
 against GBs to PBs of data.
  • 29. Presto overview > Open sourced by Facebook > http://prestodb.io/ > written in Java > Built-in useful features > Connectors > Machine Learning > Window function > Approximate query > etc… > Used by Netflix, Dropbox, Treasure Data, Qubole, Airbnb, LINE, GREE, Scaleout, etc
  • 30. HDFS Hive PostgreSQL, etc. Daily/Hourly Batch Interactive query Commercial
 BI Tools Batch analysis platform Visualization platform Dashboard
  • 31. HDFS Hive Daily/Hourly Batch Interactive query ✓ Less scalable ✓ Extra cost Commercial
 BI Tools Dashboard ✓ More work to manage
 2 platforms ✓ Can’t query against
 “live” data directly Batch analysis platform Visualization platform PostgreSQL, etc.
  • 32. HDFS Hive Dashboard Presto PostgreSQL, etc. Daily/Hourly Batch HDFS Hive Dashboard Daily/Hourly Batch Interactive query Interactive query
  • 33. Presto HDFS Hive Dashboard Daily/Hourly Batch Interactive query Cassandra MySQL Commertial DBs SQL on any data sets Commercial
 BI Tools ✓ IBM Cognos
 ✓ Tableau
 ✓ ... Data analysis platform
  • 34. All stages are pipe- lined ✓ No wait time ✓ No fault-tolerance MapReduce vs. Presto MapReduce Presto map map reduce reduce task task task task task task memory-to-memory data transfer ✓ No disk IO ✓ Data chunk must fit in memory task disk map map reduce reduce disk disk Write data
 to disk Wait between
 stages
  • 35. Other OSS products > Cloudera Impala > Mainly for HDFS / HBase > Apache Drill > More flexible architecture > Apache Tajo > For building data warehouse
  • 37. Hmm… > There are no popular OSS products > We don’t focus on developing visualization tool for now > Commercial BI tools are popular > Tableau, Motion board and etc > Maybe, next presentation talk about
 this area deeply
  • 38. Treasure Data resources > https://github.com/treasure-data > perfectqueue, perfectsched, etc > https://sql.treasuredata.com/ > HiveQL syntax checker > https://examples.treasuredata.com/ > Query catalog http://blog.treasuredata.com/2014/11/26/12-open-source- software-innovations-from-treasure-data-engineers/
  • 39. Check: treasuredata.com Cloud service for the entire data pipeline