T R E A S U R E D A T A
REAL WORLD DISTRIBUTED
DATABASE ON CLOUD TECHNOLOGY
BDI Research Group, Aug 7th
Kai Sasaki
Senior Software Engineer at Arm Treasure Data
ABOUT ME
• Kai Sasaki (佐々木 海)

• Senior Software Engineer at Arm Treasure Data

• Ex. Yahoo Japan Corporation

• Presto/Hadoop/TensorFlow contributor

• Hivemall committer

• Started GA Tech OMSCS 

(http://www.omscs.gatech.edu/)
TREASURE DATA
Data Analytics Platform
Unify all your raw data in a scalable and
secure platform, with 100+ supported
integrations so you can easily connect
all your data sources in real time.
Founded in 2011.
Live with OSS
• Fluentd
• Embulk
• Digdag
• Hivemall
and more
https://www.treasuredata.com/opensource/
TREASURE DATA
Data Analytics Platform
Our customers exist across industries.
- Wish
- LG
- Subaru
- KIRIN
etc…
We support a vast range of use cases.
- Automotive
- Retail
- Digital Marketing
- IoT
TREASURE DATA
Data Analytics Platform
Presto: 11.7+ million
Hive: 1.3+ million
Total Records: 1064+ trillion
Streaming: 60%
Data Connector: 30%
Bulk Import: 10%
Imported Records: 60+ billion
Integrations: 114
TREASURE DATA
Treasure CDP
Customer Data Platform (CDP) is a
management system built for marketers.
Clients can unify their customer database
with various kinds of data sources and
combine them to find specific customer profiles.
TREASURE DATA
NEW CHAPTER!
TREASURE DATA
AGENDA
• Cloud Storage of Treasure Data

• Distributed Query Processing Engine

• Presto/Hive

• Enterprise Level Storage System
CLOUD STORAGE
CLOUD STORAGE IN TD
• Our Treasure Data storage service, called Plazma, is built on cloud
storage like S3.

• Presto/Hive provide only a distributed query execution layer, which
requires our storage system to be just as scalable.
• On the other hand, we should take advantage of the maintainability
and availability provided by the cloud service provider (IaaS).
PLAZMA
• We built a thin storage layer, called Plazma, on top of existing cloud
storage and a relational database.

• Plazma is a central component that stores all customer data for
analysis in Treasure Data.

• Plazma consists of two components

• Metadata (PostgreSQL)

• Storage (S3 or RiakCS)
PLAZMA
• Plazma stores the metadata of data files in PostgreSQL hosted
by Amazon RDS.

• This PostgreSQL instance manages the index, file paths on S3,
transactions, and deleted files.
WHY POSTGRESQL?
• The GiST index easily enables complex index searches
on data_set_id and time ranges.
CREATE INDEX plazma_index
ON partitions USING gist (
  data_set_id,
  index_range(
    first_index_key,
    last_index_key, '[]')
);
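As a minimal illustration of the kind of lookup this index serves, here is a hedged JDBC sketch. It assumes the partitions table above and that index_range(…) is a range constructor usable with PostgreSQL's && overlap operator (like the built-in int8range); the connection string and data set id are placeholders.

import java.sql.*;

public class PartitionLookupSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection string; real host/credentials will differ.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/plazma");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT path FROM partitions " +
                 "WHERE data_set_id = ? " +
                 "AND index_range(first_index_key, last_index_key, '[]') " +
                 "    && index_range(?, ?, '[]')")) {  // assumed overlap semantics
            ps.setLong(1, 5L);            // data set to scan
            ps.setLong(2, 1533276092L);   // range start (UNIX epoch)
            ps.setLong(3, 1533276114L);   // range end
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("path")); // S3 file to read
                }
            }
        }
    }
}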
PARTITIONING
• To get the full benefit of Presto's parallel processing throughput,
the data source must be distributed as well.

• Distributing the data source evenly contributes to high
throughput and stable performance.

• Two basic partitioning methods

• Key range partitioning -> Time-Index partitioning
• Hash partitioning -> User Defined Partitioning
PARTITIONING
• A partition record in Plazma represents a file stored in S3, together with some
additional information:

• Data Set ID

• Range Index Key

• Record Count

• File Size

• Checksum

• File Path
PARTITIONING
• All partitions in Plazma are indexed by the time the data was
generated. The time index is recorded as a UNIX epoch.

• A partition keeps first_index_key and last_index_key
to specify the time range the partition covers.

• The Plazma index is a multicolumn GiST index
in PostgreSQL.

(https://www.postgresql.org/docs/current/static/gist.html)

• (data_set_id, index_range(first_index_key, last_index_key))
CLOUD STORAGE IN TD
data_set_id  first_index_key  last_index_key  path
1            1533276065       1533276071      s3://path
2            1533276071       1533276077      s3://path
4            1533276077       1533276085      s3://path
4            1533276085       1533276092      s3://path
5            1533276092       1533276098      s3://path
5            1533276098       1533276103      s3://path
5            1533276103       1533276108      s3://path
5            1533276108       1533276114      s3://path
…
(Metadata rows live in PostgreSQL; the referenced files live on Amazon S3.)
LIFECYCLE OF PARTITION
• Plazma has two storage management layers.

At the beginning, records are put on the realtime storage layer
in raw format (msgpack.gz).
[Diagram: incoming records (time: 100, 4000, 3800, 300, 500) sit in realtime storage; archive storage is still empty]
LIFECYCLE OF PARTITION
• Every hour, a dedicated MapReduce job called the Log Merge
Job merges records of the same time range into one
partition in archive storage.
[Diagram: the MR job moves records (time: 100, 300, 500, 3800, 4000) from realtime storage into archive partitions time: 0~3599 and time: 3600~7200]
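The grouping step can be pictured with a short sketch. This is plain Java rather than the actual MapReduce job, and the Record type is hypothetical; it only shows how records fall into hourly windows.

import java.util.*;
import java.util.stream.*;

public class LogMergeSketch {
    // Hypothetical raw record: a UNIX-epoch timestamp plus an opaque payload.
    record Record(long time, byte[] payload) {}

    // Group realtime records into hourly archive partitions, keyed by the
    // start of their one-hour window (time: 0~3599, 3600~7199, ...).
    static Map<Long, List<Record>> groupByHour(List<Record> realtime) {
        return realtime.stream()
            .collect(Collectors.groupingBy(r -> (r.time() / 3600) * 3600));
    }

    public static void main(String[] args) {
        List<Record> rt = List.of(
            new Record(100, new byte[0]), new Record(300, new byte[0]),
            new Record(500, new byte[0]), new Record(3800, new byte[0]),
            new Record(4000, new byte[0]));
        groupByHour(rt).forEach((start, recs) ->
            System.out.println("time: " + start + "~" + (start + 3599)
                + " -> " + recs.size() + " records"));
    }
}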
LIFECYCLE OF PARTITION
• A query execution engine like Presto needs to fetch data
from both realtime storage and archive storage, but it is
generally more efficient to read from archive storage.

• Inspired by the C-Store paper
by M. Stonebraker
[Diagram: same realtime-to-archive merge flow as above]
TRANSACTION AND PARTITIONING
• Consistency is the most important factor for enterprise
analytics workloads. Therefore an MPP engine like Presto and
the backend storage MUST always guarantee consistency.

→ UPDATEs are done atomically by Plazma

• At the same time, we want to achieve high throughput by
distributing the workload to multiple worker nodes.

→ Data files are partitioned in Plazma
PLAZMA TRANSACTION
• Plazma supports transactions for queries that have side effects
(e.g. INSERT INTO / CREATE TABLE).

• A Plazma transaction is an atomic operation on the
visibility of the data on S3, not on the actual files.
• A transaction is composed of two phases:

• Uploading uncommitted partitions

• Committing the transaction by moving the uncommitted partitions
PLAZMA TRANSACTION
• Multiple workers upload files to S3
asynchronously.
[Diagram: PostgreSQL with empty uncommitted and committed tables]
PLAZMA TRANSACTION
• As each upload finishes, the worker inserts a corresponding record
into the uncommitted table in PostgreSQL.
[Diagram: uncommitted table holds p1, p2; committed table is empty]
PLAZMA TRANSACTION
• After all upload tasks are completed, the coordinator
commits the transaction by moving
all records from uncommitted to committed.
[Diagram: p1, p2, p3 move from the uncommitted table to the committed table]
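A hedged JDBC sketch of that commit step, assuming uncommitted/committed tables and a hypothetical tx_id column grouping one transaction's records (the real schema is not shown in the slides):

import java.sql.*;

public class PlazmaCommitSketch {
    // Move one transaction's partition records from uncommitted to
    // committed inside a single PostgreSQL transaction, so the data
    // becomes visible atomically. tx_id is a hypothetical column.
    static void commitTransaction(Connection conn, long txId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement move = conn.prepareStatement(
                 "INSERT INTO committed SELECT * FROM uncommitted WHERE tx_id = ?");
             PreparedStatement clean = conn.prepareStatement(
                 "DELETE FROM uncommitted WHERE tx_id = ?")) {
            move.setLong(1, txId);
            move.executeUpdate();
            clean.setLong(1, txId);
            clean.executeUpdate();
            conn.commit();   // both statements take effect together
        } catch (SQLException e) {
            conn.rollback(); // on failure nothing was moved
            throw e;
        }
    }
}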
PLAZMA DELETE
• A delete query is handled in a similar way. First, newly created
partitions that exclude the deleted
records are uploaded.
[Diagram: uncommitted table holds rewritten partitions p1', p2', p3'; committed table still holds p1, p2, p3]
PLAZMA DELETE
• When the transaction is committed, the records in the committed
table are replaced by the uncommitted records,
which point to different file paths.
[Diagram: committed table now holds p1', p2', p3']
PRESTO IN TREASURE DATA
WHAT IS PRESTO?
• Presto is an open source, scalable, distributed SQL
engine for huge OLAP workloads

• Mainly developed by Facebook and Teradata

• Used by FB, Uber, Netflix, etc.

• In-memory processing

• Pluggable architecture:
Hive, Cassandra, Kafka, etc.
PRESTO CONNECTOR
• A Presto connector is a plugin that provides Presto access
to various kinds of existing data storage.

• A connector is responsible for managing metadata,
transactions, and data accessors.
http://prestodb.io/
PRESTO CONNECTOR
• Hive Connector

Uses the Hive metastore for metadata and S3/HDFS as storage.

• Kafka Connector

Queries Kafka topics as tables. Each message is interpreted as a row
in a table.

• Redis Connector

Each key/value pair is interpreted as a row in Presto.

• Cassandra Connector

Supports Cassandra 2.1.5 or later.
PRESTO CONNECTOR
• Black Hole Connector

Works like /dev/null or /dev/zero on Unix-like systems. Used for
catastrophic tests or integration tests.

• Memory Connector

Metadata and data are stored in RAM on worker nodes.

Still an experimental connector, mainly used for tests.

• System Connector

Provides information about the cluster state and running
query metrics. Useful for runtime monitoring.
CONNECTOR DETAIL
PRESTO CONNECTOR
• Plugin defines the interface
to bootstrap your connector
creation.

• It also provides the list of
UDFs available in your
Presto cluster.

• A ConnectorFactory is able to
provide multiple connector implementations.
[Diagram: Plugin --getConnectorFactories()--> ConnectorFactory --create(connectorId, …)--> Connector]
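A self-contained sketch of this bootstrap flow. The interfaces below are simplified stand-ins for the real Presto SPI (com.facebook.presto.spi.Plugin and friends), not the actual API, so treat the exact shapes as assumptions:

import java.util.List;
import java.util.Map;

interface Connector {}                        // stand-in for the SPI Connector

interface ConnectorFactory {                  // stand-in for the SPI factory
    String getName();
    Connector create(String connectorId, Map<String, String> config);
}

interface Plugin {                            // stand-in for the SPI Plugin
    Iterable<ConnectorFactory> getConnectorFactories();
}

public class ExamplePlugin implements Plugin {
    @Override
    public Iterable<ConnectorFactory> getConnectorFactories() {
        // One plugin may expose several connector implementations.
        return List.of(new ConnectorFactory() {
            public String getName() { return "example"; }
            public Connector create(String connectorId, Map<String, String> config) {
                // Wire up metadata, split manager, and page providers here.
                return new Connector() {};
            }
        });
    }
}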
PRESTO CONNECTOR
• A Connector provides classes to manage metadata, storage
access, and table access control.

• ConnectorSplitManager creates
data source metadata (splits) to be
distributed to multiple worker
nodes.

• ConnectorPage[Source|Sink]Provider
supplies page sources/sinks to the
split operators.
[Diagram: Connector exposes ConnectorMetadata, ConnectorSplitManager, ConnectorPageSourceProvider, ConnectorPageSinkProvider, and ConnectorAccessControl]
PRESTO CONNECTOR
• Presto calls beginInsert on
ConnectorMetadata.

• ConnectorSplitManager creates
splits that include metadata about the
actual data source (e.g. file paths).

• ConnectorPageSourceProvider
downloads the files
from the data source in
parallel.

• finishInsert on ConnectorMetadata
commits the transaction.
[Diagram: ConnectorMetadata.beginInsert -> ConnectorSplitManager.getSplits -> parallel ConnectorPageSourceProviders feed the operators -> ConnectorMetadata.finishInsert]
PRESTO ON CLOUD STORAGE
• A distributed execution engine like Presto can no longer exploit
data locality on cloud storage.

• Reading and writing data can become a dominant factor in query
performance, stability, and cost.

→ The connector should be implemented with
network IO cost in mind.
TIME INDEX PARTITIONING
• By using the multicolumn index on time ranges in Plazma, Presto
can filter out unnecessary partitions through predicate push
down.

• The TD_TIME_RANGE UDF gives Presto a hint about which partitions
should be fetched from Plazma.

• e.g. TD_TIME_RANGE(time, '2017-08-31 12:30:00', NULL, 'JST')
• ConnectorSplitManager selects the necessary partitions
and calculates the split distribution plan.
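As a hedged sketch of the pushdown arithmetic (not Presto's actual code), the UDF's literal bound can be converted into a UNIX epoch that the Plazma index keys are comparable with; 'JST' is mapped to the Asia/Tokyo zone here:

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimeRangePushdownSketch {
    // Turn a TD_TIME_RANGE bound like '2017-08-31 12:30:00' in 'JST'
    // into a UNIX epoch comparable with first/last_index_key.
    static long toEpoch(String local, ZoneId zone) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        return LocalDateTime.parse(local, f).atZone(zone).toEpochSecond();
    }

    public static void main(String[] args) {
        long start = toEpoch("2017-08-31 12:30:00", ZoneId.of("Asia/Tokyo"));
        // A NULL upper bound in the UDF means "no upper limit".
        System.out.println("fetch partitions with last_index_key >= " + start);
    }
}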
TIME INDEX PARTITIONING
• Select metadata records from realtime storage and archive
storage according to the given time range.

SELECT * FROM rt/ar WHERE start < time AND time < end;
[Diagram: the ConnectorSplitManager picks archive partitions (time: 0~3599, 3600~7200) and realtime records (time: 8000, 8200, 8800, 9000) inside the range]
TIME INDEX PARTITIONING
• A split is responsible for downloading multiple files from S3 in
order to reduce overhead.

• ConnectorSplitManager calculates the file assignment for
each split based on available statistics (e.g. file size,
number of columns, record count).
[Diagram: the ConnectorSplitManager assigns files f1, f2, f3 to Split1 and Split2]
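One plausible (and hedged) way to balance such an assignment is a greedy bin-packing pass over file sizes; FileStat is a hypothetical stand-in for the statistics above, not the actual split planner:

import java.util.*;

public class SplitAssignmentSketch {
    // Hypothetical per-file statistics record.
    record FileStat(String path, long sizeBytes) {}

    // Greedy balance: give each file to the currently smallest split so
    // every split downloads a similar number of bytes.
    static List<List<FileStat>> assign(List<FileStat> files, int splitCount) {
        List<List<FileStat>> splits = new ArrayList<>();
        long[] bytes = new long[splitCount];
        for (int i = 0; i < splitCount; i++) splits.add(new ArrayList<>());
        List<FileStat> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(FileStat::sizeBytes).reversed());
        for (FileStat f : sorted) {
            int min = 0;
            for (int i = 1; i < splitCount; i++)
                if (bytes[i] < bytes[min]) min = i;
            splits.get(min).add(f);
            bytes[min] += f.sizeBytes();
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of(
            new FileStat("f1", 300), new FileStat("f2", 200),
            new FileStat("f3", 100)), 2));
    }
}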
TIME INDEX PARTITIONING
SELECT 10 cols in a range
[Chart: elapsed time (0 to 180 sec) versus scanned range (60 days down to 10 days) with TD_TIME_RANGE]
TIME INDEX PARTITIONING
SELECT 10 cols in a range
[Chart: number of splits (0 to 60) versus scanned range (6+ years down to 6 months)]
CHALLENGE
• Time-Index partitioning worked very well because

• Most logs from web pages and IoT devices natively carry the
time at which they were created.

• OLAP workloads from analysts are often limited to a specific
time range (e.g. the last week, or during a campaign).

• But it lacks the flexibility to build an index on columns
other than time. This is required especially in digital
marketing and DMP use cases.
USER DEFINED PARTITIONING
• We are now evaluating user defined partitioning with Presto.

• User defined partitioning allows customers to flexibly set an index
on arbitrary data attributes.

• User defined partitioning can co-exist with time-index
partitioning as a secondary index.
SELECT
COUNT(1)
FROM audience
WHERE
TD_TIME_RANGE(time, '2017-09-04', '2017-09-07')
AND
audience.room = 'E'
BUCKETING
• A similar mechanism to Hive bucketing

• A bucket is a logical group of partition files, grouped by the
specified bucketing column.
[Diagram: a table divided into buckets; each bucket holds partitions for time range 1 through time range 4]
BUCKETING
• PlazmaDB defines the hash function type on the partitioning key
and the total bucket count, which is fixed in advance.
SELECT COUNT(1) FROM audience
WHERE
TD_TIME_RANGE(time, '2017-09-04', '2017-09-07')
AND
audience.room = 'E'
[Diagram: the ConnectorSplitManager sees the audience table divided into bucket1, bucket2, and bucket3, each holding partitions]
BUCKETING
• ConnectorSplitManager selects the proper partitions from
PostgreSQL given the time range and the bucket key.
[Diagram: for the same query as above, hash('E') -> bucket2 and 1504483200 < time && time < 1504742400 narrow the scan to the matching partitions within bucket2]
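A hedged sketch of that routing step. The real hash function type is defined per table in PlazmaDB and is not specified in the slides, so CRC32 below is only an illustrative choice:

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class BucketRoutingSketch {
    // Map a bucketing-key value to one of a fixed number of buckets.
    static int bucketFor(String key, int totalBuckets) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % totalBuckets);
    }

    public static void main(String[] args) {
        // Only this bucket has to be scanned for WHERE audience.room = 'E'.
        System.out.println("room 'E' -> bucket " + bucketFor("E", 3));
    }
}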
USER DEFINED PARTITIONING
• Users can specify the partitioning strategy based on their usage,
using a partitioning key column and a max time range.
[Diagram: a grid of 1h time slices by key values v1, v2, v3 of column c1; a query WHERE c1 = 'v1' AND time = … reads only the matching cells]
USER DEFINED PARTITIONING
• We can skip reading many unnecessary partitions. This
architecture fits digital marketing use cases very well.

• Creating user segments

• Aggregation by channel

• It still makes use of time index partitioning.
PERFORMANCE COMPARISON
SQLs on TPC-H (scale factor = 1000)
[Chart: elapsed time (0 to 300 sec) for count1_filter, groupby, and hashjoin, NORMAL vs. UDP; reported values: 87.279, 36.569, 1.04, 266.71, 69.374, 19.478]
PERFORMANCE COMPARISON
SQLs on TPC-H (scale factor = 1000)
[Chart: elapsed time (0 to 80 sec) for between, mod_predicate, and count_distinct, NORMAL vs. UDP]
REPARTITIONING
• Many small partition files can put memory pressure on the
partition metadata in the coordinator.
[Diagram: buckets containing many small scattered partitions across time ranges 1-4]
REPARTITIONING
• Merging scattered partitions can significantly improve query
performance.
[Diagram: the same buckets after merging, with far fewer partitions per time range]
CREATE TABLE stella.partition.[remerge|split]
WITH ([max_file_size='256MB',
max_record_count=1000000]*)
AS SELECT * FROM stella.partition.sources WHERE
account_id = 1 AND
table_schema = sample_datasets AND
table_name = www_access AND
TD_TIME_RANGE(time, start, end);
DATA DRIVEN REPARTITIONING
• We have a large amount of metric data about customer workloads,
which we can use to gain insight for repartitioning.

• We are now designing a new system to optimize the data
layout, including:

- Index

- Cache

- Partitions

• Continuous optimization of storage layout is our next goal 

to support enterprise use cases including IoT.
FUTURE WORKS
• Self-Driving Databases (https://blog.treasuredata.com/blog/2018/04/27/self-driving-databases-current-and-future/)
• Repartitioning leveraged by data analysis
• Reindexing based on customer workload
• Partition cache
• Precomputing subexpressions
• “Selecting Subexpressions to Materialize at Datacenter Scale“
• https://www.microsoft.com/en-us/research/publication/thou-shall-not-recompute-selecting-subexpressions-materialize-datacenter-scale-2/
RECAP
• Presto provides a plugin mechanism called a connector.
• Though Presto itself is a highly scalable distributed engine, the connector is
also responsible for efficient query execution.
• Plazma has desirable features for integration with such a
connector:
• Transaction support
• Time-Index Partitioning
• User Defined Partitioning
Real World Storage in Treasure Data

Weitere ähnliche Inhalte

Was ist angesagt?

Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesDatabricks
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale Hakka Labs
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionDataWorks Summit
 
Expand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinExpand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinDataWorks Summit
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseJimmy Angelakos
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentDataWorks Summit/Hadoop Summit
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopGruter
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkDongwon Kim
 
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Future of Data Meetup
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkRDataWorks Summit
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisliang chen
 

Was ist angesagt? (20)

Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage Devices
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
Expand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinExpand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with Zeppelin
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache Flink
 
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysis
 

Ähnlich wie Real World Storage in Treasure Data

Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageKai Sasaki
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformEva Tse
 
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Web Services
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3uzzal basak
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...Amazon Web Services
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon RedshiftAmazon Web Services
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming apphadooparchbook
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
User Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBUser Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBKai Sasaki
 

Ähnlich wie Real World Storage in Treasure Data (20)

Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
 
Postgres Toolkit
Postgres ToolkitPostgres Toolkit
Postgres Toolkit
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming app
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
User Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBUser Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDB
 
Presto
PrestoPresto
Presto
 
Oracle Tracing
Oracle TracingOracle Tracing
Oracle Tracing
 

Mehr von Kai Sasaki

Graviton 2で実現する
コスト効率のよいCDP基盤
Graviton 2で実現する
コスト効率のよいCDP基盤Graviton 2で実現する
コスト効率のよいCDP基盤
Graviton 2で実現する
コスト効率のよいCDP基盤Kai Sasaki
 
Infrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed systemInfrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed systemKai Sasaki
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisKai Sasaki
 
Recent Changes and Challenges for Future Presto
Recent Changes and Challenges for Future PrestoRecent Changes and Challenges for Future Presto
Recent Changes and Challenges for Future PrestoKai Sasaki
 
20180522 infra autoscaling_system
20180522 infra autoscaling_system20180522 infra autoscaling_system
20180522 infra autoscaling_systemKai Sasaki
 
Deep dive into deeplearn.js
Deep dive into deeplearn.jsDeep dive into deeplearn.js
Deep dive into deeplearn.jsKai Sasaki
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Embulk makes Japan visible
Embulk makes Japan visibleEmbulk makes Japan visible
Embulk makes Japan visibleKai Sasaki
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopKai Sasaki
 
図でわかるHDFS Erasure Coding
図でわかるHDFS Erasure Coding図でわかるHDFS Erasure Coding
図でわかるHDFS Erasure CodingKai Sasaki
 
Spark MLlib code reading ~optimization~
Spark MLlib code reading ~optimization~Spark MLlib code reading ~optimization~
Spark MLlib code reading ~optimization~Kai Sasaki
 
How I tried MADE
How I tried MADEHow I tried MADE
How I tried MADEKai Sasaki
 
Reading kernel org
Reading kernel orgReading kernel org
Reading kernel orgKai Sasaki
 
Kernel bootstrap
Kernel bootstrapKernel bootstrap
Kernel bootstrapKai Sasaki
 
HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案
HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案
HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案Kai Sasaki
 
Kernel resource
Kernel resourceKernel resource
Kernel resourceKai Sasaki
 
Kernel overview
Kernel overviewKernel overview
Kernel overviewKai Sasaki
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出Kai Sasaki
 

Mehr von Kai Sasaki (20)

Graviton 2で実現する
コスト効率のよいCDP基盤
Graviton 2で実現する
コスト効率のよいCDP基盤Graviton 2で実現する
コスト効率のよいCDP基盤
Graviton 2で実現する
コスト効率のよいCDP基盤
 
Infrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed systemInfrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed system
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData Analysis
 
Recent Changes and Challenges for Future Presto
Recent Changes and Challenges for Future PrestoRecent Changes and Challenges for Future Presto
Recent Changes and Challenges for Future Presto
 
20180522 infra autoscaling_system
20180522 infra autoscaling_system20180522 infra autoscaling_system
20180522 infra autoscaling_system
 
Deep dive into deeplearn.js
Deep dive into deeplearn.jsDeep dive into deeplearn.js
Deep dive into deeplearn.js
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Embulk makes Japan visible
Embulk makes Japan visibleEmbulk makes Japan visible
Embulk makes Japan visible
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoop
 
図でわかるHDFS Erasure Coding
図でわかるHDFS Erasure Coding図でわかるHDFS Erasure Coding
図でわかるHDFS Erasure Coding
 
Spark MLlib code reading ~optimization~
Spark MLlib code reading ~optimization~Spark MLlib code reading ~optimization~
Spark MLlib code reading ~optimization~
 
How I tried MADE
How I tried MADEHow I tried MADE
How I tried MADE
 
Reading kernel org
Reading kernel orgReading kernel org
Reading kernel org
 
Reading drill
Reading drillReading drill
Reading drill
 
Kernel ext4
Kernel ext4Kernel ext4
Kernel ext4
 
Kernel bootstrap
Kernel bootstrapKernel bootstrap
Kernel bootstrap
 
HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案
HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案
HyperLogLogを用いた、異なり数に基づく
 省リソースなk-meansの
k決定アルゴリズムの提案
 
Kernel resource
Kernel resourceKernel resource
Kernel resource
 
Kernel overview
Kernel overviewKernel overview
Kernel overview
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出
 

Kürzlich hochgeladen

The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 

Kürzlich hochgeladen (20)

The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 

Real World Storage in Treasure Data

  • 1. T R E A S U R E D A T A REAL WORLD DISTRIBUTED DATABASE ON CLOUD TECHNOLOGY BDI Research Group, Aug 7th Kai Sasaki Senior Software Engineer at Arm Treasure Data
  • 2. ABOUT ME • Kai Sasaki (佐々⽊木 海海) • Senior Software Engineer at Arm Treasure Data • Ex. Yahoo Japan Corporation • Presto/Hadoop/TensorFlow contributor • Hivemall committer • Started GA Tech OMSCS 
 (http://www.omscs.gatech.edu/)
  • 3. TREASURE DATA Data Analytics Platform Unify all your raw data in scalable and secure platform. Supporting 100+ integrations to enable you to easily connect all your data sources in real-time. Founded in 2011. Live with OSS • Fluentd • Embulk • Digdag • Hivemall and more https://www.treasuredata.com/opensource/
  • 4. TREASURE DATA Data Analytics Platform Our customers exists across industries. - Wish - LG - Subaru - KIRIN etc… Support vast amount of use cases. - Automotive - Retail - Digital Marketing - IoT
  • 5. TREASURE DATA Data Analytics Platform Presto: 11.7+ million Hive: 1.3+ million Total Record: 1064+ trillion Streaming: 60% Data Connector: 30% Bulk Import: 10% Imported Records: 60+ billion Integrations: 114
  • 6. TREASURE DATA Treasure CDP Customer Data Platform (CDP) is a marketer-based management system. Client can unify customer database with various kind of data sources and combine it to find a specific customer profile.
  • 10. AGENDA • Cloud Storage of Treasure Data • Distributed Query Processing Engine • Presto/Hive • Enterprise Level Storage System
  • 12. CLOUD STORAGE IN TD • Our Treasure Data storage service is built on cloud storage like S3. (Plazma) • Presto/Hive just provides a distributed query execution layer. It requires us to make our storage system also scalable. • On the other hand, we should make use of maintainability and availability provided cloud service provider (IaaS).
  • 13. PLAZMA • We built a thin storage layer on existing cloud storage and relational database, called Plazma. • Plazma is a central component that stores all customer data for analysis in Treasure Data. • Plazma consists of two components • Metadata (PostgreSQL) • Storage (S3 or RiakCS)
  • 14. PLAZMA • Plazma stores metadata of data files in PostgreSQL hosted by Amazon RDS.
  • 15. PLAZMA • Plazma stores metadata of data files in PostgreSQL hosted by Amazon RDS. • This PostgreSQL manages the index, file path on S3, transaction and deleted files. LOG LOG
  • 16. WHY POSTGRESQL? • GiST index easily enables us to do complicated index search on data_set_id and time ranges. CREATE INDEX plazma_index ON partitions USING gist ( data_set_id, index_range( first_index_key, last_index_key, ‘[]') );
  • 17. PARTITIONING • To make the best of high throughput by Presto parallel processing, it is necessary to distribute data source too. • Distributing data source evenly can contribute the high throughput and performance stability. • Two basic partitioning method • Key range partitioning -> Time-Index partitioning • Hash partitioning -> User Defined Partitioning
  • 18. PARTITIONING • A partition record in Plazma represents a file stored in S3 with some additional information • Data Set ID • Range Index Key • Record Count • File Size • Checksum • File Path
  • 19. PARTITIONING • All partitions in Plazma are indexed by time when it is generated. Time index is recorded as UNIX epoch. • A partition keeps first_index_key and last_index_key to specifies the range where the partition includes. • Plazma index is constructed as multicolumn index by using GiST index of PostgreSQL. 
 (https://www.postgresql.org/docs/current/static/gist.html) • (data_set_id, index_range(first_index_key, last_index_key))
  • 20. CLOUD STORAGE IN TD data_set_id first_index_key last_index_key path 1 1533276065 1533276071 s3://path 2 1533276071 1533276077 s3://path 4 1533276077 1533276085 s3://path 4 1533276085 1533276092 s3://path 5 1533276092 1533276098 s3://path 5 1533276098 1533276103 s3://path 5 1533276103 1533276108 s3://path 5 1533276108 1533276114 s3://path … PostgreSQL Amazon S3
  • 21. LIFECYCLE OF PARTITION • Plazma has two storage management layer.
 At the beginning, records are put on realtime storage layer in raw format. (msgpack.gz) Realtime Storage Archive Storage time: 100 time: 4000 time: 3800 time: 300 time: 500
  • 22. LIFECYCLE OF PARTITION • Every one hour, a specific map reduce job called Log Merge Job runs to merge same time range records into one partition in archive storage. Realtime Storage Archive Storage time: 100 time: 4000 time: 3800 time: 300 time: 500 time: 0~3599 time: 3600~7200 MR
  • 23. LIFECYCLE OF PARTITION • Query execution engine like Presto needs to fetch the data from both realtime storage and archive storage. But basically it should be efficient to read the data from archive storage. • Inspired by
 C-Store paper
 M. Stonebraker Realtime Storage Archive Storage time: 100 time: 4000 time: 3800 time: 300 time: 500 time: 0~3599 time: 3600~7200 MR
  • 25. TRANSACTION AND PARTITIONING • Consistency is the most important factor for enterprise analytics workload. Therefore MPP engine like Presto and backend storage MUST always guarantee the consistency. → UPDATE is done atomically by Plazma • At the same time, we want to achieve high throughput by distributing workload to multiple worker nodes. → Data files are partitioned in Plazma
  • 26. PLAZMA TRANSACTION • Plazma supports transaction for the query that has side-effect (e.g. INSERT INTO/CREATE TABLE). • Transaction of Plazma means the atomic operation on the appearance of the data on S3, not actual file. • Transaction is composed of two phases • Uploading uncommitted partitions • Commit transaction by moving uncommitted partitions
  • 27. PLAZMA TRANSACTION • Multiple worker try to upload files to S3
 asynchronously. Uncommitted Committed PostgreSQL
  • 28. PLAZMA TRANSACTION • After uploading is done, insert a record in uncommitted 
 table in PostgreSQL respectively. Uncommitted Committed PostgreSQL
  • 29. PLAZMA TRANSACTION • After uploading is done, insert a record in uncommitted 
 table in PostgreSQL respectively. Uncommitted Committed PostgreSQL p1 p2
  • 30. PLAZMA TRANSACTION • After all upload tasks are completed, coordinator tries 
 to commit the transaction by moving 
 all records in uncommitted to committed. Uncommitted Committed p1 p2 p3 PostgreSQL
  • 31. PLAZMA TRANSACTION • After all upload tasks are completed, coordinator tries 
 to commit the transaction by moving 
 all records in uncommitted to committed. Uncommitted Committed p1 p2 p3 PostgreSQL
  • 32. PLAZMA DELETE • Delete query is handled in similar way. First newly created
 partitions are uploaded excluding deleted 
 records. Uncommitted Committed p1 p2 p3 p1’ p2’ p3’ PostgreSQL
  • 33. PLAZMA DELETE • When transaction is committed, the records in committed table is replaced by uncommitted records 
 with different file path. Uncommitted Committed p1’ p2’ p3’ PostgreSQL
  • 35. WHAT IS PRESTO? • Presto is an open source scalable distributed SQL 
 engine for huge OLAP workloads • Mainly developed by Facebook, Teradata • Used by FB, Uber, Netflix etc • In-Memory processing • Pluggable architecture
 Hive, Cassandra, Kafka etc
  • 36. PRESTO CONNECTOR • Presto connector is the plugin for providing the access way to various kind of existing data storage from Presto. • Connector is responsible for managing metadata/ transaction/data accessor. http://prestodb.io/
  • 37. PRESTO CONNECTOR • Hive Connector
 Use metastore as metadata and S3/HDFS as storage. • Kafka Connector
 Querying Kafka topic as table. Each message as interpreted as row in a table. • Redis Connector
 Key/value pair is interpreted as a row in Presto. • Cassandra Connector
 Support Cassandra 2.1.5 or later.
  • 38. PRESTO CONNECTOR • Black Hole Connector
 Works like /dev/null or /dev/zero in Unix like system. Used for catastrophic test or integration test. • Memory Connector
 Metadata and data are stored in RAM on worker nodes. 
 Still experimental connector mainly used for test. • System Connector
 Provides information about the cluster state and running query metrics. It is useful for runtime monitoring.
  • 40. PRESTO CONNECTOR • Plugin defines an interface 
 to bootstrap your connector 
 creation. • Also provides the list of 
 UDFs available your 
 Presto cluster. • ConnectorFactory is able to
 provide multiple connector implementations. Plugin ConnectorFactory Connector getConnectorFactories() create(connectorId,…)
 • 41. PRESTO CONNECTOR • Connector provides the classes that manage metadata, storage access, and table access control. • ConnectorSplitManager creates the data source metadata (splits) to be distributed to multiple worker nodes. • ConnectorPage[Source|Sink]Provider supplies the page sources and sinks used by the scan and write operators. [Diagram: Connector → ConnectorMetadata, ConnectorSplitManager, ConnectorPageSourceProvider, ConnectorPageSinkProvider, ConnectorAccessControl]
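A minimal skeleton of these interfaces, sketched against the facebook/presto SPI of this era (exact packages and method sets vary by version). All Plazma* classes are hypothetical stand-ins whose bodies are elided:

import com.facebook.presto.spi.ConnectorHandleResolver;
import com.facebook.presto.spi.Plugin;
import com.facebook.presto.spi.connector.*;
import com.facebook.presto.spi.transaction.IsolationLevel;
import java.util.Map;
import static java.util.Collections.singletonList;

public class PlazmaPlugin implements Plugin {
    @Override
    public Iterable<ConnectorFactory> getConnectorFactories() {
        return singletonList(new PlazmaConnectorFactory()); // a plugin may expose several
    }
}

class PlazmaConnectorFactory implements ConnectorFactory {
    @Override public String getName() { return "plazma"; } // name used in catalog config
    @Override public ConnectorHandleResolver getHandleResolver() { return new PlazmaHandleResolver(); }
    @Override public Connector create(String connectorId, Map<String, String> config, ConnectorContext context) {
        return new PlazmaConnector(); // wire PostgreSQL/S3 clients from config here
    }
}

class PlazmaConnector implements Connector {
    @Override public ConnectorTransactionHandle beginTransaction(IsolationLevel level, boolean readOnly) {
        return new PlazmaTransactionHandle();
    }
    @Override public ConnectorMetadata getMetadata(ConnectorTransactionHandle tx) { return new PlazmaMetadata(); }
    @Override public ConnectorSplitManager getSplitManager() { return new PlazmaSplitManager(); }
    @Override public ConnectorPageSourceProvider getPageSourceProvider() { return new PlazmaPageSourceProvider(); }
    @Override public ConnectorPageSinkProvider getPageSinkProvider() { return new PlazmaPageSinkProvider(); }
    @Override public ConnectorAccessControl getAccessControl() { return new PlazmaAccessControl(); }
}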
 • 42. PRESTO CONNECTOR • Call beginInsert from ConnectorMetadata. • ConnectorSplitManager creates splits that include the metadata of the actual data source (e.g. file paths). • ConnectorPageSourceProvider downloads the files from the data source in parallel. • finishInsert in ConnectorMetadata commits the transaction. [Diagram: ConnectorMetadata.beginInsert → ConnectorSplitManager.getSplits → parallel ConnectorPageSourceProviders → Operators… → ConnectorMetadata.finishInsert]
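Lining this flow up with the Plazma transaction described earlier (a hedged outline; the real SPI methods also take session, handle, and fragment arguments that are elided here):

// coordinator: ConnectorMetadata.beginInsert(...)    -> open a Plazma transaction
// coordinator: ConnectorSplitManager.getSplits(...)  -> splits carrying S3 paths/ranges
// workers:     ConnectorPageSink.appendPage(...)     -> upload files to S3, register
//                                                       rows in the uncommitted table
// coordinator: ConnectorMetadata.finishInsert(...)   -> move uncommitted -> committed
//                                                       (the atomic commit shown above)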
 • 43. PRESTO ON CLOUD STORAGE • A distributed execution engine like Presto can no longer make use of data locality on cloud storage. • Reading/writing data can be a dominant factor in query performance, stability, and cost. → The connector should be implemented to take care of network I/O cost.
 • 44. TIME INDEX PARTITIONING • By using the multicolumn index on time ranges in Plazma, Presto can filter out unnecessary partitions through predicate pushdown. • The TD_TIME_RANGE UDF gives Presto a hint about which partitions should be fetched from Plazma. • e.g. TD_TIME_RANGE(time, ‘2017-08-31 12:30:00’, NULL, ‘JST’) • ConnectorSplitManager selects the necessary partitions and calculates the split distribution plan.
 • 45. TIME INDEX PARTITIONING • Select metadata records from realtime storage and archive storage according to the given time range. SELECT * FROM rt/ar WHERE start < time AND time < end; [Diagram: ConnectorSplitManager queries Realtime Storage (time: 8000, 8200, 8800, 9000) and Archive Storage (time: 0~3599, 3600~7200).]
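A sketch of that metadata lookup from the connector side, in Java/JDBC. The partitions table and its first/last index keys follow the CREATE INDEX slide earlier; the plain range comparison here stands in for the GiST index_range operator actually used:

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class PartitionLookupSketch {
    // Returns S3 paths of partitions overlapping [start, end) for one data set.
    public static List<String> lookup(Connection conn, long dataSetId, long start, long end)
            throws SQLException {
        String sql = "SELECT path FROM partitions " +
                     "WHERE data_set_id = ? AND first_index_key < ? AND ? <= last_index_key";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, dataSetId);
            ps.setLong(2, end);   // partition starts before the range ends...
            ps.setLong(3, start); // ...and ends after the range starts -> overlap
            List<String> paths = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    paths.add(rs.getString("path"));
                }
            }
            return paths;
        }
    }
}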
 • 46. TIME INDEX PARTITIONING • A split is responsible for downloading multiple files from S3, in order to reduce per-file overhead. • ConnectorSplitManager calculates the file assignment for each split based on the given statistics (e.g. file size, number of columns, record count). [Diagram: ConnectorSplitManager assigns files f1, f2, f3 to Split1 and Split2.]
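One simple way to do such an assignment, weighing file size only (illustrative; per the slide, the real planner also considers column and record counts):

import java.util.*;

public class SplitAssignmentSketch {
    // Greedy bin packing: hand the next-largest file to the currently smallest
    // split, so every split downloads roughly the same number of bytes.
    public static List<List<String>> assign(Map<String, Long> fileSizes, int splitCount) {
        List<List<String>> splits = new ArrayList<>();
        PriorityQueue<long[]> bySize = // {splitIndex, assignedBytes}, smallest first
                new PriorityQueue<>(Comparator.comparingLong((long[] s) -> s[1]));
        for (int i = 0; i < splitCount; i++) {
            splits.add(new ArrayList<>());
            bySize.add(new long[] {i, 0L});
        }
        fileSizes.entrySet().stream()
                .sorted((a, b) -> Long.compare(b.getValue(), a.getValue())) // largest first
                .forEach(file -> {
                    long[] smallest = bySize.poll();
                    splits.get((int) smallest[0]).add(file.getKey());
                    smallest[1] += file.getValue();
                    bySize.add(smallest);
                });
        return splits;
    }
}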
 • 47. TIME INDEX PARTITIONING [Chart: elapsed time (0–180 sec) of a SELECT of 10 columns vs. the queried time range (10–60 days) with TD_TIME_RANGE.]
 • 48. TIME INDEX PARTITIONING [Chart: number of splits (0–60) of a SELECT of 10 columns vs. the queried time range (6 months to 6+ years).]
 • 49. CHALLENGE • Time-index partitioning has worked very well because: • Most logs from web pages and IoT devices natively carry the time at which they were created. • OLAP workloads from analysts are often limited to a specific time range (e.g. the last week, during a campaign). • But it lacks the flexibility to build an index on columns other than time, which is required especially in digital marketing and DMP use cases.
 • 51. USER DEFINED PARTITIONING • We are now evaluating user defined partitioning with Presto. • User defined partitioning allows customers to flexibly set an index on an arbitrary data attribute. • User defined partitioning can co-exist with time-index partitioning as a secondary index.
 • 52. SELECT COUNT(1) FROM audience WHERE TD_TIME_RANGE(time, ‘2017-09-04’, ‘2017-09-07’) AND audience.room = ‘E’
 • 53. BUCKETING • A similar mechanism to Hive bucketing. • A bucket is a logical group of partition files, grouped by the specified bucketing column. [Diagram: a table divided into buckets; each bucket holds partitions for time ranges 1–4.]
 • 54. BUCKETING • PlazmaDB defines the hash function type on the partitioning key and the total bucket count, which is fixed in advance. SELECT COUNT(1) FROM audience WHERE TD_TIME_RANGE(time, ‘2017-09-04’, ‘2017-09-07’) AND audience.room = ‘E’ [Diagram: ConnectorSplitManager over a table with bucket1–bucket3, each holding partitions.]
 • 55. BUCKETING • ConnectorSplitManager selects the proper partitions from PostgreSQL given the time range and the bucket key. SELECT COUNT(1) FROM audience WHERE TD_TIME_RANGE(time, ‘2017-09-04’, ‘2017-09-07’) AND audience.room = ‘E’ [Diagram: hash(‘E’) → bucket2; 1504483200 < time && time < 1504742400; only bucket2’s partitions in that range are scanned.]
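The pruning math above, sketched with a stand-in hash function (PlazmaDB fixes the real hash function type and bucket count in the table definition; the epoch constants match the diagram):

import java.time.Instant;

public class BucketPruningSketch {
    public static void main(String[] args) {
        int bucketCount = 3;  // fixed in advance when the table is defined
        String roomKey = "E"; // from the predicate audience.room = 'E'

        // Stand-in hash; the actual function is whatever the table declared.
        int bucket = Math.floorMod(roomKey.hashCode(), bucketCount);

        // TD_TIME_RANGE(time, '2017-09-04', '2017-09-07') as epoch seconds (UTC):
        long start = Instant.parse("2017-09-04T00:00:00Z").getEpochSecond(); // 1504483200
        long end   = Instant.parse("2017-09-07T00:00:00Z").getEpochSecond(); // 1504742400

        // Only partitions in this bucket that overlap [start, end) are scanned.
        System.out.printf("scan bucket %d where %d < time && time < %d%n", bucket, start, end);
    }
}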
 • 56. USER DEFINED PARTITIONING • Users can specify the partitioning strategy based on their usage, using a partitioning key column and a max time range. [Diagram: partitions laid out by 1h time ranges on one axis and values v1, v2, v3 of partitioning key column c1 on the other; WHERE c1 = ‘v1’ AND time = … touches only one cell.]
 • 57. USER DEFINED PARTITIONING • We can skip reading many unnecessary partitions. This architecture fits digital marketing use cases very well: • Creating user segments • Aggregation by channel • It still makes use of time-index partitioning.
 • 58. PERFORMANCE COMPARISON • SQLs on TPC-H (scale factor = 1000), elapsed time in seconds. [Chart, NORMAL vs. UDP: count1_filter 266.71 vs. 87.279; groupby 69.374 vs. 36.569; hashjoin 19.478 vs. 1.04]
 • 59. PERFORMANCE COMPARISON • SQLs on TPC-H (scale factor = 1000). [Chart: elapsed time (0–80 sec), NORMAL vs. UDP, for between, mod_predicate, and count_distinct queries.]
 • 61. REPARTITIONING • Many small partition files can put memory pressure on the partition metadata held by the coordinator. [Diagram: a table with buckets whose time ranges 1–4 are fragmented into many small partitions.]
 • 62. REPARTITIONING • Merging scattered partitions can significantly improve query performance. [Diagram: the same buckets after merging, with one partition per time range.]
 • 63. CREATE TABLE stella.partition.[remerge|split] WITH ([max_file_size=‘256MB’, max_record_count=1000000]*) AS SELECT * FROM stella.partition.sources WHERE account_id = 1 AND table_schema = sample_datasets AND table_name = www_access AND TD_TIME_RANGE(time, start, end);
 • 64. DATA DRIVEN REPARTITIONING • We have a large amount of metric data about customer workloads, so we can use it to gain insights for repartitioning. • We are now designing a new system to optimize the data layout, including: - Index - Cache - Partitions • Continuous optimization of the storage layout is our next goal to support enterprise use cases, including IoT.
 • 65. FUTURE WORKS • Self-Driving Databases (https://blog.treasuredata.com/blog/2018/04/27/self-driving-databases-current-and-future/) • Repartitioning leveraged by data analysis • Reindexing based on customer workload • Partition cache • Precomputing subexpressions • “Selecting Subexpressions to Materialize at Datacenter Scale” • https://www.microsoft.com/en-us/research/publication/thou-shall-not-recompute-selecting-subexpressions-materialize-datacenter-scale-2/
 • 66. RECAP • Presto provides a plugin mechanism called a connector. • Though Presto itself is a highly scalable distributed engine, the connector is also responsible for efficient query execution. • Plazma has desirable features for integration with such a connector: • Transaction support • Time-Index Partitioning • User Defined Partitioning