SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Downloaden Sie, um offline zu lesen
Digdagによる大規模データ処理の

自動化とエラー処理
Sadayuki Furuhashi
Workflow Engines Night
Sadayuki Furuhashi
A founder of Treasure Data, Inc. located in Silicon Valley.
OSS projects I founded:
An open-source hacker.
Github: @frsyuki
What’s workload automation?
• あらゆる手作業の自動化
> バッチデータ解析の自動化:
• データロード - ETL - JOIN- 集計処理 - レポート生成 - 通知
> メール送信の自動化
• アドレス一覧の取得 - 対象の絞り込み - テンプレートから

本文を生成 - メール送信 - 完了通知
> システム間のデータ連携の自動化
> サーバ・DB・ネットワーク機器の管理やプロビジョニング
の自動化
> テスト・デプロイの自動化(CI)
求められる機能
• 基本機能
> タスクを依存関係順に実行
> 定期的な実行
> ファイルが作成されたら実行
> 過去分の一括実行(backfill)
> 時刻などの変数を含めて実行
• エラー処理
> 失敗したら通知
> 失敗した場所から再開
• 状態監視
> 実行時間が長ければ通知
> タスクの実行時間を可視化
> 実行ログの収集と保存
• 高速化
> タスクを並列して実行
> 同時実行数の制限
• 開発支援
> ワークフローのバージョン管理
> GUIによるワークフロー開発
> 定型処理を簡単に実行できるライ
ブラリ
> 手元とサーバ上で同じように動く
再現性(手元で動けばサーバでも
動く)
> Dockerイメージを使ってタスクを
実行
Products
OSS
• Makefile
• Jenkins
• Luigi
• Airflow
• Rundeck
• Azkaban
• Grid Engine
• OpenLava
• Obsidian Scheduler
• Hinemos
• StackStorm
• Platform LSM
Proprietary
• Tivoli Workload Scheduler (IBM)
• CA Workload Automation

(CA Technologies)
• JP1/AJS3 (Hitachi)
• Systemwalker Job Workload
Server (Fujitsu)
• Workload Automation
(Automatic)
• BatchMan (Honico)
• Control-M (BMC)
• Schedulix
• ServiceNow Workflow
Challenge: Multiple Cloud & Regions
On-Premises
Different API,
Different tools,
Many scripts.
Challenge: Multiple DB technologies
Amazon S3
Amazon 

Redshift
Amazon EMR
Challenge: Multiple DB technologies
Amazon S3
Amazon 

Redshift
Amazon EMR
> Hi!
> I'm a new technology!
Challenge: Modern complex data analytics
Ingest
Application logs
User attribute data
Ad impressions
3rd-party cookie data
Enrich
Removing bot access
Geo location from IP
address
Parsing User-Agent
JOIN user attributes
to event logs
Model
A/B Testing
Funnel analysis
Segmentation
analysis
Machine learning
Load
Creating indexes
Data partitioning
Data compression
Statistics
collection
Utilize
Recommendation
API
Realtime ad bidding
Visualize using BI
applications
Ingest UtilizeEnrich Model Load
Traditional "false" solution
#!/bin/bash
./run_mysql_query.sh
./load_facebook_data.sh
./rsync_apache_logs.sh
./start_emr_cluster.sh
for query in emr/*.sql; do
./run_emr_hive $query
done
./shutdown_emr_cluster.sh
./run_redshift_queries.sh
./call_finish_notification.sh
> Poor error handling
> Write once, Nobody reads
> No alerts on failure
> No alerts on too long run
> No retrying on errors
> No resuming
> No parallel execution
> No distributed execution
> No log collection
> No visualized monitoring
> No modularization
> No parameterization
Solution: Multi-Cloud Workflow Engine
Solves
> Poor error handling
> Write once, Nobody reads
> No alerts on failure
> No alerts on too long run
> No retrying on errors
> No resuming
> No parallel execution
> No distributed execution
> No log collection
> No visualized monitoring
> No modularization
> No parameterization
Example in our case
1. Dump data to
BigQuery
2. load all tables to
Treasure Data
3. Run queries
5. Notify on slack
4. Create reports
on Tableau Server

(on-premises)
Workflow constructs
Key constructs
Operators
> Packaged knowledge to run tasks.
> e.g. pg>, s3>, gcs>, emr>, td>, py>, rb>
Parameters
> Programmable variables for operators.
> e.g. ${session_time}, ${workflow_name},

${JSON.parse(http.last_content)}
Task groups
> Sequence of tasks to organize & modularize
workflows.
Operator library
_export:
td:
database: workflow_temp
+task1:
td>: queries/open.sql
create_table: daily_open
+task2:
td>: queries/close.sql
create_table: daily_close
Standard libraries
redshift>: runs Amazon Redshift queries
emr>: create/shutdowns a cluster & runs
steps
s3_wait>: waits until a file is put on S3
pg>: runs PostgreSQL queries
td>: runs Treasure Data queries
td_for_each>: repeats task for result rows
mail>: sends an email
Open-source libraries
You can release & use open-source
operator libraries.
Task grouping & parallel execution
+load_data:
_parallel: true


+load_users:
redshift>: copy/users.sql


+load_items:
redshift>: copy/items.sql
Parallel execution
Tasks under a same group run in
parallel if _parallel option is set to
true.
Grouping workflows...
Ingest UtilizeEnrich Model Load
+task
+task
+task
+task +task
+task +task
+task
+task
+task +task +task
Grouping workflows
Ingest UtilizeEnrich Model Load
+ingest +enrich
+task +task
+model
+basket_analysis
+task +task
+learn
+load
+task +task+tasks
+task
Parameters & Loops
+send_email_to_active_users:
td_for_each>: list_active.sql
_do:
+send:
email>: tempalte.txt
to: ${td.for_each.addr}
Parameter
A task can propagate parameters to
following tasks
Loop
Generate subtasks dynamically so
that Digdag applies the same set of
operators to different data sets.
Unite Engineering & Analytic Teams
+wait_for_arrival:
s3_wait>: |
bucket/www_${session_date}.csv
+load_table:
redshift>: scripts/copy.sql
Powerful for Engineers
> Comfortable for advanced users
Friendly for Analysts
> Still straight forward for analysts to
understand & leverage workflows
Pushing workflows to a server with Docker image
schedule:
daily>: 01:30:00
timezone: Asia/Tokyo
_export:
docker:
image: my_image:latest
+task:
sh>: ./run_in_docker
Digdag server
> Develop on laptop, push it to a server.
> Workflows run periodically on a server.
> Backfill
> Web editor & monitor
Docker
> Install scripts & dependences in a
Docker image, not on a server.
> Workflows can run anywhere including
developer's laptop.
Amazon ECR Dockerfile & Operator plugin template
• https://github.com/myui/dockernized-digdag-server
• https://github.com/myui/digdag-plugin-example
$ docker pull myui/digdag-server:latest
$ docker run -p 65432:65432 myui/digdag-server
open http://localhost:65432/
Demo
Real-world workflows
Digdag at Treasure Data
3,600 workflows run every day
28,000 tasks run every day
850 active workflows
400,000 workflow executions in total
Example: Customer analysis & alerting
timezone: UTC
schedule:
daily>: 09:00
_export:
mail:
from: 'bizops@example.com'
td:
database: summary
+reports:
td_run>: prepare_users_data
+for_each_users:
td_for_each>: inactive_users.sql
_do:
+alert_email:
mail>: mail.txt
subject: 'Inactive Alert: ${td.each.account_name}'
to: ['${td.each.owner_email}']
timezone: UTC
schedule:
daily>: 09:00
_export:
mail:
from: 'bizops@example.com'
td:
database: summary
+reports:
td_run>: prepare_users_data
+for_each_users:
td_for_each>: inactive_users.sql
_do:
+alert_email:
mail>: mail.txt
subject: 'Inactive Alert: ${td.each.account_name}'
to: ['${td.each.owner_email}']
Example: Customer analysis & alerting
Usage: ${td.each.percentage}%
Account Name: ${td.each.account_name}
Type: Purchase
${td.each.salesforce_link}
Region: ${td.each.region}
Owner: ${td.each.owner_name} (${td.each.owner_email})
Account: ${td.each.account_name}
Status: ${td.each.activity_status}
Actual: ${td.each.total_purchase}
Limit: ${td.each.monthly_purchase_limit}
mail.txt
Example: Backend of a BI app
timezone: <%= ev @timezone %>
<% if @schedule then %>
schedule: <%= ev @schedule %>
<% end %>
_export:
td:
database: <%= ev @database %>
all_mode: ${

(moment(session_time).dayOfYear() - 1)
% 3 == 0
}
+all_load:
if>: ${all_mode == "true"}
_do:
+create_all_records:
td>: segment_web_access.sql
create_table: "cdp_tmp_web_access"
_retry: 5
+rename_tmp_table:
td_ddl>:
rename_tables:
- from: "cdp_tmp_web_access"
to: "cdp_web_access"
_retry: 5
+get_all_count:
td>: incremental_count.sql
table_name: "cdp_web_access"
store_last_results: true
_retry: 5
+syndicate_loop:
loop>: ${Math.ceil(
td.last_results.total_count / 20000
)}
_do:
td>: incremental_select.sql
table_name: "cdp_web_access"
result_connection: cdp_web_access
result_settings:
id: 1
_retry: 5
Example: Moving Spark app to production
_export:
td:
database: digdag_demo_${session_date_compact}
+setup:
td_ddl>:
create_databases: ["${td.database}"]
+ingestion:
_parallel: true
+items_from_access_logs:
+wait_for_arrival:
s3_wait>: digdag-demo-bucket/www_login_$
{session_date_compact}.csv
+load_logs:
td_load>: s3_import_1479918530
+facebook_ads:
td_load>:
facebook_ads_reporting_import_1479843958
+items_from_aurora:
td_load>: mysql_import_1479918544
+enrichment:
_parallel: 5
+ip_location_to_user:
# ip_location, user
td>: queries/ip_location_to_user.sql
create_table: ip_location_to_user
+item_to_click_count:
# item, click_count
td>: queries/item_to_click_count.sql
create_table: item_to_click_count
+item_to_item_count:
# item_1, item_2, count
td>: queries/item_to_item_count.sql
create_table: item_to_item_count
+modeling:
emr>:
cluster: j-OD82XANWFYQ8
staging: s3://digdag-demo-data/emr/staging/
steps:
- type: spark
application: spark/target/scala-2.11/simple-td-
spark-project_2.11-1.0.jar
submit_options: ["--class", "ItemRecommends"]
jars: [td-spark-assembly-0.1.jar]
- type: spark
application: spark/target/scala-2.11/simple-td-
spark-project_2.11-1.0.jar
submit_options: ["--class",
"LocationRecommends"]
jars: [td-spark-assembly-0.1.jar]
+loading:
_parallel: true
+load_location_recommends:
redshift>: copy/copy_location_recommends.sql
+load_item_recommends:
redshift>: copy/copy_item_recommends.sql
Deployment & Fault tolerance
HA deployment of Digdag
Digdag
server
PostgreSQL
It's just like a web application.
Digdag
client
All task state
API &
scheduler &
executor
Visual UI
HA deployment of Digdag
PostgreSQL
Stateless servers + Replicated DB
Digdag
client
API &
scheduler &
executor
PostgreSQL
All task state
Digdag
server
Digdag
server
HTTP Load
Balancer
Visual UI
HA
HA deployment of Digdag
Digdag
server
PostgreSQL
Isolating API and execution for reliability
Digdag
client
API
PostgreSQL
HA
Digdag
server
Digdag
server
Digdag
server
scheduler &

executor
HTTP Load
Balancer
All task state
$ digdag server --disable-local-agent 

--disable-executor-loop
$ digdag server --max-task-threads 100
Single-server task logs
Digdag
server
PostgreSQL
Digdag
client
HTTP Load
Balancer
Local disks
A server writes logs

to a local disk
The same server

serves the logs.
$ digdag --task-log <dir>
$ digdag log <attempt-id> -f
Centralized task log storage
Digdag
server
PostgreSQL
Digdag
client
Digdag
server
HTTP Load
Balancer
AWS S3
A server uploads logs
A server pre-signs

the download URL
log-server.type = s3
log-server.s3.bucket = my-digdag-log-bucket
log-server.s3.path = logs/
$ digdag log <attempt-id> -f
Client downloads logs

directly from S3
Sadayuki Furuhashi
https://digdag.io
Visit my website!

Weitere ähnliche Inhalte

Was ist angesagt?

「速」を落とさないコードレビュー
「速」を落とさないコードレビュー「速」を落とさないコードレビュー
「速」を落とさないコードレビューTakafumi ONAKA
 
イミュータブルデータモデルの極意
イミュータブルデータモデルの極意イミュータブルデータモデルの極意
イミュータブルデータモデルの極意Yoshitaka Kawashima
 
データサイエンティスト養成読本の解説+書き忘れたこと
データサイエンティスト養成読本の解説+書き忘れたことデータサイエンティスト養成読本の解説+書き忘れたこと
データサイエンティスト養成読本の解説+書き忘れたことTokoroten Nakayama
 
組織にテストを書く文化を根付かせる戦略と戦術
組織にテストを書く文化を根付かせる戦略と戦術組織にテストを書く文化を根付かせる戦略と戦術
組織にテストを書く文化を根付かせる戦略と戦術Takuto Wada
 
インフラエンジニアの綺麗で優しい手順書の書き方
インフラエンジニアの綺麗で優しい手順書の書き方インフラエンジニアの綺麗で優しい手順書の書き方
インフラエンジニアの綺麗で優しい手順書の書き方Shohei Koyama
 
エンジニアの個人ブランディングと技術組織
エンジニアの個人ブランディングと技術組織エンジニアの個人ブランディングと技術組織
エンジニアの個人ブランディングと技術組織Takafumi ONAKA
 
DBスキーマもバージョン管理したい!
DBスキーマもバージョン管理したい!DBスキーマもバージョン管理したい!
DBスキーマもバージョン管理したい!kwatch
 
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11Sadayuki Furuhashi
 
なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)
なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)
なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)Mikiya Okuno
 
Python 3.9からの新定番zoneinfoを使いこなそう
Python 3.9からの新定番zoneinfoを使いこなそうPython 3.9からの新定番zoneinfoを使いこなそう
Python 3.9からの新定番zoneinfoを使いこなそうRyuji Tsutsui
 
SQLアンチパターン 幻の第26章「とりあえず削除フラグ」
SQLアンチパターン 幻の第26章「とりあえず削除フラグ」SQLアンチパターン 幻の第26章「とりあえず削除フラグ」
SQLアンチパターン 幻の第26章「とりあえず削除フラグ」Takuto Wada
 
MongoDBが遅いときの切り分け方法
MongoDBが遅いときの切り分け方法MongoDBが遅いときの切り分け方法
MongoDBが遅いときの切り分け方法Tetsutaro Watanabe
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのことマルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのことAmazon Web Services Japan
 
初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか
初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか
初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのかTechon Organization
 
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)Takuto Wada
 
雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニング雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニングyoku0825
 
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方Kentaro Yoshida
 
PlaySQLAlchemy: SQLAlchemy入門
PlaySQLAlchemy: SQLAlchemy入門PlaySQLAlchemy: SQLAlchemy入門
PlaySQLAlchemy: SQLAlchemy入門泰 増田
 
工数把握のすすめ 〜WorkTimeプラグインの使い方〜
工数把握のすすめ 〜WorkTimeプラグインの使い方〜工数把握のすすめ 〜WorkTimeプラグインの使い方〜
工数把握のすすめ 〜WorkTimeプラグインの使い方〜Tomohisa Kusukawa
 

Was ist angesagt? (20)

「速」を落とさないコードレビュー
「速」を落とさないコードレビュー「速」を落とさないコードレビュー
「速」を落とさないコードレビュー
 
イミュータブルデータモデルの極意
イミュータブルデータモデルの極意イミュータブルデータモデルの極意
イミュータブルデータモデルの極意
 
データサイエンティスト養成読本の解説+書き忘れたこと
データサイエンティスト養成読本の解説+書き忘れたことデータサイエンティスト養成読本の解説+書き忘れたこと
データサイエンティスト養成読本の解説+書き忘れたこと
 
組織にテストを書く文化を根付かせる戦略と戦術
組織にテストを書く文化を根付かせる戦略と戦術組織にテストを書く文化を根付かせる戦略と戦術
組織にテストを書く文化を根付かせる戦略と戦術
 
インフラエンジニアの綺麗で優しい手順書の書き方
インフラエンジニアの綺麗で優しい手順書の書き方インフラエンジニアの綺麗で優しい手順書の書き方
インフラエンジニアの綺麗で優しい手順書の書き方
 
エンジニアの個人ブランディングと技術組織
エンジニアの個人ブランディングと技術組織エンジニアの個人ブランディングと技術組織
エンジニアの個人ブランディングと技術組織
 
DBスキーマもバージョン管理したい!
DBスキーマもバージョン管理したい!DBスキーマもバージョン管理したい!
DBスキーマもバージョン管理したい!
 
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
 
なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)
なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)
なぜ、いま リレーショナルモデルなのか(理論から学ぶデータベース実践入門読書会スペシャル)
 
Python 3.9からの新定番zoneinfoを使いこなそう
Python 3.9からの新定番zoneinfoを使いこなそうPython 3.9からの新定番zoneinfoを使いこなそう
Python 3.9からの新定番zoneinfoを使いこなそう
 
SQLアンチパターン 幻の第26章「とりあえず削除フラグ」
SQLアンチパターン 幻の第26章「とりあえず削除フラグ」SQLアンチパターン 幻の第26章「とりあえず削除フラグ」
SQLアンチパターン 幻の第26章「とりあえず削除フラグ」
 
MongoDBが遅いときの切り分け方法
MongoDBが遅いときの切り分け方法MongoDBが遅いときの切り分け方法
MongoDBが遅いときの切り分け方法
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのことマルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのこと
 
初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか
初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか
初めてのデータ分析基盤構築をまかされた、その時何を考えておくと良いのか
 
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
 
雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニング雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニング
 
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
 
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajpAt least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
 
PlaySQLAlchemy: SQLAlchemy入門
PlaySQLAlchemy: SQLAlchemy入門PlaySQLAlchemy: SQLAlchemy入門
PlaySQLAlchemy: SQLAlchemy入門
 
工数把握のすすめ 〜WorkTimeプラグインの使い方〜
工数把握のすすめ 〜WorkTimeプラグインの使い方〜工数把握のすすめ 〜WorkTimeプラグインの使い方〜
工数把握のすすめ 〜WorkTimeプラグインの使い方〜
 

Andere mochten auch

Network visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetryNetwork visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetrypphaal
 
5 g network &amp; technology
5 g network &amp; technology5 g network &amp; technology
5 g network &amp; technologyFrikha Nour
 
Using Agilio SmartNICs for OpenStack Networking Acceleration
Using Agilio SmartNICs for OpenStack Networking AccelerationUsing Agilio SmartNICs for OpenStack Networking Acceleration
Using Agilio SmartNICs for OpenStack Networking AccelerationNetronome
 
Nfv orchestration open stack summit may2015 aricent
Nfv orchestration open stack summit may2015 aricentNfv orchestration open stack summit may2015 aricent
Nfv orchestration open stack summit may2015 aricentAricent
 
大規模環境のOpenStack アップグレードの考え方と実施のコツ
大規模環境のOpenStackアップグレードの考え方と実施のコツ大規模環境のOpenStackアップグレードの考え方と実施のコツ
大規模環境のOpenStack アップグレードの考え方と実施のコツTomoya Hashimoto
 
Monitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backMonitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backIcinga
 
Treasure Data Cloud Data Platform
Treasure Data Cloud Data PlatformTreasure Data Cloud Data Platform
Treasure Data Cloud Data Platforminside-BigData.com
 
NFV : Virtual Network Function Architecture
NFV : Virtual Network Function ArchitectureNFV : Virtual Network Function Architecture
NFV : Virtual Network Function Architecturesidneel
 
【AWS初心者向けWebinar】AWSから始める動画配信
【AWS初心者向けWebinar】AWSから始める動画配信【AWS初心者向けWebinar】AWSから始める動画配信
【AWS初心者向けWebinar】AWSから始める動画配信Amazon Web Services Japan
 
Cloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper ContrailCloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper Contrailbuildacloud
 
Contrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at ScaleContrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at ScaleMarketingArrowECS_CZ
 
ビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分けビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分けRecruit Technologies
 

Andere mochten auch (18)

Network visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetryNetwork visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetry
 
5 g network &amp; technology
5 g network &amp; technology5 g network &amp; technology
5 g network &amp; technology
 
AWS Data Collection & Storage
AWS Data Collection & StorageAWS Data Collection & Storage
AWS Data Collection & Storage
 
NFV Tutorial
NFV TutorialNFV Tutorial
NFV Tutorial
 
NFV and OpenStack
NFV and OpenStackNFV and OpenStack
NFV and OpenStack
 
Using Agilio SmartNICs for OpenStack Networking Acceleration
Using Agilio SmartNICs for OpenStack Networking AccelerationUsing Agilio SmartNICs for OpenStack Networking Acceleration
Using Agilio SmartNICs for OpenStack Networking Acceleration
 
Nfv orchestration open stack summit may2015 aricent
Nfv orchestration open stack summit may2015 aricentNfv orchestration open stack summit may2015 aricent
Nfv orchestration open stack summit may2015 aricent
 
大規模環境のOpenStack アップグレードの考え方と実施のコツ
大規模環境のOpenStackアップグレードの考え方と実施のコツ大規模環境のOpenStackアップグレードの考え方と実施のコツ
大規模環境のOpenStack アップグレードの考え方と実施のコツ
 
Monitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backMonitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to back
 
Treasure Data Cloud Data Platform
Treasure Data Cloud Data PlatformTreasure Data Cloud Data Platform
Treasure Data Cloud Data Platform
 
NFV evolution towards 5G
NFV evolution towards 5GNFV evolution towards 5G
NFV evolution towards 5G
 
Design Principles for 5G
Design Principles for 5GDesign Principles for 5G
Design Principles for 5G
 
NFV : Virtual Network Function Architecture
NFV : Virtual Network Function ArchitectureNFV : Virtual Network Function Architecture
NFV : Virtual Network Function Architecture
 
【AWS初心者向けWebinar】AWSから始める動画配信
【AWS初心者向けWebinar】AWSから始める動画配信【AWS初心者向けWebinar】AWSから始める動画配信
【AWS初心者向けWebinar】AWSから始める動画配信
 
Cloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper ContrailCloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper Contrail
 
Contrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at ScaleContrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at Scale
 
170827 jtf garafana
170827 jtf garafana170827 jtf garafana
170827 jtf garafana
 
ビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分けビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分け
 

Ähnlich wie Digdagによる大規模データ処理の自動化とエラー処理

Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesSadayuki Furuhashi
 
EG Reports - Delicious Data
EG Reports - Delicious DataEG Reports - Delicious Data
EG Reports - Delicious DataBenjamin Shum
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderSadayuki Furuhashi
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Hydra - Getting Started
Hydra - Getting StartedHydra - Getting Started
Hydra - Getting Startedabramsm
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesDataWorks Summit
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performanceEngine Yard
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Building a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSBuilding a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSSmartNews, Inc.
 
Tools for Solving Performance Issues
Tools for Solving Performance IssuesTools for Solving Performance Issues
Tools for Solving Performance IssuesOdoo
 
Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann
Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann
Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann Danny Abukalam
 
Ajax Performance Tuning and Best Practices
Ajax Performance Tuning and Best PracticesAjax Performance Tuning and Best Practices
Ajax Performance Tuning and Best PracticesDoris Chen
 
Expanding your impact with programmability in the data center
Expanding your impact with programmability in the data centerExpanding your impact with programmability in the data center
Expanding your impact with programmability in the data centerCisco Canada
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disquszeeg
 
DockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopDockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopBrian Christner
 
Embulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダEmbulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダSadayuki Furuhashi
 

Ähnlich wie Digdagによる大規模データ処理の自動化とエラー処理 (20)

Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
EG Reports - Delicious Data
EG Reports - Delicious DataEG Reports - Delicious Data
EG Reports - Delicious Data
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Hydra - Getting Started
Hydra - Getting StartedHydra - Getting Started
Hydra - Getting Started
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Building a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSBuilding a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWS
 
Tools for Solving Performance Issues
Tools for Solving Performance IssuesTools for Solving Performance Issues
Tools for Solving Performance Issues
 
Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann
Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann
Matt Jarvis - Unravelling Logs: Log Processing with Logstash and Riemann
 
Ajax Performance Tuning and Best Practices
Ajax Performance Tuning and Best PracticesAjax Performance Tuning and Best Practices
Ajax Performance Tuning and Best Practices
 
Expanding your impact with programmability in the data center
Expanding your impact with programmability in the data centerExpanding your impact with programmability in the data center
Expanding your impact with programmability in the data center
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
Iac d.damyanov 4.pptx
Iac d.damyanov 4.pptxIac d.damyanov 4.pptx
Iac d.damyanov 4.pptx
 
DockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopDockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging Workshop
 
Embulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダEmbulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダ
 

Mehr von Sadayuki Furuhashi

Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Sadayuki Furuhashi
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupSadayuki Furuhashi
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
 
Plugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsSadayuki Furuhashi
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 
Fluentd - Set Up Once, Collect More
Fluentd - Set Up Once, Collect MoreFluentd - Set Up Once, Collect More
Fluentd - Set Up Once, Collect MoreSadayuki Furuhashi
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoSadayuki Furuhashi
 
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasualWhat's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasualSadayuki Furuhashi
 
How we use Fluentd in Treasure Data
How we use Fluentd in Treasure DataHow we use Fluentd in Treasure Data
How we use Fluentd in Treasure DataSadayuki Furuhashi
 
How to collect Big Data into Hadoop
How to collect Big Data into HadoopHow to collect Big Data into Hadoop
How to collect Big Data into HadoopSadayuki Furuhashi
 
Programming Tools and Techniques #369 - The MessagePack Project
Programming Tools and Techniques #369 - The MessagePack ProjectProgramming Tools and Techniques #369 - The MessagePack Project
Programming Tools and Techniques #369 - The MessagePack ProjectSadayuki Furuhashi
 

Mehr von Sadayuki Furuhashi (20)

Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
 
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
 
Making KVS 10x Scalable
Making KVS 10x ScalableMaking KVS 10x Scalable
Making KVS 10x Scalable
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes Meetup
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
 
Plugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGems
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Fluentd - Set Up Once, Collect More
Fluentd - Set Up Once, Collect MoreFluentd - Set Up Once, Collect More
Fluentd - Set Up Once, Collect More
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasualWhat's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
 
How we use Fluentd in Treasure Data
How we use Fluentd in Treasure DataHow we use Fluentd in Treasure Data
How we use Fluentd in Treasure Data
 
Fluentd meetup at Slideshare
Fluentd meetup at SlideshareFluentd meetup at Slideshare
Fluentd meetup at Slideshare
 
How to collect Big Data into Hadoop
How to collect Big Data into HadoopHow to collect Big Data into Hadoop
How to collect Big Data into Hadoop
 
Fluentd meetup
Fluentd meetupFluentd meetup
Fluentd meetup
 
upload test 1
upload test 1upload test 1
upload test 1
 
Programming Tools and Techniques #369 - The MessagePack Project
Programming Tools and Techniques #369 - The MessagePack ProjectProgramming Tools and Techniques #369 - The MessagePack Project
Programming Tools and Techniques #369 - The MessagePack Project
 

Kürzlich hochgeladen

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Kürzlich hochgeladen (20)

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Digdagによる大規模データ処理の自動化とエラー処理

  • 2. Sadayuki Furuhashi A founder of Treasure Data, Inc. located in Silicon Valley. OSS projects I founded: An open-source hacker. Github: @frsyuki
  • 3. What’s workload automation? • あらゆる手作業の自動化 > バッチデータ解析の自動化: • データロード - ETL - JOIN- 集計処理 - レポート生成 - 通知 > メール送信の自動化 • アドレス一覧の取得 - 対象の絞り込み - テンプレートから
 本文を生成 - メール送信 - 完了通知 > システム間のデータ連携の自動化 > サーバ・DB・ネットワーク機器の管理やプロビジョニング の自動化 > テスト・デプロイの自動化(CI)
  • 4. 求められる機能 • 基本機能 > タスクを依存関係順に実行 > 定期的な実行 > ファイルが作成されたら実行 > 過去分の一括実行(backfill) > 時刻などの変数を含めて実行 • エラー処理 > 失敗したら通知 > 失敗した場所から再開 • 状態監視 > 実行時間が長ければ通知 > タスクの実行時間を可視化 > 実行ログの収集と保存 • 高速化 > タスクを並列して実行 > 同時実行数の制限 • 開発支援 > ワークフローのバージョン管理 > GUIによるワークフロー開発 > 定型処理を簡単に実行できるライ ブラリ > 手元とサーバ上で同じように動く 再現性(手元で動けばサーバでも 動く) > Dockerイメージを使ってタスクを 実行
  • 5. Products OSS • Makefile • Jenkins • Luigi • Airflow • Rundeck • Azkaban • Grid Engine • OpenLava • Obsidian Scheduler • Hinemos • StackStorm • Platform LSM Proprietary • Tivoli Workload Scheduler (IBM) • CA Workload Automation
 (CA Technologies) • JP1/AJS3 (Hitachi) • Systemwalker Job Workload Server (Fujitsu) • Workload Automation (Automatic) • BatchMan (Honico) • Control-M (BMC) • Schedulix • ServiceNow Workflow
  • 6. Challenge: Multiple Cloud & Regions On-Premises Different API, Different tools, Many scripts.
  • 7. Challenge: Multiple DB technologies Amazon S3 Amazon 
 Redshift Amazon EMR
  • 8. Challenge: Multiple DB technologies Amazon S3 Amazon 
 Redshift Amazon EMR > Hi! > I'm a new technology!
  • 9. Challenge: Modern complex data analytics Ingest Application logs User attribute data Ad impressions 3rd-party cookie data Enrich Removing bot access Geo location from IP address Parsing User-Agent JOIN user attributes to event logs Model A/B Testing Funnel analysis Segmentation analysis Machine learning Load Creating indexes Data partitioning Data compression Statistics collection Utilize Recommendation API Realtime ad bidding Visualize using BI applications Ingest UtilizeEnrich Model Load
  • 10. Traditional "false" solution #!/bin/bash ./run_mysql_query.sh ./load_facebook_data.sh ./rsync_apache_logs.sh ./start_emr_cluster.sh for query in emr/*.sql; do ./run_emr_hive $query done ./shutdown_emr_cluster.sh ./run_redshift_queries.sh ./call_finish_notification.sh > Poor error handling > Write once, Nobody reads > No alerts on failure > No alerts on too long run > No retrying on errors > No resuming > No parallel execution > No distributed execution > No log collection > No visualized monitoring > No modularization > No parameterization
  • 11. Solution: Multi-Cloud Workflow Engine Solves > Poor error handling > Write once, Nobody reads > No alerts on failure > No alerts on too long run > No retrying on errors > No resuming > No parallel execution > No distributed execution > No log collection > No visualized monitoring > No modularization > No parameterization
  • 12. Example in our case 1. Dump data to BigQuery 2. load all tables to Treasure Data 3. Run queries 5. Notify on slack 4. Create reports on Tableau Server
 (on-premises)
  • 14. Key constructs Operators > Packaged knowledge to run tasks. > e.g. pg>, s3>, gcs>, emr>, td>, py>, rb> Parameters > Programmable variables for operators. > e.g. ${session_time}, ${workflow_name},
 ${JSON.parse(http.last_content)} Task groups > Sequence of tasks to organize & modularize workflows.
  • 15. Operator library _export: td: database: workflow_temp +task1: td>: queries/open.sql create_table: daily_open +task2: td>: queries/close.sql create_table: daily_close Standard libraries redshift>: runs Amazon Redshift queries emr>: create/shutdowns a cluster & runs steps s3_wait>: waits until a file is put on S3 pg>: runs PostgreSQL queries td>: runs Treasure Data queries td_for_each>: repeats task for result rows mail>: sends an email Open-source libraries You can release & use open-source operator libraries.
  • 16. Task grouping & parallel execution +load_data: _parallel: true 
 +load_users: redshift>: copy/users.sql 
 +load_items: redshift>: copy/items.sql Parallel execution Tasks under a same group run in parallel if _parallel option is set to true.
  • 17. Grouping workflows... Ingest UtilizeEnrich Model Load +task +task +task +task +task +task +task +task +task +task +task +task
  • 18. Grouping workflows Ingest UtilizeEnrich Model Load +ingest +enrich +task +task +model +basket_analysis +task +task +learn +load +task +task+tasks +task
  • 19. Parameters & Loops +send_email_to_active_users: td_for_each>: list_active.sql _do: +send: email>: tempalte.txt to: ${td.for_each.addr} Parameter A task can propagate parameters to following tasks Loop Generate subtasks dynamically so that Digdag applies the same set of operators to different data sets.
  • 20. Unite Engineering & Analytic Teams +wait_for_arrival: s3_wait>: | bucket/www_${session_date}.csv +load_table: redshift>: scripts/copy.sql Powerful for Engineers > Comfortable for advanced users Friendly for Analysts > Still straight forward for analysts to understand & leverage workflows
  • 21. Pushing workflows to a server with Docker image schedule: daily>: 01:30:00 timezone: Asia/Tokyo _export: docker: image: my_image:latest +task: sh>: ./run_in_docker Digdag server > Develop on laptop, push it to a server. > Workflows run periodically on a server. > Backfill > Web editor & monitor Docker > Install scripts & dependences in a Docker image, not on a server. > Workflows can run anywhere including developer's laptop.
  • 22. Amazon ECR Dockerfile & Operator plugin template • https://github.com/myui/dockernized-digdag-server • https://github.com/myui/digdag-plugin-example $ docker pull myui/digdag-server:latest $ docker run -p 65432:65432 myui/digdag-server open http://localhost:65432/
  • 23. Demo
  • 25. Digdag at Treasure Data 3,600 workflows run every day 28,000 tasks run every day 850 active workflows 400,000 workflow executions in total
  • 26. Example: Customer analysis & alerting timezone: UTC schedule: daily>: 09:00 _export: mail: from: 'bizops@example.com' td: database: summary +reports: td_run>: prepare_users_data +for_each_users: td_for_each>: inactive_users.sql _do: +alert_email: mail>: mail.txt subject: 'Inactive Alert: ${td.each.account_name}' to: ['${td.each.owner_email}']
  • 27. timezone: UTC schedule: daily>: 09:00 _export: mail: from: 'bizops@example.com' td: database: summary +reports: td_run>: prepare_users_data +for_each_users: td_for_each>: inactive_users.sql _do: +alert_email: mail>: mail.txt subject: 'Inactive Alert: ${td.each.account_name}' to: ['${td.each.owner_email}'] Example: Customer analysis & alerting Usage: ${td.each.percentage}% Account Name: ${td.each.account_name} Type: Purchase ${td.each.salesforce_link} Region: ${td.each.region} Owner: ${td.each.owner_name} (${td.each.owner_email}) Account: ${td.each.account_name} Status: ${td.each.activity_status} Actual: ${td.each.total_purchase} Limit: ${td.each.monthly_purchase_limit} mail.txt
  • 28. Example: Backend of a BI app timezone: <%= ev @timezone %> <% if @schedule then %> schedule: <%= ev @schedule %> <% end %> _export: td: database: <%= ev @database %> all_mode: ${
 (moment(session_time).dayOfYear() - 1) % 3 == 0 } +all_load: if>: ${all_mode == "true"} _do: +create_all_records: td>: segment_web_access.sql create_table: "cdp_tmp_web_access" _retry: 5 +rename_tmp_table: td_ddl>: rename_tables: - from: "cdp_tmp_web_access" to: "cdp_web_access" _retry: 5 +get_all_count: td>: incremental_count.sql table_name: "cdp_web_access" store_last_results: true _retry: 5 +syndicate_loop: loop>: ${Math.ceil( td.last_results.total_count / 20000 )} _do: td>: incremental_select.sql table_name: "cdp_web_access" result_connection: cdp_web_access result_settings: id: 1 _retry: 5
  • 29. Example: Moving Spark app to production _export: td: database: digdag_demo_${session_date_compact} +setup: td_ddl>: create_databases: ["${td.database}"] +ingestion: _parallel: true +items_from_access_logs: +wait_for_arrival: s3_wait>: digdag-demo-bucket/www_login_$ {session_date_compact}.csv +load_logs: td_load>: s3_import_1479918530 +facebook_ads: td_load>: facebook_ads_reporting_import_1479843958 +items_from_aurora: td_load>: mysql_import_1479918544 +enrichment: _parallel: 5 +ip_location_to_user: # ip_location, user td>: queries/ip_location_to_user.sql create_table: ip_location_to_user +item_to_click_count: # item, click_count td>: queries/item_to_click_count.sql create_table: item_to_click_count +item_to_item_count: # item_1, item_2, count td>: queries/item_to_item_count.sql create_table: item_to_item_count +modeling: emr>: cluster: j-OD82XANWFYQ8 staging: s3://digdag-demo-data/emr/staging/ steps: - type: spark application: spark/target/scala-2.11/simple-td- spark-project_2.11-1.0.jar submit_options: ["--class", "ItemRecommends"] jars: [td-spark-assembly-0.1.jar] - type: spark application: spark/target/scala-2.11/simple-td- spark-project_2.11-1.0.jar submit_options: ["--class", "LocationRecommends"] jars: [td-spark-assembly-0.1.jar] +loading: _parallel: true +load_location_recommends: redshift>: copy/copy_location_recommends.sql +load_item_recommends: redshift>: copy/copy_item_recommends.sql
  • 30. Deployment & Fault tolerance
  • 31. HA deployment of Digdag Digdag server PostgreSQL It's just like a web application. Digdag client All task state API & scheduler & executor Visual UI
  • 32. HA deployment of Digdag PostgreSQL Stateless servers + Replicated DB Digdag client API & scheduler & executor PostgreSQL All task state Digdag server Digdag server HTTP Load Balancer Visual UI HA
  • 33. HA deployment of Digdag Digdag server PostgreSQL Isolating API and execution for reliability Digdag client API PostgreSQL HA Digdag server Digdag server Digdag server scheduler &
 executor HTTP Load Balancer All task state $ digdag server --disable-local-agent 
 --disable-executor-loop $ digdag server --max-task-threads 100
  • 34. Single-server task logs Digdag server PostgreSQL Digdag client HTTP Load Balancer Local disks A server writes logs
 to a local disk The same server
 serves the logs. $ digdag --task-log <dir> $ digdag log <attempt-id> -f
  • 35. Centralized task log storage Digdag server PostgreSQL Digdag client Digdag server HTTP Load Balancer AWS S3 A server uploads logs A server pre-signs
 the download URL log-server.type = s3 log-server.s3.bucket = my-digdag-log-bucket log-server.s3.path = logs/ $ digdag log <attempt-id> -f Client downloads logs
 directly from S3