Why Does Digdag Use YAML?


Sadayuki Furuhashi

Our YAML goes beyond YAML.
