JRuby with Java Code
in Data Processing World
JRubyConf.EU at 31 Jul 2015
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori (@tagomoris)
Fluentd, Norikra, MessagePack-Ruby,...
Docker logging driver for Fluentd (docker v1.8)
Treasure Data, Inc.
https://jobs.lever.co/treasure-data
We're hiring!
OSS team (developer / community manager)
Distributed system engineer (Hadoop, queue/workers)
Front-end engineer (RoR)
Data Processing World
Java
Data Processing World
Hadoop, Spark, Tez, Flink, Storm, Kafka, ...
Hive, Pig, Drill, Impala, Presto, ....
Java + Scala, Clojure + C++, ....
Data Processing World
on JVM
Data Processing World
Many CPU cores, Large memory, High rate Disk I/O, ...
High throughput data processing
Hadoop YARN/MapReduce/HDFS API compatibility
Two OSS projects using Java & JRuby
Norikra:
Stream Processing with SQL for everybody
Server software, written in JRuby, runs on JVM
Open source software (GPLv2)
http://norikra.github.io/
https://github.com/norikra/norikra
Distributed on rubygems.org
"gem i norikra"
What Norikra does:
SELECT path, SUM(bytes) AS s
FROM www_access_logs.win:length_batch(10)
WHERE status=200
GROUP BY path ORDER BY s DESC
Event 1:
{"path":"/", "status":200, "bytes":301, "duration":0.03, "referer":"...", "user-agent":"...."}
path:"/", s:301
Event 2:
{"path":"/download/a", "status":200, "bytes":10240, "duration":0.53, "referer":"...", "user-agent":"...."}
path:"/", s:301
path:"/download/a", s:10240
Event 3 (status 404, filtered out by WHERE):
{"path":"/", "status":404, "bytes":0, "duration":0.08, "referer":"...", "user-agent":"...."}
path:"/", s:301
path:"/download/a", s:10240
Event 4:
{"path":"/", "status":200, "bytes":301, "duration":0.01, "referer":"...", "user-agent":"...."}
path:"/", s:602
path:"/download/a", s:10240
Event 5:
{"path":"/download/b", "status":200, "bytes":678, "duration":0.11, "referer":"...", "user-agent":"...."}
path:"/", s:602
path:"/download/a", s:10240
path:"/download/b", s:678
Event 6:
{"path":"/download/b", "status":200, "bytes":678, "duration":0.13, "referer":"...", "user-agent":"...."}
path:"/", s:602
path:"/download/a", s:10240
path:"/download/b", s:1356
Event 7:
{"path":"/", "status":200, "bytes":301, "duration":0.02, "referer":"...", "user-agent":"...."}
path:"/", s:903
path:"/download/a", s:10240
path:"/download/b", s:1356
Event 8:
{"path":"/", "status":200, "bytes":301, "duration":0.09, "referer":"...", "user-agent":"...."}
path:"/", s:1204
path:"/download/a", s:10240
path:"/download/b", s:1356
Event 9:
{"path":"/download/a", "status":200, "bytes":10240, "duration":1.1, "referer":"...", "user-agent":"...."}
path:"/", s:1204
path:"/download/a", s:20480
path:"/download/b", s:1356
Event 10:
{"path":"/", "status":200, "bytes":301, "duration":0.05, "referer":"...", "user-agent":"...."}
path:"/", s:1505
path:"/download/a", s:20480
path:"/download/b", s:1356
Output after 10 events:
{"path":"/download/a", "s":20480}
{"path":"/", "s":1505}
{"path":"/download/b", "s":1356}
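The walkthrough above can be approximated in plain Ruby. This is only a sketch of what the query computes, not how Norikra or Esper implement it: the helper name `length_batch_sum` is hypothetical, the events carry only the fields the query touches, and the bytes values follow the running totals shown above.

```ruby
require 'json'

# Events from the walkthrough, reduced to the fields used by the query.
EVENTS = [
  { 'path' => '/',           'status' => 200, 'bytes' => 301 },
  { 'path' => '/download/a', 'status' => 200, 'bytes' => 10240 },
  { 'path' => '/',           'status' => 404, 'bytes' => 0 },
  { 'path' => '/',           'status' => 200, 'bytes' => 301 },
  { 'path' => '/download/b', 'status' => 200, 'bytes' => 678 },
  { 'path' => '/download/b', 'status' => 200, 'bytes' => 678 },
  { 'path' => '/',           'status' => 200, 'bytes' => 301 },
  { 'path' => '/',           'status' => 200, 'bytes' => 301 },
  { 'path' => '/download/a', 'status' => 200, 'bytes' => 10240 },
  { 'path' => '/',           'status' => 200, 'bytes' => 301 }
].freeze

# Hypothetical helper (not part of Norikra or Esper): emulates
# win:length_batch(n) by emitting one result set per full batch of n events.
def length_batch_sum(events, batch_size)
  events.each_slice(batch_size)
        .select { |batch| batch.size == batch_size }   # only full windows emit
        .map do |batch|
          batch.select { |e| e['status'] == 200 }      # WHERE status=200
               .group_by { |e| e['path'] }             # GROUP BY path
               .map { |path, es| { 'path' => path, 's' => es.sum { |e| e['bytes'] } } }
               .sort_by { |row| -row['s'] }            # ORDER BY s DESC
        end
end

results = length_batch_sum(EVENTS, 10)
puts results.first.map(&:to_json)
```

Nothing is emitted until the tenth event arrives, which mirrors why the slides show output only after the counter reaches 10.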
Norikra and Java
Norikra is written in JRuby and uses Esper
Key factor: productivity (33 days until first release)
Esper: a Java library that provides Complex Event Processing
SQL parser, executor
Many features and good performance
Licensed under GPLv2
Plugins as rubygems
[Diagram: Norikra Server (on JVM). The Norikra Engine contains Esper (Query Engine), the Type Definition Manager, and the Output Event Pool. The RPC Server runs on mizuno (Jetty + Rack) with a Rack RPC Handler. Listener and UDF plugins attach to the engine.]
UDF
User-Defined Functions
"gem i norikra-udf-xxx"
written in Java, or in JRuby (compiled to Java)
runs inside the Esper instance, so it must be a Java class
Listener
handler for output data of queries, written in JRuby
"gem i norikra-listener-xxx"
Embulk
"Embulk is a open-source bulk data loader
that helps data transfer between various
databases, storages, file formats, and
cloud services."
http://www.embulk.org/docs/
Embulk:
makes painful data integration work relaxed
Plugin-based parallel bulk data loader
Open source software (Apache License v2.0)
http://www.embulk.org/
https://github.com/embulk/embulk
Distributed as .jar or on rubygems.org
Plugins are on rubygems.org
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
http://www.slideshare.net/HiroshiNakamura/embulk-20150411
[Diagram: Embulk bulk-loads data through input and output plugins between HDFS, MySQL, Amazon S3, CSV files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, and Redis]
✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behavior
✓ Idempotent retrying
[Diagram: an InputPlugin passes records through Filter plugins to an OutputPlugin; an Executor plugin (Threads, MapReduce) runs the pipeline; config is passed to the plugins]
[Diagram: the InputPlugin side is split into a FileInput plugin (HDFS, S3, Riak CS, …), a Decoder plugin (gzip, bzip2, aes, …), and a Parser plugin (CSV, JSON, pcap, …); the OutputPlugin side mirrors it with Formatter, Encoder, and FileOutput plugins. Buffers pass between these stages, records flow through Filter plugins, and an Executor plugin runs the pipeline from config.]
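The split of file-based input into file input, decoder, and parser stages can be illustrated with the Ruby standard library. This is a hedged sketch only: the three lambdas are hypothetical stand-ins for the stages in the diagram, not Embulk's plugin classes, and the gzip and JSON-lines formats are chosen purely for illustration.

```ruby
require 'zlib'
require 'json'
require 'stringio'

# Hypothetical stand-ins for the three stages (not Embulk's real API).
file_input = ->(bytes) { StringIO.new(bytes) }                        # where the bytes come from
decoder    = ->(io)    { Zlib::GzipReader.new(io) }                   # bytes -> decoded bytes
parser     = ->(io)    { io.read.lines.map { |l| JSON.parse(l) } }    # decoded bytes -> records

# Sample "file": two JSON lines, gzip-compressed in memory.
raw = Zlib.gzip(<<~LINES)
  {"path":"/","bytes":301}
  {"path":"/download/a","bytes":10240}
LINES

# Each stage only knows about the stage next to it, so any one of them
# can be swapped (for example, a different decoder) without touching the others.
records = parser.call(decoder.call(file_input.call(raw)))
puts records.inspect
```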
Embulk and Java
Embulk core is written in Java
mainly for performance
Embulk plugins:
are loaded through a JRuby-based API
are written in JRuby or Java
JRuby for early release
Java for performance
InputPlugin
module Embulk
  class InputExample < InputPlugin
    Plugin.register_input('example', self)

    def self.transaction(config, &control)
      # read config
      task = {
        'message' =>
          config.param('message', :string, default: nil)
      }
      threads = config.param('threads', :int, default: 2)
      columns = [
        Column.new(0, 'col0', :long),
        Column.new(1, 'col1', :double),
        Column.new(2, 'col2', :string),
      ]
      # BEGIN here
      commit_reports = yield(task, columns, threads)
      # COMMIT here
      puts "Example input finished"
      return {}
    end

    def run(task, schema, index, page_builder)
      puts "Example input thread #{@index}…"
      10.times do |i|
        @page_builder.add([i, 10.0, "example"])
      end
      @page_builder.finish
      commit_report = { }
      return commit_report
    end
  end
end
OutputPlugin
module Embulk
  class OutputExample < OutputPlugin
    Plugin.register_output('example', self)

    def self.transaction(
        config, schema,
        processor_count, &control)
      # read config
      task = {
        'message' =>
          config.param('message', :string, default: "record")
      }
      puts "Example output started."
      commit_reports = yield(task)
      puts "Example output finished. Commit reports = #{commit_reports.to_json}"
      return {}
    end

    def initialize(task, schema, index)
      puts "Example output thread #{index}..."
      super
      @message = task.prop('message', :string)
      @records = 0
    end

    def add(page)
      page.each do |record|
        hash = Hash[schema.names.zip(record)]
        puts "#{@message}: #{hash.to_json}"
        @records += 1
      end
    end

    def finish
    end

    def abort
    end

    def commit
      commit_report = {
        "records" => @records
      }
      return commit_report
    end
  end
end
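The transaction/yield contract used by the plugin examples above can be tried without Embulk. The following is a framework-free sketch under stated assumptions: `MiniOutput` and `run_output` are hypothetical names, the runner plays the role the framework would play, and none of this is Embulk's actual implementation.

```ruby
# Hypothetical plugin (not an Embulk class): same shape as the example above.
class MiniOutput
  def self.transaction(config, processor_count)
    task = { 'message' => config.fetch('message', 'record') }
    # BEGIN: everything before yield runs once, before any task starts.
    commit_reports = yield(task)
    # COMMIT: everything after yield runs once, after all tasks finished.
    { 'total' => commit_reports.sum { |report| report['records'] } }
  end

  def initialize(task, index)
    @message = task['message']
    @index = index
    @records = 0
  end

  # Called with one page (an array of records) at a time.
  def add(page)
    page.each { |_record| @records += 1 }
  end

  # Each task reports what it did; the reports are collected by the runner.
  def commit
    { 'records' => @records }
  end
end

# Hypothetical runner standing in for the framework: one thread per page,
# and the block's return value becomes the result of yield in transaction.
def run_output(plugin_class, config, pages)
  plugin_class.transaction(config, pages.size) do |task|
    threads = pages.each_with_index.map do |page, index|
      Thread.new do
        plugin = plugin_class.new(task, index)
        plugin.add(page)
        plugin.commit
      end
    end
    threads.map(&:value) # wait for every task and gather commit reports
  end
end

pages = [[[0, 10.0, 'a'], [1, 10.0, 'b']], [[2, 10.0, 'c']]]
result = run_output(MiniOutput, { 'message' => 'row' }, pages)
puts result.inspect
```

The point of the pattern is that the plugin author writes setup and finalization around a single yield, while the framework decides how many tasks run and where.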
Plugin management: Norikra
[Diagram: the Engine hosts the Esper instance; plugin management loads UDF and Listener plugins; components are labeled Java or JRuby]
plugins as gems
plugin loader written in JRuby
Plugin management: Embulk
[Diagram: Embulk core with plugin management for input/output/filter, parser/formatter, decoder/encoder, file-input/output, and executor plugins; components are labeled Java or JRuby]
plugins as gems
plugin loader written in JRuby
Pluggable software
on JVM & Java API
Java? Scala? Clojure? JRuby?: JRuby
Plugin packaging: jar? gem?: gem
rubygems.org >>> maven central (or others)
especially for plugin authors
Plugin loader: Class Loader? "require"?: require
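A "require"-based plugin loader can be sketched in a few lines of plain Ruby. This is an illustration of the general pattern only: `MiniPlugin`, its registry, and the `mini_plugin/input/<name>.rb` naming convention are all hypothetical, and a temporary directory on the load path stands in for an installed gem.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical registry (not Norikra's or Embulk's loader).
module MiniPlugin
  REGISTRY = {}

  # Plugins call this from their class body when their file is loaded.
  def self.register_input(name, klass)
    REGISTRY[name] = klass
  end

  # Resolve a plugin by name: require its file by convention, then look it up.
  def self.input(name)
    require "mini_plugin/input/#{name}" unless REGISTRY.key?(name)
    REGISTRY.fetch(name)
  end
end

# Simulate an installed plugin gem: a file on the load path that follows
# the assumed naming convention.
dir = Dir.mktmpdir
FileUtils.mkdir_p(File.join(dir, 'mini_plugin', 'input'))
File.write(File.join(dir, 'mini_plugin', 'input', 'example.rb'), <<~RUBY)
  class ExampleInput
    MiniPlugin.register_input('example', self)

    def run
      [[0, 'example']]
    end
  end
RUBY
$LOAD_PATH.unshift(dir)

plugin_class = MiniPlugin.input('example')
rows = plugin_class.new.run
puts "#{plugin_class.name}: #{rows.inspect}"
```

With real gems, RubyGems puts each gem's lib directory on the load path, so the same `require` call would find a plugin after it is installed.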
JRuby in Japan
Not so many users :(
CRuby is hugely popular in Japan
Java -> Ruby -> Scala? Golang?
Make your software pluggable.
Build an ecosystem and community.
with JRuby!
Thanks!

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

JRuby with Java Code in Data Processing World

  • 1. JRuby with Java Code in Data Processing World JRubyConf.EU at 31 Jul 2015 Satoshi Tagomori (@tagomoris)
  • 2. Satoshi "Moris" Tagomori (@tagomoris) Fluentd, Norikra, MessagePack-Ruby,... Docker logging driver for Fluentd (docker v1.8) Treasure Data, Inc.
  • 3. https://jobs.lever.co/treasure-data We're hiring! OSS team (developer / community manager) Distributed system engineer (Hadoop, queue/workers) Front-end engineer (RoR)
  • 7. Data Processing World Hadoop, Spark, Tez, Flink, Storm, Kafka, ... Hive, Pig, Drill, Impala, Presto, ....
  • 8. Java + Scala, Clojure + C++, .... Data Processing World on JVM
  • 9. Data Processing World Many CPU cores, Large memory, High rate Disk I/O, ... High throughput data processing Hadoop YARN/MapReduce/HDFS API compatibility
  • 10. Two OSS using Java&JRuby
  • 11. Norikra: Stream Processing with SQL for everybody Server software, written in JRuby, runs on JVM Open source software (GPLv2) http://norikra.github.io/ https://github.com/norikra/norikra Distributed on rubygems.org "gem i norikra"
  • 12. What Norikra does: SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC
  • 13. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/", "status":200, "bytes":301, "duration":0.03, "referer":"...", "user-agent":"...." path:"/", s:301 1
  • 14. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/download/a", "status":200, "bytes":10240, "duration":0.53, "referer":"...", "user-agent":"...." path:"/", s:301 path:"/download/a", s:10240 2
  • 15. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/", "status":404, "bytes":0, "duration":0.08, "referer":"...", "user-agent":"...." path:"/", s:301 path:"/download/a", s:10240 3
  • 16. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/", "status":200, "bytes":301, "duration":0.01, "referer":"...", "user-agent":"...." path:"/", s:602 path:"/download/a", s:10240 4
  • 17. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/download/b", "status":200, "bytes":678, "duration":0.11, "referer":"...", "user-agent":"...." path:"/", s:602 path:"/download/a", s:10240 path:"/download/b", s:678 5
  • 18. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/download/b", "status":200, "bytes":678, "duration":0.13, "referer":"...", "user-agent":"...." path:"/", s:602 path:"/download/a", s:10240 path:"/download/b", s:1356 6
  • 19. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/", "status":200, "bytes":301, "duration":0.02, "referer":"...", "user-agent":"...." path:"/", s:903 path:"/download/a", s:10240 path:"/download/b", s:1356 7
  • 20. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/", "status":200, "bytes":301, "duration":0.09, "referer":"...", "user-agent":"...." path:"/", s:1204 path:"/download/a", s:10240 path:"/download/b", s:1356 8
  • 21. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/download/a", "status":200, "bytes":10240, "duration":1.1, "referer":"...", "user-agent":"...." path:"/", s:1204 path:"/download/a", s:20480 path:"/download/b", s:1356 9
  • 22. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC {"path":"/", "status":200, "bytes":301, "duration":0.05, "referer":"...", "user-agent":"...." path:"/", s:1505 path:"/download/a", s:20480 path:"/download/b", s:1356 10
  • 23. SELECT path, SUM(bytes) AS s FROM www_access_logs.win:length_batch(10) WHERE status=200 GROUP BY path ORDER BY s DESC 10 {"path":"/download/a", "s":20480} {"path":"/", "s":1505} {"path":"/download/b", "s":1356}
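The walk-through on slides 12 to 23 can be imitated in plain Ruby. This is only a sketch of the batching and grouping idea, not Norikra or Esper code: `run_batches`, `BATCH_SIZE`, and the event list are names made up for illustration. The events copy the path, status, and bytes fields from the slides, with the first event given 301 bytes so that the totals match the sums shown. As on the slides, the event with status 404 still counts toward the batch of 10 and is only excluded from the sums.

```ruby
# Plain-Ruby imitation of the query on the slides (hypothetical helper names,
# not the Norikra/Esper API).
BATCH_SIZE = 10

EVENTS = [
  { "path" => "/",           "status" => 200, "bytes" => 301 },
  { "path" => "/download/a", "status" => 200, "bytes" => 10240 },
  { "path" => "/",           "status" => 404, "bytes" => 0 },
  { "path" => "/",           "status" => 200, "bytes" => 301 },
  { "path" => "/download/b", "status" => 200, "bytes" => 678 },
  { "path" => "/download/b", "status" => 200, "bytes" => 678 },
  { "path" => "/",           "status" => 200, "bytes" => 301 },
  { "path" => "/",           "status" => 200, "bytes" => 301 },
  { "path" => "/download/a", "status" => 200, "bytes" => 10240 },
  { "path" => "/",           "status" => 200, "bytes" => 301 },
].freeze

# Cut the stream into full batches, then filter, group, sum, and sort each one.
def run_batches(events, batch_size)
  full_batches = events.each_slice(batch_size).select { |b| b.size == batch_size }
  full_batches.map do |batch|
    sums = Hash.new(0)
    batch.each do |event|
      next unless event["status"] == 200          # WHERE status=200
      sums[event["path"]] += event["bytes"]       # SUM(bytes) ... GROUP BY path
    end
    rows = sums.map { |path, s| { "path" => path, "s" => s } }
    rows.sort_by { |row| -row["s"] }              # ORDER BY s DESC
  end
end

RESULTS = run_batches(EVENTS, BATCH_SIZE)
RESULTS.each { |rows| rows.each { |row| puts row } }
```

With these ten events there is exactly one full batch, so one result set is emitted, in the same order as the output on slide 23.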
  • 24. Norikra and Java: Norikra is written in JRuby and uses Esper. Key factor: productivity (33 days until the first release). Esper: a Java library that provides Complex Event Processing; SQL parser and executor; many features and good performance; licensed under GPLv2.
  • 25. Plugins as rubygems. [Diagram: Norikra Server (on JVM): Esper (Query Engine), Type Definition Manager, Output Event Pool, Norikra Engine, RPC Server on mizuno (Jetty + Rack), Rack RPC Handler, Listener, UDF.] UDF (User-Defined Functions): "gem i norikra-udf-xxx"; written in Java, or JRuby (compiled to Java); works in the Esper instance, so it must be a Java class. Listener: handler for output data of queries, written in JRuby; "gem i norikra-listener-xxx".
  • 26. Embulk "Embulk is a open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services." http://www.embulk.org/docs/
  • 27. Embulk: makes painful data integration work relaxed Plugin-based parallel bulk data loader Open source software (Apache License v2.0) http://www.embulk.org/ https://github.com/embulk/embulk Distributed as .jar or on rubygems.org Plugins are on rubygems.org http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed http://www.slideshare.net/HiroshiNakamura/embulk-20150411
  • 28. [Diagram: Embulk bulk-loads data through plugins on both sides; systems shown: HDFS, MySQL, Amazon S3, CSV Files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, Redis.] ✓ Parallel execution ✓ Data validation ✓ Error recovery ✓ Deterministic behavior ✓ Idempotent retrying
  • 29. #ccc_cd4 / #embulk [Diagram: config; InputPlugin (input, …) → records → Filter plugins (convert, …) → records → OutputPlugin (output); Executor plugin (Threads, MapReduce).]
  • 30. #ccc_cd4 / #embulk [Diagram: config; InputPlugin = FileInput plugin → buffer → Decoder plugin → buffer → Parser plugin; records → Filter plugins → records; OutputPlugin = Formatter plugin → buffer → Encoder plugin → buffer → FileOutput plugin; Executor plugin. Examples: FileInput/FileOutput: HDFS, S3, Riak CS, …; Decoder/Encoder: gzip, bzip2, aes, …; Parser/Formatter: CSV, JSON, pcap, …]
  • 31. Embulk and Java: Embulk core is written in Java, mainly for performance. Embulk plugins are loaded over an API based on JRuby and are written in JRuby or Java: JRuby for early release, Java for performance.
  • 32. InputPlugin
        module Embulk
          class InputExample < InputPlugin
            Plugin.register_input('example', self)

            def self.transaction(config, &control)
              # read config
              task = {
                'message' => config.param('message', :string, default: nil)
              }
              threads = config.param('threads', :int, default: 2)
              columns = [
                Column.new(0, 'col0', :long),
                Column.new(1, 'col1', :double),
                Column.new(2, 'col2', :string),
              ]
              # BEGIN here
              commit_reports = yield(task, columns, threads)
              # COMMIT here
              puts "Example input finished"
              return {}
            end

            def initialize(task, schema, index, page_builder)
              super  # the base class stores index and page_builder as @index and @page_builder
            end

            def run
              puts "Example input thread #{@index}…"
              10.times do |i|
                @page_builder.add([i, 10.0, "example"])
              end
              @page_builder.finish
              commit_report = { }
              return commit_report
            end
          end
        end
  • 33. OutputPlugin
        module Embulk
          class OutputExample < OutputPlugin
            Plugin.register_output('example', self)

            def self.transaction(config, schema, processor_count, &control)
              # read config
              task = {
                'message' => config.param('message', :string, default: "record")
              }
              puts "Example output started."
              commit_reports = yield(task)
              puts "Example output finished. Commit reports = #{commit_reports.to_json}"
              return {}
            end

            def initialize(task, schema, index)
              puts "Example output thread #{index}..."
              super
              @message = task.prop('message', :string)
              @records = 0
            end

            def add(page)
              page.each do |record|
                hash = Hash[schema.names.zip(record)]
                puts "#{@message}: #{hash.to_json}"
                @records += 1
              end
            end

            def finish
            end

            def abort
            end

            def commit
              commit_report = { "records" => @records }
              return commit_report
            end
          end
        end
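The "BEGIN here" and "COMMIT here" comments around `yield` in the `transaction` methods are easier to follow with a toy executor. The sketch below is a guess at the general shape of that control flow, written in plain Ruby with no Embulk dependency: `ToyInput`, `execute`, and the config keys are hypothetical stand-ins, and Embulk's real executor is not implemented this way line for line.

```ruby
# Toy stand-in for a plugin plus executor (hypothetical names, not Embulk's API).
class ToyInput
  def self.transaction(config)
    task = { "message" => config.fetch("message", "record") }
    threads = config.fetch("threads", 2)
    # BEGIN: code before yield runs once, before any task starts
    commit_reports = yield(task, threads)
    # COMMIT: code after yield runs once, after every task has finished
    { "total" => commit_reports.sum { |report| report["records"] } }
  end

  def initialize(task, index)
    @task = task
    @index = index
  end

  # One task: produce a few rows and report how many were produced.
  def run
    rows = 3.times.map { |i| "#{@task['message']}-#{@index}-#{i}" }
    { "records" => rows.size }
  end
end

# The block passed to transaction plays the role of the executor: it starts
# one thread per task and collects each task's commit report.
def execute(plugin_class, config)
  plugin_class.transaction(config) do |task, count|
    workers = count.times.map { |i| Thread.new { plugin_class.new(task, i).run } }
    workers.map(&:value)
  end
end

SUMMARY = execute(ToyInput, { "message" => "example", "threads" => 4 })
puts SUMMARY
```

The point of the design is that the plugin author writes the setup and the commit step in one method, while the framework decides how the tasks in between are scheduled.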
  • 34. Plugin management: Norikra [Diagram, with components marked Java or JRuby: Esper instance, Engine, plugin management, UDF, Listener.] Plugins as gems; plugin loader written in JRuby.
  • 35. Plugin management: Embulk [Diagram, with components marked Java or JRuby: Embulk core, plugin management, and the plugin types input/output/filter, parser/formatter, decoder/encoder, file-input/output, executor.] Plugins as gems; plugin loader written in JRuby.
  • 36. Pluggable software on JVM & Java API. Language: Java? Scala? Clojure? JRuby? → JRuby. Plugin packaging: jar? gem? → gem (rubygems.org >>> Maven Central (or others), especially for plugin authors). Plugin loader: Class Loader? "require"? → require.
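A loader built on `require` can be very small. The following is a minimal sketch under stated assumptions, not the loader of Norikra or Embulk: the `MiniPlugin` module, its `register` and `lookup` methods, and the `miniplugin/<type>/<name>` path convention are all invented for illustration. A temporary directory on `$LOAD_PATH` stands in for an installed gem.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical plugin registry; plugin files register themselves when required.
module MiniPlugin
  REGISTRY = {}

  def self.register(type, name, klass)
    REGISTRY[[type, name]] = klass
  end

  # Find a plugin by naming convention: require "miniplugin/<type>/<name>".
  def self.lookup(type, name)
    key = [type, name]
    require "miniplugin/#{type}/#{name}" unless REGISTRY.key?(key)
    REGISTRY.fetch(key)
  end
end

# Stand-in for an installed gem: a plugin file in a temp dir on $LOAD_PATH.
PLUGIN_DIR = Dir.mktmpdir
at_exit { FileUtils.remove_entry(PLUGIN_DIR) }
plugin_path = File.join(PLUGIN_DIR, "miniplugin", "output")
FileUtils.mkdir_p(plugin_path)
File.write(File.join(plugin_path, "stdout.rb"), <<~RUBY)
  class StdoutOutput
    def add(record)
      "out: \#{record}"
    end
  end
  MiniPlugin.register(:output, "stdout", StdoutOutput)
RUBY
$LOAD_PATH.unshift(PLUGIN_DIR)

PLUGIN = MiniPlugin.lookup(:output, "stdout").new
puts PLUGIN.add("hello")
```

Because RubyGems puts the `lib` directories of installed gems on the load path, a plugin author following such a convention only has to publish a gem containing one file at the expected path.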
  • 37. JRuby in Japan: not so many users :( CRuby is hugely popular in Japan. Java -> Ruby -> Scala? Golang?
  • 38. Make your software pluggable. Build an ecosystem and community. With JRuby! Thanks!