Making an application horizontally scalable in 30 minutes. This presentation describes how a linear processing application (a mail merge) can be converted into a horizontally scalable one using Redis, and gives some context on why a multi-process approach is preferable to a multi-threaded approach.
2. Target
‣ Create a Zip file of PDFs based on a CSV data file
‣ Linear version
‣ Making it scale with Redis
(diagram: parse csv → create pdf, create pdf, ..., create pdf → zip)
4. Simple Templating with String Interpolation
‣ Merge data into HTML
invoice.html:
<div class="title">
  INVOICE #{invoice_nr}
</div>
<div class="address">
  #{name}<br/>
  #{street}<br/>
  #{zip} #{city}<br/>
</div>
• template = File.new('invoice.html').read
• html = eval("<<QQQ\n#{template}\nQQQ")
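A runnable sketch of the heredoc-eval trick on this slide: the template file's content is wrapped in a heredoc and eval'ed, so Ruby's normal `#{}` interpolation fills in the data. The template string and data values below are made up for illustration. (Note that eval'ing a template executes arbitrary code, which is fine for a demo but risky with untrusted templates.)

```ruby
# Render a template string by eval'ing it inside a heredoc; the
# keyword arguments become local variables visible to #{} at eval time.
def render(template, invoice_nr:, name:, street:, zip:, city:)
  eval("<<TEMPLATE\n#{template}\nTEMPLATE")
end

# Single quotes keep the #{} markers literal until eval runs.
html = render('<div class="title">INVOICE #{invoice_nr} for #{name}</div>',
              invoice_nr: 42, name: "Pascal",
              street: "Main St 1", zip: "9820", city: "Merelbeke")
html.strip # => '<div class="title">INVOICE 42 for Pascal</div>'
```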
5. Step 1: linear
‣ Create PDF
• Prince XML via the princely gem
• http://www.princexml.com
  p = Princely.new
  p.add_style_sheets('invoice.css')
  p.pdf_from_string(html)
6. Step 1: linear
‣ Create ZIP
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files.each do |file, content|
      zos.put_next_entry(file)
      zos.puts content
    end
  end
7. Full Code
require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, '.csv')

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h
9. Step 2: from linear ...
(diagram: the linear pipeline: parse csv → create pdf, create pdf, ..., create pdf → zip)
10. Step 2: ...to parallel
(diagram: parse csv → create pdf workers running in parallel → zip, with "Threads?" as the candidate mechanism)
11. Multi Threaded
‣ Advantage
• Lightweight (minimal overhead)
‣ Challenges (or why it is hard)
• Hard to code: most data structures are not thread safe by default; they need synchronized access
• Hard to test: different execution paths, timings
• Hard to maintain
‣ Limitation
• single machine: not a solution for horizontal scalability beyond the multi-core CPU
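The "synchronized access" point above can be made concrete with a small sketch (not from the talk): many threads incrementing a shared counter through a Mutex. Without the lock, the read-modify-write can interleave and drop increments on runtimes with true thread parallelism (JRuby, TruffleRuby); with it, the result is deterministic.

```ruby
require 'thread'

counts = Hash.new(0)   # shared state: pretend it tallies finished PDFs
lock   = Mutex.new

threads = 8.times.map do
  Thread.new do
    1_000.times do
      # Without the Mutex this += is a racy read-modify-write.
      lock.synchronize { counts[:pdf] += 1 }
    end
  end
end
threads.each(&:join)

counts[:pdf] # => 8000
```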
12. Step 2: ...to parallel
(diagram: parse csv → ? → create pdf workers running in parallel → zip; the "?" is the distribution mechanism we still need)
13. Multi Process
• scales across machines
• advanced support for debugging and monitoring at the OS level
• simpler (code, testing, debugging, ...)
• slightly more overhead
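A minimal sketch of the multi-process idea using plain `fork` and a pipe (Unix only, and not from the talk): the OS isolates the workers, and the parent collects their results. The talk uses Redis instead of a pipe precisely so the workers can live on other machines.

```ruby
# Fork worker processes and collect one result line from each
# through a pipe; the "pdf" work here is simulated.
reader, writer = IO.pipe

pids = 3.times.map do |i|
  fork do
    reader.close                 # child only writes
    writer.puts "pdf-#{i} done"  # pretend create_pdf result
    writer.close
  end
end

writer.close                     # parent only reads
results = reader.read.split("\n")
pids.each { |pid| Process.wait(pid) }

results.sort # => ["pdf-0 done", "pdf-1 done", "pdf-2 done"]
```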
BUT ... all this assumes shared state across processes.
(diagram: the parse csv, create pdf, and zip processes all read and write shared state; candidates: MemCached? SQL? the File System? Terracotta? ... OR ...)
15. Hello Redis
‣ Shared Memory Key Value Store with
High Level Data Structure support
• String (String, Int, Float)
• Hash (Map, Dictionary)
• List (Queue)
• Set
• ZSet (ordered by member or score)
16. About Redis
• Single threaded: 1 thread to serve them all
• Everything (must fit) in memory
• "Transactions" (MULTI/EXEC)
• Expiring keys
• Lua scripting
• Publisher-Subscriber
• Auto create and destroy of keys
• Pipelining
• But ... full clustering (master-master) is not available (yet)
17. Hello Redis
‣ redis-cli
• set name "pascal" = "pascal"
• incr counter = 1
• incr counter = 2
• hset pascal name "pascal"
• hset pascal address "merelbeke"
• sadd persons pascal
• smembers persons = [pascal]
• keys *
• type pascal = hash
• lpush todo "read" = 1
• lpush todo "eat" = 2
• lpop todo = "eat"
• rpoplpush todo done = "read"
• lrange done 0 -1 = ["read"]
19. Spread the Work
(diagram: the parse csv process pushes onto a "Queue with data" and increments a counter; multiple create pdf processes consume the queue; the zip process waits)
20. Ruby on Redis
‣ Put the PDF-creation input data on a queue and do the counter bookkeeping

docs.each do |doc|
  data = YAML::dump(doc)
  r.lpush 'pdf:queue', data
  r.incr 'ctr' # bookkeeping
end
22. Ruby on Redis
‣ Read PDF input data from the queue, do the counter bookkeeping, put each created PDF in a Redis hash, and signal when ready

while (true)
  _, msg = r.brpop 'pdf:queue'
  doc = YAML::load(msg)
  # name of hash, key=docname, value=pdf
  r.hset('pdf:pdfs', doc[0], create_pdf(*doc))
  ctr = r.decr 'ctr'
  r.rpush 'ready', 'done' if ctr == 0
end
23. Zip When Done
(diagram: each create pdf process stores its result in a "Hash with pdfs"; when the counter hits zero a "ready" signal is pushed and the zip process wakes up)
24. Ruby on Redis
‣ Wait for the ready signal, fetch all PDFs, and zip them

r.brpop 'ready'             # wait for signal
pdfs = r.hgetall 'pdf:pdfs' # fetch hash
create_zip pdfs             # zip it
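The queue / counter / ready-signal pattern of the last three slides can be sketched with in-process stand-ins (Thread::Queue for the Redis list, a Mutex-guarded counter for INCR/DECR, a plain Hash for HSET) so it runs without a Redis server. The names and fake documents are illustrative; only the transport differs from the Redis version.

```ruby
require 'thread'

work  = Queue.new   # stands in for the 'pdf:queue' list
ready = Queue.new   # stands in for the 'ready' list
pdfs  = {}          # stands in for the 'pdf:pdfs' hash
lock  = Mutex.new
ctr   = 0

docs = [["1", "alice"], ["2", "bob"], ["3", "carol"]]

# distribute: push work and count it
docs.each do |doc|
  lock.synchronize { ctr += 1 }
  work << doc
end

# workers: pop work, store the "pdf", signal when the counter hits 0
workers = 3.times.map do
  Thread.new do
    while (doc = (work.pop(true) rescue nil))   # non-blocking pop, nil when empty
      result = "pdf-for-#{doc[1]}"              # pretend create_pdf
      remaining = lock.synchronize { pdfs[doc[0]] = result; ctr -= 1 }
      ready << :done if remaining == 0
    end
  end
end

ready.pop              # blocks like BRPOP until the last worker signals
workers.each(&:join)
pdfs.keys.sort # => ["1", "2", "3"]
```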
25. More Parallelism
(diagram: multiple input files in flight at once, each with its own counter, ready queue, and hash of PDFs, all sharing one "Queue with data" and the same pool of create pdf workers)
26. Ruby on Redis
‣ Put the PDF-creation input data on a queue and do the counter bookkeeping

# unique id for this input file
UUID = SecureRandom.uuid
docs.each do |doc|
  data = YAML::dump([UUID, doc])
  r.lpush 'pdf:queue', data
  r.incr "ctr:#{UUID}" # bookkeeping
end
27. Ruby on Redis
‣ Read PDF input data from the queue, do the counter bookkeeping, and put each created PDF in a Redis hash

while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end
28. Ruby on Redis
‣ Wait for the ready signal, fetch all PDFs, and zip them

r.brpop "ready:#{UUID}" # wait for signal
pdfs = r.hgetall(UUID)  # fetch this run's hash
create_zip(pdfs)        # zip it
29. Full Code
require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, '.csv')

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

LINEAR
require 'csv'
require 'zip/zip'
require 'redis'
require 'yaml'
require 'securerandom'

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, '.csv')
UUID = SecureRandom.uuid

r = Redis.new
my_counter = "ctr:#{UUID}"

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

docs.each do |doc| # distribute!
  r.lpush 'pdf:queue', YAML::dump([UUID, doc])
  r.incr my_counter
end

r.brpop "ready:#{UUID}" # collect!
create_zip(r.hgetall(UUID))

# clean up
r.del my_counter
r.del UUID
puts "All done!"

MAIN
require 'redis'
require 'princely'
require 'yaml'

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

r = Redis.new
while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end

WORKER
The key functions (create_pdf and create_zip) remain unchanged; only the distribution code is new.
31. Multi Language Participants
(diagram: the same queue, counter, and hash setup, but the create pdf workers can be written in any language that has a Redis client)
32. Conclusions
Going from linear to multi-process distributed is easy with Redis' shared-memory high-level data structures:
• Atomic counter for bookkeeping
• Queue for work distribution
• Queue as signal
• Hash for result sets