Presented at All Things Open 2023
Presented by Lydia Cupery - HubSpot
Title: Scaling Web Applications with Background Jobs: Takeaways from Generating a Huge PDF
Abstract: Do you need to perform time-consuming or CPU-intensive processes in your web application but are concerned about performance? That’s where background jobs come in. By offloading resource-intensive tasks to separate worker processes, you can improve the scalability of your web application.
In this talk, I'll share my experience of using background jobs to scale our web application. I'll discuss the challenges my team faced that led us to adopt background jobs. Then, I'll share practical tips on how to design background jobs for CPU-intensive or time-consuming processes, such as generating huge PDFs and batch emailing. I'll wrap up by going over the performance and cost tradeoffs of background jobs.
I'll use TypeScript, Express, and Heroku as examples in this talk, but the concepts and best practices that I'll share are applicable to other languages and tools.
5. Download Invoices
CLIENT -> SERVER: write customer invoices PDF
SERVER:
Fetch the invoice data from the database
Transform data to shape of PDF data
Write the Invoice PDF for each customer
Combine all Invoice PDFs
Put completed PDF somewhere client can access
SERVER -> CLIENT: reference to created PDF
7. Fetch the invoice data from the database
Transform data
Write the Invoice PDF for each customer
Combine all Invoice PDFs
Put completed PDF somewhere client can access
9. First attempt - create one large PDF.
Write each customer statement to the main PDF.
It works, but it takes way too long.
10. Okay, what if we create all the PDFs at the same time and then merge them?
We’ll use a Promise.all and store all the individual PDFs in memory.
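A minimal sketch of this attempt, not the code from the talk. The `Invoice` shape and the `renderInvoicePdf` function are hypothetical stand-ins for the real data model and PDF writer. The point it illustrates is that every rendered PDF stays in memory until all of them have finished.

```typescript
// Hypothetical invoice shape; the talk does not show the real data model.
interface Invoice {
  customerId: string;
  amountCents: number;
}

// Stand-in for a real PDF renderer (assumption): returns 1 KB of fake bytes.
async function renderInvoicePdf(invoice: Invoice): Promise<Uint8Array> {
  return new Uint8Array(1024);
}

// Start every render at once and wait for all of them to finish.
// All rendered PDFs are held in memory together until the merge step runs,
// so memory use grows with the number of customers.
async function renderAllAtOnce(invoices: Invoice[]): Promise<Uint8Array[]> {
  return Promise.all(invoices.map((invoice) => renderInvoicePdf(invoice)));
}
```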
11. What if we processed two or three invoices at a time and wrote each of those as PDFs to the file system?
Then we could merge those PDFs together with pdf-lib.
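One way to sketch the "two or three at a time" idea is a generic batching helper. `processInBatches` is a hypothetical name, and the `worker` callback stands in for "render one invoice and write it to the file system". The merge step with pdf-lib is left out here.

```typescript
// Process items in fixed-size batches: items within a batch run in parallel,
// but the next batch starts only after the previous one has finished.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

With a batch size of 2 or 3, only that many PDFs are being generated at any moment, which bounds memory use at the cost of total run time.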
12. What if we used an external service? Is there any external service that could help us out with our bottlenecks?
Write the Invoice PDF for each customer
Combine all Invoice PDFs (…perhaps for combining PDFs!)
13. An external service cuts down on the time to combine PDFs, but generating the individual PDFs and writing them to the file system still takes too long.
14. Takes Too Long
Generating and writing PDFs to the file system takes too long.
VS
Uses Too Much Memory
Generating too many PDFs in parallel uses a lot of memory, and the dyno runs out of memory.
15. What if we keep increasing memory and upgrade our dynos to ones with more memory?
invoice report please!
report.pdf
16. The system could still run out of memory and start failing as soon as more than one user tries to generate a PDF at once.
invoice report please!
invoice report please!
ERROR
please load X page!
ERROR
report.pdf
18. “Background jobs can dramatically improve the scalability of a web app by enabling it to offload slow or CPU-intensive tasks from its front-end.”
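19. Browser | Web Server | Background Service
Browser -> Web Server: request invoices PDF
Web Server -> Background Service: schedule generate invoice PDF
Web Server -> Browser: in-progress
Background Service: generate invoice PDF
Browser -> Web Server: is it done yet?
Web Server -> Browser: nope!
Browser -> Web Server: is it done yet?
Web Server -> Browser: nope!
Browser -> Web Server: is it done yet?
Web Server -> Browser: yes! here it is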
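The request, schedule, and poll exchange can be sketched roughly as follows. `JobStore` is a hypothetical in-memory stand-in for a real job queue shared by the web server and the background service, and `pollUntilDone` plays the part of the browser asking "is it done yet?". Neither name comes from the talk.

```typescript
type JobStatus =
  | { state: "in-progress" }
  | { state: "done"; resultUrl: string };

// In-memory stand-in for a job queue (assumption): a production setup would
// use a persistent store that both web and worker processes can reach.
class JobStore {
  private jobs = new Map<string, JobStatus>();
  private nextId = 1;

  // Called by the web server when the browser requests the PDF.
  schedule(): string {
    const id = String(this.nextId++);
    this.jobs.set(id, { state: "in-progress" });
    return id;
  }

  // Called by the background service once the PDF is ready.
  complete(id: string, resultUrl: string): void {
    if (!this.jobs.has(id)) throw new Error(`unknown job ${id}`);
    this.jobs.set(id, { state: "done", resultUrl });
  }

  // Called by the web server each time the browser polls.
  status(id: string): JobStatus | undefined {
    return this.jobs.get(id);
  }
}

// Browser-side loop: check the status, wait, and try again until the job is
// done or the attempt budget is used up.
async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  intervalMs: number,
  maxAttempts: number,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.state === "done") return status.resultUrl;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("job did not finish in time");
}
```

The web server responds to the first request immediately with a job id, so no HTTP request stays open for the full time it takes to generate the PDF.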
25. Sorted Customers with Invoices
Chunk size: customer list / WORKER_COUNT
JOB QUEUE
customers A-E: fetch invoice data, generate invoices (customerIds: A-E)
customers F-O: fetch invoice data, generate invoices (customerIds: F-O)
customers P-Z: fetch invoice data, generate invoices (customerIds: P-Z)
Then: combine generated invoices
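Splitting the sorted customer list by worker count might look like the sketch below. `partition` is a hypothetical helper name. It returns at most `workerCount` contiguous chunks, each of which would become one job on the queue.

```typescript
// Split a sorted list into at most `workerCount` contiguous chunks of
// near-equal size. Contiguous chunks keep the sort order intact, so the
// per-chunk PDFs can be combined in chunk order at the end.
function partition<T>(sorted: T[], workerCount: number): T[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error("workerCount must be a positive integer");
  }
  const chunkSize = Math.ceil(sorted.length / workerCount);
  const chunks: T[][] = [];
  for (let start = 0; start < sorted.length; start += chunkSize) {
    chunks.push(sorted.slice(start, start + chunkSize));
  }
  return chunks;
}

// Seven customers across three workers yields chunks of 3, 3, and 1.
const exampleChunks = partition(["A", "B", "C", "D", "E", "F", "G"], 3);
```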
29. Progress Indication
With no background job (assuming the server doesn’t time out or run out of memory):
Browser -> Web Server: request invoice PDF
Web Server: generate pdf…
Web Server -> Browser: here it is!
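31. Browser | Web Server | Background Service
Browser -> Web Server: request invoices PDF
Web Server -> Background Service: schedule generate invoice PDF
Web Server -> Browser: in-progress
Background Service: generate invoice PDF
Diagram label: jobs
Browser -> Web Server: is it done yet?
Web Server -> Browser: nope! 30% there
Browser -> Web Server: is it done yet?
Web Server -> Browser: nope! 70% there
Browser -> Web Server: is it done yet?
Web Server -> Browser: yes! here it is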
32. Generate Partial Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
JOB PROGRESS
job 1: 0.1
33. Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
○ updateProgress each time an invoice is generated
JOB PROGRESS
job 1: 0.2
34. Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
○ updateProgress each time an invoice is generated
JOB PROGRESS
job 1: 0.4
35. Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
○ updateProgress each time an invoice is generated
JOB PROGRESS
job 1: 0.6
36. Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
○ updateProgress each time an invoice is generated
JOB PROGRESS
job 1: 0.8
37. Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
○ updateProgress each time an invoice is generated
● updateProgress(0.9)
JOB PROGRESS
job 1: 0.9
38. Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
○ updateProgress each time an invoice is generated
● updateProgress(0.9)
● upload generated file to S3
● updateProgress(1)
JOB PROGRESS
job 1: 1
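The job outline above could be written along these lines. This is a sketch under assumptions: the `Deps` interface and its three functions are hypothetical, injected so the example runs without a database, a PDF library, or S3. Spreading invoice generation evenly across the 0.1 to 0.9 range is one possible reading of "updateProgress each time an invoice is generated".

```typescript
type UpdateProgress = (fraction: number) => void;

interface GenerateInvoicesInput {
  customerIds: string[];
}

// Hypothetical dependencies; the talk does not show these signatures.
interface Deps {
  fetchInvoiceData: (customerIds: string[]) => Promise<string[]>;
  generateInvoice: (invoiceData: string) => Promise<void>;
  uploadFile: () => Promise<string>;
}

async function generateInvoicesJob(
  input: GenerateInvoicesInput,
  updateProgress: UpdateProgress,
  deps: Deps,
): Promise<string> {
  // Step 1: fetching the data counts as the first 10% of the job.
  const invoices = await deps.fetchInvoiceData(input.customerIds);
  updateProgress(0.1);

  // Step 2: generation covers 0.1 to 0.9, split evenly per invoice.
  let generated = 0;
  for (const invoice of invoices) {
    await deps.generateInvoice(invoice);
    generated++;
    updateProgress(0.1 + (0.8 * generated) / invoices.length);
  }
  updateProgress(0.9);

  // Step 3: the upload accounts for the final 10%.
  const url = await deps.uploadFile();
  updateProgress(1);
  return url;
}
```

The web server only needs to read the most recently reported fraction to answer a poll with something like "30% there".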
46. Communicating with an External Service
With no background job:
Browser -> Web Server: send out customer emails, please!
Web Server -> Mailgun: send these emails, please!
Web Server -> Browser: TIMEOUT
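47. Browser | Web Server | Background Service | Mailgun
Browser -> Web Server: send out customer emails, please!
Web Server -> Background Service: send these emails, please!
Web Server -> Browser: in-progress
Background Service -> Mailgun: send these emails, please!
Browser -> Web Server: is it done yet?
Web Server -> Browser: nope! 30% there
Browser -> Web Server: is it done yet?
Web Server -> Browser: nope! 70% there
Browser -> Web Server: is it done yet?
Web Server -> Browser: yes - all emails are sent!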
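A batch-emailing job can follow the same pattern. In this sketch, `sendBatch` is a hypothetical callback standing in for whatever call hands a batch of recipients to the email provider. It is not Mailgun's actual client API.

```typescript
// Send emails in batches from a background job, reporting the fraction of
// recipients handled after each batch.
async function sendEmailsJob(
  recipients: string[],
  batchSize: number,
  sendBatch: (batch: string[]) => Promise<void>, // hypothetical provider call
  updateProgress: (fraction: number) => void,
): Promise<void> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  if (recipients.length === 0) {
    updateProgress(1);
    return;
  }
  for (let start = 0; start < recipients.length; start += batchSize) {
    const batch = recipients.slice(start, start + batchSize);
    await sendBatch(batch);
    const sent = Math.min(start + batchSize, recipients.length);
    updateProgress(sent / recipients.length);
  }
}
```

Because the slow calls to the external service happen in the background service, the web server can answer the browser right away instead of timing out.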
48. Recap - With a background job you can…
speed things up
show progress
support simultaneous users
have fewer timeouts/errors
save jobs for later
59. Fetch the invoice data from the database
Transform data to shape of PDF data
Write the Invoice PDF for each customer
Combine all Invoice PDFs
Put completed PDF somewhere client can access
60. Fetch list of customers with invoices from the database (do not need to fetch all invoice data)
On each worker dyno:
Fetch invoice data for customers (fetching 1/10 of the data)
Transform fetched data to shape of PDF data (transforming 1/10 of the data)
Write the Invoice PDF for each customer (writing PDFs for 1/10 of the customers)
Then:
Combine all Invoice PDFs
Put completed PDF somewhere client can access