Scaling Web Applications with Background Jobs:
Takeaways from Generating a Huge PDF
Lydia Cupery
All Things Open, 2023
Hi!
Nice to meet you
Lydia Cupery
lydia-cupery.com
Lydia Cupery
The Problem
Motion
design
Physical
Computing
Download
Invoices
Fetch the invoice data from the database
Transform data to shape of PDF data
Write the Invoice PDF for each customer
Combine all Invoice PDFs
Put completed PDF somewhere client can access
CLIENT → SERVER: write customer invoices PDF
SERVER → CLIENT: reference to created PDF
First attempt: create one large PDF.
Write each customer statement to the main PDF.
It works, but it takes way too long.
Okay, what if we create all the PDFs at the same time, and then merge them?
We’ll use Promise.all and store all the individual PDFs in memory.
What if we processed two or three invoices at a time, and wrote each of those as PDFs to the file system?
Then we could merge those PDFs together with pdf-lib.
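The difference between "everything at once" and "two or three at a time" can be sketched as follows. This is a minimal illustration, not the code from the talk: `generateInBatches`, `generatePdf`, and `writePdf` are hypothetical names, and the merge step with pdf-lib is left out. Passing every customer to a single Promise.all is the same as setting the batch size to the full list length, which is what holds every PDF in memory together.

```typescript
// Hypothetical stand-in for rendering one customer's invoice PDF.
type GeneratePdf = (customerId: string) => Promise<Uint8Array>;
// Hypothetical stand-in for writing one PDF to the file system.
type WritePdf = (customerId: string, pdf: Uint8Array) => Promise<void>;

// Generate PDFs a few at a time so that at most `batchSize` of them
// are being generated (and held in memory) at once.
async function generateInBatches(
  customerIds: string[],
  batchSize: number,
  generatePdf: GeneratePdf,
  writePdf: WritePdf,
): Promise<void> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  for (let start = 0; start < customerIds.length; start += batchSize) {
    const batch = customerIds.slice(start, start + batchSize);
    // Only this batch runs concurrently; the next batch starts once it finishes.
    await Promise.all(
      batch.map(async (customerId) => {
        const pdf = await generatePdf(customerId);
        await writePdf(customerId, pdf);
      }),
    );
  }
}
```

A small batch size keeps memory use flat but serializes the work, which matches the trade-off the next slides describe.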
What if we used an external service? Is there any external
service that could help us out with our bottlenecks?
Write the Invoice PDF for each customer
Combine all Invoice PDFs
…perhaps for combining PDFs!
Cuts down on the time to combine PDFs, but generating the individual PDFs
and writing the PDFs to the file system still takes too long.
Takes Too Long vs. Uses Too Much Memory
Takes too long: generating and writing PDFs to the file system takes too long.
Uses too much memory: generating too many PDFs in parallel uses a lot of memory, and the dyno runs out of memory.
What if we keep increasing memory, and upgrade our dynos to ones with more
memory?
invoice report please!
report.pdf
The system could still run out of memory and start failing as soon as more
than one user tries to generate a PDF at once.
invoice report please!
invoice report please!
ERROR
please load X page!
ERROR
report.pdf
Make it a Background Job!
“Background jobs can dramatically
improve the scalability of a web
app by enabling it to offload
slow or CPU-intensive tasks from
its front-end.”
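Browser → Web Server: request invoices PDF
Web Server → Background Service: schedule generate invoice PDF
Web Server → Browser: in-progress
Background Service: generate invoice PDF
Browser → Web Server: is it done yet?
Web Server → Browser: nope!
Browser → Web Server: is it done yet?
Web Server → Browser: nope!
Browser → Web Server: is it done yet?
Web Server → Browser: yes! here it is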
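The request, in-progress, and "is it done yet?" exchange can be sketched with an in-memory job table. Everything here is an assumption made for illustration: the function names are hypothetical, and the `Map` stands in for the queue and data store a real deployment would use.

```typescript
type JobStatus =
  | { state: "in-progress" }
  | { state: "done"; resultUrl: string };

// In-memory stand-in for the job store (assumption for this sketch).
const jobs = new Map<string, JobStatus>();
let nextJobId = 1;

// Web server: schedule the job and answer right away with a job id.
function requestInvoicesPdf(): string {
  const jobId = String(nextJobId++);
  jobs.set(jobId, { state: "in-progress" });
  return jobId;
}

// Background service: mark the job as done once the PDF is available.
function completeJob(jobId: string, resultUrl: string): void {
  jobs.set(jobId, { state: "done", resultUrl });
}

// Web server: answer "is it done yet?".
function getStatus(jobId: string): JobStatus {
  const status = jobs.get(jobId);
  if (!status) throw new Error(`unknown job ${jobId}`);
  return status;
}

// Browser: keep asking until the job is done or we give up.
async function pollUntilDone(
  jobId: string,
  intervalMs: number,
  maxAttempts: number,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = getStatus(jobId);
    if (status.state === "done") return status.resultUrl;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${jobId} did not finish in time`);
}
```

The point of the shape is that the web server only ever does cheap work per request: it records a job or reads a status, and the slow generation happens elsewhere.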
Architecture Overview
web process
(web dyno)
data store
(redis)
library to
implement queue
system on top of
redis (BullMQ)
worker processes
(worker dynos)
With a Background Job You Can…
With background
jobs you can…
Speed Things Up
Sorted Customers with Invoices
Split into groups of (customer list / WORKER_COUNT):
customers A-E
customers F-O
customers P-Z
JOB QUEUE
fetch invoice data, generate invoices (customerIds: A-E)
fetch invoice data, generate invoices (customerIds: F-O)
fetch invoice data, generate invoices (customerIds: P-Z)
combine generated invoices
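Dividing the sorted customer list by WORKER_COUNT can be sketched as a plain chunking function. The name `splitIntoChunks` and the sample names are hypothetical; the sketch assumes each chunk becomes the input of one partial job.

```typescript
// Split a sorted list into at most `workerCount` contiguous chunks
// of roughly equal size (customer list / WORKER_COUNT, rounded up).
function splitIntoChunks<T>(items: T[], workerCount: number): T[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error("workerCount must be a positive integer");
  }
  const chunkSize = Math.ceil(items.length / workerCount);
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    chunks.push(items.slice(start, start + chunkSize));
  }
  return chunks;
}

// Example: seven sorted customers across three workers.
const exampleChunks = splitIntoChunks(
  ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo", "Gus"],
  3,
);
```

Because the list is sorted before it is split, the chunks stay in order, so the partial PDFs can be combined in chunk order to produce a sorted final document.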
JOB QUEUE
generate invoices (customerIds: A-E): fetch data (A-E), write invoices, upload “batch invoices xxx - 1”
generate invoices (customerIds: F-O): fetch data (F-O), write invoices, upload “batch invoices xxx - 2”
generate invoices (customerIds: P-Z): fetch data (P-Z), write invoices, upload “batch invoices xxx - 3”
combine generated invoices: combine S3 files to “batch invoices xxx”, upload “batch invoices xxx”
Job
Partial Job → output
Partial Job → output
Partial Job → output
Combine Partial Job Outputs
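One way to decide when the combine step may run is to record each partial job's output and release the ordered list only when every part has reported. This is a sketch under that assumption; `PartialOutputTracker` is a hypothetical helper, not something the talk or a queue library is known to provide.

```typescript
// Collects the outputs of partial jobs and reports when all have arrived.
class PartialOutputTracker {
  private readonly partCount: number;
  private readonly outputs = new Map<number, string>();

  constructor(partCount: number) {
    if (!Number.isInteger(partCount) || partCount < 1) {
      throw new Error("partCount must be a positive integer");
    }
    this.partCount = partCount;
  }

  // Record one partial output (for example an uploaded file key).
  // Returns the outputs in part order once all parts are in, otherwise null.
  recordOutput(partIndex: number, outputKey: string): string[] | null {
    if (!Number.isInteger(partIndex) || partIndex < 0 || partIndex >= this.partCount) {
      throw new Error(`partIndex ${partIndex} is out of range`);
    }
    this.outputs.set(partIndex, outputKey);
    if (this.outputs.size < this.partCount) return null;
    const ordered: string[] = [];
    for (let i = 0; i < this.partCount; i++) {
      ordered.push(this.outputs.get(i) as string);
    }
    return ordered;
  }
}
```

Partial jobs may finish in any order, so the tracker sorts by part index rather than arrival time before handing the list to the combine step.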
With background
jobs you can…
Show Progress
Progress Indication
With no background job (assuming the server doesn’t time out or run out of memory):
Browser → Web Server: request invoice PDF
Web Server: generate pdf…
Web Server → Browser: here it is!
Indicating Progress
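Browser → Web Server: request invoices PDF
Web Server → Background Service: schedule generate invoice PDF jobs
Web Server → Browser: in-progress
Background Service: generate invoice PDF
Browser → Web Server: is it done yet?
Web Server → Browser: nope! 30% there
Browser → Web Server: is it done yet?
Web Server → Browser: nope! 70% there
Browser → Web Server: is it done yet?
Web Server → Browser: yes! here it is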
Generate Invoices Job
(input, updateProgress) =>
● fetch data to generate invoices
● updateProgress(0.1)
● generate the invoices
  ○ updateProgress each time an invoice is generated
● updateProgress(0.9)
● upload generated file to S3
● updateProgress(1)
JOB 1 PROGRESS: 0.1 → 0.2 → 0.4 → 0.6 → 0.8 → 0.9 → 1
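The job outline above might look like this in code. It is a sketch, not the talk's implementation: the `deps` parameter and its three functions are hypothetical stand-ins added so the example is self-contained, and spreading the invoices evenly between 0.1 and 0.9 is one assumed way to "updateProgress each time an invoice is generated".

```typescript
type UpdateProgress = (fraction: number) => void;

// Hypothetical dependencies, injected so the sketch runs on its own.
interface JobDeps {
  fetchInvoiceData: (customerIds: string[]) => Promise<string[]>;
  generateInvoice: (invoiceData: string) => Promise<void>;
  uploadToS3: () => Promise<void>;
}

async function generateInvoicesJob(
  input: { customerIds: string[] },
  updateProgress: UpdateProgress,
  deps: JobDeps,
): Promise<void> {
  // Fetching the data counts as the first 10% of the job.
  const invoices = await deps.fetchInvoiceData(input.customerIds);
  updateProgress(0.1);

  // Generating invoices moves progress from 0.1 up to 0.9.
  let generated = 0;
  for (const invoice of invoices) {
    await deps.generateInvoice(invoice);
    generated++;
    updateProgress(0.1 + 0.8 * (generated / invoices.length));
  }
  updateProgress(0.9);

  // Uploading the generated file is the final 10%.
  await deps.uploadToS3();
  updateProgress(1);
}
```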
view job progress
updateProgress(X) updates job progress
aggregate progress across jobs
requests job progress
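Aggregating progress across jobs can be as simple as averaging the progress of the partial jobs. This sketch assumes every partial job carries equal weight and ignores the combine step; the slides do not say how the talk's system weighted them.

```typescript
// Average the progress (0 to 1) reported by each partial job.
function aggregateProgress(partialProgress: number[]): number {
  if (partialProgress.length === 0) return 0;
  const total = partialProgress.reduce((sum, value) => sum + value, 0);
  return total / partialProgress.length;
}

// Example: one partial job finished, one halfway, one not started.
const overallProgress = aggregateProgress([1, 0.5, 0]); // 0.5
```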
With background
jobs you can…
Support Simultaneous Users
Before background jobs…
invoice report please!
invoice report please!
ERROR
please load X page!
ERROR
report.pdf
invoice report please!
please load X page!
working on that!
progress?
progress?
invoice report please!
working on that!
progress?
here you go!
With background
jobs you can…
Save Jobs for Later
invoice report please!
working on that!
progress?
progress?
invoice report please!
working on that!
progress?
With background
jobs you can…
Have Fewer Timeouts/Errors
Communicating with an External Service
With no background job:
Browser → Web Server: send out customer emails, please!
Web Server → Mailgun: send these emails, please!
Web Server → Browser: TIMEOUT
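Browser → Web Server: send out customer emails, please!
Web Server → Background Service: send these emails, please!
Web Server → Browser: in-progress
Background Service → Mailgun: send these emails, please!
Browser → Web Server: is it done yet?
Web Server → Browser: nope! 30% there
Browser → Web Server: is it done yet?
Web Server → Browser: nope! 70% there
Browser → Web Server: is it done yet?
Web Server → Browser: yes - all emails are sent!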
Recap - With a background job you can…
speed things up
show progress
support simultaneous users
have fewer timeouts/errors
save jobs for later
Should you use a background job?
You might want a background job for…
CPU-intensive jobs
I/O-intensive jobs
Jobs communicating externally
Scheduled jobs
Tips
Struggling with app responsiveness?
Try a background job.
Don’t reinvent the wheel.
Use a library with a robust queueing system.
Does speed matter? It probably does.
Parallelize.
The job queue makes it easy to show users
progress. Do so!
Find the optimal number of workers and the optimal amount of resources per worker (see next slide…)
6 x 1X: $25 x 6 = $150, 512 MB x 6 = 3 GB
3 x 2X: $50 x 3 = $150, 1 GB x 3 = 3 GB
1 x Perf M: $250 x 1 = $250, 2.5 GB x 1 = 2.5 GB
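The comparison on this slide is plain arithmetic, worked through below. The prices and memory sizes are the ones shown on the slide; the cost-per-GB figure is a derived number added here for comparison, and the type and function names are hypothetical.

```typescript
interface WorkerOption {
  name: string;
  count: number;
  pricePerWorker: number; // dollars, as shown on the slide
  memoryGbPerWorker: number;
}

const workerOptions: WorkerOption[] = [
  { name: "1X", count: 6, pricePerWorker: 25, memoryGbPerWorker: 0.5 },
  { name: "2X", count: 3, pricePerWorker: 50, memoryGbPerWorker: 1 },
  { name: "Perf M", count: 1, pricePerWorker: 250, memoryGbPerWorker: 2.5 },
];

// Total cost, total memory, and cost per GB for one configuration.
function summarize(option: WorkerOption) {
  const totalCost = option.count * option.pricePerWorker;
  const totalMemoryGb = option.count * option.memoryGbPerWorker;
  return { name: option.name, totalCost, totalMemoryGb, costPerGb: totalCost / totalMemoryGb };
}

const workerSummaries = workerOptions.map(summarize);
// 1X:     $150 total, 3 GB,   $50 per GB
// 2X:     $150 total, 3 GB,   $50 per GB
// Perf M: $250 total, 2.5 GB, $100 per GB
```

The two smaller configurations cost the same and offer the same total memory, so the choice between them comes down to how many jobs should run in parallel versus how much memory each job needs.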
What about the PDF?
Fetch the invoice data from the database
Transform data to shape of PDF data
Write the Invoice PDF for each customer
Combine all Invoice PDFs
Put completed PDF somewhere client can access

With background jobs:
Fetch list of customers with invoices from the database (do not need to fetch all invoice data)
On each worker dyno:
  Fetch invoice data for customers (fetching 1/10 the amount of data)
  Transform fetched data to shape of PDF data (transforming 1/10 the amount of data)
  Write the Invoice PDF for each customer (writing PDFs for 1/10 of the customers)
Combine all Invoice PDFs
Put completed PDF somewhere client can access
Thank You!
Lydia Cupery
lydia-cupery.com
Lydia Cupery
