7. introducing Celery
asynchronous job queue/task queue
based on distributed message passing
focused on real-time operation
works for scheduled tasks too!
open source and integrated with Django
8. how it works
[diagram: website (view) → celery → MB (message broker) → worker 1 … worker n → results]
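The flow on this slide can be approximated by a toy, in-process model that uses only the Python standard library. A `queue.Queue` stands in for the message broker, threads stand in for worker processes, and a dict stands in for the results store. This is an illustration of the message-passing idea, not how Celery is implemented; `send_task_message`, `worker_loop` and `TASKS` are hypothetical names.

```python
import queue
import threading
import uuid

# Toy stand-ins (hypothetical, not Celery internals).
broker = queue.Queue()        # plays the role of the message broker (MB)
results = {}                  # plays the role of the results store
results_lock = threading.Lock()

TASKS = {"add": lambda x, y: x + y}   # registry: task name -> function


def send_task_message(name, *args):
    """What the website/view does: publish a message and return an id at once."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, name, args))
    return task_id


def worker_loop():
    """What each worker does: consume a message, run the task, store the result."""
    while True:
        message = broker.get()
        if message is None:           # sentinel: shut this worker down
            broker.task_done()
            break
        task_id, name, args = message
        value = TASKS[name](*args)
        with results_lock:
            results[task_id] = value
        broker.task_done()


workers = [threading.Thread(target=worker_loop) for _ in range(4)]
for w in workers:
    w.start()

ids = [send_task_message("add", i, i) for i in range(5)]
broker.join()                 # block until every queued message is processed
for _ in workers:
    broker.put(None)          # one sentinel per worker
for w in workers:
    w.join()

print([results[i] for i in ids])  # [0, 2, 4, 6, 8]
```

The producer never waits for the work itself, only for an id; that decoupling is what lets the view return immediately while workers scale from 1 to n.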
9. task
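from celery import Celery
celery = Celery("tasks", broker="amqp://guest@localhost//")  # app name and broker URL are assumptions
@celery.task
def add(x, y):
    return x + y
...
add.delay(2, 2)  # somewhere in a view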
atomic
ideally idempotent (safe to run more than once)
runs in the same environment as the website
executed in real time or scheduled
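To make "ideally idempotent" concrete: a task is idempotent if running it twice (for example, when a message is redelivered after a worker crash) leaves the same state as running it once. A minimal stdlib sketch, assuming a hypothetical in-memory `orders` table in place of the database:

```python
# Hypothetical in-memory "database" used only for illustration.
orders = {
    1: {"status": "pending", "reminders": 0},
    2: {"status": "pending", "reminders": 0},
}


def mark_paid(order_id):
    """Idempotent: writes an absolute value, so a repeat changes nothing."""
    orders[order_id]["status"] = "paid"


def send_reminder(order_id):
    """Not idempotent: every run increments the counter (a real one would email again)."""
    orders[order_id]["reminders"] += 1


def send_reminder_once(order_id):
    """Idempotent variant: checks the current state before acting."""
    if orders[order_id]["reminders"] == 0:
        orders[order_id]["reminders"] = 1


# Simulate the same messages being delivered twice (e.g. after a worker crash).
for _ in range(2):
    mark_paid(1)
    send_reminder(1)
    send_reminder_once(2)

print(orders[1])  # {'status': 'paid', 'reminders': 2}  -> duplicated side effect
print(orders[2])  # {'status': 'pending', 'reminders': 1}  -> safe to retry
```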
11. a few tips about tasks
granularity
data locality
state
“asserting the world is the responsibility of the task”
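The "state" tip can be illustrated with a small sketch: pass an identifier rather than a copy of the object, re-read the current state inside the task, and check that the world still looks as expected before acting. `PAPERS` and `grade_paper` are hypothetical; in a Django project the lookup would be a model query.

```python
# Hypothetical data store; in a Django project this would be a model query.
PAPERS = {
    7: {"answers": "ABCD", "key": "ABCA", "score": None},
}


def grade_paper(paper_id):
    """Receives an id (not a stale copy of the object) and re-reads the state."""
    paper = PAPERS.get(paper_id)
    if paper is None:                 # deleted after the task was queued
        return "skipped: missing"
    if paper["score"] is not None:    # already graded, e.g. a duplicate delivery
        return "skipped: already graded"
    paper["score"] = sum(a == k for a, k in zip(paper["answers"], paper["key"]))
    return f"graded: {paper['score']}"


print(grade_paper(7))    # graded: 3
print(grade_paper(7))    # skipped: already graded
print(grade_paper(99))   # skipped: missing
```

The check-then-write here is not atomic; with several concurrent workers a real task would also need a lock or a conditional database update.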
12. subtasks
a subtask wraps the arguments and options of a single task call, so it can be passed around (to another task or to a primitive)
calling:
add.subtask((1, 1)).delay()  # args are passed as a tuple
add.s(1, 1).delay()  # shortcut: takes the args directly
the primitives:
chains, groups, chords, maps, chunks
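Conceptually a subtask is "a task plus frozen arguments". The toy stdlib model below (a hypothetical `Signature` class, not Celery's) shows that idea and how a chain works: each step receives the previous result, prepended to its own arguments, which mirrors how Celery handles partial subtasks.

```python
from functools import reduce


class Signature:
    """Toy model of a subtask: a callable plus partially applied arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self, *prepended):
        # Extra arguments are prepended, as with partial subtasks in Celery.
        return self.func(*prepended, *self.args)

    def __or__(self, other):
        # `a | b` builds a chain: b receives a's result as its first argument.
        return Chain([self, other])


class Chain:
    def __init__(self, steps):
        self.steps = steps

    def __or__(self, other):
        return Chain(self.steps + [other])

    def __call__(self):
        first, *rest = self.steps
        return reduce(lambda acc, sig: sig(acc), rest, first())


def add(x, y):
    return x + y


def mul(x, y):
    return x * y


print(Signature(add, 1, 1)())   # 2
# add(2, 2) = 4, then add(4, 4) = 8, then mul(8, 8) = 64
print((Signature(add, 2, 2) | Signature(add, 4) | Signature(mul, 8))())  # 64
```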
16. subtasks: chords
like a group, but applies a callback to the list of results
from celery import chord
@celery.task
def ts(numbers):
    return sum(numbers)
result = chord(add.s(i, i) for i in range(3))(ts.s())
result.get()  # → 6, i.e. sum([0, 2, 4])
chord(header)(callback)  # general form
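The semantics of a chord (run the header tasks in parallel, then call the callback once with all their results, in header order) can be sketched with the standard library. `toy_chord` is a hypothetical helper using a thread pool; in Celery the "wait for every header task" step is coordinated through the result backend instead.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


def ts(numbers):
    return sum(numbers)


def toy_chord(header, callback, max_workers=4):
    """Run every (func, args) pair in `header` in parallel, then call
    `callback` once with the list of results, in header order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, *args) for func, args in header]
        results = [f.result() for f in futures]   # waits for all header tasks
    return callback(results)


print(toy_chord([(add, (i, i)) for i in range(3)], ts))  # 6
```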
20. case study: korect
quiz and exam management, with automatic paper processing using OMR (optical mark recognition)
existing solution: desktop app, ~2 tests/minute
using ReportLab, PyPDF, OpenCV
21. django + celery version
~20 tests/minute
same machine, 4 worker processes
parallelized parts:
- print file generation
- paper scanning
- correcting and grading
- question usage report
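For steps like correcting and grading, one message per paper can be too fine-grained (the granularity tip); Celery's `chunks` primitive groups many argument tuples into fewer tasks. The stdlib sketch below shows only that batching idea: `grade`, `grade_chunk` and the sample answers are hypothetical, and threads stand in for the four worker processes purely to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def grade(answers, key):
    """Hypothetical grader: one point per matching answer."""
    return sum(a == k for a, k in zip(answers, key))


def grade_chunk(chunk):
    """One 'task' grades a whole batch of (answers, key) pairs."""
    return [grade(answers, key) for answers, key in chunk]


papers = [("ABCD", "ABCA"), ("AAAA", "ABCA"), ("ABCA", "ABCA"),
          ("DCBA", "ABCA"), ("ABDA", "ABCA")]

with ThreadPoolExecutor(max_workers=4) as pool:
    per_chunk = list(pool.map(grade_chunk, chunked(papers, 2)))  # 3 batches

scores = [s for chunk in per_chunk for s in chunk]
print(scores)  # [3, 2, 4, 1, 3]
```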
22. example - generate print file
add a Download object to db
delay the task (simplified, not the real code):
chord([test_pdf.s(t) for t in tests])(merge_pdf.s())
update the page with the download status
...
at the end of merge_pdf:
update status, flash user