Celery is an excellent framework for background task processing in Python (and other languages). While basic usage of Celery is very easy, building complex task flows (task trees/graphs, dependencies, etc.) has been a challenge.
This talk introduces the audience to these challenges in Celery and explains how they can be addressed programmatically and by using the latest features in Celery (3+).
2. @mahendra
● Python developer for 6 years
● FOSS enthusiast/volunteer for 14 years
● Bangalore LUG and Infosys LUG
● FOSS.in and LinuxBangalore/200x
● Celery user for 3 years
● Contributions
● patches, testing new releases
● Zookeeper msg transport for kombu
● Kafka support (in-progress)
3. Quick Intro to Celery
● Asynchronous task/job queue
● Uses distributed message passing
● Tasks are run asynchronously on worker nodes
● Results are passed back to the caller (if any)
6. Uses of Celery
● Asynchronous task processing
● Handling long running / heavy jobs
● Image resizing, video transcode, PDF generation
● Offloading heavy web backend operations
● Scheduling tasks to be run at a particular time
● Cron for Python
7. Advanced Uses
● Task Routing
● Task retries, timeout and revoking
● Task Canvas – combining tasks
● Task co-ordination
● Dependencies
● Task trees or graphs
● Batch tasks
● Progress monitoring
● Tricks
● DB conflict management
8. Sending tasks to a particular worker
[Diagram: a Sender publishes tasks to the Msg Q; tasks on the 'windows' queue are consumed by Windows workers (Worker 1, Worker 2), tasks on the 'linux' queue by Linux workers (… Worker N)]
9. Routing tasks – Use cases
● Priority execution
● Based on hardware capabilities
● Special cards available for video capture
● Making use of GPUs (CUDA)
● Based on OS (e.g. PlayReady encryption)
● Based on location
● Moving compute closer to data (Hadoop-ish)
● Sending tasks to different data centers
● Sequencing operations (CouchDB conflicts)
10. Sample Code
from celery.task import task

@task(queue='windows')
def drm_encrypt(audio_file, key_phrase):
    ...

r = drm_encrypt.apply_async(args=[afile, key],
                            queue='windows')

# Start a celery worker consuming the 'windows' queue
$ celery worker -Q windows
12. Retrying tasks
● You can specify the number of times a task can be retried
● The cases for retrying a task must be handled in code; Celery will not retry automatically
● Tasks should be designed to be idempotent
13. Handling worker failures
@task(acks_late=True)
def drm_encrypt(audio_file, key_phrase):
    try:
        playready.encrypt(...)
    except Exception as exc:
        raise drm_encrypt.retry(exc=exc, countdown=5)
● This is used where the task must be re-sent in case of worker or node failure
● The ack message is sent to the message queue only after the task finishes executing
14. Worker processes
[Diagram: the same Sender/Msg Q/worker topology as before, with each worker node expanded to show its pool of worker processes (Process 1, Process 2, … Process N)]
16. Worker process
● On every worker node, celery starts a pool of worker processes
● The number is determined by the concurrency setting (or auto-detected, to use all CPUs)
● Each process can be configured to restart after running x number of tasks
● Disabled by default
● Alternatively, eventlet can be used instead of processes (discussed later)
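The pool described above is configured on the worker command line; the flag spellings below are from the Celery 3-era CLI and vary slightly across versions:

```shell
# 8 pool processes; recycle each child after 100 tasks
$ celery worker --concurrency=8 --maxtasksperchild=100

# use eventlet green threads instead of processes
$ celery worker --pool=eventlet --concurrency=500
```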
17. Revoking tasks
celery.control.revoke(task_id,
                      terminate=False,
                      signal='SIGKILL')
● revoke() works by sending a broadcast message to all workers
● If a task has not yet run, workers keep its task_id in memory and ensure that it does not run
● If a task is already running, revoke() will not stop it unless terminate=True
18. Task expiration
task.apply_async(expires=x)
x can be:
* a number of seconds
* a specific datetime()
● Global time limits can be configured in settings
● Soft time limit – the task receives an exception which can be used to clean up
● Hard time limit – the worker process running the task is killed and replaced with another one
20. Task Canvas
● Chains – linking one task to another
● Groups – executing several tasks in parallel
● Chords – executing a task after a set of tasks has finished
● Map and starmap – similar to the map() function
● Chunks – dividing an iterable of work into chunks
● Chunks + chord/chain can be used for map-reduce
Best shown in a demo
22. Task Trees
● Home grown solution (our current approach)
● Use db models and keep track of trees
● Better approach
● Use celery-tasktree
● http://pypi.python.org/pypi/celery-tasktree
23. Celery Batches
● Collect jobs and execute them in a batch
● Can be used for stats collection
● Batch execution happens once:
● a configured timeout is reached, OR
● a configured number of tasks has been received
● Useful for reducing network and DB load