David Arcos presented tips for improving Django performance and scalability. He began with basic concepts such as the Pareto principle and database performance, then stressed the importance of measuring performance to identify bottlenecks. Specific tips included adding database indexes, using bulk operations, caching, and using queues to run slow tasks asynchronously. Arcos concluded by emphasizing the need to measure, optimize the bottlenecks, and measure again to verify the improvements.
David Arcos - @DZPM - Efficient Django - #EuroPython 2016
Abstract
Tips and best practices for avoiding scalability issues and performance bottlenecks in Django
1) Basic concepts: the theory
2) Measuring: how to find bottlenecks
3) Tips and tricks
4) Conclusion (yes, it scales!)
Hi!
- I'm David Arcos
- Python/Django developer since 2008
- Co-organizer at Python Barcelona
- CTO at Lead Ratings
- "We improve your sales conversions, using predictive algorithms to rate the leads"
- Prediction API, "Machine Learning as a Service"
- http://lead-ratings.com
The Pareto Principle
"For many events, roughly 80% of the effects come from 20% of the causes"
Prioritize and focus
Focus on the few tasks that will have the most impact
Basic scalability
"Potential to be enlarged to handle a growing amount of work"
- Stateless app servers
  - Load balance them, scale horizontally
- Keep the state in the database(s)
  - This is the difficult part! Each system is different
Database performance
- Make fewer requests:
  - Fewer reads
  - Fewer writes
- Make faster requests:
  - Indexed fields
  - Denormalize
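To make "fewer requests" concrete, here is a minimal sketch using the standard-library sqlite3 module rather than the Django ORM; the `author` table and its rows are hypothetical. It counts the SELECT statements issued when rows are fetched one at a time versus in a single `IN (...)` query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO author (id, name) VALUES (?, ?)",
    [(i, f"author-{i}") for i in range(1, 51)],
)
conn.commit()

# Record every SQL statement sqlite3 executes from now on
statements = []
conn.set_trace_callback(statements.append)


def count_selects():
    return sum(1 for sql in statements if sql.lstrip().upper().startswith("SELECT"))


ids = list(range(1, 51))

# Many small requests: one SELECT per id
one_by_one = [
    conn.execute("SELECT name FROM author WHERE id = ?", (i,)).fetchone()[0]
    for i in ids
]
per_row_queries = count_selects()

# One request: fetch every row in a single SELECT ... WHERE id IN (...)
statements.clear()
placeholders = ", ".join("?" for _ in ids)
rows = conn.execute(
    f"SELECT id, name FROM author WHERE id IN ({placeholders})", ids
).fetchall()
batched = [name for _, name in sorted(rows)]
batched_queries = count_selects()

print(per_row_queries, batched_queries)  # 50 1
```

Each round trip has a fixed cost (network, parsing, planning), so collapsing 50 requests into one usually matters more than making any single query faster.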
Templates
- Cache them
- Jinja2 is a bit faster than the default engine
  - But cache them anyway
- Use fragment caching for individual blocks ({% cache %} tag)
Cache
- Generic approach: cache at every level of the stack
- Django's cache framework documentation is excellent
- Beware of cache invalidation!
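Django's low-level cache API exposes `get()`, `set()` and `delete()`. The sketch below mimics that shape with a hypothetical in-process `TTLCache` so it runs without Django, and shows the cache-aside pattern with explicit invalidation on writes; `UserRepository` and its dict "database" are also hypothetical stand-ins.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-key expiry (illustration only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def delete(self, key):
        self._data.pop(key, None)


class UserRepository:
    """Cache-aside reads plus explicit invalidation on writes."""

    def __init__(self, cache):
        self.cache = cache
        self.db = {1: "Ada"}  # hypothetical stand-in for a database table
        self.db_reads = 0

    def get_name(self, user_id):
        key = f"user:{user_id}:name"
        name = self.cache.get(key)
        if name is None:  # miss: read the database and fill the cache
            self.db_reads += 1
            name = self.db[user_id]
            self.cache.set(key, name)
        return name

    def rename(self, user_id, new_name):
        self.db[user_id] = new_name
        # Without this line, readers would see the stale name until the TTL expires
        self.cache.delete(f"user:{user_id}:name")


repo = UserRepository(TTLCache(ttl_seconds=60))
print(repo.get_name(1), repo.get_name(1), repo.db_reads)  # Ada Ada 1
repo.rename(1, "Grace")
print(repo.get_name(1), repo.db_reads)  # Grace 2
```

The TTL is only a safety net; the `delete()` on write is what keeps readers consistent.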
Bottlenecks
- Where is your bottleneck?
- CPU-bound or I/O-bound?
  - CPU? Run heavy calculations in asynchronous workers
  - Memory? Compress objects before caching them
  - Database? Read from database replicas
- How do you find it?
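To illustrate the memory tip in the list above (compress objects before caching), here is a minimal sketch using only pickle and zlib; the `pack`/`unpack` helpers and the payload are hypothetical. Unpickling untrusted data is unsafe, so only unpack blobs your own code wrote.

```python
import pickle
import zlib


def pack(obj, level=6):
    """Serialize and compress an object before storing it in the cache."""
    return zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL), level)


def unpack(blob):
    """Reverse of pack(); only use it on data your own code wrote."""
    return pickle.loads(zlib.decompress(blob))


# Hypothetical payload with the kind of repetition real query results often have
payload = {"rows": [{"id": i, "status": "active"} for i in range(1000)]}

raw = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
packed = pack(payload)
print(len(raw), len(packed))  # the compressed blob is much smaller for this payload
```

The trade-off is CPU time on every cache read and write, so this pays off when memory (or network bandwidth to the cache) is the bottleneck, not the CPU.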
You can't improve what you don't measure
- Measure your system to find the bottlenecks
- Optimize those bottlenecks
- Verify the improvements
- Rinse and repeat!
Monitoring
- System: load, CPU, memory...
- Database: queries/s, response time, size
- Cache: queries/s, hit rate
- Queue: length
- Custom: metrics specific to your app
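Custom metrics can start very small. The sketch below is a hypothetical in-process `Metrics` class with counters, a timing decorator and a cache hit-rate helper; in production you would typically ship these numbers to a monitoring system (for example StatsD or Prometheus) instead of keeping them in memory. `score_lead` is a made-up app function.

```python
import time
from collections import defaultdict
from functools import wraps


class Metrics:
    """In-process counters and timers (illustration only)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, value=1):
        self.counters[name] += value

    def timed(self, name):
        """Decorator that records the wall-clock duration of every call."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    self.timings[name].append(time.perf_counter() - start)
            return wrapper
        return decorator

    def hit_rate(self, hits="cache.hit", misses="cache.miss"):
        total = self.counters[hits] + self.counters[misses]
        return self.counters[hits] / total if total else 0.0


metrics = Metrics()


@metrics.timed("score_lead")
def score_lead(lead_id):  # hypothetical app function
    metrics.incr("leads.scored")
    return (lead_id % 100) / 100


for lead_id in range(5):
    score_lead(lead_id)
metrics.incr("cache.hit", 3)
metrics.incr("cache.miss")
print(metrics.counters["leads.scored"], metrics.hit_rate())  # 5 0.75
```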
Profiling
- The cProfile module profiles Python programs by collecting statistics:
  - Number of calls, running time, time per call...
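A minimal sketch of profiling a function in-process with cProfile and printing the top entries with pstats; `handle_request` and `slow_square_sum` are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    # Deliberately naive loop so it dominates the profile
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    # Hypothetical request handler that calls the hot function repeatedly
    return [slow_square_sum(10_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)  # top 5 by cumulative time
print(stream.getvalue())
```

For a whole script, `python -m cProfile -s cumulative myscript.py` gives a similar report without changing the code.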
timeit
- The timeit module is a simple way to measure the execution time of small snippets of Python code
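A short sketch comparing membership tests on a list and a set with `timeit.repeat`; the snippets are illustrative. Taking the minimum of several repeats is the usual way to reduce noise from other processes.

```python
import timeit

setup = "data_list = list(range(10_000)); data_set = set(data_list)"

# repeat() runs the statement `number` times per round; min() is the least noisy estimate
list_time = min(timeit.repeat("9_999 in data_list", setup=setup, number=1_000, repeat=5))
set_time = min(timeit.repeat("9_999 in data_set", setup=setup, number=1_000, repeat=5))

print(f"list membership: {list_time:.4f} s per 1,000 runs")
print(f"set membership:  {set_time:.4f} s per 1,000 runs")
```

The same measurement works from the shell: `python -m timeit -s "data = set(range(10_000))" "9_999 in data"`.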
ipdb
- Like pdb, but with IPython's features:
  - Tab completion, syntax highlighting, better tracebacks, better introspection...
- Use ipdb.set_trace() to add a breakpoint and drop into the debugger
django-debug-toolbar
- Displays debug information about the current request/response
- Built from panels, very modular
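A typical development-only setup, following the project's installation instructions; check the docs for the version you install, since details change between releases. This is a settings fragment, not a standalone script.

```python
# settings.py (development only)
DEBUG = True

INSTALLED_APPS = [
    # ... your other apps ...
    "django.contrib.staticfiles",  # the toolbar serves its own static files
    "debug_toolbar",
]

MIDDLEWARE = [
    # Place it early, but after any middleware that encodes the response (e.g. GZipMiddleware)
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ... the rest of your middleware ...
]

# The toolbar is only shown to requests coming from these addresses
INTERNAL_IPS = ["127.0.0.1"]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ... your other URL patterns ...
    path("__debug__/", include("debug_toolbar.urls")),
]
```

The SQL panel is usually the first stop: it lists every query for the request, with timings and duplicates, which makes N+1 patterns easy to spot.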
django-debug-toolbar-line-profiler
- A toolbar panel for line-by-line profiling
Django Debug Panel
- Chrome extension
- For AJAX requests and non-HTML responses
3) Tips and tricks
Add db indexes
- Single-column (db_index) or multi-column (index_together; Meta.indexes in newer Django versions)
- Be sure to profile and measure!
  - Sometimes it's not obvious (e.g., in the admin)
  - The difference can be huge, e.g., from 15 s to 3 ms (3.5M rows)
- But indexes use more space and slow down writes
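One way to "profile and measure" an index is to ask the database for its query plan. This sketch uses plain sqlite3 instead of Django so it runs anywhere; the `lead` table is hypothetical, and the `CREATE INDEX` is roughly what Django emits for `db_index=True` (or for a `models.Index` in `Meta.indexes`). On PostgreSQL or MySQL the equivalent tool is `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lead (id INTEGER PRIMARY KEY, email TEXT, score REAL)")
conn.executemany(
    "INSERT INTO lead (email, score) VALUES (?, ?)",
    [(f"user{i}@example.com", i / 10_000) for i in range(10_000)],
)
conn.commit()

QUERY = "SELECT id, score FROM lead WHERE email = ?"
PARAMS = ("user42@example.com",)


def query_plan(sql, params):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = query_plan(QUERY, PARAMS)  # full table scan, e.g. "SCAN lead"
# Roughly what db_index=True on the email field makes Django create
conn.execute("CREATE INDEX lead_email_idx ON lead (email)")
after = query_plan(QUERY, PARAMS)  # e.g. "SEARCH lead USING INDEX lead_email_idx (email=?)"

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the switch from a scan to a search using the index is what to look for.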
Do bulk operations
- They greatly reduce the number of SQL queries:
  - Model.objects.bulk_create()
  - qs.update(), possibly with F() expressions
  - qs.delete()
Get related objects
- Fetch FK fields in the same query (SQL JOIN):
  - qs.select_related()
- Fetch M2M fields with one extra query per relation:
  - qs.prefetch_related()
Slow admin?
- Use list_select_related
- Override get_queryset() to add prefetch_related()
- Does ordering use an index? Same for search_fields
- readonly_fields avoid the FK/M2M queries needed to fill select widgets
- Use the raw_id_fields widget (or better: django-salmonella)
- Extend admin/filter.html to show filters as a <select>
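Several of these tips combined in one `ModelAdmin`, as a fragment that assumes a hypothetical `shop` app whose `Order` model has a `customer` FK, a `created_by` FK, a `tags` M2M and a `created_at` field. It is not runnable on its own.

```python
# shop/admin.py (hypothetical app and models)
from django.contrib import admin

from .models import Order


@admin.register(Order)
class OrderAdmin(admin.ModelAdmin):
    list_display = ("id", "customer", "created_at", "tag_names")
    list_select_related = ("customer",)  # JOIN the customer into the changelist query
    raw_id_fields = ("customer",)  # plain ID input instead of a <select> with every customer
    readonly_fields = ("created_by",)  # shown as text, no <select> built from every user
    search_fields = ("customer__email",)  # make sure this column is indexed
    ordering = ("-created_at",)  # and this one too

    def get_queryset(self, request):
        # One extra query for all tags instead of one query per row
        return super().get_queryset(request).prefetch_related("tags")

    @admin.display(description="Tags")
    def tag_names(self, obj):
        # Served from the prefetch cache filled in get_queryset()
        return ", ".join(tag.name for tag in obj.tags.all())
```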
Cachalot
- Caches your Django ORM queries and automatically invalidates them
Queues and workers
- Do slow stuff later
- Some operations can be queued and executed asynchronously in workers
- Use Celery
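The pattern behind this slide, sketched with the standard library so it runs without a broker: the request only enqueues work, and worker threads process it later. This is not Celery's API; with Celery, as the slide recommends, `build_report` would become a function decorated with `@app.task` and the enqueue step would be `build_report.delay(lead_id)`. `build_report` is hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = []
results_lock = threading.Lock()


def build_report(lead_id):
    """Hypothetical slow task (e.g. rendering a PDF or calling an external API)."""
    return f"report-{lead_id}"


def worker():
    while True:
        lead_id = jobs.get()
        try:
            if lead_id is None:  # sentinel: this worker should stop
                return
            report = build_report(lead_id)
            with results_lock:
                results.append(report)
        finally:
            jobs.task_done()  # always mark the item done, even on errors


workers = [threading.Thread(target=worker, daemon=True) for _ in range(3)]
for thread in workers:
    thread.start()

# The web request only enqueues the work and can respond immediately
for lead_id in range(10):
    jobs.put(lead_id)

jobs.join()  # demo only: wait until every queued job is processed
for _ in workers:
    jobs.put(None)  # one sentinel per worker
for thread in workers:
    thread.join()

print(sorted(results))
```

An in-process queue loses its jobs when the process dies, which is the main reason to use Celery with a persistent broker in production.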
Cached sessions
- Set SESSION_ENGINE to use cached sessions:
  - Non-persistent (cache backend): don't hit the DB
  - Persistent (cached_db backend): don't hit the DB... so often
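A settings fragment for both options; the memcached address is a hypothetical example, and any configured cache backend works.

```python
# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # hypothetical memcached address
    }
}

# Non-persistent: sessions live only in the cache and vanish if it is flushed or evicts them
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

# Persistent alternative: writes also go to the database, reads are served from the cache
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```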
Persistent connections
- Use CONN_MAX_AGE to set the lifetime of a database connection, so it persists and is reused across requests
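A settings fragment; the database name is hypothetical and 600 seconds is just an example value.

```python
# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",  # hypothetical database name
        # Keep each connection open for up to 10 minutes and reuse it across requests.
        # 0 (the default) closes it at the end of every request; None never expires it.
        "CONN_MAX_AGE": 600,
    }
}
```

Each worker thread or process keeps its own connection, so check that the database's connection limit can accommodate all of them.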
UUIDs
- Use UUIDs for primary keys (instead of auto-incrementing IDs)
  - Practically unique, so collisions are avoided
  - UUIDs are well supported by database indexes
- Easier database sharding
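A stdlib sketch of why UUIDs help with sharding: IDs are generated on the app server without a central sequence, and the shard can be derived from the key alone. `N_SHARDS`, `new_id` and `shard_for` are hypothetical; the Django model field in the comment follows the pattern shown in the Django docs for `UUIDField`.

```python
import uuid

N_SHARDS = 4  # hypothetical number of database shards

# In a Django model, the equivalent primary key would be:
#     id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)


def new_id():
    """Generate a primary key on the app server, no central sequence needed."""
    return uuid.uuid4()


def shard_for(pk, n_shards=N_SHARDS):
    """Pick a shard from the key alone, so any server can route the row."""
    return pk.int % n_shards


ids = [new_id() for _ in range(1000)]
placement = {pk: shard_for(pk) for pk in ids}
per_shard = [list(placement.values()).count(s) for s in range(N_SHARDS)]
print(per_shard)  # roughly 250 per shard; exact counts vary per run
```

Plain modulo routing reshuffles most keys when the shard count changes, which is why consistent hashing or a lookup table is often used instead.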
Slow tests?
- Reuse the test database between runs (avoids re-creating it and re-running migrations): --keepdb
- Run tests in parallel: --parallel
- Disable unused middleware and INSTALLED_APPS, slow password hashers, logging, etc.
- Use mocking whenever possible
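A minimal mocking sketch with `unittest.mock`: the slow external call is replaced for the duration of each test, so the suite never waits on it. `ScoringClient` and `classify` are hypothetical.

```python
import time
import unittest
from unittest import mock


class ScoringClient:
    """Hypothetical client for a slow external scoring service."""

    def fetch_score(self, lead_id):
        time.sleep(2)  # simulates network latency we don't want in unit tests
        return 0.5


def classify(lead_id, client):
    return "hot" if client.fetch_score(lead_id) > 0.8 else "cold"


class ClassifyTests(unittest.TestCase):
    def test_hot_lead(self):
        # Replace the slow method for the duration of the block only
        with mock.patch.object(ScoringClient, "fetch_score", return_value=0.9) as fake:
            self.assertEqual(classify(42, ScoringClient()), "hot")
        fake.assert_called_once_with(42)

    def test_cold_lead(self):
        with mock.patch.object(ScoringClient, "fetch_score", return_value=0.1):
            self.assertEqual(classify(7, ScoringClient()), "cold")


if __name__ == "__main__":
    unittest.main(argv=["classify_tests"], exit=False)
```

In a Django project, the other tips map to `python manage.py test --keepdb --parallel` and, in test settings, a fast hasher such as `PASSWORD_HASHERS = ["django.contrib.auth.hashers.MD5PasswordHasher"]` (never in production).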
4) Conclusions
- Measure first
- Optimize only the bottleneck
- Go for the low-hanging fruit
- Measure again
Good resources
- The official Django documentation
- Book: "High Performance Django"
- Blog: "Instagram Engineering"
- "Latency Numbers Every Programmer Should Know"
Thanks for attending!
- Get the slides at http://slideshare.net/DZPM
- We are looking for engineers and data scientists!