High Performance Django

David Cramer
http://www.davidcramer.net/
http://www.ibegin.com/
Curse
•  Peak daily traffic of approx. 15m pages, 150m hits.
•  Average monthly traffic 120m pages, 6m uniques.
•  Python, MySQL, Squid, memcached, mod_python, lighty.
•  Most developers came strictly from PHP (myself included).
•  12 web servers, 4 database servers, 2 squid caches.

iBegin
•  Massive amounts of data, 100m+ rows.
•  Python, PHP, MySQL, mod_wsgi.
•  Small team of developers.
•  Complex database partitioning/synchronization tasks.
•  Attempting to not branch off of Django. 

Areas of Concern
•  Database (ORM)
•  Webserver (Resources, Handling Millions of Reqs)
•  Caching (Invalidation, Cache Dump)
•  Template Rendering (Logic Separation)
•  Profiling
Tools of the Trade
•  Webserver (Apache, Nginx, Lighttpd)
•  Object Cache (memcached)
•  Database (MySQL, PostgreSQL, …)
•  Page Cache (Squid, Nginx, Varnish)
•  Load Balancing (Nginx, Perlbal)

How We Did It
•  “Primary” web servers serving Django using mod_python.
•  Media servers using Django on lighttpd.
•  Static served using additional instances of lighttpd.
•  Load balancers passing requests to multiple Squids.
•  Squids passing requests to multiple web servers.

Lessons Learned
•  Don’t be afraid to experiment. You’re not limited to one.
•  mod_wsgi is a huge step forward from mod_python.
•  Serving static files using different software can help.
•  Send proper HTTP headers where they are needed.
•  Use services like S3, Akamai, Limelight, etc.
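
As an illustration of "proper HTTP headers", here is a minimal, framework-independent sketch of the two headers that let browsers and reverse proxies reuse a response. The function name `cache_headers` and its parameters are hypothetical; in a Django project the same effect would normally come from the framework's own cache-control helpers rather than hand-built strings.

```python
import time
from email.utils import formatdate


def cache_headers(max_age, public=True, now=None):
    """Build headers that allow a response to be cached for max_age seconds."""
    if now is None:
        now = time.time()
    # "public" lets shared caches (Squid, Varnish) store the page;
    # "private" restricts caching to the visitor's own browser.
    scope = "public" if public else "private"
    return {
        "Cache-Control": "%s, max-age=%d" % (scope, max_age),
        # HTTP dates must be expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


headers = cache_headers(300, now=0)
print(headers["Cache-Control"])  # public, max-age=300
```

Passing `now` explicitly is only there to make the sketch easy to test; real code would let it default to the current time.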

Webserver Software
Python Scripts
•  Apache (wsgi, mod_py, fastcgi)
•  Lighttpd (fastcgi)
•  Nginx (fastcgi)
Static Content
•  Apache
•  Lighttpd
•  Tinyhttpd
•  Nginx
Reverse Proxies
•  Nginx
•  Squid
•  Varnish
Software Load Balancers
•  Nginx
•  Perlbal

Database (ORM)
•  Won’t make your queries efficient. Make your own indexes.
•  select_related() can be good, as well as bad.
•  Inherited ordering (Meta: ordering) will get you.
•  Hundreds of queries on a page is never a good thing.
•  Know when to not use the ORM.
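
To make the "hundreds of queries on a page" point concrete, the sketch below uses the standard library's sqlite3 module instead of the Django ORM, so it runs anywhere. The table layout, the `QueryCounter` helper and the function names are all assumptions made for the illustration. It fetches the same poll listing twice: once with one extra query per row, once with a single JOIN.

```python
import sqlite3


def build_db():
    """Create a small in-memory database of polls and categories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE poll (
            id INTEGER PRIMARY KEY,
            name TEXT,
            category_id INTEGER REFERENCES category (id)
        );
        CREATE INDEX poll_category_idx ON poll (category_id);
        """
    )
    conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "News"), (2, "Sports")])
    conn.executemany(
        "INSERT INTO poll VALUES (?, ?, ?)",
        [(i, "Poll %d" % i, 1 + i % 2) for i in range(1, 11)],
    )
    return conn


class QueryCounter:
    """Run SELECTs on a connection and count how many were issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def fetch(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def one_query_per_row(db):
    """One query for the polls, then one more per poll for its category."""
    rows = []
    for poll_name, category_id in db.fetch("SELECT name, category_id FROM poll ORDER BY id"):
        (category_name,) = db.fetch("SELECT name FROM category WHERE id = ?", (category_id,))[0]
        rows.append((poll_name, category_name))
    return rows


def single_join(db):
    """The same listing fetched with one JOIN."""
    return db.fetch(
        "SELECT poll.name, category.name FROM poll "
        "JOIN category ON category.id = poll.category_id ORDER BY poll.id"
    )


conn = build_db()
slow, fast = QueryCounter(conn), QueryCounter(conn)
same_rows = one_query_per_row(slow) == single_join(fast)
print(same_rows, slow.count, fast.count)
```

With ten polls the first version issues eleven queries and the second issues one, and the gap grows with every row added to the page.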

Handling JOINs
from django.contrib.auth.models import User
from django.db import models
from django.shortcuts import render

# max_length and on_delete are required by current Django; the values are assumptions.
class Category(models.Model):
    name = models.CharField(max_length=100)
    created_by = models.ForeignKey(User, on_delete=models.CASCADE)

class Poll(models.Model):
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    created_by = models.ForeignKey(User, on_delete=models.CASCADE)

# We need to output a page listing all Polls with
# their name and their category's name.

def a_bad_example(request):
    # select_related() with no arguments follows every non-null foreign key:
    # Poll JOINs with User and Category, and Category JOINs with User a second time.
    polls = Poll.objects.all().select_related()
    return render(request, 'polls.html', {'polls': polls})

def a_good_example(request):
    # Name the relations you need explicitly in each case.
    polls = Poll.objects.all().select_related('category')
    return render(request, 'polls.html', {'polls': polls})

Template Rendering
•  Sandboxed engines are typically slower by nature.
•  Keep logic in views and template tags.
•  Be aware of performance in loops, and groupby (regroup).
•  Loaded templates can be cached to avoid disk reads.
•  Switching template engines is easy, but may not give you any worthwhile performance gain.
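
The point about caching loaded templates can be sketched without Django, using `string.Template` from the standard library as a stand-in for a real template engine. The `load_template` and `render` names and the `disk_reads` counter are assumptions for the illustration; Django ships its own cached template loader for the same purpose.

```python
import functools
import os
import string
import tempfile

disk_reads = 0  # counts how often the template file is actually opened


@functools.lru_cache(maxsize=128)
def load_template(path):
    """Read and parse a template file once; later calls are served from memory."""
    global disk_reads
    disk_reads += 1
    with open(path, encoding="utf-8") as handle:
        return string.Template(handle.read())


def render(path, context):
    return load_template(path).substitute(context)


with tempfile.TemporaryDirectory() as tmp:
    template_path = os.path.join(tmp, "poll.txt")
    with open(template_path, "w", encoding="utf-8") as handle:
        handle.write("$name ($category)")
    outputs = [
        render(template_path, {"name": "Poll %d" % i, "category": "News"})
        for i in range(3)
    ]

print(outputs, disk_reads)
```

The trade-off is that edits to the file are not picked up until the cache is cleared, which is usually acceptable in production and inconvenient in development.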

Template Engines
Caching
•  Two flavors of caching: object cache and browser cache.
•  Django provides built-in support for both.
•  Invalidation is a headache without a well-thought-out plan.
•  Caching isn’t a solution for slow-loading pages or improper indexes.
•  Use a reverse proxy in between the browser and your web servers: Squid, Varnish, Nginx, etc.

Cache With a Plan
•  Build your pages to use proper cache headers.
•  Create a plan for object cache expiration, and invalidation.
•  For typical web apps you can serve the same cached page for both anonymous and authenticated users.
•  Contain commonly used querysets in managers for transparent caching and invalidation.
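
One way to picture a plan that covers both expiration and invalidation is the small wrapper below. It is a plain-Python sketch, not Django manager code: `CachedLoader`, its `timeout` and `clock` parameters and the sample data are all hypothetical. The idea is that readers always go through `get()`, and every code path that writes the underlying data calls `invalidate()`.

```python
import time


class CachedLoader:
    """Cache the result of an expensive loader, with expiry and invalidation."""

    def __init__(self, loader, timeout=300, clock=time.monotonic):
        self.loader = loader
        self.timeout = timeout
        self.clock = clock
        self._value = None
        self._expires_at = None  # None means "nothing cached"

    def get(self):
        now = self.clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or expired entry: reload and restart the timer.
            self._value = self.loader()
            self._expires_at = now + self.timeout
        return self._value

    def invalidate(self):
        """Call this from whatever code path writes to the underlying data."""
        self._expires_at = None


polls = ["first"]
loads = []


def load_polls():
    loads.append(1)  # record each real load
    return list(polls)


fake_time = [0.0]  # controllable clock so the sketch is deterministic
cached = CachedLoader(load_polls, timeout=60, clock=lambda: fake_time[0])
first_read = cached.get()
second_read = cached.get()
print(first_read, len(loads))
```

Keeping the cache behind one object means the key, the timeout and the invalidation rule live in a single place instead of being repeated in every view.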

Cache Commonly Used Items
from django.core.cache import cache
from django.db import models

def my_context_processor(request):
    # We access object_list every time our context processors run,
    # so it makes sense to cache it.
    cache_key = 'mymodel:all'
    object_list = cache.get(cache_key)
    if object_list is None:
        # list() evaluates the queryset so that rows, not a lazy query, are cached.
        object_list = list(MyModel.objects.all())
        cache.set(cache_key, object_list)
    return {'object_list': object_list}

# Now that we are caching the object list, we are going to want to invalidate it.
class MyModel(models.Model):
    def save(self, *args, **kwargs):
        # Save first, then refresh the cache so that it includes this change.
        super().save(*args, **kwargs)
        cache.set('mymodel:all', list(MyModel.objects.all()))

Profiling Code
•  Finding the bottleneck can be time-consuming.
•  Tools exist to help identify common problematic areas.
–  cProfile/profile Python modules.
–  PDB (Python Debugger)
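
Outside of a web request, the same modules can be pointed at any function. The sketch below is a minimal example using only the standard library; `slow_sum` and `profile_call` are made-up names for the illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """A deliberately simple function to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    # Show the five most expensive entries by cumulative time.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


result, report = profile_call(slow_sum, 1000)
print(report)
```

Sorting by cumulative time tends to surface the high-level call that is slow; sorting by internal time ("time") points at the function where the work is actually done.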

Profiling Code With cProfile
import cProfile
import io
import pstats

from django.conf import settings
from django.utils.deprecation import MiddlewareMixin

class ProfilerMiddleware(MiddlewareMixin):
    def can(self, request):
        # Profile only in DEBUG mode, when ?prof is present, from an allowed IP.
        return (settings.DEBUG and 'prof' in request.GET and
                (not settings.INTERNAL_IPS or
                 request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS))

    def process_view(self, request, callback, callback_args, callback_kwargs):
        if self.can(request):
            # Keep the profiler on the request: one middleware instance serves many requests.
            request.profiler = cProfile.Profile()
            return request.profiler.runcall(callback, request, *callback_args, **callback_kwargs)

    def process_response(self, request, response):
        profiler = getattr(request, 'profiler', None)
        if profiler is None:
            # This request was not profiled; pass the response through untouched.
            return response
        out = io.StringIO()
        # Sort by internal time, as print_stats(1) did.
        pstats.Stats(profiler, stream=out).sort_stats('time').print_stats()
        response.content = '<pre>%s</pre>' % out.getvalue()
        return response

http://localhost:8000/?prof

Profiling Database Queries
import io

from django.conf import settings
from django.db import connection
from django.utils.deprecation import MiddlewareMixin

class DatabaseProfilerMiddleware(MiddlewareMixin):
    def can(self, request):
        return (settings.DEBUG and 'dbprof' in request.GET and
                (not settings.INTERNAL_IPS or
                 request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS))

    def process_response(self, request, response):
        if not self.can(request):
            return response
        out = io.StringIO()
        out.write('time\tsql\n')
        total_time = 0.0
        # query['time'] is a string holding seconds; sort numerically, slowest first.
        for query in sorted(connection.queries, key=lambda q: float(q['time']), reverse=True):
            total_time += float(query['time'])
            out.write('%s\t%s\n' % (query['time'], query['sql']))
        response.content = '<pre style="white-space:pre-wrap">%d queries executed in %.3f seconds\n\n%s</pre>' % (
            len(connection.queries), total_time, out.getvalue())
        return response

http://localhost:8000/?dbprof
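
The summarizing step of the middleware can be tried on its own. The sketch below assumes input shaped like Django's `connection.queries` (a list of dicts with `sql` and `time` keys, where `time` is a string of seconds); the `summarize_queries` name and the sample data are invented for the illustration.

```python
def summarize_queries(queries, limit=5):
    """Return (count, total_seconds, slowest) for connection.queries-style data."""
    # Convert before sorting: the times arrive as strings, and comparing
    # them as numbers is safer than comparing them as text.
    timed = [(float(q["time"]), q["sql"]) for q in queries]
    slowest = sorted(timed, key=lambda item: item[0], reverse=True)[:limit]
    return len(timed), sum(seconds for seconds, _ in timed), slowest


sample = [
    {"sql": "SELECT * FROM poll", "time": "0.012"},
    {"sql": "SELECT * FROM category WHERE id = 1", "time": "0.003"},
    {"sql": "SELECT * FROM auth_user", "time": "0.100"},
]
count, total, slowest = summarize_queries(sample, limit=2)
for seconds, sql in slowest:
    print("%.3f\t%s" % (seconds, sql))
```

A page whose count is in the hundreds, or whose total is dominated by one statement, is usually the first place to look.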
Summary
•  Database efficiency is the typical problem in web apps.
•  Develop and deploy a caching plan early on.
•  Use profiling tools to find your problematic areas. Don’t pre-optimize unless there is good reason.
•  Find someone who knows more than me to configure your server software.
Thanks!
Slides and code available online at:

http://www.davidcramer.net/djangocon
