Skip to content

Profiling Datastore Access on App Engine

December 15, 2008

When using an ORM it’s sometimes not obvious when the datastore is accessed. For example in Rietveld the models have some custom methods and when used in templates it’s sometimes hard to guess if you’re using an db.Property already fetched from the datastore or if an attribute is a method that executes further calls to the datastore API behind the scenes.

Take this snippet:

{%if patch.no_base_file%}
  Do something
{%else%}
  Do something else
{%endif%}

Looks pretty straight forward, but “no_base_file” is a method that fetches another object from the datastore and returns True or False depending on the related object. Imagine this in a forloop and you get easily a bunch of calls to the datastore without actually knowing about them…

Datastore access during template rendering is just one aspect of an application where it isn’t always obvious what actually happens on the datastore. In general, when using attributes and methods – or in other words, when using an object-relational mapper – datastore access gets easily out of sight. With larger applications you’ll soon reach a point where you need to profile datastore access.

When I first started to write some profiling code, I’ve chosen the monkey patch approach and replaced datastore.Put/Get/Delete with wrapper functions that do some logging. This thread on the App Engine mailing list pointed me to a middleware (see “Debugging datastore access”) that uses the same approach. Soon it turned out that patching each datastore related function doesn’t cover all calls to the datastore API. There are GQL queries and reference properties too…

But if you dig a little deeper, you’ll soon encounter apiproxy_stub_map.MakeSyncCall. According to it’s doc string “The APIProxy entry point” – a handy function for our purposes with the following signature:

MakeSyncCall(service, call, request, response)

Let’s have a closer look at it’s arguments (again, copied from the doc string):

service: string representing which service to call
  call: string representing which function to call
  request: protocol buffer for the request
  response: protocol buffer for the response

The service we are interested in is “datastore_v3”. So when patching this function we should have a single place where every datastore access must come across. “call” is perfect for logging, it’s a string and determines the action (e.g. “Get”, “Put”, …). “request” and “response” are not the WSGI ones, they are the datastore query and the query result. Both are protocol buffer objects that have nice string representations. So the collecting profiling data is easy as:

profile_data.append((call, str(request), str(response))

BTW, it works both when using the SDK and in production environment. The other services are “images”, “mail”, “memcache”, “urlfetch” and “user”.

Now that we have a bunch of profiling data, we’ve a new problem too. For some pages the collected data is too large to add it at the end of each response. Besides that, adding profiling data to the response has other bad side-effects: It breaks AJAX/JSON responses or responses processed by command line scripts and it gets lost when a page controller does a redirect.

You can avoid this by storing the profiling data on a per-call basis somewhere in memcache and use a separate AJAX-driven page to view it. Of course, it’s still required to cut some data for large response or request objects if it’s larger than 1MB to be able to store it in memcache.
Here’s a screenshot showing some data collected from the tasks example application:

And finally it took just a single look at the collected data to spot some expensive code parts like in the example above.

(Monkey) patching App Engine is always critical. So this approach is not meant to be used in production environment, even though it works in both environments. A known problem is when the SDK reloads some modules some objects may become invalid in the profiler code.

Refer to this App Engine Cookbook recipe if you want to use this datastore profiler in your own applications.

2 Comments
  1. Antonin Hildebrand permalink
    December 15, 2008 9:55 am

    Thanks for great post Andi!

    I’m considering adding optional GAE support for database profiling into FirePython. Your example can be valuable start point for to move things forward.

  2. Ubaldo permalink
    December 15, 2008 3:35 pm

    Hi Antonin Hildebrand

    I’ve been using your firepython code, looks great. Adding support for this datastore profiling will be great.

    BTW, is there a way to clear the firepython output?

Comments are closed.

%d bloggers like this: