
Blog moved

Blog moved here:

Sorry for the inconvenience, but now I can use reStructuredText to write posts :)

App Engine Tracebacks via E-Mail

April 30, 2009

Edit 2009-09-04: Starting with App Engine SDK 1.2.5, released today, there’s now a much smarter approach. See the docstring of the ereporter module for details!

Django’s error reporting via e-mail is something I’ve missed for my App Engine applications. If you’ve followed the instructions on how to use Django on App Engine, errors are already logged. But honestly, I don’t check the logs very often, and sometimes it’s just a silly typo that’s fixed in a minute ;-)

So if you’ve followed the instructions you can simply extend the log_exception function to send an email after the exception is logged:

def log_exception(*args, **kwds):
    """Django signal handler to log an exception."""
    excinfo = sys.exc_info()
    cls, err = excinfo[:2]
    subject = 'Exception in request: %s: %s' % (cls.__name__, err)

    # Send an email to the admins
    if 'request' in kwds:
        try:
            repr_request = repr(kwds['request'])
        except Exception:
            repr_request = 'Request repr() not available.'
    else:
        repr_request = 'Request not available.'
    msg = ('Application: %s\nVersion: %s\n\n%s\n\n%s'
           % (os.getenv('APPLICATION_ID'), os.getenv('CURRENT_VERSION_ID'),
              repr_request, ''.join(traceback.format_exception(*excinfo))))
    subject = '[%s] %s' % (os.getenv('APPLICATION_ID'), subject)
    mail.send_mail_to_admins('APP_ADMIN@EXAMPLE.COM', subject, msg)

Make sure to replace “APP_ADMIN@EXAMPLE.COM” with the email address of an admin for this application and to add these two imports somewhere at the top of your file:

import traceback
from google.appengine.api import mail

You don’t need to comment this out when developing your app to avoid lots of useless mails. The dev server doesn’t fully implement send_mail_to_admins(), most likely because there’s no real concept of a designated admin user in the SDK. It just writes a short message to the log, even if the sendmail option is enabled.

P.S.: If you use webapp instead of Django, have a look at the seventh slide of this presentation by Ken Ashcraft. He shows a similar implementation for webapp based applications.


SQL Parsing with Python, Pt. II

March 29, 2009

After describing some basics in part I, let’s have a closer look at the actual Python module, called sqlparse. First off, have a look at the project page for how to download and install this module in case you’re interested. But now, let’s have some fun…

The API of the module is pretty simple. It provides three functions at module level: sqlparse.split(sql) splits sql into separate statements, sqlparse.parse(sql) parses sql and returns a tree-like structure, and sqlparse.format(sql, **kwds) returns a beautified version of sql according to kwds.

As mentioned in my previous post, the effort needed to build the return values depends on how much lexing and parsing work is required to find the result. For example, sqlparse.split() does the following:

  • Generate a token-type/value stream, basically with a copy of the Pygments lexer
  • Apply a filter (in terms of a Pygments stream filter) to find statements
  • Serialize the token stream back to unicode
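Outside of sqlparse, the three steps can be sketched in plain Python, with a toy regex lexer standing in for the Pygments machinery (all names here are illustrative, not sqlparse internals):

```python
import re

# Step 1: a toy lexer producing (token_type, value) pairs; sqlparse uses
# a copy of the Pygments SQL lexer instead of this single regex. String
# literals are consumed whole, so a ';' inside quotes is not a boundary.
TOKEN_RE = re.compile(r"'[^']*'|;|\s+|[^\s;']+")

def lex(sql):
    for match in TOKEN_RE.finditer(sql):
        value = match.group()
        ttype = 'Punctuation' if value == ';' else 'Token'
        yield ttype, value

# Step 2: a stream filter (in the Pygments sense) that cuts the stream
# at statement-terminating semicolons.
def split_stream(tokens):
    current = []
    for ttype, value in tokens:
        current.append(value)
        if ttype == 'Punctuation':
            yield ''.join(current).strip()
            current = []
    rest = ''.join(current).strip()
    if rest:
        yield rest

# Step 3: serialize each token run back to a string.
statements = list(split_stream(lex('select * from foo; select * from bar;')))
```

The real implementation keeps the stream lazy end to end, which is exactly why split() stays so much cheaper than parse().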

sqlparse.parse() does all the grouping work, and sqlparse.format() runs through those groups, modifies them according to the given formatting rules and finally converts them back to unicode.

Here’s an example session in a Python shell:

>>> import sqlparse
>>> # Splitting statements:
>>> sql = 'select * from foo; select * from bar;'
>>> sqlparse.split(sql)
[u'select * from foo;', u'select * from bar;']
>>> # Formatting statements:
>>> sql = 'select * from foo where id in (select id from bar);'
>>> print sqlparse.format(sql, reindent=True, keyword_case='upper')
SELECT *
FROM foo
WHERE id IN
  (SELECT id
   FROM bar);

>>> # Parsing
>>> sql = 'select * from "someschema"."mytable" where id = 1'
>>> res = sqlparse.parse(sql)
>>> res
[<Statement 'select...' at 0x...>]
>>> stmt = res[0]
>>> stmt.to_unicode()  # converting it back to unicode
u'select * from "someschema"."mytable" where id = 1'
>>> # This is how the internal representation looks like:
>>> stmt.tokens

Now, how does the grouping work? Grouping is done with a set of simple functions. Each function searches for a simple pattern, and if it finds one, a new group is built. Let’s have a look at the function that finds WHERE clauses.

def group_where(tlist):
    # Handle possible subgroups first (bottom-up approach).
    [group_where(sgroup) for sgroup in tlist.get_sublists()
     if not isinstance(sgroup, Where)]
    idx = 0
    token = tlist.token_next_match(idx, T.Keyword, 'WHERE')
    stopwords = ('ORDER', 'GROUP', 'LIMIT', 'UNION')
    while token:
        tidx = tlist.token_index(token)
        end = tlist.token_next_match(tidx+1, T.Keyword, stopwords)
        if end is None:  # WHERE is at the end of the statement
            end = tlist.tokens[-1]
        else:
            end = tlist.tokens[tlist.token_index(end)-1]
        group = tlist.group_tokens(Where, tlist.tokens_between(token, end))
        idx = tlist.token_index(group)
        token = tlist.token_next_match(idx, T.Keyword, 'WHERE')

tlist is a list of tokens; possible subgroups are handled first (bottom-up approach). Then the function grabs the first occurring “WHERE” and looks for the next matching stop word. I’m pretty unsure if the stop words approach is right here, but at least it works for now… If it finds a stop word, a group using the class Where is created, the tokens between WHERE and the stop word are attached to it, and – still within the while loop – the next WHERE keyword is used as the next starting point.

So why not use a grammar here? For one thing, this piece of code is pretty simple and easy to maintain. But it can also handle grammatically incorrect statements more leniently: e.g. it’s no problem to add a check so that – when, for example, an unexpected token occurs – the function just jumps to the next occurrence of WHERE without changing anything or even raising an exception. To achieve this the token classes provide helper functions to inspect the surroundings of an occurrence (in fact, just simple list operations). There’s no limitation on what a grouping function can do with the given token list, so you could even “guess” a proper group with some nasty algorithm.

The current problem with this approach is performance. Here are some numbers:

Size                    split()      parse()
100kb (20600 tokens)    0.3 secs.    1.8 secs.
1.9MB (412500 tokens)   5.53 secs.   37 secs.

Most of the performance is lost where the stream-oriented approach is given up in the parsing phase. The numbers are based on the first unrevised working version. I expect performance improvements in upcoming versions, especially in how token lists are handled behind the scenes. For real-life statements the parser behaves quite well. BTW, the Pygments lexer takes about 6 seconds (compared to 5.5 secs. for splitting) for the 1.9MB of SQL.

The non-validating approach is a disadvantage too. You’ll never know if a statement is valid. You can even parse Middle High German phrases and receive a result:

>>> sqlparse.parse('swer an rehte güete wendet sîn gemüete')[0].tokens

It’s up to the user of this module to provide suitable input and to interpret the output. Furthermore, the parser doesn’t support every nifty edge of an SQL dialect. Currently it’s mostly ANSI-SQL with some PostgreSQL specific stuff. But it should be easy to implement further grouping functions to provide more SQL varieties.

The splitting feature is currently used by CrunchyFrog and does a pretty good job there. I assume that SQL splitting works stably and reliably in most cases. Beautifying and parsing are very new in the module and full functionality needs to be proven with time. Luckily the top-level API with its three functions is damn simple and keeps the doors open for significant changes behind the scenes if they’re needed.

The sources of the sqlparse module are currently hosted on but may move to Google Code anytime soon. Refer to the project page on Google Code for downloads and how to access the sources.

In addition there’s a simple App Engine application that exposes the formatting features as an online service.

SQL Parsing with Python, Pt. I

March 29, 2009

Some time ago I was in search of some kind of SQL parser module for Python. As you can guess, my search wasn’t really successful. pyparsing needs a grammar, but I’m not really interested in writing a full-blown grammar for SQL with its various dialects. AFAICT Gadfly implements a customized SQL parser, but I was not able to figure out how to use it. And there’s sqlliterals, but as the name suggests it’s just for identifying literals within statements.

I expect such a module to do the following:

  • It should be fast ;-)
  • scalable – what the parser needs to know about a string containing SQL statements depends on what I want to do with it. If I just want to split that string in separate statements I don’t need to know as much as when I want to know what identifiers occur in it.
  • non-validating – parsing shouldn’t fail if the statement is syntactically incorrect, as I want to use it for an SQL editor where the statements are (naturally) incorrect most of the time.
  • some beautifying is a nice-to-have – We all know SQL generated by a script and copied from a logging output to a database front-end – it always looks ugly and is pretty unreadable.

The only thing I’ve found that comes close to my needs is Pygments (yes, the syntax highlighter). It does a really good job highlighting SQL, so it “understands” at least something. To be concrete, it has a rock-solid and fast lexer. There’s a FAQ entry on whether it’s possible to use Pygments for programming language processing. The answer is:

The Pygments lexing machinery is quite powerful and can be used to build lexers for basically all languages. However, parsing them is not possible, though some lexers go some steps in this direction in order to e.g. highlight function names differently.

Also, error reporting is not the scope of Pygments. It focuses on correctly highlighting syntactically valid documents, not finding and compensating errors.

There’s a nice distinction between lexers and parsers in this answer. The former does lexical analysis and the latter syntactical analysis of a given stream of characters. For my needs this led to a two-step process: first do the lexing using Pygments’ mechanism (BTW, read “focuses on […] syntactically valid documents, not finding […] errors” as “non-validating”), then add parsing on top of the token stream.

So I stripped the lexing mechanism for SQL out of Pygments to get rid of some overhead not needed for my purposes (e.g. loading of extensions). The only changes to the Pygments tokenizer were to replace a huge regex for finding keywords with a dictionary-based lookup and to add a few new tokens to make life easier in the second processing stage. The performance improvement achieved by replacing the regex doesn’t play a significant role for Pygments; it just speeds up the lexing a bit.

In the second step the very efficient token-type/value stream generated in step 1 is grouped together into nested lists of simple classes. Instantiating a bunch of class instances results in a performance loss, but with the benefit of some helpful methods for analyzing the now-classified tokens. As the classes are pretty simple, most of the performance loss was recaptured by using __slots__.
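As an illustration (a hypothetical class, not sqlparse’s actual token implementation), such a slotted token class could look like this:

```python
class Token(object):
    """A lexed token. __slots__ avoids a per-instance __dict__, which
    recaptures most of the cost of creating many small objects."""
    __slots__ = ('ttype', 'value')

    def __init__(self, ttype, value):
        self.ttype = ttype
        self.value = value

    def match(self, ttype, value):
        """Helper for syntactical analysis, e.g. case-insensitive
        keyword matching during grouping."""
        return self.ttype == ttype and self.value.upper() == value.upper()

tok = Token('Keyword', 'where')
```

Because of __slots__, instances carry only the two declared attributes and no instance dictionary at all.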

The grouping mechanism is again non-validating. It tries to find patterns like identifiers and their aliases or WHERE clauses and their conditions in the token stream, but it just leaves the tokens unchanged if it doesn’t find what it’s looking for. So it actually defines some sort of “grammar”, but a very unrestrictive one. That means that the quality of answers to questions like “Is there a WHERE clause?” or “Which tables are affected?” heavily depends on the syntactical clearness of the given statement at the time it’s processed, and it’s up to the user or application using the module to interpret the answers right. As the module is meant as a helper during the development process of SQL statements, it doesn’t insist on syntactical correctness the way the targeted database system would when the statement is executed.

All in all these are the two (simple) processing steps:

  1. Lexing a given string using regular expressions and generate a token-type/value stream (in fact, this is the Pygments lexer).
  2. Parsing
    1. Instantiate a simple token class for each item in the stream to ease syntactical analysis.
    2. Group those tokens into nested lists using simple pattern-finding functions.

Depending on what should be done with the statements, only the first step is required. For example, splitting statements or simple formatting rules don’t require parsing.

Pt. II of this blog post will cover the two remaining points: scalability and the beautifying goodie. And it will give a closer look at the module I came up with, including some performance stats. For the impatient: the source code of this module is available here.

Creating Man Pages Using optparse and distutils

March 17, 2009

This blog post describes how to generate a simple man page during build time for Python applications using distutils and optparse. Technically this post describes how to write a custom HelpFormatter and a custom build command.

For GUI applications, command line options usually don’t bother me much. Just in some rare cases, mainly when debugging or feeling unlucky with the startup behavior, I’m interested in these options. Even then I prefer to use “--help”. BTW, for the “real” command line tools like sed, ls or grep this is totally different.

So, due to my lack of interest in man pages for GUI applications, I’ve shipped the previous release of CrunchyFrog along with a totally outdated man page. Moreover, I really don’t like writing man pages or keeping them in sync with the actual command line options already documented by OptionParser’s help features…
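To give a taste of the HelpFormatter part of the approach (a minimal sketch with made-up groff markup, not the full code from this post), a custom formatter can render the options section in man-page syntax:

```python
import optparse

class ManPageFormatter(optparse.HelpFormatter):
    """Sketch of a HelpFormatter that emits groff/man markup for the
    options section instead of plain terminal help text."""

    def __init__(self):
        optparse.HelpFormatter.__init__(self, indent_increment=0,
                                        max_help_position=0, width=80,
                                        short_first=1)

    def format_heading(self, heading):
        # Section headings become .SH macros.
        return '.SH %s\n' % heading.upper()

    def format_option(self, option):
        # Each option becomes a tagged paragraph with the option
        # strings in bold.
        opts = self.option_strings[option]
        return '.TP\n\\fB%s\\fR\n%s\n' % (opts, option.help or '')

parser = optparse.OptionParser(prog='frog')
parser.add_option('--debug', action='store_true',
                  help='enable debug output')
text = parser.format_option_help(ManPageFormatter())
```

A custom distutils build command would then call something like this at build time and write the result next to the other generated files.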
Read more…

Something completely different

February 6, 2009

Spent half of the evening crawling through sites I’ve rarely visited in the last few months.

There are at least two “new” Sandman songs on MySpace. Ok, they’re not new at all, but mystic… ;-) BTW, if you haven’t done so already, support the Mark Sandman Music Project in any way…

While clicking links back and forth I came across a Hi-N-Dry photostream on Flickr. hm.. the new rooms look way too clean compared to the legendary old ones (but that’s maybe just because they’re new…)

While talking about fresh music, you definitely have to listen to or view the Chip Smith Project! What refreshing music – ol’ Chip Smith must have been a very honorable man!

A cleaner UI for CrunchyFrog

January 29, 2009

Today I’ve released a new version of CrunchyFrog. Besides being mainly a bug fix release, CrunchyFrog 0.3.2 introduces some enhancements under the hood.

The most visible is of course a cleaner user interface. The more or less experimental dock implementation was removed in favour of a – at least in my opinion – cleaner interface heavily inspired by GEdit.

Along with these changes I’ve cleaned up some requirements: I removed modules that were only used in a few places and made others optional. The latter includes the various GNOME bindings. These changes were made to provide initial support for running CrunchyFrog with other desktop environments and on different platforms.

With this release a new package archive on Launchpad opened its doors. If you’re an Ubuntu (or Debian) user you can use this PPA to update your CrunchyFrog installation from this repository.

Read the full announcement for more information about what has changed since the last release.

Any further enhancements like a terminal plugin to access native database command lines within the application or a somewhat improved schema browsing will go into the next major release.

Thanks to everyone for giving feedback, reporting bugs and for the good ideas during the last months!

Have fun with it and – as always – feel free to report any bugs or feature requests… ;-)

Profiling Datastore Access on App Engine

December 15, 2008

When using an ORM it’s sometimes not obvious when the datastore is accessed. For example, in Rietveld the models have some custom methods, and when they are used in templates it’s sometimes hard to guess whether you’re using a db.Property already fetched from the datastore or whether an attribute is a method that executes further calls to the datastore API behind the scenes.

Take this snippet:

{% if patch.no_base_file %}
  Do something
{% else %}
  Do something else
{% endif %}

Looks pretty straightforward, but “no_base_file” is a method that fetches another object from the datastore and returns True or False depending on the related object. Imagine this in a for loop and you easily get a bunch of calls to the datastore without actually knowing about them…
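One way to make such hidden calls visible – shown here as a plain-Python sketch with a stand-in model class, not App Engine’s actual profiling hooks – is to wrap the suspect method with a call counter:

```python
import functools

def count_calls(method):
    """Wrap a method and count how often it runs, to reveal datastore
    accesses hidden behind innocent-looking attribute access."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwds):
        wrapper.calls += 1
        return method(self, *args, **kwds)
    wrapper.calls = 0
    return wrapper

class Patch(object):
    """Stand-in for a Rietveld-style model class."""

    @count_calls
    def no_base_file(self):
        # In the real model this triggers another datastore fetch.
        return True

patch = Patch()
for _ in range(5):  # e.g. rendered inside a template for-loop
    patch.no_base_file()
```

After rendering, the counter tells you how many round trips the template quietly caused – five here, one per loop iteration.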
Read more…