Rangaku taking shape

I spent the past three days doing a lot of coding for my Dutch-Japanese dictionary site Rangaku (perhaps better known as Kouyou). I am happy to see that things are finally starting to come together. I had intended to publish things much sooner, but I had a lot of catching up to do on the Python front. Thanks partly to my new job and partly to my own endeavours in my spare time, I am mastering Python, at least to a level where I feel reasonably comfortable with it. Of course, the release of tools like Werkzeug, Genshi and SQLAlchemy helped me a lot as well.


Python’s sys.stdout loses encoding

When you use Python with sys.stdout you might run into a problem where sys.stdout.encoding suddenly becomes None. This happens because, at least under Unix, once output goes through a pipe or redirection Python no longer knows anything about the target. To work around this you can fall back on locale.getpreferredencoding(). So if you use encode() on a string you can do something like:

import sys
from locale import getpreferredencoding

text = u"Something special"

# Fall back to the locale's encoding, then ASCII, when stdout has no encoding
print text.encode(sys.stdout.encoding or getpreferredencoding() or 'ascii', 'replace')

This is how we currently use it within Babel as well for printing the locale list.
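The fallback chain can also be wrapped in a small helper. What follows is only a sketch in Python 3 syntax; pick_encoding and encode_for are names made up for this example and are not part of Babel.

```python
import locale
import sys


def pick_encoding(stream):
    """Return the stream's encoding, falling back to the locale, then ASCII."""
    # Piped or redirected streams may report None (or lack the attribute).
    return getattr(stream, "encoding", None) or locale.getpreferredencoding() or "ascii"


def encode_for(stream, text):
    """Encode text for the stream, replacing characters it cannot represent."""
    return text.encode(pick_encoding(stream), "replace")


if __name__ == "__main__":
    # Write the encoded bytes straight to the underlying binary stream.
    sys.stdout.buffer.write(encode_for(sys.stdout, "Something special\n"))
```

Keeping the choice of encoding in one place means every print site uses the same fallback order.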


SQLAlchemy and simple WHERE clauses using AND

These posts use code from the Trac project. I’m using the question mark notation for in-place variable substitution; this is where you would normally use either direct variables or an indirection mechanism.

If you have done SQL before you are familiar with the syntax such as:

SELECT name FROM auth_cookie WHERE cookie = ? AND ipnr = ?;
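To make the question mark notation concrete, here is a minimal sketch using Python's built-in sqlite3 module, which accepts the same ? placeholders. The in-memory table and the sample row are invented for illustration and only mirror the column names used in the query; lookup_name is a hypothetical helper, not Trac code.

```python
import sqlite3

# In-memory database with a table mirroring the columns in the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_cookie (cookie TEXT, name TEXT, ipnr TEXT, time INTEGER)")
# Sample row, invented for this example.
conn.execute(
    "INSERT INTO auth_cookie VALUES (?, ?, ?, ?)",
    ("abc123", "alice", "192.0.2.1", 0))


def lookup_name(conn, cookie, ipnr):
    """Return the names whose row matches both the cookie and the IP number."""
    # Each ? is filled from the tuple, in order, by the database driver.
    cursor = conn.execute(
        "SELECT name FROM auth_cookie WHERE cookie = ? AND ipnr = ?",
        (cookie, ipnr))
    return [row[0] for row in cursor]
```

Because the driver fills in the values, they never need to be quoted or escaped by hand.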

So, how does one do this with SQLAlchemy?

With SQLAlchemy (SA for short) you first declare a schema within Python:

from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()

auth_cookie = Table('auth_cookie', metadata,
        Column('cookie', String, primary_key=True),
        Column('name', String, primary_key=True),
        Column('ipnr', String, primary_key=True),
        Column('time', Integer))

Next you import this schema (living within Trac as trac/db/schema.py) as follows:

from trac.db.schema import auth_cookie

This allows direct manipulation through auth_cookie. For a SQL SELECT we need to extend our code as follows:

from sqlalchemy import select

This allows us to build an almost equivalent statement as follows:

statement = select([auth_cookie.c.name], auth_cookie.c.cookie==?)

To add the AND clause, SA has a very simple function that you import into your code:

from sqlalchemy import and_, select

This allows us to extend the previous statement as such:

statement = select([auth_cookie.c.name], and_(auth_cookie.c.cookie==?, auth_cookie.c.ipnr==?))

Similarly there’s an or_() function as well, which works exactly the same.

Now the difficulty arose because this SQL query changed its WHERE clause depending on an if/else. The regular case was the first statement we created; the other case added the cookie’s IP number into the equation. So how to deal with that?

statement = select([auth_cookie.c.name], auth_cookie.c.cookie==?)
if self.check_ip:
    statement.append_whereclause(and_(auth_cookie.c.ipnr==?))

As you can see, depending on whether or not check_ip is set it changes the statement in-place and expands the WHERE-clause with an AND for ipnr.
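Newer SQLAlchemy releases (1.4 and later) favour a generative style, where where() returns a new statement instead of changing it in place, and a second where() is ANDed onto the first. A rough sketch of the same conditional query under that assumption, with bindparam() standing in for the question marks; build_lookup and the bind names are hypothetical and the table is redeclared here only to keep the example self-contained.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, bindparam, select

metadata = MetaData()
auth_cookie = Table(
    "auth_cookie", metadata,
    Column("cookie", String, primary_key=True),
    Column("name", String, primary_key=True),
    Column("ipnr", String, primary_key=True),
    Column("time", Integer),
)


def build_lookup(check_ip):
    """Build the SELECT, adding the ipnr condition only when check_ip is set."""
    statement = select(auth_cookie.c.name).where(
        auth_cookie.c.cookie == bindparam("cookie"))
    if check_ip:
        # where() returns a new statement; the extra condition is ANDed on.
        statement = statement.where(auth_cookie.c.ipnr == bindparam("ipnr"))
    return statement
```

Printing str(build_lookup(True)) shows the generated SQL with named placeholders, which is a handy way to check what the if/else produced.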


FreeBSD, SQLite, FTS2 and SQLAlchemy

I was trying to use the installed SQLite 3.4.1 port with Python and SQLAlchemy, and the moment I wanted to create a table within the database, Python crashed.

After a bunch of debugging it turned out that enabling the FTS2 option of the port causes these crashes. The sqlite3Fts2InitHashTable() call is where it fails. I notified the port maintainer and in the meantime rebuilt without FTS2.


Easily amused, I guess

From Python’s PEP 3099:

Simple is better than complex. This idea extends to the parser. Restricting Python’s grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, like Perl.

Well, I found it funny at least.


CherryPy, lighttpd and flup

For a personal project I found myself writing bits and pieces to do URL dispatch handling (using Routes), and then having to write more and more specific handling cases that I just knew were probably already taken care of by some sort of framework. Now, Django and TurboGears, however great, seemed to be overkill for this, since I will design some stuff from scratch because my demands are just very particular. Then I remembered CherryPy, which advertises itself as quite a bare-bones HTTP framework.

CherryPy uses a regular expression syntax for URL dispatching that people familiar with Django might recognise. It is a decent way to dispatch URLs, but once you’ve seen Routes’ maps, using such regular expressions feels hackish. Well, at least to me; use whatever works for you.

There was only one issue: how on earth would I put all this in a FastCGI setup? I had previously, for my own script, used flup’s fcgi WSGIServer class to kickstart my application. This means that my Lighttpd environment configures a Python file as a FastCGI script and creates a Unix domain socket to connect through. This worked quite well, so I set out to convert my old setup to use CherryPy and Routes. The first hurdle I encountered was that using Routes with CherryPy is not documented well (or at least it was not at the time of this writing). Nowhere does the documentation page mention the magic incantation to switch dispatchers from the default to, say, RoutesDispatcher. Using some Google-magic as well as discussing this with Alec Thomas (of Trac fame) I arrived at the following Python code to switch the dispatcher around:

import cherrypy
from project.controllers import *
dispatcher = cherrypy.dispatch.RoutesDispatcher()
dispatcher.connect('home', '', controller=HomeController())
config = {'/': {'request.dispatch': dispatcher}}
app = cherrypy.tree.mount(None, config=config)

Next I started to pass app to my WSGIServer class to run it. I noticed that something strange was actually happening when I checked with sockstat on my FreeBSD machine if there were any Unix domain sockets left after stopping Lighttpd. And indeed, there was a stray socket left. Now that was funny, since normally after Lighttpd has sent a termination signal (SIGTERM) to the spawned processes, they shut down and the socket gets cleaned up. What was left was something like this:

USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
www      python     955   0  stream /tmp/labs.sock-0

So the file descriptor 0 socket is still left. This is, according to Unix tradition, standard input, so in effect it is still waiting to handle data. But wait a second, we told it to shut down, but it didn’t completely. Using the top command I looked for the process id (PID) and found a line like this:

PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
955 www          9  20    0 14784K 11556K kserel 0   1:25  0.00% python

When everything is still running normally it displays as:

PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
955 www         10  20    0 14784K 11556K kserel 0   1:25  0.00% python

Notice the difference in the thread (THR) column (going from 10 to 9). So apparently Lighttpd sends a SIGTERM to the process, which succeeds in killing off one thread and then lets the other nine stay and wait for new requests to serve. Now, this would not be so bad, were it not that every stop/start cycle spawns another process with ten threads, thus wasting valuable resources. So clearly this problem had to be solved.

The current code I had in place was (partially lifted from an older Trac FCGI start script):

app = cherrypy.tree.mount(None, config=config)
cherrypy.engine.start(blocking=False)
WSGIServer(app).run()

The traceback should get printed to the browser if the WSGIServer cannot be started for whatever reason or if it raises an exception.

I finally realised that it had to be CherryPy that was not shutting down as it should, especially since this had worked with flup’s WSGIServer and my own code before! Furthermore, in a thread I started on the CherryPy-users group over at Google, Robert Brewer pointed me (mistakenly, as he later pointed out) to a page detailing the CherryPy HTTPServer API. Even though it was not correct in this case (CherryPy is not the controller here), it did point out one thing: a thread_pool attribute set to a default value of 10! So this really confirmed my thought that CherryPy was not getting closed down as it should.

The solution to such a problem, as with most things in life I guess, was rather an anti-climax: the code above had to be changed to this:

app = cherrypy.tree.mount(None, config=config)
cherrypy.engine.start(blocking=False)
try:
    WSGIServer(app).run()
finally:
    cherrypy.engine.stop()

And that’s it!

Update 2007-06-02 10:57: Stripped the exceptions, they actually do not add much in this case.
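The pattern applies beyond CherryPy: whatever starts background threads should be stopped in a finally block, so the cleanup runs whether the server returns normally or is interrupted. Below is a self-contained sketch using only the standard library; ToyEngine and serve are invented stand-ins for illustration and are not CherryPy or flup APIs.

```python
import threading


class ToyEngine:
    """Hypothetical stand-in for an engine that owns a pool of worker threads."""

    def __init__(self, workers=10):
        self._stop_event = threading.Event()
        # Each worker simply blocks until the engine is told to stop.
        self._threads = [
            threading.Thread(target=self._stop_event.wait, daemon=True)
            for _ in range(workers)
        ]

    def start(self):
        for thread in self._threads:
            thread.start()

    def stop(self):
        self._stop_event.set()
        for thread in self._threads:
            thread.join()

    def alive(self):
        """Return how many worker threads are still running."""
        return sum(thread.is_alive() for thread in self._threads)


def serve(engine, run_server):
    """Start the engine, run the server, and always stop the engine afterwards."""
    engine.start()
    try:
        run_server()
    finally:
        # Runs on a normal return and on exceptions such as SystemExit alike.
        engine.stop()
```

Without the finally clause, a server loop that exits through an exception would leave the worker threads waiting, which matches the stray threads seen in top.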


WordPress, MySQL, UTF-8 or why some links might temporarily not work

So I found out that MySQL had defaulted to latin1_swedish_ci when I first started this weblog database. Silly me for expecting a saner default like UTF-8.

I spent the past two days converting data. The majority of the tables were no problem, but wp_posts.post_name is tied to something that causes a key error to be displayed. I worked around this problem by writing both a PHP and a Python script that took the current data from the table’s column, escaped and URL-decoded it as needed, stored it, altered the table to utf8_unicode_ci, and pumped the data back.
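The URL-decoding step can be sketched as follows. This is not the original script, only an illustration in Python 3 syntax of turning a percent-encoded slug back into text; decode_slug is a made-up name and the sample slug is arbitrary.

```python
from urllib.parse import unquote


def decode_slug(post_name):
    """Turn a percent-encoded slug back into readable Unicode text."""
    # unquote() decodes the %XX escapes as UTF-8 by default.
    return unquote(post_name)


if __name__ == "__main__":
    # Percent-encoded UTF-8 bytes for two kanji.
    print(decode_slug("%e6%bc%a2%e5%ad%97"))
```

Slugs that contain no escapes pass through unchanged, so the same function can be run over every row.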

The reason I first had a PHP version was that I did not even think of using Python. I guess I was looking too intently at the WordPress sources and got stuck in thinking ‘PHP’. After many hours of frustrated messing around with PHP’s APIs I went to Python and wrote the resulting script in a fraction of the time.

When I started to verify the data in my mysql console output I was wondering what I was missing, since a SELECT post_name FROM wp_posts; showed only ???? instead of kanji. The question marks are replacement characters, normally shown when the stored data is fine but a character cannot be represented in the character set used for the output. Silly me for forgetting I had not done a SET NAMES utf8;.
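The same effect is easy to reproduce in Python: encoding text into a character set that cannot represent it, using the 'replace' error handler, yields one question mark per character. A tiny sketch in Python 3 syntax with an arbitrary sample word; it only mimics what the console showed and is not part of the conversion script.

```python
# Two kanji, written as escapes so the source file stays ASCII-only.
kanji = "\u6f22\u5b57"

# latin-1 cannot represent kanji, so each character is replaced by "?".
as_latin1 = kanji.encode("latin-1", "replace")

# The same text survives a round trip through UTF-8 unchanged.
round_trip = kanji.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    print(as_latin1)
    print(round_trip == kanji)
```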

Now I am walking all links to see if they’re actually clickable. It seems that editing and saving a post corrects some database entries.

Of course, it seems my slugs vary wildly. Older entries use some weird underscore-based scheme; I wonder if that was a left-over from my Drupal import that I never noticed. It goes against a lot of persistent URL guidelines, but for the sake of consistency I am updating every single post just to be on the safe side. The search engines will correct over time; I just hope I won’t break too many referrers.


Genshi - Python templating solution

I recently had the pleasure of providing the Genshi project with its name. It comes from the Japanese 原糸, which means a ‘thread for weaving’, which matched the purpose of the project well.


Snakes and rubies

Python’s Django is a wonderful framework, in much the same way as Ruby on Rails.

I advise all of you Python coders to try it. You’ll love it.


Finally…

Finally have a working PHP again after the PHP folks totally screwed over PHP 4.4.1’s apache2handler. Thankfully the FreeBSD port reverted this piece of code.

And I will migrate my PHP stuff to Django and similar frameworks in the future. The braindead decisions by PHP’s developers have annoyed me a bit too much by now.
