Tag Archives: Python

Anything related to Python programming

JSONP with Werkzeug

So I had implemented a simple JSON data server with Werkzeug for a classroom experiment. Unfortunately, in my haste to get everything up and running, I totally forgot that we cannot allow people to upload their various custom-made web pages to this server. Those pages therefore live on another host, and using jQuery’s $.ajax() from them just fails: it becomes a cross-domain request, which the browser’s same-origin policy blocks.

So, normally you would do something like the following in order to return JSON data:

return json.dumps(data)

Which would be used with the $.ajax() call in a way like the following:

$.ajax({
  type: "POST",
  url: "http://example.com/json/something",
  data: "parameter=value",
  dataType: "json",
  error: function(XMLHttpRequest, textStatus, errorThrown){},
  success: function(data, msg){}
});

Which is perfectly fine for scripts getting and using the data on the same host/domain. But, as said before, from another domain this will fail with warnings similar to: "Access to restricted URI denied" code: "1012" nsresult: "0x805303f4 (NS_ERROR_DOM_BAD_URI)".

One way out of this is using JSONP. jQuery has a $.getJSON() function, which loads JSON data using an HTTP GET request. Now, the simplistic way to convert your code would be to change it as such:

$.getJSON("http://example.com/json/something",
  function(data){}
);

But this causes another issue. $.getJSON() GETs the JSON data but, instead of using eval() on it, pulls the result into script tags. On Firefox at least, this causes an invalid label error, because a bare JSON object at the start of a script is parsed as a block statement rather than as an object literal. In order to fix this you need to set up the JSON data server to properly support a callback argument, so that you can use $.getJSON() how it is meant to be used:

$.getJSON("http://example.com/json/something?jsoncallback=?",
  function(data){}
);

In the code above, jQuery replaces the question mark of the additional jsoncallback parameter with an alphanumeric string (typically in the form of jsonp followed by a timestamp). The server should wrap the resulting JSON data in a call to a function with that name. This means you would have to change the initial Python code to something like this:

return request.args.get('jsoncallback') + '(' + json.dumps(data) + ')'

Of course this causes problems when you want to reuse the code for both AJAX use on the same host/domain and use it from outside. So in order to make both work you can test on whether or not the callback parameter is available and return the appropriate data. I came up with this little snippet for that:

def jsonwrapper(self, request, data):
    callback = request.args.get('jsoncallback')
 
    if callback:
        return callback + '(' + json.dumps(data) + ')'
    else:
        return json.dumps(data)
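
A slightly more defensive variant of the wrapper can be sketched with nothing but the standard library. The callback check and the MIME types are assumptions added here and are not part of the snippet above; render_json and CALLBACK_RE are hypothetical names, and the function returns a body and a MIME type instead of a Werkzeug response object.

```python
import json
import re

# Hypothetical whitelist (an assumption, not from the original post): accept
# only plain, optionally dotted, JavaScript identifiers as callback names.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*", re.ASCII)


def render_json(data, callback=None):
    """Return (body, mimetype): plain JSON, or JSONP when a callback is given."""
    payload = json.dumps(data)
    if not callback:
        # Same-domain AJAX use: return the bare JSON document.
        return payload, "application/json"
    if not CALLBACK_RE.fullmatch(callback):
        # Refuse anything that could smuggle script into the response.
        raise ValueError("invalid callback name: %r" % callback)
    # Cross-domain use: wrap the JSON in a call to the requested function.
    return "%s(%s)" % (callback, payload), "application/javascript"
```

In a Werkzeug view this could be called as render_json(data, request.args.get('jsoncallback')), with the two return values passed on to the response.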

Easily amused, I guess

From Python’s PEP-3099:

Simple is better than complex. This idea extends to the parser. Restricting Python’s grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, like Perl.

Well, I found it funny at least.

CherryPy, lighttpd and flup

For a personal project I found myself writing bits and pieces to do URL dispatch handling (using Routes) and then found myself having to write more and more specific handling cases that I just knew were probably already taken care of by some sort of framework. Now, Django and TurboGears, however great, seemed to be overkill for this, since my demands are very particular and I will design some stuff from scratch anyway. Then I remembered CherryPy, which advocates itself as quite a bare-bones HTTP framework.

CherryPy uses a regular expression syntax for URL dispatching that people familiar with Django might recognise. It is a decent way to dispatch URLs, but once you’ve seen Routes’ maps, using such regular expressions feels hackish. Well, at least to me; use whatever works for you.

There was only one issue: how on earth would I put all this in a FastCGI setup? I had previously, for my own script, used flup’s fcgi WSGIServer class to kickstart my application. This means that my Lighttpd environment configures a Python file as a FastCGI script and creates a Unix domain socket to connect through. This worked quite well, so I set out to convert my old setup to use CherryPy and Routes. The first hurdle I encountered was that using Routes with CherryPy is not documented well (or at least it was not at the time of this writing). Nowhere does the documentation mention the magic incantation to switch dispatchers from the default to, say, RoutesDispatcher. Using some Google-magic as well as discussing this with Alec Thomas (of Trac fame) I arrived at the following Python code to switch the dispatcher around:

import cherrypy
from project.controllers import *
dispatcher = cherrypy.dispatch.RoutesDispatcher()
dispatcher.connect('home', '', controller=HomeController())
config = {'/': {'request.dispatch': dispatcher}}
app = cherrypy.tree.mount(None, config=config)

Next I started to pass app to my WSGIServer class to run it. I noticed that something strange was actually happening when I checked with sockstat on my FreeBSD machine if there were any Unix domain sockets left after stopping Lighttpd. And indeed, there was a stray socket left. Now that was funny, since normally after Lighttpd has sent a termination signal (SIGTERM) to the spawned processes, they shut down and the socket gets cleaned up. What was left was something like this:

USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
www      python     955   0  stream /tmp/labs.sock-0

So the file descriptor 0 socket is still left. This is, according to Unix tradition, standard input, so in effect it is still waiting to handle data. But wait a second, we told it to shut down, but it didn’t completely. Using the top command I looked for the process id (PID) and found a line like this:

PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
955 www          9  20    0 14784K 11556K kserel 0   1:25  0.00% python

When everything is still running normally it displays as:

PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
955 www         10  20    0 14784K 11556K kserel 0   1:25  0.00% python

Notice the difference in the thread (THR) column (going from 10 to 9). So apparently Lighttpd sends a SIGTERM to the process, which succeeds in killing off one thread but leaves the other nine waiting for new requests to serve. Now, this would not be so bad, were it not that every stop/start cycle spawns another process with ten threads, thus wasting valuable resources. So clearly this problem had to be solved.

The current code I had in place was (partially lifted from an older Trac FCGI start script):

app = cherrypy.tree.mount(None, config=config)
cherrypy.engine.start(blocking=False)
WSGIServer(app).run()

The traceback should get printed to the browser if the WSGIServer cannot be started for whatever reason or if it raises an exception.

I finally realised that it had to be CherryPy that was not shutting down as it should, especially since this worked with flup’s WSGIServer and my own code before! Furthermore, in a thread I started on the CherryPy-users group over at Google, Robert Brewer pointed me (mistakenly, as he later pointed out) to a page detailing the CherryPy HTTPServer API. Even though it was not correct in this case (CherryPy is not the controller here), it did point out one thing: a thread_pool attribute set to a default value of 10! So this really confirmed my thought that CherryPy was not getting shut down as it should.

The solution to this problem, as with most things in life I guess, was rather an anti-climax: the code above had to be changed to this:

app = cherrypy.tree.mount(None, config=config)
cherrypy.engine.start(blocking=False)
try:
    WSGIServer(app).run()
finally:
    cherrypy.engine.stop()

And that’s it!

Update 2007-06-02 10:57: Stripped the exceptions, they actually do not add much in this case.
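
The effect of the try/finally can be illustrated without CherryPy or flup at all. The sketch below uses hypothetical stand-ins (Engine and serve are made-up names, not part of either library): an engine that owns a pool of worker threads, and a blocking server loop that may return or raise. The point it shows is that the workers only go away when stop() is called explicitly.

```python
import threading


class Engine:
    """Stand-in for a framework engine that owns a pool of worker threads."""

    def __init__(self, pool_size=10):
        self._stop = threading.Event()
        # Each worker simply blocks until the engine is told to stop.
        self._workers = [
            threading.Thread(target=self._stop.wait) for _ in range(pool_size)
        ]

    def start(self):
        for worker in self._workers:
            worker.start()

    def stop(self):
        # Wake every worker and wait for it to exit.
        self._stop.set()
        for worker in self._workers:
            worker.join()

    def alive(self):
        """Number of worker threads that are still running."""
        return sum(worker.is_alive() for worker in self._workers)


def serve(engine, run_server):
    """Run the blocking server loop and always stop the engine afterwards."""
    engine.start()
    try:
        run_server()
    finally:
        # Without this, the workers outlive the server loop.
        engine.stop()
```

Whether run_server returns normally or raises, the finally clause runs, so no worker thread is left behind to keep the process alive.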

WordPress, MySQL, UTF-8 or why some links might temporarily not work

So I found out that MySQL had defaulted to latin1_swedish_ci when I first started this weblog database. Silly me for expecting a saner default like UTF-8.

I spent the past two days converting data. The majority of the tables were no problem, but wp_posts.post_name is tied with something which causes a key error to be displayed. I worked around this problem by writing both a PHP and a Python script that takes the current data from the table’s column, escapes it as needed, URL decodes it as necessary, stores it, alters the table to utf8_unicode_ci, and pumps the data back in.
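
The script itself is not reproduced here, but the URL-decoding step might look roughly like the sketch below. It assumes the slugs were stored as percent-encoded UTF-8 bytes; decode_slug is a hypothetical name and the database access is left out entirely.

```python
from urllib.parse import unquote


def decode_slug(raw_slug):
    """Turn a percent-encoded slug into Unicode text, assuming UTF-8 bytes."""
    # errors="strict" makes a slug that is not valid UTF-8 fail loudly
    # instead of being silently mangled.
    return unquote(raw_slug, encoding="utf-8", errors="strict")
```

Slugs that contain no percent-escapes pass through unchanged, so the function can be applied to every row.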

The reason I first had a PHP version was that I did not even think of using Python. I guess I was looking too intently at the WordPress sources and got stuck in thinking ‘PHP’. After many frustrating hours with PHP’s APIs I went to Python and wrote the resulting script in a fraction of the time.

When I started to verify the data in my mysql console I wondered what I was missing, since a SELECT post_name FROM wp_posts; showed only ???? instead of kanji. The question marks are replacement characters, shown when a character cannot be represented in the character set the connection is using. Silly me for forgetting I had not done a SET NAMES utf8;.
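
The same effect is easy to reproduce in Python. This is only an analogy for what happens on a latin1 connection, not MySQL’s actual code path, and as_seen_by_latin1_client is a hypothetical name.

```python
def as_seen_by_latin1_client(text):
    """Encode text for a latin1 connection, replacing whatever does not fit."""
    # Characters outside latin1 cannot be represented and become "?".
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

Kanji come out as one question mark per character, while text that fits in latin1, such as "café", survives unchanged.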

Now I am walking all links to see if they’re actually clickable. Seems after you edit them and save them it corrects some database entries.

Of course, it seems my slugs vary wildly. Older entries use some weird underscore-based scheme; I wonder if that was a left-over from my Drupal import that I never noticed. Changing them goes against a lot of persistent URL guidelines, but for the sake of consistency I am updating every single post just to be on the safe side. The search engines will correct over time; I just hope I won’t break too many referrers.