Monday, December 29, 2014

Do Frameworks get in the way? A tale of Python and PayPal IPN.

I was writing some basic Python 3 CGI code to handle PayPal IPN (Instant Payment Notification) posts. PayPal docs here. The IPN message authentication protocol has PayPal first POST a message to your URL. After sending back a quick 200 OK in response to the POST, you then POST back the "complete, unaltered message", with "the same fields (in the same order) as the original message". I'm not sure the ordering really matters, as I have seen code online that doesn't seem to worry about it and still claims to work. But, just to be robust, I wanted to follow the "correct" protocol.

If PayPal were sending a GET, one could use os.environ.get('QUERY_STRING'). But, for a POST, that returns None (or an empty string, depending on the server). The Python cgi library provides a nice, standard, "handles lots of tricky cases" mechanism to read the POST fields, using cgi.FieldStorage(). However, that returns a dictionary-like object in which the original field order is not preserved. I described this in a Stack Overflow question, and asked how one could get the data exactly as sent. I mean, it was in a big string coming over the wire, e.g. "foo=bar&count=3", right? This should be simple. HTTP can be complex, but this part isn't very tricky.

To my surprise, nobody answered, and not many people even viewed the question. Maybe it was poorly worded. I think the real reason might be that programmers are too used to using a library or framework, such as cgi.FieldStorage or Django, and don't understand what's actually going on deep underneath. I'm not picking on Python programmers here; I think the same is true in most languages.

After some playing around, I found that the answer is astoundingly simple. The POST data comes over the wire as a string, and a CGI script receives the request body on standard input, so just read it from stdin.

import sys
query_string = sys.stdin.read()
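
A slightly more defensive variant reads only as many bytes as the CONTENT_LENGTH environment variable announces, instead of reading until end of file. This is only a sketch: the helper name read_post_body and its parameters are made up for illustration, and it assumes the body is plain ASCII, which URL-encoded form data normally is.

```python
import io
import os
import sys


def read_post_body(environ=os.environ, stream=None):
    """Return the raw POST body as a str, exactly as it arrived."""
    if stream is None:
        stream = sys.stdin.buffer  # read raw bytes, not decoded text
    # CGI servers pass the body size in CONTENT_LENGTH; treat missing/empty as 0.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = stream.read(length)
    # Assumption: URL-encoded form data is ASCII, so decoding loses nothing.
    return raw.decode("ascii")


# Try it with a fake request, so no web server is needed.
fake_env = {"CONTENT_LENGTH": "15"}
fake_stdin = io.BytesIO(b"foo=bar&count=3 trailing junk")
print(read_post_body(fake_env, fake_stdin))
```

Passing the environment and the stream in as arguments is just a convenience for testing; in a real CGI script you would call read_post_body() with no arguments.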

To POST everything back to PayPal, use this simple code.  (Should I worry more about encoding?)

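import urllib.request

# PAYPAL_URL is defined elsewhere; query_string is the raw POST body read above.
formData = "cmd=_notify-validate&" + query_string
req = urllib.request.Request(PAYPAL_URL, formData.encode())
req.add_header("Content-Type", "application/x-www-form-urlencoded")
response = urllib.request.urlopen(req)
# Decode the reply bytes rather than comparing against the repr "b'VERIFIED'".
status = response.read().decode("ascii").strip()
if status != "VERIFIED":
    pass  # complain/abort/whatever
else:
    pass  # continue processing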
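
The same steps can be wrapped in small functions, so that the string handling can be checked without actually contacting PayPal. This is a sketch rather than PayPal's reference code: the function names are invented here, and the verification URL is passed in by the caller instead of being hard-coded.

```python
import urllib.request


def build_verification_body(raw_body):
    """Prefix the untouched IPN message with the validate command."""
    return ("cmd=_notify-validate&" + raw_body).encode("ascii")


def is_verified(reply_bytes):
    """True only if the reply is the single word VERIFIED."""
    return reply_bytes.decode("ascii").strip() == "VERIFIED"


def verify_ipn(raw_body, paypal_url):
    """POST the message back and report whether PayPal vouches for it."""
    req = urllib.request.Request(paypal_url, build_verification_body(raw_body))
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    with urllib.request.urlopen(req) as response:
        return is_verified(response.read())


# The two pure helpers can be exercised offline:
print(build_verification_body("foo=bar&count=3"))
print(is_verified(b"VERIFIED"))
```

Because the raw body is only prefixed and never parsed, the fields go back in exactly the order they arrived.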

There's one drawback: now you can't get cgi.FieldStorage() to work. When it goes to read from stdin, there's nothing left, so it returns an empty dictionary (the Python cgi source code is here). So, if you also want the convenience of a dict for other purposes, such as checking on various IDs or the price they paid, you need to create your own dict. But that is also trivial:

import urllib.parse
multiform = urllib.parse.parse_qs(query_string)

Just like cgi.FieldStorage(), this returns a dictionary where the values are lists of strings, since it is possible for a key to be repeated in a query, e.g. foo=bar&foo=car. However, in practice, this is rare, and doesn't apply for the PayPal case. I guess you could always ask for the 0th item in the list - FieldStorage has some special methods for this, such as getfirst(). To simplify things, I created a nice, simple, single-valued form with strings for keys and values:

form = {}
for key, values in multiform.items():
    form[key] = values[0]  # keep only the first value for each key
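
To see the difference between the two shapes, here is a small worked example using a made-up query string with a repeated key and an empty field. One thing worth checking against your own data: by default parse_qs silently drops fields whose value is empty, so this sketch passes keep_blank_values=True to retain them.

```python
import urllib.parse

# Hypothetical sample data, not a real IPN message.
query_string = "foo=bar&foo=car&count=3&memo="

multiform = urllib.parse.parse_qs(query_string, keep_blank_values=True)
print(multiform)
# {'foo': ['bar', 'car'], 'count': ['3'], 'memo': ['']}

# Flatten to one value per key by taking the first item of each list.
form = {key: values[0] for key, values in multiform.items()}
print(form)
# {'foo': 'bar', 'count': '3', 'memo': ''}
```

Note that the second value of foo is lost in the flattened dict, which is fine only when keys are known not to repeat.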