The Flying Spaniel Software Design Blog

Thursday, March 24, 2016

Learning Google Maps

Inspired by the upcoming Race to Alaska, I put together a web page that integrates the nice JSON weather reports from the OpenWeatherMap API, plus some XML format reports from NOAA buoys. These get drawn on a map with wind barbs. You can scroll around to see various areas on the map, and click on a station to get more details.

It tracks the recent reports and attempts to compute an hourly rate of change for some statistics. Depending on the time between reports this can be an inexact science. But it helps to see if wind, pressure or temperature is changing rapidly.
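As a rough illustration of the hourly rate-of-change idea, here is a minimal sketch. The report shape (a `time` in milliseconds plus numeric fields) and the `hourlyRate` name are assumptions made for this example, not the actual code behind the page.

```javascript
// Hypothetical report shape: { time: <ms since epoch>, pressure: <hPa>, ... }
function hourlyRate(older, newer, field) {
   const hours = (newer.time - older.time) / (60 * 60 * 1000);
   if (!(hours > 0)) return null;   // same or out-of-order timestamps: no usable rate
   return (newer[field] - older[field]) / hours;
}

const earlier = { time: Date.UTC(2016, 2, 24, 10, 0), pressure: 1012.0 };
const later   = { time: Date.UTC(2016, 2, 24, 10, 30), pressure: 1010.5 };
console.log(hourlyRate(earlier, later, 'pressure'));   // -3 (hPa per hour)
```

With reports only minutes apart, a small measurement jitter gets multiplied into a large hourly figure, which is why the result is best read as a trend indicator rather than a precise number.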

The back end code is written in JavaScript using Node.js. The front end is HTML and JavaScript, using the Google Maps JavaScript API.

Monday, December 29, 2014

Do Frameworks get in the way? A tale of Python and PayPal IPN.

I was writing some basic Python 3 CGI code to handle PayPal IPN posts. PayPal docs here. The IPN message authentication protocol has PayPal first POST a message to your URL. After sending back a quickie 200 OK in response to the POST, you then POST back the "complete, unaltered message", with "the same fields (in the same order) as the original message". I'm not sure the ordering really matters, since I have seen code online that ignores it and still claims to work. But, just to be robust, I wanted to follow the "correct" protocol.

If PayPal were sending a GET, one could use os.environ.get('QUERY_STRING'). But, for a POST, that returns None. The Python cgi library provides a nice, standard, "handles lots of tricky cases" mechanism to read the POST fields, using cgi.FieldStorage(). However that returns a non-sorted dictionary, where the order is not preserved. I reported this on a Stack Overflow question, and asked how one could get the data exactly as sent. I mean, it was in a big String coming over the wire, e.g. "foo=bar&count=3", right? This should be simple. HTTP can be complex, but this part isn't very tricky.

To my surprise, nobody answered, and not many people even viewed the question. Maybe it was poorly worded. I think the real reason might be that programmers are too used to using a library or framework, such as cgi.FieldStorage or Django, and don't understand what's actually going on deep underneath. Not picking on Python programmers here; I think the same is true in most languages.

After some playing around, the answer is astoundingly simple. The POST data is coming over the wire as a String, so just read it from stdin.

import sys
query_string = sys.stdin.read()  # the raw POST body, exactly as sent

To POST everything back to PayPal, use this simple code. (Should I worry more about encoding?)

import urllib.request

# Prepend PayPal's validation command to the raw, unaltered POST body.
formData = "cmd=_notify-validate&" + query_string
req = urllib.request.Request(PAYPAL_URL, formData.encode())
req.add_header("Content-type", "application/x-www-form-urlencoded")
response = urllib.request.urlopen(req)
status = response.read().decode()  # the body arrives as bytes; decode before comparing
if status != "VERIFIED":
    pass  # complain/abort/whatever
else:
    pass  # continue processing

There's one drawback: now you can't get cgi.FieldStorage() to work. When it goes to read from stdin, there's nothing left, so it returns an empty dictionary (the Python cgi source code is here). So, if you also want the convenience of a dict for other purposes, such as checking on various IDs or the price they paid, you need to create your own dict. But that is also trivial:

import urllib.parse
multiform = urllib.parse.parse_qs(query_string)

Just like cgi.FieldStorage(), this returns a dictionary where the values are lists of Strings, since it is possible for a key to be repeated in a query, e.g. foo=bar&foo=car. However, in practice, this is rare, and doesn't apply for the PayPal case. I guess you could always ask for the 0th item in the list - FieldStorage has some special methods for this. To simplify things, I created a nice, simple, single-valued form with Strings for keys and values:

form = {}
for key in multiform.keys():
    form[key] = multiform.get(key)[0]  # keep only the first value for each key

Wednesday, November 12, 2014

Into the Clouds, deploying node.js with Modulus and OpenShift

My Agility website, www.nextq.info, is up and running on modulus.io. I like Modulus. It's easy to use, has been reliable, and you don't need to do a ton of heavy-duty Unix-ese command line stuff. Their web interface does most of the work, and a simple command modulus deploy will update your codebase. The main drawback is that they charge a small fee, $15 a month. I haven't tried any scaling yet.

So lately I've also been playing with OpenShift. It's free for small projects, and that even includes a little scaling. It's definitely harder, more technical, and more "Unixy" than Modulus. You deploy using git, and many commands must be done from the command line, not the web UI. They have a free book to get you started, Getting Started with OpenShift. After some fiddling, I got things going.

One major issue is that Modulus and OpenShift use different environment variables for important settings like the port and IP address. So, if you want code that is portable across both, you will need something like this in your Node code:


function setupConfig(config) {
   if (process.env.OPENSHIFT_APP_DNS) {
      config.port = process.env.OPENSHIFT_NODEJS_PORT;
      config.ipAddress = process.env.OPENSHIFT_NODEJS_IP || '127.0.0.1';
      config.mongoURI = process.env.OPENSHIFT_MONGODB_DB_URL;
      config.isOpenshift = process.env.OPENSHIFT_APP_DNS;
   }
   else if (process.env.MODULUS_IAAS) {  // modulus
      config.port = process.env.PORT;
      config.ipAddress = '0.0.0.0';  // modulus doesn't need an ip
      config.mongoURI = process.env.MONGO_URI;
      config.isModulus = process.env.MODULUS_IAAS;
   }

   // possibly more here...
   
   return config;
}

And use these values when you create the server, i.e.

app.listen(config.port, config.ipAddress, function(){
...
});

I have the "isXXX" fields so that you can set up platform-specific options like shutdown hooks.

For OpenShift you must change the package.json file to point to your main file. OpenShift defaults to server.js, whereas most people use app.js. Be sure to have the following lines in package.json, with the correct name of your main file.

"scripts": {
   "start": "node app.js"
},
"main": "app.js",

Finally, on a scaled platform, OpenShift (using the haproxy load balancer) "pings" your app every two seconds, quickly filling up the log file with confusing junk. There are even three (duplicate) bugs for this: 918783, 923141 and 876473. Their suggested "fix" is to run a cron job calling rhc app-tidy once in a while to clear out your logs. This fixes the "too much space" issue, but you still have a big problem using the log file, because all these pings make it harder to see any real problems. If you are brave, you could edit the haproxy.cfg file as hinted at (but not fully explained) in this StackOverflow post. I chose an alternative.

My fix is to use Express to insert some middleware before the logger. The "pings" can be recognized since they have no x-forwarded-for header. Real requests should have that field, and that's also the value you want in the logfile. At least, that works for me.

First, a function to ignore these pings and not call next(). Ever the fiddler, I wrapped it in another function so that it can still let a subset of the pings through. You might want to see a ping every hour or so.


function ignoreHeartbeat(except) {
   except = except || 0;
   var count = 1;
   return function(req, res, next) {
      if (req.headers["x-forwarded-for"])
         return next();      // normal processing

      if (except > 0) {
         if (--count <= 0) {
            count = except;
            return next();
         }
      }
    
      res.end();
   }   
}

Then, in your app setup code, add the middleware before you add the logger, e.g. (Express 3 shown):


app.use(ignoreHeartbeat(1800));         // 1800 is once an hour
...
app.use(express.logger(myFormat));

Here is some example log data, where ignoreHeartbeat was set to 10, so the pings should appear roughly every 20 seconds. Note how the pings have no IP address.

Wed, 12 Nov 2014 22:08:36 GMT - - GET / 200 - 2 ms
Wed, 12 Nov 2014 22:08:56 GMT - - GET / 200 - 2 ms
Wed, 12 Nov 2014 22:08:59 GMT 50.174.189.32 - GET / 200 - 10 ms
Wed, 12 Nov 2014 22:08:59 GMT 50.174.189.32 - GET /javascripts/jquery-jvectormap-1.2.2.css 200 - 17 ms
(more "real" GETs here...)
Wed, 12 Nov 2014 22:09:17 GMT - - GET / 200 - 3 ms
Wed, 12 Nov 2014 22:09:37 GMT - - GET / 200 - 1 ms
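The counting behaviour of ignoreHeartbeat can be checked without starting a server. The sketch below repeats the function from above so that it runs on its own, and feeds it hand-made request and response objects. `fakeRequest` and `fakeResponse` are hypothetical stand-ins invented for this test, not part of Express.

```javascript
// ignoreHeartbeat as defined in the post, repeated so this sketch is self-contained
function ignoreHeartbeat(except) {
   except = except || 0;
   var count = 1;
   return function(req, res, next) {
      if (req.headers["x-forwarded-for"])
         return next();      // normal processing

      if (except > 0) {
         if (--count <= 0) {
            count = except;
            return next();   // let every except-th ping through
         }
      }

      res.end();             // swallow the ping before it reaches the logger
   };
}

// Hypothetical stand-ins for the Express request and response objects
function fakeRequest(ip) {
   return { headers: ip ? { "x-forwarded-for": ip } : {} };
}
const fakeResponse = { end: function() {} };

const middleware = ignoreHeartbeat(3);
const passed = [];
for (let i = 1; i <= 7; i++) {
   // send seven pings (no x-forwarded-for header) and record which ones reach next()
   middleware(fakeRequest(null), fakeResponse, function() { passed.push(i); });
}
console.log(passed);   // [ 1, 4, 7 ]

let realPassed = false;
middleware(fakeRequest("50.174.189.32"), fakeResponse, function() { realPassed = true; });
console.log(realPassed);   // true
```

The first ping always gets through because the counter starts at 1. After that, one ping in every `except` is passed on to the logger, and real requests do not disturb the count.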

Monday, October 20, 2014

Web Scraping with node.js and Cheerio

I recently gave a talk at the BayNode Meetup, about my experiences web scraping for dog agility trials using node.js and the cheerio module. The results are used for my website, www.nextq.info.

You can find the slides as Google Docs here: Web scraping with cheerio. Enjoy!

Wednesday, July 23, 2014

Groovy-Like XML for Java. Simple and Sane.

Parsing and navigating through XML in Java is a pain. The org.w3c.dom.* classes are numerous, messy, and "old style", with no Collections, no Generics, no varargs. XPath helps a lot with the navigation part, but is still a bit complex and messy.

Groovy, with XmlParser and XmlSlurper and their associated classes, makes this amazingly, dramatically easier. Simple and Sane. For example, Making Java Groovy Chapter 2 has an example that parses the Google geocoder XML data to retrieve latitude and longitude. Below are the essentials of the code. The full code, which is not much longer, is on GitHub here.


String url = 'http://maps.google.com/maps/api/geocode/xml?' + somemore...
def response = new XmlSlurper().parse(url)
stadium.latitude = response.result[0].geometry.location.lat.toDouble()
stadium.longitude = response.result[0].geometry.location.lng.toDouble()

The parsing is trivial, and navigating to the data (location.lat or location.lng) is also simple, following the familiar dot notation.

Can you do anything like this in pure Java? Not quite. So I wrote a small library, xen, to mimic much of how Groovy does things. The full Geocoder.java code is here, snippet below:


String url = BASE + URLEncoder.encode(address);
Xen response = new XenParser().parse(url);

// Option 1: XPath slash style, 1-based indices
latLng[0] = response.toDouble("result[1]/geometry/location/lat");
latLng[1] = response.one("result[1]/geometry/location/lng").toDouble();

// Option 2: Groovy dot style, 0-based indices
latLng[0] = response.toDouble(".result[0].geometry.location.lat");
latLng[1] = response.one(".result[0].geometry.location.lng").toDouble();

Pretty close, eh?

The main difference is that we can't use the dot notation directly from an object, but we can use a very similar slash notation based upon XPath syntax. If you use XPath notation, one major difference from Groovy is that array indices in W3C XPath are 1-based, not 0-based. Therefore note that we access the 1st element of result, not the 0th. However, if the "path" starts with a . and a letter, as in the final example, the path is treated as a Groovy / "dot notation" style, with 0-based indices.
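To make the index shift concrete, here is a small sketch, in JavaScript, of how a dot-style path maps onto the equivalent slash-style path. The `dotPathToXPath` function is a hypothetical illustration only, not how xen does it internally, and it handles nothing beyond plain element names with optional numeric indices.

```javascript
// Hypothetical helper: ".result[0].geometry.location.lat" -> "result[1]/geometry/location/lat"
function dotPathToXPath(dotPath) {
   return dotPath
      .replace(/^\./, '')              // drop the leading dot that marks Groovy style
      .split('.')
      .map(function(step) {
         // shift each 0-based [n] to the 1-based [n+1] that XPath expects
         return step.replace(/\[(\d+)\]/, function(match, n) {
            return '[' + (Number(n) + 1) + ']';
         });
      })
      .join('/');
}

console.log(dotPathToXPath('.result[0].geometry.location.lat'));
// result[1]/geometry/location/lat
```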

So, if you want to greatly simplify parsing and navigating through XML, and/or you love how Groovy does things, please check out my (very beta!) xen library which allows you to do it in Java. Currently it is compiled vs. Java 6 but I think it should be fine in Java 5. So if you need to support some Android device, or can't or don't want to integrate Groovy into your Java projects, this could be very useful.

Xen library
JavaDocs
README

The README discusses various design decisions, particularly, how my design converged upon many aspects of the Groovy design. More discussion will appear in later posts. And, be warned, this is still a very early version, 0.0.2, so there are probably bugs, some mistakes, and upcoming API changes.

Node for Java Programmers

At a recent BayNode Meetup, I gave a 15 minute presentation on "Node for Java Programmers". Mainly notes on common things I did wrong coming from the Java world, and ideas or idioms to deal with them.

I got some good feedback and positive responses, and recently edited the presentation.

Here is a link to it (on Google Docs).

Thursday, June 12, 2014

Coding by Convention is Great

... except when it isn't.

"Coding by Convention" (a.k.a. "Convention over Configuration") attempts to simplify programming by telling the programmer the preferred way to name or organize things. It often saves a lot of time and hassle. Without it, you write extra configuration files, typically in XML. Spring and J2EE used to require way too much configuration, with lots of stupid redundancies, something like "When I say Foo bean I mean you to use a Foo.class, when they go to myCompany.com/order/books use the com.myCompany.order.books servlet". On the other hand there can be too much convention - I've never used Maven, but hear that it is particularly dictatorial and hard to modify.

I'm developing a lot in JavaScript / node.js lately, and wanted the ability to save my data as either csv files or iCal (.ics) files. Searching the NPM registry finds several candidates.

In csv files, the header line contains the name for each column. If using convention, this would be taken from the property key. And the property would map directly to the data in the following lines. Sometimes you would want to change this. Can you, or is it all done by convention? As I understand their documentation:

to-csv: convention only
fast-csv: allows for transformation, but over an entire row
json-2-csv: convention only
json-csv: allows for flexible transformations

I ended up using json-csv, though one drawback of its power is that it takes more work to use.

On the iCalendar side, the question is how to set up or create the complex VEVENT information. One could use properties named DTSTART, UID, etc. But it's extremely unlikely that your object has properties with those unusual, capitalized names and the correct values. Plus, DTSTART has a complex format with a possible "DATE-TIME" option and a time format of YYYYMMDDTHHMMSSZ.

cozy-iCal: builds VEvent objects programmatically
ical-generator: convention (uses .start and .end for DTSTART and DTEND)
icalevent: convention (also uses .start, .end, etc.)
icsjs: builds programmatically
icalendar: builds VEvent objects programmatically
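For reference, the YYYYMMDDTHHMMSSZ time format mentioned above can be produced from a JavaScript Date in a couple of lines. This is a minimal sketch for UTC times only. The `toICalUtc` name is made up for this example and does not come from any of the modules listed.

```javascript
// Format a Date as an iCalendar UTC date-time, e.g. 20140612T093000Z
function toICalUtc(date) {
   return date.toISOString()          // e.g. 2014-06-12T09:30:00.000Z
      .replace(/[-:]/g, '')           // strip the date and time separators
      .replace(/\.\d{3}Z$/, 'Z');     // drop the milliseconds
}

console.log(toICalUtc(new Date(Date.UTC(2014, 5, 12, 9, 30, 0))));   // 20140612T093000Z
```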

So, it's a mishmash. And convention, while simple and convenient, doesn't always do the trick. For example, in my CSV data I'd like to include the distance from the user's location. This is obviously not even a field in the data, since it is calculated on the fly per user from the respective latitudes and longitudes. The CSV should not include the latitude and longitude - not very useful to the end user. My start and end dates are also not fields, they are stored in an array. So I definitely can't use convention.

Unless...

The obvious work-around is to create a new temporary object that meets the required convention, built from the fields and data in the original, "real" object. In many cases, you would just write custom code to do this, especially if speed is a concern. There are also some modules that vaguely do this. (Did I miss some???) But they are pretty limited. For example, object-adapter can only copy values from a source object (renaming the fields), not apply any functions.

So, I wrote my own general-purpose module, remodeler. For convenience there are copyKeys() and excludeKeys() methods for properties you simply want to copy as-is or ignore. For the fancy stuff, you provide key / value pairs. The key is the new property name, and the value is a "transformation".

If the transformation is a String, it means to copy the value from the old object, using the string as property name. For example, "UID", "uid" would mean to create a UID property by copying the previous uid property.

If the transformation is a function, it will be called via function(oldObject, key) and the result used as the new value. In practice, the key argument is often ignored. For example,

"SUMMARY", function(o,k) { return 'name:' + o.name + ' date:' + o.date[0]; }

would mean to create a new SUMMARY property by concatenating two existing values.
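The two kinds of transformation described above can be sketched in a few lines. Note that this is a simplified illustration, not remodeler's actual API: the `remodel` function, the object-literal form of the key / value pairs, and the sample `trial` object are all assumptions made for this example.

```javascript
// Hypothetical helper showing string and function transformations
function remodel(oldObject, transformations) {
   const result = {};
   Object.keys(transformations).forEach(function(newKey) {
      const t = transformations[newKey];
      result[newKey] = (typeof t === 'function')
         ? t(oldObject, newKey)     // function: compute the new value
         : oldObject[t];            // string: copy the named old property
   });
   return result;
}

// Made-up domain object, with dates stored in an array
const trial = { uid: 'abc123', name: 'Spring Trial', date: ['2014-06-12', '2014-06-13'] };

const event = remodel(trial, {
   UID: 'uid',
   SUMMARY: function(o, k) { return 'name:' + o.name + ' date:' + o.date[0]; }
});
console.log(event);
// { UID: 'abc123', SUMMARY: 'name:Spring Trial date:2014-06-12' }
```

The original object is left untouched, so the temporary object can be handed to a convention-based module and then thrown away.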

In many cases, you are still better off writing your own custom code. On further thought, I'm not sure how useful my module will be, since in JavaScript, it is just so easy to go

var newthing = {
   newkey: oldObject.oldKey,
   ...
}

Other times you can follow the conventions, or use the programmatic interface. However, if you want a quick way to "remodel" your domain object, this module might meet your needs. Let me know what you think. I think I'm going to try using this with ical-generator.
