Wednesday, November 12, 2014

Into the Clouds, deploying node.js with Modulus and OpenShift

My Agility website, www.nextq.info, is up and running on modulus.io.  I like Modulus.  It's easy to use, has been reliable, and you don't need to do a ton of heavy-duty Unix-ese command line stuff.  Their web interface does most of the work, and a single command, modulus deploy, will update your codebase.  The main drawback is that they charge a small fee, $15 a month.  I haven't tried any scaling yet.

So lately I've also been playing with OpenShift.  It's free for small projects, and that even includes a little scaling.  It's definitely harder, more technical, and more "Unixy" than Modulus.  You deploy using git, and many commands must be done from the command line, not the web UI.  They have a free book to get you started, Getting Started with OpenShift.  After some fiddling, I got things going.

One major issue is that Modulus and OpenShift use different environment variables for important settings like the port and IP address.  So, if you want code that is portable across both, you will need something like this in your Node code:


function setupConfig(config) {
   if (process.env.OPENSHIFT_APP_DNS) {
      config.port = process.env.OPENSHIFT_NODEJS_PORT;
      config.ipAddress = process.env.OPENSHIFT_NODEJS_IP || '127.0.0.1';
      config.mongoURI = process.env.OPENSHIFT_MONGODB_DB_URL;
      config.isOpenshift = process.env.OPENSHIFT_APP_DNS;
   }
   else if (process.env.MODULUS_IAAS) {  // modulus
      config.port = process.env.PORT;
      config.ipAddress = '0.0.0.0';  // modulus doesn't need an ip
      config.mongoURI = process.env.MONGO_URI;
      config.isModulus = process.env.MODULUS_IAAS;
   }

   // possibly more here...
   
   return config;
}

Then use these values when you create the server, e.g.

app.listen(config.port, config.ipAddress, function(){
  ...
});


I have the "isXXX" fields so that you can set up platform-specific options like shutdown hooks.
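As a rough sketch of what such a platform-specific option might look like, the snippet below registers a shutdown hook only when config.isOpenshift is set.  The helper name installShutdownHook is made up for illustration, and the idea that the platform asks your app to stop with SIGTERM is an assumption you should check for your own host.

```javascript
// Hypothetical helper: close the HTTP server cleanly when the process is
// asked to stop.  Assumes the platform signals shutdown with SIGTERM.
function installShutdownHook(config, server, proc) {
   proc = proc || process;          // injectable so it can be tried with a fake
   if (!config.isOpenshift)
      return false;                 // nothing registered on other platforms

   proc.on('SIGTERM', function() {
      // stop accepting new connections, then exit once they have drained
      server.close(function() {
         proc.exit(0);
      });
   });
   return true;
}
```

You would call it with the server object returned by app.listen, e.g. installShutdownHook(config, server).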

For OpenShift you must change the package.json file to point to your main file.  OpenShift defaults to server.js, whereas most people use app.js.  Be sure to have the following lines in package.json with the correct name of your main file.

"scripts": {
    "start": "node app.js"
  },
"main": "app.js",

Finally, on a scaled platform, OpenShift (using the haproxy load balancer) "pings" your app every two seconds, quickly filling up the log file with confusing junk.  There are even three (duplicate) bugs for this: 918783, 923141 and 876473.  Their suggested "fix" is to run a cron job calling rhc app-tidy once in a while to clear out your logs.  This fixes the "too much space" issue, but the log file is still hard to use, because all these pings make it harder to see any real problems.  If you are brave, you could edit the haproxy.cfg file as hinted at (but not fully explained) in this StackOverflow post.  I chose an alternative.

My fix is to use Express to insert some middleware before the logger.  The "pings" can be recognized since they have no x-forwarded-for header.  Real requests should have that field, and that's also the value you want in the logfile.  At least, that works for me.
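To make the "value you want in the logfile" part concrete, here is a small sketch of pulling the client address out of that header.  The function name clientIp is hypothetical, and it assumes the common convention that x-forwarded-for may hold a comma-separated list with the original client first; check what your proxy actually sends.

```javascript
// Hypothetical helper: return the client address from x-forwarded-for,
// or null when the header is missing (as it is for the haproxy pings).
function clientIp(req) {
   var forwarded = req.headers['x-forwarded-for'];
   if (!forwarded)
      return null;
   // assume the first entry in the list is the original client
   return forwarded.split(',')[0].trim();
}
```

A null result is the same test the middleware uses to spot a ping, so one helper can serve both purposes.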

First, a function to ignore these pings and not call next().  Ever the fiddler, it is wrapped in another function so that it can still show a subset of the pings - you might want to see the pings every hour or so.


function ignoreHeartbeat(except) {
   except = except || 0;
   var count = 1;
   return function(req, res, next) {
      if (req.headers["x-forwarded-for"])
         return next();      // normal processing

      if (except > 0) {
         if (--count <= 0) {
            count = except;
            return next();
         }
      }
    
      res.end();
   }   
}

Then, in your app setup code, add this before you add the logger, e.g. (Express 3 shown)


app.use(ignoreHeartbeat(1800));         // 1800 is once an hour
...
app.use(express.logger(myFormat));

Here is some example log data, where ignoreHeartbeat was set to 10, so the pings should appear roughly every 20 seconds.  Note how the pings have no IP address.

Wed, 12 Nov 2014 22:08:36 GMT - - GET / 200 - 2 ms
Wed, 12 Nov 2014 22:08:56 GMT - - GET / 200 - 2 ms
Wed, 12 Nov 2014 22:08:59 GMT 50.174.189.32 - GET / 200 - 10 ms
Wed, 12 Nov 2014 22:08:59 GMT 50.174.189.32 - GET /javascripts/jquery-jvectormap-1.2.2.css 200 - 17 ms
  (more "real" GETs here...)
Wed, 12 Nov 2014 22:09:17 GMT - - GET / 200 - 3 ms
Wed, 12 Nov 2014 22:09:37 GMT - - GET / 200 - 1 ms
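
If you want to check the counting without deploying, the sketch below drives the middleware with stub request and response objects.  The helper countLoggedPings and the stubs are made up for illustration; ignoreHeartbeat is copied from above only so the snippet runs on its own.

```javascript
// Copy of ignoreHeartbeat from the post, so this snippet is self-contained.
function ignoreHeartbeat(except) {
   except = except || 0;
   var count = 1;
   return function(req, res, next) {
      if (req.headers["x-forwarded-for"])
         return next();      // normal processing
      if (except > 0) {
         if (--count <= 0) {
            count = except;
            return next();
         }
      }
      res.end();
   };
}

// Hypothetical helper: send `total` fake pings through the middleware and
// count how many are passed on to next() (and so would reach the logger).
function countLoggedPings(except, total) {
   var middleware = ignoreHeartbeat(except);
   var logged = 0;
   var ping = { headers: {} };            // no x-forwarded-for header
   var res = { end: function() {} };      // stub response that does nothing
   for (var i = 0; i < total; i++) {
      middleware(ping, res, function() { logged++; });
   }
   return logged;
}

console.log(countLoggedPings(10, 25));    // prints 3: pings 1, 11 and 21 get through
```

With pings arriving every two seconds, one in ten getting through matches the 20 second spacing in the log above.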