Thursday, December 19, 2013

From Node.js back to Java, Part 1: EventEmitter and Callbacks

There were a lot of features of Node.js and JavaScript that I liked.  Why not bring them to Java?  That's what I'm working on with Nava, "Bringing good ideas from node.js into Java".  So far, I have implemented a fair amount.  The code can be compiled against Java 6, Java 7, and Android 2.2, and presumably later versions of Android.

The emit package is a pretty direct port of a node.js EventEmitter to Java, and can be used in much the same way as in node.js.  The main difference is that the event handler cannot be a closure (because we don't have closures!); it must implement a simple interface, Emit.IListener, with one method, handleEvent().  Producers of events would subclass or delegate to Emitter, much as a node class would extend or delegate to EventEmitter.  This code essentially replaces Java's EventListenerList and associated classes and interfaces.  There are several major advantages:

  1. The code does not depend on Swing, so it could be used in Android. 
  2. The "events" that get fired need not extend EventObject.  They could be Strings or voids.  However, in many cases, subclassing EventObject makes sense.
  3. Generics are used to obviate the need for a lot of boilerplate, such as declaring a bunch of Listener subclasses and their respective delegation methods addFooListener(), removeFooListener(), etc...
There are also a few disadvantages, noted in the documentation.  The main one is that everything is more loosely typed, as you'd expect since this is based on JavaScript.  Underneath, there are unchecked casts.
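For comparison, here is a minimal sketch of the node.js EventEmitter usage that the emit package mirrors, using only node's built-in events and util modules. The Thermometer class and its 'reading' event are hypothetical. In the Java port, each listener function would instead be an object implementing Emit.IListener.

```javascript
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical event producer that "extends" EventEmitter, as a node class would.
function Thermometer() {
  EventEmitter.call(this);
}
util.inherits(Thermometer, EventEmitter);

Thermometer.prototype.record = function (degrees) {
  // Any value can be the event payload; here it is a plain number.
  this.emit('reading', degrees);
};

var readings = [];
var listener = function (degrees) {
  readings.push(degrees);
};

var thermometer = new Thermometer();
thermometer.on('reading', listener);               // like addFooListener()
thermometer.record(21.5);
thermometer.removeListener('reading', listener);   // like removeFooListener()
thermometer.record(30);                            // nobody is listening now

console.log(readings);   // [ 21.5 ]
```

Note that no listener interface and no event class are declared anywhere; that is exactly the boilerplate the generics in the Java port try to remove.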


The callback package is a rough port of JavaScript / node.js style callbacks.  Your callback "closures" must implement the Callback interface, or extend from AbstractCallback.  Callbacks can be "chained together", either manually using setNextCallback(), or programmatically with the utility method Callbacks.chainUp().

You could then "submit" the callback directly by calling the first one, but that would happen synchronously, in-line, and, other than for initial testing, probably isn't worth it.  The more "node-like" technique is to create a CallbackExecutor (probably application-wide) and use submitCallback(), passing the first callback and its data.  Your thread will then continue, just like in node, with no need to await completion (good) and no knowledge of the result (sometimes annoying).

For example, if you had a Reader that read a File into a String, and a Counter that counted words, the skeleton Java code would be:

      Reader reader = new Reader();
      Counter counter = new Counter();
      Callbacks.chainUp(reader, counter );
      CallbackExecutor cex = new CallbackExecutor(2,1);
      cex.submitCallback(reader, theFile);
      // your code then moves on (or just stops...)

As opposed to actual JavaScript/Node, which would look something like:

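      var fs = require('fs');

      // 'utf8' makes data a String rather than a Buffer, like the Java Reader
      fs.readFile(theFile, 'utf8', function(err, data) {
         if (err) throw err;
         // normally the counting code would be in-line here,
         // but it is a separate function for modularity similar to the Java
         countWords(data);
      });
      // your code then moves on (or just stops...)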
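To make the Java skeleton's chaining idea concrete in node terms, here is a minimal sketch in which each step's result is passed on to the next. The names chainUp, readStep and countStep are hypothetical and are not Nava's API; each step takes (data, next) and reports through a node-style (err, result) callback. The demo writes a small temporary file so it can run anywhere.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical step 1: read a file into a String.
function readStep(file, next) {
  fs.readFile(file, 'utf8', next);
}

// Hypothetical step 2: count whitespace-separated words.
function countStep(text, next) {
  var words = text.split(/\s+/).filter(function (w) { return w.length > 0; });
  next(null, words.length);
}

// Hypothetical chainUp: returns one function that runs the steps in order,
// feeding each result to the following step and stopping at the first error.
function chainUp(steps) {
  return function (input, done) {
    var i = 0;
    function runNext(err, data) {
      if (err || i === steps.length) return done(err, data);
      var step = steps[i++];
      step(data, runNext);
    }
    runNext(null, input);
  };
}

var theFile = path.join(os.tmpdir(), 'chain-demo.txt');
fs.writeFileSync(theFile, 'the quick brown fox\njumps over the lazy dog\n');

var readAndCount = chainUp([readStep, countStep]);
readAndCount(theFile, function (err, count) {
  if (err) throw err;
  console.log('words: ' + count);   // words: 9
});
// As in node generally, this call returns at once; the count is printed later.
```

Unlike the Java CallbackExecutor, nothing here runs on another thread; node's event loop simply calls runNext again when the file has been read.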


There's more, but that's a start for today.  Check out the readme file, javadocs, and JUnit tests for more information.

Thursday, December 5, 2013

Javadocs on GitHub for Dummies

GitHub has a feature to allow you to upload supporting documents, such as Javadocs, into a gh-pages branch, for easy access by people browsing your project.  The problem is that it's a huge PITA to configure, especially if you are new to git.  Here's how I did it.  This is for a Windows machine and a typical java project.  YMMV.

For example, if you are working on ProjectFoo in GitHub, somewhere there will be a ProjectFoo folder on your local disk that contains a README.md, .gitignore, and folders such as src and test.  I will assume that this has been done, plus your project, in this state, already exists on GitHub as github.com/UserName/ProjectFoo.git.

You want your javadocs to go into a completely separate folder from your main project.  Otherwise you are constantly switching back and forth.  I have organized my projects as follows, splitting my "master" GIT_FOLDER into master and gh-pages subfolders:

  GIT_FOLDER   (all git projects go here)
    master
      ProjectFoo
      ProjectSomeOther
    gh-pages
      ProjectFoo
      ProjectSomeOther

You don't have to follow this arrangement.  But the important thing is to get the javadocs separated from the main project so the branches don't keep stepping on each other.

Initial Setup

Open a Command Window and cd to GIT_FOLDER/gh-pages (or, if you have a different setup for the gh-pages branch, to the corresponding folder).

Clone your project there (this checks out the master branch):

>git clone https://github.com/UserName/ProjectFoo.git

This will create a folder GIT_FOLDER/gh-pages/ProjectFoo.  CD there and check your branch.

>cd ProjectFoo

>git branch
* master

Do not create the gh-pages branch the usual way (e.g. git branch gh-pages), since it would carry along all of master's history!  Instead, check out a new orphan branch named gh-pages:

>git checkout --orphan gh-pages
Switched to a new branch 'gh-pages'

In my experience, git branch still doesn't show you on gh-pages at this point, because an orphan branch doesn't really exist until its first commit.  Ignore that for now; it will figure things out eventually.

A directory command (dir /A /B) should show your Java code, with src and possibly test folders, something like:

.git
.gitignore
LICENSE
README.md
src
test

All you want here are the javadocs.
  1. Delete the LICENSE file, and the src and test folders.
  2. Create (or copy) the javadocs to a folder named javadocs.
  3. Double check that .gitignore isn't doing anything too goofy
Your folder should now contain:

.git
.gitignore
javadocs
README.md

Check your git status.  It should now have the correct branch.  Your list of deleted files will vary.

>git status
# On branch gh-pages
# Changes not staged for commit:
#   (use "git add/rm ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#       deleted:    LICENSE
#       deleted:    src/com/company/package/SomeFile1.java
#       deleted:    src/com/company/package/SomeFile2.java
#       deleted:    src/com/company/package/package-info.java
#       deleted:    test/com/company/package/SomeFile1Test.java
#
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#       javadocs/

Add javadocs to the repository.  You may see warnings about line endings.

>git add javadocs

Commit them.  You may see warnings about line endings.

>git commit -m "1st checkin of javadocs"

Now, commit everything else to delete the src and test folders

>git commit -a -m "deleted src from gh-pages"

As a final check, git status should now be clean and git branch shows the new branch:

>git status
# On branch gh-pages
nothing to commit, working directory clean

>git branch
* gh-pages
  master

Finally, push to GitHub.  

>git push origin gh-pages
Username for 'https://github.com': UserName
Password for 'https://UserName@github.com':
Counting objects: ...
... reused 0 (delta 0)
To https://github.com/UserName/ProjectFoo.git
 * [new branch]      gh-pages -> gh-pages

Now you can log in to GitHub with your browser, and you should see the new branch.  Assuming you want a prominent link to these javadocs, go to the master branch and add this line somewhere in the README file:

[JavaDocs are here](http://UserName.github.io/ProjectFoo/javadocs/)

For an example, see my Nava project.

Later on, how to update the JavaDocs:


  1. Generate (or copy) them into that javadocs folder
  2. Open a command window there
  3. git status to see what's going on.  You should be on the gh-pages branch!
  4. You will probably need to add some new docs:  git add .
  5. git commit -m "some comment"
  6. git status   (if you are paranoid)
  7. git push origin gh-pages
  8. git status  (everything should be clean)

Wednesday, November 20, 2013

Leaping into node.js and JavaScript, the Conclusion

In three weeks I've gone from node.js newbie to beginner to having a module on npm.  Despite the offbeat field of endeavor (Flow Cytometry), it has dozens of downloads, though I suspect some are robots harvesting the net.

Things I liked:
  • WebStorm made working with GitHub simple and painless.  No more series of three commands to add files, commit locally, and push to the remote site.
  • Integration with Travis-ci was nearly painless.  The main gotcha was in pasting the cute little "the build is passing" icon back into the GitHub readme.  Capitalization matters.  If your repository is CamelCase, the links to travis-ci must be CamelCase.
  • The package.json file and conventions.  Simpler, and much less verbose, than your typical web.xml descriptor.  Use npm init to create a good template, and then it's simple to edit by hand.  I found the online docs fairly inscrutable.  Better to look at various package.jsons from other npm projects, then reread the man pages with the benefit of examples.
  • Publishing to npm was also painless.  My preference is to use a text editor, not npm commands, to edit the package.json file.  Once you have logged in to npm for the first time, just run  npm publish
  • Using simple JavaScript "objects" { } to hold incoming options, and to return multiple values from a method.
  • JavaScript "short-circuit or" syntax to test for nulls and use defaults.  e.g.  encoding = options.encoding || 'utf8';
  • JavaScript's first class functions, combined with node.js Buffers, were superb in reading binary data from a file, handling various data sizes (16, 32, 64 bit) and endianness.
  • JavaScript's optional function arguments, most of the time.
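Several of these likes fit together naturally when reading binary data. A minimal sketch, assuming a hypothetical readValues() helper (not code from the author's npm module): options come in as a { } hash with || defaults, the reader for each data size is a first-class function taken from Buffer.prototype, and the results come back as another hash. One caveat with ||: it substitutes the default for any falsy value (0, '', false), not only for a missing one. 64-bit values are read as doubles here.

```javascript
// Hypothetical reader table: data size in bits -> byte size and the
// Buffer methods (first-class functions) for each byte order.
var READERS = {
  16: { size: 2, little: Buffer.prototype.readUInt16LE, big: Buffer.prototype.readUInt16BE },
  32: { size: 4, little: Buffer.prototype.readUInt32LE, big: Buffer.prototype.readUInt32BE },
  64: { size: 8, little: Buffer.prototype.readDoubleLE, big: Buffer.prototype.readDoubleBE }
};

// Hypothetical helper: options arrive as a simple { } hash.
function readValues(buffer, options) {
  options = options || {};
  var bits = options.bits || 16;      // default if missing
  var offset = options.offset || 0;   // safe: 0 is both the falsy case and the default
  var order = options.littleEndian ? 'little' : 'big';
  var reader = READERS[bits];
  if (!reader) throw new Error('unsupported data size: ' + bits);
  var count = Math.floor((buffer.length - offset) / reader.size);

  var values = [];
  for (var i = 0; i < count; i++) {
    // Call the chosen Buffer method with the buffer as "this".
    values.push(reader[order].call(buffer, offset + i * reader.size));
  }
  // Several results returned together in another { } hash.
  return { bits: bits, order: order, values: values };
}

var buf = Buffer.from([0x01, 0x02, 0x03, 0x04]);
console.log(readValues(buf).values);                          // [ 258, 772 ]
console.log(readValues(buf, { littleEndian: true }).values);  // [ 513, 1027 ]
console.log(readValues(buf, { bits: 32 }).values);            // [ 16909060 ]
```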
Things that worked "o.k.":
  • All the node.js callbacks.  It is a huge mental shift.  A few times I got stuck with just too many callbacks to figure things out.  Still don't know how to read a Stream synchronously, since the basic rs.read() method doesn't work.  Rethinking or refactoring so that the callbacks were distributed across more methods helped.  In other words, instead of one function with three callbacks, consider refactoring into three functions, each with one callback.
  • Closures.  O.K., yes, they are nice.  For node.js, required.  But writing line after line ending with  function() {  and then adding all the }); at the end, and getting it right, gets old.  And verbose.  And these closures in general have no names, so they lose out as self-documenting.
  • Because it seemed cool and trendy, I used mocha for unit tests, instead of the more "JUnit-like" nodeunit.  But I couldn't run it well from WebStorm 7.0.  Lo and behold, one of the features of WebStorm 7.0.2 is improved support for Mocha!  Problem largely solved, I can run my unit tests from the IDE.  Code coverage is still an issue.
  • Not sure I'm a fan of Mocha's "literate" style.  In JUnit, I'd name my test method something like testReadFCSFile().  In Mocha, you use describe to wrap the anonymous function, e.g.   describe('Read an FCS File', function() ...)  JUnit is simpler and more concise.  OTOH, Mocha definitely encourages you to think about what the results should be.
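The refactoring described in the first bullet, from one function with nested callbacks into several functions with one callback each, might look like this minimal sketch. The steps loadConfig and loadData are hypothetical and simulate asynchronous I/O with setImmediate.

```javascript
// Hypothetical async steps, each reporting via a node-style callback.
function loadConfig(callback) {
  setImmediate(function () { callback(null, { scale: 2 }); });
}

function loadData(config, callback) {
  setImmediate(function () { callback(null, [1, 2, 3]); });
}

// Step 1: start the work; its single callback hands off to step 2.
function report(done) {
  loadConfig(function (err, config) {
    if (err) return done(err);
    gotConfig(config, done);
  });
}

// Step 2: one callback, which hands off to step 3.
function gotConfig(config, done) {
  loadData(config, function (err, data) {
    if (err) return done(err);
    gotData(config, data, done);
  });
}

// Step 3: plain synchronous work on the results.
function gotData(config, data, done) {
  done(null, data.map(function (x) { return x * config.scale; }));
}

report(function (err, result) {
  if (err) throw err;
  console.log(result);   // [ 2, 4, 6 ]
});
```

Each function now has a name that documents its step, which also helps with the "closures have no names" complaint.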
Things that didn't work or were annoying:
  • "this" changes in a closure.  So you have to write  var self = this;  and remember to use self!
  • JavaScript thinks that it's "O-O", but the terminology is pretty fuzzy and loose.  For example, those handy { } thingies that just hold a few snippets of data: what do you call them?  "Object" is just plain wrong, since they hold no real behavior, and "hash" (or "Map" or "Dictionary" or even "Key/Value pairs"), which I greatly prefer, doesn't seem all that standardized.  Let's push for "hash" or "map"!
  • The documentation for setting up your test scripts is all Unix based, not Windows.  For example, their script suggested for mocha testing is  "test": "./node_modules/.bin/mocha"  which doesn't work at all under Windows.  I played around for an hour and found a workaround, "node node_modules/mocha/bin/mocha".  Turns out that there is a super-simple way, "test":"mocha". But I had to find this out by asking on StackOverflow.
  • I still haven't figured out how to get test coverage for mocha.  Looks like I might need to install Karma, then Istanbul, then whatever...  More to come.  I'm concerned that just as Windows has DLL Hell and Java has JAR hell, node may turn into NPM hell.
  • The documentation is still, er, "young".  Maybe that's to be expected, since node.js is only at version 0.10.  A lot of times you have to make educated guesses, or look via the debugger, as to what events a Stream will emit.  Looking at the node source is helpful, but it is daunting for a JavaScript beginner and there are distressingly few comments in there.  Many are of the "here be Dragons" type around tricky code, not "here be the parameters".  Java and Javadocs are far superior here.
  • At least for methods, I really miss Java's "verbose" style, where you know the types of the arguments and the return value.  And the JavaDocs.  Too often in JavaScript / node you are guessing, or, once you are experienced, using your instincts, as to whether an argument is a String, a hash, or an object.  And often the answer is "all of the above".
  • A common Java practice is to have an "all-powerful" method (or constructor) with all possible parameters, and "convenience" methods that supply some default parameters.  Something like currying (more precisely, partial application).  This proves to be very awkward in JavaScript, with ugly code like
            [].unshift.call(arguments, moreArgs);
            allPowerful.apply(this, arguments);
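A minimal sketch of two of these annoyances, with hypothetical names throughout. First, a convenience wrapper around an all-powerful function, written with the unshift/apply idiom and then with Function.prototype.bind(), which pre-fills leading arguments. Second, the self = this workaround for a callback that runs later.

```javascript
// Hypothetical all-powerful function: every parameter is explicit.
function allPowerful(encoding, separator, text) {
  return text.split(separator).map(function (s) {
    return encoding + ':' + s;
  });
}

// Convenience version via unshift/apply: prepend the default arguments.
function withDefaults() {
  [].unshift.call(arguments, 'utf8', ',');
  return allPowerful.apply(this, arguments);
}

// The same convenience version via bind(): leading arguments pre-filled.
var withDefaultsBound = allPowerful.bind(null, 'utf8', ',');

// Inside the timer callback, "this" is not the Counter, hence self = this.
function Counter() {
  this.count = 0;
}
Counter.prototype.incrementLater = function (done) {
  var self = this;
  setTimeout(function () {
    self.count++;
    done(self.count);
  }, 0);
};

console.log(withDefaults('a,b'));       // [ 'utf8:a', 'utf8:b' ]
console.log(withDefaultsBound('a,b'));  // [ 'utf8:a', 'utf8:b' ]
new Counter().incrementLater(function (n) { console.log('count: ' + n); });
```

bind() only helps when the defaults are the leading arguments; defaults in the middle or at the end still need a hand-written wrapper.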


Friday, November 15, 2013

Leaping into node.js and JavaScript, part deux

Well, I'm making a lot of progress since my post of two weeks ago.  Might be starting to get the hang of this JavaScript and node.js stuff.  First, a few followups from last post.


  1. I like WebStorm.  Well worth the small fee for a personal license.  Even just the way it makes Git and GitHub painless is almost worth the price.  And for a relative newbie blundering along it's great.  Especially since a lot of the node.js documentation is a bit sketchy.  Setting breakpoints and looking at fields is wonderful.  My only complaint is that sometimes it gets very very slow.  I'm using version 7.0.  There's a 7.0.1 upgrade available, not sure what it fixes.
  2. The Sams book Teach Yourself node.js in 24 Hours has been pretty useful, much better than many of the "in 24 Hours" books.
  3. Professional Node.js: Building JavaScript Based Scalable Software is o.k., but a bit disappointing compared to most of the other WROX books I have read.
A few more resources I have found that seem useful:
  1. JavaScript the Definitive Guide is essential.  Get it.
  2. Manuel Kiessling is developing a Node Craftsman followup book.  It's still pretty early in development but might be useful.
  3. A free download of JavaScript: The Good Parts.  I'm temporarily ignoring some of Douglas Crockford's advice, but it's still a good reference.
  4. Of course, Stackoverflow, within limits.  Many of the responses are very client-HTML-ish (not node.js serverish) and not all are good.  But you can dig to find good information.
Javascript

  I'm coming from 20+ years of Java experience, and my JavaScript shows it.  I put in semicolons.  When looping over arrays I use a for loop with indices, not forEach().  I'd like to use for (var x in theArray) more, except that it stupidly returns all the indices in the array, not the values.  Sorry, that makes no sense for an array, but since JavaScript arrays aren't real arrays I understand what's going on.  I've still been burned there several times.  And I still often mistype my loops as for (int i=0; i<....), where the "int" should be "var".  Of course, when I go back to programming in Java I'll surely make the opposite mistake.  :-)
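A quick illustration of the for-in surprise: on an array, for...in yields the keys (indices, as strings), so an indexed for loop or forEach() is the usual way to get the values. (Newer JavaScript, ES2015, also added for...of, which iterates over values.)

```javascript
var theArray = ['a', 'b', 'c'];

// for...in: string keys, not values.
var keys = [];
for (var key in theArray) {
  keys.push(key);
}

// Indexed loop: note "var", not "int".
var values = [];
for (var i = 0; i < theArray.length; i++) {
  values.push(theArray[i]);
}

// forEach() hands each value to a callback.
var viaForEach = [];
theArray.forEach(function (value) {
  viaForEach.push(value);
});

console.log(keys);        // [ '0', '1', '2' ]
console.log(values);      // [ 'a', 'b', 'c' ]
console.log(viaForEach);  // [ 'a', 'b', 'c' ]
```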

  Creating a JavaScript class is fraught with way too much danger.  There are too many ways to do it, all the examples are different, and you can run into religious wars.  Frankly, I think a lot of the people writing about it have no clue about OOP classes.  I ended up using the "classical" style, partly because it worked well with node's CommonJS module structure to simplify the namespace and export issues, and mainly because it felt most natural to me.  With more experience this may change.  Classical style uses the .prototype field a lot.  It is well described in JavaScript: The Definitive Guide, 6th ed., section 9.3, "Java-Style Classes in JavaScript".  Example classical style code below:

I'll talk more about working with node.js in future posts...

Tuesday, October 29, 2013

Leaping into node.js and JavaScript, part 1

A close friend with tons of tech experience had been urging me to learn JavaScript.  He uses it all the time and loves it.  Last week the local library had a Sams book, Teach Yourself node.js in 24 Hours.  So I checked it out.  How hard could it be?  :-)  I have over twenty years of experience in Java, and more before that in C, C++, Pascal, etc.  All your classic strongly typed, O-O / procedural languages.  But I have zero experience in node.js, or anything.js, or JavaScript.  Let the adventure begin.

I always had the bias that JavaScript was a "toy language" just for browsers.  At least the second half is wrong.  The node.js guys took Google's Chrome JavaScript runtime engine out of the browser, so you can run JavaScript, for example, from the command line.  Just like a JRE lets you run Java.  Cool.  There is an npm installer to search for and download additional libraries.  It's very quick to get a toy "Hello World" HTTP server up and running.  So far, so good.
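That toy server really is short. A minimal sketch using only node's built-in http module; listening on port 0 (any free port) is an assumption made so the example never collides with another program, whereas a fixed port such as 8080 is more typical.

```javascript
var http = require('http');

// Every request gets the same plain-text reply.
var server = http.createServer(function (request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello World\n');
});

// Port 0 lets the OS pick a free port.
server.listen(0, function () {
  console.log('Server running at http://localhost:' + server.address().port + '/');
});
```

Run it with node and point a browser at the printed URL; Ctrl+C stops it.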

A couple of problems.

Many of the examples are of the silly "Hello World" variety.  Data is collected all well and good, and then they use console.log(theData) to output it to the console.  Now, very few real-world apps are going to do that.  More importantly, doing it this way glosses over some issues with all the callbacks that are the bane, or blessing, of node.js.  (For a good time, do a Google search on "node.js cancer".)  Depends on your brain.  In a real application, you want to pass this data on to something else, such as an HTTP reply, an HTTP/XML/JSON parser, etc.  And this gets tricky.  Simply logging to the console obscures some of the key themes of node.js control flow.

Many generic intro Javascript blogs or books have crappy code too.  All the code is glommed into a single huge file, variables are named "$", and prints to the console happen.  This is not proper software engineering!  Javascript / node.js may be the hot new nailgun that lets you build sites quickly and easily, but you still need to put studs in the wall and use electrical boxes.

I found the following site pretty useful:  How to Learn JavaScript Properly, along with Learn Node.js Completely and with Confidence.  He's opinionated, but I agree with a lot of his opinions.

He suggests that you get "the absolute best editor (IDE) for programming JavaScript", JetBrains' WebStorm.  I agree completely.  It's free for 30 days, thereafter very affordable.  Don't know about you, but my time and sanity are worth way over $49.  And for a "traditional" Java programmer moving over, it's great to have some syntax help and a nice graphical debugger.  JavaScript is one of those loosely typed, everything-is-an-Object languages.  So, many times, opening up a variable to see "what the heck type is this?" and "what fields can I access?" is simply the best way to blunder along.

I had separately discovered the Leanpub online ebook, The Node Beginner Book by Manuel Kiessling, which you can buy along with Hands-on Node.js in a $9.99 bundle.  Do so.  So far I found both useful.  They actually talk a bit about how to structure your source code for a realistic application.  Not dumping everything into one huge file with $s as variable names!

Anyway, since he also recommends these books, another tip of the hat to him.  Since I like what he's saying so far, I took his advice and ordered a more advanced book, Professional Node.js: Building JavaScript Based Scalable Software.  I'll let you know what I think once it arrives.

Finally, stealing a phrase from Bruce Eckel, a word about "Thinking in node.js".  Code does not flow in the normal sense.  There are callbacks.  You don't have an option.  If you can't buy into or grok the callbacks, don't use node.js.  It's gonna take me a while...  For example, here is some of my first, ignoramus code.  (It uses the popular request library to do basic HTTP requests.)

var googleBody = 'not there';

Request.get('http://www.google.com', function (error, response, body) {
    googleBody = body;
});

doSomethingWith(googleBody);  // like parsing the HTML...

This doesn't work!  You have to remember that the callback executes at some indefinite time in the future, so the googleBody passed to doSomethingWith is very likely to be 'not there'.  I really should have known this, and, when I realized it a couple of hours later after some struggles, it was a real Homer Simpson "Doh" moment.  But I'm learning.  More to come.
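The fix is to do the work inside the callback, or pass the result onward from there. A minimal sketch; to keep it runnable without the request library, a hypothetical fetchBody() simulates the HTTP GET with setTimeout, and doSomethingWith() is a stand-in that just measures the body.

```javascript
// Hypothetical stand-in for Request.get(): calls back later with a body.
function fetchBody(url, callback) {
  setTimeout(function () {
    callback(null, { statusCode: 200 }, '<html>' + url + '</html>');
  }, 10);
}

// Hypothetical processing step (parsing the HTML, say).
function doSomethingWith(body) {
  return body.length;
}

var googleBody = 'not there';

fetchBody('http://www.google.com', function (error, response, body) {
  if (error) throw error;
  googleBody = body;
  // Right place: the body has actually arrived.
  console.log('length: ' + doSomethingWith(googleBody));
});

// Wrong place: this line runs first, before the callback.
console.log('still: ' + googleBody);   // still: not there
```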

Tuesday, October 8, 2013

I don't get git and github

The whole add / commit / push throws me.  When I first described the system to my wife, a former techwriter for computer networking companies, she said "geez, that sounds strange and clunky, is that something from UNIX?".  Well, it is.  But I think I can adjust...

Anyway, what has me really stymied is how to publish javadocs to GitHub in a sane manner.  GitHub lets you create a branch, which must be named "gh-pages", to hold docs.  But the problem is how to move up-to-date documents (generated manually, via Ant, etc.) from some local folder into the proper branch of the GitHub repository.

I'm not the only one.  Here is a 15 step process from a question on StackOverflow.  Wow!

Here's a very similar StackOverflow question with a complex answer.  May be the best.  Involves a symbolic link between the second branch and the internal folder which is something I was considering.  Note that the question was raised two years ago and the answer hasn't been "confirmed".  And both questioner and answerer have a ton of experience (StackOverflow points).  There aren't any "oh yeah baby, that's right" comments.  So it's not like a bunch of smart people agreed that this was a wonderful solution.

Here is yet another solution.  Ant is nice but not required.  However, it does involve checking out one branch (the javadoc .html files) over the code branch, wiping out the .java code files (even though one might call this "reverting", they are gone, right?), which seems, well, bizarre.  Frankly, the whole branching / wiping out thing strikes me as bizarre.  And the generated javadocs then get removed when you return to the main branch and check out the code.  So you can't look at them anymore.

I think the second answer, with two folders, one per branch, joined by a link, is best.

Question in general for git experts: do you usually keep branches in separate folders / separate clones?  Or do you throw them all together into one directory and trust git in overwriting all of your previous work?  Maybe I'm just too old fashioned and paranoid?  :-)