Thursday, September 6, 2012

ExecutorService, Part 2, the "Unhappy Path"

In a previous post, I explained how to parallelize your algorithms using an ExecutorService.  Here we consider the unhappy path, when your algorithms throw Exceptions or you wish to stop their execution.  This requires adding a few lines of code (a try-catch block) to the calculateInParallel() method.



   try {
      List<Future<CallableCalculation>> results = myService.invokeAll(tasks);
      for (Future<CallableCalculation> future : results) {
         CallableCalculation cc = future.get();
         resultMap.put(cc.year, cc.getPercentileResults());
      }
   }
   catch (ExecutionException ee) {
      // what to do?
   }
   catch (InterruptedException ie) {
      // what to do???
   }
But "what to do"? You probably want to shut down the ExecutorService and return some status for failure or interruption. To handle this, plus other "utility" chores such as calculating a "good" number of threads to use, I wrote a class, CallablesCaller. In case of failure it returns null for the List of results, and it also has a wasInterrupted() method. It still preserves the key idea of submitting a list of Callables to an ExecutorService. The source code (two files, you also need ConcurrentUtil) is available here, and the unit test is here.
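
As one possible answer to "what to do?", here is a minimal sketch. It is not the actual CallablesCaller source; the class name InvokeAllOrNull and the method callAll are made up for illustration. It assumes you want null on any failure, shuts the service down in a finally block, and restores the interrupt flag so the caller can still tell that an interruption happened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only
public class InvokeAllOrNull {

   /**
    * Runs all tasks and returns their results in task order, or null if any
    * task threw an Exception or the calling thread was interrupted.
    */
   public static <T> List<T> callAll(List<? extends Callable<T>> tasks, int nThreads) {
      ExecutorService service = Executors.newFixedThreadPool(nThreads);
      try {
         List<T> results = new ArrayList<T>();
         for (Future<T> future : service.invokeAll(tasks)) {
            results.add(future.get());
         }
         return results;
      } catch (ExecutionException ee) {
         // one of the Callables threw; ee.getCause() is the original Exception
         return null;
      } catch (InterruptedException ie) {
         Thread.currentThread().interrupt(); // restore the flag for our caller
         return null;
      } finally {
         service.shutdownNow(); // always release the worker threads
      }
   }
}
```

After a null return, the caller can check Thread.currentThread().isInterrupted() to distinguish an interruption from a failed Callable.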

Enjoy, and I hope you find this useful in your efforts.  Note that CallablesCaller does not do anything fancy to shut down immediately if any of the Callables fail.  This is because it uses the simple, basic invokeAll() method, which waits for every task to finish.  If you wanted to stop ASAP on a failure, you'd want to add an ExecutorCompletionService so that you could poll the results as they arrive and handle failures immediately.  I didn't add this complication because the algorithms I am using are simple statistics that hardly ever fail, and, even if one does fail, the other results are still useful.
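
For the stop-ASAP case, something along these lines might work. This is only a sketch and not part of CallablesCaller; FailFastCaller and callAll are hypothetical names. It takes results in completion order, so the first failure surfaces as soon as it happens, and shutdownNow() then interrupts the tasks that are still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only
public class FailFastCaller {

   /** Returns results in completion order; rethrows the first failure after stopping the rest. */
   public static <T> List<T> callAll(List<? extends Callable<T>> tasks, int nThreads)
         throws ExecutionException, InterruptedException {
      ExecutorService service = Executors.newFixedThreadPool(nThreads);
      CompletionService<T> completion = new ExecutorCompletionService<T>(service);
      try {
         for (Callable<T> task : tasks) {
            completion.submit(task);
         }
         List<T> results = new ArrayList<T>();
         for (int i = 0; i < tasks.size(); i++) {
            // take() blocks until the next task finishes, whichever one that is
            Future<T> finished = completion.take();
            results.add(finished.get()); // throws ExecutionException if that task failed
         }
         return results;
      } finally {
         // on the normal path nothing is left to run; after a failure this
         // interrupts running tasks and discards the queued ones
         service.shutdownNow();
      }
   }
}
```

Note that interruption only helps if the Callables respond to it, for example by checking Thread.interrupted() inside long loops.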

Monday, July 2, 2012

Java Event Handling, Revisited

  Quite a while ago I blogged about replacement code for Java's javax.swing.event.EventListenerList, which never appealed to me.  (The blog posts are here, here, and here.)  The EventListenerList code is, well, icky.  You jump by 2 along an array of mixed Classes and Listeners, and there are non-reassuring comments in the source code like "it provides ... a degree of MT (multi-threaded) safety (when used correctly)."  It was written before the newfangled java.util.concurrent code, such as the CopyOnWrite collections, which were written by experts and are perfect for this task.

  Also, to use EventListenerList, you are required to define a little Listener class for each type of event to be fired, which is repetitive boilerplate work.  Now that Java has Generics this is less necessary.  EventListenerList will only handle events that implement the marker interface EventListener.  Which isn't horrible, but, if you want a more general purpose class to "fire" something else, say a String, to listeners, it won't do the job.  Finally,  EventListenerList depends on Swing. If your project targets Android, you don't have Swing.  So, if you like (or are just plain used to) the concept of a decoupled event listener, but are looking for something a bit easier to use and with broader application, this code might be for you.

  I have slightly updated and reorganized the code, which is available on Assembla.  The source code is here, and the JUnit tests here.

  Events.java defines three inner interfaces.  In the future, I may add some utility methods here to manipulate events.

  1. Events.Listener defines a listener, generically, based upon the type of event it listens for.  No need to create an actual boilerplate class for each different listener.
  2. Events.Broadcaster is a convenience interface for any class that fires a single type of event.  You don't need to define and implement it (Java never defined such an interface) but it could be useful.
  3. Events.Broadcasters is a convenience interface for any class that fires multiple types of events.
Broadcaster is an implementation of Events.Broadcaster to fire a single type of event.  In the common case where there are 0 or 1 listeners, it just uses a direct reference, aListener.  When there are multiple listeners, it uses a CopyOnWriteArraySet.  Note that, being a Set, this prevents adding the same listener twice, which, in my experience, is almost always a bug.
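
To make the idea concrete, here is a stripped-down sketch. It is not the real Broadcaster: it skips the single-listener optimization, and the names SimpleBroadcaster, addListener, removeListener and fire are assumptions for illustration (only handleEvent comes from the actual code).

```java
import java.util.concurrent.CopyOnWriteArraySet;

// Simplified illustration, not the actual Broadcaster source
public class SimpleBroadcaster<E> {

   /** Mirrors the shape of Events.Listener: one method, typed by the event. */
   public interface Listener<E> {
      void handleEvent(E event);
   }

   // a Set, so adding the same listener twice has no effect
   private final CopyOnWriteArraySet<Listener<E>> listeners =
         new CopyOnWriteArraySet<Listener<E>>();

   /** @return true if the listener was not already registered */
   public boolean addListener(Listener<E> listener) {
      return listeners.add(listener);
   }

   /** @return true if the listener had been registered */
   public boolean removeListener(Listener<E> listener) {
      return listeners.remove(listener);
   }

   /** Iterates over a snapshot, so listeners may add or remove themselves while handling. */
   public void fire(E event) {
      for (Listener<E> listener : listeners) {
         listener.handleEvent(event);
      }
   }
}
```

Since the event type is just a type parameter, a SimpleBroadcaster<String> can "fire" plain Strings, with no EventListener marker interface required.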

Broadcasters is an implementation of Events.Broadcasters.  It can fire multiple types of event, using a HashMap to redirect them by the specific class of the event.  Note that it matches on the exact class of the event (== logic), not instanceof.  If you want to subclass an event (in my experience, extremely rare - look at the built-in Java events) you'll have to do something clever.
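
The exact-class lookup can be sketched roughly as follows. Again, this is an illustration and not the real Broadcasters; the class name ClassKeyedBroadcasters and its method signatures are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArraySet;

// Simplified illustration, not the actual Broadcasters source
public class ClassKeyedBroadcasters {

   public interface Listener<E> {
      void handleEvent(E event);
   }

   private final Map<Class<?>, CopyOnWriteArraySet<Listener<?>>> byClass =
         new HashMap<Class<?>, CopyOnWriteArraySet<Listener<?>>>();

   public synchronized <E> void addListener(Class<E> eventClass, Listener<? super E> listener) {
      CopyOnWriteArraySet<Listener<?>> set = byClass.get(eventClass);
      if (set == null) {
         set = new CopyOnWriteArraySet<Listener<?>>();
         byClass.put(eventClass, set);
      }
      set.add(listener);
   }

   /** @return the number of listeners notified */
   @SuppressWarnings("unchecked")
   public <E> int fire(E event) {
      CopyOnWriteArraySet<Listener<?>> set;
      synchronized (this) {
         set = byClass.get(event.getClass()); // exact class lookup, no instanceof
      }
      if (set == null)
         return 0;
      int count = 0;
      for (Listener<?> listener : set) {
         // safe: the listener was registered under this exact event class
         ((Listener<E>) listener).handleEvent(event);
         count++;
      }
      return count;
   }
}
```

With this sketch, a listener registered for Exception.class is not notified when a RuntimeException is fired, which is the subclass limitation described above.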


  One drawback of this concept is that, due to erasure, a class cannot listen to two types of events.  That is, you cannot write

public class FooBarListener implements Events.Listener<Foo>, Events.Listener<Bar>

One option is to implement Double Dispatch, as illustrated in DoubleDispatchUnitTest.  I find double-dispatch confusing, so the simpler alternative is to define your separate listeners in inner classes, e.g.

public class FooBarListener {

   private Events.Listener<Foo> fooListener = new Events.Listener<Foo>() {
      public void handleEvent(Foo fooEvent) {
         // do stuff here
      }
   };

   private Events.Listener<Bar> barListener = new Events.Listener<Bar>() {
      public void handleEvent(Bar barEvent) {
         // do stuff here
      }
   };

}





Friday, June 29, 2012

From java.awt.print.Pageable to PDF, Revisited

In an earlier post, I presented code to "print" to a PDF file from a Pageable.  Since then I have added the ability to concatenate multiple Pageables into a single PDF file, with the option for bookmarks.  The previous code was modified and some things renamed.

A small interface, PDFBookmarker, was added.
The previous class, PDFPrinter, was renamed PDFStream (mainly to keep it different).  The new class takes an OutputStream in the constructor, since it may be shared across multiple prints.
The key method, which was called printToPdf(), is renamed to the clearer appendToPDF().
A few new methods, such as newPage() and close(), were added, so you can control output in between prints, and at the end.

The two java files are available via Assembla, at  http://www.assembla.com/code/hastur/subversion/nodes/trunk/Hastur_J/src/com/flyingspaniel/pdf

This code requires the wonderful iText library, either 2.1.7 or 5.  Enjoy!

Friday, June 15, 2012

How to Parallelize your Algorithm with an ExecutorService. Part 1, the "Happy Path"

Let's say you have a long-running algorithm to calculate stuff, for example, the median and 99th percentile (and other percentiles) of IRS income data.  This is slow because calculating medians and percentiles typically requires sorting data, and there's a lot of IRS data.  Your method looks like:


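  public static PercentileResults calcPercentileResults (IRSData irsData, int year, Options options) {
     BigObjectLotsOfData bolod = irsData.getData(year, options);
     PercentileResults percentileResults = new PercentileResults();
     // big calculation here that sets stuff in percentileResults
     return percentileResults;
  }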
You always need a year.  Options represents optional complex things such as "only Married filing jointly", "only ages 55-65", etc.  PercentileResults is a Business Object with the results.  Currently, the code calls this in serial and puts results into a Map:

  for (int year = 2000; year <= 2011; year++) {
     PercentileResults result = calcPercentileResults(irsData, year, options);
     resultsMap.put(year, result);
  }
Obviously calculations for different years (and Options) can be done totally independently in parallel. An ExecutorService helps manage this for you. Most of the ExecutorService methods want a Callable, so the first step is to convert your calculation into a Callable. Here's a first pass:

 public class CallableCalculation implements Callable<PercentileResults> {

   public final IRSData irsData;
   public final int year;
   public final Options options;
   
   public CallableCalculation(IRSData irsData, int year, Options options) {
      this.irsData = irsData;
      this.year = year;
      this.options = options;
   }

   @Override
   public PercentileResults call() throws Exception {
      
      BigObjectLotsOfData bolod = irsData.getData(year, options);
      PercentileResults result = new PercentileResults();
      // big calculation here that sets stuff in PercentileResults
      bolod = null; // important to free this memory
      // placeholder for Option 1 and 2 (see below)
      return result;
   }
   
 }

For simplicity, I made all the values public final so that they could be accessed later. If you don't like this style, make them private and add accessors as desired. More on this later... The simple way to call this in parallel would be:


 public Map<Integer, PercentileResults> calculateInParallel(Options options) throws Exception {
   IRSData irsData = IRSData.getInstance();
   HashMap<Integer, PercentileResults> resultMap = new HashMap<Integer, PercentileResults>();
      
   ArrayList<CallableCalculation> tasks = new ArrayList<CallableCalculation>();
      
   for (int year = 2000; year <= 2011; year++) {
      CallableCalculation cc = new CallableCalculation(irsData, year, options);
      tasks.add(cc);
   }
   int processors = Runtime.getRuntime().availableProcessors();
    //might want to adjust that number some...
   ExecutorService myService = Executors.newFixedThreadPool(processors);
      
   // oops - there's a problem coming up
   List<Future<PercentileResults>> results = myService.invokeAll(tasks);
   for (Future<PercentileResults> future : results) {
      PercentileResults pr = future.get();
      
      resultMap.put(year, pr);  // oops, what's the year for that Future???
   }      
      
   return resultMap;      
 }


Now, there's one "gotcha" so far. The input parameter year has gotten separated from the results.
There are a few options.

1) If the final location is really really clear and obvious, the algorithm itself could put the results
there. In this example, before returning the result (see comment "placeholder"), just put the results
into the Map. This is a simple solution but not very robust to changes in requirements.

2) Add the year as a new field to PercentileResults, and set it (at placeholder spot). This is robust.
But tedious and violates DRY. What if there are lots of settings you want to remember? Like all the
Options? And maybe you don't want to clutter your XXXResults with the input settings.  Or you can't - it's taken from some third party library.

3) The option I like best is to return the CallableCalculation! It already holds all the settings. You just need to add a field for the results, and relevant accessors. So you aren't doing much extra work. Your class would look like this (changes noted by "NEW").


public class CallableCalculation implements Callable<CallableCalculation> {

   public final IRSData irsData;
   public final int year;
   public final Options options;
   
   PercentileResults results;  // NEW
   
   public CallableCalculation(IRSData irsData, int year, Options options) {
      this.irsData = irsData;
      this.year = year;
      this.options = options;
   }

   public PercentileResults getPercentileResults () { return results; }  // NEW

   @Override
   public CallableCalculation call() throws Exception {
      
      BigObjectLotsOfData bolod = irsData.getData(year, options);
      results = new PercentileResults();
      // big calculation here that sets stuff in PercentileResults
      bolod = null; // important to free this memory
      return this; // NEW
   }
   
}
and the calling method, in place of the "oops - there's a problem coming up" section has:


   List<Future<CallableCalculation>> results = myService.invokeAll(tasks);
   for (Future<CallableCalculation> future : results) {
      CallableCalculation cc = future.get();
      resultMap.put(cc.year, cc.getPercentileResults());
   }
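
One more possibility, not among the three options above: invokeAll() is documented to return its Futures in the same order as the tasks you passed in, so you can also pair results with inputs by index. A small sketch of that, where the class name PairByIndex and the "square the year" calculation are stand-ins for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example, for illustration only
public class PairByIndex {

   /** Stand-in for the real calculation: just squares the year. */
   static Callable<Long> squareOf(final int year) {
      return new Callable<Long>() {
         public Long call() {
            return (long) year * year;
         }
      };
   }

   public static Map<Integer, Long> calculateInParallel(int firstYear, int lastYear) throws Exception {
      List<Integer> years = new ArrayList<Integer>();
      List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
      for (int year = firstYear; year <= lastYear; year++) {
         years.add(year);
         tasks.add(squareOf(year));
      }

      ExecutorService service = Executors.newFixedThreadPool(4);
      try {
         // futures.get(i) belongs to tasks.get(i), no matter which finished first
         List<Future<Long>> futures = service.invokeAll(tasks);
         Map<Integer, Long> resultMap = new HashMap<Integer, Long>();
         for (int i = 0; i < futures.size(); i++) {
            resultMap.put(years.get(i), futures.get(i).get());
         }
         return resultMap;
      } finally {
         service.shutdown();
      }
   }
}
```

This keeps the Callable free of bookkeeping, but it relies on two parallel lists staying in step, so returning the CallableCalculation itself is arguably harder to get wrong.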
In a future post we will consider the unhappy path with errors and Exceptions.

Sunday, April 29, 2012

From java.awt.print.Pageable to PDF

My current project has implemented multi-page printing, with headers and footers and all that.  It's complex.  We use some of the ideas from Stanislav Lapitsky and his series of articles about a PaginationPrinter.  If a user wants the printouts not on paper, but as PDF files, we suggest that they use a free utility like CutePDF.  (there are alternatives)

However, CutePDF is Windows only, and it's a couple of extra steps as they have to remember to select the correct "printer", then enter a filename.  Also, there may be a future need to append output from a collection of data into a single output PDF file.

So, a cross-platform, java-centric way to "print" our results by creating a PDF file would be a good option.  Fortunately, iText offers everything we need (in fact, way more than we need), and there is an excellent book, iText in Action.

Some Googling for help was tricky, because if you Google for something like "java print PDF" you get lots of links on how to open and print an existing PDF document.  We want to create a PDF document.  More searching led to an excellent blogpost by Gert-Jan Schoeten, "From java.awt.print.Printable to PDF".  You would not do too poorly by skipping the rest of my blog and just going to his.  However, by using more information from Pageable and Printable you can slightly simplify his code.  The way I set things up and return the results is, IMO, slightly preferable, but still largely a matter of style.  I am largely glossing over the whole FontMapper issue as well, since my project uses only basic fonts.

Here's the code:


package com.flyingspaniel.pdf;

import java.awt.Graphics2D;
import java.awt.print.PageFormat;
import java.awt.print.Pageable;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.io.IOException;
import java.io.OutputStream;


// if you change to iText v5 these imports change to com.itextpdf...

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Rectangle;
import com.lowagie.text.pdf.DefaultFontMapper;
import com.lowagie.text.pdf.FontMapper;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfWriter;

/**
 * With this class, you can print a {@link Pageable} to a pdf
 *
 * @author Morgan Conrad
 *
 * Based upon earlier work by
 * @author G.J. Schouten
 * @see http://www.zenbi.co.uk/2011/09/04/printable-to-pdf/
 *
 * Using the wonderful iText library.  This version is built for version 2.1.7 but
 * it will work, with changes to the import statements, with iText 5.2.1
 * @author Bruno Lowagie
 * @see http://itextpdf.com/
 */

public class PDFPrinter {

   protected final FontMapper fontMapper;

   static DefaultFontMapper sDefaultFontMapper = null;
   static String sFontDirectory = "C:/windows/fonts";

   /**
    * Constructor
    * @param fontMapper if null, uses the DefaultFontMapper
   */
  
    public PDFPrinter(FontMapper fontMapper) {
      this.fontMapper = fontMapper != null ? fontMapper : getDefaultFontMapper(sFontDirectory);
    }

   /**
    * Default constructor, uses the DefaultFontMapper
    */

   public PDFPrinter() {
     this(getDefaultFontMapper(sFontDirectory));
   }

   /**
    * Change the directory for fonts. (OS dependent)
    * Generally, one should call this *before* ever instantiating a PDFPrinter
    * 
    * @param fontDirectory
    */

    public static synchronized void setFontDirectory(String fontDirectory) {
       sFontDirectory = fontDirectory;
       sDefaultFontMapper = null;  // force getDefaultFontMapper to recalculate...
   }


   private static synchronized FontMapper getDefaultFontMapper(String fontDirectory) {
      if (sDefaultFontMapper == null) {
         sDefaultFontMapper = new DefaultFontMapper();
         // use the directory that was passed in, rather than ignoring the parameter
         if (fontDirectory != null)
            sDefaultFontMapper.insertDirectory(fontDirectory);
      }

      return sDefaultFontMapper;

   }

   /**
    * Creates a PDF from a Pageable
    * @param pageable
    * @param os
    * @param closeStream whether to close the stream.
    *                    Since we don't create the stream, best practice is to leave this false
    * @return number of pages actually printed
    *
    * @throws IOException
    * @throws PrinterException
   */

   public int printToPdf(Pageable pageable, OutputStream os, boolean closeStream) throws IOException, PrinterException {

      // sanity check
      if (pageable.getNumberOfPages() == 0)
         return 0;

      // base page sizes on the first page
      PageFormat pageFormat = pageable.getPageFormat(0);
      float width = (float)pageFormat.getWidth();
      float height = (float)pageFormat.getHeight();
      Rectangle pageRect = new Rectangle(0.0f, 0.0f, width, height);

      Document document = new Document(pageRect);
      PdfWriter writer;

      try {
         writer = PdfWriter.getInstance(document, os);
      } catch (DocumentException e) { // don't throw as an iText exception so other classes don't need links to iText classes
         throw new RuntimeException(e);
      }

      writer.setCloseStream(closeStream);
      document.open();

      PdfContentByte contentByte = writer.getDirectContent();

      int pageIdx = 0;
      int pageStatus;

      do {

         if (pageIdx > 0)
            document.newPage();

         // The following is deprecated in iText5 but works

         Graphics2D g2d = contentByte.createGraphics(width, height, fontMapper);
        // if you are using iText5 and want to be "up to date", use
        // Graphics2D g2d = new PdfGraphics2D(contentByte,  width, height, fontMapper);

         Printable printable = pageable.getPrintable(pageIdx);
         try {
            pageStatus = printable.print(g2d, pageFormat, pageIdx);
         } finally {
            g2d.dispose();// iText in Action book says this is very important, so put in a finally clause...
         }
      }
      while ((pageStatus == Printable.PAGE_EXISTS) && (++pageIdx < pageable.getNumberOfPages()));

      document.close();
      writer.close();
      os.flush();

      return pageIdx;

   }

}


Hope this helps out some of you who wish an easy way to create PDF files from Swing components.

Note that this code as written works in iText 2.1.7 but, with just changes to the imports, it also works with iText 5.2.1.  Version 5 is obviously more "up to date" but the licensing has changed.  I don't blame Bruno Lowagie for trying to make some money from his great efforts.  Here's a largely civil discussion on the matter.


Wednesday, January 18, 2012

A Brief Detour into JavaScript, XHTML, and HTML5

We will return to Ubuntu and Hadoop later...

My wife helps maintain a website for a volunteer organization, and I am powerless to help much because, despite years of programming, I have essentially zero knowledge of JavaScript and XHTML.  Sure, I've read some JavaScript and XHTML, but all writing has been like my SQL "writing": take some existing code, make an extremely minor change, and hope for the best.  What I saw of XHTML just felt "strange".  Not as bad as early editions of J2EE, with Factories, FactoryFactories, and FactoryFactoryStrategies, (o.k., I'm kidding some) but pretty bad.

But, it would be useful to actually know a bit more.  Also, I want to learn HTML5, so it seemed like a good time to learn some basics of JavaScript and XHTML.  So, to get started, I visited the local library and checked out JavaScript and AJAX for Dummies by Andy Harris. I already own a couple of HTML5 books.

From Andy's book, here is the XHTML version of the classic "Hello World" program.  I have removed all the < and > because they screw up Blogger.


!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
html lang="EN" dir="ltr" xmlns="http://www.w3.org/1999/xhtml"
  head
    meta http-equiv="content-type" content="text/xml; charset=utf-8" /
    titleHelloWorld.html/title
    script type = "text/javascript"
      //![CDATA[
        // Hello, world!
        alert("Hello, World!");
      //]]
    /script
  /head
  body  
  /body
/html


Now, I'm a Java programmer and used to some verbosity, but this really pushed the limits.  Speaking as a complete outsider from Mars looking at this, what the Heck were they thinking?  There's the !DOCTYPE tag that I guess you just get used to typing, or copy from a pre-existing file.  Same for the HTML lang="en" etc...  

There's the meta tag.  What's that even mean?  That I'm UTF-8 and text/xml.  I know about internationalization, where it might not be UTF-8.  But I already said that I'm English via lang="EN". And UTF-8 is the natural encoding for English, should be the default, shouldn't it?  Again, what were they thinking?

Finally, the really mysterious //CDATA stuff.  Now, CDATA means "Character data".  If the XML wizards are going to force me to write 174 characters (for the DOCTYPE and html tags) just to get started, why are they suddenly trying to save a few characters here???  Just call it CHARACTER_DATA or whatever.  Secondly, why is this commented out???  I think I finally understand, it's commented out with "//" so that JavaScript will ignore it.  And, fortunately, XML parsers do not use "//" for comments so they will not ignore it.  What a hack!  Now, I'd probably appreciate the hack in some deep part of the Java Runtime or the Linux kernel.  But as something out in the open that everybody is supposed to use around every JavaScript, all I can say, yet again, is what were they thinking?  It's quite obvious that whoever wrote the XHTML standard had no thought for usability, elegance, or thought.  Whenever they saw a hoop for coders to jump through, they added two just to make sure.  I expect someday to see it revealed that XHTML was actually a social experiment like the Stanford Prison Experiment - given a big-shot committee, just how much pain and agony will users tolerate?

By contrast, here is the HTML5 version.  (Again, < and > removed)

!DOCTYPE HTML
html
   head
   titleHello World/title
   script type="text/javascript"
      alert("Hello World");
   /script
   /head
   body
   /body
/html   
     

Notice something?  All that crap is gone!  Wonderful.

As it turns out, my wife might want a Calendar thingamabob on her page.  You can find lots of JavaScript code to do this.  I found some code, a mere 220 lines long (sarcasm intended), that you can cut and paste into your page.  With HTML5, you can do it with 0 lines of code: just say input type="date".

In conclusion, I am exceedingly glad that I never bothered to learn XHTML.  I hope to remain, as much as possible, proudly ignorant of XHTML.  HTML5 has made an instant convert.

Tuesday, January 3, 2012

Adventures in Ubuntu and Hadoop Part 4

Let's recap Rocky and Bullwinkle's adventures with Hadoop and Ubuntu on an old desktop computer. They fairly easily installed Ubuntu, Java and Eclipse, then installed an SSH server and set the IPv4 address, and then, in their greatest adventure, finally got X-Windows (and VNC) working. Now we are finally ready to install Hadoop. I plan to follow the blog post by Michael Noll and the O'Reilly Hadoop book by Tom White (esp. Appendix A). According to Michael Noll, I have more than satisfied the prereqs.

Recent Hadoop releases are here or here. Some strange domain names. As of today 0.20.203 was the latest stable release, so I grabbed it and put it in /opt/hadoop. Then, per Michael's instructions

sudo tar xzf hadoop-0.20.203.0rc1.tar.gz and sudo chown -R hduser:hadoop hadoop-0.20.203.0

Then edit /home/hduser/.bashrc (don't forget to type sudo!) to add the following lines at the end. YMMV depending on exactly where your Hadoop and Java are installed.


# Set Hadoop-related environment variables
export HADOOP_HOME=/opt/hadoop/hadoop-0.20.203.0


# Set JAVA_HOME
export JAVA_HOME=/opt/java/32/jdk1.6.0_30

# Add Hadoop bin/ directory to PATH

export PATH=$PATH:$HADOOP_HOME/bin

The Hadoop book suggests that you test if it will run by typing hadoop version. Before this will work, either re-login to run the .bashrc script, or manually do all three exports. If you forget to export JAVA_HOME, you'll see a useful, informative message

Error: JAVA_HOME is not set.

But, once you set all three, you'll see something like

mpc@mpc-desktop:/home/hduser$ hadoop version
Hadoop 0.20.203.0

Subversion http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333

Compiled by oom on Wed May 4 07:57:50 PDT 2011


Woohoo! Our work is done! Well, not really, there's still a whole bunch to go, like configuring the Hadoop Distributed File System (HDFS). But let's declare victory for now and return to that on a later day.