Sunday, December 25, 2011

Adventures in Ubuntu and Hadoop Part 1

One of my goals for 2012 is to learn Hadoop. So I want to install it somewhere. I have a nice Windows 7 laptop that I use for most development. Hadoop only "sort of runs" on Windows, for basic development only, and then only with Cygwin. Now, I know many people who love and swear by Cygwin, but I'm in the other camp - in my admittedly limited experience, I disliked and swore at Cygwin. Besides, I've wanted to play with Linux for a while anyway. Twenty years ago I knew how to use vi and ls. How hard could it be? And reorganizing / swapping offices with my wife freed up an old 160 GB disk and some RAM to upgrade my 7 year old desktop, a Gateway 503GR. So, why not install Linux on the 2nd drive, and then Hadoop?

The nice thing about Linux is that you can spend weeks deciding which "distro" is best. Fortunately, some Googling revealed that, for my old machine, Ubuntu definitely ran, and Ubuntu got good reviews as relatively "easy". I grabbed a book from the local library (Ubuntu Linux by William von Hagen) and got started.

Downloading the ISO image for Ubuntu 11.10 to my Windows machine was pretty easy. Fortunately I already had a utility program to burn ISO images to CD. So far, so good. Transferring the hardware was easy too. My 1G RAM and 160GB disk were quickly upgraded to 2G and 2x160GB. My plan was to keep Windows XP on the original disk and install Linux on the 2nd one.

O.K., boot up and hit F2. Well, the first try my timing was off, but I did notice that Windows saw the new RAM and disk. Second time I got the timing right and opened in "Try Ubuntu" mode. I wanted to look at the disks to make sure of which was which. Turned out that disk "b" was the new one. Then clicked install. I got a surprising dialog about "unmounting" the disks, thought a bit, said what the hey and did that. Said to install on disk "b" and everything went pretty smoothly. Except, I noticed that the "look for updates" box was disabled. My network adapter (on the motherboard) is getting cranky, especially after the computer is totally disconnected from AC power. Anyway, it installed fine.

Time to reboot. To my pleasant surprise, the "Grand Unified Boot Loader" (GRUB) worked just as advertised, and I could boot into either Linux or Windows. Ubuntu recognized my NVidia card and I was able to get both monitors configured quickly. Just needed some playing with the power cable to trick the network adapter into working.

So far, so good. Now, to install Java. Right now Java 6 is very well established, at update 30, while Java 7 is pretty new. I'm not using any of the new Java 7 features yet. Java SE 6 update 30 it is. Now the trouble begins. I'm sure that somebody knows why (here's a link), but Ubuntu and the "official" Oracle Java JDK don't get along: Ubuntu doesn't feature it in their Software Center or Synaptic Package Manager. They do feature the OpenJDK stuff, which has some mixed reviews. Besides, that would be too easy - what would I learn with that? I downloaded the official .bin package, not the rpm.bin. Moved it deep in the bowels of /opt, then ran it (eventually) via

sudo ./jdk-6u30-linux-i586.bin

Geez, you need to type sudo a lot! But java -version didn't work. And my feeble attempts to add java/bin to the PATH didn't seem to work either.

The magical incantation, found by Googling, was:

/usr/bin$ sudo update-alternatives --install /usr/bin/java java /opt/java/32/jdk1.6.0_30/bin/java 100
then
sudo update-alternatives --config java

The first command will put a java link in /usr/bin that points to /etc/alternatives/java, which is a link that points to the actual stuff in /opt. Anyway, this worked, and java -version happily printed out java version "1.6.0_30"
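Since java -version only reports on whichever java the shell finds first, it can help to ask the running JVM itself which installation it came from. The following is a minimal sketch, not part of the original setup; the class name WhichJava is made up for illustration, and it only reads standard system properties.

```java
class WhichJava {

    /** Describes the JVM that is actually executing this code. */
    static String describe() {
        // These are standard system properties that every JVM sets.
        String version = System.getProperty("java.version");
        String vendor = System.getProperty("java.vendor");
        String home = System.getProperty("java.home");
        return "java.version=" + version
                + ", java.vendor=" + vendor
                + ", java.home=" + home;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the alternatives link points where you expect, java.home should be a directory under the JDK you unpacked in /opt.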

BTW, I since found a more complete web site with the gory details. Looks like I still have a little work to do.


Before moving on to Hadoop (which will be the subject of a future post) I thought I should install Eclipse. Downloaded the latest, eclipse-java-indigo-SR1-linux-gtk.tar.gz, put it into /opt/eclipse, and then typed my first tar xzf in years.

tar xzf ec*


Drat, need yet another sudo,

sudo tar xzf ec*

Worked! A little more research found a simple script to put into /usr/bin to launch it. Create /usr/bin/eclipse with:

#!/bin/sh
export ECLIPSE_HOME="/opt/eclipse/eclipse"

"$ECLIPSE_HOME/eclipse" "$@"

Voilà! I'll probably want to add some settings for memory usage, but, for now, Eclipse started up and asked me to select a workspace, etc. It is pointing at the correct JRE. And a quick "Hello World" worked.

Stay tuned for the next exciting episode, "Ricky and Bullwinkle wrestle with Hadoop", or, "How many more times will I forget the sudo wrestler?". :-)

Wednesday, December 14, 2011

Testing Error Handling in Complex Code - The "Error Injection" Principle

I'm working with some legacy code, adding cool new features, many involving multi-threading.  But testing the error handling is difficult. The code is very complex for unit tests, and because it was written many years ago it isn't really structured for them.  I can step through in the debugger and change some variables to cause NPEs, etc, but that's really slow and tedious.

So I developed a fairly simple class, called TestSimulator, to do "Error Injection".  :-)  Basically, at key parts in your code you insert one line of code

TestSimulator.doCommand(someUniqueStringRepresentingYourLocation);

e.g.

TestSimulator.doCommand("MyClassName.someMethodName-complete");

TestSimulator has a boolean enabled, plus a HashMap.


It uses the unique string to look up a command, which is a simple DSL for what to do.  It currently supports


throw:exceptionClass
throw:exceptionClass(message)
sleep:milliseconds
interrupt:


All of them do pretty much what you'd expect.  For example, you could say throw:java.lang.ArithmeticException(Too many foobars).  (no quotes)  Note that interrupt: does not throw an InterruptedException (use throw: for that).  Instead, it calls Thread.currentThread().interrupt(), so you can test whether you are properly checking that flag later.
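To make the interrupt: behavior concrete, here is a small self-contained sketch of what "checking that flag later" can look like. It does not use TestSimulator; the class InterruptFlagDemo and its runSteps method are hypothetical stand-ins for a long-running calculation that polls the interrupt flag between steps.

```java
class InterruptFlagDemo {

    /**
     * Stand-in for a long calculation: does up to maxSteps units of work,
     * but stops early if the current thread's interrupt flag is set.
     * Returns the number of steps actually completed.
     */
    static int runSteps(int maxSteps) {
        int completed = 0;
        for (int i = 0; i < maxSteps; i++) {
            // isInterrupted() reads the flag without clearing it.
            if (Thread.currentThread().isInterrupted()) {
                break;
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        // Same effect as the interrupt: command: set the flag, throw nothing.
        Thread.currentThread().interrupt();
        int steps = runSteps(10);

        // Thread.interrupted() reports the flag and clears it.
        boolean wasSet = Thread.interrupted();
        System.out.println("steps completed: " + steps + ", flag was set: " + wasSet);
    }
}
```

Code that never polls the flag (or swallows it) would run all ten steps here, which is exactly the kind of bug the interrupt: command is meant to expose.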

Here's the source code.  Error handling within the class is a bit primitive.

import java.util.HashMap;

public class TestSimulator {

 public static boolean enabled = false;
 
 /**
  * Commands have the form command:value
  * Currently we support
  *  throw:exceptionclass,     e.g. throw:java.lang.ArithmeticException
  *  throw:exceptionclass(message), e.g. throw:java.lang.ArithmeticException(Too many foobars)
  *  sleep:milliseconds,      e.g. sleep:1000
  *  interrupt:
  */
 public static final HashMap<String, String> sCommandMap = new HashMap<String, String>();
 
 
 
 
 /**
  * Executes the command for the given key
  *
  * @param key
  * @throws Exception  the most common case throws some form of Throwable
  */
 public static void doCommand(String key) throws Exception {
   String command = sCommandMap.get(key);
   if (!enabled || command == null)
    return;
  
   System.out.println("testSimulator.doCommand " + command);
  
   if (command.startsWith("throw:")) {
    Throwable t = makeThrowable(command.substring(6));
    if (t instanceof Exception)
      throw (Exception)t;
    else if (t instanceof Error)
      throw (Error)t;    
   }
   else if (command.startsWith("sleep:")) {
    long milliseconds = Long.parseLong(command.substring(6));
    Thread.sleep(milliseconds);
   }
   else if (command.startsWith("interrupt:")) {
    Thread.currentThread().interrupt();
   }
  
   else {
    throw new IllegalArgumentException(key + " = " + command);
   }
 }

 // utilities
 static Throwable makeThrowable(String classNameAndMessage) {
   String message = null;
   String className = classNameAndMessage.trim();
   int paren = classNameAndMessage.indexOf('(');
   if (paren > 0) {
    message = classNameAndMessage.substring(paren+1, classNameAndMessage.length()-1);
    className = classNameAndMessage.substring(0, paren).trim();
   }
   try {
    Class<? extends Throwable> clazz = (Class<? extends Throwable>) Class.forName(className);
    if (message == null)
      return clazz.newInstance();
    else
      return clazz.getConstructor(String.class).newInstance(message);
   } catch (Exception e) {
    e.printStackTrace();
    return null;
   }
 }
}


To actually use this class, I've been writing small little mains in the classes I am primarily interested in testing.  There's probably a better way, but this works for now.  e.g. if I am testing a class called CalculatePI, it would have a main looking like:


public static void main(String[] args) throws Exception {
      
   TestSimulator.enabled = true;
      
   // modify the following as desired
   TestSimulator.sCommandMap.put("CalculatePI.calculateThis1", null);
   TestSimulator.sCommandMap.put("CalculatePI.longCalculationStep3", "interrupt:");
   TestSimulator.sCommandMap.put("CalculatePI.longCalculation-complete", "throw:java.lang.ArithmeticException(Failed to converge)");
   
   // launch the big fancy app as appropriate here... 
   MainClass.main(new String[0]);
}

Friday, March 25, 2011

The Microsoft Suit against Android apps - these are patents?

Microsoft is suing Barnes and Noble and other companies making Android-powered devices.  Geekwire enumerates the patents allegedly violated.  Read them.  Geez, these are patents?  6891551, selection handles around selections, was patentable in 2005?  Hasn't that been around forever?  5778372, delayed downloading of web images?  Isn't that pretty much what ALT is for?  At least that patent is from 1998, where I can vaguely imagine it was novel.  6957233, annotating a read-only file by noting the location in the file.  That was novel in 2005?

I don't know, maybe child windows were novel in 1999?  5,889,522.



Friday, December 17, 2010

CopyOnWrite Wrappers Part 2 (now CIS Wrappers)

There was some good feedback on my first pass. To summarize:


1. The implementation is not a "true" CopyOnWrite. It is more of a "concurrent iteration safe wrapper". Implying that it is CopyOnWrite will confuse users.

2. Does the code pass the JSR-166 Unit tests?

3. For speed: Why are the getters synchronized? Could you use atomics?

4. What are you trying to solve, why not use a (some other class)?


I have updated the code, at http://flyingspaniel.wikidot.com/cow to address those comments:

1. To better indicate that these are not true CopyOnWrite, they are now named CISListWrapper and CISMapWrapper, where CIS stands for "Concurrent iteration safe". There are improved comments. Users who are familiar with CopyOnWrite behavior should not be confused or disappointed.

2. In addition to my own unit test (CISWrapperTest.java) I modified some of the JSR 166 unit tests, renaming them, making them more generic, and taking out a few tests (serialization) that didn't really apply (or work for me). This required a few changes in my code - mainly adding equals(), hashCode() and toString() methods that I had neglected.

3. As implemented, the reads do need to be synchronized. (also here). I wasn't concerned about ultimate speed, and this has the tremendous advantage of being more "idiot-proof". The wrapped Collection need not be synchronized. Removing the synchronization might require a user to add a Collections.synchronizedXXX() wrapper around the underlying collection, adding one more layer and eliminating any speed benefit.

I did consider changing the synchronization from the wrapper to the wrapee. That is, instead of the current

public synchronized boolean containsKey(Object key) {
    return wrapee.containsKey(key);
}

use instead:

public boolean containsKey(Object key) {
    synchronized (wrapee) {
        return wrapee.containsKey(key);
    }
}

This is "left as an exercise for the reader" and might offer modest speed increases if the underlying collection were itself synchronized. If you really want the utmost in speed on simple reads, extend or use something like ConcurrentHashMap, or use one of the "true" CopyOnWriteMaps:

org.apache.mina.util.CopyOnWriteMap



4. So why use the CIS code? The "true" CopyOnWrite wrappers make a copy of your Map, and, for the most part, they copy the data into a basic HashMap or TreeMap. If that behavior is suitable, you are better off using their wrappers. Of course, if that basic behavior is suitable and you are seeking ultimate speed, you could consider rewriting your code to use the "standard" java.util.concurrent.ConcurrentHashMap.

If you have an existing List or Map that is not thread-safe or iteration-safe, and it has special or complex behavior that is not a simple HashMap or TreeMap, then this pure wrapper is useful. For example, you have a class com.mycompany.FunkyMap and it:
  1. validates inputs and throws Exceptions
  2. has special behavior for null keys or values
  3. has unusual rules for sorting
  4. implements some security rules on puts and gets. (and throws Exceptions)
  5. logs stuff
  6. encrypts values and stores them on a database
  7. is some facade or proxy (say, for an ORM)
  8. is buggy and you have set some breakpoints in your IDE or added printlns.
  9. is a singleton

In order to keep this behavior in a pure CopyOnWrite, the copy would have to itself be a FunkyMap, not a TreeMap. One could use reflection, but there goes all the speed bonus. To their credit, the Atlassian Utilities do provide for the possibility of subclasses. It's a small bit of work and may not be suitable for all cases.



Friday, December 3, 2010

CopyOnWrite Wrappers

Java's CopyOnWriteArrayList is a very useful class in java.util.concurrent which, when iterating, takes "snapshots" of the underlying array, and therefore never throws a ConcurrentModificationException.  It is highly effective and efficient when you need to preclude interference among concurrent threads.  But the CopyOnWrite family only implements two types of collection behavior: an ArrayList (CopyOnWriteArrayList) and an array-backed Set (CopyOnWriteArraySet).  Unlike many of the other java.util.Collections goodies, like Collections.unmodifiableList, they are not implemented as wrapper classes, but as actual classes.  So, if you want behavior other than an ArrayList or ArraySet, or you need to protect one of your own special List implementations, you are out of luck.  Nor is there any version for a Map, i.e. there is no CopyOnWriteMap.

So I implemented it.  The files are on my wiki page at http://flyingspaniel.wikidot.com/cow.

COWListWrapper wraps any List, providing "CopyOnWrite" behavior for iteration.
COWMapWrapper wraps a Map, providing "CopyOnWrite" behavior when you iterate using keySet(), entrySet(), or values().

There are some hopefully useful utility classes that they use:

UnmodifiableCollection is an actual class, not a wrapper utility, to wrap an underlying Collection with unmodifiable behavior.  It includes static inner classes, UnmodifiableCollection.List and UnmodifiableCollection.Set, that add List or Set behavior.

UnmodifiableIterator is a class implementing ListIterator that works efficiently with the above classes and prohibits modification during iteration.

Enjoy!

A couple of notes.  For any CopyOnWrite implementation, even the "official" Java ones, you only get the iteration safety if you actually use the iterator( ).  If you use the old fashioned "C-style" integer loop

for (int i=0; i < myList.size(); i++) {
  doSomethingWith( myList.get(i));
}

and a separate thread is modifying myList, all bets are off.  You should use the new style:

for (E element:myList) {
  doSomethingWith(element);
}
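A quick single-threaded sketch shows the difference between a fail-fast iterator and a snapshot iterator. The class SnapshotIterationDemo and its helper methods are made up for illustration; the list is modified from inside the loop rather than from a second thread, simply to keep the outcome deterministic.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIterationDemo {

    /**
     * Iterates with a for-each loop and appends one element after the first
     * visit. Returns how many elements the loop visited, or -1 if the
     * iterator detected the modification and threw.
     */
    static int visitWhileAdding(List<String> list) {
        int visited = 0;
        try {
            for (String s : list) {
                visited++;
                if (visited == 1) {
                    list.add("added-during-iteration");
                }
            }
        } catch (ConcurrentModificationException e) {
            return -1;
        }
        return visited;
    }

    /** Fills the given list with three elements and returns it. */
    static List<String> seed(List<String> list) {
        list.add("a");
        list.add("b");
        list.add("c");
        return list;
    }

    public static void main(String[] args) {
        // ArrayList's iterator is fail-fast: prints -1
        System.out.println(visitWhileAdding(seed(new ArrayList<String>())));
        // CopyOnWriteArrayList iterates over a snapshot: prints 3
        System.out.println(visitWhileAdding(seed(new CopyOnWriteArrayList<String>())));
    }
}
```

In the CopyOnWriteArrayList case the loop sees only the three original elements, even though the list holds four by the time it finishes.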

The standard Java CopyOnWrite classes know exactly what collection behavior (ArrayList or ArraySet)  to implement and, at construction, they make a safe copy of any incoming data, and thereafter are completely divorced from the original collection.  My wrapper classes do not, and cannot, know all possible List or Map behaviors.  They use the passed in (wrapped) List or Map for storage and most of the implementation.  For example, a call to add(Object o) is passed to the underlying object, and it implements the behavior and storage.  In other words, my wrappers do not make a "safe" copy of the data at construction time, and remain linked to the underlying collection.  You should not use the wrapped collection directly - all calls and modifications must be done through the wrapper.  The "safe" copy of data is done only for iteration.
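The idea in the paragraph above can be sketched in a few lines. This is not the COWListWrapper from the wiki page, just a deliberately stripped-down illustration under the same assumptions: calls are delegated to the wrapped List under a lock, nothing is copied at construction, and a copy is made only when an iterator is requested. The class name SnapshotIterationList is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Simplified illustration only; not the author's COWListWrapper. */
class SnapshotIterationList<E> implements Iterable<E> {

    // Storage and behavior stay in the wrapped list.
    private final List<E> wrapped;

    SnapshotIterationList(List<E> wrapped) {
        this.wrapped = wrapped;   // no copy at construction time
    }

    // Ordinary calls are synchronized and passed straight through.
    public synchronized boolean add(E element) {
        return wrapped.add(element);
    }

    public synchronized E get(int index) {
        return wrapped.get(index);
    }

    public synchronized int size() {
        return wrapped.size();
    }

    /** Copies the current contents under the lock and iterates over the copy. */
    public synchronized Iterator<E> iterator() {
        List<E> snapshot = new ArrayList<E>(wrapped);
        // Read-only view, so remove() on the iterator is rejected.
        return Collections.unmodifiableList(snapshot).iterator();
    }
}
```

A loop such as for (String s : list) { list.add(s + "-copy"); } then visits only the elements present when the loop started, while the additions land in the wrapped list. Anything that touches the wrapped list directly bypasses the lock, which is why all access has to go through the wrapper.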

Recently added:
  UnmodifiableCollection.toString() implemented (useful in the unit test)
  COWWrapperTest  unit tests

Sunday, September 26, 2010

What's the Key for the next "big" parallel programming language?

This has been a popular subject on Artima, e.g. here and here.  Most of the discussion has revolved around syntax, simplicity, speed, etc.  IMO, this is interesting, but, ultimately, not important.


In order for a parallel programming language to take off and grab mindshare, it must be well integrated with low cost GPUs from ATI and NVIDIA.  Because, ultimately, to take advantage of parallel programming, one either has to have a million dollar compute cluster, or a $100 graphics card.  My bet is on cheap.


A pure, elegant language that doesn't readily talk to CUDA will lose in the marketplace to some hack language that does.  It will be Beta vs. VHS, or 68000  vs. 8088 revisited.  So Clojure, Scala, Fantom, Groovy enthusiasts, write some GPU libraries!



Sunday, September 12, 2010

Java RMI is a PITA

My background is mainly in Java desktop applications, not "EE server stuff".  But I had an idea for a simple server app, and, having recently attended an excellent pair of meetups sponsored by the SF Java User's Group, I was psyched to write a server app.  The app involves the instrument software sending status messages to a central server.  So first, how to do that?

In the past, I've often done this by opening a ServerSocket on the server, listening to connection requests from the client on a known port, creating a short lived Socket, and streaming over the data.  It works and I know how to code it.  But it's not particularly robust, nor scalable, and it's "so 1980s".
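For readers who have not seen the plain-socket approach, a minimal sketch follows. It is not the original code; the class SocketStatusDemo, its method names, and the one-line-per-message format are assumptions made for illustration, and both ends run in one JVM over loopback so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

class SocketStatusDemo {

    /** Server side: accept one connection and read a single line of status text. */
    static String receiveOne(ServerSocket server) throws IOException {
        Socket socket = server.accept();
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "UTF-8"));
            return in.readLine();
        } finally {
            socket.close();
        }
    }

    /** Client side: open a short-lived socket, write one line, close. */
    static void send(String host, int port, String status) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(socket.getOutputStream(), "UTF-8"), true);
            out.println(status);   // autoflush is on, so println sends the line
        } finally {
            socket.close();
        }
    }

    /** Runs both ends in one JVM and returns what the server received. */
    static String roundTrip(final String status) throws Exception {
        final ServerSocket server = new ServerSocket(0);   // 0 = any free port
        server.setSoTimeout(5000);                         // don't wait forever in accept()
        try {
            Thread client = new Thread(new Runnable() {
                public void run() {
                    try {
                        send("127.0.0.1", server.getLocalPort(), status);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
            client.start();
            String received = receiveOne(server);
            client.join();
            return received;
        } finally {
            server.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("inst-42: READY"));
    }
}
```

A real server would loop on accept() and hand each connection to a worker, and that bookkeeping is a large part of what makes the hand-rolled approach harder to keep robust.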

Since the client (at least initially) is written in Java, I next thought to use RMI.  From the little I knew it was "simple".  The drawback, that it only worked for Java programs, wasn't a deal breaker.  Sun/Oracle provide some good documentation.

As you can see from their docs, basic coding is simple.  Define an interface for the service, implement it on the server, and call it from the client.  Since the messages I was passing were basic Strings, not any fancy business classes, no extra classes needed to be declared.  In my case, the interface was one method

public interface MyInterface extends Remote {
   public void setStatus(String instrumentID, String status) throws RemoteException;
}

and the implementation was also quite simple, storing the message in a HashMap.  Well, simple once I realized that the call needed to declare that it throws a RemoteException.  Forget that and it won't work.  Also, both server and client need a main() to do basic registration.  But there's much more.  The Devil is in the details.

Setting up the classes and .jar files to compile is a minor PITA.  Even in my very trivial case.  Mainly getting everything onto the right classpath.  As you can imagine from the tutorial, this could be complex for a complex RMI call, or if the classes depend on numerous other classes or jar files.

On the server, somebody must manually start the registry, rmiregistry.  This seems a needless step, because in main the server class calls

MyImpl theServer = new MyImpl();
Naming.rebind(NAME, theServer);

Why can't java.rmi.Naming check to see that rmiregistry is running?  There are probably some complex cases where you don't want this, or a security issue, but why not handle the simple case?  BTW, just like a ServerSocket, the client needs to know the host and port for the rmiregistry.  No big advantage to RMI here. (yes, once you have multiple RMI calls, it's simpler because you need only assign one port, not many)
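One way around the manual step, for the simple case, is to start a registry inside the server's own JVM with java.rmi.registry.LocateRegistry.createRegistry. The sketch below is an illustration under assumptions, not the code from this project: StatusService stands in for MyInterface (with a getter added so the result can be read back), the names StatusServiceImpl and InProcessRegistryDemo are made up, and client and server share one JVM to keep it self-contained.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the post's MyInterface. */
interface StatusService extends Remote {
    void setStatus(String instrumentID, String status) throws RemoteException;
    String getStatus(String instrumentID) throws RemoteException;
}

/** Stores the latest status per instrument in a map. */
class StatusServiceImpl implements StatusService {
    private final Map<String, String> statuses = new ConcurrentHashMap<String, String>();

    public void setStatus(String instrumentID, String status) {
        statuses.put(instrumentID, status);
    }

    public String getStatus(String instrumentID) {
        return statuses.get(instrumentID);
    }
}

class InProcessRegistryDemo {

    /** Starts a registry in this JVM, binds the service, and calls it through its stub. */
    static String roundTrip(int registryPort) throws Exception {
        // Loopback only, because this demo never leaves the local machine.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        // Replaces the separate rmiregistry process; port 0 lets the OS pick one.
        Registry registry = LocateRegistry.createRegistry(registryPort);
        StatusServiceImpl impl = new StatusServiceImpl();
        Remote stub = UnicastRemoteObject.exportObject(impl, 0);
        try {
            registry.rebind("status", stub);

            // A real client would use LocateRegistry.getRegistry(host, port) instead.
            StatusService client = (StatusService) registry.lookup("status");
            client.setStatus("inst-42", "READY");
            return client.getStatus("inst-42");
        } finally {
            // Unexport so the RMI threads do not keep the JVM alive.
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(0));
    }
}
```

With a fixed port such as the conventional 1099, remote clients can find the registry the same way they would find a standalone rmiregistry; the trade-off is that the registry now lives and dies with the server process.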

There's a possible classpath issue on the server, where you may have to set java.rmi.server.codebase.  Why?  In my case the one interface that my client uses, MyInterface, has already been included and resolved in the code - it gets compiled!  Isn't the idea to code to an interface and try not to think about the implementations?  Sure, there's probably a good reason, but another hassle.

Much more tedious is the SecurityManager stuff.  In your main, you have to provide a SecurityManager. 

System.setSecurityManager(new RMISecurityManager());


Why?  Why have security and force people to put in their own, especially when the code examples all grant AllPermissions?  Seems like saying "we have a lock, but you must replace it with an open door".  And, frankly, I never even got this to work.  Probably got some of the -Ds wrong in the runtime arguments.  Luckily, I found this RMI tutorial where they override key security checks to do nothing.

System.setSecurityManager(new RMISecurityManager() {
    public void checkConnect(String host, int port) {}
    public void checkConnect(String host, int port, Object context) {}
});


By doing this I got RMI to work.  But the security hassles seem designed to lead the user to provide no security, and, IMO, the whole RMI setup issue is a PITA.  Besides, RMI is so "2000".  If you Google "Java RMI sucks", you'll get a lot of hits.  This one was useful.

I next looked into REST, much more trendy and cutting edge, for which JavaEE 6 provides support.  It worked for me.  More in a later blog.