Wednesday, December 30, 2009

Answering the Clojure vs. Ruby & Scala Challenge

Lau Jensen has done an interesting comparison of the relatively "new" languages Clojure, Ruby & Scala, comparing them on lines of code and speed for a sample "interview" problem: counting the occurrences of each unique word in the files under a directory.  Below is my Java version of the solution.  (You can also find it here.)  One could squeeze a few more LOC out of it, but, IMO, that's not the point. The point is to show that a relatively normal and "readable" Java program, using no third-party libraries, does the same job in a number of lines reasonably similar to the others.  I think you could find some Apache or similar code to do some of this work.

The original code is verbose "thanks" to all the generics definitions.

Note added after first posting:

You can save considerable time (about 15%) by pre-compiling the regular expression.  Add a line

Pattern splitOnWhitespace = Pattern.compile("[ \t]");

then change line 18, the line.split() code to

for (String s : splitOnWhitespace.split(line))
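
As a minimal sketch of that change (the class and method names here are made up for illustration and are not part of the program below), the two ways of splitting return the same tokens. The difference is that String.split() compiles the expression on every call for a pattern like this one, while the Pattern field is compiled once:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Compiled once and reused for every line.
    static final Pattern SPLIT_ON_WHITESPACE = Pattern.compile("[ \t]");

    // Convenient, but the regular expression is compiled on each call.
    static String[] splitPerCall(String line) {
        return line.split("[ \t]");
    }

    // Same result, using the precompiled pattern.
    static String[] splitPrecompiled(String line) {
        return SPLIT_ON_WHITESPACE.split(line);
    }

    public static void main(String[] args) {
        String line = "the quick\tbrown fox";
        System.out.println(Arrays.toString(splitPerCall(line)));
        System.out.println(Arrays.toString(splitPrecompiled(line)));
    }
}
```

The 15% figure above is my measurement on the newsgroup data; this sketch only shows that the two calls are interchangeable, not how much time you will save.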

I've done this below, plus fixed one other issue about the definition of a "word".
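
To illustrate why the definition of a "word" matters, here is a small sketch comparing splitting on whitespace with matching the pattern \w+ (the one the listing below uses). The sample sentence and the class name are assumptions for the demo; the point is only that splitting keeps punctuation and empty strings as "words", while matching does not:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordDefinitionDemo {
    static final Pattern WHITESPACE = Pattern.compile("[ \t]");
    static final Pattern WORD = Pattern.compile("\\w+");

    // Tokens are whatever lies between single spaces or tabs.
    static List<String> bySplitting(String line) {
        List<String> out = new ArrayList<String>();
        for (String s : WHITESPACE.split(line))
            out.add(s);
        return out;
    }

    // Tokens are runs of letters, digits and underscores.
    static List<String> byMatching(String line) {
        List<String> out = new ArrayList<String>();
        Matcher m = WORD.matcher(line);
        while (m.find())
            out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        String line = "Hello, world!  Hello"; // note the double space
        System.out.println(bySplitting(line)); // [Hello,, world!, , Hello]
        System.out.println(byMatching(line));  // [Hello, world, Hello]
    }
}
```

With splitting, "Hello," and "Hello" would be counted as two different words, and the double space produces an empty token.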

I haven't done a full timing comparison, but hopefully Lau will run one shortly.

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class WordCounter {

 public static void main(String[] args) throws IOException {
  long timeStart = System.currentTimeMillis();
  File rootDir = new File("C:/temp/20_newsgroups");
  CountingSet counter = new CountingSet();
  // Compiled once; a "word" is a run of letters, digits or underscores.
  Pattern wordPattern = Pattern.compile("\\w+");

  for (File groupDirectory : rootDir.listFiles())
   if (groupDirectory.isDirectory())
    for (File f : groupDirectory.listFiles()) {
     if (f.isFile()) {
      BufferedReader reader = new BufferedReader(new FileReader(f));
      String line;
      while ((line = reader.readLine()) != null) {
       Matcher matcher = wordPattern.matcher(line);
       while (matcher.find())
        counter.add(matcher.group());
      }
      reader.close();
     }
    }

  // The TreeMap iterates in key order, which gives the alphabetical listing.
  PrintWriter pw = new PrintWriter("C:/temp/counts-alphabetical-java.txt");
  for (Map.Entry<String, Integer> me : counter.entrySet())
   pw.println(me.getKey() + " : " + me.getValue());
  pw.close();
  pw = new PrintWriter("C:/temp/counts-decreasing-java.txt");
  spewInverted(counter, pw);
  pw.close();
  System.out.println("Finished in " + 0.001
            * (System.currentTimeMillis() - timeStart) + " seconds");
 }

 // Writes the entries sorted by decreasing count.
 static void spewInverted(Map<String, Integer> in, PrintWriter pw) {
  ArrayList<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(
    in.entrySet());
  Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
   public int compare(Map.Entry<String, Integer> o1,
     Map.Entry<String, Integer> o2) {
    return o2.getValue().compareTo(o1.getValue());
   }
  });
  for (Map.Entry<String, Integer> entry : list)
   pw.println(entry.getKey() + " : " + entry.getValue());
 }
}

// A sorted map from each word to the number of times it has been added.
class CountingSet extends TreeMap<String, Integer> {
 void add(String s) {
  Integer i = get(s);
  put(s, (i == null) ? Integer.valueOf(1) : Integer.valueOf(i + 1));
 }
}
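
For anyone who wants to try the counting and sorting idea without the newsgroup files, here is a small sketch on an in-memory array of words. The class name, method names and sample words are invented for the demo; the technique (a TreeMap for alphabetical order, then a list of entries sorted by decreasing value) is the same one the listing above relies on:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountSortDemo {
    // Counts occurrences; the TreeMap keeps the keys in alphabetical order.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String w : words) {
            Integer i = counts.get(w);
            counts.put(w, (i == null) ? 1 : i + 1);
        }
        return counts;
    }

    // Copies the entries into a list and sorts it by decreasing count.
    static List<Map.Entry<String, Integer>> byDecreasingCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o2.getValue().compareTo(o1.getValue());
            }
        });
        return list;
    }

    public static void main(String[] args) {
        String[] words = { "b", "a", "b", "c", "a", "b" };
        for (Map.Entry<String, Integer> e : byDecreasingCount(count(words)))
            System.out.println(e.getKey() + " : " + e.getValue());
        // prints "b : 3", then "a : 2", then "c : 1"
    }
}
```

Because Collections.sort is stable, words with equal counts stay in the alphabetical order they had in the TreeMap.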

