Wednesday, December 28, 2011

Adventures in Ubuntu and Hadoop Part 2

Continuing from the previous post, where I eventually managed to wrestle Ubuntu Linux, the Oracle JDK 6, and Eclipse onto an old desktop computer, the next step is to install Hadoop. Or is it? There's more prep work to do first, as noted in this excellent article with the exact title I want, "Running Hadoop On Ubuntu Linux (Single-Node Cluster)". It looks like I haven't done much so far, only accomplishing the first of about 30 labors (which is about two and a half times as many as Heracles faced).

Adding a hadoop group and an hduser user was easy:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
as was generating an SSH key, with a but...
user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
But... SSH wasn't installed on my system yet. The Ubuntu Software Center graphical program wasn't very useful: it just showed clients when I searched for SSH. But I found this post detailing what to do.

sudo apt-get install openssh-server
With this accomplished, the keypair generation worked and I was able to test by logging in with no password via

ssh localhost
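
The password-less login only works once the new public key is listed in hduser's authorized_keys file, a step the article covers but that isn't shown above. A minimal sketch of that step, assuming the default key path that ssh-keygen offers; the authorize_key function name is made up for illustration and is not part of OpenSSH:

```shell
# Hypothetical helper (not from the article): append a public key to an
# authorized_keys file so sshd will accept the matching private key.
authorize_key() {
    pubkey="$1"     # e.g. $HOME/.ssh/id_rsa.pub
    ssh_dir="$2"    # e.g. $HOME/.ssh
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # Skip the append if this exact key line is already present.
    if ! grep -qxF -f "$pubkey" "$ssh_dir/authorized_keys" 2>/dev/null; then
        cat "$pubkey" >> "$ssh_dir/authorized_keys" || return 1
    fi
    # sshd ignores key files with overly loose permissions.
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Run as hduser, the call would be authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh", followed by ssh localhost to confirm that no password prompt appears.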

The article then recommends disabling IPv6 via a simple addition to the /etc/sysctl.conf file. Done. Well, one minor glitch: by that point I was logged in as hduser, not the main user, and I got myself confused on the sudo stuff. For simplicity, I opened up a new terminal window as the boss-man and did the editing there, adding these lines to the end of /etc/sysctl.conf:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
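
These settings only take effect after a reboot or a reload of the file. A small check, assuming the standard /proc location for the flag; the ipv6_disabled function name is just for illustration:

```shell
# To reload /etc/sysctl.conf without rebooting (run as a sudo-capable user):
#   sudo sysctl -p

# Hypothetical helper: succeeds if the kernel flag file contains 1.
# The optional argument overrides the default /proc path.
ipv6_disabled() {
    flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

if ipv6_disabled; then
    echo "IPv6 is disabled"
else
    echo "IPv6 is still enabled (or the flag could not be read)"
fi
```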

Next step? Well, before installing Hadoop, I really should verify that the object of this exercise works: that I can see the Ubuntu "workstation" from my Windows 7 laptop. So, at least one more preparatory step: installing an X server on the laptop (odd-sounding name; it feels like it should be an X client, though in X terminology the machine driving the display is the server). Looks like Xming is the best option. More next session!

One final small step: configure my Ubuntu box to not use DHCP for its address, so that the address stays constant behind our router/firewall. The trick I use (which I read somewhere else, not my idea) is to assign it a number less than 100, because many routers start handing out DHCP addresses at 100 and up. Our printer is at 10, so in the best BASIC/FORTRAN line numbering style I'll leave the 10s for more printers, and start Ubuntu at 20.
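
One way to pin the address on an Ubuntu of this vintage is a static stanza in /etc/network/interfaces. This is a sketch only: the interface name eth0, the 192.168.1.x subnet, the gateway at .1, the /24 netmask, and the static_stanza helper are all assumptions rather than details from this setup; only the host number 20 comes from the text.

```shell
# Hypothetical helper: print an ifupdown stanza for a fixed IPv4 address.
# Arguments: interface, network prefix (first three octets), host number.
static_stanza() {
    iface="$1"; prefix="$2"; host="$3"
    cat <<EOF
auto $iface
iface $iface inet static
    address $prefix.$host
    netmask 255.255.255.0
    gateway $prefix.1
EOF
}

# Example values are assumptions; only the host number 20 comes from the text.
static_stanza eth0 192.168.1 20
```

The printed stanza would go into /etc/network/interfaces (edited as a sudo-capable user) after checking the real interface name and subnet on the machine.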
