<%@ page import="java.io.*" %> 
<%@ page import="java.util.Collections" %> 
<%@ page import="java.util.Comparator" %> 
<%@ page import="java.util.ArrayList" %> 
<%@ page import="AlphanumComparator.*" %> 



MRNT - map reduced newsticker





Welcome to the map/reduced Newsticker!

What are the most used words today:

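<%
    // Read the Hadoop WordCount output: one "word<TAB>count" pair per line.
    String file = "/tmp/output/part-00000";
    ArrayList<String> stuff = new ArrayList<String>();
    AlphanumComparator ac = new AlphanumComparator();
    BufferedReader br = new BufferedReader(new FileReader(file));
    try {
        String zeile;
        while ((zeile = br.readLine()) != null) {
            String[] splitupText = zeile.split("\t");
            if (splitupText.length < 2) {
                continue; // skip lines without a tab-separated count
            }
            // Put the count first so the comparator orders entries by frequency.
            stuff.add(splitupText[1] + " : " + splitupText[0]);
        }
    } finally {
        br.close(); // release the file handle even if reading fails
    }
    // Sort ascending, then reverse so the most used words come first.
    Collections.sort(stuff, ac);
    Collections.reverse(stuff);
    for (int j = 0; j < stuff.size(); j++) {
        out.println(stuff.get(j));
        out.println("<br>");
    }
%>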
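
The scriptlet above relies on AlphanumComparator to order strings such as "12 : word" by their leading number. As a rough illustration of the same ranking step, the following sketch parses the count and sorts on it numerically, using only the standard library. The class name WordRanking and the method rank are made up for this example and are not part of the webapp; the sketch also assumes every count is a plain integer, as WordCount writes it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the original webapp.
public class WordRanking {

    /** Turns "word<TAB>count" lines into "count : word" strings, highest count first. */
    public static List<String> rank(List<String> lines) {
        List<String[]> pairs = new ArrayList<String[]>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length < 2) {
                continue; // skip lines without a tab-separated count
            }
            pairs.add(parts);
        }
        // Compare the counts as numbers, descending. Assumes integer counts.
        Collections.sort(pairs, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) {
                Integer countA = Integer.valueOf(a[1].trim());
                Integer countB = Integer.valueOf(b[1].trim());
                return countB.compareTo(countA);
            }
        });
        List<String> result = new ArrayList<String>();
        for (String[] pair : pairs) {
            result.add(pair[1].trim() + " : " + pair[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("apple\t3");
        lines.add("pear\t12");
        lines.add("fig\t7");
        for (String entry : rank(lines)) {
            System.out.println(entry);
        }
    }
}
```

Sorting on the parsed number avoids the usual pitfall of plain string sorting, where "12" would be placed before "3".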

How it works:

  • We need the Java JRE and JDK for Java development from http://java.sun.com/javase/downloads/index.jsp
  • We need Apache Tomcat from http://tomcat.apache.org/
  • Unzip the software and start Apache Tomcat with /usr/local/apache-tomcat-6.0.26/bin/startup.sh (it listens on port 8080 by default). If the system has no default JAVA_HOME, set it in bin/catalina.sh: JAVA_HOME=/usr/local/jre1.6.0_18
  • Additional software is needed: GNU JavaMail, GNU JAF and GNU inetlib
  • The first Java app is an NNTP client (look at http://blog.jservlet.com/post/2007/06/29/first or download it). Configure the news account and the newsgroup name, then compile the Java source to a class: /usr/local/jdk1.6.0_18/bin/javac -cp gnumail.jar:gnumail-providers.jar NNTP.java
  • Download and install Hadoop from http://hadoop.apache.org
  • Compile the WordCount.java example from the Hadoop sources and package it as a jar:
    /usr/local/jdk1.6.0_18/bin/javac -cp hadoop-0.20.2-core.jar -d wordcount_classes WordCount.java
    /usr/local/jdk1.6.0_18/bin/jar -cvf wordcount.jar -C wordcount_classes/ .
  • Fetch all article bodies from the selected newsgroup:
    cd /usr/local/cloudcomputing/mail-1.1.2/
    /usr/local/jre1.6.0_18/bin/java NNTP gnu.mail.providers.nntp.NNTPStore > ../hadoop-0.20.2/input/001

  • Run the Hadoop job in single-node mode, on a server, or on AWS:
    cd /usr/local/cloudcomputing/hadoop-0.20.2/
    bin/hadoop jar wordcount.jar org.myorg.WordCount input output
    cp /usr/local/cloudcomputing/hadoop-0.20.2/output/part-00000 /usr/local/apache-tomcat-6.0.26/webapps/cloud/
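
To make it clearer what the WordCount job leaves in part-00000, here is a minimal plain-Java sketch of the same computation without Hadoop: the "map" step splits each line into whitespace-separated tokens, and the "reduce" step sums the occurrences per word. The class name LocalWordCount and its methods are invented for this illustration. It assumes the tab-separated "word<TAB>count" output that the JSP page parses, and it uses a TreeMap to approximate Hadoop's key-sorted output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the Hadoop job, for illustration only.
public class LocalWordCount {

    /** Counts whitespace-separated tokens over all lines, with keys kept in sorted order. */
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (String line : lines) {
            // "Map" step: emit every token of the line.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                String word = tokens.nextToken();
                // "Reduce" step: add one to the running total for this word.
                Integer old = totals.get(word);
                totals.put(word, old == null ? 1 : old + 1);
            }
        }
        return totals;
    }

    /** Formats the totals as "word<TAB>count" lines, the layout the JSP page expects. */
    public static List<String> format(Map<String, Integer> totals) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            out.add(entry.getKey() + "\t" + entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("to be or");
        lines.add("not to be");
        for (String row : format(count(lines))) {
            System.out.println(row);
        }
    }
}
```

Because tokens are split on whitespace only, "Word", "word" and "word," count as three different words; any normalization would have to be added separately.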

  • Download AlphanumComparator from here and install it in the class directory of your webapp
  • Write