
HBase-Writer version 0.94.0 has been released and is available for download now.  This version continues to support both Heritrix2 and Heritrix3 (3.1.1) and has been tested against the latest release versions of HBase (0.95.1) and Hadoop (1.1.2) and all of their dependencies.  An exception-handling bug was discovered in the makeWriter() method: a RuntimeException was being thrown without logging its parent exception.  That is fixed in this release.  Several new dependencies were pulled in from HBase, and they have been added to the README files.  The HBase server I tested on is running 0.95.2, but hbase-writer is built against 0.95.1 because building against 0.95.2 caused a RuntimeException during unit testing.  Here is the stack trace:

testCreateHBaseWriter(org.archive.io.hbase.TestHBaseWriter)  Time elapsed: 0.366 sec  <<< FAILURE!
java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (0.95.2-hadoop2), this version is 0.95.2-hadoop1
    at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:70) .....
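
Separately from this version mismatch, the makeWriter() fix in this release comes down to exception chaining. The actual hbase-writer code is not reproduced here; the following is only a minimal sketch, with hypothetical names (`ChainedExceptionDemo`, `openTable`, `makeWriterLossy`, `makeWriterChained`), of the difference between dropping and preserving the parent exception:

```java
import java.io.IOException;

// Hypothetical stand-in for hbase-writer's makeWriter(); all names are illustrative only.
class ChainedExceptionDemo {

    // Simulates a lower-level failure, such as a table that cannot be opened.
    static void openTable(String name) throws IOException {
        throw new IOException("cannot open table " + name);
    }

    // Before: the parent exception is dropped, so logs show only the wrapper message.
    static void makeWriterLossy(String table) {
        try {
            openTable(table);
        } catch (IOException e) {
            throw new RuntimeException("Failed to create writer for " + table);
        }
    }

    // After: the parent exception is passed as the cause and shows up in the stack trace.
    static void makeWriterChained(String table) {
        try {
            openTable(table);
        } catch (IOException e) {
            throw new RuntimeException("Failed to create writer for " + table, e);
        }
    }

    public static void main(String[] args) {
        try {
            makeWriterLossy("crawl");
        } catch (RuntimeException e) {
            System.out.println("lossy cause:   " + e.getCause());
        }
        try {
            makeWriterChained("crawl");
        } catch (RuntimeException e) {
            System.out.println("chained cause: " + e.getCause());
        }
    }
}
```

In the lossy variant getCause() returns null, so the original IOException is gone by the time anything is logged. In the chained variant the cause is carried along and printed as a "Caused by:" section of the stack trace.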

After reading the HBase mailing list and talking with some developers, it seems to be caused by packaging issues, which should be resolved in v0.96.x.  I didn't bother to debug where the reference to 0.95.2-hadoop2 is coming from, but 0.95.1-hadoop1 builds and passes the instance creation test, so the latest release of hbase-writer has this version set in the Maven build file (pom.xml).  Here are the jar dependencies I needed to copy from my test HBase installation (v0.95.2-hadoop1 running Hadoop v1.1.2) into my test Heritrix installation (v3.1.1):

cp hbase-writer-0.94.0.jar heritrix/lib/

cp hbase/lib/hbase-common-0.95.2-hadoop1.jar heritrix/lib/
cp hbase/lib/hbase-protocol-0.95.2-hadoop1.jar heritrix/lib/
cp hbase/lib/hbase-server-0.95.2-hadoop1.jar heritrix/lib/
cp hbase/lib/hbase-client-0.95.2-hadoop1.jar heritrix/lib/
cp hbase/lib/protobuf-java-2.4.1.jar heritrix/lib/
cp hbase/lib/commons-configuration-1.6.jar heritrix/lib/
cp hbase/lib/slf4j-api-1.6.4.jar heritrix/lib/
cp hbase/lib/htrace-core-2.00.jar heritrix/lib/
cp hbase/lib/jackson-mapper-asl-1.8.8.jar heritrix/lib/
cp hbase/lib/jackson-core-asl-1.8.8.jar heritrix/lib/
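
Copying jars by hand makes it easy to end up with more than one hbase-default.xml on the classpath, which is one plausible (but unconfirmed) source of the version mismatch in the stack trace. The following is a minimal sketch, not part of hbase-writer, that lists every copy of a resource visible to the class loader; the class name `FindHBaseDefaults` and the method `locate` are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper; not part of hbase-writer or HBase.
class FindHBaseDefaults {

    // Returns the location of every resource with the given name on the classpath.
    static List<URL> locate(String resourceName) {
        ClassLoader loader = FindHBaseDefaults.class.getClassLoader();
        List<URL> urls = new ArrayList<>();
        try {
            Enumeration<URL> found = loader.getResources(resourceName);
            while (found.hasMoreElements()) {
                urls.add(found.nextElement());
            }
        } catch (IOException e) {
            // Keep the parent exception attached so the root cause is not lost.
            throw new UncheckedIOException("Could not scan classpath for " + resourceName, e);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<URL> copies = locate("hbase-default.xml");
        if (copies.isEmpty()) {
            System.out.println("No hbase-default.xml found on the classpath");
        }
        // Each URL names the jar (or directory) that supplies one copy of the file.
        for (URL url : copies) {
            System.out.println(url);
        }
    }
}
```

Once compiled, running it with the Heritrix lib directory on the classpath, for example `java -cp "heritrix/lib/*:." FindHBaseDefaults`, should print one URL per jar that contains the file. More than one line of output would be a hint that two HBase builds are mixed together.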

In hbase-writer TRUNK I have added a "jar-with-dependencies" goal, so hbase-writer and all of its dependencies can be packaged into a single jar that you can copy into heritrix/lib.

After adding the bean configuration described in hbase-writer's README for Heritrix3, you should be able to start Heritrix3 and use its web UI to create crawls that write to HBase tables.

Happy crawling..... Thank you and Enjoy :)