HBase-Writer 0.18.2 has been released. This release adds support for a maximum content size, which defaults to 20 MB; any crawled content item larger than the limit is rejected by the writer. It also contains a bug fix: if HBase threw an exception, the writer was not being added back to the Heritrix writer pool. The writer is now added back. Thanks to Andrew Purtell at Apache for these patches.
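For readers curious what the two changes amount to, here is a minimal sketch in plain Java. It is not the actual HBase-Writer or Heritrix code: the names `WriterPoolSketch`, `RecordWriter`, `SimplePool`, and `store` are hypothetical, and reading 20 MB as 20 * 1024 * 1024 bytes is an assumption. It shows only the general pattern: reject oversized content before borrowing a writer, and return the borrowed writer to the pool in a `finally` block so a failed write cannot leak it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; every name here is hypothetical, not the plugin's real API.
public class WriterPoolSketch {

    // Assumption: "20 MB" is taken as 20 * 1024 * 1024 bytes.
    static final long DEFAULT_MAX_CONTENT_SIZE = 20L * 1024 * 1024;

    // Stand-in for a writer that stores crawled content in a table.
    interface RecordWriter {
        void write(byte[] content) throws Exception;
    }

    // Stand-in for a writer pool: writers are borrowed and must be given back.
    static class SimplePool {
        private final Deque<RecordWriter> idle = new ArrayDeque<>();

        RecordWriter borrow() {
            return idle.pop();
        }

        void giveBack(RecordWriter writer) {
            idle.push(writer);
        }

        int idleCount() {
            return idle.size();
        }
    }

    // Returns true if the content was written, false if it was rejected or the write failed.
    static boolean store(SimplePool pool, byte[] content, long maxContentSize) {
        // Reject oversized content before touching the pool.
        if (content.length > maxContentSize) {
            return false;
        }
        RecordWriter writer = pool.borrow();
        try {
            writer.write(content);
            return true;
        } catch (Exception e) {
            // The write failed; report it to the caller as "not stored".
            return false;
        } finally {
            // Runs on success and on failure, so the writer always goes back to the pool.
            pool.giveBack(writer);
        }
    }
}
```

With this shape, a pool holding one writer still holds one idle writer after a write that throws, which is the behavior the bug fix restores.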
HBase-Writer is a processor plugin that follows the Heritrix2 processor API. With HBase-Writer, Heritrix2 can crawl and save its results directly to a table in HBase. The HBase-Writer plugin is based on the Heritrix-HDFS-Writer plugin. Thanks to Questio for supporting the release of this project.