Apache Accumulo Table-to-File Example

April 4, 2022

This example uses MapReduce to extract specified columns from an existing table.

To run this example you will need some data in a table. The following commands put a trivial amount of data into Accumulo using the Accumulo shell:

$ accumulo shell
root@instance> createnamespace examples
root@instance> createtable examples.input
root@instance examples.input> insert dog cf cq dogvalue
root@instance examples.input> insert cat cf cq catvalue
root@instance examples.input> insert junk family qualifier junkvalue
root@instance examples.input> quit

The TableToFile class configures a map-only job that reads the specified columns and writes the key/value pairs to a text file in HDFS.
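The core logic can be sketched with a plain-Java, in-memory stand-in: filter a table down to one column and render each surviving entry the way the example output below shows it. This is only an illustration of the filtering and formatting the job performs; the class and method names here are invented for the sketch and are not part of the examples repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TableToFileSketch {
    // Render one entry the way the example output shows it:
    // "<row> <family>:<qualifier> [<visibility>]<TAB><value>"
    static String formatEntry(String row, String family, String qualifier,
                              String visibility, String value) {
        return row + " " + family + ":" + qualifier + " [" + visibility + "]\t" + value;
    }

    // Keep only entries in the requested column, mimicking the column
    // filter the job applies when scanning the table.
    static Map<String, String> extractColumn(Map<String, Map<String, String>> table,
                                             String column) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> rowEntry : table.entrySet()) {
            String value = rowEntry.getValue().get(column);
            if (value != null) {
                result.put(rowEntry.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the examples.input table built above;
        // rows are listed in sorted order, as Accumulo stores them.
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("cat", Map.of("cf:cq", "catvalue"));
        table.put("dog", Map.of("cf:cq", "dogvalue"));
        table.put("junk", Map.of("family:qualifier", "junkvalue"));

        for (Map.Entry<String, String> e : extractColumn(table, "cf:cq").entrySet()) {
            System.out.println(formatEntry(e.getKey(), "cf", "cq", "", e.getValue()));
        }
    }
}
```

Note that the "junk" row is dropped because its only column is "family:qualifier", not the requested "cf:cq".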

The following command extracts the entries in the column "cf:cq":

$ ./bin/runmr mapreduce.TableToFile -t examples.input --columns cf:cq --output /tmp/output

$ hadoop fs -ls /tmp/output
Found 2 items
-rw-r--r--   3 root supergroup          0 2021-05-04 10:32 /tmp/output/_SUCCESS
-rw-r--r--   3 root supergroup         44 2021-05-04 10:32 /tmp/output/part-m-00000

We can view the output of our small MapReduce job:

$ hadoop fs -text /tmp/output/part-m-00000
cat cf:cq []	catvalue
dog cf:cq []	dogvalue
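Each output line is the key rendered as "row family:qualifier [visibility]", a tab, then the value, so downstream code can split the two halves on the first tab. A minimal sketch (the class name is invented for this illustration):

```java
import java.util.AbstractMap;
import java.util.Map;

public class OutputLineParser {
    // Split one line of part-m-00000 into its key text and value.
    // The key ("row family:qualifier [visibility]") and the value are
    // separated by a single tab, as the hadoop fs -text listing shows.
    static Map.Entry<String, String> parseLine(String line) {
        int tab = line.indexOf('\t');
        return new AbstractMap.SimpleEntry<>(line.substring(0, tab),
                                             line.substring(tab + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = parseLine("dog cf:cq []\tdogvalue");
        System.out.println(e.getKey());   // dog cf:cq []
        System.out.println(e.getValue()); // dogvalue
    }
}
```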