Apache Accumulo Table-to-File Example
April 4, 2022
This example uses MapReduce to extract specified columns from an existing Accumulo table.
To run this example, you need some data in a table. The following Accumulo shell session creates the "examples" namespace and the "examples.input" table, then inserts three entries:
$ accumulo shell
root@instance> createnamespace examples
root@instance> createtable examples.input
root@instance examples.input> insert dog cf cq dogvalue
root@instance examples.input> insert cat cf cq catvalue
root@instance examples.input> insert junk family qualifier junkvalue
root@instance examples.input> quit
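Accumulo keeps each table's entries sorted by key (row ID, then column family, column qualifier, visibility, and timestamp). A scan therefore returns "cat" before "dog" regardless of which was inserted first. The following minimal plain-Java sketch models the table after the three inserts above as a sorted map. The class and record names are hypothetical, and only the first three key fields are modeled. This is not the Accumulo client API, which writes Mutation objects through a BatchWriter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of examples.input; this is not the Accumulo client API.
public class InputTableModel {

  // Simplified key: real Accumulo keys also carry a visibility and a timestamp.
  record CellKey(String row, String family, String qualifier) {}

  // Order keys by row, then column family, then column qualifier.
  // (Accumulo compares raw bytes; String.compareTo gives the same order for this ASCII data.)
  static final Comparator<CellKey> KEY_ORDER =
      Comparator.comparing(CellKey::row)
          .thenComparing(CellKey::family)
          .thenComparing(CellKey::qualifier);

  // Applies the same three inserts as the shell session, in the same order.
  static TreeMap<CellKey, String> loadSampleData() {
    TreeMap<CellKey, String> table = new TreeMap<>(KEY_ORDER);
    table.put(new CellKey("dog", "cf", "cq"), "dogvalue");
    table.put(new CellKey("cat", "cf", "cq"), "catvalue");
    table.put(new CellKey("junk", "family", "qualifier"), "junkvalue");
    return table;
  }

  // Row IDs in scan order: sorted by key, not by insertion order.
  static List<String> scanOrder() {
    List<String> rows = new ArrayList<>();
    for (CellKey key : loadSampleData().keySet()) {
      rows.add(key.row());
    }
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(scanOrder()); // [cat, dog, junk]
  }
}
```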
The TableToFile class configures a map-only job to read the specified columns and write the key/value pairs to a file in HDFS.
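The job's flow has three steps. It scans the table and keeps the key/value pairs in the requested column. Each mapper then writes one text line per pair directly to its output file, because there is no reduce phase. The sketch below imitates that flow in memory on the sample data. It is an illustration under stated assumptions, not the TableToFile source. The names are hypothetical. The line layout (row, "family:qualifier", bracketed visibility, then a tab and the value) is assumed to follow Accumulo's default entry formatting. In a real Accumulo job, a column restriction like this is normally configured on the input scan, so mappers never see other columns. The sketch folds the filter into the map step to keep everything in one place.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory imitation of a map-only "table to file" flow (no Hadoop involved).
public class TableToFileSketch {

  record Cell(String row, String family, String qualifier, String visibility, String value) {}

  // Keep only cells in the requested family/qualifier, as "--columns cf:cq" asks.
  static boolean matches(Cell c, String family, String qualifier) {
    return c.family().equals(family) && c.qualifier().equals(qualifier);
  }

  // Assumed line layout: "row family:qualifier [visibility]<TAB>value".
  static String format(Cell c) {
    return c.row() + " " + c.family() + ":" + c.qualifier()
        + " [" + c.visibility() + "]\t" + c.value();
  }

  // "Map" step: one output line per matching cell; no reduce step follows.
  static List<String> mapOnly(List<Cell> scanned, String family, String qualifier) {
    List<String> lines = new ArrayList<>();
    for (Cell c : scanned) {
      if (matches(c, family, qualifier)) {
        lines.add(format(c));
      }
    }
    return lines;
  }

  // Bytes a text output file would hold if every line ends with '\n'.
  static int outputBytes(List<String> lines) {
    int total = 0;
    for (String line : lines) {
      total += (line + "\n").getBytes(StandardCharsets.UTF_8).length;
    }
    return total;
  }

  public static void main(String[] args) {
    List<Cell> scanned = List.of(
        new Cell("cat", "cf", "cq", "", "catvalue"),
        new Cell("dog", "cf", "cq", "", "dogvalue"),
        new Cell("junk", "family", "qualifier", "", "junkvalue"));
    List<String> lines = mapOnly(scanned, "cf", "cq");
    lines.forEach(System.out::println); // the "cat" line, then the "dog" line
    System.out.println(outputBytes(lines) + " bytes"); // 44 bytes
  }
}
```

For the three sample entries, this yields two lines of 22 bytes each (including the newline), or 44 bytes in total. That matches the size of part-m-00000 in the listing below.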
The following command extracts the key/value pairs in the column "cf:cq" and writes them to /tmp/output in HDFS:
$ ./bin/runmr mapreduce.TableToFile -t examples.input --columns cf:cq --output /tmp/output
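The --columns option names the column to fetch as family:qualifier. A minimal parsing sketch follows. Beyond the single "cf:cq" form used here, it makes two hypothetical assumptions: several columns can be separated by commas, and a family without a qualifier selects the whole family. Neither is documented behavior of this example, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a --columns value such as "cf:cq"; not the example's actual code.
public class ColumnsArgSketch {

  // A null qualifier means "every qualifier in this family" (an assumption of this sketch).
  record Column(String family, String qualifier) {}

  static List<Column> parse(String arg) {
    List<Column> columns = new ArrayList<>();
    // Assumed: several columns may be listed, separated by commas.
    for (String spec : arg.split(",")) {
      int idx = spec.indexOf(':');
      String family = idx < 0 ? spec : spec.substring(0, idx);
      String qualifier = idx < 0 ? null : spec.substring(idx + 1);
      if (!family.isEmpty()) { // ignore specs with no family, e.g. the empty one in "a,,b"
        columns.add(new Column(family, qualifier));
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    System.out.println(parse("cf:cq")); // [Column[family=cf, qualifier=cq]]
  }
}
```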
$ hadoop fs -ls /tmp/output
Found 2 items
-rw-r--r-- 3 root supergroup 0 2021-05-04 10:32 /tmp/output/_SUCCESS
-rw-r--r-- 3 root supergroup 44 2021-05-04 10:32 /tmp/output/part-m-00000
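The listing follows Hadoop's usual output conventions. _SUCCESS is an empty marker file written when the job commits successfully. Each task writes its own part file named part-<type>-<index>, where "m" marks map-task output (jobs with reducers produce part-r-* files) and the index is zero-padded to five digits. A single part-m file means the job ran one map task. The hypothetical helper below only reproduces that naming pattern for illustration and does not call Hadoop.

```java
import java.util.Locale;

// Hypothetical helper that reproduces Hadoop's default part-file naming; it does not call Hadoop.
public class PartFileNames {

  // 'm' marks map-task output, 'r' reduce-task output; the task index is zero-padded to 5 digits.
  static String partFileName(boolean fromMapTask, int taskIndex) {
    return String.format(Locale.ROOT, "part-%c-%05d", fromMapTask ? 'm' : 'r', taskIndex);
  }

  public static void main(String[] args) {
    // A map-only job whose single map task has index 0, as in the listing above.
    System.out.println(partFileName(true, 0)); // part-m-00000
  }
}
```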
We can view the output of this small MapReduce job. Only the two entries in "cf:cq" appear; the "junk" row is not included:
$ hadoop fs -text /tmp/output/part-m-00000
cat cf:cq [] catvalue
dog cf:cq [] dogvalue