bigann

December 18, 2024

This mini crate benchmarks the hnsw-rs crate on u8 vectors sampled from the BIGANN benchmark. See the BIGANN and IRISA pages.

The files bigann_base.bvecs and bigann_query.bvecs must be downloaded and installed in some directory (about 133 GB). Then, depending on how much of the large file bigann_base.bvecs you want to run on (its first 10M, 100M or 1B vectors), download the corresponding ground truth as explained on the BIGANN web page.

For example, to run on the first 10M vectors, download the ground truth for that size, extract the files dis_10M.fvecs and idx_10M.ivecs from its gnd directory, and put them in the same directory as bigann_base.bvecs and bigann_query.bvecs (check the expected file names in the source).
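The .bvecs, .fvecs and .ivecs files follow the TEXMEX vector format described on the IRISA page: each record is a little-endian 32-bit dimension d followed by d components (u8 for .bvecs, f32 for .fvecs, i32 for .ivecs). A BIGANN base record is therefore 4 + 128 = 132 bytes, so the first N vectors are simply the first N × 132 bytes of bigann_base.bvecs, which is why running on the first 10M, 100M or 1B vectors needs no preprocessing. Below is a minimal std-only sketch of a reader under those format assumptions; read_dim, read_bvecs and read_ivecs are hypothetical helpers, not functions of this crate or of hnsw-rs.

```rust
use std::io::{self, Cursor, Read};

/// Reads one little-endian i32 dimension header.
/// Returns Ok(None) when the stream ends before a new record starts.
fn read_dim<R: Read>(reader: &mut R) -> io::Result<Option<usize>> {
    let mut buf = [0u8; 4];
    match reader.read_exact(&mut buf) {
        Ok(()) => Ok(Some(i32::from_le_bytes(buf) as usize)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

/// Reads up to `max_vectors` records of a .bvecs stream (i32 dim, then dim u8 values).
fn read_bvecs<R: Read>(reader: &mut R, max_vectors: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut out = Vec::new();
    while out.len() < max_vectors {
        let dim = match read_dim(reader)? {
            Some(d) => d,
            None => break,
        };
        let mut v = vec![0u8; dim];
        reader.read_exact(&mut v)?;
        out.push(v);
    }
    Ok(out)
}

/// Reads up to `max_vectors` records of an .ivecs stream (i32 dim, then dim i32 values),
/// e.g. the ground-truth neighbour ids stored in idx_10M.ivecs.
fn read_ivecs<R: Read>(reader: &mut R, max_vectors: usize) -> io::Result<Vec<Vec<i32>>> {
    let mut out = Vec::new();
    while out.len() < max_vectors {
        let dim = match read_dim(reader)? {
            Some(d) => d,
            None => break,
        };
        let mut raw = vec![0u8; 4 * dim];
        reader.read_exact(&mut raw)?;
        let v: Vec<i32> = raw
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        out.push(v);
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    // Two 3-dimensional u8 vectors encoded in the .bvecs layout.
    let mut bytes = Vec::new();
    for v in [[1u8, 2, 3], [4, 5, 6]] {
        bytes.extend_from_slice(&3i32.to_le_bytes());
        bytes.extend_from_slice(&v);
    }
    let vecs = read_bvecs(&mut Cursor::new(bytes), 10)?;
    println!("{:?}", vecs); // [[1, 2, 3], [4, 5, 6]]

    // On the real data one would wrap the file in a BufReader, e.g.:
    // let mut r = std::io::BufReader::new(std::fs::File::open("bigann_base.bvecs")?);
    // let base = read_bvecs(&mut r, 10_000_000)?;
    Ok(())
}
```

Stopping after `max_vectors` records is what makes reading only the first 10M vectors of the 1B-vector file cheap; the .fvecs distance file would be read the same way with `f32::from_le_bytes`.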

Command line

  • bigann --dir DataDir --nbdata 10 (or 100 or 1000), where --nbdata is the number of data vectors to run on, in millions.

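Use --dump to dump the HNSW structure so that it can be reloaded to test search variations:

  • bigann --dir DataDir --dump --nbdata 10 (or 100 or 1000) to build and dump the structure, then
  • bigann --dir DataDir --hnsw "dumpbigann" --nbdata 10 to reload the dump "dumpbigann" and search it.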

For more details, see the documentation (cargo doc --no-deps, as usual).

Results for the first 10 million data points.

Results with standard level sampling

Results on Intel E5-2630 v3 @ 2.4 GHz, 16 cores, 2 threads per core.

All parameters are explained in the hnsw-rs documentation.

knbn  max_nb_conn  ef_cons  ef_search  extend  keep pruned  recall  req/s  last ratio
10    64           100      128        no      no           0.995   2610   1.0002
100   64           100      128        no      no           0.983   1350   1.0006
10    24           100      128        no      no           0.970   4845   1.001
100   24           100      128        no      no           0.923   2411   1.003

knbn  max_nb_conn  ef_cons  ef_search  extend  keep pruned  recall  req/s  last ratio
10    24           100      128        no      no           0.960   5900   1.001
100   24           100      128        no      no           0.907   2800   1.004
10    24           400      128        no      no           0.972   4678   1.001
100   24           400      128        no      no           0.938   2338   1.003
10    24           800      128        no      no           0.975   4313   1.001
100   24           800      128        no      no           0.9428  2151   1.0025
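The tables report recall and a "last ratio". A common definition of recall@k is the fraction of the k true nearest neighbours (one row of idx_10M.ivecs) that appear among the k returned ids, averaged over all queries. For "last ratio", the sketch below assumes it is the distance of the k-th returned neighbour divided by the distance of the k-th true neighbour (from dis_10M.fvecs), which is 1.0 when the search is exact; this is a guess, and the bench source holds the actual definition. recall_at_k, last_ratio and mean are hypothetical helper names.

```rust
use std::collections::HashSet;

/// Recall@k for one query: fraction of the k true neighbour ids
/// that appear among the first k returned ids.
fn recall_at_k(returned: &[usize], truth: &[usize], k: usize) -> f64 {
    let truth_k: HashSet<usize> = truth.iter().take(k).copied().collect();
    let hits = returned
        .iter()
        .take(k)
        .filter(|id| truth_k.contains(*id))
        .count();
    hits as f64 / k as f64
}

/// Assumed "last ratio": distance of the k-th returned neighbour divided by the
/// distance of the k-th true neighbour. Both slices must hold at least k
/// distances computed with the same metric.
fn last_ratio(returned_dists: &[f32], truth_dists: &[f32], k: usize) -> f64 {
    returned_dists[k - 1] as f64 / truth_dists[k - 1] as f64
}

/// Arithmetic mean of per-query values.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    let k = 3;
    // Two toy queries: (returned ids, true ids).
    let queries: [(Vec<usize>, Vec<usize>); 2] = [
        (vec![4, 7, 9], vec![4, 9, 2]),
        (vec![1, 2, 3], vec![3, 2, 1]),
    ];
    let recalls: Vec<f64> = queries
        .iter()
        .map(|(found, truth)| recall_at_k(found, truth, k))
        .collect();
    println!("mean recall@{k}: {:.4}", mean(&recalls)); // 0.8333
    println!("last ratio: {}", last_ratio(&[1.0, 2.0, 5.0], &[1.0, 2.0, 4.0], k)); // 1.25
}
```

Recall only checks set membership, so it ignores the order of returned neighbours; a distance ratio complements it by showing how far off the misses are.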

Results on AMD Ryzen 9 7950 (16 cores) with a 0.5 scale modification factor

With the modified level sampling (as documented in hnsw-rs), recall increases, and with max_nb_conn=48 we get better results than those cited above with max_nb_conn=64 and no scale modification.
Needing only max_nb_conn=48 instead of 64 also decreases memory consumption.

knbn  max_nb_conn  ef_cons  ef_search  extend  keep pruned  recall  req/s  last ratio
10    48           100      128        no      no           0.997   6283   1.0001
100   48           100      128        no      no           0.980   3152   1.0007
10    24           100      128        no      no           0.989   9825   1.0003
100   24           100      128        no      no           0.958   4897   1.0017
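Standard HNSW draws the top level of a new node as floor(-ln(U) · mL), with U uniform in (0, 1] and mL = 1/ln(M), where M plays the role of max_nb_conn. The sketch below assumes the hnsw-rs scale modification factor simply multiplies mL (the hnsw-rs documentation gives the exact rule). Under that assumption the fraction of nodes above layer 0 is M^(-1/scale), so a factor of 0.5 with M = 48 drops it from about 2.1% to about 0.04%. XorShift, level_for and frac_above_zero are hypothetical names, and the hand-rolled xorshift64 generator only keeps the example free of external crates.

```rust
/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    /// Uniform draw in (0, 1], built from the top 53 bits of the state.
    fn next_unit(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// Top level of a new node: floor(-ln(u) * mL) with mL = scale / ln(max_nb_conn).
/// scale = 1.0 is standard HNSW sampling; scale < 1.0 is the assumed effect
/// of the scale modification factor.
fn level_for(u: f64, max_nb_conn: usize, scale: f64) -> usize {
    let ml = scale / (max_nb_conn as f64).ln();
    (-u.ln() * ml).floor() as usize
}

/// Expected fraction of nodes above layer 0: P(level >= 1) = max_nb_conn^(-1 / scale).
fn frac_above_zero(max_nb_conn: usize, scale: f64) -> f64 {
    (max_nb_conn as f64).powf(-1.0 / scale)
}

fn main() {
    let m = 48;
    let n = 1_000_000;
    for scale in [1.0, 0.5] {
        let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
        // Count sampled nodes whose top level is at least 1.
        let above = (0..n)
            .filter(|_| level_for(rng.next_unit(), m, scale) >= 1)
            .count();
        println!(
            "scale {scale}: {above} of {n} nodes above layer 0 (expected fraction {:.6})",
            frac_above_zero(m, scale)
        );
    }
}
```

The simulated counts should land close to n times the expected fraction printed next to them.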