NMS Benchmarking Framework

August 21, 2020 · View on GitHub

=================================================================================== NMS Benchmarking Framework

CUDA implementation of the algorithm described in the paper:

"Work-Efficient Parallel Non-Maximum Suppression Kernels"

http://dx.doi.org/10.1093/comjnl/bxaa108

The Computer Journal

David Oro, Carles Fernández, Xavier Martorell, Javier Hernando

Requirements:

     1. GCC Compiler v5.0 or greater
     2. CUDA Toolkit v6.0 or greater
     3. NVIDIA GPU with Compute Capability 3.2 or greater

Build instructions:

     1. Set the GPU_ARCH and SM_ARCH variables in the Makefile according to 
        the underlying NVIDIA GPU architecture of your computer. For further 
        details, please refer to our GitHub Wiki page:

              https://github.com/hertasecurity/gpu-nms/wiki

     2. Set your CUDA installation path in the Makefile (CUDA_HEADERS and 
        CUDA_LIBS variables)

     3. Compile the source code:   make

Execution:

     * You can run the GPU NMS benchmark using a comma-separated input file 
       containing the list of detected objects in the following format:

              xcoordinate,ycoordinate,width,score

     * We provide a sample input file "detections.txt" obtained after having 
       executed a face detector over the "oscars.png" file.

     * The GPU NMS benchmark must be executed as follows:

              ./nmstest  detections.txt  output.txt

     * The application should then return the computation time of both the MAP 
       and REDUCE GPU NMS kernels and write the results in the "output.txt" file.

     * Finally, you can visualize both the input (pre-NMS) and the output 
       (post-NMS) with the "drawrectangles" Python script. For example:

              ./drawrectangles  detections.txt

       Or:

              ./drawrectangles  output.txt

       The graphical output is stored in the "oscarsdets.png" file

IMPORTANT:

     * The source code must be compiled to the microarchitecture matching the 
       GPU platform during execution (check GPU_ARCH and SM_ARCH variables 
       in the Makefile).

     * If the NMS algorithm is not capable of properly merging the candidate 
       windows, re-check the GPU_ARCH and SM_ARCH variables and then 
       recompile the code.

     * This GPU NMS benchmark is limited to a maximum of 4096 detected 
       objects per input. If you want to increase this limit, please 
       modify the MAX_DETECTIONS constant in the "nms.cu" file.