October 10, 2016
# Grad-CAM: Gradient-weighted Class Activation Mapping
Code for the paper

**[Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1610.02391)**
Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra

Demo: [gradcam.cloudcv.org](http://gradcam.cloudcv.org)
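For reference, the core computation behind all three demos below is short: Grad-CAM global-average-pools the gradient of the target score with respect to a convolutional layer's feature maps to get one weight per channel, then applies a ReLU to the weighted sum of those maps. A minimal numpy sketch (the shapes and names here are illustrative, not this repo's Lua code):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM localization map from one conv layer.

    feature_maps: (K, H, W) activations A^k of the chosen layer
    gradients:    (K, H, W) gradient of the target score w.r.t. A^k
    """
    # Per-channel weights: global average pool of the gradients
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted combination of the feature maps, then ReLU
    cam = np.einsum("k,khw->hw", weights, feature_maps)
    cam = np.maximum(cam, 0)
    return cam

# Toy example with random activations and gradients
rng = np.random.default_rng(0)
A = rng.standard_normal((512, 14, 14))
dA = rng.standard_normal((512, 14, 14))
cam = grad_cam(A, dA)
print(cam.shape)  # → (14, 14)
```

The resulting low-resolution map is then upsampled to the input image size for display.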

## Usage
Download the Caffe model(s) and prototxt for VGG-16/VGG-19/AlexNet using `sh models/download_models.sh`.
### Classification
```
th classification.lua -input_image_path images/cat_dog.jpg -label 243 -gpuid 0
th classification.lua -input_image_path images/cat_dog.jpg -label 283 -gpuid 0
```
#### Options
- `proto_file`: Path to the `deploy.prototxt` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_16_layers_deploy.prototxt`
- `model_file`: Path to the `.caffemodel` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_16_layers.caffemodel`
- `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`
- `input_sz`: Input image size. Default is 224 (change to 227 if using AlexNet)
- `layer_name`: Layer to use for Grad-CAM. Default is `relu5_3` (use `relu5_4` for VGG-19 and `relu5` for AlexNet)
- `label`: Class label to generate Grad-CAM for (-1 = use predicted class, 283 = Tiger cat, 243 = Boxer). Default is -1. These correspond to ILSVRC synset IDs
- `out_path`: Path to save images in. Default is `output/`
- `gpuid`: 0-indexed ID of the GPU to use. Default is -1 = CPU
- `backend`: Backend to use with loadcaffe. Default is `nn`
- `save_as_heatmap`: Whether to save the heatmap or the raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1
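The `save_as_heatmap` flag affects only how the map is rendered: a raw Grad-CAM map becomes a heatmap after min-max normalization and colormapping. A hedged numpy sketch of the normalization step (illustrative Python, not this repo's Lua code):

```python
import numpy as np

def normalize_cam(cam):
    """Scale a raw Grad-CAM map into [0, 1] for colormapping."""
    cam = cam - cam.min()
    denom = cam.max()
    if denom > 0:          # avoid dividing by zero on an all-equal map
        cam = cam / denom
    return cam

raw = np.array([[0.0, 2.0],
                [4.0, 8.0]])
norm = normalize_cam(raw)
print(norm)  # → [[0. 0.25] [0.5 1.]]
```

The normalized map can then be passed through any colormap (e.g. jet) and overlaid on the input image.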
#### Examples
'border collie' (233)

'tabby cat' (282)

'boxer' (243)

'tiger cat' (283)

### Visual Question Answering
Clone the [VQA](http://arxiv.org/abs/1505.00468) sub-repository (`git submodule init && git submodule update`), and download and unzip the provided extracted features and pretrained model.
```
th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'dog' -gpuid 0
th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'cat' -gpuid 0
```
#### Options
- `proto_file`: Path to the `deploy.prototxt` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_19_layers_deploy.prototxt`
- `model_file`: Path to the `.caffemodel` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_19_layers.caffemodel`
- `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`
- `input_sz`: Input image size. Default is 224 (change to 227 if using AlexNet)
- `layer_name`: Layer to use for Grad-CAM. Default is `relu5_4` (use `relu5_3` for VGG-16 and `relu5` for AlexNet)
- `question`: Input question. Default is `What animal?`
- `answer`: Optional answer (e.g. "cat") to generate Grad-CAM for ('' = use predicted answer). Default is ''
- `out_path`: Path to save images in. Default is `output/`
- `model_path`: Path to the VQA model checkpoint. Default is `VQA_LSTM_CNN/lstm.t7`
- `gpuid`: 0-indexed ID of the GPU to use. Default is -1 = CPU
- `backend`: Backend to use with loadcaffe. Default is `cudnn`
- `save_as_heatmap`: Whether to save the heatmap or the raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1
#### Examples
What animal? Dog

What animal? Cat

What color is the fire hydrant? Green

What color is the fire hydrant? Yellow

What color is the fire hydrant? Green and Yellow

What color is the fire hydrant? Red and Yellow

### Image Captioning
Clone the `neuraltalk2` sub-repository. Running `sh models/download_models.sh` will download the pretrained model and place it in the `neuraltalk2` folder.
Change lines 2-4 of `neuraltalk2/misc/LanguageModel.lua` to the following:

```lua
local utils = require 'neuraltalk2.misc.utils'
local net_utils = require 'neuraltalk2.misc.net_utils'
local LSTM = require 'neuraltalk2.misc.LSTM'
```
```
th captioning.lua -input_image_path images/cat_dog.jpg -caption 'a dog and cat posing for a picture' -gpuid 0
th captioning.lua -input_image_path images/cat_dog.jpg -caption '' -gpuid 0
```
#### Options
- `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`
- `input_sz`: Input image size. Default is 224 (change to 227 if using AlexNet)
- `layer`: Layer to use for Grad-CAM. Default is 30 (`relu5_3` for VGG-16)
- `caption`: Optional input caption. If left empty, the generated caption is used
- `out_path`: Path to save images in. Default is `output/`
- `model_path`: Path to the captioning model checkpoint. Default is `neuraltalk2/model_id1-501-1448236541.t7`
- `gpuid`: 0-indexed ID of the GPU to use. Default is -1 = CPU
- `backend`: Backend to use with loadcaffe. Default is `cudnn`
- `save_as_heatmap`: Whether to save the heatmap or the raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1
#### Examples
a dog and cat posing for a picture

a bathroom with a toilet and a sink

## License
BSD
3rd-party:
- VQA_LSTM_CNN, BSD
- neuraltalk2, BSD