Kaggle CLI
June 18, 2018
(CLI = Command Line Interface)
Resource
Installation
Check to see if kaggle-cli is installed:
kaggle-cli --version
Install kaggle-cli:
pip install kaggle-cli
or pip3 install kaggle-cli
You may need to upgrade the package if you run into errors:
pip install kaggle-cli --upgrade
or pip3 install kaggle-cli --upgrade
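The check and the install above can be folded into one helper. This is only a minimal sketch for a POSIX shell: the function name `ensure_kaggle_cli` is made up for illustration, while the `kaggle-cli` and `pip` commands are the ones listed above.

```shell
# Hypothetical helper: install kaggle-cli only when the command is missing.
ensure_kaggle_cli() {
    if command -v kaggle-cli >/dev/null 2>&1; then
        # Already on PATH: just report the installed version.
        kaggle-cli --version
    else
        # Not found: install it (swap in pip3 if that is your Python 3 pip).
        pip install kaggle-cli
    fi
}
```

Run it with `ensure_kaggle_cli`; if the install succeeds but later commands fail, the `--upgrade` form above is the next thing to try.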
Kaggle Competition Datasets
Note 1: You must have a Kaggle user ID and password. If you log in to Kaggle using Facebook or LinkedIn, you'll have to reset your password, since a password is needed for command line access to the data.
Note 2: Pick a competition, and ensure you have accepted the rules of that competition. Otherwise, you will not be able to download the data using the CLI.
Step 1: Identify the competition I will use
https://www.kaggle.com/c/dogs-vs-cats
Note: the competition name can be found in the URL; here it is dogs-vs-cats
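If you would rather not pick the name out of the URL by eye, shell parameter expansion can do it. A small sketch, assuming the URL has the `/c/<name>` shape shown above; the function name `competition_from_url` is hypothetical.

```shell
# Hypothetical helper: print the competition name found in a Kaggle URL.
competition_from_url() {
    # Drop everything up to and including "/c/".
    slug="${1#*/c/}"
    # Drop any trailing path such as "/rules".
    printf '%s\n' "${slug%%/*}"
}

competition_from_url https://www.kaggle.com/c/dogs-vs-cats/rules   # prints dogs-vs-cats
```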
Step 2: Accept competition rules
https://www.kaggle.com/c/dogs-vs-cats/rules
Step 3: Set up data directory
ls
mkdir data
cd data
my example
ubuntu@ip-10-0-0-13:~$ ls
anaconda2 anaconda3 downloads git nbs temp
ubuntu@ip-10-0-0-13:~$ mkdir data
ubuntu@ip-10-0-0-13:~$ cd data
Step 4a: Download data (try 1)
Syntax:
kg config -g -u 'username' -p 'password' -c 'competition'
kg download
Note: Here's an example of the warning message I received when I tried to download data before accepting the rules of the competition:
my example
ubuntu@ip-10-0-0-13:~/data$ kg config -g -u 'reshamashaikh' -p 'xxx' -c dogs-vs-cats
ubuntu@ip-10-0-0-13:~/data$ kg download
Starting new HTTPS connection (1): www.kaggle.com
downloading https://www.kaggle.com/c/dogs-vs-cats/download/sampleSubmission.csv
sampleSubmission.csv N/A% | | ETA: --:--:-- 0.0 s/B
Warning: download url for file sampleSubmission.csv resolves to an html document rather than a downloadable file.
Is it possible you have not accepted the competition's rules on the kaggle website?
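The warning says the saved file holds a web page instead of data. One way to confirm that by hand is to look at the first bytes of the file. This is a rough sketch, assuming the page starts with `<!DOCTYPE html` or `<html`; the function name `looks_like_html` is made up for illustration.

```shell
# Hypothetical helper: succeed when a file appears to be an HTML page.
looks_like_html() {
    # Read the first 512 bytes and search them, ignoring case, for an HTML opener.
    head -c 512 "$1" | grep -qiE '<!doctype html|<html'
}
```

For example, `looks_like_html sampleSubmission.csv && echo "got a web page, not data"`.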
Step 4b: Download data (try 2)
Note: I have now accepted the competition rules; I will try downloading again.
kg config -g -u 'username' -p 'password' -c 'competition'
kg download
my example
ubuntu@ip-10-0-0-13:~/data$ kg config -g -u 'reshamashaikh' -p 'xxx' -c dogs-vs-cats
ubuntu@ip-10-0-0-13:~/data$ kg download
Starting new HTTPS connection (1): www.kaggle.com
downloading https://www.kaggle.com/c/dogs-vs-cats/download/sampleSubmission.csv
Starting new HTTPS connection (1): storage.googleapis.com
sampleSubmission.csv 100% |##################################################################################################################| Time: 0:00:00 320.2 KiB/s
downloading https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip
test1.zip 100% |#############################################################################################################################| Time: 0:00:08 32.5 MiB/s
downloading https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
train.zip 100% |#############################################################################################################################| Time: 0:00:17 31.4 MiB/s
Download Kaggle Data (another way)
Note: Sometimes a saved configuration causes an error the next time you try to download a different competition. To avoid this, you can bypass the configuration and pass your user ID, password and competition name directly in a single command.
kg download -u 'reshamashaikh' -p 'xxx' -c statoil-iceberg-classifier-challenge
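If you download several competitions this way, a thin wrapper saves retyping the credentials. This is only a sketch: the function name `kg_download` and the environment variables `KG_USER` and `KG_PASS` are hypothetical names chosen here, not something kaggle-cli reads on its own.

```shell
# Hypothetical wrapper: take credentials from the environment, competition from $1.
kg_download() {
    if [ -z "${KG_USER:-}" ] || [ -z "${KG_PASS:-}" ]; then
        echo "Set KG_USER and KG_PASS first" >&2
        return 1
    fi
    # Same command as above, with the three values filled in.
    kg download -u "$KG_USER" -p "$KG_PASS" -c "$1"
}
```

After setting the two variables once per session, `kg_download statoil-iceberg-classifier-challenge` runs the one-line download shown above.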
Step 5: Look at data that was downloaded
ls -alt
ubuntu@ip-10-0-0-13:~/data$ ls -alt
total 833964
-rw-rw-r-- 1 ubuntu ubuntu 569546721 Nov 4 18:24 train.zip
drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 4 18:24 .
-rw-rw-r-- 1 ubuntu ubuntu 284321224 Nov 4 18:24 test1.zip
-rw-rw-r-- 1 ubuntu ubuntu 88903 Nov 4 18:23 sampleSubmission.csv
drwxr-xr-x 22 ubuntu ubuntu 4096 Nov 4 18:23 ..
ubuntu@ip-10-0-0-13:~/data$
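Besides reading the listing, you can check that each expected file arrived and is not empty. A minimal sketch; the function name `check_downloads` is made up, and the file names you pass are whatever your competition provides.

```shell
# Hypothetical helper: report files that are missing or empty.
check_downloads() {
    failed=0
    for f in "$@"; do
        # -s is true only when the file exists and has a size greater than zero.
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            failed=1
        fi
    done
    return "$failed"
}
```

For the download above that would be `check_downloads train.zip test1.zip sampleSubmission.csv`.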
Step 6: Unzip Files
Note 1: You will need to install and use unzip to unzip files.
For Windows users:
- First, download Ubuntu from the Microsoft Store.
- Open PowerShell as Administrator and run:
  Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
- Once the download has completed, select "Launch". This will open a console window. Wait for the installation to complete; you will then be prompted to create your Linux user account.
- Create your Linux username and password.
- Go to Control Panel and turn on Developer Mode.
- Run bash from the command prompt. After that, you can follow the same steps as the Linux users guide.
For Linux Users:
sudo apt install unzip
unzip train.zip
unzip -q test1.zip (Note: -q means to unzip quietly, suppressing the printing)
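When a competition ships several archives, a loop avoids typing one unzip command per file. A short sketch, assuming the .zip files sit in the current directory; the function name `unzip_all` is hypothetical.

```shell
# Hypothetical helper: quietly unzip every .zip file in the current directory.
unzip_all() {
    for archive in *.zip; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$archive" ] || continue
        unzip -q "$archive"
    done
}
```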
ubuntu@ip-10-0-0-13:~/nbs/data$ ls train/dogs/dog.1.jpg
train/dogs/dog.1.jpg
ubuntu@ip-10-0-0-13:~/nbs/data$ ls -l train/dogs/ | wc -l
12501
ubuntu@ip-10-0-0-13:~/nbs/data$
ubuntu@ip-10-0-0-13:~/nbs/data$ ls -l train/cats/ | wc -l
12501
ubuntu@ip-10-0-0-13:~/nbs/data$
ubuntu@ip-10-0-0-13:~/nbs/data$ ls test1 | wc -l
12500
ubuntu@ip-10-0-0-13:~/nbs/data$
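The dog and cat counts above read 12501 rather than 12500 because `ls -l` prints a "total" line before the file entries, and `wc -l` counts that line too. The plain `ls test1 | wc -l` has no such line. A tiny demonstration in a throwaway directory (the file names are just placeholders):

```shell
demo=$(mktemp -d)
touch "$demo/dog.1.jpg" "$demo/dog.2.jpg" "$demo/dog.3.jpg"

# ls -l adds a leading "total" line, so this counts one too many.
long_count=$(ls -l "$demo" | wc -l | tr -d ' ')
# Plain ls prints one name per line when piped, so this is the true count.
plain_count=$(ls "$demo" | wc -l | tr -d ' ')

echo "ls -l: $long_count, ls: $plain_count"   # prints: ls -l: 4, ls: 3
rm -r "$demo"
```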
Kaggle - Submit Results
kg submit <submission-file> -u <username> -p <password> -c <competition> -m "<message>"
my example
/home/ubuntu/data/iceberg/sub
(fastai) ubuntu@ip-172-31-2-59:~/data/iceberg/sub$
kg submit resnext50_sz150_zm13.csv -u 'reshamashaikh' -p 'xxx' -c statoil-iceberg-classifier-challenge
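Before submitting, a quick sanity check can catch a truncated predictions file. The sketch below rests on an assumption that is not stated above: that a valid submission has the same number of lines as the competition's sample submission file. The function name `same_line_count` is made up for illustration.

```shell
# Hypothetical helper: succeed when two files have the same number of lines.
same_line_count() {
    # wc -l < file prints only the number, without the file name.
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" = "$b" ]
}
```

For example, `same_line_count resnext50_sz150_zm13.csv <sample-submission-file> && echo "row count matches"`.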
Jeremy’s Setup
It is good to copy 100 or so files into the sample directory; that is enough to check that the scripts are working.
Advice 1: Separate part of the labeled TRAINING data into VALIDATION. TASK: move 1000 each of dogs / cats into valid
> ls valid/cats/ | wc -l
1000
> ls valid/dogs/ | wc -l
1000
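Moving 1000 images per class into valid can be scripted. This is a sketch under two assumptions: the images sit in train/cats and train/dogs as in the listings above, and taking the first files in `ls` order is good enough (it is not a random sample). The function name `move_first_n` is hypothetical.

```shell
# Hypothetical helper: move the first N files of one directory into another.
move_first_n() {
    # $1 = source dir, $2 = destination dir, $3 = number of files to move
    mkdir -p "$2"
    ls "$1" | head -n "$3" | while IFS= read -r name; do
        mv "$1/$name" "$2/$name"
    done
}
```

Usage would be `move_first_n train/cats valid/cats 1000`, then the same for dogs.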
Advice 2: Do all of your work on sample data
> ls sample/train
> ls sample/valid
> ls sample/train/cats | wc -l
8
> ls sample/valid/cats | wc -l
4
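The sample tree can be built by copying a handful of files per class. A sketch, assuming the data root already has train/ and valid/ with cats and dogs subdirectories as in the listings above; the function name `make_sample` and its arguments are made up for illustration.

```shell
# Hypothetical helper: copy a few files per class into sample/train and sample/valid.
make_sample() {
    # $1 = data root, $2 = files per class for train, $3 = files per class for valid
    for cls in cats dogs; do
        mkdir -p "$1/sample/train/$cls" "$1/sample/valid/$cls"
        ls "$1/train/$cls" | head -n "$2" | while IFS= read -r name; do
            cp "$1/train/$cls/$name" "$1/sample/train/$cls/"
        done
        ls "$1/valid/$cls" | head -n "$3" | while IFS= read -r name; do
            cp "$1/valid/$cls/$name" "$1/sample/valid/$cls/"
        done
    done
}
```

`make_sample . 8 4` would give the 8 and 4 cat images counted above; the originals stay in place because the files are copied, not moved.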
Kaggle API
Another option is to use the official Kaggle API: https://github.com/Kaggle/kaggle-api
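For comparison, here is roughly what the download and submit steps look like with that client. It is installed with `pip install kaggle` and reads an API token from `~/.kaggle/kaggle.json` instead of a username and password on the command line. The wrapper names `api_download` and `api_submit` are hypothetical; the `kaggle competitions` subcommands are the client's own, but check its README for the current options.

```shell
# Hypothetical wrapper: download every file of a competition.
api_download() {
    kaggle competitions download -c "$1"
}

# Hypothetical wrapper: submit a predictions file with a message.
api_submit() {
    # $1 = competition, $2 = submission file, $3 = message
    kaggle competitions submit -c "$1" -f "$2" -m "$3"
}
```

For example, `api_download dogs-vs-cats`. As with kaggle-cli, the competition rules must be accepted on the website first.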