README.md

June 6, 2026

Project: Classify Kaggle San Francisco Crime Description

Highlights

  • Multi-class text classification (sentence classification) problem.
  • Classify the Descript field of the Kaggle San Francisco Crime dataset into 39 Category labels.
  • Hybrid TextCNN + GRU model implemented with TensorFlow 2.x Keras.

Data: Kaggle San Francisco Crime

  • Input: Descript
  • Output: Category

Examples:

| Descript | Category |
| --- | --- |
| GRAND THEFT FROM LOCKED AUTO | LARCENY/THEFT |
| POSSESSION OF NARCOTICS PARAPHERNALIA | DRUG/NARCOTIC |
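
To make the input/output pairing concrete, the sketch below turns (Descript, Category) pairs into padded integer sequences and class ids, the usual shape of data for a sentence classifier. Whitespace tokenization, lowercasing, the reserved ids 0 (padding) and 1 (unknown word), and the helper names `build_vocab` and `encode` are assumptions for illustration; train.py may preprocess differently.

```python
from typing import Dict, List

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for out-of-vocabulary words


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to each distinct lowercase token, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(token, UNK_ID) for token in text.lower().split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# The two example rows from the table above.
samples = [
    ("GRAND THEFT FROM LOCKED AUTO", "LARCENY/THEFT"),
    ("POSSESSION OF NARCOTICS PARAPHERNALIA", "DRUG/NARCOTIC"),
]
vocab = build_vocab([descript for descript, _ in samples])
labels = sorted({category for _, category in samples})
label_ids = {name: index for index, name in enumerate(labels)}

x = [encode(descript, vocab, max_len=6) for descript, _ in samples]
y = [label_ids[category] for _, category in samples]
```

With the full dataset, `labels` would hold the 39 Category values rather than the two shown here.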

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Training data is included at ./data/train.csv.zip. Prediction sample data is at ./data/small_samples.csv.
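
For a quick look at the training data without unzipping it by hand, a minimal sketch along these lines may help. It assumes the archive contains a single UTF-8 CSV with a header row that includes the Descript and Category columns; the function name `read_pairs` is hypothetical and not part of this repository.

```python
import csv
import io
import zipfile
from typing import List, Tuple


def read_pairs(zip_path: str) -> List[Tuple[str, str]]:
    """Return (Descript, Category) pairs from the first CSV inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [name for name in archive.namelist() if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError(f"no CSV file found in {zip_path}")
        with archive.open(csv_names[0]) as raw:
            # Wrap the binary stream so csv can read it as text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return [(row["Descript"], row["Category"]) for row in reader]
```

Calling `read_pairs("./data/train.csv.zip")` from the repository root would then return one tuple per row of the training set.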

Train

python3 train.py ./data/train.csv.zip ./training_config.json

Artifacts are written to ./trained_results_<timestamp>/:

  • saved_model/ — exported SavedModel for inference
  • best_model.keras — best validation checkpoint (primary load path for predict)
  • words_index.json — vocabulary mapping
  • labels.json — class labels
  • trained_parameters.json — hyperparameters and sequence length
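
To inspect what a training run produced, the JSON artifacts can be read back with the standard library. The sketch below assumes only that the three .json files listed above are valid JSON; it makes no assumption about their internal layout, and `load_artifacts` is a hypothetical helper, not a function from predict.py.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_artifacts(result_dir: str) -> Dict[str, Any]:
    """Read the three JSON artifacts from a training output directory."""
    root = Path(result_dir)
    artifacts: Dict[str, Any] = {}
    for name in ("words_index", "labels", "trained_parameters"):
        with open(root / f"{name}.json", encoding="utf-8") as handle:
            artifacts[name] = json.load(handle)
    return artifacts
```

The model itself is stored separately; in TensorFlow 2.x, a .keras checkpoint such as best_model.keras can typically be loaded with `tf.keras.models.load_model`.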

Predict

python3 predict.py ./trained_results_<timestamp>/ ./data/small_samples.csv

Predictions are saved to ./predicted_results_<timestamp>/predictions_all.csv.

Reference