June 6, 2026
Project: Classify Kaggle San Francisco Crime Description
Highlights
- Multi-class text classification (sentence classification) problem.
- Classify each Kaggle San Francisco Crime Descript (the text description of an incident) into one of 39 Category labels.
- Hybrid TextCNN + GRU model implemented with TensorFlow 2.x Keras.
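The README does not say how the two halves are wired together. One plausible arrangement, sketched under assumptions: parallel Conv1D branches (the TextCNN part) extract n-gram features, and a GRU reads the resulting feature sequence before a 39-way softmax. The function name, layer sizes, and kernel sizes are illustrative, not the project's actual configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_textcnn_gru(vocab_size, seq_len, num_classes=39, embed_dim=128,
                      kernel_sizes=(2, 3, 4), num_filters=64, gru_units=64, dropout=0.5):
    """Hypothetical TextCNN + GRU hybrid; all sizes are illustrative defaults."""
    inputs = keras.Input(shape=(seq_len,), dtype="int32")
    # Token ids -> dense vectors. Padding ids are not masked, because Conv1D
    # does not consume Keras masks.
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # TextCNN part: parallel n-gram detectors of different widths.
    # padding="same" keeps each branch at seq_len time steps.
    branches = [
        layers.Conv1D(num_filters, k, padding="same", activation="relu")(x)
        for k in kernel_sizes
    ]
    # Stack the branch features per time step: (seq_len, num_filters * len(kernel_sizes)).
    x = layers.Concatenate(axis=-1)(branches)
    # GRU part: read the n-gram feature sequence and keep only the final state.
    x = layers.GRU(gru_units)(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Using padding="same" is what lets the branches be concatenated per time step and still form a sequence for the GRU; a classic TextCNN would instead max-pool each branch over time, which discards the ordering a recurrent layer needs.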
Data: Kaggle San Francisco Crime
- Input: Descript
- Output: Category
Examples:
| Descript | Category |
|---|---|
| GRAND THEFT FROM LOCKED AUTO | LARCENY/THEFT |
| POSSESSION OF NARCOTICS PARAPHERNALIA | DRUG/NARCOTIC |
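Before a model can consume rows like these, each Descript has to become a sequence of integer token ids and each Category an integer class id. A minimal sketch, assuming lowercase word tokenization and ids 0 and 1 reserved for padding and unknown words; these conventions and all function names are hypothetical, not taken from train.py.

```python
import re
from collections import Counter

# Hypothetical reserved ids: 0 pads short sequences, 1 stands for unknown words.
PAD_ID, OOV_ID = 0, 1


def tokenize(text):
    """Lowercase and keep runs of letters/digits; '/' and other punctuation are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts, min_count=1):
    """Map each word seen at least min_count times to an id, starting after the reserved ids."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = [w for w, c in counts.items() if c >= min_count]
    # Most frequent words get the smallest ids; ties are broken alphabetically.
    kept.sort(key=lambda w: (-counts[w], w))
    return {w: i for i, w in enumerate(kept, start=OOV_ID + 1)}


def build_label_index(categories):
    """Assign class ids 0..K-1 to the distinct Category values in sorted order."""
    return {c: i for i, c in enumerate(sorted(set(categories)))}
```

Applied to the two rows in the table, build_vocab yields nine word ids (2 through 10) and build_label_index yields two class ids; on the full training data the label index should have 39 entries.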
Setup
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
The training data is included at ./data/train.csv.zip, and sample data for prediction is at ./data/small_samples.csv.
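The training file ships zipped. A minimal standard-library sketch of pulling the Descript and Category columns out of it, assuming the archive holds a single CSV with the Kaggle column headers. The function name load_pairs is hypothetical, and train.py may read the file differently (for example with pandas).

```python
import csv
import io
import zipfile


def load_pairs(zip_path):
    """Return (Descript, Category) tuples from the first CSV member of a zip archive.

    zip_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: the archive contains one CSV file, e.g. train.csv.
        member = next(name for name in zf.namelist() if name.endswith(".csv"))
        with zf.open(member) as raw:
            # Decode bytes to text; newline="" lets the csv module handle quoted fields.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return [(row["Descript"], row["Category"]) for row in reader]


# Example: pairs = load_pairs("./data/train.csv.zip")
```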
Train
```shell
python3 train.py ./data/train.csv.zip ./training_config.json
```
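A training run keeps best_model.keras as its best validation checkpoint. Assuming train.py relies on standard Keras callbacks (not confirmed here), that behaviour could be produced roughly as follows; the function name, the val_loss monitor, the 20% validation split, and the early-stopping patience are all assumptions.

```python
import os

from tensorflow import keras


def train_with_best_checkpoint(model, x, y, out_dir, epochs=10, batch_size=64):
    """Fit a compiled Keras model, keeping only the best-validation checkpoint on disk."""
    os.makedirs(out_dir, exist_ok=True)
    ckpt_path = os.path.join(out_dir, "best_model.keras")
    callbacks = [
        # Overwrite best_model.keras only when val_loss improves.
        keras.callbacks.ModelCheckpoint(ckpt_path, monitor="val_loss", save_best_only=True),
        # Hypothetical: stop after 2 epochs without improvement, roll back to the best weights.
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True),
    ]
    history = model.fit(
        x, y,
        validation_split=0.2,  # Keras holds out the last 20% of rows for validation
        epochs=epochs,
        batch_size=batch_size,
        callbacks=callbacks,
        verbose=0,
    )
    return ckpt_path, history
```

Monitoring validation loss rather than training loss is what makes the saved file the "best validation" model instead of simply the last epoch.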
Artifacts are written to ./trained_results_<timestamp>/:
- saved_model/: exported SavedModel for inference
- best_model.keras: best validation checkpoint (primary load path for predict.py)
- words_index.json: vocabulary mapping
- labels.json: class labels
- trained_parameters.json: hyperparameters and sequence length
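At prediction time the saved vocabulary and parameters have to be applied to new text exactly as during training. A standard-library sketch of loading the JSON artifacts and turning one description into a fixed-length id sequence. It assumes labels.json holds a list of class names in index order, trained_parameters.json stores the sequence length under a key named sequence_length, the tokenizer is lowercase alphanumeric, and ids 0 and 1 mean padding and unknown word; all of these are hypothetical, so check the files train.py actually writes.

```python
import json
import os
import re


def load_artifacts(result_dir):
    """Read the vocabulary, labels, and parameters saved by a training run."""
    def read(name):
        with open(os.path.join(result_dir, name), encoding="utf-8") as f:
            return json.load(f)

    return read("words_index.json"), read("labels.json"), read("trained_parameters.json")


def encode(text, words_index, seq_len, pad_id=0, oov_id=1):
    """Map a description to exactly seq_len ids: unknown words become oov_id, then truncate or pad."""
    # Assumption: the same lowercase alphanumeric tokenization as at training time.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    ids = [words_index.get(tok, oov_id) for tok in tokens][:seq_len]
    return ids + [pad_id] * (seq_len - len(ids))


# Example (hypothetical key name):
# words_index, labels, params = load_artifacts("./trained_results_<timestamp>/")
# x = encode("GRAND THEFT FROM LOCKED AUTO", words_index, params["sequence_length"])
```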
Predict
```shell
python3 predict.py ./trained_results_<timestamp>/ ./data/small_samples.csv
```
Predictions are saved to ./predicted_results_<timestamp>/predictions_all.csv.
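The layout of predictions_all.csv is not documented here. A sketch of the final step, assuming the model returns one probability row per input and labels is the class list in index order; the function name and the column names Descript, Predicted, and Probability are hypothetical.

```python
import csv
import os


def write_predictions(out_dir, texts, probabilities, labels):
    """Write one row per input with the highest-probability label and its probability."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "predictions_all.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Hypothetical header; predict.py may use different column names.
        writer.writerow(["Descript", "Predicted", "Probability"])
        for text, probs in zip(texts, probabilities):
            # Argmax over the class probabilities (works for lists or NumPy rows).
            best = max(range(len(probs)), key=probs.__getitem__)
            writer.writerow([text, labels[best], f"{probs[best]:.4f}"])
    return path
```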