Data Science Articles from CodeCut

December 2, 2025

📦 We've Relocated
The contents of this repository are now hosted at github.com/khuyentran1401/codecut-blog. Please follow the new repo to stay updated.

About CodeCut

CodeCut is the platform that helps data scientists stay productive and current by delivering short, practical code examples that highlight modern tools in action.

It's the resource you wish you had when learning a new library: clean, concise, and instantly applicable.

Article Collection

This repository is a curated collection of data science articles from CodeCut, covering topics like MLOps, data management, testing, visualization, and more. Each article comes with practical examples, and many also link to a companion code repository and a video tutorial to help you quickly apply these tools and practices in your own projects.

| Category | Title | Links (Article / Repository / Video) |
|---|---|---|
| MLOps | Goodbye Pip and Poetry. Why UV Might Be All You Need | 🔗 |
| MLOps | Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | 🔗 🔗 🔗 |
| MLOps | Poetry: A Better Way to Manage Python Dependencies | 🔗 🔗 |
| MLOps | Git for Data Scientists: Learn Git through Practical Examples | 🔗 🔗 |
| MLOps | 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 🔗 🔗 |
| MLOps | How to Structure a Data Science Project for Maintainability | 🔗 🔗 🔗 |
| MLOps | Build Reliable Machine Learning Pipelines with Continuous Integration | 🔗 🔗 🔗 |
| MLOps | Automate Machine Learning Deployment with GitHub Actions | 🔗 🔗 🔗 |
| MLOps | How to Build a Fully Automated Data Drift Detection Pipeline | 🔗 🔗 🔗 |
| Data Management Tools | Version Control for Data and Models Using DVC | 🔗 🔗 🔗 |
| Data Management Tools | What is dbt (data build tool) and When should you use it? | 🔗 🔗 🔗 |
| Data Management Tools | Streamline dbt Model Development with Notebook-Style Workspace | 🔗 🔗 🔗 |
| Testing | Pytest for Data Scientists | 🔗 🔗 🔗 |
| Python Helper Tools | Write Clean Python Code Using Pipes | 🔗 🔗 🔗 |
| Python Helper Tools | Introducing FugueSQL: SQL for Pandas, Spark, and Dask DataFrames | 🔗 🔗 |
| Python Helper Tools | Fugue and DuckDB: Fast SQL Code in Python | 🔗 🔗 |
| Python Helper Tools | Marimo: A Modern Notebook for Reproducible Data Science | 🔗 🔗 |
| Feature Engineering | Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames | 🔗 🔗 |
| Visualization | Top 6 Python Libraries for Visualization: Which one to Use? | 🔗 🔗 |
| Python | Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | 🔗 🔗 🔗 |
| Logging and Debugging | Loguru: Simple as Print, Flexible as Logging | 🔗 🔗 🔗 |
| LLM | Enforce Structured Outputs from LLMs with PydanticAI | 🔗 🔗 |
| LLM | Run Private AI Workflows with LangChain and Ollama | 🔗 🔗 |
| Speed-up Tools | Writing Safer PySpark Queries with Parameters | 🔗 🔗 |
| Speed-up Tools | Narwhals: Unified DataFrame Functions for pandas, Polars, and PySpark | 🔗 🔗 |
| Speed-up Tools | Eager to Lazy DataFrames with Narwhals | 🔗 🔗 |
| Speed-up Tools | Scaling Pandas Workflows with PySpark's Pandas API | 🔗 🔗 |

Contributing

If you're passionate about data science and want to share your knowledge about open-source tools for data processing and LLM applications in Python, we'd love to have you contribute!

To contribute:

  1. Create a GitHub issue:
    • Click on the "Issues" tab
    • Click "New issue"
    • Select the "Article Topic Suggestion" template
    • Fill in the template with your article proposal
  2. Read our contribution guidelines