README.md

September 9, 2025

Adding recipes to Weaviate Docs

In your PR, you should make sure you've completed the following steps:

  1. Add an entry for your recipe to index.toml, including any relevant optional fields: tags (a list of topic tags), agent (true or false), integration (true or false), and so on.
  2. pip install -r requirements.txt
  3. python scripts/generate_markdowns.py: This generates a markdown version of your recipe, including the frontmatter needed for the website. Check that the markdown looks correct and fix any errors the script reports.
  4. Create the PR, which should include your recipe, the edit to index.toml, and the generated markdown in markdowns/.

After you have created the markdown files and updated the index.toml file, you can copy them into the Weaviate Docs repository and create a PR.

Important

Make sure you adhere to these rules:

  • Use the tag description in the index.toml file for LLM friendliness.
  • Any JSX element can be written with a self-closing tag, and every element must be closed. The image tag, for example, must always be written as <img /> (and not <img>) in order to be valid JSX that can be transpiled.
  • Handling images: If your recipe displays an image, we recommend embedding it via its GitHub URL. Images generated from code can't be transformed into markdown, so save them instead and explicitly import them into the notebook.
  • If you change the layout of the recipes repository, make sure you update any recipe paths in index.toml that have moved.

Jupyter Notebook to Markdown converter

This tool converts Jupyter notebooks to Markdown format optimized for Docusaurus documentation. The generate_markdown.py script processes Jupyter notebooks defined in the /index.toml configuration file, converting them to Markdown files with appropriate frontmatter, formatting, and enhancements for display in a Docusaurus documentation site.

Usage

python generate_markdown.py [--config CONFIG_PATH] [--output OUTPUT_DIR]

Arguments:

  • --config: Path to the TOML configuration file (default: /index.toml)
  • --output: Base directory for markdown output (default: /markdowns)
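
For readers who want to see how these two options could be wired up, here is a minimal argparse sketch. How the actual script parses its arguments is not shown in this README, so treat this as an illustration only; the defaults are copied exactly as written above, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two documented options."""
    parser = argparse.ArgumentParser(
        description="Convert Jupyter notebooks listed in a TOML file to Markdown."
    )
    parser.add_argument(
        "--config",
        default="/index.toml",  # default as written in this README
        help="Path to the TOML configuration file",
    )
    parser.add_argument(
        "--output",
        default="/markdowns",  # default as written in this README
        help="Base directory for markdown output",
    )
    return parser
```

Calling build_parser().parse_args() with no arguments yields the defaults; passing --config or --output overrides them individually.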

Configuration File Structure

The items in the index.toml configuration file should have the following structure:

[[recipe]]
title = "My Notebook Title"
notebook = "path/to/notebook.ipynb"
featured = false
integration = false
agent = false
tags = ["tag1", "tag2"]

What the script does

  1. Reads the TOML configuration file that specifies notebooks to convert
  2. For each notebook:
    • Generates frontmatter with metadata (title, tags, etc.)
    • Adds a Colab badge for easy opening in Google Colab
    • Converts the notebook content to Markdown
    • Applies various transformations to make it Docusaurus-compatible
    • Outputs the transformed Markdown to the specified directory
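
The first two per-notebook steps can be pictured with the sketch below. The frontmatter key names simply mirror the index.toml keys, which is an assumption; the real script may emit different or additional fields. The repo and branch parameters and both function names are hypothetical. The Colab URL pattern and badge image are the ones Google Colab uses for notebooks hosted on GitHub.

```python
import json

COLAB_BADGE_IMG = "https://colab.research.google.com/assets/colab-badge.svg"


def build_frontmatter(recipe: dict) -> str:
    """Render a YAML frontmatter block from a recipe entry.

    Assumption: key names mirror the index.toml keys.
    """
    lines = ["---"]
    for key in ("title", "featured", "integration", "agent", "tags"):
        if key in recipe:
            # JSON scalars and lists are also valid YAML flow syntax.
            lines.append(f"{key}: {json.dumps(recipe[key])}")
    lines.append("---")
    return "\n".join(lines)


def build_colab_badge(repo: str, branch: str, notebook: str) -> str:
    """Return an HTML badge linking to the notebook in Google Colab.

    `repo` is an "owner/name" string; the img tag is self-closing so the
    result stays valid JSX.
    """
    url = f"https://colab.research.google.com/github/{repo}/blob/{branch}/{notebook}"
    return (
        f'<a href="{url}" target="_blank">'
        f'<img src="{COLAB_BADGE_IMG}" alt="Open In Colab" /></a>'
    )
```

Concatenating the frontmatter, a blank line, the badge, and the converted notebook body gives the overall shape of the output described above.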

The notebook_converter.py script performs the following steps in the conversion process:

  1. Generate Frontmatter: Creates YAML frontmatter with metadata from the configuration
  2. Add Colab Badge: Inserts an HTML badge after frontmatter for easy opening in Google Colab
  3. Load & Clean Notebook: Processes the notebook and cleans Colab-specific dataframe outputs
  4. Convert to Markdown: Uses nbconvert to transform the notebook to Markdown
  5. Format Indented Output: Properly formats code output blocks with ```text markers
    • Preserves Python code within output blocks
    • Escapes backticks within output blocks
    • Strips ANSI color codes from outputs
  6. Clean HTML Tags: Removes span tags from pre blocks and transforms style attributes to React format
  7. Apply Docusaurus Fixes: Converts inline styles to React style format
  8. Escape Special Characters: Handles special characters for MDX compatibility
  9. Fix Image Paths: Updates image paths to reference the GitHub repository
  10. Remove WhatNext Component: Removes WhatNext import and component references
  11. Remove First H1 Heading: Removes the first H1 heading if it exists
  12. Write Output: Saves the processed Markdown to the output directory
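
A few of these steps are plain text transformations. The sketch below illustrates three of them: stripping ANSI color codes, converting inline style attributes to React format, and removing the first H1. It is not the code in notebook_converter.py; the function names and regular expressions are assumptions, and the style conversion only handles simple declarations (no quotes or semicolons inside values).

```python
import re

# Matches common ANSI CSI sequences such as "\x1b[31m" (color codes).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Matches a double-quoted HTML style attribute.
STYLE_ATTR = re.compile(r'style="([^"]*)"')


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences from notebook output."""
    return ANSI_ESCAPE.sub("", text)


def css_to_react_style(css: str) -> str:
    """Turn 'font-size: 12px; color: red' into a JSX style object literal."""
    pairs = []
    for declaration in css.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        parts = prop.strip().split("-")
        # kebab-case to camelCase: font-size becomes fontSize.
        camel = parts[0] + "".join(p.capitalize() for p in parts[1:])
        pairs.append(f"{camel}: '{value.strip()}'")
    return "{{" + ", ".join(pairs) + "}}"


def convert_inline_styles(html: str) -> str:
    """Rewrite every style="..." attribute as style={{...}}."""
    return STYLE_ATTR.sub(lambda m: "style=" + css_to_react_style(m.group(1)), html)


def remove_first_h1(markdown: str) -> str:
    """Drop the first '# ' heading, ignoring '#' comments inside code fences."""
    lines = markdown.split("\n")
    in_fence = False
    for i, line in enumerate(lines):
        if line.startswith("```"):
            in_fence = not in_fence
        elif not in_fence and line.startswith("# "):
            del lines[i]
            break
    return "\n".join(lines)
```

Tracking fence state in remove_first_h1 matters because Python comments inside code cells also start with "# " and would otherwise be mistaken for the title.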

Output Structure

The output will be organized into subdirectories based on notebook type:

  • agents/ - For agent-related notebooks
  • integrations/ - For integration-related notebooks
  • weaviate/ - For general Weaviate notebooks
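
The mapping from a recipe entry to its subdirectory could look like the sketch below. The precedence when both agent and integration are true is not stated in this README, so checking agent first is an assumption, as is naming the output file after the notebook's stem. Both function names are hypothetical.

```python
from pathlib import Path


def output_subdir(recipe: dict) -> str:
    """Pick the output subdirectory from the recipe's boolean flags.

    Assumption: `agent` takes precedence over `integration`.
    """
    if recipe.get("agent", False):
        return "agents"
    if recipe.get("integration", False):
        return "integrations"
    return "weaviate"


def output_path(recipe: dict, output_dir: str) -> Path:
    """Build the destination path for a recipe's generated markdown.

    Assumption: the file is named after the notebook, with an .md suffix.
    """
    stem = Path(recipe["notebook"]).stem
    return Path(output_dir) / output_subdir(recipe) / f"{stem}.md"
```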