Prompt Gemini Script with Raw Output Option

October 23, 2024

human (Oct 23, 2024, 10:52 AM)

#!/bin/bash

# Check if GOOGLE_API_KEY is set
if [ -z "$GOOGLE_API_KEY" ]; then
    echo "Error: GOOGLE_API_KEY environment variable is not set" >&2
    exit 1
fi

# Default model
model="8b"
prompt=""
image_file=""

# Parse arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -m)
            model="$2"
            shift 2
            ;;
        *)
            if [ -z "$prompt" ]; then
                prompt="$1"
            elif [ -z "$image_file" ]; then
                image_file="$1"
            fi
            shift
            ;;
    esac
done

# Validate prompt
if [ -z "$prompt" ]; then
    echo "Error: No prompt provided" >&2
    echo "Usage: prompt-gemini \"prompt\" [image_file] [-m model]" >&2
    exit 1
fi

# Map model names to full model strings
case $model in
    "8b"|"flash-8b")
        model_string="gemini-1.5-flash-8b-latest"
        ;;
    "flash")
        model_string="gemini-1.5-flash-latest"
        ;;
    "pro")
        model_string="gemini-1.5-pro-latest"
        ;;
    *)
        model_string="gemini-1.5-$model"
        ;;
esac

# Create temporary file
temp_file=$(mktemp)
trap 'rm -f "$temp_file"' EXIT

# Determine mime type if image file is provided
if [ -n "$image_file" ]; then
    if [ ! -f "$image_file" ]; then
        echo "Error: File '$image_file' not found" >&2
        exit 1
    fi

    # Get file extension and convert to lowercase
    ext=$(echo "${image_file##*.}" | tr '[:upper:]' '[:lower:]')
    
    case $ext in
        png)
            mime_type="image/png"
            ;;
        jpg|jpeg)
            mime_type="image/jpeg"
            ;;
        gif)
            mime_type="image/gif"
            ;;
        pdf)
            mime_type="application/pdf"
            ;;
        *)
            echo "Error: Unsupported file type .$ext" >&2
            exit 1
            ;;
    esac

    # Create JSON with image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        },
        {
          "inlineData": {
            "data": "$(base64 -i "$image_file")",
            "mimeType": "$mime_type"
          }
        }
      ]
    }
  ]
}
EOF
else
    # Create JSON without image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        }
      ]
    }
  ]
}
EOF
fi

# Make API request
curl -s "https://generativelanguage.googleapis.com/v1beta/models/$model_string:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d @"$temp_file" | jq

Modify this script to add an extra -r option which, if present, causes the final line to pipe through jq like this:

... | jq '.candidates[0].content.parts[0].text' -r

assistant (Oct 23, 2024, 10:52 AM)

I'll help modify the script to add the raw output option.

Modified Prompt Gemini Script with Raw Output Option

#!/bin/bash

# Check if GOOGLE_API_KEY is set
if [ -z "$GOOGLE_API_KEY" ]; then
    echo "Error: GOOGLE_API_KEY environment variable is not set" >&2
    exit 1
fi

# Default model and options
model="8b"
prompt=""
image_file=""
raw_output=false

# Parse arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -m)
            model="$2"
            shift 2
            ;;
        -r)
            raw_output=true
            shift
            ;;
        *)
            if [ -z "$prompt" ]; then
                prompt="$1"
            elif [ -z "$image_file" ]; then
                image_file="$1"
            fi
            shift
            ;;
    esac
done

# Validate prompt
if [ -z "$prompt" ]; then
    echo "Error: No prompt provided" >&2
    echo "Usage: prompt-gemini \"prompt\" [image_file] [-m model] [-r]" >&2
    exit 1
fi

# Map model names to full model strings
case $model in
    "8b"|"flash-8b")
        model_string="gemini-1.5-flash-8b-latest"
        ;;
    "flash")
        model_string="gemini-1.5-flash-latest"
        ;;
    "pro")
        model_string="gemini-1.5-pro-latest"
        ;;
    *)
        model_string="gemini-1.5-$model"
        ;;
esac

# Create temporary file
temp_file=$(mktemp)
trap 'rm -f "$temp_file"' EXIT

# Determine mime type if image file is provided
if [ -n "$image_file" ]; then
    if [ ! -f "$image_file" ]; then
        echo "Error: File '$image_file' not found" >&2
        exit 1
    fi

    # Get file extension and convert to lowercase
    ext=$(echo "${image_file##*.}" | tr '[:upper:]' '[:lower:]')
    
    case $ext in
        png)
            mime_type="image/png"
            ;;
        jpg|jpeg)
            mime_type="image/jpeg"
            ;;
        gif)
            mime_type="image/gif"
            ;;
        pdf)
            mime_type="application/pdf"
            ;;
        *)
            echo "Error: Unsupported file type .$ext" >&2
            exit 1
            ;;
    esac

    # Create JSON with image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        },
        {
          "inlineData": {
            "data": "$(base64 -i "$image_file")",
            "mimeType": "$mime_type"
          }
        }
      ]
    }
  ]
}
EOF
else
    # Create JSON without image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        }
      ]
    }
  ]
}
EOF
fi

# Make API request with conditional jq processing
if [ "$raw_output" = true ]; then
    curl -s "https://generativelanguage.googleapis.com/v1beta/models/$model_string:generateContent?key=$GOOGLE_API_KEY" \
        -H 'Content-Type: application/json' \
        -X POST \
        -d @"$temp_file" | jq '.candidates[0].content.parts[0].text' -r
else
    curl -s "https://generativelanguage.googleapis.com/v1beta/models/$model_string:generateContent?key=$GOOGLE_API_KEY" \
        -H 'Content-Type: application/json' \
        -X POST \
        -d @"$temp_file" | jq
fi

The key changes I made to the script are:

  1. Added a new raw_output variable initialized to false
  2. Added -r to the argument parsing section
  3. Updated the usage message to include the new -r option
  4. Modified the final API request section to conditionally use different jq processing based on the raw_output flag
  5. Kept all other functionality intact

Now you can use the script with the -r flag to get raw text output, like this:

./prompt-gemini "Tell me a joke" -r

Or without the -r flag to get the full JSON output as before:

./prompt-gemini "Tell me a joke"
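
To see what the -r flag changes without calling the API, here is a minimal sketch that runs both jq invocations against a hand-written sample response. The sample JSON only imitates the shape the filter expects (candidates[0].content.parts[0].text); it is an assumption made for illustration, not captured API output, and the example needs jq installed.

```shell
#!/bin/bash

# Hand-written stand-in for an API response (assumed shape, not real output)
sample='{"candidates":[{"content":{"parts":[{"text":"Why did the chicken cross the road?"}]}}]}'

# Without -r, jq prints the string as a JSON value, including the quotes
quoted=$(printf '%s' "$sample" | jq '.candidates[0].content.parts[0].text')

# With -r, jq prints the raw string contents
raw=$(printf '%s' "$sample" | jq -r '.candidates[0].content.parts[0].text')

echo "$quoted"
echo "$raw"
```

The first line printed keeps the surrounding double quotes, while the second is the bare text, which is usually what you want when piping the answer into another tool.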

human (Oct 23, 2024, 10:53 AM)

refactor that so it sets a jq variable of '.' if no -r was set and .candidates.... if the option was set

assistant (Oct 23, 2024, 10:53 AM)

Modified Prompt Gemini Script with JQ Variable

#!/bin/bash

# Check if GOOGLE_API_KEY is set
if [ -z "$GOOGLE_API_KEY" ]; then
    echo "Error: GOOGLE_API_KEY environment variable is not set" >&2
    exit 1
fi

# Default model and options
model="8b"
prompt=""
image_file=""
jq_filter="."

# Parse arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -m)
            model="$2"
            shift 2
            ;;
        -r)
            jq_filter=".candidates[0].content.parts[0].text"
            shift
            ;;
        *)
            if [ -z "$prompt" ]; then
                prompt="$1"
            elif [ -z "$image_file" ]; then
                image_file="$1"
            fi
            shift
            ;;
    esac
done

# Validate prompt
if [ -z "$prompt" ]; then
    echo "Error: No prompt provided" >&2
    echo "Usage: prompt-gemini \"prompt\" [image_file] [-m model] [-r]" >&2
    exit 1
fi

# Map model names to full model strings
case $model in
    "8b"|"flash-8b")
        model_string="gemini-1.5-flash-8b-latest"
        ;;
    "flash")
        model_string="gemini-1.5-flash-latest"
        ;;
    "pro")
        model_string="gemini-1.5-pro-latest"
        ;;
    *)
        model_string="gemini-1.5-$model"
        ;;
esac

# Create temporary file
temp_file=$(mktemp)
trap 'rm -f "$temp_file"' EXIT

# Determine mime type if image file is provided
if [ -n "$image_file" ]; then
    if [ ! -f "$image_file" ]; then
        echo "Error: File '$image_file' not found" >&2
        exit 1
    fi

    # Get file extension and convert to lowercase
    ext=$(echo "${image_file##*.}" | tr '[:upper:]' '[:lower:]')
    
    case $ext in
        png)
            mime_type="image/png"
            ;;
        jpg|jpeg)
            mime_type="image/jpeg"
            ;;
        gif)
            mime_type="image/gif"
            ;;
        pdf)
            mime_type="application/pdf"
            ;;
        *)
            echo "Error: Unsupported file type .$ext" >&2
            exit 1
            ;;
    esac

    # Create JSON with image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        },
        {
          "inlineData": {
            "data": "$(base64 -i "$image_file")",
            "mimeType": "$mime_type"
          }
        }
      ]
    }
  ]
}
EOF
else
    # Create JSON without image data
    cat <<EOF > "$temp_file"
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "$prompt"
        }
      ]
    }
  ]
}
EOF
fi

# Make API request with jq filter
curl -s "https://generativelanguage.googleapis.com/v1beta/models/$model_string:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d @"$temp_file" | jq "$jq_filter" -r

The key changes I made are:

  1. Replaced the raw_output boolean with a jq_filter variable initialized to "."
  2. Changed the -r option handling to set jq_filter to ".candidates[0].content.parts[0].text"
  3. Collapsed the two curl branches into a single command that pipes to jq "$jq_filter"
  4. Always pass -r to jq: it only changes how string results are printed, so the JSON object produced by the "." filter is unaffected

The usage remains the same, but the code is now more maintainable with less duplication. The -r option still produces raw text output while omitting it shows the full JSON response.
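
The filter selection and the claim about -r can both be checked offline. The sketch below is an illustration under stated assumptions: select_jq_filter is a hypothetical helper that mirrors only the flag handling of the script, and the sample JSON is hand-written rather than real API output. It needs jq installed.

```shell
#!/bin/bash

# Hypothetical helper (not part of the script above): prints the jq filter
# that a given argument list would select.
select_jq_filter() {
    filter="."
    while [ "$#" -gt 0 ]; do
        case $1 in
            -m)
                # Skip the flag and its value, tolerating a missing value
                shift
                if [ "$#" -gt 0 ]; then shift; fi
                ;;
            -r)
                filter=".candidates[0].content.parts[0].text"
                shift
                ;;
            *)
                shift
                ;;
        esac
    done
    printf '%s\n' "$filter"
}

select_jq_filter "Tell me a joke"
select_jq_filter "Tell me a joke" -m pro -r

# Hand-written stand-in for an API response (assumed shape, not real output)
sample='{"candidates":[{"content":{"parts":[{"text":"hi"}]}}]}'

# For the "." filter the result is an object, so -r makes no difference
plain=$(printf '%s' "$sample" | jq '.')
with_r=$(printf '%s' "$sample" | jq -r '.')
if [ "$plain" = "$with_r" ]; then
    echo "identical output for the '.' filter"
fi
```

Keeping the filter in a variable means the curl pipeline is written once, and any future output mode only needs another case branch that assigns a different filter.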