codellama

August 8, 2024

Introduction

codellama features enhanced coding capabilities. It can generate code and natural language about code, from both code and natural language prompts (e.g., “Write me a function that outputs the Fibonacci sequence”). It can also be used for code completion and debugging. It supports many of the most popular programming languages in use today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.

Three sizes (7B, 13B, 34B) and three flavours (base model, Python fine-tuned, and instruction-tuned) are released on Hugging Face.

| Base Model | Python | Instruct |
| --- | --- | --- |
| codellama/CodeLlama-7b-hf | codellama/CodeLlama-7b-Python-hf | codellama/CodeLlama-7b-Instruct-hf |
| codellama/CodeLlama-13b-hf | codellama/CodeLlama-13b-Python-hf | codellama/CodeLlama-13b-Instruct-hf |
| codellama/CodeLlama-34b-hf | codellama/CodeLlama-34b-Python-hf | codellama/CodeLlama-34b-Instruct-hf |

The models map to capabilities as follows:

| models | code completion | infilling | instructions / chat | python specialist |
| --- | --- | --- | --- | --- |
| Base Model | Y | Y(7B,13B), N(34B) | N | N |
| Python | Y | N | N | Y |
| Instruct | Y | Y(7B,13B), N(34B) | Y | N |
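
The capability table can also be encoded as data, which makes it easy to check a model choice before loading it. The following is a minimal sketch, not part of lmdeploy: `CAPABILITIES` and `supports` are hypothetical names, and the capability keys simply reuse the `capability` values that appear in the examples in this document.

```python
# Sizes that support each capability, transcribed from the table above.
# CAPABILITIES and supports() are illustrative helpers, not lmdeploy APIs.
ALL_SIZES = {"7b", "13b", "34b"}
NO_SIZES = set()

CAPABILITIES = {
    "base": {
        "completion": ALL_SIZES,
        "infilling": {"7b", "13b"},
        "chat": NO_SIZES,
        "python": NO_SIZES,
    },
    "python": {
        "completion": ALL_SIZES,
        "infilling": NO_SIZES,
        "chat": NO_SIZES,
        "python": ALL_SIZES,
    },
    "instruct": {
        "completion": ALL_SIZES,
        "infilling": {"7b", "13b"},
        "chat": ALL_SIZES,
        "python": NO_SIZES,
    },
}


def supports(flavour: str, size: str, capability: str) -> bool:
    """Return True if the given flavour and size supports the capability."""
    return size.lower() in CAPABILITIES[flavour.lower()][capability]


print(supports("base", "7B", "infilling"))
print(supports("instruct", "34B", "infilling"))
```

For example, `supports("python", "13B", "chat")` is false, which matches the "N" in the Python row of the table.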

Inference

Based on the above table, this section shows how to use CodeLlama's capabilities through examples.

Completion

from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig

pipe = pipeline('meta-llama/CodeLlama-7b-hf',
                chat_template_config=ChatTemplateConfig(
                    model_name='codellama',
                    capability='completion'
                ))

response = pipe(
    'import socket\n\ndef ping_exponential_backoff(host: str):',
    gen_config=GenerationConfig(
        top_k=10,
        temperature=0.1,
        top_p=0.95
    )
)
print(response.text)

Infilling

from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig

pipe = pipeline('meta-llama/CodeLlama-7b-hf',
                chat_template_config=ChatTemplateConfig(
                    model_name='codellama',
                    capability='infilling'
                ))

prompt = """
def remove_non_ascii(s: str) -> str:
    \"\"\"
    <FILL>
    \"\"\"
    return result
"""
response = pipe(
    prompt,
    gen_config=GenerationConfig(
        top_k=10,
        temperature=0.1,
        top_p=0.95,
        max_new_tokens=500
    )
)
print(response.text)
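
The `<FILL>` marker in the prompt separates the code before the gap from the code after it. The sketch below shows how such a prompt could be rearranged into a prefix/suffix layout. It assumes the `<PRE> {prefix} <SUF>{suffix} <MID>` format published for Code Llama infilling. `build_infill_prompt` is a hypothetical helper, and the exact string lmdeploy builds internally may differ.

```python
def build_infill_prompt(prompt: str, marker: str = "<FILL>") -> str:
    """Split the prompt at the marker and arrange it as prefix, suffix, middle.

    Assumption: the target layout is '<PRE> {prefix} <SUF>{suffix} <MID>'.
    This is an illustration, not lmdeploy's own implementation.
    """
    if prompt.count(marker) != 1:
        raise ValueError(f"expected exactly one {marker} marker in the prompt")
    prefix, suffix = prompt.split(marker)
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


example = 'def remove_non_ascii(s: str) -> str:\n    """<FILL>"""\n    return result\n'
print(build_infill_prompt(example))
```

With this layout the model sees both sides of the gap and is asked to generate only the missing middle part.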

Chat

from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig

pipe = pipeline('meta-llama/CodeLlama-7b-Instruct-hf',
                chat_template_config=ChatTemplateConfig(
                    model_name='codellama',
                    capability='chat'
                ))

response = pipe(
    'implement quick sort in C++',
    gen_config=GenerationConfig(
        top_k=10,
        temperature=0.1,
        top_p=0.95
    )
)
print(response.text)

Python specialist

from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig

pipe = pipeline('meta-llama/CodeLlama-7b-Python-hf',
                chat_template_config=ChatTemplateConfig(
                    model_name='codellama',
                    capability='python'
                ))

response = pipe(
    'implement quick sort',
    gen_config=GenerationConfig(
        top_k=10,
        temperature=0.1,
        top_p=0.95
    )
)
print(response.text)

Quantization

TBD

Serving

Prepare a chat template JSON file, for instance "codellama.json", with the following content:

{
    "model_name": "codellama",
    "capability": "completion"
}
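
The same file can be generated from Python, which helps when switching between capabilities. This is a minimal sketch: `write_chat_template` and `VALID_CAPABILITIES` are hypothetical names, and the list of valid values is taken only from the `capability` settings used in the examples in this document.

```python
import json
from pathlib import Path

# Capability values that appear in the examples of this document.
VALID_CAPABILITIES = ("completion", "infilling", "chat", "python")


def write_chat_template(path: str, capability: str) -> dict:
    """Write a chat template JSON file for codellama and return its content."""
    if capability not in VALID_CAPABILITIES:
        raise ValueError(f"unknown capability: {capability!r}")
    config = {"model_name": "codellama", "capability": capability}
    Path(path).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Produces the same content as the JSON shown above.
write_chat_template("codellama.json", "completion")
```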

Then launch the service as follows:

lmdeploy serve api_server meta-llama/CodeLlama-7b-Instruct-hf --chat-template codellama.json

After the service launches successfully, you can access it with the openai package:

from openai import OpenAI
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url="http://0.0.0.0:23333/v1"
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "import socket\n\ndef ping_exponential_backoff(host: str):"},
    ],
    temperature=0.1,
    top_p=0.95,
    max_tokens=500
)
print(response)

For details about api_server, refer to the guide.