vllamacpp
June 3, 2026 · View on GitHub
English|中文
v_llama_cpp is the V language binding for llama.cpp, allowing you to directly use llama.cpp functionality in V language projects.
What is llama.cpp?
llama.cpp is an LLM (Large Language Model) inference framework implemented in C++, with the following main features:
- Pure CPU Inference: Run large models without a GPU
- Quantization Support: Supports INT4, INT5, INT8 and other quantization formats, significantly reducing memory requirements
- Cross-Platform: Works on Windows, Linux, macOS, and even mobile devices
- Efficient Performance: Optimized for ordinary hardware, runs on regular laptops
Simply put, llama.cpp allows you to run large models like Deepseek, Qwen, ChatGLM locally on consumer-grade hardware.
Installation
Manual Setup
It is recommended to download the source code using git:
# Download from Github
git clone https://github.com/sakana-ctf/v_llama_cpp
# For users in China, download from atomgit
git clone https://atomgit.com/sakana-ctf/v_llama_cpp
# Or download from Gitee
git clone https://gitee.com/sakana_ctf/v_llama_cpp
Build and check the llama.cpp environment; if the llama.cpp environment does not exist, it will attempt to install it:
v install.vsh
Note: Installing llama.cpp with vlang may require root privileges. You can use sudo v build.vsh
Uninstall
A convenient method is now provided to uninstall the current repository:
v unstall.vsh
If you had configured v_llama_cpp before updating, it will be uninstalled first and then reinstalled during the installation process.
Usage
Example
Several basic examples are provided in the ./examples/ folder. Below is the simplest calling method: ./examples/ez_simple.v:
module main
import os
import v_llama_cpp {
ModelUrl,
}
fn main() {
model_url := ModelUrl{
url: [
'https://www.modelscope.cn/models/bartowski/google_gemma-3-1b-it-GGUF/resolve/master/google_gemma-3-1b-it-Q4_0.gguf',
'https://huggingface.co/bartowski/google_gemma-3-1b-it-GGUF/resolve/main/google_gemma-3-1b-it-Q4_0.gguf',
]
sha256: '4c62ce8950bc6d5ba5124a70fc13ece971fabd4dc5705477f305a6c3eb6294cd'
}
model_path := './google_gemma-3-1b-it-Q4_0.gguf'
mut ctx := ModelUrl(model_url).ez_load_model(model_path, -1, 2048, 512) or {
println('load model failed.')
return
}
input_buffer := os.input('>')
prompt := '<start_of_turn>user\n${input_buffer}<end_of_turn>\n<start_of_turn>model\n'
print('gemma: ')
ctx.ez_response(prompt, 512, 256, print_token) or { println('response failed.') }
print('\n')
}
fn print_token(token string) {
print(token)
}
The model file will be automatically downloaded to the ./google_gemma-3-1b-it-Q4_0.gguf directory where the program is located. It is recommended to obtain model files from the following sources: