October 22, 2023
llama2.rs.wasm
A dirty and minimal port of @rahoua's llama2.rs
How to run?
- Clone repo
git clone https://github.com/mtb0x1/llama2.rs.wasm
cd llama2.rs.wasm/port5/
- Download @Karpathy's baby Llama2 models (Orig instructions), pretrained on the TinyStories dataset, and place them in the www folder.
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
stories42M is used by default (for now, @todo); you can change this in index.html.
- Build the package (requires wasm-pack)
wasm-pack build --release --target web --out-dir www/pkg/
- Run a minimal webserver serving the www folder (requires Python 3; you can use another webserver if you want)
cd www && python3 -m http.server 8080
- Go to http://localhost:8080/
- Open the browser console (@todo)
- (Optional) If you make changes, reload the browser and clear the cache afterwards
Performance
- Temperature: 0.9
- Sequence length: 20
| tok/s | 15M | 42M | 110M | 7B |
|---|---|---|---|---|
| wasm v1 | ? | ? | ? | ? |
Not really sure about the results (yet!).
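For reference, a tok/s figure like the ones in the table is simply the number of generated tokens divided by the wall-clock time spent generating them. The sketch below is a minimal illustration in plain Rust, not this repo's benchmark code: `tokens_per_second` is a hypothetical helper, and the loop body is only a stand-in for a real forward pass. It uses `std::time::Instant`, which works natively but is not usable on `wasm32-unknown-unknown`, so a browser build would need to take its timestamps from the JavaScript side instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: tokens generated per second of wall-clock time.
/// Returns 0.0 when no time has elapsed, to avoid dividing by zero.
fn tokens_per_second(tokens: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        tokens as f64 / secs
    } else {
        0.0
    }
}

fn main() {
    // Sequence length used for the benchmark settings listed above.
    let seq_len: usize = 20;

    let start = Instant::now();
    for _ in 0..seq_len {
        // Stand-in for one forward pass plus sampling step (assumption:
        // the real loop would call into the model here).
        std::hint::black_box(());
    }
    let rate = tokens_per_second(seq_len, start.elapsed());

    println!("{rate:.2} tok/s");
}
```

Timing the whole generation loop rather than individual steps keeps the measurement overhead small relative to the work being measured.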
todo/Next ?
- Tests
- Display bench results in the webpage instead of the browser console (wip; needs cleanup and removal of the console.info hack)
- Inference based on user input (done)
- Optimization: SIMD, rayon (wip), etc.
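As an illustration of what the optimization item usually targets, here is a rough sketch of a matrix-vector product, assuming the port keeps a llama2.c-style `matmul` as its hot loop (that is an assumption, and the function below is not copied from this repo). Each output element depends on a single row of the weight matrix, so the rows are independent: the inner dot product is a natural candidate for SIMD, and the outer loop is the kind of loop that rayon's parallel iterators could split across threads.

```rust
/// Illustrative matrix-vector product: `out = W * x`.
/// `w` holds a (d x n) matrix in row-major order, with d = out.len()
/// and n = x.len().
fn matmul(out: &mut [f32], x: &[f32], w: &[f32]) {
    let n = x.len();
    if n == 0 {
        // An empty input vector yields all-zero outputs.
        out.fill(0.0);
        return;
    }
    assert_eq!(w.len(), out.len() * n, "w must be out.len() rows by x.len() columns");

    // One row of `w` per output element; rows do not depend on each other.
    for (o, row) in out.iter_mut().zip(w.chunks_exact(n)) {
        // Dot product of the row with the input vector.
        *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
}

fn main() {
    let w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2 x 3 matrix
    let x = [1.0, 0.0, -1.0];
    let mut out = [0.0f32; 2];

    matmul(&mut out, &x, &w);
    println!("{out:?}"); // prints [-2.0, -2.0]
}
```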
License
MIT