
October 22, 2023

llama2.rs.wasm 🦀

A quick-and-dirty, minimal WebAssembly port of @rahoua's llama2.rs.


How to run?

Clone the repo:

```
git clone https://github.com/mtb0x1/llama2.rs.wasm
cd llama2.rs.wasm/port5/
```

Download @Karpathy's baby Llama2 models (original instructions), pretrained on the TinyStories dataset, and place them in the www folder:

```
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
```

stories42M is used by default (for now, @todo); you can change this in index.html.
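The downloaded `.bin` checkpoints use llama2.c's export format, which starts with a header of seven 32-bit integers (dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len) followed by the weights. Below is a minimal sketch of reading that header in Rust, assuming the legacy llama2.c layout that run.c reads and little-endian byte order; `Config` and `parse_header` are hypothetical names, not necessarily what lib.rs uses.

```rust
/// Hyperparameters stored at the start of a llama2.c checkpoint
/// (hypothetical struct; seven little-endian i32 values, in this order).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Config {
    pub dim: i32,
    pub hidden_dim: i32,
    pub n_layers: i32,
    pub n_heads: i32,
    pub n_kv_heads: i32,
    pub vocab_size: i32,
    pub seq_len: i32,
}

/// Parses the 28-byte header. Returns the config plus a flag telling whether
/// the classifier weights are shared with the token embedding table:
/// llama2.c marks *unshared* weights with a negative vocab_size.
pub fn parse_header(bytes: &[u8]) -> Option<(Config, bool)> {
    if bytes.len() < 28 {
        return None; // too short to contain the seven header fields
    }
    // Read the i-th 4-byte field as a little-endian i32.
    let field = |i: usize| -> i32 {
        let s = i * 4;
        i32::from_le_bytes([bytes[s], bytes[s + 1], bytes[s + 2], bytes[s + 3]])
    };
    let vocab_raw = field(5);
    let shared = vocab_raw > 0;
    let config = Config {
        dim: field(0),
        hidden_dim: field(1),
        n_layers: field(2),
        n_heads: field(3),
        n_kv_heads: field(4),
        vocab_size: vocab_raw.abs(),
        seq_len: field(6),
    };
    Some((config, shared))
}
```

In the browser, the bytes would typically arrive as a `Uint8Array` fetched by index.html and passed into the wasm module; the weights that follow the header can then be sized from these fields.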

Build (requires wasm-pack):

```
wasm-pack build --release --target web --out-dir www/pkg/
```

Run a minimal web server serving the www folder:
1. Run (requires Python 3; any other web server works too):
```
cd www && python3 -m http.server 8080
```
2. Go to http://localhost:8080/
3. Open the browser console (@todo)
(Optional) If you want to make changes (reload the browser/clear the cache after each change):
1. Changing lib.rs: rebuild with
```
wasm-pack build --release --target web --out-dir www/pkg/
```
2. Changing the frontend: edit index.html
3. Changing the model/tokenizer:
  - Follow @Karpathy's instructions in llama2.c
  - Place the new files in the www folder and edit index.html if needed

Performance

Temperature: 0.9
Sequence length: 20

| tok/s | 15M | 42M | 110M | 7B |
|---|---|---|---|---|
| wasm v1 | ? | ? | ? | ? |
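Tokens per second is simply the number of generated tokens divided by the elapsed wall-clock time. A minimal sketch, assuming timestamps in milliseconds such as those returned by the browser's `performance.now()`; taking plain numbers avoids `std::time::Instant`, whose `now()` panics on wasm32-unknown-unknown. The function name is hypothetical.

```rust
/// Tokens per second from two millisecond timestamps (hypothetical helper).
/// Returns None when no time has elapsed, e.g. with a too-coarse timer.
pub fn tokens_per_second(n_tokens: usize, start_ms: f64, end_ms: f64) -> Option<f64> {
    let elapsed_s = (end_ms - start_ms) / 1000.0;
    if elapsed_s <= 0.0 {
        return None; // avoid dividing by zero or reporting a negative rate
    }
    Some(n_tokens as f64 / elapsed_s)
}
```

Which tokens to count is a design choice: llama2.c's run.c, for instance, starts its timer after the first forward pass so that one-time warm-up cost is excluded.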

Not really sure about the results (yet!).
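The temperature of 0.9 used for these measurements controls sampling: in llama2.c-style inference the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution toward the most likely token, and 0 means greedy (argmax) decoding. A minimal Rust sketch of that step, assuming the caller supplies a uniform random number in [0, 1) so no external crate is needed; `sample_with_temperature` is a hypothetical name and not necessarily how lib.rs does it.

```rust
/// Samples a token index from non-empty `logits` after temperature scaling.
/// `coin` must be uniform in [0, 1).
pub fn sample_with_temperature(logits: &[f32], temperature: f32, coin: f32) -> usize {
    // Temperature 0: greedy decoding, pick the index of the highest logit.
    if temperature == 0.0 {
        let mut best = 0;
        for (i, &l) in logits.iter().enumerate() {
            if l > logits[best] {
                best = i;
            }
        }
        return best;
    }
    // Softmax of logits / temperature; subtracting the max keeps exp() stable.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let probs: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = probs.iter().sum();
    // Walk the cumulative distribution until it passes the coin.
    let mut cdf = 0.0f32;
    for (i, p) in probs.iter().enumerate() {
        cdf += p / sum;
        if coin < cdf {
            return i;
        }
    }
    logits.len() - 1 // guard against floating-point rounding in the cdf
}
```

In the browser, `coin` could come from JavaScript's `Math.random()` passed across the wasm boundary, which keeps the Rust side deterministic and easy to test.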

Todo / Next?

- Tests
- Display benchmark results in the webpage instead of the browser console (WIP: needs cleanup and removal of the console.info hack)
- Inference based on user input (done)
- Optimization: SIMD, rayon (WIP), etc.
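Most of the inference time in llama2.c-style models goes into a matrix-vector product (`matmul` in llama2.c), which is the natural target for the SIMD and rayon work listed above. A plain sequential Rust sketch for illustration, not the code in lib.rs; the comment marks where rayon's parallel slice iterators could split the work across output rows, assuming a thread-enabled wasm build.

```rust
/// xout[i] = sum_j w[i * n + j] * x[j], where `w` is a row-major
/// (xout.len(), n) matrix and n = x.len().
pub fn matmul(xout: &mut [f32], x: &[f32], w: &[f32]) {
    let n = x.len();
    assert_eq!(w.len(), xout.len() * n, "w must have shape (xout.len(), x.len())");
    if n == 0 {
        // An empty input vector makes every dot product zero.
        for out in xout.iter_mut() {
            *out = 0.0;
        }
        return;
    }
    // Each output row is independent. With rayon this loop could become
    // xout.par_iter_mut().zip(w.par_chunks_exact(n)).for_each(...).
    for (out, row) in xout.iter_mut().zip(w.chunks_exact(n)) {
        *out = row.iter().zip(x.iter()).map(|(a, b)| a * b).sum();
    }
}
```

The inner dot product is also the piece a SIMD version would vectorize, since it is a straight multiply-accumulate over contiguous memory.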

License

MIT