Examples

September 18, 2024

xFasterTransformer provides C++ and Python (PyTorch) examples to help users learn the API. Gradio-based web demos are provided for some models. All examples and web demos support multi-rank execution.

C++ example

The C++ example automatically identifies the model and its tokenizer. Tokenization is implemented with SentencePiece, except for the OPT model, whose tokenizer is hard-coded.

Python (PyTorch) example

The Python (PyTorch) example performs end-to-end inference of the model with streaming output, using the tokenizer from the Transformers library.
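
To illustrate how streaming output and a tokenizer fit together, here is a minimal, self-contained sketch. It is not the repository's example code and does not call xFasterTransformer. The names `TOY_VOCAB`, `toy_decode`, `fake_generate`, and `stream_text` are hypothetical stand-ins for a real tokenizer and model. The idea shown is incremental decoding: after each generated token, decode everything produced so far and print only the newly added text.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical toy vocabulary standing in for a real tokenizer's vocabulary.
TOY_VOCAB = {0: "Hello", 1: ",", 2: " world", 3: "!"}


def toy_decode(token_ids: List[int]) -> str:
    """Stand-in for a tokenizer's decode(): map token ids to text."""
    return "".join(TOY_VOCAB[i] for i in token_ids)


def fake_generate(token_ids: List[int]) -> Iterator[int]:
    """Stand-in for a model that produces one token id per step."""
    for token_id in token_ids:
        yield token_id


def stream_text(
    token_stream: Iterable[int],
    decode: Callable[[List[int]], str],
) -> Iterator[str]:
    """Yield only the text that each newly generated token adds."""
    generated: List[int] = []
    emitted = ""
    for token_id in token_stream:
        generated.append(token_id)
        # Decode the whole sequence so far, then keep only the new suffix.
        text = decode(generated)
        new_text = text[len(emitted):]
        emitted = text
        if new_text:
            yield new_text


if __name__ == "__main__":
    for chunk in stream_text(fake_generate([0, 1, 2, 3]), toy_decode):
        print(chunk, end="", flush=True)
    print()
```

Decoding the full sequence at every step is simple but does redundant work. It is used here because some tokenizers only produce correct text (for example, spacing or multi-byte characters) when decoding several tokens together.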

Web Demo

A Gradio-based web demo is provided in the repository.
Supported models:

  • ChatGLM
  • ChatGLM2
  • ChatGLM3
  • ChatGLM4
  • Llama2
  • Llama3
  • Gemma
  • Yi
  • Baichuan2
  • Qwen
  • Qwen2