Yuanchat Server Introduce

January 3, 2024 · View on GitHub

Yuanchat Server Introduce

English / 简体中文

Yuan2.0 Inference Server Deploy

Yuan-2.0, deploy docker instruction, run inference get model request url http://127.0.0.1:8000 instruction

API Server Deploy

You can run the docker_build.sh to build the image, and then run the docker_run.sh to run a docker container, which located on the root path. Attention, if your Yuan2.0 model server deployed at another machine, you need to add a -e YUAN_2_URL=http://ip:port/yuan in your docker run command to specify the Yuan2.0 model address.

API Server Introduce

We provide only one interface for UI in our back end service. The code is written in Python language. provided in a Restful manner, and interacts with the UI in SSE(Server Send Event) mode.

Interface info follows:

interface url:

http://ip:port/sse/subscribe

here, ip is you server ip which deploy the docker image, and the port is exposed by 5050 in docker(referred the docker_run.sh). this means you can use http://ip:5050/sse/subscribe to get the result.

Http Method: POST

Content-Type: application/json

Response Content-Type: text/event-stream

Description: user send the request, and get the result of Yuan2.0

Request Params:

paramparam descriptionparam typenecessarynote
messagesuser requestlist[dict_object]yesdetail info referred bellow
response_lengththe max token length of Yuan2.0 results. The larger the parameter setting, the longer it takes for the model to generate answers. Setting it too short may affect the integrity of the generated results. Token may be a word, word, or punctuation markintnodefault value:5000, range[0,8000]
temperatureThe larger the temperature value, the stronger the creativity of the model, but the generation effect is unstable.The smaller the temperature value, the stronger the stability of the model and the more stable the generation effect.floatnodefault value:0.6, range:[0,1]
top_pThe probability accumulation of generating tokens starts from the token with the highest probability and stops when the accumulation value is greater than or equal to top_p. When top_p is 0, this parameter has no effect.floatnodefault value:0.95, range:[0,1], can not be effective with top_k
top_kSelect the k tokens with the highest probability as the candidate set. If the value of k is 1, then the answer is unique. When topk is 0, this parameter has no effect. This parameter is related to top p cannot work simultaneously.intnodefault value:0, range:[0,10], can not be effective with top_p

detail introduction for dict_object in message: message is a dict type param, and use to transfer multi dialog info, the length is recommended to be less than 10.

paramparam descriptionparam typenecessarynote
questionuser requeststringyesThe value is the user's historical request information
answerYuan2.0 resultstringnoThe value is the system's historical inference result, corresponding one-to-one with the question, and the last one can be set as an empty string

Example:

{
	"messages":[
            {
                "question": "你好",
                "answer": "你好"
            },
            {
                "question": "1+1=?",
                "answer": ""
            }
        ],
	"top_p":0.95,
	"top_k":0,
	"response_length":50,
	"temperature":0.6
}

Response Info:

paramparam descriptionparam typenecessarynote
errCodeerror code of resultstringerror code of Yuan2.0 result
errMessageerror message of resultstringerror message of Yuan2.0 result
exceptionMsgexception infostringexception info
flagstatus of resultbooleanstatus of result, true:system normal, false: system error
resDataresult infodict_objectresult info of Yuan2.0, detail refers next

detail info for dict_object of resData:

paramparam descriptionparam typenecessarynote
messagedetail of resultstring
timetime of resultinttime info, 13 digit timestamp

The above is the description of the backend interface service. If secondary development is required, you can go to the code /src/python/yuan_process to view and modify in yuanchat.py.