In this blog post, I would like to talk about my experience deploying large language models on a Linux server.
TL;DR: I use LMDeploy to deploy large language models such as Llama 2 with the TurboMind backend and a Gradio frontend.
1.0 Preparation
Use the following commands to install LMDeploy, download the InternLM model, and convert it to TurboMind's format.
pip install lmdeploy
# install git lfs following the link (https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
# 1. Download InternLM model
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
# if you want to clone without large files – just their pointers –
# prepend the git clone command with the following env var:
# GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
# 2. Convert InternLM model to turbomind's format, which will be in "./workspace" by default
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
2.0 Deploy
Serve with Gradio:
lmdeploy serve gradio ./workspace
Serve with a RESTful API, following the official guide: https://github.com/InternLM/lmdeploy/blob/main/docs/en/restful_api.md
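LMDeploy's RESTful server exposes an OpenAI-compatible chat endpoint. As a minimal sketch, here is how a client could build a request body for it; the endpoint path and port 23333 are assumptions based on LMDeploy's documented defaults, not something shown in this post.

```python
import json


def build_chat_payload(model, prompt, temperature=0.7):
    """Build an OpenAI-style chat request body.

    The field names follow the OpenAI chat-completions schema,
    which LMDeploy's RESTful API is compatible with.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


payload = build_chat_payload("internlm-chat-7b", "Hello!")
print(json.dumps(payload))

# To actually send it (server assumed to run on the default port 23333):
# import requests
# r = requests.post("http://localhost:23333/v1/chat/completions", json=payload)
# print(r.json())
```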
3.0 Different Models
The commands to serve different models can be found here: https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving.md.
After deploying the models on Ubuntu Linux, run ssh -N -f -L 6006:localhost:6006 <username>@<server ip>
on your local machine to open an SSH tunnel and access the models.
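For reference, the tunnel command above can be assembled from its parts; this small sketch documents what each flag does (the user name and host below are placeholders, not values from this post).

```python
def ssh_tunnel_cmd(user, host, local_port=6006, remote_port=6006):
    """Assemble the SSH port-forwarding command used above.

    -N: do not run a remote command (forwarding only)
    -f: go to the background after authentication
    -L: forward local_port to remote_port on the server's localhost
    """
    return (
        f"ssh -N -f -L {local_port}:localhost:{remote_port} "
        f"{user}@{host}"
    )


# Hypothetical user and host for illustration:
print(ssh_tunnel_cmd("alice", "203.0.113.5"))
# ssh -N -f -L 6006:localhost:6006 alice@203.0.113.5
```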
Appendix
Open-Source Models:
- Mistral-7B-Instruct-v0.2: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2