
LLM Deploy

In this post, I would like to share my experience deploying Large Language Models on a Linux server.

TL;DR: I use LMDeploy to deploy large language models such as Llama 2 with the TurboMind backend and a Gradio frontend.

1.0 Preparation

Use the following commands to install LMDeploy, download a model, and convert it to TurboMind's format.

# 0. Install lmdeploy
pip install lmdeploy

# Install git lfs following the instructions at
# https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs

# 1. Download the InternLM model
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b

# If you want to clone without the large files (just their pointers),
# prepend the env var to the git clone command:
# GIT_LFS_SKIP_SMUDGE=1 git clone <repo> <dir>

# 2. Convert the InternLM model to TurboMind's format,
#    which is written to "./workspace" by default
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
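
Before serving, it is worth a quick sanity check that the converted weights actually load. In the LMDeploy version I used, an interactive terminal chat could be started as below; the exact subcommand may differ in newer releases, so check lmdeploy --help.

# Chat with the converted model directly in the terminal
# (subcommand name may vary across LMDeploy versions)
lmdeploy chat turbomind ./workspace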

2.0 Deploy

Serve with Gradio:

lmdeploy serve gradio ./workspace
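
In my setup this serves the demo on port 6006, which is why the SSH tunnel in section 3.0 forwards that port. If you need a different bind address or port, flags roughly like the following are available; the exact spelling (underscores vs. hyphens) has varied across LMDeploy versions, so confirm with lmdeploy serve gradio --help.

# Bind the Gradio demo to all interfaces on port 6006
# (flag names are from the version I used; verify with --help)
lmdeploy serve gradio ./workspace --server-name 0.0.0.0 --server-port 6006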

Serve with a RESTful API:

https://github.com/InternLM/lmdeploy/blob/main/docs/en/restful_api.md
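
The gist, as a rough sketch (the linked doc is authoritative, and flag names have changed between releases): launch an API server on the converted workspace, then point clients at its port.

# Launch the RESTful API server on the converted model
# (port flag spelling varies by version; see the doc above)
lmdeploy serve api_server ./workspace --server-port 23333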

3.0 Different Models

The commands for serving different models can be found at https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving.md; an example sketch follows.
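
For instance, serving a Llama 2 chat model follows the same convert-then-serve pattern. The model name and local path below are placeholders; the supported model names are listed in the doc above.

# Convert a Llama 2 chat checkpoint (path is a placeholder) and serve it
lmdeploy convert llama2 /path/to/llama-2-7b-chat-hf
lmdeploy serve gradio ./workspace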

After deploying a model on the Ubuntu server, run ssh -N -f -L 6006:localhost:6006 <username>@<server ip> on your local machine to open an SSH tunnel to the Gradio port: -L forwards local port 6006 to port 6006 on the server, -N skips running a remote command, and -f backgrounds the connection. You can then access the model at http://localhost:6006 in a local browser.

This post is licensed under CC BY 4.0 by the author.
