In this blog post, I would like to talk about my experience deploying large language models on a Linux server.
TL;DR: I use LMDeploy to deploy large language models such as Llama 2 with the TurboMind backend and a Gradio frontend.
1.0 Preparation
Use the following commands to install LMDeploy, download the InternLM model, and convert it to TurboMind's format.
pip install lmdeploy
# install git lfs following the link (https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
# 1. Download InternLM model
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
# if you want to clone without large files – just their pointers –
# prepend the git clone command with the following env var:
# GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b
# 2. Convert InternLM model to turbomind's format, which will be in "./workspace" by default
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
2.0 Deploy
Serve with Gradio:
lmdeploy serve gradio ./workspace
Serve with a RESTful API, following the official guide: https://github.com/InternLM/lmdeploy/blob/main/docs/en/restful_api.md
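LMDeploy's RESTful server exposes an OpenAI-compatible chat endpoint. As a minimal sketch, here is how a client could build a request body for it; the endpoint path and port 23333 are assumptions based on LMDeploy's documented defaults, not something shown in this post.

```python
import json


def build_chat_payload(model, prompt, temperature=0.7):
    """Build an OpenAI-style chat request body.

    The field names follow the OpenAI chat-completions schema,
    which LMDeploy's RESTful API is compatible with.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


payload = build_chat_payload("internlm-chat-7b", "Hello!")
print(json.dumps(payload))

# To actually send it (server assumed to run on the default port 23333):
# import requests
# r = requests.post("http://localhost:23333/v1/chat/completions", json=payload)
# print(r.json())
```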
3.0 Different Models
The commands to serve different models can be found here: https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving.md.
After deploying the models on Ubuntu Linux, run ssh -N -f -L 6006:localhost:6006 <username>@<server ip>
on your local machine to open an SSH tunnel and access the models.
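For reference, the tunnel command above can be assembled from its parts; this small sketch documents what each flag does (the user name and host below are placeholders, not values from this post).

```python
def ssh_tunnel_cmd(user, host, local_port=6006, remote_port=6006):
    """Assemble the SSH port-forwarding command used above.

    -N: do not run a remote command (forwarding only)
    -f: go to the background after authentication
    -L: forward local_port to remote_port on the server's localhost
    """
    return (
        f"ssh -N -f -L {local_port}:localhost:{remote_port} "
        f"{user}@{host}"
    )


# Hypothetical user and host for illustration:
print(ssh_tunnel_cmd("alice", "203.0.113.5"))
# ssh -N -f -L 6006:localhost:6006 alice@203.0.113.5
```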
Appendix
Open-Source Models:
- Mistral-7B-Instruct-v0.2: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2