
LLM Deploy

In this post, I would like to share my experience deploying Large Language Models on a Linux server.

TL;DR: I use LMDeploy to deploy large language models such as Llama 2 with the TurboMind backend and a Gradio frontend.

1.0 Preparation

Use the following commands to install LMDeploy, download a model, and convert it to TurboMind's format.

# 0. Install lmdeploy
pip install lmdeploy

# Install git lfs following the instructions at
# https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs

# 1. Download the InternLM model
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-chat-7b-v1_1 /path/to/internlm-chat-7b

# If you want to clone without the large files (just their pointers),
# prepend the env var to the git clone command:
# GIT_LFS_SKIP_SMUDGE=1 git clone <repo> <dir>

# 2. Convert the InternLM model to TurboMind's format,
#    which is written to "./workspace" by default
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
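
Before serving, it is worth a quick sanity check that the converted weights actually load. In the LMDeploy version I used, an interactive terminal chat could be started as below; the exact subcommand may differ in newer releases, so check lmdeploy --help.

# Chat with the converted model directly in the terminal
# (subcommand name may vary across LMDeploy versions)
lmdeploy chat turbomind ./workspace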

2.0 Deploy

Serve with Gradio:

lmdeploy serve gradio ./workspace
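
In my setup this serves the demo on port 6006, which is why the SSH tunnel in section 3.0 forwards that port. If you need a different bind address or port, flags roughly like the following are available; the exact spelling (underscores vs. hyphens) has varied across LMDeploy versions, so confirm with lmdeploy serve gradio --help.

# Bind the Gradio demo to all interfaces on port 6006
# (flag names are from the version I used; verify with --help)
lmdeploy serve gradio ./workspace --server-name 0.0.0.0 --server-port 6006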

Serve with a RESTful API:

https://github.com/InternLM/lmdeploy/blob/main/docs/en/restful_api.md
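
The gist, as a rough sketch (the linked doc is authoritative, and flag names have changed between releases): launch an API server on the converted workspace, then point clients at its port.

# Launch the RESTful API server on the converted model
# (port flag spelling varies by version; see the doc above)
lmdeploy serve api_server ./workspace --server-port 23333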

3.0 Different Models

The commands for serving different models can be found at https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving.md; an example sketch follows.
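
For instance, serving a Llama 2 chat model follows the same convert-then-serve pattern. The model name and local path below are placeholders; the supported model names are listed in the doc above.

# Convert a Llama 2 chat checkpoint (path is a placeholder) and serve it
lmdeploy convert llama2 /path/to/llama-2-7b-chat-hf
lmdeploy serve gradio ./workspace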

After deploying a model on the Ubuntu server, run ssh -N -f -L 6006:localhost:6006 <username>@<server ip> on your local machine to open an SSH tunnel to the Gradio port: -L forwards local port 6006 to port 6006 on the server, -N skips running a remote command, and -f backgrounds the connection. You can then access the model at http://localhost:6006 in a local browser.

This post is licensed under CC BY 4.0 by the author.
