5 Methods and Code Examples for Building LLM Services
In the rapidly evolving field of large language models (LLMs), the tools and technologies used to serve these models are advancing as quickly as the models themselves. In this article, we summarize 5 methods for building open-source LLM services, each with detailed operational steps and its respective advantages and disadvantages.
1. Anaconda + CPU
We begin with the entry-level method with the lowest barrier to entry, because it does not require a GPU. Basically, as long as you have a decent CPU and enough RAM, it can run.
Here we use llama.cpp and its Python bindings, llama-cpp-python. Install the package with the server extras from the CPU wheel index:
pip install llama-cpp-python[server] \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
Create a directory named models/7B to store the downloaded model, then download a quantized model in GGUF format:
mkdir -p models/7B
wget -O models/7B/llama-2-7b-chat.Q5_K_M.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
Then you can run the following command to start the server:
python3 -m llama_cpp.server --model…
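Once the server is up, it exposes an OpenAI-compatible HTTP API (by default at http://localhost:8000/v1). As a minimal sketch of how a client would talk to it, the helper below builds a chat-completion request using only the standard library; the prompt, temperature, and base URL are illustrative placeholders, not values prescribed by llama-cpp-python.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1"):
    """Build an OpenAI-style chat completion request for the local server.

    Returns the prepared Request object and the JSON payload dict.
    """
    payload = {
        # llama_cpp.server infers the model from --model, so no model
        # field is strictly required here; messages follow the OpenAI schema.
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # illustrative value
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload


# Actually sending the request requires the server started above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API mirrors OpenAI's, the official `openai` Python client also works against this server by pointing its `base_url` at the local endpoint.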