5 Methods and Code Examples for Building LLM Services
In the rapidly evolving field of large language models (LLMs), the tools and technologies used to serve these models are advancing as quickly as the models themselves. In this article, we summarize 5 methods for building open-source LLM services, each with detailed operational steps and its respective advantages and disadvantages.
1. Anaconda + CPU
We begin with the entry-level method with the lowest barrier to entry, because it does not require a GPU. Basically, as long as you have a decent CPU and enough RAM, it can run.
Here we use llama.cpp and its Python bindings, llama-cpp-python. Install the package with the server extras from the CPU wheel index:
pip install llama-cpp-python[server] \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
Create a directory named models/7B to store the downloaded model, then download a quantized model in GGUF format:
mkdir -p models/7B
wget -O models/7B/llama-2-7b-chat.Q5_K_M.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
Then you can run the following command to start the server:
python3 -m llama_cpp.server --model…
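Once the server is up, it exposes an OpenAI-compatible HTTP API (by default at http://localhost:8000/v1). As a minimal sketch of how a client would talk to it, the helper below builds a chat-completion request using only the standard library; the prompt, temperature, and base URL are illustrative placeholders, not values prescribed by llama-cpp-python.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1"):
    """Build an OpenAI-style chat completion request for the local server.

    Returns the prepared Request object and the JSON payload dict.
    """
    payload = {
        # llama_cpp.server infers the model from --model, so no model
        # field is strictly required here; messages follow the OpenAI schema.
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # illustrative value
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload


# Actually sending the request requires the server started above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API mirrors OpenAI's, the official `openai` Python client also works against this server by pointing its `base_url` at the local endpoint.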