Introduction to Ollama
Ollama is an inference tool for running large language models on a personal PC. It is simple to use and well suited to beginners.
Installing Ollama
Open a terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
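If the script completes without errors, a quick sanity check is to print the installed version:
# Confirm the ollama CLI is on PATH
ollama --version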
How to Use
For local use, first start the server with ollama serve, then run a model, for example ollama run llama3.3. Ollama pulls llama3.3 first if it is not already present, then starts it and drops you into the prompt below:
~$ ollama run llama3.3
>>> Send a message (/? for help)
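To download a model without opening a chat session, you can also pull it explicitly (any model name from the Ollama library works here):
# Fetch the model weights only; run the model later
ollama pull llama3.3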
Different models occupy different amounts of VRAM. Use ollama ps to check which models are currently running and where they are placed:
~$ ollama ps
NAME               ID              SIZE     PROCESSOR          UNTIL
llama3.3:latest    a6eb4748fd29    45 GB    46%/54% CPU/GPU    4 minutes from now
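A model listed by ollama ps stays loaded until the UNTIL timeout expires; recent Ollama releases can also unload it immediately to free the VRAM, a small sketch:
# Unload the model from memory right away
ollama stop llama3.3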
You can also list the locally available models with ollama list:
~$ ollama list
NAME                       ID              SIZE      MODIFIED
qwen2.5:32b                9f13ba1299af    19 GB     7 days ago
llava:latest               8dd30f6b0cb1    4.7 GB    10 days ago
codegemma:latest           0c96700aaada    5.0 GB    10 days ago
internlm2:20b              a864ac8dade2    11 GB     10 days ago
internlm2:latest           5050e36678ab    4.5 GB    10 days ago
glm4:latest                5b699761eca5    5.5 GB    10 days ago
deepseek-coder-v2:latest   63fb193b3a9b    8.9 GB    12 days ago
wizardcoder:latest         de9d848c1323    3.8 GB    12 days ago
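If disk space is tight, any model in this list can be removed by name, for example:
# Delete a model's local weights
ollama rm wizardcoder:latest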
Running as a Service
It is often convenient to run Ollama as a systemd service, which saves you from running ollama serve by hand every time. This can be set up as follows:
sudo systemctl edit ollama.service
In the editor that opens, add the following so the service listens on all network interfaces:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Then reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama.service
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/ollama.service.d
             └─override.conf
     Active: active (running) since Fri 2024-12-20 07:40:24 CST; 6min ago
   Main PID: 1241 (ollama)
      Tasks: 15 (limit: 37306)
     Memory: 139.7M
     CGroup: /system.slice/ollama.service
             └─1241 /usr/local/bin/ollama serve
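With OLLAMA_HOST=0.0.0.0 the service is reachable from other machines on Ollama's default port 11434. A minimal smoke test against the server, where 192.168.1.10 is a placeholder address:
# List the models the server has pulled
curl http://192.168.1.10:11434/api/tags
# Run a one-shot, non-streaming generation
curl http://192.168.1.10:11434/api/generate -d '{"model": "llama3.3", "prompt": "Hello", "stream": false}'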
Troubleshooting
- After an automatic Ubuntu system upgrade, Ollama suddenly ran very slowly: ollama ps showed that all inference was happening on the CPU, and nvidia-smi could no longer find the GPU. The fix is to rebuild the NVIDIA kernel module with DKMS, then verify as shown below:
# First, make sure DKMS is installed
sudo apt install dkms
# Second, find your NVIDIA driver's version
ls /usr/src | grep nvidia
# Lastly, install the module for that version (replace xxx with the version found above)
sudo dkms install -m nvidia -v xxx
# Reference: https://stackoverflow.com/questions/67071101/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-mak
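After the module is rebuilt (a reboot or a driver reload may still be needed), both checks should look healthy again:
# The GPU should be visible again
nvidia-smi
# and inference should run on the GPU instead of the CPU
ollama ps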
- Many plugins (such as Continue in VS Code) talk to a local Ollama instance. If Ollama actually runs on a remote server, an SSH port forward makes the remote service reachable as if it were local; you can then verify the tunnel as shown below:
ssh -fNT -q -o IdentitiesOnly=yes -L 11434:localhost:11434 remoteserver
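Once the tunnel is up, the remote instance answers on local port 11434 just like a locally running one, for example:
# Should return the remote server's model list through the tunnel
curl http://localhost:11434/api/tags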