Ollama Docker#
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
让更多人以最简单快速的方式在本地把大模型跑起来
Docker#
CPU only#
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Nvidia GPU#
Install the NVIDIA Container Toolkit.
# Configure Docker to use Nvidia driver
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
AMD GPU#
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
Run model locally#
docker exec -it ollama ollama run llama3
Try different models#
More models can be found on the Ollama library.
Qwen#
Qwen 1.5 is a series of large language models by Alibaba
docker exec -it ollama ollama run qwen:0.5b
ollama pull qwen:0.5b-chat
ollama pull qwen2.5:0.5b
ollama pull qwen3:0.6b
QwQ#
QwQ is the reasoning model of the Qwen series.
ollama run qwq
DeepSeek#
DeepSeek’s first-generation of reasoning models with comparable performance to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
docker exec -it ollama ollama run deepseek-r1:1.5b
ollama pull deepseek-r1:32b
测试功能
智能客服,例如:如何学习人工智能?
内容创作,例如:请为我撰写一篇介绍沙县小吃的宣传文案
编程辅助,例如:用Python绘制一个柱状图
教育辅助,例如:解释牛顿第二定律
导出模型
ollama list
ollama show --modelfile deepseek-r1:1.5b
# FROM /root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
docker cp ollama:/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc ./deepseek-r1-1.5b.gguf
nomic-embed-text#
A high-performing open embedding model with a large token context window.
ollama pull nomic-embed-text
REST API#
curl http://localhost:11434/api/embeddings -d '{
"model": "nomic-embed-text",
"prompt": "The sky is blue because of Rayleigh scattering"
}'
Python library#
ollama.embeddings(model='nomic-embed-text', prompt='The sky is blue because of rayleigh scattering')
Javascript library#
ollama.embeddings({ model: 'nomic-embed-text', prompt: 'The sky is blue because of rayleigh scattering' })
BGE-M3#
BGE-M3 is a new model from BAAI distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.
ollama pull bge-m3
LLaVA#
Large Language and Vision Assistant
ollama run llava
What's in this image? /Users/jmorgan/Desktop/smile.png
Llama 3.2 Vision#
Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.
ollama pull llama3.2-vision
Python Library#
import ollama
response = ollama.chat(
model='llama3.2-vision',
messages=[{
'role': 'user',
'content': 'What is in this image?',
'images': ['image.jpg']
}]
)
print(response)
JavaScript Library#
import ollama from 'ollama'
const response = await ollama.chat({
model: 'llama3.2-vision',
messages: [{
role: 'user',
content: 'What is in this image?',
images: ['image.jpg']
}]
})
console.log(response)
cURL#
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2-vision",
"messages": [
{
"role": "user",
"content": "what is in this image?",
"images": ["<base64-encoded image data>"]
}
]
}'
Gemma3#
The current strongest model that fits on a single GPU.
ollama run gemma3
SQLCoder#
SQLCoder is a code completion model fined-tuned on StarCoder for SQL generation tasks
ollama run sqlcoder
Hugging Face#
ollama run hf.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF:latest
#或者
ollama run huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF:latest
#国内
ollama run hf-mirror.com/Qwen/Qwen2.5-1.5B-Instruct-GGUF:q2_k
ollama pull hf-mirror.com/bartowski/Qwen2-VL-7B-Instruct-GGUF:f16
ollama pull modelscope.cn/IAILabs/Qwen2.5-VL-7B-Instruct-GGUF:f16
CLI Reference#
ollama -h
ollama -v
Multiline input#
For multiline input, you can wrap text with """:
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal models#
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
Pass the prompt as an argument#
ollama run deepseek-r1:32b "Summarize this file: $(cat README.md)"
Show model information#
ollama show deepseek-r1:32b
List which models are currently loaded#
ollama ps
Stop a model which is currently running#
ollama stop deepseek-r1:32b
Start Ollama#
ollama serve is used when you want to start ollama without running the desktop application.
REST API#
Generate a completion#
POST /api/generate
# Examples
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
Generate a chat completion#
POST /api/chat
List Local Models#
GET /api/tags
# Examples
curl http://localhost:11434/api/tags
List Running Models#
GET /api/ps
# Examples
curl http://localhost:11434/api/ps
Generate Embeddings#
POST /api/embed
Version#
GET /api/version
# Examples
curl http://localhost:11434/api/version
Customize a model#
Import from GGUF#
Create a file named
Modelfile, with aFROMinstruction with the local filepath to the model you want to import.FROM ./vicuna-33b.Q4_0.gguf
Create the model in Ollama
ollama create example -f Modelfile
Run the model
ollama run example
Import from PyTorch or Safetensors#
See the guide on importing models for more information.
Customize a prompt#
Models from the Ollama library can be customized with a prompt. For example, to customize the llama3 model:
ollama pull llama3
Create a Modelfile:
FROM llama3
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Next, create and run the model:
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
For more examples, see the examples directory. For more information on working with a Modelfile, see the Modelfile documentation.