Ollama Docker#

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

The goal is to let more people run large models locally in the simplest and fastest way possible.

Docker#

CPU only#

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Nvidia GPU#

Install the NVIDIA Container Toolkit.

# Configure Docker to use Nvidia driver
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

AMD GPU#

docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

Run model locally#

docker exec -it ollama ollama run llama3

Try different models#

More models can be found on the Ollama library.

Qwen#

Qwen is a series of large language models by Alibaba. The tags below cover Qwen 1.5 (qwen), Qwen 2.5, and Qwen 3.

docker exec -it ollama ollama run qwen:0.5b
ollama pull qwen:0.5b-chat
ollama pull qwen2.5:0.5b
ollama pull qwen3:0.6b

QwQ#

QwQ is the reasoning model of the Qwen series.

ollama run qwq

DeepSeek#

DeepSeek's first generation of reasoning models, with performance comparable to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

docker exec -it ollama ollama run deepseek-r1:1.5b
ollama pull deepseek-r1:32b

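Test prompts

Customer service, e.g.: How do I learn artificial intelligence?
Content creation, e.g.: Please write promotional copy introducing Shaxian snacks
Coding assistance, e.g.: Draw a bar chart with Python
Education, e.g.: Explain Newton's second law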

Export a model

ollama list
ollama show --modelfile deepseek-r1:1.5b
# FROM /root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
docker cp ollama:/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc ./deepseek-r1-1.5b.gguf
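
The blob path in the export steps above has to be copied by hand from the Modelfile output. A minimal sketch of automating that lookup follows; the helper names `blob_path_from_modelfile` and `docker_cp_command` are hypothetical, and the regular expression assumes the FROM line ends in a `blobs/sha256-...` path as in the example above.

```python
import re
from typing import List, Optional


def blob_path_from_modelfile(modelfile_text: str) -> Optional[str]:
    """Return the first FROM path that points into the blobs directory, if any."""
    for line in modelfile_text.splitlines():
        # The FROM line may appear commented out, so a leading "#" is tolerated.
        match = re.match(r"^\s*#?\s*FROM\s+(\S*/blobs/sha256-[0-9a-f]+)\s*$", line)
        if match:
            return match.group(1)
    return None


def docker_cp_command(container: str, blob_path: str, dest: str) -> List[str]:
    """Build the `docker cp` argument list that copies the blob out of the container."""
    return ["docker", "cp", f"{container}:{blob_path}", dest]
```

Feed it the text printed by `ollama show --modelfile <model>`, then pass the resulting list to `subprocess.run`.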

nomic-embed-text#

A high-performing open embedding model with a large token context window.

ollama pull nomic-embed-text

REST API#

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'

Python library#

ollama.embeddings(model='nomic-embed-text', prompt='The sky is blue because of rayleigh scattering')

JavaScript library#

ollama.embeddings({ model: 'nomic-embed-text', prompt: 'The sky is blue because of rayleigh scattering' })

BGE-M3#

BGE-M3 is a new model from BAAI distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

ollama pull bge-m3

LLaVA#

Large Language and Vision Assistant

ollama run llava

>>> What's in this image? /Users/jmorgan/Desktop/smile.png

Llama 3.2 Vision#

Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

ollama pull llama3.2-vision

Python Library#

import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']
    }]
)

print(response)

JavaScript Library#

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2-vision',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: ['image.jpg']
  }]
})

console.log(response)

cURL#

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
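
The cURL request above expects the image as base64 text rather than a file path. A minimal sketch of producing that payload in Python, using only the standard library; the helper names are hypothetical and the default model is simply the one used in this section.

```python
import base64
import json


def build_vision_chat_payload(image_bytes: bytes, prompt: str,
                              model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for POST /api/chat with one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]},
        ],
    }


def payload_from_file(path: str, prompt: str) -> str:
    """Read an image from disk and return the request body as a JSON string."""
    with open(path, "rb") as f:
        return json.dumps(build_vision_chat_payload(f.read(), prompt))
```

The string returned by `payload_from_file("image.jpg", "what is in this image?")` can be sent as the request body in place of the placeholder in the cURL example.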

Gemma3#

Gemma 3 is a model family from Google, described in the Ollama library as the most capable model that runs on a single GPU.

ollama run gemma3

SQLCoder#

SQLCoder is a code completion model fine-tuned on StarCoder for SQL generation tasks.

ollama run sqlcoder

Hugging Face#

ollama run hf.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF:latest
# or
ollama run huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF:latest
# mirrors reachable from mainland China
ollama run hf-mirror.com/Qwen/Qwen2.5-1.5B-Instruct-GGUF:q2_k
ollama pull hf-mirror.com/bartowski/Qwen2-VL-7B-Instruct-GGUF:f16
ollama pull modelscope.cn/IAILabs/Qwen2.5-VL-7B-Instruct-GGUF:f16

CLI Reference#

ollama -h
ollama -v

Multiline input#

For multiline input, you can wrap text with """:

>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

Multimodal models#

ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"

Pass the prompt as an argument#

ollama run deepseek-r1:32b "Summarize this file: $(cat README.md)"
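
The same one-shot call can be scripted. The sketch below is one possible approach, assuming the container is named `ollama` as in the Docker section; `build_run_command` and `summarize_file` are hypothetical helpers. The `-it` flags are left out because a scripted call has no terminal to attach.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_run_command(model: str, prompt: str,
                      container: Optional[str] = None) -> List[str]:
    """Build the argument list for `ollama run MODEL PROMPT`, optionally via docker exec."""
    cmd = ["ollama", "run", model, prompt]
    if container:
        cmd = ["docker", "exec", container] + cmd
    return cmd


def summarize_file(path: str, model: str = "deepseek-r1:32b",
                   container: Optional[str] = "ollama") -> str:
    """Send the file contents as a prompt and return the model's output."""
    prompt = "Summarize this file: " + Path(path).read_text(encoding="utf-8")
    result = subprocess.run(build_run_command(model, prompt, container),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the prompt as a list element avoids the shell quoting issues that `$(cat README.md)` can hit when the file contains quotes.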

Show model information#

ollama show deepseek-r1:32b

List which models are currently loaded#

ollama ps

Stop a model which is currently running#

ollama stop deepseek-r1:32b

Start Ollama#

ollama serve starts Ollama without running the desktop application.

REST API#

Generate a completion#

POST /api/generate
# Examples
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
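
By default this endpoint streams its answer as a sequence of JSON objects. A minimal Python sketch of the same call with `"stream": false`, so the reply arrives as one JSON object; it uses only the standard library, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port used throughout this page


def build_generate_request(prompt: str, model: str = "llama3.2",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request without sending it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the container running, `print(generate("Why is the sky blue?"))` prints the completion.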

Generate a chat completion#

POST /api/chat
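
No example is given for this endpoint, so here is a hedged sketch. It assumes the default streaming behavior, where each response line is a JSON object carrying a `message.content` fragment and a `done` flag; `build_chat_payload` and `collect_stream` are hypothetical helper names.

```python
import json
from typing import Iterable, Union


def build_chat_payload(user_text: str, model: str = "llama3.2") -> dict:
    """Build the JSON body for POST /api/chat with a single user message."""
    return {"model": model, "messages": [{"role": "user", "content": user_text}]}


def collect_stream(lines: Iterable[Union[str, bytes]]) -> str:
    """Join the content fragments of a streamed chat response into one string."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

`collect_stream` accepts any iterable of lines, for example the response object returned by `urllib.request.urlopen`.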

List Local Models#

GET /api/tags
# Examples
curl http://localhost:11434/api/tags
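
The response is a JSON object with a `models` array. A small sketch of listing the installed model names from it; the function names are hypothetical, and only the `name` field of each entry is relied on.

```python
import json
import urllib.request
from typing import List


def model_names(tags_response: dict) -> List[str]:
    """Return the sorted model names from a parsed /api/tags response."""
    return sorted(m["name"] for m in tags_response.get("models", []))


def list_local_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Fetch /api/tags from a running Ollama server and return the model names."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```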

List Running Models#

GET /api/ps
# Examples
curl http://localhost:11434/api/ps

Generate Embeddings#

POST /api/embed
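
Unlike the older `/api/embeddings` call shown earlier, this endpoint takes an `input` field (a string or a list of strings) and returns an `embeddings` list. The sketch below builds such a request and compares two vectors with cosine similarity, a common way to use embeddings; the helper names are hypothetical and the similarity function is plain Python, not part of Ollama.

```python
import json
import math
import urllib.request
from typing import List, Sequence


def build_embed_request(texts: List[str], model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST /api/embed request for a batch of texts without sending it."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

After sending the request, the two vectors in the response's `embeddings` list can be passed to `cosine_similarity`.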

Version#

GET /api/version
# Examples
curl http://localhost:11434/api/version

Customize a model#

Import from GGUF#

Create a file named Modelfile, with a FROM instruction with the local filepath to the model you want to import.
```
FROM ./vicuna-33b.Q4_0.gguf
```
Create the model in Ollama:
```
ollama create example -f Modelfile
```
Run the model:
```
ollama run example
```

Import from PyTorch or Safetensors#

See the guide on importing models for more information.

Customize a prompt#

Models from the Ollama library can be customized with a prompt. For example, to customize the llama3 model:

ollama pull llama3

Create a Modelfile:

FROM llama3

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Next, create and run the model:

ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
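
When several variants of a prompt are needed, the Modelfile can be generated rather than written by hand. A minimal sketch, assuming only the three instructions used above (FROM, PARAMETER temperature, SYSTEM); `render_modelfile` is a hypothetical helper.

```python
def render_modelfile(base: str, temperature: float, system: str) -> str:
    """Render a Modelfile with a base model, a temperature, and a system message."""
    if '"""' in system:
        # A triple quote inside the message would end the SYSTEM block early.
        raise ValueError('system message must not contain """')
    return (
        f"FROM {base}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system.strip()}\n"""\n'
    )
```

Write the returned text to a file named Modelfile, then run the `ollama create` command shown above.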

For more examples, see the examples directory. For more information on working with a Modelfile, see the Modelfile documentation.
