GPT4All#
A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.
GPT4All API Server#
Activating the API Server#
1. Open the GPT4All Chat Desktop Application.
2. Go to `Settings` > `Application` and scroll down to `Advanced`.
3. Check the box for the "Enable Local API Server" setting.

The server listens on port 4891 by default. You can choose another port number in the "API Server Port" setting.
Connecting to the API Server#
The base URL used for the API server is http://localhost:4891/v1 (or http://localhost:<PORT_NUM>/v1 if you are using a different port number).
Examples#
```shell
curl -X POST http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Phi-3 Mini Instruct",
    "messages": [{"role": "user", "content": "Who is Lionel Messi?"}],
    "max_tokens": 50,
    "temperature": 0.28
  }'
```
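The same request can be issued from Python. Here is a minimal sketch using only the standard library; it assumes the server is running on the default port and that responses follow the OpenAI-style chat-completions shape (a `choices` list containing a `message`):

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:4891/v1"  # adjust the port if you changed it in Settings

def chat(prompt, model="Phi-3 Mini Instruct", max_tokens=50, temperature=0.28):
    """POST a chat-completion request to the local GPT4All API server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    req = Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        # Assumes an OpenAI-compatible response envelope.
        return json.load(resp)["choices"][0]["message"]["content"]
```

Call `chat("Who is Lionel Messi?")` with the desktop app's API server enabled to get the generated reply as a string.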
API Endpoints#
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List available models |
| GET | /v1/models/<name> | Get details of a specific model |
| POST | /v1/completions | Generate text completions |
| POST | /v1/chat/completions | Generate chat completions |
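The two GET endpoints are useful for discovering what the server can serve before generating. A small sketch, assuming the server returns the OpenAI-style `{"data": [...]}` envelope for model listings:

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:4891/v1"  # adjust the port if you changed it

def list_models():
    """GET /v1/models -- return the list of available model entries."""
    with urlopen(f"{BASE_URL}/models") as resp:
        return json.load(resp)["data"]

def get_model(name):
    """GET /v1/models/<name> -- return details for a single model."""
    with urlopen(f"{BASE_URL}/models/{name}") as resp:
        return json.load(resp)
```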
GPT4All Python SDK#
Installation#
```shell
pip install gpt4all
```
Load LLM#
```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
```
Chat Session Generation#
```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    print(model.generate("quadratic formula"))
```
Direct Generation#
```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
print(model.generate("quadratic formula"))
```
Embeddings#
```python
from nomic import embed

embeddings = embed.text(["String 1", "String 2"], inference_mode="local")["embeddings"]
print("Number of embeddings created:", len(embeddings))
print("Number of dimensions per embedding:", len(embeddings[0]))
```
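A common next step is comparing embeddings by cosine similarity. The sketch below uses short toy vectors in place of real embeddings (nomic embeddings have many more dimensions); the similarity function works unchanged on the vectors returned by `embed.text`:

```python
from math import sqrt

# Toy vectors standing in for two embeddings; real ones are much longer.
a = [0.1, 0.3, 0.5]
b = [0.2, 0.1, 0.4]

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

score = cosine_similarity(a, b)
print("Similarity:", score)
```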
Tips#
GPT4All Python SDK#
The Python binding logs console errors when CUDA is not found, even when CPU inference is requested. The model still loads and runs on the CPU, so these messages can be ignored:

```
Failed to load llamamodel-mainline-cuda-avxonly.dll: LoadLibraryExW failed with error 0x7e
Failed to load llamamodel-mainline-cuda.dll: LoadLibraryExW failed with error 0x7e
```