GPT4All#

A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.

GPT4All API Server#

Activating the API Server#

  1. Open the GPT4All Chat Desktop Application.

  2. Go to Settings > Application and scroll down to Advanced.

  3. Check the box for the "Enable Local API Server" setting.

  4. The server listens on port 4891 by default. You can choose another port number in the "API Server Port" setting.
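
Once the server is enabled, you can confirm it is reachable by listing the available models (this assumes the default port; substitute your own if you changed it):

curl http://localhost:4891/v1/models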

Connecting to the API Server#

The base URL used for the API server is http://localhost:4891/v1 (or http://localhost:<PORT_NUM>/v1 if you are using a different port number).

Examples#

curl -X POST http://localhost:4891/v1/chat/completions -d '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role": "user", "content": "Who is Lionel Messi?"}],
  "max_tokens": 50,
  "temperature": 0.28
}'
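
Because the server exposes OpenAI-compatible routes, an OpenAI-style client can also be pointed at the local base URL. A minimal sketch using the openai Python package; the api_key value is a placeholder, on the assumption that the local server does not validate it:

from openai import OpenAI

# Point the client at the local GPT4All server instead of api.openai.com.
# The api_key is assumed to be ignored by the local server.
client = OpenAI(base_url="http://localhost:4891/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="Phi-3 Mini Instruct",
    messages=[{"role": "user", "content": "Who is Lionel Messi?"}],
    max_tokens=50,
    temperature=0.28,
)
print(response.choices[0].message.content)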

API Endpoints#

Method  Path                  Description
GET     /v1/models            List available models
GET     /v1/models/<name>     Get details of a specific model
POST    /v1/completions       Generate text completions
POST    /v1/chat/completions  Generate chat completions
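
The /v1/completions endpoint takes a plain prompt instead of a message list. A hedged sketch against the same local server, assuming the request body follows the OpenAI completions format:

curl -X POST http://localhost:4891/v1/completions -d '{
  "model": "Phi-3 Mini Instruct",
  "prompt": "Who is Lionel Messi?",
  "max_tokens": 50,
  "temperature": 0.28
}'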

GPT4All Python SDK#

Installation#

pip install gpt4all

Load LLM#

from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))

Chat Session Generation#

from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    print(model.generate("quadratic formula"))

Direct Generation#

from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
print(model.generate("quadratic formula"))

Embeddings#
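
Local embeddings go through the nomic Python package, which is installed separately (with pip, as with the gpt4all package above):

pip install nomic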

from nomic import embed
# inference_mode="local" runs the embedding model on-device via GPT4All,
# so no API key or network call is required.
embeddings = embed.text(["String 1", "String 2"], inference_mode="local")['embeddings']
print("Number of embeddings created:", len(embeddings))
print("Number of dimensions per embedding:", len(embeddings[0]))

Tips#

GPT4All Python SDK#

The Python binding logs console errors like the following when CUDA is not found, even when CPU inference is requested:

Failed to load llamamodel-mainline-cuda-avxonly.dll: LoadLibraryExW failed with error 0x7e
Failed to load llamamodel-mainline-cuda.dll: LoadLibraryExW failed with error 0x7e

These messages come from the backend loader probing for CUDA builds and do not prevent CPU inference from working.
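
If you want to request CPU inference explicitly, the GPT4All constructor accepts a device argument; a minimal sketch, assuming the same Llama 3 model file used above:

from gpt4all import GPT4All
# device="cpu" requests CPU inference explicitly; the CUDA loader messages
# above may still be printed, but they can be safely ignored.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf", device="cpu")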

Runtime Environment#

  • C++

Screenshots#

https://gpt4all.io/landing.gif
