GPT4All#

A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.

Download Desktop Chat Client#

Models#

GPT4All Python SDK#

Installation#

pip install gpt4all

Load LLM#

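from gpt4all import GPT4All

# Downloads the model on first use (about 4.66 GB), then loads it from the local copy.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
# Prompts sent inside chat_session() are formatted with the model's chat template.
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))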

Chat Session Generation#

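from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
# Inside chat_session(), the chat template is applied and the conversation history is kept between calls.
with model.chat_session():
    print(model.generate("quadratic formula"))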

Direct Generation#

from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
# Outside a chat session, generate() sends the raw prompt with no chat template applied.
print(model.generate("quadratic formula"))

Embeddings#

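# Requires the separate nomic package in addition to gpt4all.
from nomic import embed
# inference_mode="local" computes the embeddings on this machine instead of calling the Nomic API.
embeddings = embed.text(["String 1", "String 2"], inference_mode="local")['embeddings']
print("Number of embeddings created:", len(embeddings))
print("Number of dimensions per embedding:", len(embeddings[0]))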
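Embedding vectors like the ones returned above are usually compared by cosine similarity. The following is a minimal sketch in plain Python, not part of the GPT4All or Nomic APIs: `cosine_similarity` is a hypothetical helper, and `toy_embeddings` is a made-up stand-in for `embeddings[0]` and `embeddings[1]` so the example runs without downloading a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins (assumption): real embedding vectors have many more dimensions.
toy_embeddings = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
score = cosine_similarity(toy_embeddings[0], toy_embeddings[1])
print(f"cosine similarity: {score:.3f}")
```

With real output, the same call would be `cosine_similarity(embeddings[0], embeddings[1])`; values closer to 1 indicate more similar texts.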

Tips#

GPT4All Python SDK#

On Windows, the Python binding logs console errors such as the following when CUDA is not found, even when CPU is requested:

Failed to load llamamodel-mainline-cuda-avxonly.dll: LoadLibraryExW failed with error 0x7e
Failed to load llamamodel-mainline-cuda.dll: LoadLibraryExW failed with error 0x7e
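The `0x7e` in the log lines above is a Windows system error code: 126, `ERROR_MOD_NOT_FOUND`, which typically means the DLL or one of its dependencies could not be located. As a rough sketch for reading such lines, the helper below parses them and looks up the code. `explain_load_failure`, `LOG_PATTERN`, and `KNOWN_CODES` are hypothetical names for illustration and are not part of the GPT4All SDK; the small code table covers only a few common values.

```python
import re

# A few Windows system error codes seen in library-load failures (not exhaustive).
KNOWN_CODES = {
    0x7E: "ERROR_MOD_NOT_FOUND: the specified module could not be found",
    0x7F: "ERROR_PROC_NOT_FOUND: the specified procedure could not be found",
    0xC1: "ERROR_BAD_EXE_FORMAT: not a valid Win32 application",
}

# Matches lines of the form shown in the log above.
LOG_PATTERN = re.compile(
    r"Failed to load (?P<library>\S+): "
    r"LoadLibraryExW failed with error (?P<code>0x[0-9a-fA-F]+)"
)


def explain_load_failure(line):
    """Return (library, code, meaning) for a matching log line, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group("code"), 16)
    meaning = KNOWN_CODES.get(code, "unknown error code")
    return match.group("library"), code, meaning


log_lines = [
    "Failed to load llamamodel-mainline-cuda-avxonly.dll: LoadLibraryExW failed with error 0x7e",
    "Failed to load llamamodel-mainline-cuda.dll: LoadLibraryExW failed with error 0x7e",
]
for line in log_lines:
    result = explain_load_failure(line)
    if result is not None:
        library, code, meaning = result
        print(f"{library}: error {code} ({meaning})")
```

Both lines decode to error 126 for the CUDA variants of the backend library, which is consistent with CUDA not being present on the machine.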

Runtime Environment#

  • C++

Screenshots#

https://gpt4all.io/landing.gif
