LocalAI is a free, open-source alternative to OpenAI that allows you to run AI models locally without requiring a GPU. It provides a drop-in replacement REST API compatible with OpenAI’s specifications, enabling you to use existing applications and libraries that integrate with OpenAI while maintaining complete data privacy and eliminating API costs.
Key Features
- OpenAI API Compatible – Drop-in replacement for OpenAI API endpoints
- No GPU Required – Runs efficiently on CPU-only systems
- Multiple Model Support – LLMs, image generation, audio transcription, embeddings
- Docker Ready – Easy deployment with official Docker images
- Function Calling – Supports OpenAI-style function calling (see the example after this list)
- Grammar Constraints – JSON mode and structured output support
- Text-to-Speech – Built-in TTS capabilities
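As a concrete illustration of function calling, here is a minimal sketch of a request using the OpenAI-style tools parameter. The model name and the get_weather schema are placeholders: the name must match a model configured locally, and actual tool-use quality depends on the model and backend you load.
# Hypothetical function-calling request; "gpt-3.5-turbo" and the
# get_weather schema are placeholders for your local configuration
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'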
Supported Model Types
- Text Generation – LLaMA, GPT-J, Falcon, MPT, and more
- Image Generation – Stable Diffusion models
- Audio – Whisper for transcription, various TTS models
- Embeddings – BERT, sentence-transformers
- Vision – LLaVA, BakLLaVA for image understanding
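To show what a vision request looks like, here is a hedged sketch using the GPT-4-Vision-style message format; "llava" is an assumed name for a locally configured vision model, and the image URL is a placeholder.
# Hypothetical vision request; "llava" must match a locally configured
# vision model, and the image URL is a placeholder
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'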
Installation
The easiest way to run LocalAI is with Docker:
# Basic installation with Docker
docker run -p 8080:8080 --name local-ai \
  -v $PWD/models:/models \
  localai/localai:latest

# With GPU support (NVIDIA)
docker run -p 8080:8080 --gpus all --name local-ai \
  -v $PWD/models:/models \
  localai/localai:latest-gpu-nvidia-cuda-12
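Once the container is running, you can sanity-check the server by listing the models it knows about; an empty list simply means nothing has been placed in the models directory yet. Recent versions also expose a readiness probe.
# Verify the server is up and list available models
curl http://localhost:8080/v1/models
# Readiness probe (available in recent versions)
curl http://localhost:8080/readyz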
For native installation:
# Download the latest release
curl -Lo local-ai https://github.com/mudler/LocalAI/releases/latest/download/local-ai-Linux-x86_64
chmod +x local-ai
# Run LocalAI
./local-ai --models-path ./models
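LocalAI loads model files it finds in the models directory. As a purely hypothetical example (the Hugging Face user, repository, and file name below are placeholders, not a real repo), you could drop a GGUF model into place and restart the server:
# Hypothetical example: download a GGUF model into the models directory
# (SOME_USER/SOME_MODEL/model.gguf are placeholders)
curl -Lo ./models/model.gguf \
  https://huggingface.co/SOME_USER/SOME_MODEL/resolve/main/model.gguf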
Basic Usage
LocalAI exposes OpenAI-compatible endpoints. Note that the model names below (gpt-3.5-turbo, text-embedding-ada-002) are aliases for whatever models you have configured locally, not calls to OpenAI's hosted models:
# Text completion
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "prompt": "What is Linux?",
    "temperature": 0.7
  }'
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
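Chat completions can also be streamed token by token: setting "stream": true returns incremental chunks as server-sent events, just as with OpenAI's API.
# Streamed chat completion (server-sent events)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'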
# Generate embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-ada-002",
    "input": "Linux is an open-source operating system"
  }'
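The audio features mentioned earlier follow the same pattern. As a hedged sketch, transcription accepts a multipart upload on the OpenAI-compatible endpoint, and text-to-speech is available via the /tts endpoint; the model names and the audio.wav file below are assumptions that must match your local setup.
# Transcribe an audio file with a locally configured Whisper model
# (audio.wav and "whisper-1" are placeholders)
curl http://localhost:8080/v1/audio/transcriptions \
  -F file="@audio.wav" \
  -F model="whisper-1"

# Text-to-speech; the model name is a placeholder for a local TTS voice
curl http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"model": "en-us-amy-low.onnx", "input": "Hello from LocalAI"}' \
  --output speech.wav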
Integration with Python
from openai import OpenAI

# Point the client at the local server; LocalAI does not require a real
# key by default, but the OpenAI client insists on a non-empty value
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain containers"}]
)
print(response.choices[0].message.content)
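Because only the base_url changes, the same client object can call the embeddings, audio, and image endpoints shown above, and existing code written against OpenAI's SDK typically needs no other modification.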
Use Cases
- Private ChatGPT – Run a private AI assistant without cloud dependencies
- Development Testing – Test OpenAI integrations without API costs
- Enterprise Deployment – Self-hosted AI for sensitive data environments
- Offline AI – AI capabilities without internet connectivity
- Multi-Modal Applications – Combine text, image, and audio AI in one service