Running large language models locally provides privacy, eliminates API costs, and enables offline AI capabilities. Ollama makes deploying open-source AI models remarkably simple on Linux. This guide covers everything from installation to running models like Llama, Mistral, and Code Llama on your own hardware.
Why Run AI Models Locally?
- Privacy – Your data never leaves your machine
- No API Costs – Unlimited usage after initial setup
- Offline Access – Works without internet connection
- Customization – Fine-tune models for your needs
- Learning – Understand how LLMs work hands-on
Hardware Requirements
Local LLMs benefit greatly from GPU acceleration, though smaller models can run on CPU alone. Rough minimum requirements by model size (you can check your own hardware with the commands after the list):
- 7B Models – 8GB RAM, 6GB VRAM (or CPU with 16GB RAM)
- 13B Models – 16GB RAM, 10GB VRAM
- 70B Models – 64GB RAM, 48GB VRAM
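Before downloading anything, it helps to check what your machine actually has. A quick sketch using standard Linux tools (nvidia-smi assumes an NVIDIA GPU with the proprietary driver installed; AMD users would use rocm-smi instead):
# Check available system RAM
free -h
# Check GPU model and VRAM (NVIDIA GPUs only)
nvidia-smi --query-gpu=name,memory.total --format=csv
# Check CPU core count, relevant for CPU-only inference
nproc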
Installing Ollama
# One-line installation
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama
# Verify installation
ollama --version
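On systemd-based systems the install script also sets up the Ollama service, so you can confirm the server itself is up; by default it listens on port 11434 and answers a plain request with a short status message:
# Check that the service is active
systemctl status ollama
# Confirm the API is reachable (should print "Ollama is running")
curl http://localhost:11434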
Downloading and Running Models
# Pull a model
ollama pull llama3.2
ollama pull mistral
ollama pull codellama
# Run interactive chat
ollama run llama3.2
# Run with specific prompt
ollama run llama3.2 "Explain quantum computing in simple terms"
# List installed models
ollama list
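A few more commands are handy once several models are on disk. The size tag below (llama3.2:1b) follows the Ollama library's naming convention; check a model's library page for the tags that actually exist.
# Show details about an installed model (parameters, template, license)
ollama show llama3.2
# Pull a specific size variant by tag, e.g. the 1B version of Llama 3.2
ollama pull llama3.2:1b
# See which models are currently loaded in memory
ollama ps
# Remove a model you no longer need
ollama rm codellama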
Popular Models to Try
- Llama 3.2 – Meta’s latest open model, with excellent general capabilities
- Mistral – Fast and efficient 7B model, a strong general-purpose choice
- Code Llama – Specialized for programming tasks
- Phi-3 – Microsoft’s small but capable model
- Gemma 2 – Google’s lightweight open model
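The commands below pull the models above by the tags they are published under in the Ollama library at the time of writing; if a pull fails, search the library for the current tag.
# Pull the remaining models above by their library tags
ollama pull phi3
ollama pull gemma2
# Many models publish size-specific tags, e.g. the 2B Gemma 2 variant
ollama pull gemma2:2b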
Using the API
# Generate completion
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Write a Python function to calculate fibonacci"
}'
# Chat format
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
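Both endpoints stream the response as a series of JSON objects by default. Setting "stream": false returns a single object instead, which is easier to handle in shell scripts; the example below assumes jq is installed for extracting the text.
# Single JSON object instead of a stream; extract the text with jq
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Name three Linux distributions",
  "stream": false
}' | jq -r '.response'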
Integration with Open WebUI
# Run Open WebUI with Docker
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
# Access at http://localhost:3000
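If the interface does not come up, the container logs are the first place to look. Updating is a matter of pulling a newer image and recreating the container; the named volume preserves your chats and settings.
# Follow the container logs while troubleshooting
docker logs -f open-webui
# Update: pull the latest image, remove the old container, then re-run the command above
docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui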
Creating Custom Models
# Modelfile
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM You are a helpful Linux system administrator assistant.
# Create custom model
ollama create linux-helper -f Modelfile
ollama run linux-helper
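To go further, you can print the Modelfile of any installed model and use it as a starting point, or set additional parameters such as the context window size. The values below are illustrative rather than tuned recommendations.
# Print the Modelfile of an installed model to use as a template
ollama show llama3.2 --modelfile
# Extended Modelfile with a larger context window (illustrative values)
FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM You are a helpful Linux system administrator assistant.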
Conclusion
Ollama democratizes access to powerful AI models, enabling anyone with suitable hardware to run LLMs locally. Start with smaller models like Phi-3 or Mistral, then experiment with larger models as you explore the capabilities of local AI.