Running AI Models Locally with Ollama: Complete Setup Guide

Running large language models locally provides privacy, eliminates API costs, and enables offline AI capabilities. Ollama makes deploying open-source AI models remarkably simple on Linux. This guide covers everything from installation to running models like Llama, Mistral, and Code Llama on your own hardware.

Why Run AI Models Locally?

  • Privacy – Your data never leaves your machine
  • No API Costs – Unlimited usage after initial setup
  • Offline Access – Works without internet connection
  • Customization – Fine-tune models for your needs
  • Learning – Understand how LLMs work hands-on

Hardware Requirements

Local LLMs benefit greatly from GPU acceleration. Minimum recommendations by model size (a quick way to check your own hardware follows the list):

  • 7B Models – 8GB RAM, 6GB VRAM (or CPU with 16GB RAM)
  • 13B Models – 16GB RAM, 10GB VRAM
  • 70B Models – 64GB RAM, 48GB VRAM
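
Before pulling a large model, it is worth confirming what your machine actually has. The commands below are a quick sketch for checking RAM, NVIDIA VRAM, and free disk space; nvidia-smi is only available with the NVIDIA driver installed (AMD users would reach for rocm-smi instead), and disk needs range from a few GB for 7B models to well over 40GB for 70B models.

# Check available system RAM
free -h

# Check GPU model and VRAM (requires the NVIDIA driver)
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

# Check free disk space for model downloads
df -h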

Installing Ollama

# One-line installation
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama

# Verify installation
ollama --version
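
If the version prints, confirm the server itself is reachable. Ollama listens on localhost port 11434 by default; the checks below assume that default and use standard systemd tooling for troubleshooting.

# Confirm the API server is listening (default port 11434)
curl http://localhost:11434

# Inspect the service and recent logs if the request fails
systemctl status ollama
journalctl -u ollama --since "10 minutes ago"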

Downloading and Running Models

# Pull a model
ollama pull llama3.2
ollama pull mistral
ollama pull codellama

# Run interactive chat
ollama run llama3.2

# Run with specific prompt
ollama run llama3.2 "Explain quantum computing in simple terms"

# List installed models
ollama list

Popular Models to Try

  • Llama 3.2 – Meta’s latest open model, excellent general capability
  • Mistral – Fast and efficient, great for many tasks
  • Code Llama – Specialized for programming tasks
  • Phi-3 – Microsoft’s small but capable model
  • Gemma 2 – Google’s lightweight open model
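
Most of these models are published in several sizes and quantizations, selected with a tag after the model name. Exact tags differ per model, so treat the ones below as illustrative and check the Ollama library listing for what actually exists.

# Pull a specific size variant via its tag (tags vary by model)
ollama pull llama3.2:1b

# Inspect a model's parameters and template
ollama show llama3.2

# Remove a model you no longer need to free disk space
ollama rm mistral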

Using the API

# Generate completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a Python function to calculate fibonacci"
}'

# Chat format
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}'
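
Both endpoints stream the reply as a series of JSON lines by default. For scripting it is often easier to request a single JSON object with the "stream": false option and pull out the text with jq; the examples below assume jq is installed.

# Non-streaming completion, extracting just the generated text
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about Linux",
  "stream": false
}' | jq -r '.response'

# Same idea for the chat endpoint
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}' | jq -r '.message.content'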

Integration with Open WebUI

# Run Open WebUI with Docker
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Access at http://localhost:3000
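
If the web UI starts but shows no models, it usually cannot reach the Ollama server on the host. A common fix (a sketch, assuming Open WebUI's OLLAMA_BASE_URL setting and the host-gateway mapping above) is to pass the Ollama address explicitly:

# Point Open WebUI at the host's Ollama instance
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main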

Creating Custom Models

# Modelfile (save these lines in a file named Modelfile)
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM You are a helpful Linux system administrator assistant.

# Create custom model
ollama create linux-helper -f Modelfile
ollama run linux-helper
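
To confirm the custom model was built as intended, you can dump the Modelfile Ollama stored for it; custom models are listed and removed like any other. A quick sketch using standard ollama subcommands:

# Show the Modelfile baked into the custom model
ollama show linux-helper --modelfile

# Custom models appear alongside downloaded ones
ollama list

# Remove the custom model when no longer needed
ollama rm linux-helper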

Conclusion

Ollama democratizes access to powerful AI models, enabling anyone with suitable hardware to run LLMs locally. Start with smaller models like Phi-3 or Mistral, then experiment with larger models as you explore the capabilities of local AI.
