Running AI Models Locally with Ollama: Complete Setup Guide

Running large language models locally provides privacy, eliminates API costs, and enables offline AI capabilities. Ollama makes deploying open-source AI models remarkably simple on Linux. This guide covers everything from installation to running models like Llama, Mistral, and Code Llama on your own hardware.

Why Run AI Models Locally?

  • Privacy – Your data never leaves your machine
  • No API Costs – Unlimited usage after initial setup
  • Offline Access – Works without internet connection
  • Customization – Fine-tune models for your needs
  • Learning – Understand how LLMs work hands-on

Hardware Requirements

Local LLMs benefit greatly from GPU acceleration. Minimum recommendations by model size (a quick way to check your own hardware follows the list):

  • 7B Models – 8GB RAM, 6GB VRAM (or CPU with 16GB RAM)
  • 13B Models – 16GB RAM, 10GB VRAM
  • 70B Models – 64GB RAM, 48GB VRAM
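
Before pulling a large model, it is worth confirming what your machine actually has. The commands below are a quick sketch for checking RAM, NVIDIA VRAM, and free disk space; nvidia-smi is only available with the NVIDIA driver installed (AMD users would reach for rocm-smi instead), and disk needs range from a few GB for 7B models to well over 40GB for 70B models.

# Check available system RAM
free -h

# Check GPU model and VRAM (requires the NVIDIA driver)
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

# Check free disk space for model downloads
df -h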

Installing Ollama

# One-line installation
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama

# Verify installation
ollama --version
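
If the version prints, confirm the server itself is reachable. Ollama listens on localhost port 11434 by default; the checks below assume that default and use standard systemd tooling for troubleshooting.

# Confirm the API server is listening (default port 11434)
curl http://localhost:11434

# Inspect the service and recent logs if the request fails
systemctl status ollama
journalctl -u ollama --since "10 minutes ago"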

Downloading and Running Models

# Pull a model
ollama pull llama3.2
ollama pull mistral
ollama pull codellama

# Run interactive chat
ollama run llama3.2

# Run with specific prompt
ollama run llama3.2 "Explain quantum computing in simple terms"

# List installed models
ollama list

Popular Models to Try

  • Llama 3.2 – Meta’s latest open model, excellent general capability
  • Mistral – Fast and efficient, great for many tasks
  • Code Llama – Specialized for programming tasks
  • Phi-3 – Microsoft’s small but capable model
  • Gemma 2 – Google’s lightweight open model
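
Most of these models are published in several sizes and quantizations, selected with a tag after the model name. Exact tags differ per model, so treat the ones below as illustrative and check the Ollama library listing for what actually exists.

# Pull a specific size variant via its tag (tags vary by model)
ollama pull llama3.2:1b

# Inspect a model's parameters and template
ollama show llama3.2

# Remove a model you no longer need to free disk space
ollama rm mistral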

Using the API

# Generate completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a Python function to calculate fibonacci"
}'

# Chat format
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}'
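
Both endpoints stream the reply as a series of JSON lines by default. For scripting it is often easier to request a single JSON object with the "stream": false option and pull out the text with jq; the examples below assume jq is installed.

# Non-streaming completion, extracting just the generated text
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about Linux",
  "stream": false
}' | jq -r '.response'

# Same idea for the chat endpoint
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}' | jq -r '.message.content'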

Integration with Open WebUI

# Run Open WebUI with Docker
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Access at http://localhost:3000
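
If the web UI starts but shows no models, it usually cannot reach the Ollama server on the host. A common fix (a sketch, assuming Open WebUI's OLLAMA_BASE_URL setting and the host-gateway mapping above) is to pass the Ollama address explicitly:

# Point Open WebUI at the host's Ollama instance
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main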

Creating Custom Models

# Modelfile (save these lines in a file named Modelfile)
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM You are a helpful Linux system administrator assistant.

# Create custom model
ollama create linux-helper -f Modelfile
ollama run linux-helper
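
To confirm the custom model was built as intended, you can dump the Modelfile Ollama stored for it; custom models are listed and removed like any other. A quick sketch using standard ollama subcommands:

# Show the Modelfile baked into the custom model
ollama show linux-helper --modelfile

# Custom models appear alongside downloaded ones
ollama list

# Remove the custom model when no longer needed
ollama rm linux-helper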

Conclusion

Ollama democratizes access to powerful AI models, enabling anyone with suitable hardware to run LLMs locally. Start with smaller models like Phi-3 or Mistral, then experiment with larger models as you explore the capabilities of local AI.
