Memory and Knowledge Management
📑 Table of Contents
- Introduction to Agent Memory
- Types of AI Agent Memory
- 1. Short-Term Memory (Conversation Buffer)
- 2. Long-Term Memory (Vector Store)
- 3. Episodic Memory
- 4. Semantic Memory
- 5. Procedural Memory
- Memory Architecture Overview
- Prerequisites
- Part 1: Short-Term Memory (Conversation Buffer)
- Basic Conversation Memory
- Windowed Memory (Limited Context)
- Summary Memory (Compressed Context)
- Part 2: Vector Store (Long-Term Memory)
- Setting Up ChromaDB (Local Vector Database)
- Advanced: FAISS for Large-Scale Memory
- Part 3: Retrieval-Augmented Generation (RAG)
- Part 4: Redis for Fast Memory Access
- Part 5: Knowledge Graphs with Neo4j
- Part 6: Hybrid Memory System (Production-Ready)
- Performance Optimization
- Monitoring and Metrics
- Best Practices
- Security Considerations
- Conclusion
AI Agent Memory and Knowledge Management on Linux: Complete Guide
Last Updated: November 5, 2024 | Reading Time: 18 minutes | Difficulty: Advanced
Introduction to Agent Memory
Basic AI agents have a critical limitation: they forget. Each conversation starts fresh, with no memory of previous interactions or learned knowledge. This makes them ineffective for complex, ongoing tasks.
Advanced AI agents need memory systems to:
- Remember conversation history and context
- Store and retrieve domain-specific knowledge
- Learn from past experiences
- Access large knowledge bases efficiently
- Maintain persistent state across sessions
In this comprehensive guide, you’ll build production-ready memory systems for AI agents on Linux.
Types of AI Agent Memory
1. Short-Term Memory (Conversation Buffer)
Keeps the recent conversation inside the model's context window. Limited by the context window size (typically 4K-128K tokens).
2. Long-Term Memory (Vector Store)
Persistent storage using embeddings. Effectively unlimited capacity with semantic search.
3. Episodic Memory
Remembers specific past events and interactions.
4. Semantic Memory
General knowledge and facts learned over time.
5. Procedural Memory
Learned skills and procedures (how to do things).
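These categories do not require separate frameworks; a common lightweight pattern is simply to tag every stored item with its type so a router can decide where it belongs. A minimal sketch (the MemoryRecord class and type names are illustrative, not from any library):
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MemoryRecord:
    """Illustrative container: one remembered item, tagged with its memory type."""
    content: str
    memory_type: str          # "short_term", "episodic", "semantic", or "procedural"
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)

fact = MemoryRecord("User prefers Python over JavaScript", memory_type="semantic")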
Memory Architecture Overview
┌──────────────────────────────────────────────┐
│                   AI Agent                   │
│  ┌──────────────────────────────────────┐    │
│  │         Conversation Manager         │    │
│  │         (Short-term memory)          │    │
│  └────────┬─────────────────────────────┘    │
│           │                                  │
│  ┌────────▼───────────┐  ┌────────────────┐  │
│  │    Vector Store    │  │   Knowledge    │  │
│  │    (Long-term)     │  │     Graph      │  │
│  │  - ChromaDB        │  │  - Neo4j       │  │
│  │  - Pinecone        │  │  - Redis       │  │
│  └────────────────────┘  └────────────────┘  │
└──────────────────────────────────────────────┘
Prerequisites
# Install required packages
pip install langchain langchain-community langchain-openai
pip install chromadb sentence-transformers
pip install redis neo4j faiss-cpu
pip install tiktoken python-dotenv
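The Python examples below call load_dotenv() and expect an OPENAI_API_KEY entry in a .env file (or exported in your shell). A quick sanity check before running them:
# Confirm the API key the examples rely on is actually visible
import os
from dotenv import load_dotenv

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set - add it to .env or export it first")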
Part 1: Short-Term Memory (Conversation Buffer)
Basic Conversation Memory
#!/usr/bin/env python3
"""
Conversation Memory - Basic Implementation
"""
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from dotenv import load_dotenv
load_dotenv()
# Initialize LLM
llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0.7)
# Create conversation memory
memory = ConversationBufferMemory()
# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# Example usage
def chat():
    print("Chat with memory (type 'exit' to quit)\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ['exit', 'quit']:
            break
        response = conversation.predict(input=user_input)
        print(f"\nAssistant: {response}\n")
        # Show memory (chat_memory.messages holds the accumulated message objects)
        print(f"[Memory: {len(memory.chat_memory.messages)} messages]\n")

if __name__ == "__main__":
    chat()
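If you want to drive the buffer directly instead of going through a chain, ConversationBufferMemory also exposes save_context() and load_memory_variables(); a quick sketch using the same memory object:
# Manually write to and read from the buffer
memory.save_context({"input": "My server runs RHEL 9"}, {"output": "Noted - RHEL 9 it is."})
print(memory.load_memory_variables({})["history"])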
Windowed Memory (Limited Context)
from langchain.memory import ConversationBufferWindowMemory
# Keep only last 5 exchanges
memory = ConversationBufferWindowMemory(
    k=5,  # Number of exchanges to remember
    return_messages=True
)
# Prevents memory overflow on long conversations
Summary Memory (Compressed Context)
from langchain.memory import ConversationSummaryBufferMemory

# Summarize old conversations to save tokens
summary_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=1000
)
# Automatically summarizes when approaching token limit
Part 2: Vector Store (Long-Term Memory)
Vector stores enable semantic search – finding relevant information based on meaning, not just keywords.
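A quick way to see what "based on meaning" buys you is to compare embeddings directly with sentence-transformers (already installed in the prerequisites; the model name below is just a common default):
# Minimal illustration of semantic search: embeddings place related phrases close together
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["How to restart the nginx service", "Recipe for tomato soup"]
query = "web server will not come back up"

doc_vecs = model.encode(docs)
query_vec = model.encode(query)
print(util.cos_sim(query_vec, doc_vecs))  # the nginx sentence scores higher despite sharing no keywords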
Setting Up ChromaDB (Local Vector Database)
#!/usr/bin/env python3
"""
ChromaDB Vector Store - Long-term Memory
"""
import chromadb
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document
# Initialize ChromaDB (persistent on-disk client; the old Settings/duckdb config is deprecated)
chroma_client = chromadb.PersistentClient(
    path="/home/user/ai-agent/chroma_db"
)
# Initialize embeddings
embeddings = OpenAIEmbeddings()
# Create vector store
vectorstore = Chroma(
    client=chroma_client,
    collection_name="agent_memory",
    embedding_function=embeddings
)
class AgentMemorySystem:
    """Complete memory system for AI agents"""

    def __init__(self, persist_directory="./agent_memory"):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            persist_directory=persist_directory,
            embedding_function=self.embeddings,
            collection_name="long_term_memory"
        )
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )

    def store_memory(self, content, metadata=None):
        """Store information in long-term memory"""
        # Split into chunks if needed
        chunks = self.text_splitter.split_text(content)

        # Create documents
        documents = [
            Document(page_content=chunk, metadata=metadata or {})
            for chunk in chunks
        ]

        # Add to vector store
        self.vectorstore.add_documents(documents)
        print(f"✓ Stored {len(documents)} memory chunks")
    def recall(self, query, k=5):
        """Retrieve relevant memories"""
        results = self.vectorstore.similarity_search_with_score(
            query,
            k=k
        )
        return [
            {
                "content": doc.page_content,
                "metadata": doc.metadata,
                "relevance": score  # Chroma returns a distance: lower means more similar
            }
            for doc, score in results
        ]

    def forget(self, filter_dict):
        """Delete specific memories"""
        # Delete by metadata filter via the underlying Chroma collection
        self.vectorstore._collection.delete(where=filter_dict)

    def get_context(self, query, max_tokens=2000):
        """Get relevant context for query"""
        memories = self.recall(query)

        # Build context string
        context = "Relevant memories:\n\n"
        for i, mem in enumerate(memories, 1):
            context += f"{i}. {mem['content']}\n\n"

            # Stop if approaching the limit (rough word count as a cheap token proxy)
            if len(context.split()) > max_tokens:
                break

        return context
# Example usage
if __name__ == "__main__":
    memory = AgentMemorySystem()

    # Store knowledge
    memory.store_memory(
        "User prefers Python over JavaScript. Works primarily on Linux RHEL 9.",
        metadata={"type": "user_preference", "date": "2024-11-05"}
    )
    memory.store_memory(
        "Project uses Docker containers with Kubernetes orchestration.",
        metadata={"type": "technical_context"}
    )

    # Recall relevant information
    context = memory.get_context("What programming language does user prefer?")
    print(context)
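Chroma can also restrict recalls to the metadata you attached at store time; a small sketch reusing the memory object from the example above (the filter argument takes a Chroma where-style dictionary):
# Restrict recall to a metadata type stored earlier (sketch)
prefs = memory.vectorstore.similarity_search(
    "programming language",
    k=3,
    filter={"type": "user_preference"}  # Chroma metadata filter
)
for doc in prefs:
    print(doc.page_content)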
Advanced: FAISS for Large-Scale Memory
from langchain_community.vectorstores import FAISS

# FAISS is faster for large datasets (millions of vectors)
vectorstore = FAISS.from_documents(
    documents,
    embeddings,
    distance_strategy="COSINE"
)

# Save index to disk
vectorstore.save_local("/home/user/faiss_index")

# Load from disk (recent LangChain versions require opting in to pickle deserialization)
vectorstore = FAISS.load_local(
    "/home/user/faiss_index",
    embeddings,
    allow_dangerous_deserialization=True
)
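Like the Chroma store, a FAISS index can be wrapped as a retriever and dropped into any LangChain chain; a short sketch (invoke() is the current retriever API, older releases use get_relevant_documents()):
# Wrap the FAISS index as a retriever (sketch)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
docs = retriever.invoke("kubernetes networking issues")
for doc in docs:
    print(doc.metadata, doc.page_content[:80])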
Part 3: Retrieval-Augmented Generation (RAG)
RAG combines retrieval with generation, allowing agents to access external knowledge.
#!/usr/bin/env python3
"""
RAG System for AI Agents
Retrieval-Augmented Generation with Memory
"""
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
import os
class RAGMemoryAgent:
    """Agent with RAG-based knowledge retrieval"""

    def __init__(self, knowledge_base_path="./knowledge"):
        self.llm = ChatOpenAI(model="gpt-4-turbo-preview")
        self.embeddings = OpenAIEmbeddings()
        self.knowledge_base_path = knowledge_base_path

        # Initialize vector store
        self.vectorstore = self._load_or_create_vectorstore()

        # Create retrieval chain
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever(
                search_kwargs={"k": 5}
            ),
            return_source_documents=True
        )

    def _load_or_create_vectorstore(self):
        """Load existing vector store or create new one"""
        persist_dir = f"{self.knowledge_base_path}/chroma"

        if os.path.exists(persist_dir):
            print("Loading existing knowledge base...")
            return Chroma(
                persist_directory=persist_dir,
                embedding_function=self.embeddings
            )
        else:
            print("Creating new knowledge base...")
            return Chroma(
                persist_directory=persist_dir,
                embedding_function=self.embeddings
            )

    def ingest_documents(self, documents_dir):
        """Ingest documents into knowledge base"""
        documents = []

        # Read all text files
        for filename in os.listdir(documents_dir):
            if filename.endswith('.txt') or filename.endswith('.md'):
                filepath = os.path.join(documents_dir, filename)
                with open(filepath, 'r') as f:
                    content = f.read()
                    documents.append(
                        Document(
                            page_content=content,
                            metadata={"source": filename}
                        )
                    )

        # Split into chunks
        text_splitter = CharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        split_docs = text_splitter.split_documents(documents)

        # Add to vector store
        self.vectorstore.add_documents(split_docs)
        print(f"✓ Ingested {len(split_docs)} document chunks")
    def query(self, question):
        """Query the knowledge base"""
        result = self.qa_chain.invoke({"query": question})
        return {
            "answer": result["result"],
            "sources": [doc.metadata["source"] for doc in result["source_documents"]]
        }
    def interactive_mode(self):
        """Interactive Q&A mode"""
        print("\n" + "="*60)
        print("RAG Memory Agent - Knowledge Base Q&A")
        print("="*60)
        print("Type 'exit' to quit\n")

        while True:
            question = input("Question: ").strip()
            if question.lower() in ['exit', 'quit']:
                break
            result = self.query(question)
            print(f"\nAnswer: {result['answer']}")
            print(f"Sources: {', '.join(set(result['sources']))}\n")
# Example usage
if __name__ == "__main__":
# Create agent
agent = RAGMemoryAgent()
# Ingest knowledge (one-time setup)
# agent.ingest_documents("/home/user/docs")
# Interactive mode
agent.interactive_mode()
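The chain_type="stuff" setting concatenates every retrieved chunk into a single prompt, which is fine for a handful of chunks. If retrieved context risks overflowing the model's window, RetrievalQA also accepts "map_reduce" (answer each chunk, then combine) or "refine" (iteratively improve the answer). A hedged sketch, reusing the imports above and a vector store built as in Part 2:
# Alternative chain type when retrieved context is too large for a single prompt (sketch)
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4-turbo-preview"),
    chain_type="map_reduce",  # answer each chunk separately, then combine the partial answers
    retriever=vectorstore.as_retriever(search_kwargs={"k": 8}),
    return_source_documents=True
)
answer = qa_chain.invoke({"query": "Which base images do we standardize on?"})["result"]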
Part 4: Redis for Fast Memory Access
#!/usr/bin/env python3
"""
Redis-backed Agent Memory
Fast key-value storage for agent state
"""
import redis
import json
import hashlib
from datetime import datetime
class RedisMemory:
    """Fast memory using Redis"""

    def __init__(self, host='localhost', port=6379, db=0):
        self.redis = redis.Redis(
            host=host,
            port=port,
            db=db,
            decode_responses=True
        )

    def store_session(self, session_id, data, ttl=3600):
        """Store session data with expiration"""
        key = f"session:{session_id}"
        self.redis.setex(
            key,
            ttl,  # Time-to-live in seconds
            json.dumps(data)
        )

    def get_session(self, session_id):
        """Retrieve session data"""
        key = f"session:{session_id}"
        data = self.redis.get(key)
        return json.loads(data) if data else None

    def store_user_context(self, user_id, context):
        """Store persistent user context"""
        key = f"user:{user_id}:context"
        self.redis.hset(key, mapping=context)

    def get_user_context(self, user_id):
        """Get user context"""
        key = f"user:{user_id}:context"
        return self.redis.hgetall(key)

    def add_to_history(self, user_id, message, max_history=100):
        """Add message to user history"""
        key = f"user:{user_id}:history"

        # Add message with timestamp
        entry = json.dumps({
            "message": message,
            "timestamp": datetime.now().isoformat()
        })
        self.redis.lpush(key, entry)
        self.redis.ltrim(key, 0, max_history - 1)

    def get_history(self, user_id, limit=10):
        """Get recent history"""
        key = f"user:{user_id}:history"
        history = self.redis.lrange(key, 0, limit - 1)
        return [json.loads(h) for h in history]
    def cache_result(self, query, result, ttl=300):
        """Cache query results"""
        # hashlib gives a stable key; Python's built-in hash() is randomized per process
        key = f"cache:{hashlib.sha256(query.encode()).hexdigest()}"
        self.redis.setex(key, ttl, json.dumps(result))

    def get_cached_result(self, query):
        """Get cached result"""
        key = f"cache:{hashlib.sha256(query.encode()).hexdigest()}"
        result = self.redis.get(key)
        return json.loads(result) if result else None
# Example usage
if __name__ == "__main__":
    memory = RedisMemory()

    # Store session
    memory.store_session("sess_123", {
        "user_id": "user_456",
        "started_at": datetime.now().isoformat(),
        "context": "troubleshooting Linux server"
    }, ttl=3600)

    # Store user context
    memory.store_user_context("user_456", {
        "name": "John",
        "role": "DevOps Engineer",
        "preferred_os": "RHEL 9"
    })

    # Add to history
    memory.add_to_history("user_456", "How do I configure firewalld?")

    # Retrieve
    context = memory.get_user_context("user_456")
    history = memory.get_history("user_456")
    print(f"User Context: {context}")
    print(f"Recent History: {history}")
Part 5: Knowledge Graphs with Neo4j
Knowledge graphs store relationships between entities, enabling complex reasoning.
#!/usr/bin/env python3
"""
Knowledge Graph Memory using Neo4j
"""
from neo4j import GraphDatabase
import json
class KnowledgeGraphMemory:
    """Graph-based memory for complex relationships"""

    def __init__(self, uri="bolt://localhost:7687", user="neo4j", password="password"):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_fact(self, subject, predicate, obj, metadata=None):
        """Add a fact (triple) to the knowledge graph"""
        with self.driver.session() as session:
            session.run("""
                MERGE (s:Entity {name: $subject})
                MERGE (o:Entity {name: $object})
                CREATE (s)-[r:RELATION {
                    type: $predicate,
                    timestamp: datetime(),
                    metadata: $metadata
                }]->(o)
            """, subject=subject, predicate=predicate, object=obj,
                 # Relationship properties must be primitives, so metadata is stored as JSON
                 metadata=json.dumps(metadata or {}))
    def query_relationships(self, entity):
        """Find all relationships for an entity"""
        with self.driver.session() as session:
            result = session.run("""
                MATCH (e:Entity {name: $entity})-[r]-(connected)
                RETURN connected.name AS entity, type(r) AS relation, r.type AS relation_type
            """, entity=entity)
            return [dict(record) for record in result]
    def find_path(self, start, end, max_depth=5):
        """Find connection between two entities"""
        # Variable-length bounds cannot be query parameters, so max_depth is inlined as an int
        with self.driver.session() as session:
            result = session.run(f"""
                MATCH path = shortestPath(
                    (start:Entity {{name: $start}})-[*..{int(max_depth)}]-(end:Entity {{name: $end}})
                )
                RETURN path
            """, start=start, end=end)
            return result.single()
# Example usage
if __name__ == "__main__":
    kg = KnowledgeGraphMemory()

    # Build knowledge
    kg.add_fact("User", "prefers", "Python")
    kg.add_fact("User", "works_on", "Linux")
    kg.add_fact("Linux", "requires", "Shell scripting")
    kg.add_fact("Python", "runs_on", "Linux")

    # Query
    relationships = kg.query_relationships("User")
    print(f"User relationships: {relationships}")
    kg.close()
Part 6: Hybrid Memory System (Production-Ready)
#!/usr/bin/env python3
"""
Production Hybrid Memory System
Combines multiple memory types for optimal performance
"""
from typing import Dict, List
from datetime import datetime
import json

from langchain.memory import ConversationBufferWindowMemory

# AgentMemorySystem, RedisMemory, and KnowledgeGraphMemory are the classes built in
# Parts 2, 4, and 5 - import them from wherever you saved those modules
class HybridMemorySystem:
    """
    Production-grade memory system combining:
    - Short-term (conversation buffer)
    - Long-term (vector store)
    - Fast cache (Redis)
    - Knowledge graph (Neo4j)
    """

    def __init__(self):
        self.short_term = ConversationBufferWindowMemory(k=10)
        self.long_term = AgentMemorySystem()
        self.cache = RedisMemory()
        self.knowledge_graph = KnowledgeGraphMemory()

    def remember(self, content, memory_type="episodic", metadata=None):
        """Store information in appropriate memory system"""
        if memory_type == "conversation":
            # Short-term conversation memory
            self.short_term.save_context(
                {"input": content},
                {"output": ""}
            )
        elif memory_type == "episodic":
            # Long-term episodic memory
            self.long_term.store_memory(content, metadata)
        elif memory_type == "fact":
            # Knowledge graph for facts/relationships
            # Parse fact as triple (subject, predicate, object)
            self.knowledge_graph.add_fact(*content.split(","))

    def recall(self, query, search_all=True):
        """Retrieve from all memory systems"""
        # Check cache first (fastest)
        cached = self.cache.get_cached_result(query)
        if cached:
            return {"source": "cache", "data": cached}

        results = {}

        if search_all:
            # Search long-term memory
            memories = self.long_term.recall(query)
            results["long_term"] = memories

            # Get conversation context
            conversation = self.short_term.load_memory_variables({})
            results["conversation"] = conversation

        # Cache result
        self.cache.cache_result(query, results, ttl=300)

        return results

    def forget(self, memory_type, filter_criteria):
        """Selective forgetting"""
        if memory_type == "long_term":
            self.long_term.forget(filter_criteria)
        elif memory_type == "cache":
            # Clear cache
            self.cache.redis.flushdb()

    def get_full_context(self, query):
        """Get comprehensive context for agent reasoning"""
        context = {
            "query": query,
            "conversation_history": self.short_term.load_memory_variables({}),
            "relevant_memories": self.long_term.recall(query, k=5),
            "timestamp": datetime.now().isoformat()
        }
        return context
# Example usage
if __name__ == "__main__":
    memory = HybridMemorySystem()

    # Store different types of memories
    memory.remember("User asked about Docker containers", memory_type="conversation")
    memory.remember("User prefers RHEL 9 for production", memory_type="episodic")
    memory.remember("Docker,runs_on,Linux", memory_type="fact")

    # Recall
    context = memory.recall("What does user use for production?")
    print(json.dumps(context, indent=2))
Performance Optimization
| Memory Type | Lookup Speed | Storage Capacity | Best Use Case |
|---|---|---|---|
| Redis Cache | <1ms | GB scale | Session data, recent context |
| Vector Store (ChromaDB) | 10-100ms | Millions of vectors | Semantic search, knowledge |
| Knowledge Graph | 50-200ms | Billions of relationships | Complex reasoning |
| Conversation Buffer | <1ms | Limited by tokens | Active conversation |
Monitoring and Metrics
class MemoryMetrics:
    """Track memory system performance"""

    def __init__(self):
        self.metrics = {
            "queries": 0,
            "cache_hits": 0,
            "cache_misses": 0,
            "avg_retrieval_time": []
        }

    def record_query(self, query_time, cache_hit=False):
        self.metrics["queries"] += 1
        self.metrics["avg_retrieval_time"].append(query_time)
        if cache_hit:
            self.metrics["cache_hits"] += 1
        else:
            self.metrics["cache_misses"] += 1

    def get_stats(self):
        cache_hit_rate = (
            self.metrics["cache_hits"] / self.metrics["queries"] * 100
            if self.metrics["queries"] > 0 else 0
        )
        avg_time = (
            sum(self.metrics["avg_retrieval_time"]) / len(self.metrics["avg_retrieval_time"])
            if self.metrics["avg_retrieval_time"] else 0
        )
        return {
            "total_queries": self.metrics["queries"],
            "cache_hit_rate": f"{cache_hit_rate:.2f}%",
            "avg_retrieval_time": f"{avg_time*1000:.2f}ms"
        }
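A short usage sketch, assuming the HybridMemorySystem from Part 6 is available as memory; record_query expects the elapsed time in seconds, matching the *1000 conversion above:
import time

metrics = MemoryMetrics()

start = time.perf_counter()
hit = memory.cache.get_cached_result("What does user use for production?")
metrics.record_query(time.perf_counter() - start, cache_hit=hit is not None)

print(metrics.get_stats())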
Best Practices
- Use appropriate memory type: Cache for speed, vectors for semantic search, graphs for relationships
- Set TTL on cached data: Prevent stale information
- Implement memory limits: Use windowed or summary memory to control token usage
- Index strategically: Proper indexing dramatically improves retrieval speed
- Monitor performance: Track cache hit rates and retrieval times
- Implement forgetting: Not all memories need to be permanent
Security Considerations
- Encrypt sensitive data in vector stores
- Implement access controls on memory systems
- Audit memory access and modifications
- Sanitize and validate inputs to prevent injection attacks (see the sketch after this list)
- Regular backups of knowledge bases
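The Cypher statements in Part 5 already use bound parameters, which blocks query injection; validating values before they reach any memory store is still a sensible second layer. A minimal sketch, assuming the KnowledgeGraphMemory class from Part 5 (the regex policy and length limit are illustrative choices):
# Hedged sketch: reject suspicious values before they are written to the knowledge graph
import re

SAFE_VALUE = re.compile(r"^[\w .\-]{1,128}$")  # letters, digits, underscore, space, dot, hyphen

def safe_add_fact(kg, subject, predicate, obj):
    for value in (subject, predicate, obj):
        if not SAFE_VALUE.match(value):
            raise ValueError(f"Rejected suspicious value: {value!r}")
    kg.add_fact(subject, predicate, obj)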
Conclusion
Advanced memory systems transform simple chatbots into intelligent agents that learn, remember, and improve over time. By combining short-term conversation memory, long-term vector storage, fast caching, and knowledge graphs, you can build production-ready AI agents that handle complex, ongoing tasks.
Next: Learn how to deploy these systems in production on Linux with Docker and Kubernetes in Article 7.
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.