User Context API Tutorial: Building AI Agents That Actually Know Your Users
Your AI agent answers questions accurately, executes tasks reliably, and generally works great—until it asks "What's your name?" for the hundredth time. This is the stateless agent problem: every conversation starts from scratch, every user is a stranger, and personalization is impossible without manual intervention.
This tutorial shows you how to fix that. You'll learn how to integrate user context into AI agents using a context API, with working code examples you can run today.
What You'll Learn
- Why AI agents need external context and why conversation history alone isn't enough
- Three approaches to adding user context (and when to use each)
- How to integrate a context API into OpenAI function calls
- Building a context-aware LangChain agent from scratch
- Patterns for caching, refreshing, and optimizing context retrieval
- Testing strategies to verify your context layer works correctly
Prerequisites
Before diving in, you should have:
- Basic familiarity with Python and REST APIs
- An OpenAI API key (or equivalent LLM provider)
- Understanding of how LLM agents work (tool calling, system prompts)
Why AI Agents Need External Context
Consider this interaction with a typical AI assistant:
User: What should I have for lunch?
AI: I'd be happy to help! What kind of food do you enjoy? Do you have any dietary restrictions? What's your budget? Are you cooking or ordering?
The AI doesn't know anything about the user, so it asks for context. Every. Single. Time.
Now imagine the same interaction with user context:
User: What should I have for lunch?
AI: Since you mentioned wanting to eat healthier this week, and you're near the office in Cambridge, there's that Mediterranean place you liked on Mass Ave. They have the quinoa bowl you ordered last month—high protein, around $14.
The difference isn't intelligence—it's information. The second agent has access to:
- User preferences (eating healthier)
- Location (Cambridge, near office)
- History (Mediterranean place, quinoa bowl)
- Constraints (budget context from past orders)
This context transforms a generic assistant into a personalized one.
Why Conversation History Isn't Enough
You might think: "Just store conversation history. Problem solved."
Not quite. Conversation history has three fundamental limitations:
1. Context windows are finite. Even with 128K or 200K token windows, years of user interactions won't fit. You need selective retrieval.
2. Relevant context is distributed. The user's food preferences might be spread across 50 conversations over 6 months. RAG on conversation logs can help, but it's inefficient and noisy.
3. Some context was never stated. Users don't announce "I'm in Cambridge today" or "my budget is tight this week." This context comes from external sources—calendars, location, spending patterns.
User context APIs solve these problems by maintaining a structured, queryable representation of who the user is, what they've done, and what matters to them.
Three Approaches to User Context
Before we dive into implementation, let's compare your options:
Approach 1: Manual Context in System Prompts
The simplest approach: hard-code user information into your system prompt.
```python
system_prompt = """You are a helpful assistant for Sarah.

User context:
- Name: Sarah Chen
- Role: Product Manager at TechCorp
- Preferences: Prefers concise responses, uses Notion for notes
- Current project: Q1 roadmap planning
"""
```
Pros:
- Zero infrastructure required
- Full control over what context is included
- Works immediately with any LLM
Cons:
- Doesn't scale beyond a handful of users
- Context quickly becomes stale
- Manual updates required for every change
- No dynamic retrieval based on query
Best for: Personal assistants with a single user, prototypes, demos.
Approach 2: RAG on User Data
Use retrieval-augmented generation on user documents, messages, or activity logs.
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Index user data
vectorstore = Chroma.from_documents(
    user_documents,
    embedding=OpenAIEmbeddings()
)

# Retrieve relevant context at query time
relevant_context = vectorstore.similarity_search(
    query=user_query,
    k=5
)
```
Pros:
- Scales to large amounts of user data
- Retrieves contextually relevant information
- Can include conversation history, documents, notes
Cons:
- Retrieval quality varies significantly
- Embedding + search latency on every request
- Raw documents often need summarization
- Privacy concerns with storing raw user data
Best for: Knowledge-heavy applications, document-based assistants, when you have unstructured user data you want to leverage.
Approach 3: Dedicated Context API
Use a specialized API that maintains structured user context and serves it on demand.
```python
import requests

# Fetch user context
response = requests.get(
    "https://api.dytto.app/v1/context",
    headers={"Authorization": f"Bearer {token}"}
)
context = response.json()

# Context is structured, summarized, ready to inject
user_summary = context["narrative_summary"]
preferences = context["preferences"]
recent_activity = context["recent_patterns"]
```
Pros:
- Pre-structured and summarized context
- Single API call, low latency
- Handles context aggregation across sources
- Built-in privacy and consent management
Cons:
- Requires integration with context provider
- Depends on user opting into context collection
- Monthly API costs at scale
Best for: Production applications with many users, when you need rich behavioral context, when privacy/consent is important.
Tutorial: Building a Context-Aware Agent
Let's build a practical example. We'll create an AI assistant that uses the Dytto Context API to personalize responses.
Step 1: Set Up Your Environment
```bash
pip install openai requests python-dotenv
```
Create a .env file:
```
OPENAI_API_KEY=sk-...
DYTTO_API_KEY=dyt_...
```
Step 2: Create the Context Fetcher
First, build a function that retrieves user context:
```python
# context_service.py
import requests
from typing import Optional
from dataclasses import dataclass


@dataclass
class UserContext:
    """Structured user context from the API."""
    user_id: str
    summary: str
    preferences: dict
    recent_patterns: list
    location: Optional[dict] = None

    def to_prompt_string(self) -> str:
        """Format context for injection into system prompt."""
        lines = [
            "## User Context",
            f"Summary: {self.summary}",
            "",
            "### Preferences:",
        ]
        for key, value in self.preferences.items():
            lines.append(f"- {key}: {value}")
        if self.recent_patterns:
            lines.append("")
            lines.append("### Recent Patterns:")
            for pattern in self.recent_patterns[:5]:  # Limit to 5
                lines.append(f"- {pattern}")
        if self.location:
            lines.append("")
            lines.append(f"### Current Location: {self.location.get('city', 'Unknown')}")
        return "\n".join(lines)


class ContextService:
    """Service for fetching user context from Dytto API."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.dytto.app/v1"

    def get_context(self, user_id: str) -> Optional[UserContext]:
        """Fetch context for a specific user."""
        try:
            response = requests.get(
                f"{self.base_url}/context",
                headers={
                    "Authorization": f"Bearer {self.api_key}",
                    "X-User-ID": user_id
                },
                timeout=5.0
            )
            response.raise_for_status()
            data = response.json()
            return UserContext(
                user_id=user_id,
                summary=data.get("narrative_summary", ""),
                preferences=data.get("preferences", {}),
                recent_patterns=data.get("patterns", []),
                location=data.get("location")
            )
        except requests.RequestException as e:
            print(f"Context fetch failed: {e}")
            return None

    def search_context(self, user_id: str, query: str) -> list:
        """Search user context for specific topics."""
        try:
            response = requests.post(
                f"{self.base_url}/context/search",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"user_id": user_id, "query": query},
                timeout=5.0
            )
            response.raise_for_status()
            return response.json().get("results", [])
        except requests.RequestException:
            return []
```
Step 3: Build the Context-Aware Agent
Now create an agent that uses this context:
```python
# agent.py
import os
from typing import Optional

from dotenv import load_dotenv
from openai import OpenAI

from context_service import ContextService, UserContext

load_dotenv()

client = OpenAI()
context_service = ContextService(os.getenv("DYTTO_API_KEY"))


def build_system_prompt(base_prompt: str, user_context: Optional[UserContext] = None) -> str:
    """Build system prompt with optional user context."""
    if not user_context:
        return base_prompt
    return f"""{base_prompt}

{user_context.to_prompt_string()}

Use this context to personalize your responses. Reference specific details
when relevant, but don't be creepy about it—integrate naturally.
"""


def run_agent(user_id: str, message: str) -> str:
    """Run the agent with user context."""
    # Fetch user context
    user_context = context_service.get_context(user_id)

    # Build the system prompt
    base_prompt = """You are a helpful personal assistant. You help users
with tasks, answer questions, and provide recommendations. Be concise
but thorough."""
    system_prompt = build_system_prompt(base_prompt, user_context)

    # Call the LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message}
        ],
        temperature=0.7
    )
    return response.choices[0].message.content


# Example usage
if __name__ == "__main__":
    user_id = "user_12345"
    # Without context, this would prompt for clarification
    response = run_agent(user_id, "What should I have for dinner tonight?")
    print(response)
```
Step 4: Add Context as a Tool
For more dynamic context retrieval, expose context search as a tool the agent can call:
```python
# agent_with_tools.py
import json
import os

from openai import OpenAI

from context_service import ContextService

client = OpenAI()
context_service = ContextService(os.getenv("DYTTO_API_KEY"))

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_user_context",
            "description": "Search the user's personal context for specific information. Use this when you need to know something about the user that isn't in the current conversation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What to search for in user context (e.g., 'food preferences', 'work schedule', 'recent purchases')"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_user_location",
            "description": "Get the user's current location if available.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": []
            }
        }
    }
]


def handle_tool_call(user_id: str, tool_name: str, arguments: dict) -> str:
    """Execute a tool call and return the result."""
    if tool_name == "search_user_context":
        results = context_service.search_context(user_id, arguments["query"])
        if results:
            return json.dumps({"found": True, "results": results})
        return json.dumps({"found": False, "message": "No relevant context found"})
    elif tool_name == "get_user_location":
        context = context_service.get_context(user_id)
        if context and context.location:
            return json.dumps(context.location)
        return json.dumps({"available": False})
    return json.dumps({"error": "Unknown tool"})


def run_agent_with_tools(user_id: str, message: str) -> str:
    """Run agent with tool-calling capability."""
    # Start with base context in system prompt
    user_context = context_service.get_context(user_id)
    messages = [
        {
            "role": "system",
            "content": f"""You are a helpful personal assistant with access to the user's
personal context. You can search their context for specific information.

Current user summary: {user_context.summary if user_context else 'No context available'}
"""
        },
        {"role": "user", "content": message}
    ]

    # Initial call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    assistant_message = response.choices[0].message

    # Handle tool calls if any
    while assistant_message.tool_calls:
        messages.append(assistant_message)
        for tool_call in assistant_message.tool_calls:
            result = handle_tool_call(
                user_id,
                tool_call.function.name,
                json.loads(tool_call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
        # Get next response
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        assistant_message = response.choices[0].message

    return assistant_message.content
```
LangChain Integration
If you're using LangChain, here's how to integrate user context:
```python
# langchain_agent.py
import os

from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain.tools import Tool, StructuredTool
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pydantic import BaseModel, Field


class ContextSearchInput(BaseModel):
    query: str = Field(description="What to search for in user context")


def create_context_tools(context_service, user_id: str):
    """Create LangChain tools for context access."""

    def search_context(query: str) -> str:
        results = context_service.search_context(user_id, query)
        if results:
            return f"Found relevant context: {results}"
        return "No relevant context found for this query."

    def get_full_context(_: str = "") -> str:
        # Tool funcs receive the tool input string, so accept (and ignore) it
        context = context_service.get_context(user_id)
        if context:
            return context.to_prompt_string()
        return "No user context available."

    return [
        StructuredTool.from_function(
            func=search_context,
            name="search_user_context",
            description="Search for specific information in the user's personal context",
            args_schema=ContextSearchInput
        ),
        Tool.from_function(
            func=get_full_context,
            name="get_user_profile",
            description="Get the user's full context profile including preferences and patterns"
        )
    ]


def create_context_aware_agent(context_service, user_id: str):
    """Create a LangChain agent with context access."""
    # Fetch initial context for system prompt
    user_context = context_service.get_context(user_id)
    context_summary = user_context.summary if user_context else "No context available"

    # Create prompt with context
    prompt = ChatPromptTemplate.from_messages([
        ("system", f"""You are a helpful personal assistant. You have access to
the user's personal context and can search for specific information.

Current user summary: {context_summary}

Use the available tools to look up specific details when needed.
Personalize your responses based on what you know about the user."""),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])

    # Create tools
    tools = create_context_tools(context_service, user_id)

    # Create agent
    llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
    agent = create_openai_tools_agent(llm, tools, prompt)

    return AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        handle_parsing_errors=True
    )


# Usage
if __name__ == "__main__":
    from context_service import ContextService

    context_service = ContextService(os.getenv("DYTTO_API_KEY"))
    agent = create_context_aware_agent(context_service, "user_12345")
    result = agent.invoke({"input": "What's a good restaurant for dinner tonight?"})
    print(result["output"])
```
Optimization Patterns
Pattern 1: Context Caching
Don't fetch context on every request. Cache it with a reasonable TTL:
```python
from datetime import datetime, timedelta
import threading


class CachedContextService:
    def __init__(self, context_service, ttl_seconds: int = 300):
        self.context_service = context_service
        self.ttl_seconds = ttl_seconds
        self.cache = {}
        self.cache_times = {}
        self.lock = threading.Lock()

    def get_context(self, user_id: str):
        with self.lock:
            now = datetime.now()

            # Check cache
            if user_id in self.cache:
                cache_time = self.cache_times[user_id]
                if now - cache_time < timedelta(seconds=self.ttl_seconds):
                    return self.cache[user_id]

            # Fetch fresh
            context = self.context_service.get_context(user_id)
            if context:
                self.cache[user_id] = context
                self.cache_times[user_id] = now
            return context

    def invalidate(self, user_id: str):
        """Invalidate cache for a user (call when context changes)."""
        with self.lock:
            self.cache.pop(user_id, None)
            self.cache_times.pop(user_id, None)
```
Pattern 2: Lazy Context Loading
Only fetch detailed context when the agent actually needs it:
```python
def run_with_lazy_context(user_id: str, message: str):
    """Start with minimal context, fetch more if needed."""
    # Quick classification: does this query need personalization?
    classification = client.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap model for classification
        messages=[
            {"role": "system", "content": "Classify if this query needs user personalization. Reply YES or NO only."},
            {"role": "user", "content": message}
        ],
        max_tokens=10
    )
    needs_context = "YES" in classification.choices[0].message.content.upper()

    if needs_context:
        # Fetch full context
        user_context = context_service.get_context(user_id)
        system_prompt = build_system_prompt(base_prompt, user_context)
    else:
        # Generic response is fine
        system_prompt = base_prompt

    # Continue with appropriate prompt
    return run_completion(system_prompt, message)
```
Pattern 3: Context Summarization
If context is large, summarize it before injection:
```python
def summarize_context_for_query(context: UserContext, query: str) -> str:
    """Generate a query-relevant context summary."""
    full_context = context.to_prompt_string()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": """Extract only the context relevant to
answering the user's query. Be concise—include only what's directly useful."""},
            {"role": "user", "content": f"Query: {query}\n\nFull context:\n{full_context}"}
        ],
        max_tokens=500
    )
    return response.choices[0].message.content
```
Testing Your Context Integration
Test 1: Context Injection Verification
Verify context actually influences responses:
```python
def test_context_affects_response():
    """Test that context changes the agent's response."""
    # Same query, different contexts
    query = "What should I eat for lunch?"

    # Context A: Health-focused user
    context_a = UserContext(
        user_id="test_a",
        summary="Health-conscious professional tracking macros",
        preferences={"diet": "high protein", "cuisine": "Mediterranean"},
        recent_patterns=["Meal prepped chicken salads this week"]
    )

    # Context B: Convenience-focused user
    context_b = UserContext(
        user_id="test_b",
        summary="Busy parent with limited lunch breaks",
        preferences={"priority": "fast", "budget": "under $15"},
        recent_patterns=["Ordered DoorDash 3x this week"]
    )

    response_a = run_with_context(query, context_a)
    response_b = run_with_context(query, context_b)

    # Responses should differ based on context
    assert "salad" in response_a.lower() or "protein" in response_a.lower()
    assert "quick" in response_b.lower() or "delivery" in response_b.lower()
```
Test 2: Graceful Degradation
Ensure the agent works when context is unavailable:
```python
def test_graceful_degradation():
    """Test agent handles missing context gracefully."""
    # Simulate context service failure
    response = run_agent("nonexistent_user", "What's the weather like?")

    # Should still provide a reasonable response
    assert response is not None
    assert "error" not in response.lower()
    # Should ask for clarification or provide general help
    assert "location" in response.lower() or "weather" in response.lower()
```
Test 3: Privacy Boundaries
Verify context doesn't leak between users:
```python
def test_context_isolation():
    """Test that one user's context doesn't leak to another."""
    # User A has specific preferences
    response_a = run_agent("user_a", "What's my favorite restaurant?")

    # User B asks the same question
    response_b = run_agent("user_b", "What's my favorite restaurant?")

    # Responses should be different or both ask for clarification
    # They should NOT share User A's preferences
    assert response_a != response_b or "don't have information" in response_b.lower()
```
Common Pitfalls and Solutions
Pitfall 1: Over-Personalization
Problem: The agent references context unnecessarily, making responses feel intrusive.
User: What time is it?
Bad: Based on your preference for concise responses and your location in Cambridge
where you typically have meetings at 3pm, it's currently 2:47 PM EST.
Good: It's 2:47 PM EST.
Solution: Only reference context when it adds value. Classify queries and skip personalization for generic requests.
Pitfall 2: Stale Context
Problem: Context doesn't reflect recent changes.
Solution:
- Set appropriate cache TTLs (5-15 minutes for active sessions)
- Implement webhooks for context updates
- Allow users to trigger context refresh
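To illustrate the webhook approach, here is a sketch of a handler that invalidates a cached entry when the provider reports a change. The payload fields (`event`, `user_id`) are assumptions for illustration, not a documented Dytto webhook format; `InvalidatingCache` is a minimal stand-in for the caching pattern shown earlier.

```python
class InvalidatingCache:
    """Minimal stand-in for a TTL cache with an invalidate() method."""

    def __init__(self):
        self.cache = {}
        self.cache_times = {}

    def invalidate(self, user_id: str):
        self.cache.pop(user_id, None)
        self.cache_times.pop(user_id, None)


def handle_context_webhook(payload: dict, cache: InvalidatingCache) -> bool:
    """Drop the cached context when the provider reports a change.

    Returns True if an entry was invalidated, False otherwise.
    """
    if payload.get("event") != "context.updated":
        return False  # Ignore unrelated events
    user_id = payload.get("user_id")
    if not user_id:
        return False
    cache.invalidate(user_id)
    return True
```

The next `get_context` call for that user then falls through the cache and fetches fresh data, so staleness is bounded by webhook delivery latency rather than the TTL.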
Pitfall 3: Context Window Bloat
Problem: Injecting full context on every request wastes tokens.
Solution:
- Summarize context relevant to the current query
- Use tools for on-demand context lookup instead of upfront injection
- Cache summarizations for common query types
Pitfall 4: Single Point of Failure
Problem: Context API downtime breaks your agent.
Solution:
- Implement fallback to generic responses
- Cache last-known-good context
- Set aggressive timeouts (2-5 seconds)
- Alert on elevated failure rates
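The fallback and last-known-good ideas combine naturally into a thin wrapper. This is a sketch, with `inner` standing in for any object that exposes a `get_context(user_id)` method returning a context or `None` on failure (as the `ContextService` above does).

```python
class ResilientContextService:
    """Serve the last successfully fetched context when the API fails."""

    def __init__(self, inner):
        self.inner = inner
        self.last_good = {}  # user_id -> last successfully fetched context
        self.failures = 0    # monitor this to alert on elevated failure rates

    def get_context(self, user_id: str):
        context = self.inner.get_context(user_id)
        if context is not None:
            self.last_good[user_id] = context
            self.failures = 0
            return context
        self.failures += 1
        # Fall back to last-known-good; None means "use the generic prompt"
        return self.last_good.get(user_id)
```

Combined with the short timeouts already set on the underlying requests, a context API outage degrades the agent to slightly stale personalization instead of an error.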
Frequently Asked Questions
What's the difference between user context and conversation history?
Conversation history is what was said in the current (or recent) conversation. It's sequential, exact, and grows with each message.
User context is a structured summary of who the user is across all interactions and external signals. It includes preferences, patterns, facts, and metadata that may never have been explicitly stated.
Think of it this way: conversation history is the transcript; user context is what the agent "knows" about the user.
How much context should I inject into the system prompt?
Start small—200-400 tokens of summarized context is usually enough. If you need more, use on-demand tools instead of upfront injection. The goal is signal density: every token should add value.
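If you want to enforce that budget mechanically, a rough trimmer can keep whole lines of context up to a token limit. This sketch uses the common approximation of ~4 characters per token; swap in a real tokenizer such as tiktoken if you need exact counts.

```python
def trim_to_token_budget(context_text: str, max_tokens: int = 400) -> str:
    """Keep whole lines of context up to an approximate token budget."""
    budget_chars = max_tokens * 4  # rough heuristic: ~4 chars per token
    kept, used = [], 0
    for line in context_text.splitlines():
        cost = len(line) + 1  # +1 for the newline
        if used + cost > budget_chars:
            break  # stop at the first line that would exceed the budget
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Ordering the context string with the most important facts first (summary, then preferences, then patterns) makes this kind of truncation safe.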
Can I use RAG instead of a context API?
Yes, and sometimes you should. RAG excels when you have unstructured user data (documents, emails, notes) and need to retrieve specific passages. Context APIs excel when you need structured, pre-summarized context that's ready to inject without additional processing.
Many production systems use both: a context API for user profile/preferences and RAG for document retrieval.
How do I handle privacy and consent?
This is critical. Best practices:
- Explicit consent: Users should opt into context collection
- Granular controls: Let users choose what context to share
- Data minimization: Only collect context you'll actually use
- Transparency: Show users what context you have about them
- Right to delete: Provide clear mechanisms for users to delete their context
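A minimal sketch of granular controls: filter the context payload down to the scopes a user has opted into before it ever reaches a prompt. The scope names and field layout here are illustrative, not a standard schema.

```python
# Scopes your application is allowed to use at all (data minimization)
ALLOWED_SCOPES = {"preferences", "location", "patterns"}


def filter_by_consent(context: dict, consented_scopes: set) -> dict:
    """Return only the context fields the user has opted into sharing."""
    return {
        field: value
        for field, value in context.items()
        if field in consented_scopes and field in ALLOWED_SCOPES
    }
```

Applying this at the boundary between the context service and the prompt builder means a missing consent can never leak into an LLM call by accident.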
What if the context is wrong?
Build correction mechanisms:
- Let users explicitly correct facts ("No, I'm vegetarian now")
- Weight recent signals over old ones
- Implement confidence scoring for context items
- Allow agents to ask for clarification when context seems contradictory
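The recency-weighting and confidence-scoring points can be combined into one simple rule: decay inferred signals by age, and let explicit user statements always win. A sketch, with an assumed item shape of `{"value", "observed_at", "user_stated"}`:

```python
from datetime import datetime, timedelta


def score_context_item(item: dict, now: datetime, half_life_days: float = 30.0) -> float:
    """Confidence score: explicit statements are 1.0; inferred signals decay by age."""
    if item.get("user_stated"):
        return 1.0  # explicit corrections beat inferred signals
    age_days = (now - item["observed_at"]).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)  # exponential decay


def resolve_conflict(items: list, now: datetime) -> dict:
    """Pick the highest-confidence value among contradictory items."""
    return max(items, key=lambda it: score_context_item(it, now))
```

With this rule, "No, I'm vegetarian now" immediately outranks months of inferred order history.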
How often should context refresh?
Depends on your use case:
- Real-time apps (chat): Cache for 5-15 minutes
- Async apps (email): Refresh on each request
- Background jobs: Fetch fresh context at job start
For context that changes frequently (location), consider streaming updates or websockets.
Can I build my own context layer instead of using an API?
Absolutely. Here's what you'll need:
- Data ingestion: Collect signals from various sources
- Storage: Time-series database for events, key-value for preferences
- Aggregation: Reduce raw signals into structured context
- Summarization: LLM-powered synthesis of context into prompts
- Serving: Low-latency API for context retrieval
- Privacy: Consent management, encryption, audit logging
Building this yourself gives you full control but requires significant engineering investment. Context APIs let you ship faster.
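To make the pipeline concrete, here is a toy in-memory version of the ingest/aggregate/serve loop. It omits persistence, async aggregation, summarization, and consent handling, so treat it as a sketch of the shape rather than a real implementation; the event schema is invented for illustration.

```python
from collections import defaultdict


class MiniContextStore:
    """Toy in-memory context layer: ingest raw events, aggregate, serve."""

    def __init__(self):
        self.events = defaultdict(list)       # ingestion: raw signals per user
        self.preferences = defaultdict(dict)  # aggregation output

    def ingest(self, user_id: str, event: dict):
        """Collect a raw signal and update aggregates inline."""
        self.events[user_id].append(event)
        # Naive aggregation: last-write-wins on preference-type events
        if event.get("type") == "preference":
            self.preferences[user_id][event["key"]] = event["value"]

    def serve(self, user_id: str) -> dict:
        """Return structured context ready for prompt injection."""
        return {
            "preferences": dict(self.preferences[user_id]),
            "event_count": len(self.events[user_id]),
        }
```

A production version would split ingestion and aggregation into separate processes and add an LLM summarization step, but the interface your agent consumes can stay this simple.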
Conclusion
Adding user context transforms AI agents from stateless question-answerers into personalized assistants that know their users. The key insights:
1. Conversation history alone isn't enough. You need structured context that persists across sessions and includes signals the user never explicitly stated.
2. Start simple. Manual context in system prompts works for single-user apps. Graduate to context APIs as you scale.
3. Context as tools, not just prompts. Exposing context search as a tool gives agents flexibility to fetch what they need, when they need it.
4. Cache aggressively. Context doesn't change on every request. Cache it with appropriate TTLs.
5. Degrade gracefully. Your agent should work—less well, but still work—when context is unavailable.
The difference between a demo and a product is often the quality of context. Users don't notice when personalization works—it just feels natural. But they definitely notice when the agent asks for their name for the hundredth time.
Ready to build context-aware agents? Check out the Dytto API documentation to get started with user context in minutes.
This tutorial is part of our series on building production-ready AI agents. For more on context engineering, see our guide to context-aware AI agents.