
User Context API Tutorial: Building AI Agents That Actually Know Your Users

Dytto Team
Tags: dytto, tutorial, context-api, ai-agents, openai, langchain, personalization, developer-tools, llm


Your AI agent answers questions accurately, executes tasks reliably, and generally works great—until it asks "What's your name?" for the hundredth time. This is the stateless agent problem: every conversation starts from scratch, every user is a stranger, and personalization is impossible without manual intervention.

This tutorial shows you how to fix that. You'll learn how to integrate user context into AI agents using a context API, with working code examples you can run today.

What You'll Learn

  • Why AI agents need external context and why conversation history alone isn't enough
  • Three approaches to adding user context (and when to use each)
  • How to integrate a context API into OpenAI function calls
  • Building a context-aware LangChain agent from scratch
  • Patterns for caching, refreshing, and optimizing context retrieval
  • Testing strategies to verify your context layer works correctly

Prerequisites

Before diving in, you should have:

  • Basic familiarity with Python and REST APIs
  • An OpenAI API key (or equivalent LLM provider)
  • Understanding of how LLM agents work (tool calling, system prompts)

Why AI Agents Need External Context

Consider this interaction with a typical AI assistant:

User: What should I have for lunch?
AI: I'd be happy to help! What kind of food do you enjoy? Do you have any dietary restrictions? What's your budget? Are you cooking or ordering?

The AI doesn't know anything about the user, so it asks for context. Every. Single. Time.

Now imagine the same interaction with user context:

User: What should I have for lunch?
AI: Since you mentioned wanting to eat healthier this week, and you're near the office in Cambridge, there's that Mediterranean place you liked on Mass Ave. They have the quinoa bowl you ordered last month—high protein, around $14.

The difference isn't intelligence—it's information. The second agent has access to:

  • User preferences (eating healthier)
  • Location (Cambridge, near office)
  • History (Mediterranean place, quinoa bowl)
  • Constraints (budget context from past orders)

This context transforms a generic assistant into a personalized one.

Why Conversation History Isn't Enough

You might think: "Just store conversation history. Problem solved."

Not quite. Conversation history has three fundamental limitations:

  1. Context windows are finite. Even with 128K or 200K token windows, years of user interactions won't fit. You need selective retrieval.

  2. Relevant context is distributed. The user's food preferences might be spread across 50 conversations over 6 months. RAG on conversation logs can help, but it's inefficient and noisy.

  3. Some context was never stated. Users don't announce "I'm in Cambridge today" or "my budget is tight this week." This context comes from external sources—calendars, location, spending patterns.

User context APIs solve these problems by maintaining a structured, queryable representation of who the user is, what they've done, and what matters to them.


Three Approaches to User Context

Before we dive into implementation, let's compare your options:

Approach 1: Manual Context in System Prompts

The simplest approach: hard-code user information into your system prompt.

system_prompt = """You are a helpful assistant for Sarah.

User context:
- Name: Sarah Chen
- Role: Product Manager at TechCorp
- Preferences: Prefers concise responses, uses Notion for notes
- Current project: Q1 roadmap planning
"""

Pros:

  • Zero infrastructure required
  • Full control over what context is included
  • Works immediately with any LLM

Cons:

  • Doesn't scale beyond a handful of users
  • Context quickly becomes stale
  • Manual updates required for every change
  • No dynamic retrieval based on query

Best for: Personal assistants with a single user, prototypes, demos.

Approach 2: RAG on User Data

Use retrieval-augmented generation on user documents, messages, or activity logs.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Index user data
vectorstore = Chroma.from_documents(
    user_documents,
    embedding=OpenAIEmbeddings()
)

# Retrieve relevant context at query time
relevant_context = vectorstore.similarity_search(
    query=user_query,
    k=5
)

Pros:

  • Scales to large amounts of user data
  • Retrieves contextually relevant information
  • Can include conversation history, documents, notes

Cons:

  • Retrieval quality varies significantly
  • Embedding + search latency on every request
  • Raw documents often need summarization
  • Privacy concerns with storing raw user data

Best for: Knowledge-heavy applications, document-based assistants, when you have unstructured user data you want to leverage.

Approach 3: Dedicated Context API

Use a specialized API that maintains structured user context and serves it on demand.

import requests

# Fetch user context
response = requests.get(
    "https://api.dytto.app/v1/context",
    headers={"Authorization": f"Bearer {token}"}
)
context = response.json()

# Context is structured, summarized, ready to inject
user_summary = context["narrative_summary"]
preferences = context["preferences"]
recent_activity = context["recent_patterns"]

Pros:

  • Pre-structured and summarized context
  • Single API call, low latency
  • Handles context aggregation across sources
  • Built-in privacy and consent management

Cons:

  • Requires integration with context provider
  • Depends on user opting into context collection
  • Monthly API costs at scale

Best for: Production applications with many users, when you need rich behavioral context, when privacy/consent is important.


Tutorial: Building a Context-Aware Agent

Let's build a practical example. We'll create an AI assistant that uses the Dytto Context API to personalize responses.

Step 1: Set Up Your Environment

pip install openai requests python-dotenv

Create a .env file:

OPENAI_API_KEY=sk-...
DYTTO_API_KEY=dyt_...

Step 2: Create the Context Fetcher

First, build a function that retrieves user context:

# context_service.py
import os
import requests
from typing import Optional
from dataclasses import dataclass

@dataclass
class UserContext:
    """Structured user context from the API."""
    user_id: str
    summary: str
    preferences: dict
    recent_patterns: list
    location: Optional[dict] = None
    
    def to_prompt_string(self) -> str:
        """Format context for injection into system prompt."""
        lines = [
            "## User Context",
            f"Summary: {self.summary}",
            "",
            "### Preferences:",
        ]
        for key, value in self.preferences.items():
            lines.append(f"- {key}: {value}")
        
        if self.recent_patterns:
            lines.append("")
            lines.append("### Recent Patterns:")
            for pattern in self.recent_patterns[:5]:  # Limit to 5
                lines.append(f"- {pattern}")
        
        if self.location:
            lines.append("")
            lines.append(f"### Current Location: {self.location.get('city', 'Unknown')}")
        
        return "\n".join(lines)


class ContextService:
    """Service for fetching user context from Dytto API."""
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.dytto.app/v1"
    
    def get_context(self, user_id: str) -> Optional[UserContext]:
        """Fetch context for a specific user."""
        try:
            response = requests.get(
                f"{self.base_url}/context",
                headers={
                    "Authorization": f"Bearer {self.api_key}",
                    "X-User-ID": user_id
                },
                timeout=5.0
            )
            response.raise_for_status()
            data = response.json()
            
            return UserContext(
                user_id=user_id,
                summary=data.get("narrative_summary", ""),
                preferences=data.get("preferences", {}),
                recent_patterns=data.get("patterns", []),
                location=data.get("location")
            )
        except requests.RequestException as e:
            print(f"Context fetch failed: {e}")
            return None
    
    def search_context(self, user_id: str, query: str) -> list:
        """Search user context for specific topics."""
        try:
            response = requests.post(
                f"{self.base_url}/context/search",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"user_id": user_id, "query": query},
                timeout=5.0
            )
            response.raise_for_status()
            return response.json().get("results", [])
        except requests.RequestException:
            return []

Step 3: Build the Context-Aware Agent

Now create an agent that uses this context:

# agent.py
import os
from openai import OpenAI
from context_service import ContextService, UserContext
from dotenv import load_dotenv

load_dotenv()

client = OpenAI()
context_service = ContextService(os.getenv("DYTTO_API_KEY"))

def build_system_prompt(base_prompt: str, user_context: UserContext = None) -> str:
    """Build system prompt with optional user context."""
    if not user_context:
        return base_prompt
    
    return f"""{base_prompt}

{user_context.to_prompt_string()}

Use this context to personalize your responses. Reference specific details 
when relevant, but don't be creepy about it—integrate naturally.
"""

def run_agent(user_id: str, message: str) -> str:
    """Run the agent with user context."""
    
    # Fetch user context
    user_context = context_service.get_context(user_id)
    
    # Build the system prompt
    base_prompt = """You are a helpful personal assistant. You help users 
    with tasks, answer questions, and provide recommendations. Be concise 
    but thorough."""
    
    system_prompt = build_system_prompt(base_prompt, user_context)
    
    # Call the LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message}
        ],
        temperature=0.7
    )
    
    return response.choices[0].message.content


# Example usage
if __name__ == "__main__":
    user_id = "user_12345"
    
    # Without context, this would prompt for clarification
    response = run_agent(user_id, "What should I have for dinner tonight?")
    print(response)

Step 4: Add Context as a Tool

For more dynamic context retrieval, expose context search as a tool the agent can call:

# agent_with_tools.py
import json
import os
from openai import OpenAI
from context_service import ContextService

client = OpenAI()
context_service = ContextService(os.getenv("DYTTO_API_KEY"))

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_user_context",
            "description": "Search the user's personal context for specific information. Use this when you need to know something about the user that isn't in the current conversation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What to search for in user context (e.g., 'food preferences', 'work schedule', 'recent purchases')"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_user_location",
            "description": "Get the user's current location if available.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": []
            }
        }
    }
]

def handle_tool_call(user_id: str, tool_name: str, arguments: dict) -> str:
    """Execute a tool call and return the result."""
    if tool_name == "search_user_context":
        results = context_service.search_context(user_id, arguments["query"])
        if results:
            return json.dumps({"found": True, "results": results})
        return json.dumps({"found": False, "message": "No relevant context found"})
    
    elif tool_name == "get_user_location":
        context = context_service.get_context(user_id)
        if context and context.location:
            return json.dumps(context.location)
        return json.dumps({"available": False})
    
    return json.dumps({"error": "Unknown tool"})


def run_agent_with_tools(user_id: str, message: str) -> str:
    """Run agent with tool-calling capability."""
    
    # Start with base context in system prompt
    user_context = context_service.get_context(user_id)
    
    messages = [
        {
            "role": "system", 
            "content": f"""You are a helpful personal assistant with access to the user's 
            personal context. You can search their context for specific information.
            
            Current user summary: {user_context.summary if user_context else 'No context available'}
            """
        },
        {"role": "user", "content": message}
    ]
    
    # Initial call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
    assistant_message = response.choices[0].message
    
    # Handle tool calls if any
    while assistant_message.tool_calls:
        messages.append(assistant_message)
        
        for tool_call in assistant_message.tool_calls:
            result = handle_tool_call(
                user_id,
                tool_call.function.name,
                json.loads(tool_call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
        
        # Get next response
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        assistant_message = response.choices[0].message
    
    return assistant_message.content

LangChain Integration

If you're using LangChain, here's how to integrate user context:

# langchain_agent.py
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain.tools import Tool, StructuredTool
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pydantic import BaseModel, Field

class ContextSearchInput(BaseModel):
    query: str = Field(description="What to search for in user context")

def create_context_tools(context_service, user_id: str):
    """Create LangChain tools for context access."""
    
    def search_context(query: str) -> str:
        results = context_service.search_context(user_id, query)
        if results:
            return f"Found relevant context: {results}"
        return "No relevant context found for this query."
    
    def get_full_context(_tool_input: str = "") -> str:
        # Tool.from_function passes the tool's string input; it's unused here.
        context = context_service.get_context(user_id)
        if context:
            return context.to_prompt_string()
        return "No user context available."
    
    return [
        StructuredTool.from_function(
            func=search_context,
            name="search_user_context",
            description="Search for specific information in the user's personal context",
            args_schema=ContextSearchInput
        ),
        Tool.from_function(
            func=get_full_context,
            name="get_user_profile",
            description="Get the user's full context profile including preferences and patterns"
        )
    ]


def create_context_aware_agent(context_service, user_id: str):
    """Create a LangChain agent with context access."""
    
    # Fetch initial context for system prompt
    user_context = context_service.get_context(user_id)
    context_summary = user_context.summary if user_context else "No context available"
    
    # Create prompt with context
    prompt = ChatPromptTemplate.from_messages([
        ("system", f"""You are a helpful personal assistant. You have access to 
        the user's personal context and can search for specific information.
        
        Current user summary: {context_summary}
        
        Use the available tools to look up specific details when needed.
        Personalize your responses based on what you know about the user."""),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])
    
    # Create tools
    tools = create_context_tools(context_service, user_id)
    
    # Create agent
    llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
    agent = create_openai_tools_agent(llm, tools, prompt)
    
    return AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        handle_parsing_errors=True
    )


# Usage
if __name__ == "__main__":
    import os
    from context_service import ContextService
    
    context_service = ContextService(os.getenv("DYTTO_API_KEY"))
    agent = create_context_aware_agent(context_service, "user_12345")
    
    result = agent.invoke({"input": "What's a good restaurant for dinner tonight?"})
    print(result["output"])

Optimization Patterns

Pattern 1: Context Caching

Don't fetch context on every request. Cache it with a reasonable TTL:

from datetime import datetime, timedelta
import threading

class CachedContextService:
    def __init__(self, context_service, ttl_seconds: int = 300):
        self.context_service = context_service
        self.ttl_seconds = ttl_seconds
        self.cache = {}
        self.cache_times = {}
        self.lock = threading.Lock()
    
    def get_context(self, user_id: str):
        with self.lock:
            now = datetime.now()
            
            # Check cache
            if user_id in self.cache:
                cache_time = self.cache_times[user_id]
                if now - cache_time < timedelta(seconds=self.ttl_seconds):
                    return self.cache[user_id]
            
            # Fetch fresh
            context = self.context_service.get_context(user_id)
            if context:
                self.cache[user_id] = context
                self.cache_times[user_id] = now
            
            return context
    
    def invalidate(self, user_id: str):
        """Invalidate cache for a user (call when context changes)."""
        with self.lock:
            self.cache.pop(user_id, None)
            self.cache_times.pop(user_id, None)

Pattern 2: Lazy Context Loading

Only fetch detailed context when the agent actually needs it:

def run_with_lazy_context(user_id: str, message: str):
    """Start with minimal context, fetch more if needed.

    base_prompt and run_completion() are stand-ins for the system prompt
    and chat call from Step 3.
    """
    
    # Quick classification: does this query need personalization?
    classification = client.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap model for classification
        messages=[
            {"role": "system", "content": "Classify if this query needs user personalization. Reply YES or NO only."},
            {"role": "user", "content": message}
        ],
        max_tokens=10
    )
    
    needs_context = "YES" in classification.choices[0].message.content.upper()
    
    if needs_context:
        # Fetch full context
        user_context = context_service.get_context(user_id)
        system_prompt = build_system_prompt(base_prompt, user_context)
    else:
        # Generic response is fine
        system_prompt = base_prompt
    
    # Continue with appropriate prompt
    return run_completion(system_prompt, message)

Pattern 3: Context Summarization

If context is large, summarize it before injection:

def summarize_context_for_query(context: UserContext, query: str) -> str:
    """Generate a query-relevant context summary."""
    
    full_context = context.to_prompt_string()
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": """Extract only the context relevant to 
            answering the user's query. Be concise—include only what's directly useful."""},
            {"role": "user", "content": f"Query: {query}\n\nFull context:\n{full_context}"}
        ],
        max_tokens=500
    )
    
    return response.choices[0].message.content

Testing Your Context Integration

Test 1: Context Injection Verification

Verify context actually influences responses:

def test_context_affects_response():
    """Test that context changes the agent's response."""
    
    # Same query, different contexts
    query = "What should I eat for lunch?"
    
    # Context A: Health-focused user
    context_a = UserContext(
        user_id="test_a",
        summary="Health-conscious professional tracking macros",
        preferences={"diet": "high protein", "cuisine": "Mediterranean"},
        recent_patterns=["Meal prepped chicken salads this week"]
    )
    
    # Context B: Convenience-focused user
    context_b = UserContext(
        user_id="test_b", 
        summary="Busy parent with limited lunch breaks",
        preferences={"priority": "fast", "budget": "under $15"},
        recent_patterns=["Ordered DoorDash 3x this week"]
    )
    
    response_a = run_with_context(query, context_a)
    response_b = run_with_context(query, context_b)
    
    # Responses should differ based on context
    assert "salad" in response_a.lower() or "protein" in response_a.lower()
    assert "quick" in response_b.lower() or "delivery" in response_b.lower()

Test 2: Graceful Degradation

Ensure the agent works when context is unavailable:

def test_graceful_degradation():
    """Test agent handles missing context gracefully."""
    
    # Simulate context service failure
    response = run_agent("nonexistent_user", "What's the weather like?")
    
    # Should still provide a reasonable response
    assert response is not None
    assert "error" not in response.lower()
    # Should ask for clarification or provide general help
    assert "location" in response.lower() or "weather" in response.lower()

Test 3: Privacy Boundaries

Verify context doesn't leak between users:

def test_context_isolation():
    """Test that one user's context doesn't leak to another."""
    
    # User A has specific preferences
    response_a = run_agent("user_a", "What's my favorite restaurant?")
    
    # User B asks the same question
    response_b = run_agent("user_b", "What's my favorite restaurant?")
    
    # Responses should be different or both ask for clarification
    # They should NOT share User A's preferences
    assert response_a != response_b or "don't have information" in response_b.lower()

Common Pitfalls and Solutions

Pitfall 1: Over-Personalization

Problem: The agent references context unnecessarily, making responses feel intrusive.

User: What time is it?
Bad: Based on your preference for concise responses and your location in Cambridge 
     where you typically have meetings at 3pm, it's currently 2:47 PM EST.
Good: It's 2:47 PM EST.

Solution: Only reference context when it adds value. Classify queries and skip personalization for generic requests.

Pitfall 2: Stale Context

Problem: Context doesn't reflect recent changes.

Solution:

  • Set appropriate cache TTLs (5-15 minutes for active sessions)
  • Implement webhooks for context updates
  • Allow users to trigger context refresh
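The webhook approach can be sketched as a small handler that invalidates a cached entry whenever the provider signals a change. The payload shape (`event`, `user_id`) is an assumed example for illustration, not a documented Dytto webhook format:

```python
import json

class InvalidatableCache:
    """Minimal stand-in for the CachedContextService from Pattern 1."""
    def __init__(self):
        self._store = {}

    def put(self, user_id, context):
        self._store[user_id] = context

    def get(self, user_id):
        return self._store.get(user_id)

    def invalidate(self, user_id):
        self._store.pop(user_id, None)

cache = InvalidatableCache()

def handle_context_webhook(raw_body: str) -> bool:
    """Drop the cached context when the provider reports an update."""
    payload = json.loads(raw_body)
    if payload.get("event") == "context.updated" and "user_id" in payload:
        cache.invalidate(payload["user_id"])
        return True
    return False
```

The next `get_context` call for that user then fetches fresh context instead of serving the stale cache entry.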

Pitfall 3: Context Window Bloat

Problem: Injecting full context on every request wastes tokens.

Solution:

  • Summarize context relevant to the current query
  • Use tools for on-demand context lookup instead of upfront injection
  • Cache summarizations for common query types
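Caching summarizations by query *type* rather than by raw query lets similar requests reuse one summary. A sketch, where the keyword-based `query_type()` bucketing is a placeholder assumption and the cached function stubs out the real `summarize_context_for_query()` call:

```python
from functools import lru_cache

FOOD_WORDS = {"eat", "lunch", "dinner", "restaurant", "food", "meal"}

def query_type(query: str) -> str:
    """Bucket a query into a coarse type for cache keying."""
    words = set(query.lower().replace("?", "").split())
    return "food" if words & FOOD_WORDS else "general"

@lru_cache(maxsize=1024)
def cached_summary(user_id: str, qtype: str) -> str:
    # In real code this would call summarize_context_for_query();
    # the stub only shows the caching shape.
    return f"summary:{user_id}:{qtype}"

def summary_for(user_id: str, query: str) -> str:
    return cached_summary(user_id, query_type(query))
```

"What should I eat for lunch?" and "cheap lunch nearby" both hit the same `(user, "food")` cache entry, so the expensive summarization runs once per bucket.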

Pitfall 4: Single Point of Failure

Problem: Context API downtime breaks your agent.

Solution:

  • Implement fallback to generic responses
  • Cache last-known-good context
  • Set aggressive timeouts (2-5 seconds)
  • Alert on elevated failure rates
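These fallbacks can be combined in a thin wrapper: serve fresh context when the API responds, and fall back to the last successful fetch when it doesn't. A sketch, assuming the wrapped service returns None on failure the way ContextService from Step 2 does:

```python
from datetime import datetime

class ResilientContextService:
    """Serves last-known-good context when a fresh fetch fails."""

    def __init__(self, context_service):
        self.context_service = context_service
        self._last_good = {}  # user_id -> (context, fetched_at)

    def get_context(self, user_id):
        context = self.context_service.get_context(user_id)
        if context is not None:
            self._last_good[user_id] = (context, datetime.now())
            return context
        # Degrade to stale-but-usable context rather than failing outright.
        cached = self._last_good.get(user_id)
        return cached[0] if cached else None
```

If there is no last-known-good entry either, the wrapper returns None and the agent falls through to its generic, context-free behavior.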

Frequently Asked Questions

What's the difference between user context and conversation history?

Conversation history is what was said in the current (or recent) conversation. It's sequential, exact, and grows with each message.

User context is a structured summary of who the user is across all interactions and external signals. It includes preferences, patterns, facts, and metadata that may never have been explicitly stated.

Think of it this way: conversation history is the transcript; user context is what the agent "knows" about the user.

How much context should I inject into the system prompt?

Start small—200-400 tokens of summarized context is usually enough. If you need more, use on-demand tools instead of upfront injection. The goal is signal density: every token should add value.
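One simple way to enforce that budget is to keep context lines until a token cap is reached. The words-as-tokens count below is a rough approximation; a production version would use the model's actual tokenizer:

```python
def trim_context_to_budget(lines, max_tokens=300):
    """Keep leading context lines until a rough token budget is reached.

    Assumes lines are ordered by importance (summary first), as
    UserContext.to_prompt_string() produces them.
    """
    kept, used = [], 0
    for line in lines:
        cost = len(line.split())  # crude proxy for token count
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)
```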

Can I use RAG instead of a context API?

Yes, and sometimes you should. RAG excels when you have unstructured user data (documents, emails, notes) and need to retrieve specific passages. Context APIs excel when you need structured, pre-summarized context that's ready to inject without additional processing.

Many production systems use both: a context API for user profile/preferences and RAG for document retrieval.

How do I handle privacy and consent?

This is critical. Best practices:

  1. Explicit consent: Users should opt into context collection
  2. Granular controls: Let users choose what context to share
  3. Data minimization: Only collect context you'll actually use
  4. Transparency: Show users what context you have about them
  5. Right to delete: Provide clear mechanisms for users to delete their context

What if the context is wrong?

Build correction mechanisms:

  • Let users explicitly correct facts ("No, I'm vegetarian now")
  • Weight recent signals over old ones
  • Implement confidence scoring for context items
  • Allow agents to ask for clarification when context seems contradictory
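Recency weighting and confidence scoring can be combined into a single resolution rule: decay each signal's confidence with age and keep the highest-scoring value. The 30-day half-life below is an illustrative assumption, not a recommendation:

```python
from datetime import datetime, timedelta

def resolve_fact(signals, half_life_days=30):
    """Pick the winning value for a contested fact.

    Each signal is (value, confidence, observed_at), confidence in [0, 1].
    """
    now = datetime.now()

    def score(signal):
        _value, confidence, observed_at = signal
        age_days = (now - observed_at).total_seconds() / 86400
        # Exponential decay: half the weight every half_life_days.
        return confidence * (0.5 ** (age_days / half_life_days))

    return max(signals, key=score)[0]
```

Under this rule, a fresh "I'm vegetarian now" correction outweighs a year-old, higher-confidence "omnivore" signal.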

How often should context refresh?

Depends on your use case:

  • Real-time apps (chat): Cache for 5-15 minutes
  • Async apps (email): Refresh on each request
  • Background jobs: Fetch fresh context at job start

For context that changes frequently (location), consider streaming updates or websockets.

Can I build my own context layer instead of using an API?

Absolutely. Here's what you'll need:

  1. Data ingestion: Collect signals from various sources
  2. Storage: Time-series database for events, key-value for preferences
  3. Aggregation: Reduce raw signals into structured context
  4. Summarization: LLM-powered synthesis of context into prompts
  5. Serving: Low-latency API for context retrieval
  6. Privacy: Consent management, encryption, audit logging

Building this yourself gives you full control but requires significant engineering investment. Context APIs let you ship faster.
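To give a feel for steps 1-3, here's a toy in-memory sketch that ingests events, stores them, and aggregates them into the structured shape this tutorial's UserContext expects. Summarization, serving, and privacy (steps 4-6) are deliberately omitted:

```python
from collections import defaultdict
from datetime import datetime

class MiniContextStore:
    """Toy context layer: event ingestion, storage, and aggregation."""

    def __init__(self):
        self._events = defaultdict(list)       # user_id -> event log
        self._preferences = defaultdict(dict)  # user_id -> key/value prefs

    def ingest(self, user_id, event_type, payload):
        """Record a raw signal; promote preference events to structured state."""
        self._events[user_id].append(
            {"type": event_type, "payload": payload, "at": datetime.now()}
        )
        if event_type == "preference_set":
            self._preferences[user_id][payload["key"]] = payload["value"]

    def aggregate(self, user_id, limit=5):
        """Reduce the raw log into injectable context."""
        recent = self._events[user_id][-limit:]
        return {
            "preferences": dict(self._preferences[user_id]),
            "recent_patterns": [e["type"] for e in recent],
        }
```

Even this toy version shows why the real thing is a project: durability, cross-source aggregation, and consent handling all live behind these two methods.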


Conclusion

Adding user context transforms AI agents from stateless question-answerers into personalized assistants that know their users. The key insights:

  1. Conversation history alone isn't enough. You need structured context that persists across sessions and includes signals the user never explicitly stated.

  2. Start simple. Manual context in system prompts works for single-user apps. Graduate to context APIs as you scale.

  3. Context as tools, not just prompts. Exposing context search as a tool gives agents flexibility to fetch what they need, when they need it.

  4. Cache aggressively. Context doesn't change on every request. Cache it with appropriate TTLs.

  5. Degrade gracefully. Your agent should work—less well, but still work—when context is unavailable.

The difference between a demo and a product is often the quality of context. Users don't notice when personalization works—it just feels natural. But they definitely notice when the agent asks for their name for the hundredth time.

Ready to build context-aware agents? Check out the Dytto API documentation to get started with user context in minutes.


This tutorial is part of our series on building production-ready AI agents. For more on context engineering, see our guide to context-aware AI agents.
