How to Monitor Every API Call Your LangChain Agent Makes
Your LangChain agent is making API calls you never see. Here's how to watch them.
You built a LangChain agent. It works. It calls OpenAI, fetches from APIs, queries vector stores. But do you actually know what it's sending? What headers, what payloads, how many tokens it's burning through on each request?
Most developers don't. LangChain abstracts away the HTTP layer — that's the point. But when something breaks, when costs spike, or when you're debugging a hallucination, you need to see the raw traffic.
In this tutorial, I'll show you how to use toran.sh to intercept and inspect every outbound API call from a LangChain agent — in real time, with zero code changes beyond swapping a URL.
The Problem: LangChain's Black Box
Here's a typical LangChain agent:
from langchain.agents import initialize_agent, load_tools
from langchain_openai import ChatOpenAI

# Assumes OPENAI_API_KEY and SERPAPI_API_KEY are set in your environment;
# the serpapi tool also requires the google-search-results package.
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.invoke("What is the population of Tokyo divided by the population of Paris?")
Even with verbose=True, you see the chain-of-thought — not the actual HTTP requests. You don't see:
- How many API calls were made to OpenAI
- The exact prompt tokens sent in each request
- Response latencies for each call
- Whether retry logic kicked in
- What your SerpAPI calls actually looked like
LangSmith exists for tracing chains, but it operates at the framework level. Sometimes you need to see the raw HTTP — the actual bytes on the wire.
The Fix: Route Through toran.sh
toran.sh works by giving you a unique URL that proxies your API calls. You swap your base URL, and every request flows through toran's dashboard, where you can inspect it in real time.
No SDK. No proxy configuration. No code beyond changing one URL string.
Step 1: Get Your toran.sh Endpoint
Head to toran.sh and create a channel. You'll get a slug like my-langchain-debug. No signup required for basic use.
Your proxy URL becomes: https://my-langchain-debug.toran.sh
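Before wiring up LangChain, you can sanity-check the channel with a plain HTTP request. This is a minimal sketch, assuming the channel forwards /v1/* paths on to api.openai.com the same way it forwards the chat calls below, and that you have requests (or any HTTP client) installed:

import os
import requests

# Hypothetical smoke test: list models through the proxy.
# Assumes OPENAI_API_KEY is set and the channel forwards /v1/* to OpenAI.
resp = requests.get(
    "https://my-langchain-debug.toran.sh/v1/models",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
)
print(resp.status_code)  # expect 200; the request should also appear in the dashboard

If that request shows up in your toran dashboard, the channel is live and you can move on to the LangChain change.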
Step 2: Point LangChain at toran
Here's the key change — swap the base_url on your OpenAI client:
from langchain_openai import ChatOpenAI
# Before: calls api.openai.com directly
# llm = ChatOpenAI(model="gpt-4o")
# After: routes through toran.sh
llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://my-langchain-debug.toran.sh/v1",
)
That's it. One line. LangChain sends all OpenAI requests through toran, which forwards them to OpenAI and logs everything. Your OPENAI_API_KEY is read from the environment exactly as before; it rides along in the Authorization header of each forwarded request.
Step 3: Run Your Agent and Watch
Open your toran.sh dashboard in one window. Run your agent in another:
from langchain.agents import initialize_agent, load_tools
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    base_url="https://my-langchain-debug.toran.sh/v1",
)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
result = agent.invoke("What is the population of Tokyo divided by the population of Paris?")
print(result)
Now check your toran dashboard. You'll see every request in real time:
- POST /v1/chat/completions — the initial reasoning call
- POST /v1/chat/completions — after the agent gets SerpAPI results and thinks again
- POST /v1/chat/completions — the math step
- POST /v1/chat/completions — the final answer synthesis
Click any request to see the full payload: the system prompt, the conversation history that LangChain assembled, the function definitions, token counts, and the complete response.
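If you want to cross-check those numbers against what LangChain itself reports, the framework ships a token-counting context manager. A minimal sketch, reusing the agent from the script above; the import path assumes langchain_community is installed, and note that get_openai_callback only counts the OpenAI calls, not SerpAPI traffic:

from langchain_community.callbacks import get_openai_callback

# Cross-check LangChain's own accounting against the raw requests in toran
with get_openai_callback() as cb:
    agent.invoke("What is the population of Tokyo divided by the population of Paris?")

print(cb.successful_requests, cb.total_tokens, cb.total_cost)

The request count and token totals here should line up with what you see in the dashboard.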
What You'll Discover
Once you start watching, you'll notice things:
1. Agents make more calls than you think
A simple ReAct agent answering one question might make 4-6 OpenAI calls. An agent with multiple tools can easily hit 10+. Each one costs tokens.
2. The prompts are massive
LangChain injects tool descriptions, formatting instructions, and conversation history into every call. Your "simple question" might be wrapped in 2,000 tokens of scaffolding.
3. Retries happen silently
If OpenAI returns a 429 or 500, the client retries. Without toran, you'd never know. With it, you see the failed request, the delay, and the retry.
4. Response times vary wildly
Some calls return in 500ms. Others take 8 seconds. The dashboard shows you exactly which calls are slow, so you can optimize the right thing. The sketch after this list shows one way to tighten retry and timeout settings once you've spotted the culprits.
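Neither retries nor timeouts are toran features; they are standard ChatOpenAI parameters. Here's a minimal sketch of tightening them once the dashboard has shown you where the time goes (the values are illustrative, not recommendations):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    base_url="https://my-langchain-debug.toran.sh/v1",
    max_retries=1,  # fail fast instead of retrying silently (the default is 2)
    timeout=30,     # seconds per request; slow calls surface as errors instead of hanging
)

With toran still in the path, you can watch whether the tighter settings actually change the traffic pattern.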
Monitoring Multiple Services
LangChain agents often call more than just OpenAI. If you're using other LLM providers or APIs that support base URL configuration, you can route them through toran too:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Monitor OpenAI calls
openai_llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://my-langchain-debug.toran.sh/v1",
)

# Monitor Anthropic calls the same way. The slug below is a hypothetical second
# channel for Anthropic traffic; depending on your langchain_anthropic version
# the parameter is base_url or anthropic_api_url.
anthropic_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    base_url="https://my-anthropic-debug.toran.sh",
)

# For any HTTP-based tool or API, swap the base URL
# to route through your toran channel
Every call from every service shows up in one dashboard.
Production Use: Keep It Running
toran.sh isn't just for debugging. Keep it in your staging or production pipeline:
import os
# Toggle monitoring via environment variable
base_url = os.getenv("TORAN_BASE_URL", "https://api.openai.com/v1")
llm = ChatOpenAI(
    model="gpt-4o",
    base_url=base_url,
)
Set TORAN_BASE_URL in staging to monitor. Remove it in production for direct calls. Or leave it on — the proxy adds minimal latency.
Pricing
toran.sh has a free tier that requires no signup — perfect for debugging sessions. If you need persistent logging and team features:
- Free — No signup, basic real-time inspection
- Pro ($29/mo) — Extended history, team access
- Pro Plus ($99/mo) — Full retention, priority support
Wrap-Up
LangChain is powerful, but abstraction comes at a cost: you lose visibility into what's actually happening at the network level. toran.sh gives it back with a one-line change.
Next time your agent burns through $5 of tokens on a single question, or takes 30 seconds to respond, or returns something bizarre — don't guess. Watch the calls.
👉 toran.sh — start monitoring in 30 seconds.