Kilo Spark

How to Monitor Every API Call Your LangChain Agent Makes

Your LangChain agent is making API calls you never see. Here's how to watch them.


You built a LangChain agent. It works. It calls OpenAI, fetches from APIs, queries vector stores. But do you actually know what it's sending? What headers, what payloads, what tokens it's burning through on each request?

Most developers don't. LangChain abstracts away the HTTP layer — that's the point. But when something breaks, when costs spike, or when you're debugging a hallucination, you need to see the raw traffic.

In this tutorial, I'll show you how to use toran.sh to intercept and inspect every outbound API call from a LangChain agent — in real time, with zero code changes beyond swapping a URL.

The Problem: LangChain's Black Box

Here's a typical LangChain agent:

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import ChatOpenAI

# Assumes OPENAI_API_KEY and SERPAPI_API_KEY are set in the environment
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.invoke("What is the population of Tokyo divided by the population of Paris?")

Even with verbose=True, you see the chain-of-thought — not the actual HTTP requests. You don't see:

  • How many API calls were made to OpenAI
  • The exact prompt tokens sent in each request
  • Response latencies for each call
  • Whether retry logic kicked in
  • What your SerpAPI calls actually looked like

LangSmith exists for tracing chains, but it operates at the framework level. Sometimes you need to see the raw HTTP — the actual bytes on the wire.

The Fix: Route Through toran.sh

toran.sh works by giving you a unique URL that proxies your API calls. You swap your base URL, and every request flows through toran, where you can inspect it on the dashboard in real time.

No SDK. No proxy configuration. No code beyond changing one URL string.

Step 1: Get Your toran.sh Endpoint

Head to toran.sh and create a channel. You'll get a slug like my-langchain-debug. No signup required for basic use.

Your proxy URL becomes: https://my-langchain-debug.toran.sh
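
Before wiring it into LangChain, you can sanity-check the channel with a raw request. A minimal sketch, assuming the my-langchain-debug slug from above and an OPENAI_API_KEY in your environment:

import os

import requests

# Same request shape the OpenAI SDK sends, just aimed at the proxy;
# toran forwards it to api.openai.com and logs it on your dashboard
resp = requests.post(
    "https://my-langchain-debug.toran.sh/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.status_code, resp.json()["choices"][0]["message"]["content"])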

Step 2: Point LangChain at toran

Here's the key change — swap the base_url on your OpenAI client:

from langchain_openai import ChatOpenAI

# Before: calls api.openai.com directly
# llm = ChatOpenAI(model="gpt-4o")

# After: routes through toran.sh
llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://my-langchain-debug.toran.sh/v1",
)
Enter fullscreen mode Exit fullscreen mode

That's it. One line. LangChain sends all OpenAI requests through toran, which forwards them to OpenAI and logs everything.
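
Authentication is unchanged: the client still puts your API key in the Authorization header, and toran relays the request to OpenAI (which also means your channel URL is worth treating as sensitive). A quick sketch — the key is read from OPENAI_API_KEY as usual, or passed explicitly:

import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://my-langchain-debug.toran.sh/v1",
    api_key=os.environ["OPENAI_API_KEY"],  # same key, same header, new route
)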

Step 3: Run Your Agent and Watch

Open your toran.sh dashboard in one window. Run your agent in another:

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import ChatOpenAI

# Same agent as before, now routed through the toran channel
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    base_url="https://my-langchain-debug.toran.sh/v1",
)

tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

result = agent.invoke("What is the population of Tokyo divided by the population of Paris?")
print(result)

Now check your toran dashboard. You'll see every request in real time:

  • POST /v1/chat/completions — the initial reasoning call
  • POST /v1/chat/completions — after the agent gets SerpAPI results and thinks again
  • POST /v1/chat/completions — the math step
  • POST /v1/chat/completions — the final answer synthesis

Click any request to see the full payload: the system prompt, the conversation history that LangChain assembled, the function definitions, token counts, and the complete response.
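
If you want to corroborate the dashboard from inside your code, LangChain ships an OpenAI callback that tallies calls and tokens per run. A quick sketch, reusing the agent built above:

from langchain_community.callbacks import get_openai_callback

# Counts every OpenAI call made inside the context manager
with get_openai_callback() as cb:
    agent.invoke("What is the population of Tokyo divided by the population of Paris?")

print(f"Requests: {cb.successful_requests}")
print(f"Tokens: {cb.prompt_tokens} prompt / {cb.completion_tokens} completion")
print(f"Estimated cost: ${cb.total_cost:.4f}")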

What You'll Discover

Once you start watching, you'll notice things:

1. Agents make more calls than you think

A simple ReAct agent answering one question might make 4-6 OpenAI calls. An agent with multiple tools can easily hit 10+. Each one costs tokens.

2. The prompts are massive

LangChain injects tool descriptions, formatting instructions, and conversation history into every call. Your "simple question" might be wrapped in 2,000 tokens of scaffolding.

3. Retries happen silently

If OpenAI returns a 429 or 500, the client retries. Without toran, you'd never know. With it, you see the failed request, the delay, and the retry.

4. Response times vary wildly

Some calls return in 500ms. Others take 8 seconds. The dashboard shows you exactly which calls are slow, so you can optimize the right thing. Both the retry policy and the timeout are tunable on the client, as sketched below.
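
A minimal sketch of those knobs, using ChatOpenAI's max_retries and timeout parameters (the values here are illustrative, not recommendations):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://my-langchain-debug.toran.sh/v1",
    timeout=30,      # fail calls that hang past 30 seconds
    max_retries=2,   # retry rate-limit and server errors at most twice
)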

Monitoring Multiple Services

LangChain agents often call more than just OpenAI. If you're using other LLM providers or APIs that support base URL configuration, you can route them through toran too:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Monitor OpenAI calls
openai_llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://my-langchain-debug.toran.sh/v1",
)

# Monitor Anthropic calls the same way: ChatAnthropic also accepts a
# base_url (my-anthropic-debug is a hypothetical second channel that
# forwards to api.anthropic.com)
anthropic_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    base_url="https://my-anthropic-debug.toran.sh",
)

# For any other HTTP-based tool or API, the same trick applies:
# swap the base URL to route through your toran channel

Every call from every service shows up in one dashboard.

Production Use: Keep It Running

toran.sh isn't just for debugging. Keep it in your staging or production pipeline:

import os

from langchain_openai import ChatOpenAI

# Toggle monitoring via environment variable: set TORAN_BASE_URL to your
# channel URL to monitor, unset it to call api.openai.com directly
base_url = os.getenv("TORAN_BASE_URL", "https://api.openai.com/v1")

llm = ChatOpenAI(
    model="gpt-4o",
    base_url=base_url,
)

Set TORAN_BASE_URL in staging to monitor. Remove it in production for direct calls. Or leave it on — the proxy adds minimal latency.

Pricing

toran.sh has a free tier that requires no signup — perfect for debugging sessions. If you need persistent logging and team features:

  • Free — No signup, basic real-time inspection
  • Pro ($29/mo) — Extended history, team access
  • Pro Plus ($99/mo) — Full retention, priority support

Wrap-Up

LangChain is powerful, but abstraction comes at a cost: you lose visibility into what's actually happening at the network level. toran.sh gives it back with a one-line change.

Next time your agent burns through $5 of tokens on a single question, or takes 30 seconds to respond, or returns something bizarre — don't guess. Watch the calls.

👉 toran.sh — start monitoring in 30 seconds.
