Shubhra Pokhariya

After 7 Next.js 16 Caching Bugs, I Stopped Guessing and Built a System

Silent manual tag mismatch fixes

There's a specific feeling you get after your third production caching incident.

It's not panic. It's worse than panic. It's that quiet realisation that you fixed the last bug correctly, and you still have no idea where the next one is hiding.

After the 7 silent caching bugs post, a pattern kept coming up in the comments. Everyone understood what was breaking, but not what the correct setup should look like. This is that answer.

Not theory. The actual system I use now, after getting burned enough times to understand why each piece exists.

The first problem: tag strings written from memory in different files

Most silent cache bugs in Next.js 16 start here. Not because anyone is being careless. Because there is nothing stopping two people from writing two different strings that should be the same.

Developer A writes the data function on Monday:

async function getProducts() {
  'use cache'
  cacheTag('product-list')
  return db.query('SELECT * FROM products')
}

Developer B writes the mutation two weeks later in a different file:

export async function createProduct(data: ProductData) {
  await db.query('INSERT INTO products ...', [...])
  revalidateTag('products', 'max')
}

product-list and products. Two different strings. Zero errors from TypeScript, zero warnings from Next.js. The product list never refreshes after a new product is created and nobody knows why until someone reads both files at the same time.

This is Bug 3 from the previous post. I kept hitting variations of it across different parts of the codebase even after I knew about it, because knowing about a problem and having something that prevents it are not the same thing.
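To see why nothing complains, here is a toy tag-indexed cache. It is a sketch for illustration only: ToyTagCache and its methods are hypothetical names, not Next.js internals. The failure has the same shape though. Invalidating a tag that nobody attached is a perfectly valid no-op.

```typescript
// Toy tag-indexed cache (hypothetical, not how Next.js stores entries).
class ToyTagCache {
  private entries = new Map<string, { value: string; tags: Set<string> }>()

  set(key: string, value: string, tags: string[]): void {
    this.entries.set(key, { value, tags: new Set(tags) })
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value
  }

  // Drops every entry carrying the tag and reports how many were dropped.
  invalidate(tag: string): number {
    let dropped = 0
    this.entries.forEach((entry, key) => {
      if (entry.tags.has(tag)) {
        this.entries.delete(key)
        dropped += 1
      }
    })
    return dropped
  }
}

const cache = new ToyTagCache()

// Developer A tags the cached result with 'product-list'.
cache.set('getProducts', 'old list', ['product-list'])

// Developer B invalidates 'products'. No entry matches, and nothing throws.
const droppedByWrongTag = cache.invalidate('products') // 0
const stillCached = cache.get('getProducts') // 'old list'

// Only the exact same string clears the entry.
const droppedByRightTag = cache.invalidate('product-list') // 1
```

The wrong tag returns 0 and moves on. That is the whole bug: a mismatch is indistinguishable from "there was nothing to invalidate".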

The fix is one file that owns all your tag strings:

// lib/tags.ts
export const tags = {
  product: (id: string | number) => `product-${id}`,
  user: (id: string | number) => `user-${id}`,
  productList: 'products',
  userList: 'users',
  navigation: 'navigation',
} as const

Now both files import from tags. A typo is a TypeScript compile error. The string mismatch bug cannot happen. Everyone on the team gets autocomplete instead of muscle memory.

// data function
cacheTag(tags.productList)

// mutation — same import, same string, guaranteed
revalidateTag(tags.productList, 'max')

This is the single change that removed the most bugs from my codebase, by far. Set it up before you write a single cached function on a new project.
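Here is a small standalone sketch of what the registry buys you. The cacheTag and revalidateTag below are recording stand-ins I wrote so the snippet runs without Next.js (in an app they come from next/cache), and the registry is trimmed to two entries.

```typescript
// Same shape as lib/tags.ts, trimmed for the example.
const tags = {
  product: (id: string | number) => `product-${id}`,
  productList: 'products',
} as const

// Stand-ins for next/cache so the sketch runs on its own (assumption:
// the real functions are imported from 'next/cache' in an app).
const tagged: string[] = []
const revalidated: string[] = []
const cacheTag = (tag: string): void => {
  tagged.push(tag)
}
const revalidateTag = (tag: string, _profile: string): void => {
  revalidated.push(tag)
}

cacheTag(tags.productList) // data function side
revalidateTag(tags.productList, 'max') // mutation side, same string by construction

// A misspelt key is rejected by the compiler instead of silently missing the cache.
// @ts-expect-error Property 'productLists' does not exist on the registry
const typo: unknown = tags.productLists
```

The last line is the point. The equivalent mistake with raw strings compiles cleanly and only shows up as stale data.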

The second problem: three different places to invalidate cache, three different correct APIs

This is where I see the most confusion, including in my own early code. The API you reach for depends entirely on where you're calling from and what the user needs to see. Get it wrong and you either throw at runtime or silently give someone stale data.

Here is how I think about it now:

Inside a Server Action where the user who just made a change needs to see it immediately:

'use server'
import { updateTag, revalidateTag } from 'next/cache'
import { tags } from '@/lib/tags' // assumed path alias for lib/tags.ts

export async function updateProductPrice(id: string, newPrice: number) {
  await db.query('UPDATE products SET price = $1 WHERE id = $2', [newPrice, id])

  updateTag(tags.product(id))               // acting user sees fresh data right away
  revalidateTag(tags.product(id), 'max')    // everyone else gets SWR update
  revalidateTag(tags.productList, 'max')    // product list refreshes too
}

The order matters here. updateTag runs first. This is what prevents the admin from clicking save, navigating back to the product page, and seeing the old price. That looks like the save failed. It causes people to click save again. updateTag fixes it.

updateTag is Server Actions only. Calling it anywhere else throws at runtime.

Inside a Route Handler (webhooks, external services):

// app/api/webhooks/stripe/route.ts
import { revalidateTag } from 'next/cache'
import { tags } from '@/lib/tags' // assumed path alias for lib/tags.ts

export async function POST(req: Request) {
  const event = await parseStripeWebhook(req)
  if (event.type === 'price.updated') {
    revalidateTag(tags.productList, { expire: 0 })
  }
  return new Response('ok', { status: 200 })
}

updateTag is not available in Route Handlers. { expire: 0 } is the equivalent for immediate expiry here. This is what you want for webhooks where a third-party system just told you something changed.

Background updates where a brief stale window is fine:

revalidateTag(tags.productList, 'max')

Stale-while-revalidate. Users get a fast cached response while fresh data loads behind the scenes. For most content this is exactly right. An admin publishes a new post and readers might see the old list for a moment; that is usually acceptable.

Here is the whole thing as a decision table:

| Situation | Use |
| --- | --- |
| User edits their own data and needs to see it immediately | updateTag, then revalidateTag |
| Webhook fires, third-party service needs immediate consistency | revalidateTag(tag, { expire: 0 }) |
| Background refresh, brief stale window is acceptable | revalidateTag(tag, 'max') |

Write this down somewhere your team can see it. Saves a lot of "why is the user seeing old data after saving" conversations.
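If you would rather have the table in code than on a wiki page, one way is a single helper that takes the calling context. This is a minimal sketch, not an official API: invalidate, InvalidationContext, and CacheApi are names I made up, and the cache functions are injected so the snippet runs without Next.js. In a real app you would pass the functions from next/cache.

```typescript
// Where the invalidation is being called from.
type InvalidationContext = 'server-action' | 'route-handler' | 'background'

// Narrow stand-in for the two next/cache functions used in this post.
// Assumption: in a real app these come from 'next/cache'.
interface CacheApi {
  updateTag(tag: string): void
  revalidateTag(tag: string, profile: 'max' | { expire: number }): void
}

// Hypothetical helper that encodes the decision table in one place.
function invalidate(api: CacheApi, context: InvalidationContext, tag: string): void {
  switch (context) {
    case 'server-action':
      api.updateTag(tag) // acting user first
      api.revalidateTag(tag, 'max') // then the SWR broadcast
      break
    case 'route-handler':
      api.revalidateTag(tag, { expire: 0 }) // immediate expiry
      break
    case 'background':
      api.revalidateTag(tag, 'max') // brief stale window is fine
      break
  }
}

// Recording fake so the resulting calls can be inspected.
const calls: string[] = []
const fakeApi: CacheApi = {
  updateTag: (tag) => {
    calls.push(`updateTag(${tag})`)
  },
  revalidateTag: (tag, profile) => {
    calls.push(`revalidateTag(${tag}, ${JSON.stringify(profile)})`)
  },
}

invalidate(fakeApi, 'server-action', 'product-1')
invalidate(fakeApi, 'route-handler', 'products')
invalidate(fakeApi, 'background', 'products')
```

The context becomes an explicit argument, so the choice of API is made once in the helper instead of being re-decided in every mutation.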

The third problem: the PPR split is invisible by default

With cacheComponents: true, Next.js uses Partial Prerendering. Your page has a static shell that renders instantly from cache and dynamic holes that stream in after. The performance win is real. The problem is that what ends up in the shell versus what ends up as a dynamic hole is not obvious until something behaves wrong.

One component with cacheLife('seconds') gets quietly excluded from the static shell. A cookies() call inside a cached scope throws at build time with "Uncached data was accessed outside of Suspense" and gives you no component name, no file path, nothing useful. A dynamic component added without a Suspense boundary pushes part of the page out of the shell.

The way I stopped guessing about this is to document intent at the component level:

// components/UserCart.tsx
export const boundary = {
  name: 'UserCart',
  isDynamic: true,
  reason: 'Reads user session cookie — different per user',
}

Then in the page that uses it, I reference that intent explicitly:

export default async function ProductPage({
  params,
}: {
  params: Promise<{ id: string }>
}) {
  const { id } = await params

  // UserCart is dynamic — must be in Suspense or it breaks the static shell
  return (
    <div>
      <ProductDetails id={id} />       {/* cached, part of static shell */}
      <RelatedProducts id={id} />      {/* cached, part of static shell */}
      <Suspense fallback={<CartSkeleton />}>
        <UserCart productId={id} />    {/* dynamic, streams in after */}
      </Suspense>
    </div>
  )
}

The cached components look like this:

async function ProductDetails({ id }: { id: string }) {
  'use cache'
  cacheLife('hours')
  cacheTag(tags.product(id))

  const product = await db.query(
    'SELECT * FROM products WHERE id = $1', [id]
  )
  return <article>...</article>
}

The dynamic component has no 'use cache' at all:

async function UserCart({ productId }: { productId: string }) {
  const cookieStore = await cookies()
  const userId = cookieStore.get('user-id')?.value
  const cartItem = await db.query(
    'SELECT * FROM cart WHERE user_id = $1 AND product_id = $2',
    [userId, productId]
  )
  return cartItem ? <InCartButton /> : <AddToCartButton />
}

Static shell hits the user instantly. Cart streams in after. The split is intentional and documented, not whatever survived the algorithm.

One more thing on this: never call cookies(), headers(), or draftMode() inside a 'use cache' scope. Read them outside, pass the values as props. Those values become part of the cache key automatically — different users produce separate cache entries without you doing anything extra.
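A toy model helps make the "arguments become the cache key" part concrete. This is not how Next.js implements 'use cache'. The cachedByArgs wrapper is a hypothetical stand-in that keys results by the serialized arguments, which is enough to show why passing the user id in as a value gives each user a separate entry.

```typescript
// Toy stand-in for a cached function: results are keyed by the serialized arguments.
// Hypothetical helper for illustration, not Next.js internals.
function cachedByArgs<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R {
  const store = new Map<string, R>()
  return (...args: A): R => {
    const key = JSON.stringify(args)
    if (!store.has(key)) {
      store.set(key, fn(...args))
    }
    return store.get(key) as R
  }
}

let runs = 0

// The user id is read outside (from the cookie) and passed in as a plain argument.
const getCartLabel = cachedByArgs((userId: string, productId: string): string => {
  runs += 1
  return `cart:${userId}:${productId}`
})

const first = getCartLabel('user-1', 'p-9')
const repeat = getCartLabel('user-1', 'p-9') // same arguments, served from the toy cache
const other = getCartLabel('user-2', 'p-9') // different user, separate entry
```

The underlying function runs twice for three calls: once per distinct argument list. Nothing inside the function had to know about cookies.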

The fourth problem: cold starts hurt the first visitor after every deploy

This one is separate from the bugs but connects to the same goal. Your caching is set up correctly. You deploy. The first visitor hits the page and every cached function runs from scratch sequentially because the cache is empty.

PPR is fast once the cache is warm. That first request after a deploy is not.

The fix is React's cache() for request-level deduplication. Fire all your data fetches in parallel at the top of the page before any component needs them:

import { cache, Suspense } from 'react'
import { getProductById, getRelatedProducts } from '@/lib/data'

const prefetch = {
  product: cache(getProductById),
  related: cache(getRelatedProducts),
}

export default async function ProductPage({
  params,
}: {
  params: Promise<{ id: string }>
}) {
  const { id } = await params

  void prefetch.product(id)
  void prefetch.related(id)

  return (
    <div>
      <ProductDetails id={id} />
      <RelatedProducts id={id} />
      <Suspense fallback={<CartSkeleton />}>
        <UserCart productId={id} />
      </Suspense>
    </div>
  )
}

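Both fetches fire immediately in parallel. Child components get deduplicated results from React's cache() only when they call the same wrapped functions (prefetch.product and prefetch.related here); wrapping getProductById again in another file creates a separate memo that shares nothing with this one. If a prefetch fails it fails silently. It is an optimisation, not a requirement. The actual fetching in child components still works.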

The distinction worth knowing: React's cache() deduplicates within a single request. 'use cache' persists across requests. You need both, they solve different problems.
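The request-level half is easy to get subtly wrong, so here is a toy version. requestMemo is a hypothetical stand-in I wrote for React's cache(), keyed on a single string argument, so the snippet runs without React. It shows two things: the prefetch calls start both loads without awaiting, and a later caller only reuses the in-flight promise if it goes through the same wrapper.

```typescript
// Toy stand-in for React's cache(): one promise per id, for as long as `store` lives.
// Hypothetical helper; the real cache() is scoped to a single server request.
function requestMemo<R>(fn: (id: string) => Promise<R>): (id: string) => Promise<R> {
  const store = new Map<string, Promise<R>>()
  return (id: string): Promise<R> => {
    let hit = store.get(id)
    if (!hit) {
      hit = fn(id)
      store.set(id, hit)
    }
    return hit
  }
}

// Records every time an underlying load actually starts.
const started: string[] = []

async function loadProduct(id: string): Promise<string> {
  started.push(`product:${id}`)
  await Promise.resolve() // pretend this is a database round trip
  return `product ${id}`
}

async function loadRelated(id: string): Promise<string> {
  started.push(`related:${id}`)
  await Promise.resolve()
  return `related to ${id}`
}

const getProduct = requestMemo(loadProduct)
const getRelated = requestMemo(loadRelated)

// Top of the page: start both loads without awaiting either.
void getProduct('42')
void getRelated('42')

// Later, a child component asks through the SAME wrapper and reuses the promise.
const fromChild = getProduct('42')

// A second wrapper around the same function has its own store and starts a new load.
const otherWrapper = requestMemo(loadProduct)
const notShared = otherWrapper('42')
```

This is why I export the wrapped functions from one module and import them everywhere. As far as I understand React's documented behaviour, calling cache() twice on the same function gives you two memoized functions that do not share results, which is the same trap as the second wrapper above.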

What the full system looks like

One tags file. Everyone imports from it. A typo is a compile error, not a production incident.

A clear decision for invalidation context: Server Action with a user waiting for their change uses updateTag first, then revalidateTag. Route Handler uses revalidateTag with { expire: 0 }. Background broadcast uses revalidateTag with 'max'.

Dynamic components documented and always wrapped in Suspense. The static shell is explicit, not accidental.

Prefetch fired in parallel at the top of heavy pages so the first visitor after a deploy is not the one paying the cold start cost.

None of this is complicated once you have it written down. The hard part was figuring out that I needed all of it, which took enough production bugs to see the pattern.

The earlier posts in this series cover how I got here. Building the debugger when development was a black box. The seven bugs that compile and break silently. The upgrade breaks that the build never warns you about.

If you want the full migration reference, I wrote that at shubhra.dev/tutorials/nextjs-16-cache-components.

I kept hitting these edge cases often enough that I eventually pulled the whole system into a single utility. The Cache Pro Kit is the production version of everything in this post. Type-safe tag registry, safeRevalidate that blocks the single-arg call at compile time, serverActionInvalidate that enforces the correct order, routeHandlerInvalidate so updateTag in a Route Handler is impossible. One file, drop into lib/.

What does your caching setup look like right now? Have you hit any of these in your own projects?

Top comments (23)

Syed Ahmer Shah

This is exactly what the Next.js community needs right now. Honestly, 'stopped guessing and built a system' should be the official slogan for dealing with App Router caching.

The way you broke down the mental model—especially the interplay between the Request Memoization and the Data Cache—makes a notoriously opaque topic actually click. It’s one thing to read the official docs, but seeing how someone else tamed the beast in production is incredibly valuable. Saving this for the next time revalidatePath decides to ghost me. Thanks for putting this system together!

Shubhra Pokhariya

"revalidatePath ghosting you" is such a real feeling. Hopefully the tag system saves you from that next time, once you start thinking in tags it just feels more intentional and precise.

Glad the breakdown was useful, appreciate you reading it.

99Tools

Really useful article. The centralized tag registry idea is simple but can prevent a lot of frustrating cache invalidation bugs. I also liked the clear explanation of when to use updateTag vs revalidateTag—that's something many Next.js developers struggle with. Thanks for sharing practical solutions instead of just highlighting the problems.

Shubhra Pokhariya

Yeah that tag registry was the one that removed the most headaches for me. It’s such a small change but it stops a whole class of bugs before they even happen.

The updateTag vs revalidateTag confusion took me a while too. They look similar at first, but once I tied them to “who needs fresh data right now” vs “who can wait”, it started to click.

Appreciate you taking the time to read it. I mostly wrote it because I kept hitting the same issues over and over, so good to know it’s useful for others too.

Web Developer Hyper

Next.js 16 cache problems seem to have many patterns and are hard to figure out, but your post and tool will help solve them. Good work! 👍

Shubhra Pokhariya

Appreciate it, glad it was useful. That’s exactly what I kept running into too, same patterns, just hard to spot until something breaks.

Andy Stewart

Next.js 16's caching mechanism is a black box with too many implicit traps. Your approach of unifying tag definitions and leveraging parallel prefetching is pure engineering. Turning fragile, silent runtime bugs into compile-time type safety is the exact way to terminate production cache killers!

Shubhra Pokhariya

Yeah, that’s exactly the pain point. The hardest part is the implicit behavior you only discover in production.
Moving those failure cases into something the type system can catch early was the main goal.

Dan

The part about tag strings being written from memory in different files is painfully real. I like the "one tags file" fix because it turns a silent runtime bug into a typescript/autocomplete problem.

One extra place I'd be careful with this in saas apps is entitlement and identity state, not just product/data lists.

Things like:

  • current plan
  • hasAccess
  • role/admin permissions
  • team membership
  • cancellation state
  • account deletion or restore state

Those can become much nastier than a stale product list because the UI may look correct until someone crosses a billing or permission boundary.

Your updatetag-then-revalidate order is exactly what these need too: the person who just upgraded, downgraded, or had a role changed has to see the new state on the very next render, while everyone else gets the SWR update. The test I'd add to the system is to prove that after every auth/billing/admin mutation. Upgrade, downgrade, cancel, role change, team removal, admin edit. If any of those still read cached state, the cache bug becomes a support or billing bug instead of a performance bug.

Shubhra Pokhariya

The billing and permissions angle is something I hadn't thought to document explicitly. Was mostly thinking about product data when writing this, but you're right that entitlement state is a different category entirely.

Stale product list is annoying. Stale hasAccess or plan state is someone getting access they shouldn't, or locked out of something they paid for. That goes to support fast, and billing tickets are the worst kind to handle.

The invalidation pattern is already there in serverActionInvalidate but I hadn't thought to list out billing and auth mutations explicitly as things to verify. Upgrade, downgrade, cancel, role change, team removal, all treated as cache-critical. Only think to write that down after you've had the "user cancelled but still sees pro features" incident once.

Good addition, it makes the system stronger.

Nasif Sid

Really useful breakdown. The tag mismatch example is probably the most relatable part for me because it is exactly the kind of issue that looks small in code but becomes painful in production.

I like the idea of treating cache tags like shared constants instead of strings people write from memory. That single change makes the setup much safer, especially when multiple developers are touching data fetching and mutations in different files.

The decision table for invalidation is also helpful. updateTag, revalidateTag(tag, { expire: 0 }), and revalidateTag(tag, 'max') can be confusing if the team does not clearly define when to use each one.

Caching should not depend on memory or guessing. It needs structure, naming rules, and team-level conventions.

The biggest takeaway for me is that Next.js caching is powerful, but without a system, it can create silent bugs that are hard to trace. Great practical post.

Shubhra Pokhariya

Tag mismatch is probably the one that hurts the most because it looks totally fine in code review. Two different strings, zero errors, and you only realize something is wrong when users start seeing stale data in production. Moving to constants basically removes that problem entirely.

The decision table took me a while to get right too. Those three APIs look similar on the surface, but where you're calling from changes everything. Writing it down like that just removes a lot of “wait, which one do I use here” moments for the team.

“Caching should not depend on memory or guessing” is honestly the best one-line summary of the whole thing. Glad it was useful.

Mudassir Khan

the updateTag before revalidateTag ordering in Server Actions is the piece i've seen most teams miss. we shipped with revalidateTag only for about two weeks before a product manager spotted the pattern: save button, navigate back, see old data. reports it as "save not working". it takes longer to triage than it should because the mutation succeeded and the cache did update — just not fast enough for the user who made the change.

the tags.ts singleton is the one i'm stealing. we've been enforcing this through code review which is the worst possible mechanism for it. compile time errors beat review comments on every axis.

quick question: do you type the tag registry values, or just use as const and let inference handle the rest?

Shubhra Pokhariya

Yeah that ordering is exactly where it starts breaking in a way users actually notice.

For the tags registry I just stick with as const and let TS infer it. Something like:

export const tags = {
  product: (id: string | number) => `product-${id}`,
  productList: 'products',
  userList: 'users',
} as const

I pull types from it when needed, but most of the time inference is enough. Didn’t feel worth adding more structure on top of that.

Curious how it works out for your team once you switch over.

leob

I believe this is why so many people are getting fed up with Next.js and are looking for alternatives ...

Shubhra Pokhariya

I get why people are reacting that way.

For me it wasn’t the features, it was how often things fail silently. Build passes, CI is green, and you still ship something that behaves wrong under real conditions.

I still like working with Next.js, but that part caught me off guard more than once.

Once I stopped treating them as one-off bugs and put some structure around it, things got a lot more stable. But the path to that isn’t very obvious from the docs right now.

leob • Edited

Maybe try to offer some advice or code to the Next.js team! Who knows, maybe they'll offer to make you a core maintainer :-)

P.S. of course it's (in most cases) not an option to "simply" migrate an app (especially a bigger one) to a different framework - and those other frameworks might have their own quirks/issues ...

Shubhra Pokhariya

Haha, that would be something 😄

Yeah, totally agree on migrations. Most of the time it’s not really practical, especially on bigger apps. Every framework comes with its own set of tradeoffs anyway.
