I use NotebookLM as an attack layer before I publish anything.
Not for summaries. Not for research. I upload my draft, let the audio overview run, and listen to two AI hosts interrogate my argument. If the narration exposes a gap — a claim that doesn't land, a section that wanders — I go back and fix it before the article goes live.
It works. Better than re-reading. Better than asking a colleague. The distance of hearing your argument spoken back at you by something that doesn't share your assumptions is genuinely useful.
One thing is missing.
The voice. My voice.
What NotebookLM already does
The hard problems are solved.
NotebookLM ingests PDFs, Google Docs, audio files, YouTube links, and web URLs. It grounds its narration in your sources and is built to stay within them rather than hallucinate beyond them. It produces coherent two-host audio that actually sounds like a conversation, not a text-to-speech dump. It maintains source fidelity across long documents.
These are not small engineering problems. Source grounding alone eliminates an entire class of failure that plagues every generic AI summarizer. The coherence of the narration — the way the two hosts disagree, interrupt, and redirect — required significant work to build.
Most people use it to summarize research papers. Some use it for meeting notes. I use it to stress-test arguments before they're public.
What nobody is talking about: Google solved the hard part already.
The one feature that changes everything
Personalized voice.
Not a voice clone of a podcast host. Not a generic British narrator. Your voice. Trained on your recordings, matched to your cadence, deployed on your content.
The moment Google ships that, three things happen simultaneously:
Audio overviews stop sounding like someone else reading your work. They start sounding like you, presenting your argument, in your register.
The use case expands from "summarize this" to "publish this."
And every startup selling AI-powered personal audio becomes a feature comparison in a product Google already owns.
The countdown timers
Let's name them.
ElevenLabs built a genuinely impressive voice cloning product. The quality is real. The API is well-documented. Developers use it. But ElevenLabs' core value proposition — "your voice, anywhere" — is exactly what Google would ship as a NotebookLM toggle. Not a new product. A settings page.
Podcastle sells AI-powered podcast creation with voice cloning and audio cleanup. It's a prosumer tool aimed at creators who want professional audio without a studio. It's also a collection of features that sit inside what NotebookLM already does structurally, minus the voice layer.
Wondercraft is an AI audio platform for turning written content into audio. Good product. Direct overlap with NotebookLM's architecture. One product update away from redundancy.
Descript is the most defensible of the group — it has a video editing layer, a timeline, a collaboration workflow. It isn't purely an audio generation tool. But its AI voice layer, "Overdub," is exactly the feature that would become noise the day Google ships personal voice to NotebookLM.
None of them are bad products. That's not the argument.
The argument is that their moat is a gap Google hasn't prioritized filling. That's a countdown timer, not a competitive advantage.
The Google tax
This pattern has a name. Developers who've been around long enough know it.
Google Workspace has a task manager. It's called Tasks. It's fine. Todoist, TickTick, and Things 3 all exist because Tasks is fine and not great, and "fine" left a gap big enough to build companies in.
Google Calendar handles scheduling. Calendly became a $3 billion company because Calendar doesn't do one specific thing — let other people book time in your calendar without an email thread. One feature. Entire company.
Google Keep exists. Notion exists anyway. The overlap is real; the gap was real enough to matter.
Some of these survive. Calendly survived because the booking workflow is genuinely distinct from what Calendar does natively. Notion survived because it expanded beyond the gap — documents, databases, wikis — before Google closed it.
The AI audio startups don't have that runway. They're not building adjacent to NotebookLM. They're building inside it. Their entire value proposition sits in the gap between what NotebookLM already does and the one feature Google hasn't shipped.
Why Google probably won't ship it next quarter
Let's be honest about the timeline.
Google is not aggressively pursuing this. NotebookLM is a Google Labs product — impressive, genuinely useful, and clearly not the company's primary focus. The team is small relative to the broader Gemini push. Personal voice cloning has real regulatory and ethical baggage — deepfakes, consent, liability — that slows any large company down in ways a startup can ignore.
The AI audio startups might have 18 months. Maybe 24.
But here's the thing about building on a platform gap: the clock isn't running on whether Google ships it. The clock is running on whether the market believes Google will ship it. The moment that belief takes hold — a Google I/O demo, a leak, a product page — the fundraising environment for AI audio personalization startups changes overnight.
They're not being hunted. They're being ignored.
Ignored by Google is its own kind of death sentence.
What this actually means for developers
If you're building in this space, the question isn't "can we build a better voice cloning product than ElevenLabs?" The question is: "What does this product do that would survive a Google I/O keynote?"
The companies that survive the Google tax survive it by expanding beyond the gap before it closes. Calendly survived by becoming a scheduling platform — reminders, routing, integrations — not just a booking link. Notion survived by becoming a workspace, not just a note-taking tool.
AI audio startups that survive will be the ones that embed voice into a workflow Google doesn't own. Video production pipelines. Podcast distribution. Live audio. Language learning. The ones that build deep into a workflow Google has no reason to touch.
The ones building "NotebookLM but with your voice" are the countdown timers.
I still use NotebookLM every time I publish. I still listen to two AI hosts interrogate my arguments while I cook.
I just do it in someone else's voice.
For now.
Top comments (18)
The NotebookLM-as-adversary trick I stole immediately. On moats though, the missing feature is never the moat, because platforms close those gaps on a whim. What survives is the workflow you built around the gap: deciding which critiques to act on and which to ignore is your call, and that doesn't ship in a Google update.
The judgment layer is what I didn't name explicitly. NotebookLM surfaces the gaps. That part is automatable. But deciding which gaps are real versus which ones reflect the tool misreading the frame is the editorial call that doesn't transfer. An AI host flagging a "weak argument" might just be encountering a deliberately provocative claim. Only the writer knows the difference.
That's also why the workflow compounds in a way the feature can't. Every decision you make about which critiques to act on sharpens your judgment about your own arguments. The tool stays the same. The editor using it gets better.
The 'editor gets better, tool stays the same' line is the whole moat in one sentence. The gap NotebookLM can't close is knowing when a flagged weakness is actually a deliberate choice. That's the part that compounds, and it's why I stopped trying to automate the last call.
Stopping the attempt is the unlock. Most of the frustration people have with AI writing tools is the fighting stage: trying to get the tool to make the last call so they don't have to. The ones who get real value out of it have quietly made the same decision you made. The tool handles everything up to the call. The call stays yours. Once that's settled it stops feeling like a limitation and starts feeling like the design.
The call stays yours is the cleanest way I've heard it put. The moment you stop fighting the tool to make the decision and just let it do everything up to that point, it goes from frustrating to genuinely useful.
Glad it landed. Go build something worth stress-testing.
Strong piece, but I think your own examples argue against your headline variable.
Calendly and Notion didn't survive because they had a feature Google lacked. They survived because they own a workflow with switching costs — the booking loop, the workspace graph. The feature was the wedge; the workflow was the moat. By the time Google could copy the feature, the lock-in was elsewhere.
So "personalized voice" is the wrong thing to watch. The startups that die aren't the ones missing voice — they're the ones whose entire product is a generation step. Voice clone, audio overview, summary: these are all single transforms. Anything that reduces to one transform Google can ship is a countdown timer, regardless of how good the transform is.
The reframe I'd offer: the question isn't "what does this product do that survives a Google I/O keynote?" It's "is this product a feature or a system?" A feature is one transform. A system owns the loop around the transform and treats the model as a swappable component. The day a better model ships, a feature gets obsoleted and a system just upgrades its backend.
That's also why the generation layer is the worst place to build a moat and the best place to commoditize. The durable position is the layer that governs, verifies, and routes around the generation — the part Google has no incentive to build because it's specific to a workflow they don't own. Same conclusion you reach at the end, but I'd put the emphasis there from the start rather than on voice.
The feature/system split is the sharper cut. What I was circling with "does this survive a keynote?" is really asking whether the product is a loop or a transform. A transform is stateless. You put something in, get something out, nothing accumulates. The moat has to live somewhere else.
The Calendly case makes this precise: the switching cost isn't in the booking UI; it's distributed across every invitee who learned the link, every client who integrated it. Google can copy the feature; it can't ship the network externality.
Same test for AI audio: a voice clone that generates and forgets nothing is a transform. A product that accumulates correction patterns, audience signals, episode-level feedback over time — that's building something a settings toggle can't replicate. The question isn't voice. It's what persists after the generation step.
Which layer do you think is most defensible right now — the routing/verification layer you mention, or the data accumulation layer?
A verification/routing layer with nothing persisting behind it is just a transform with extra steps. Stateless gate in, gate out. Google ships that as a quality toggle. So routing alone isn't the moat.
But accumulation alone isn't either. A pile of correction patterns and audience signals that nothing acts on is inventory, and inventory decays - preferences drift, the model underneath changes, last year's signal is noise. Raw accumulated data is a liability with storage costs until something converts it into better output.
So the defensible thing isn't either layer. It's the coupling: accumulation gives you data nobody else has, and the verification/routing layer is the pump that turns that data into improved next-generation output. That feedback is what compounds the switching cost - same structure as your Calendly network externality, except the externality is internal. Every correction makes the next generation better for this user, which a settings toggle starting from zero can't replicate.
If you force me to pick a single layer, accumulation, because it's the harder asset to copy. But the test I'd actually apply: is the accumulated data being resolved faster than it's created? If corrections pile up faster than the loop consumes them, you don't have a moat - you have a growing backlog wearing a moat costume. The defensibility is in the resolution rate, not the pile.
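A rough way to make the resolution-rate test concrete is to count, per period, the corrections users create against the corrections the loop actually consumes, then watch the backlog. This is only a sketch: the `Period` type, the weekly buckets, and the numbers are hypothetical, not data from any product.

```python
from dataclasses import dataclass


@dataclass
class Period:
    created: int   # corrections the user made in this period
    resolved: int  # corrections the loop actually folded into later output


def resolution_report(periods: list[Period]) -> tuple[float, int]:
    """Return (resolved / created over all periods, backlog left at the end)."""
    created = sum(p.created for p in periods)
    resolved = sum(p.resolved for p in periods)
    # With nothing created there is nothing to resolve, so treat the rate as 1.0.
    rate = resolved / created if created else 1.0
    return rate, created - resolved


# Hypothetical four weeks: corrections grow, loop capacity stays flat.
weeks = [Period(40, 30), Period(50, 30), Period(60, 30), Period(70, 30)]
rate, backlog = resolution_report(weeks)
print(f"resolution rate {rate:.2f}, backlog {backlog}")
```

A rate that stays below 1.0 while the backlog keeps growing is the "moat costume" case: the pile gets bigger, but less and less of it reaches the output.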
The resolution rate is the diagnostic I was missing. Accumulation without a pump is just technical debt with a personalization story on top, and you're right that the base model changing is the silent killer. A correction corpus built against one model doesn't transfer cleanly to whatever ships next quarter. The moat inverts without anyone noticing.
The Calendly externality holds precisely because it doesn't decay: a booking link in someone's calendar doesn't go stale. But correction patterns do. Which means the compounding only works if the loop is tight enough that the data stays ahead of model drift.
So the real question for any of these products: what's the half-life of their accumulated signal? If it's shorter than their release cycle, they're not compounding; they're treading water with extra steps.
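The half-life comparison can be put into numbers with a small sketch. It assumes accumulated signal decays exponentially, which is a modeling assumption of this example rather than anything measured, and the 90-day release cycle and the half-lives are invented for illustration.

```python
def signal_remaining(elapsed_days: float, half_life_days: float) -> float:
    """Fraction of accumulated signal still useful, assuming exponential decay."""
    return 0.5 ** (elapsed_days / half_life_days)


release_cycle_days = 90  # hypothetical model release cadence

for half_life in (30, 90, 365):
    left = signal_remaining(release_cycle_days, half_life)
    print(f"half-life {half_life:>3}d: {left:.1%} of signal left at next release")
```

Under these assumptions, a signal whose half-life is shorter than the release cycle has lost most of its value by the time the next model ships, while one with a much longer half-life arrives largely intact.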
Half-life vs release cycle is the right diagnostic, and I'd push it one step into a design lever: half-life isn't a fixed property of the signal, it's set by what layer you choose to accumulate at.
Persist model-specific artifacts - "this clone's outputs, corrected against this model's quirks" - and the half-life is one release cycle by construction. The corpus is entangled with the substrate that churns.
Persist a layer up - what the user actually wanted, the intent behind the correction rather than the correction itself - and it survives the swap, because intent doesn't care which model rendered it. "Make me sound less hedged" outlives any particular voice engine. The new model just re-renders against the same accumulated intent.
So the defensive move isn't tightening the loop until it outruns drift - that's a race you eventually lose as release cycles compress. It's accumulating above the line where churn happens. Disposable generation, durable intent. The products treading water are the ones storing outputs; the ones compounding are storing intent and treating every model release as a free upgrade to the renderer.
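"Disposable generation, durable intent" can be sketched as a store that keeps only what the user asked for and regenerates output through whichever renderer is current. Everything here is hypothetical: `IntentStore`, the `Renderer` signature, and the two toy renderers stand in for real models and are not any vendor's API.

```python
from typing import Callable

# A renderer stands in for whatever model generates output this quarter.
Renderer = Callable[[str, list[str]], str]


class IntentStore:
    """Keeps what the user asked for, not what any model produced."""

    def __init__(self) -> None:
        self.intents: list[str] = []

    def record(self, intent: str) -> None:
        # Store each intent once; no model output is ever persisted here.
        if intent not in self.intents:
            self.intents.append(intent)

    def render(self, draft: str, renderer: Renderer) -> str:
        # Output is disposable: regenerate it from the stored intent each time.
        return renderer(draft, self.intents)


def renderer_v1(draft: str, intents: list[str]) -> str:
    return f"[v1 | {', '.join(intents)}] {draft}"


def renderer_v2(draft: str, intents: list[str]) -> str:
    return f"[v2 | {', '.join(intents)}] {draft}"


store = IntentStore()
store.record("sound less hedged")
store.record("shorter sentences")

old = store.render("My argument.", renderer_v1)
new = store.render("My argument.", renderer_v2)
```

Swapping `renderer_v1` for `renderer_v2` changes the output but leaves the store untouched, which is the sense in which a model release becomes an upgrade to the renderer rather than a reset of the accumulated signal.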
Which loops back to your Calendly point: the booking link doesn't decay because it's stored at the layer of "this person wants to meet me," not "this is what the UI looked like in 2019." Same principle. Accumulate the thing that doesn't move.
Accumulate the thing that doesn't move. That's the whole principle in six words.
The output/intent split also reframes what "personalization" actually means. Most products personalize at the wrong layer. They remember what you produced, not what you were trying to do. That's why they feel personalized until a model update and then feel broken. The drift isn't in the user. It's in the stored artifact being entangled with a substrate that churned.
The Calendly closer lands because it confirms the principle holds outside AI entirely. "This person wants to meet me" is intent. It doesn't care what the calendar UI looks like. The products that last store the invariant, not the render.
Which makes the design question simple to state and hard to execute: can you identify the layer in your product where user intent lives, separate from how it gets rendered? Most teams never ask it.
Counterintuitive take: if Google boosts a feature, that may sharpen the field. A giant handling the boring bits frees true innovators to chase tougher problems and carve niches the platform won’t reach.
The platform-as-floor argument holds when the platform has no interest in what's above it. AWS commoditized infrastructure and stayed there; it had no reason to build your app. Google ships voice and owns the content distribution layer, the search index, the podcast platform, the assistant. The floor rises, but so does the ceiling. The innovators chasing tougher problems are doing it inside a building Google also owns.
The sharpening happens but the field it sharpens is smaller than it looks.
NotebookLM as an 'attack layer' is a really good mental model. Using AI to expose your own gaps before publishing is a 10x productivity move for solo writers. Curious how the 'my voice' framing will land with Google — this kind of personal-vocabulary tooling is exactly where category-defining startups tend to come from.
The attack layer framing only works because the distance is real: you're hearing your argument through something that has no stake in whether it's right. That's harder to replicate with a human editor who knows what you meant to say.
On the "my voice" as startup wedge: the history here is tricky. Personal vocabulary tooling tends to either get absorbed by the platform or stay niche forever. The ones that escape both outcomes usually found a workflow Google had no reason to own. Voice alone isn't that — it needs something around it that accumulates.
Great piece. The NotebookLM example was really interesting. I hadn't thought about using it that way before publishing, but it makes a lot of sense.
Can it be used for learning from coding books?