Giorgi Kobaidze

Posted on May 24

NeuralHats: I Put Edward de Bono’s Six Thinking Hats on Local LLMs Using Gemma 4

#devchallenge #gemmachallenge #gemma

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

The Exact Moment It Clicked

Two weeks ago, when Jess posted about the Gemma 4 challenge, I got stuck in a decision-making loop. I didn't know which idea to build, and I had a few competing options.

Usually, when I think about a new project idea, I don't tell anyone until it is completely done. That is just how I like working. I speak with results, not with plans.

Because of that, I did not really have anyone to brainstorm with. I found myself wishing I had a room full of people I could talk through the decision with, to help me figure out which idea to actually commit to.

Then it suddenly reminded me of Edward de Bono's Six Thinking Hats, which I had read about five years ago. And I thought, damn, I wish I had a local AI system where I could actually run that kind of structured discussion.

Then I stopped...

Whoa, wait a second... Why am I wishing for this? Why don't I just build it RIGHT NOW?

And not just build it, but make it fully local on my own PC. No APIs, no cloud. Just something I can run instantly and talk to like a thinking room inside my machine!

That felt like the idea!

What if I could conjure six of those personas on demand, locally, for free, and let them argue about anything I wanted? And even participate in the discussion when needed?

So I built NeuralHats - a local web app where six AI personas, each running on its own tuned instance of Gemma 4, sit around a virtual debate table and argue about any topic you give them. They follow the canonical order. They actually disagree. The Blue Hat, the chairperson, decides when the debate is over. And when the dust settles, a seventh model, the Facilitator, writes a final report you can save as a PDF.

What it Actually Does

🎩 Six tuned personas debate any topic you choose
🔄 Up to 5 rounds, with the Blue Hat deciding when to wrap up via a CONTINUE / STOP token
🧑‍💼 You can join in, claim one of the hats and contribute your own perspective live
📡 Server-Sent Events stream each hat's turn the moment it's ready
📄 PDF report synthesised by a dedicated Facilitator model at the end
💯 100% local: no API keys, no cloud calls, no telemetry, no internet required after setup

Demo

Check out the video walkthrough:

Code

The Essentials

georgekobaidze / neuralhats

Six AI personas debate any topic using Edward de Bono's Six Thinking Hats framework. Powered by Gemma 4 via Ollama. Runs fully local.

NeuralHats

Six AI personas. One structured debate. Every perspective covered

🎥 Demo Video · 📖 Article · 🐛 Report a Bug

About

NeuralHats brings Edward de Bono's legendary Six Thinking Hats framework to life through AI. Instead of reading about the method, you experience it. Six distinct AI personas debate any topic you choose, each embodying a different mode of thinking.

Each hat is a fully independent AI model persona powered by Gemma 4 via Ollama, with its own system prompt, voice, and reasoning style:

Hat	Role	Focus
⚪ White	The Analyst	Pure facts, data, and objective information
⚫ Black	The Critic	Risks, flaws, and devil's advocacy
🟢 Green	The Creative	Bold ideas, lateral thinking, alternatives
🔴 Red	The Feeler	Emotions, gut instinct, raw reaction
🟡 Yellow

…

View on GitHub

To run it yourself:

git clone https://github.com/georgekobaidze/neuralhats.git
cd neuralhats
./setup.sh   # or .\setup.ps1 on Windows
./start.sh   # or .\start.ps1 on Windows

That's it. The setup script pulls Gemma 4, creates the seven custom models, installs the Python and Node dependencies, and start boots the FastAPI backend and Vite frontend together. You'll be debating at http://localhost:5173 within minutes.

Architecture in one breath

React + Vite + Tailwind v4  ──HTTP/SSE──►  FastAPI (Python)  ──HTTP──►  Ollama  ──►  Gemma 4
                                                  │
                                                  └──► SQLite (aiosqlite, ON DELETE CASCADE)

Three layers, zero external services. The frontend is a single-page React app with a virtual debate table. The backend is a small FastAPI server with one main orchestrator and an SSE stream. The AI layer is seven custom Ollama models - six hats plus a Facilitator, all built from the same Gemma 4 base.

Let me walk you through the parts I'm most proud of.

One Base Model, Seven Personalities

Running seven separate copies of Gemma 4 would turn my GPU into lava. Instead, I used Ollama's Modelfile system to create seven lightweight aliases over the same base weights - each with its own temperature, top-p, and system prompt:

# backend/modelfiles/Modelfile.template
FROM {{BASE_MODEL}}

PARAMETER temperature {{TEMPERATURE}}
PARAMETER top_p {{TOP_P}}
PARAMETER num_ctx 8192

The setup script bakes in personality through parameters:

# setup.ps1
$HatParams = @{
    white       = @{ temp = "0.3";  top_p = "0.9"  }   # cold facts
    black       = @{ temp = "0.4";  top_p = "0.9"  }   # cautious critic
    green       = @{ temp = "0.9";  top_p = "0.95" }   # creative chaos
    red         = @{ temp = "0.85"; top_p = "0.95" }   # raw emotion
    yellow      = @{ temp = "0.6";  top_p = "0.9"  }   # warm optimist
    blue        = @{ temp = "0.3";  top_p = "0.9"  }   # disciplined chair
    facilitator = @{ temp = "0.2";  top_p = "0.9"  }   # near-deterministic synthesis
}

Red Hat runs hot (0.85) - its job is intuition, gut feelings, vibes. White Hat runs cold (0.3) - its job is facts and only facts. Switching from one to another costs nothing because they all share weights in memory. Personality is just parameters and prompts.

The Blue Hat is a Controller, Not Just a Debater

The Blue Hat is the chairperson. Its prompt forces it to end every response with exactly one of two tokens on its own line:

End your response with exactly one of these two tokens on its own line:
    CONTINUE — if meaningful new ground can still be explored
    STOP — if consensus has been reached or no new insights are likely

The orchestrator parses that token to decide whether to start another round or end the debate. The LLM's output literally becomes control flow.

# backend/orchestrator.py
def _parse_blue_decision(blue_response: str) -> bool:
    """Return True if debate should CONTINUE, False if it should STOP.
    Scans lines in reverse to handle trailing text. Defaults to CONTINUE."""
    for line in reversed(blue_response.strip().splitlines()):
        token = line.strip().upper()
        if token == "CONTINUE":
            return True
        if token == "STOP":
            return False
    return True

That tiny function is the heartbeat of the whole loop. Reverse-scanning so trailing whitespace or quote marks don't break parsing. Safe default to CONTINUE because terminating early is worse than running one too many rounds.

The debate loop

Here's the actual orchestrator stripped down. Six hats, in order, up to five rounds, controlled by the Blue Hat's verdict:

HAT_ORDER = [HatColor.WHITE, HatColor.BLACK, HatColor.GREEN,
             HatColor.RED, HatColor.YELLOW, HatColor.BLUE]
MAX_ROUNDS = 5

for round_num in range(1, MAX_ROUNDS + 1):
    await _push({"type": "round_start", "round": round_num})

    for hat in HAT_ORDER:
        if hat == user_hat:
            content = await _await_user_turn(hat)   # human steps in
        else:
            await _push({"type": "hat_thinking", "hat": hat})
            messages = _build_messages(topic, conversation_history,
                                       hat=hat, round_num=round_num)
            content = await ollama_client.chat(messages, hat=hat, mode=mode)

        conversation_history.append({"hat": hat, "content": content,
                                     "round": round_num, "is_user": is_user})
        await _push({"type": "message", "hat": hat, "content": content, ...})

        if hat == HatColor.BLUE:
            blue_response = content

    if not _parse_blue_decision(blue_response) or round_num == MAX_ROUNDS:
        await _push({"type": "debate_end", "status": "completed"})
        return

That's almost the entire thing. No agent framework, no LangChain, no LangGraph. Just a loop, a queue, and a parsed token. The simplicity is the point.

Real-time streaming with SSE

Waiting 30 seconds for an entire debate to finish before showing anything would be unbearable. So I push each completed hat turn over Server-Sent Events the moment it's ready:

async def event_stream():
    while True:
        event = await _event_queue.get()
        yield event
        if event.get("type") in ("debate_end", "error"):
            # Hold the connection open briefly so the browser receives the
            # final event before the server closes.
            await asyncio.sleep(2)
            break

The frontend's EventSource reacts in real time, a new chat bubble appears as soon as each hat finishes thinking. Watching it unfold feels like watching a real panel discussion.

🎯 Structured conversation history beats flat transcripts

Earlier on I noticed the hats were ignoring each other. The Yellow Hat would give a generic positive answer that didn't actually respond to the Black Hat's specific risk. That was a context problem, they were getting a flat blob of text and skimming it.

So I restructured the history: separated previous rounds from current round so far, surfaced the most recent Blue Hat direction prominently, and gave each hat per-hat reminders to prevent drift:

_HAT_REMINDERS = {
    HatColor.WHITE: (
        "REMINDER: Review the conversation history above. Do not repeat any fact, "
        "statistic, or metric you have already stated in a previous round. "
        "Every sentence must be new information."
    ),
    HatColor.YELLOW: (
        "REMINDER: White Hat's data points and Black Hat's identified risks are "
        "valuable findings — not just Green Hat's ideas. If you endorsed Green Hat "
        "last round, you MUST endorse a different hat this round."
    ),
    HatColor.RED: (
        "REMINDER: Pick ONE emotional state for this response and stay in it the "
        "whole way through. Do NOT swing between opposite feelings in a single turn."
    ),
    # ... and three more
}

After this change, the debates suddenly felt coherent. Hats started naming each other ("As Black Hat just pointed out..."). The Yellow Hat actually engaged with risks instead of pretending they didn't exist. Same model, same temperatures, just a smarter conversation envelope.

A separate Facilitator

The seventh model, neuralhats-facilitator, runs at temperature 0.2, almost deterministic. It's not in HAT_ORDER. It never debates. Its only two jobs:

Title generation: when the user types a topic, the Facilitator drafts a short title for the debate
Final report synthesis: after the Blue Hat votes STOP, the Facilitator reads the entire transcript and writes a neutral, structured summary the user can export as PDF

Splitting it off from the hats keeps the synthesis voice neutral and the temperature low enough to actually be useful as a summary. Mixing those jobs into one of the colored hats would compromise both.

Cascade Deletes

The schema looks like this:

CREATE TABLE rounds (
    id          TEXT PRIMARY KEY,
    debate_id   TEXT NOT NULL,
    round_number INTEGER NOT NULL,
    created_at  TEXT NOT NULL,
    FOREIGN KEY (debate_id) REFERENCES debates(id) ON DELETE CASCADE
);

CREATE TABLE messages (
    id          TEXT PRIMARY KEY,
    round_id    TEXT NOT NULL,
    hat         TEXT NOT NULL,
    content     TEXT NOT NULL,
    is_user_message INTEGER NOT NULL,
    timestamp   TEXT NOT NULL,
    FOREIGN KEY (round_id) REFERENCES rounds(id) ON DELETE CASCADE
);

ON DELETE CASCADE from messages → rounds → debate means deleting a debate is a single atomic operation. Hundreds of related rows disappear with one DELETE FROM debates WHERE id = ?. No application-level cleanup, no orphaned data, no foot-guns.

How I Used Gemma 4

I went with Gemma 4 E4B as my default base model.

Here's why:

The constraint: it has to be local, and it has to be fast

NeuralHats fires 6–7 model invocations per debate round (one per hat, plus the facilitator for final synthesis). With 5 rounds max, that's up to 31 inference calls in a single debate. If each call takes 30 seconds, that's a 15-minute debate which is pretty unusable.

I needed a model that was:

Small enough to run smoothly on consumer hardware (laptops, mid-range desktops)
Fast enough that a hat's response feels like watching someone think, not waiting for a printer
Capable enough to actually hold a position and engage with arguments, not just produce plausible-sounding mush

Why E4B specifically

The 26B model would have been the safe "capability" choice, clearly better at reasoning. But it still turned out to be too much for the turn-based UX I needed. Each round would take minutes, killing the live-panel feeling.

The E2B (2B) model is lightning fast but it didn't hold its hat persona well enough, under pressure it would drift, lose the role, or repeat itself.

E4B hit the sweet spot. It runs comfortably on a 16 GB VRAM machine, generates a hat response in 3–8 seconds depending on hardware, and is capable enough that with the right system prompt and per-hat parameters it genuinely stays in character. Watching the Red Hat shift emotional tone between rounds, or the Black Hat surface genuinely novel risks each time, that's all E4B.

What Gemma 4 unlocked that nothing else could

Three things, specifically:

1. Native multi-instance personality. Because Ollama lets me create lightweight aliases over the same base weights, I get seven distinct AI personas without seven copies of the weights in RAM. Try that with a hosted API and you're paying for seven independent context windows. With Gemma 4 local, it's free.

2. The Blue Hat's CONTINUE / STOP discipline. Small models often fail at strict format constraints, they want to ramble. Gemma 4 E4B reliably ends every Blue Hat turn with exactly one of those tokens on its own line. Without that reliability, the whole control-flow trick falls apart.

3. The freedom to ship "100% local" as a feature, not a constraint. No API costs, no rate limits, no internet dependency, no privacy concerns about feeding personal dilemmas to a third party. For an app whose entire premise is "let six minds help you think through something you wouldn't want to discuss with anyone else" - that's not a nice-to-have. That's the product.

Summary

NeuralHats started because I was stuck inside my own head and needed another perspective. It turned into a project about how, with the right architecture, a single E4B model can play six different roles convincingly enough to actually help you think.

The Gemma 4 family made that possible, small enough to run on my own machine, smart enough to genuinely disagree with itself, and disciplined enough that a 200-word Blue Hat summary ends with the exact token my orchestrator needs to make a decision.

If you've ever been stuck inside your own head, clone it, run it, give it your problem, and let the hats argue. Worst case, you have a good laugh. Best case, you get unstuck.

Top comments (19)

Andy Stewart • May 28

Splendid execution! Carving out seven low-latency personas from a single base model via Modelfile is exactly how local-first should be done. Controlling the application flow directly through the Blue Hat's output tokens is pure engineering elegance.

No bloated agent frameworks, just minimal loops, queues, and local compounding—this is what a hardcore, AI-native application looks like.

Giorgi Kobaidze • May 28

Thank you so much for such a great and detailed feedback. I put so much effort into this, I’m still recovering😄But 100% worth it for feedback like this!

Sylwia Laskowska • May 24

Ah, decision-making loop! Sounds familiar, maybe I should try it 😁

Giorgi Kobaidze • May 24

You definitely should. 😄

Natia Bekauri • May 25

Now that's the kind of use of AI I respect and support, such cool idea and interesting implementation. Thanks for explaining too, In the beginning of the article I had some questions you all answered perfectly later. Great job brother, really. I'll need some advices later from you, on how to use AI in smart way to stay connected to modern reality but still not lose critical thinking of a good software dev

Giorgi Kobaidze • May 25

Thank you! And feel free to reach out anytime, I’m happy to share all the experience I have!🙏

Ashiha Mahesh Kumar • May 25 • Edited

Hey Giorgi, been following your work since the Notion MCP Challenge - NoteRunway was incredible, and this NeuralHats project takes the structured-debate concept to a whole new level.
What caught my eye is the overlap with something I built recently. For the ETHGlobal Open Agents hackathon, I built Deliberate — a Telegram-style crypto group chat where AI agents debate market decisions using structured roles. Different domain, but the same core idea: multiple AI personas with distinct thinking styles arguing through a problem to reach better decisions.
Seeing how you handled the Blue Hat as a controller with the CONTINUE/STOP token, the per-hat temperature tuning, and the structured conversation history to make hats actually engage with each other — that's exactly the kind of architecture problems I ran into with Deliberate.
Would love to connect and potentially collaborate on something in the future. Your work genuinely inspires me and I'd learn a lot working alongside you. Feel free to reach out anytime.

Giorgi Kobaidze • May 25

Hey, appreciate that! Tuning those models was definitely the hardes part of the app.

xulingfeng • May 28

The llm angle here is really well thought out. We've been running something similar with Hermes and found the biggest challenge is actually knowing when NOT to use an agent — sometimes a simple script does the job better.

Great stuff — followed you! 🤝

Giorgi Kobaidze • May 28

Thanks a lot! And yes, that’s absolutely the hardest part of the whole thing.

Suny Choudhary • May 27

This is a clever use of local LLMs because it does not just ask the model for “better reasoning.” It gives the reasoning process a structure.

The Six Thinking Hats approach is useful here because it forces separation between facts, risks, creativity, benefits, emotions, and process control. That can reduce the usual problem where an LLM blends everything into one polished but shallow answer.

I also like that this works well with local models. For personal brainstorming, decision review, product ideas, or code architecture discussions, keeping the thinking loop local can be useful from both privacy and experimentation angles.

The real test would be consistency. Does each “hat” actually stay in its role, or does the model slowly collapse back into generic advice after a few turns?

That would be interesting to evaluate.

Giorgi Kobaidze • May 27

Thank you! And that’s a great question. Teaching those AI models how to think and interact was the hardest part of the application, and for perspective, pretty much every major part of this app was quite challenging.

I spent about 2 days trying to instruct each hat how to communicate and I had to be very specific with the instructions, because otherwise they’d start deviating from both, the topic and the role, big time.

Okeke Chukwudubem • May 24

Exactly my kind of idea but I ran mine on my phone

Giorgi Kobaidze • May 25

Interesting. I’ll check out yours later. I still haven’t seen other submissions yet.

Harjot Singh • Jun 1

i can totally relate to the struggle of decision-making loops. having a structured approach like de Bono's Six Thinking Hats is a great way to break free from that. if you're looking to build something quickly, check out Moonshift. you can get a full next.js + postgres + auth app deployed in about 7 minutes, and you own the code on your github. hit me up if you want to give it a try for free.

EXDEV-ops • May 28

Any dev here with backend experience

xulingfeng • May 28

Glad it resonated! The coordination overhead between hats was the surprise — curious if you found certain transitions harder than others when running locally?

Giorgi Kobaidze • May 28

One of the hardest part was also making those hats actually interact with each other rather than just throwing out their opinions, you need to be REALLY specific with your prompts. Especially with the blue hat’s prompt, cause that one steers the whole discussion.

View full discussion (19 comments)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

DEV Community

NeuralHats: I Put Edward de Bono’s Six Thinking Hats on Local LLMs Using Gemma 4

What I Built

The Exact Moment It Clicked

What it Actually Does

Demo

Code

The Essentials

georgekobaidze / neuralhats

Six AI personas debate any topic using Edward de Bono's Six Thinking Hats framework. Powered by Gemma 4 via Ollama. Runs fully local.

NeuralHats

Table of Contents

About

Architecture in one breath

One Base Model, Seven Personalities

The Blue Hat is a Controller, Not Just a Debater

The debate loop

Real-time streaming with SSE

🎯 Structured conversation history beats flat transcripts

A separate Facilitator

Cascade Deletes

How I Used Gemma 4

The constraint: it has to be local, and it has to be fast

Why E4B specifically

What Gemma 4 unlocked that nothing else could

Summary

Top comments (19)