So, I took another look at my project bit, the version-control tool for binary files I wrote about in my previous article, and the code looked pretty decent. It also looked, however, like imperative code written in Haskell. True, the types and the ADTs were there (though not fully exploited in much of the project), and I believe the mere fact that a function is pure and has no side effects lets AI reason about it much more easily. But the AI clearly wasn't taking advantage of all of Haskell's features.
Letting AI do the research
So I asked Claude to research and produce two documents: a Guide for Writing Idiomatic Haskell and a Guide for Type Safety in Haskell. I then used Cursor IDE to run Opus 4.6 with those guides and refactor the code. At first it just changed a lot of returns to pures (and not even all of them). I told the agent it seemed odd that that was all it changed; it agreed and made a deeper pass over the code base, with a lot of changes. I asked again, and it did another pass, again with many changes. I repeated that for 12 (!) rounds, and each time it found something new. I'm talking about Opus 4.6 here; at one point I tried Sonnet 4.5, but it did some weird refactoring. These refactors require more subtlety and reasoning, I guess.
This isn't surprising: recent research found that LLMs struggle with Haskell specifically because functional languages make up a tiny fraction of training data (Haskell is just 0.29% of The Stack, a major code training dataset). The models know the syntax, but they default to imperative patterns unless pushed.
What the AI actually changed
After twelve rounds of refactoring guided by the two reference documents, the diff touched 30 Haskell files across roughly 1,500 lines. Many changes were mechanical — pure over return, void over _ <- — but three categories stood out as the AI applying genuine Haskell reasoning, not just surface-level substitution.
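To give a feel for the mechanical category, here's an illustrative before/after. It's a toy example of mine, not code from bit:

import Control.Monad (void)

-- Before: imperative-flavoured Haskell
logAndCount :: [FilePath] -> IO Int
logAndCount paths = do
  _ <- mapM putStrLn paths      -- result silently discarded with a wildcard bind
  return (length paths)

-- After: void states the intent to discard, pure avoids the Monad-only return
logAndCount' :: [FilePath] -> IO Int
logAndCount' paths = do
  void (mapM putStrLn paths)
  pure (length paths)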
Killing boolean blindness with sum types
The original code tracked push behavior with two booleans on the environment record:
data BitEnv = BitEnv
  { ...
  , envForce :: Bool
  , envForceWithLease :: Bool
  , ...
  }
Since force with lease is already force, the combination (True, True) was meaningless. And indeed, the command parser had a runtime guard to reject it:
when (isForce && isForceWithLease) $ do
  hPutStrLn stderr "fatal: Cannot use both --force and --force-with-lease"
  exitWith (ExitFailure 1)
So, on one of the passes, the AI replaced both booleans with a single sum type:
data ForceMode = NoForce | Force | ForceWithLease
That's one field fewer, no runtime guard, and every consumer switched from a nested if-else-if to a case fMode of, which reads better and is checked for exhaustiveness by the compiler. And on the safety angle, the illegal state (True, True) is no longer representable.
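As a sketch of what a consumer looks like after the change (the function name and strings here are illustrative, not taken from the project):

describeForce :: ForceMode -> String
describeForce fMode = case fMode of
  NoForce        -> "normal push"
  Force          -> "forced push"
  ForceWithLease -> "forced push with lease"

Leave out a constructor and GHC, with -Wincomplete-patterns (part of -Wall), warns about the missing case.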
The AI applied this same transformation twice more, and each time the pattern was the same: a vaguely named boolean became a properly named type that anyone (including an AI) can read and immediately understand.
This is what Robert Harper calls boolean blindness — a Bool carries no information beyond its value, so the moment you branch on it, you've lost the meaning of what was tested. And in software safety terms, no test can prove the absence of a (True, True) code path as reliably as a type that simply can't express it. (An ICSE 2017 study found that static type systems catch roughly 15% of public bugs in JavaScript projects — bugs that tests missed...)
Replacing verbose case expressions with combinators
A nice replacement is this one:
bs <- BS.readFile path
let content = case decodeUtf8' bs of
      Left _ -> ""
      Right txt -> T.unpack txt
The AI recognized every instance as the either eliminator and collapsed them:
bs <- BS.readFile path
let content = either (const "") T.unpack (decodeUtf8' bs)
A lot of boilerplate is gone here, and there's less clutter for the mind (and for the AI) to deal with.
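For reference, either from Data.Either is the general eliminator for Either, taking one function per constructor:

either :: (a -> c) -> (b -> c) -> Either a b -> c

-- So the refactored line reads as "on decode failure, use the empty string;
-- on success, convert the Text to a String":
--   either (const "") T.unpack (Left decodeError) == ""
--   either (const "") T.unpack (Right someText)  == T.unpack someText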
Reducing ambiguity per token with Functor
This classifyRemoteState example, though, has a deeper advantage:
-- Before
classifyRemoteState remote = do
  result <- Transport.listRemoteItems remote 1
  case result of
    Left err -> pure (StateNetworkError err)
    Right items -> pure (interpretRemoteItems items)

-- After
classifyRemoteState remote =
  either StateNetworkError interpretRemoteItems
    <$> Transport.listRemoteItems remote 1
The <$> applies the pure function either StateNetworkError interpretRemoteItems to whatever Transport.listRemoteItems remote 1 produces, turning a Left into a network-error state and a Right into the interpreted items, and it also guarantees that no extra side effects happen in that step.
The former do block, on the other hand, uses monadic bind (>>=), which tells the compiler and the reader "the next step might depend on the result of the previous one." The <$> version uses Functor, which says something stronger: "this is a pure transformation over an effectful value." Unlike the do version, the function inside the <$> can't print, can't read files, and can't launch side effects based on whether it got a Left or a Right.
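The standard types of the two operators make that contrast concrete:

(>>=) :: Monad m   => m a -> (a -> m b) -> m b   -- the continuation can launch new effects
(<$>) :: Functor f => (a -> b) -> f a -> f b     -- the function is pure; the effect is fixed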
The model processes both versions either way — but the difference lies in pattern recognition load. When the model sees <$>, it can classify the entire expression in one step: "pure function applied over an effect, move on." When it sees the do version, it has to read each line to reach the same conclusion: "bind, then case, then pure in both branches — ok, so this is just a pure transformation."
This change actually reduces ambiguity per token. Each expression carries more information about what it can't do, which means the model's context window is doing more useful work. It's simple information theory: higher signal per token means less work to resolve the meaning of the surrounding context.
There's growing evidence for this. A NeurIPS 2024 paper showed that not all tokens contribute equally to learning — roughly half are "easy tokens" that carry little information, while training selectively on high-information tokens improved math reasoning by up to 30%. And research on tokenization theory has shown that how information is packed into tokens directly affects whether transformers can learn underlying structure. The implication for code is that expressions which encode more meaning per token — like <$> signaling purity — give the model richer signal to work with.
The win is local reasoning. The <$> version communicates its intent in its type structure rather than requiring you to read the implementation to confirm "yes, result is only used once, in a pure context, and pure is the only effect after the bind."
Introducing an ADT and finding a bug
Sometimes the refactor itself lets the AI uncover logic bugs. compareHistory in RemoteManagement.hs compared local and remote histories to check whether a push would fast-forward. It checked both directions and pattern-matched on the resulting booleans:
localAhead <- Git.checkIsAhead rHash lHash
remoteAhead <- Git.checkIsAhead lHash rHash
case (localAhead, remoteAhead) of
  (True, False)  -> putStrLn " main pushes to main (fast-forwardable)"
  (False, True)  -> putStrLn " main pushes to main (local out of date)"
  (False, False) -> putStrLn " main pushes to main (local out of date)"
  (True, True)   -> putStrLn " main pushes to main (up to date)"
When the AI first wrote this code, it was just filling in strings. Notice, though, the (False, False) case: neither side is ahead of the other, which means the histories diverged. The AI conveniently printed "local out of date", which sounds plausible enough if you're not thinking too hard, but it's wrong. What allowed this? The string has no structure, no type checker reading it, no compiler verifying that it means what it says. It's just characters going to a terminal.
Then I asked the AI to refactor using an ADT. It introduced PushRefStatus with three constructors — PushRefUpToDate, PushRefFastForwardable, PushRefLocalOutOfDate — and a bridge function to convert the boolean pair. But when it got to (False, False) and had to map it to a constructor, something shifted. It couldn't just type a vague phrase and move on. It had to pick a name — a name that would appear in type signatures, in pattern matches, in code review. And PushRefLocalOutOfDate was the wrong name.
If the hashes aren't equal and neither side is ahead of the other, the histories have diverged — both sides have commits the other lacks. The AI flagged this itself during the refactor: the act of naming the state precisely made the incorrectness visible.
The fix was to add a fourth constructor:
data PushRefStatus
  = PushRefUpToDate
  | PushRefFastForwardable
  | PushRefLocalOutOfDate
  | PushRefDiverged
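And the bridge from the boolean pair might look something like this (a sketch with a hypothetical name; the project's actual helper may differ):

-- Sketch only: classifyPushRef is a hypothetical name for the bridge function.
classifyPushRef :: Bool -> Bool -> PushRefStatus
classifyPushRef localAhead remoteAhead = case (localAhead, remoteAhead) of
  (True,  True ) -> PushRefUpToDate
  (True,  False) -> PushRefFastForwardable
  (False, True ) -> PushRefLocalOutOfDate
  (False, False) -> PushRefDiverged   -- the case the strings papered over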
This is the principle Yaron Minsky articulated as "make illegal states unrepresentable" — but here it worked in a subtler way. The illegal state wasn't a type error; it was a semantic error that became visible when the type demanded precision. Alexis King's "Parse, don't validate" makes the same argument from a different angle: a parser (or an ADT) forces you to commit to what your data means, where a validator (or a string) lets you be vague.
This is something worth internalizing about AI-assisted development. When the output is a string, the AI can be vague and get away with it — "local out of date" is close enough, and no tool will object. But when the output is a type, vagueness has a cost. A constructor name is a commitment: it appears everywhere the value is handled, and it has to be accurate at every site. The ADT didn't just replace the booleans — it raised the precision bar high enough that the AI couldn't miss a case it had previously gotten wrong.
Conclusion
AI is used to writing imperative code, but it does in fact know how to write Haskell; it just needs to be pushed in that direction. This gives FP beginners an easier way into this world: the AI does the heavy lifting (pun intended), and the result is more expressive, more reliable, more robust code that's easier for AI, or for experienced FP programmers, to reason about.
Overall I'm having a lot of fun writing Haskell with AI. Claude Opus 4.6 doesn't seem to "struggle" with Haskell; it's smart enough. I'm learning a lot of cool concepts as I go along, and can apply them with much of the tedious work being done by the AI.