What AI Reveals About How We Think
Here’s how I tend to work. I sit down with a collection of half-formed thoughts: scattered notes, a story that feels connected to an argument I can’t quite name, and some research articles that seem to be in the same ballpark. Any of these pieces might be tangential or might be central, but I can’t know which until I’ve figured out what “central” even means. Typically I take all of this and try to write out the connection. The first attempt wanders off, so I try again. I circle and circle, and if there’s something there, I eventually surface it by rephrasing, trying new approaches, or talking it through with a friend.
Lately, though, I’ve been experimenting with LLMs on this front. I feed all of this scattered material to an AI along with a prompt that’s my first attempt at describing what I think it is that holds it all together. I hit enter and then, more often than not, it tells me what I was thinking.
Not perfectly. Not always. But often enough to be unsettling. The AI draws out the connections I hadn’t articulated, names the pattern that was latent behind my scattered fragments, notes why that article felt connected but actually wasn’t, and then hands me back a description of my own thinking that is clearer than anything I had managed up to that point on my own. It doesn’t feel like search. It doesn’t feel like calculation. It feels like the thing I do when I iterate through drafts and feedback, arriving at that moment when a vague sense of similarity sharpens into an actual idea.
What does it mean that a machine can do that?
The standard answers aren’t satisfying. “It’s just predicting the next token” is technically true but empty as an explanation—like saying a novelist is just putting words in order. “It’s not really thinking” may or may not be true, but it feels like an attempt to push down the uncomfortable reality that what the machine does looks an awful lot like thinking. I’m less interested in whether the machine is really thinking or not; for all intents and purposes, I think we might as well assume that it is. What I want to think through is what the machine’s success reveals about our own ways of thinking.
For most of the twentieth century and still today, western science operated under a remarkably consistent theory of how minds work. In philosophy, in cognitive science, in artificial intelligence, and across the behavioral sciences, the dominant model treated thinking as rule-based. The mind was a logic machine. It took in information, applied rules to it, and produced conclusions. Intelligence meant getting the rules right.
This wasn’t just an abstract commitment. It was built into the tools researchers used to study the mind, and those tools quietly constrained what we were able to see.
Consider research psychology. For nearly a century, the standard method for studying human thought has been the general linear model — the statistical framework behind nearly every regression, ANOVA, and mediation analysis that has ever been published in a psychology journal. The GLM works by isolating variables and measuring their linear relationships. It asks: how much does X predict Y, holding everything else constant?
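In symbols, that question takes a form something like this (a generic sketch with placeholder predictors, not any particular study’s model):

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

Each β is a fixed weight on its own variable, the effects are additive, and whatever doesn’t fit gets absorbed into the error term.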
This is a powerful tool. It’s also, if you look at it carefully, a theory of the mind disguised as a method. Applying the GLM to the mind assumes that cognition is decomposable: that you can pull thinking apart into discrete variables that relate to each other in stable, linear ways.1 It assumes that the right way to understand a thought is to identify which input caused which output. It assumes, in other words, that the mind works more or less like the equations we use to study it: cleanly, one variable at a time, according to rules.
And for decades, nobody had much reason to question that assumption, because the method worked. It worked well enough to produce thousands of findings about priming and cognitive bias, about heuristics and decision-making, about attitude formation and social influence. And through this patchwork, an image of the mind emerged: a rule-following, proposition-handling, bias-prone information processor. The model felt solid.
The same assumption drove the first decades of artificial intelligence.2 The original dream of AI, from the 1950s through the 1980s, was to build thinking machines by encoding rules. If intelligence was rule-following, then a sufficiently detailed set of rules should produce intelligence. Natural language processing meant writing algorithms that decomposed sentences into parts of speech and applied grammatical rules. Expert systems meant interviewing specialists and encoding their decision procedures. The project was explicit: capture the rules, and you capture the mind.
The only problem is that it didn’t work. Rule-based AI could do narrow, well-defined tasks—chess moves, medical diagnoses in constrained domains—but it couldn’t do the things that felt easy to actual humans.3 It couldn’t hold a conversation. It couldn’t read a room. It couldn’t take a pile of vague, half-connected ideas and tell you what you were thinking. The things that feel the most natural to us turn out to be the hardest to encode as rules.
What’s striking, looking back, is that the functional project of AI was the first to realize it had hit a wall. In many ways, psychology’s implicit rule-based model of cognition is still going strong, in part because you can find lots of linear associations in the mind (I’ve done this plenty). And this feels like progress, as if we might be able to piece enough of these together into some sort of general understanding of the mind. The issue, though, is the same one that rule-based AI ran into. And it’s not a lack of data or computing power. It’s an ontological mismatch. Researchers, myself included, were trying to build something rule-shaped to model something that isn’t.
While this rule-based perspective was going strong, there was an alternative circling around these worlds. Connectionism—the idea that cognition isn’t rule-following but pattern-completion across massively distributed networks—had been proposed as early as the 1940s and began to be developed more seriously in the 1980s.4 Instead of encoding rules, connectionist models learn statistical regularities from exposure. They don’t apply grammar; they absorb it. They don’t follow procedures; they recognize patterns.
If you’ve heard the term neural network, then you’ve heard of connectionism. The basic architecture is modeled, loosely, on what brains actually do: layers of simple units that strengthen or weaken their connections based on feedback. No unit “knows” anything. There are no rules written down anywhere in the system. The knowledge, such as it is, lives in the pattern of connections: distributed, implicit, and resistant to clean decomposition.
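To see how little machinery this involves, here’s a toy sketch in plain numpy (my own illustration, not any particular published model): two layers of simple units that learn XOR purely by nudging their connection weights in response to feedback.

```python
# A toy connectionist network: layers of simple units whose connections
# strengthen or weaken in response to feedback. No rules are written anywhere;
# whatever the network "knows" ends up distributed across W1 and W2.
import numpy as np

rng = np.random.default_rng(0)

# Tiny training set: XOR, a relationship no single linear unit can capture.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 8))   # input -> hidden connections
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output connections
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(20000):
    # Forward pass: each unit just weights, sums, and squashes its inputs.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Feedback: nudge every connection a little to shrink the error.
    err = out - y
    grad_out = err * out * (1 - out)
    grad_h = (grad_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0, keepdims=True)

# After training, the outputs are typically close to [0, 1, 1, 0].
print(np.round(out, 2))
```

Nothing in this code writes down the rule “output 1 when exactly one input is 1.” After training, that regularity exists only as a pattern spread across the weights, exactly the kind of knowledge that resists clean decomposition.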
This was not a popular idea among the people whose job it was to understand the mind. In 1988, the philosophers Jerry Fodor and Zenon Pylyshyn published what was widely taken to be a death blow to connectionism.5 The main thrust of their argument was that neural networks couldn’t account for the systematicity of thought: the fact that if you can think “the dog chased the cat,” you can also think “the cat chased the dog.” The same elements, rearranged, produce a different meaning. Real thinking, they argued, requires that kind of compositional structure. Rules. Symbols that can be recombined. Pattern-matching, on their account, couldn’t handle even this most basic operation.
Their critique landed, and it held the high ground for decades. Within philosophy of mind and much of cognitive science, the case was considered closed. Thinking was propositional. Cognition was computation. Connectionism was a curiosity—interesting in the way that dead ends are interesting.
Then the dead end started talking.
What changed wasn’t a philosophical breakthrough. It was engineering at scale. When connectionist architectures were given enough data—truly massive amounts of text, more than any human could read in a thousand lifetimes—they started doing things that the symbolic tradition said they shouldn’t be able to do. They wrote coherent prose. They reasoned by analogy. They translated between languages they were never explicitly taught to translate. They took a pile of scattered ideas and found the latent pattern.
Large language models are the purest expression of connectionism’s bet: that you don’t need rules to produce intelligent behavior. You need exposure, scale, and pattern completion. And the bet paid off in ways that even the connectionists didn’t fully anticipate.
There’s a parallel here that’s easy to miss. Connectionist AI only became powerful when it had access to an enormous corpus of human-generated text — billions of documents representing human thought across centuries. Our own neural networks had an even longer training run. The structure of our thought emerged through millions of years of evolution, shaping biological neural architectures through small corrections on small pieces, generation after generation. This was a very different training set—embodied, environmental, survival-driven—but the same underlying principle holds: pattern recognition refined across an incomprehensible scale of exposure. Small adjustments, accumulated over deep time, produced something that looks like understanding.
Comparing our minds to AI is a fraught activity, prone to all sorts of projected hopes and knee-jerk skepticism. But there’s a methodological reason the comparison also feels somewhat inevitable. The only access we have to any other mind is indirect. We don’t observe others’ thoughts; we observe what they do and infer their thoughts from it. Movements, speech, decisions, reactions, written traces, numbers circled on scales from 1 to 7. From those outward signs, we reconstruct an inner process. In the mid-20th century, behaviorism made this constraint explicit by refusing to posit anything beyond what could be observed. The information processing paradigm that overthrew behaviorism restored some complexity to the unseen mind, claiming it worked on representations, rules, and computations.6 But the method itself didn’t change. We still have no choice but to infer structure from output.
LLMs put pressure on this inference. When a machine produces something that is indistinguishable from what a human would produce, when it tells you what you were thinking, the gap between behavior and mechanism becomes harder to ignore. The output fits the pattern we associate with thinking and understanding, even though the architecture that produced it is radically different from the one we’ve assumed for ourselves. That doesn’t tell us that the machine thinks like we do. But it does unsettle our confidence that we know what “thinking like we do” actually means.
The point is not that LLMs think the way we do. They don’t.7 The point is that their success tells us something about the shape of cognition that the rule-based tradition was unable to see. For decades, the assumption was that a pattern-completion architecture could never match one built on rules and symbols, that it would always be a crude approximation of the real thing. But when it comes to some of the most distinctive tasks of human minds—writing, reasoning by analogy, synthesizing scattered ideas into something coherent—the pattern-completion architecture is the one that worked. Not the one built on the model of mind we seem to intuitively prefer. That’s not proof that human cognition is really just pattern completion. But I take it to be strong evidence that the connectionist picture of the mind—distributed, contextual, shaped by exposure rather than rules—was closer to the truth than the tradition that won the twentieth century.
One of the ways AI gets dismissed as neither really intelligent nor anything like us is by cataloging the basic tasks it fails at.
People talk about this as the jaggedness of AI: the way a model can write a surprisingly insightful paragraph about epistemology and then fail at basic arithmetic. This is treated as a flaw, a sign that the machine doesn’t really understand or think about the world in the way we do.
But human cognition is remarkably jagged too. We all know the caricature of the brilliant scientist who starts to sound pretty naive once they stray outside of their domain. This isn’t because they’ve become stupid. It’s because their expertise was never a general-purpose rule set. It was a pattern library, trained on years of exposure to a specific kind of problem. Move them to a new domain and the patterns don’t transfer. Intelligence has always been contextual. We just call our own intelligence general because we privilege the domains we tend to care about and don’t see the physicist trying to learn a new language.
LLMs make our own jaggedness visible. In many ways they’re a mirror, imperfect, distorted, but revealing. And what they reflect back is a picture of cognition that doesn’t match the flattering self-portrait we’ve been carrying around.
You can see the old self-portrait still at work in the way we describe LLMs hallucinating.
Think about what that word assumes. To say a machine hallucinates is to compare it to a system that should only be retrieving true propositions about the world, a logic engine that has malfunctioned, a rule-follower that has broken its own rules. The word only makes sense if you believe the machine was supposed to be looking things up and failed.
But LLMs aren’t looking anything up. They’re completing patterns. When the pattern is well-constrained by the input and the training data, the completion tends to be accurate. When the constraints are loose, the completion drifts along the grain of what would be plausible. The model fills in what fits.
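To make the distinction concrete, here’s a deliberately crude sketch of a pattern-completer (a toy n-gram model I’m inventing for illustration; real LLMs are vastly more sophisticated, but the retrieval-versus-completion contrast is the same):

```python
# A toy pattern-completer: it never looks facts up, it only continues whatever
# sequence is statistically plausible given what it has already seen.
import random
from collections import defaultdict

corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of peru is lima ."
)
words = corpus.split()

pair_ctx = defaultdict(list)   # continuations seen after a two-word context
word_ctx = defaultdict(list)   # looser fallback: continuations after one word
for a, b, c in zip(words, words[1:], words[2:]):
    pair_ctx[(a, b)].append(c)
for a, b in zip(words, words[1:]):
    word_ctx[a].append(b)

def complete(prompt, length=2, seed=0):
    random.seed(seed)
    out = prompt.split()
    for _ in range(length):
        # Use the tight two-word pattern if we've seen it; otherwise drift
        # along whatever loosely fits the last word alone.
        options = pair_ctx.get(tuple(out[-2:])) or word_ctx.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

print(complete("the capital of france is"))  # well-constrained: "... is paris ."
print(complete("the capital of canada is"))  # loosely constrained: a fluent, wrong capital
```

When the context pins the pattern down, the completion happens to be true. When it doesn’t, the model still produces something fluent and plausible, and nothing has malfunctioned; it is doing the same thing in both cases.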
This, of course, is exactly what human memory does too. We don’t retrieve memories like files from a cabinet. We reconstruct them, every time, from fragments, context, and the gravitational pull of coherence. We fill gaps with what fits the pattern of how we understand ourselves. We smooth contradictions. We confabulate constantly, and we experience the result as remembering. The psychological literature on this is vast and unsettling: eyewitness testimony warped by expectation, childhood memories that never happened, the eerie ease with which a plausible story becomes a felt truth.8
When LLMs hallucinate, they’re doing something very similar to what we do when we remember a conversation that didn’t quite happen that way, or reconstruct an argument we heard last week with our own conclusions smuggled in as though they were always there. The difference is that we freely give ourselves an exemption. When the machine fills in the pattern, we call it a hallucination. When we do it, we call it thinking.
I want to be careful here, because this argument has a reductive version that I’m not making. The reductive version says: “Humans are just pattern matchers. Reasoning is an illusion. We’re just stochastic parrots.” But clearly we do reason. We do apply rules. We can hold a proposition in our minds, examine it from multiple angles, test it against counterexamples, and revise it.
What the LLM mirror makes visible, though, is that this kind of thinking is the exception, not the default. Explicit, step-by-step reasoning is a hard-won achievement, and it depends on an enormous amount of external scaffolding. We reason best when we have other people to argue with, pen and paper to externalize our thoughts, formal systems that took centuries to develop, and institutions that train us to do it.
The scaffolding that supports our reasoning is magnificent. It gave us books, science, mathematics, law, philosophy. But it’s scaffolding, not the foundation. The foundation is pattern completion. Left to our own devices, in the wild, without such scaffolding, we are largely pattern-completers. We act on hunches and call them reasons. We recognize connections and call it analysis. We draw a line between the current moment and previous moments and call it judgment. We built the scaffolding because we needed it, because the pattern completion alone wasn’t enough for the kind of coordinated, self-correcting inquiry that complex societies require. This is not a diminishment. Seeing it clearly is the beginning of a more honest relationship with our own minds.
Maintaining this clear picture of ourselves is especially important right now because we’re handing more and more of this scaffolding over to these machines. The institutions and habits that trained us to reason—arguing, drafting, revising, the slow work of thinking with others—are being replaced by something faster and different. What happens when this scaffolding starts running on the same pattern-completion engine as our minds? None of us know, but it’s worth paying attention to.
So what was the machine doing when it told me what I was thinking?
It was completing a pattern. It took my scattered fragments—notes, stories, half-formed connections—and it did what a massive pattern-completion engine does: it found the statistical regularities, the latent structure, the thing that held the fragments together. It didn’t understand my argument. It recognized its shape.
And the reason that felt so much like the thing I do when I think well is that the thing I do when I think well is, at bottom, the same process. The vague sense that these ideas are connected, the groping toward a formulation, the moment when the shape finally clicks, that’s pattern completion too. It’s slower, embodied, shaped by a lifetime of experience rather than a training corpus, but it’s the same fundamental operation: recognizing coherence in a noisy field.
The machines didn’t learn to think like us. They were built on a different bet about what thinking is, a bet the experts rejected for decades, and the bet turned out to be closer to the truth than the model we preferred.
It’s easy to read AI news as an ongoing story about what these machines can do. And it is. But it’s also a story about what we’ve been doing all along.
1. Over a century ago, the philosopher and psychologist William James was raising this exact concern when he coined the phrase “stream of consciousness” in The Principles of Psychology (1890).
2. Penn, J. (2020). Inventing Intelligence: On the History of Complex Information Processing and Artificial Intelligence in the United States in the Mid-Twentieth Century [Apollo - University of Cambridge Repository]. https://doi.org/10.17863/CAM.63087
3. Dreyfus, H. L. (1992). What Computers Still Can’t Do: A Critique of Artificial Reason. MIT Press.
4. Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations. The MIT Press. https://doi.org/10.7551/mitpress/5236.001.0001
5. Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1), 3–71. https://doi.org/10.1016/0010-0277(88)90031-5 (lots of non-gated pdfs of this one)
6. Miller, G. A. (2003). The cognitive revolution: A historical perspective. Trends in Cognitive Sciences, 7(3), 141–144. https://doi.org/10.1016/s1364-6613(03)00029-9
7. Though there is ongoing research that’s making headway by assuming that at least certain parts of our thinking do indeed work the same way: Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118. https://doi.org/10.1073/pnas.2105646118
8. Schacter, D. L. (2022). The Seven Sins of Memory: An Update. Memory, 30(1), 37–42. https://doi.org/10.1080/09658211.2021.1873391






