The Two Meanings of Reasoning and the Future Path of AI
Reasoning the action and reasoning the path serve different but related purposes
Reasoning can be an activity — something you do either well or poorly in a given instance. Reasoning is also an artifact, and more specifically a path: “What’s your reasoning here?” asked in the sense of “how did you get to this decision?”
Reasoning the noun is sometimes the trace left by reasoning the verb. We reason to a conclusion, and then say how we got to that conclusion. That’s our “reasoning” — how we got from the known to the unknown. What did we consider in our decision? How did we weigh different pieces of evidence?
Of course, psychological research shows it’s usually not as much of a trace of the initial reasoning as we think. What usually happens is we come to a conclusion — often on instinct — but when challenged on it we go back to see how we would get there through reasoning. When thinking about how others would evaluate that path (or when hearing how others respond to that path) we often discover the reasoning isn’t as good as we think.
We do this dozens of times a day, in all sorts of small ways. In my workshops I often use the example of the following conversation:
Abelard: I’m headed to the store.
Heloise: Can you grab some milk?
Abelard: We have a full carton in the back of the fridge I think?
Heloise: And it’s still good?
[Heloise opens fridge, checks date]
Heloise: Yep, it’s good. See you in a bit.
It’s popular now to call this sensemaking, and it is. But it’s also what people in argumentation theory have been studying since at least the 1950s as argument. A person advances a claim (you should pick up milk) with implied reasons (we’re out, and you’re going to the store anyway). The reason is interrogated (are we truly out?) and found not to be sound. The plan is adjusted.
In presentations I go on a bunch about how this thing we do as humans is really one of the most amazing things we do. I suppose I’m going on about it a bit now. But it really is. Many creatures have ways they reason together — from schools of fish to ants with their stigmergy — but humans have one of the wildest collective reasoning methods out there. When someone presents a course of action or a belief we can ask them for a road map as to how one might get to embrace that belief or course of action. Or we can infer one. And then we can poke at it. In that process, beliefs or courses of action can change. We might say, actually, this is pretty convincing. Or maybe we say, look, you forgot about this.
Long ago, a lot of people believed that reasoning, the verb, was uniquely human, and of course it’s not at all. Crows and octopi can put humans to shame, and do on the regular. Ants, as I pointed out, have ways to reason collectively via stigmergy. Reasoning, the verb, is not unique to us at all, and even collective reasoning is pretty common.
Reasoning the noun, however, taken as the shared linguistic or otherwise symbolic paths to belief, is pretty distinctly human, at least at the scale and frequency we do it. A fish in a school breaks in a new direction, the fish around it sense the motion and decide to follow or not. Collective behavior emerges. But that fish does not, at least in the sense we mean here, get to make its case for right, left, or straight.
I mentioned we do this in many small ways throughout the day, but let me point out a couple bigger examples. Last night I posted a thread on Bluesky about AI being scapegoated for some broader, predictable failures of energy companies to meet growing demands on electric production. The points I made in the thread were solid, but I also included a graph of data center energy consumption growth vs. just AI growth. The thread started taking off and someone in my comments said, wait a second, this graph’s projection was done before everyone went completely mad for AI, surely it would differ if projected now. And they were right! This was the equivalent of saying — hey, check in the fridge, I think there’s milk in there. I checked, and there was milk in there.
Did this change my view of AI and electric companies? Yeah, a little bit. Not in a dramatic way, but some. It made me less sure of myself on it, and my belief more qualified. That graph was not a crucial piece of evidence, but it was additive. Without it my case was somewhat weaker.
Similarly, somewhere in that thread a person pushed back on a claim I made that electric demand was going to skyrocket with or without AI, because we are in the middle of a massive plan to shift transport from gas-powered to electric-powered, and that shift (along with shifts to electric heat pumps, electric stoves and the like) made the needs of AI seem quite small. As I said, if a 40 terawatt-hour increase (out of 30,000 global terawatt-hours annually) is throwing us into a tailspin, how in the world are we going to accommodate the massive electrification project that addressing climate change requires?
Someone pointed out in the comments that cars would not consume nearly as much as AI. Now here’s the beautiful thing about humans. They could have just said, look, I don’t believe you. And I could have said, so what?
But we’re human. We want to appear reasonable. So we give reasons.
Just like I gave that graph as part of my evidence and got pushback, they gave their reasoning. Electric cars were already 9% of all cars in the U.S., they said, and currently consumed 8 terawatt-hours a year. Multiply that by 10 to get what transport electrification would take and it’s less than what AI would consume three years from now.
I looked at that, and the 9% piece did not seem right to me. So I tried to find it, and the closest I could get was that if you lumped plug-in hybrids in with electrics, then about 8% of all new cars sold last year were electric. But that is what we call a flow measure (the number of EVs being added to the existing stock). The figure you would need for this calculation is what share of registered cars are EVs. And that number is about 0.8%.
There are other problems with doing the math this way. The better way to do it is to take existing barrels of oil consumed and convert that into kWh (accounting for electric car efficiency). That gives you less chance of missing things. But spotting this error at least shows the person’s estimates on cars were off by at least a factor of 10. In this case they pushed back on my reasoning and I walked away a bit more grounded and sure.
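To make the flow-versus-stock point concrete, here is a minimal back-of-envelope sketch in Python. The 8 terawatt-hour figure and the two share estimates come from the thread above; the oil-to-kWh constants in the second half are illustrative placeholders, not sourced numbers.

```python
# Back-of-envelope sketch of the flow-vs-stock correction described above.
# Figures for current EV consumption and fleet shares come from the thread;
# nothing here is a real electrification model.

current_ev_consumption_twh = 8.0    # annual consumption of today's EV fleet
claimed_share_of_fleet = 0.09       # the commenter's figure: actually a flow (share of new sales)
actual_share_of_fleet = 0.008       # rough stock share of registered cars that are EVs

naive_estimate = current_ev_consumption_twh / claimed_share_of_fleet      # ~89 TWh/yr
corrected_estimate = current_ev_consumption_twh / actual_share_of_fleet   # ~1,000 TWh/yr

print(f"Commenter's estimate: {naive_estimate:.0f} TWh/yr")
print(f"Stock-corrected estimate: {corrected_estimate:.0f} TWh/yr")
print(f"Off by a factor of about {corrected_estimate / naive_estimate:.0f}")

# The sturdier approach mentioned above: start from the fuel actually burned
# and convert, rather than extrapolating from the small current EV fleet.
def electrification_demand_twh(barrels_per_year: float,
                               kwh_per_barrel: float = 1700.0,    # thermal energy per barrel (illustrative)
                               ev_efficiency_ratio: float = 0.3,  # EVs need roughly a third of the energy (illustrative)
                               ) -> float:
    """Rough electricity demand, in TWh, if that fuel use were replaced by EVs."""
    return barrels_per_year * kwh_per_barrel * ev_efficiency_ratio / 1e9
```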
Reasoning as pathway
One thing you’ll note about these examples is that they are a little different from the milk example. That graphic I shared that turned out to be lousy evidence? It was not that graphic that convinced me that the electric companies might be scapegoating AI for their own failures. (I probably came to my conclusions about electric companies from watching ill-maintained infrastructure cause wildfire after wildfire and watching Texas power companies blame their predatory extractive behavior on solar energy. And I am probably prone to justify AI power consumption because I use and study it.) Likewise, I’m almost certain that the person disagreeing with me about cars didn’t come to their conclusion reading the 9% statistic. They almost certainly went out and grabbed that after the fact, when they felt they needed to challenge my reasons.
In the end this works out, because everyone on these threads is motivated to prove they are right, which means they go out and grab evidence that might be relevant and push back on evidence that opposes their point. As Sperber and Mercier point out, when viewed from the point of view of collective reasoning, all the things Kahneman and Tversky label as flaws (such as myside bias) turn out to be strengths. Having people on multiple sides of an issue, all with their myside bias, makes sure that relevant evidence for all sides of an argument is collected (usually by believers) and assessed (usually by doubters).
But in the end it does not matter if the “reasoning” provided is the way people got to their beliefs at all. In fact, it would often be suboptimal if that were the case. Reasoning — the noun — is most often used as a path for a non-believer or agnostic to get to belief. The path for them is not the same as the path for you, a believer. What you are trying to show is that a reasonable person can get there through reason, without necessarily embracing the whole existing belief system (or biases) of the believer.
And it’s the reasoning (noun) that’s the powerful part of the system. Because once the reasoning is articulated — and particularly if it is articulated well — then the listener or reader or viewer can poke at that and see if that pathway works for them, or if it doesn’t line up. And that’s the core of our social sense-making — we reason about the reasoning that others give us, seeing if the path leads where it’s claimed to. Depending on how that works out we either adopt, alter, or maintain beliefs.
Whether AI “reasons” or not is a bit irrelevant — the question is whether it produces valid reasons
The fact that reasoning (the verb) doesn’t have to match the reasoning given (the noun) to be useful is relevant to AI. In fact, it’s incredibly important.
For instance, I found this photo online, and wondered if ChatGPT’s recent o1 release could guess where it was from (it’s from the Russian city of Yakutsk, population 330,000, in east Siberia).
As you can see above, I asked it to take some guesses as to which city it was in, and then provide me with its reasoning.
I’m going to provide you the full output of o1, because it’s pretty impressive and the full output is important to my point. If you’re reading this, feel free to skim the output; I’ll reference what I need later:
Below is one possible set of three guesses for the city shown in the photo, along with an explanation of the clues (natural, architectural, climatic, and cultural) that point in each direction. Each clue is assigned a numeric importance value on a scale of 1 (lowest) to 100 (highest).
1. Yakutsk (Sakha Republic, Russia)
Buildings elevated on stilts/piles (Importance: 90)
A common construction feature in regions with permafrost, especially in Yakutsk. Raising the foundation helps prevent heat from the building thawing the frozen ground underneath.
Severe weathering and utilitarian Soviet-era design (Importance: 70)
The concrete-panel and pastel-green paint fit mid- to late-Soviet construction typical of cities in northern and Far Eastern Russia.
Sparse vegetation and dusty, unpaved surroundings (Importance: 60)
Suggests a very cold climate with short summers (permafrost regions often have scant topsoil and limited greenery).
Russian signage referencing “01” (Importance: 50)
“01” was historically the fire emergency number in Russia. The fact that it’s still posted large on the building points to Russia, and the style of sign is common in smaller, remote Russian cities.
Given the combination of stilted housing, Soviet-block design, extremely cold/dry environment, and signage in Russian, Yakutsk is a very strong candidate.
2. Norilsk (Krasnoyarsk Krai, Russia)
Permafrost-related construction (Importance: 90)
Norilsk also has many buildings raised on piles for the same permafrost reasons.
Heavily weathered Soviet-style apartment blocks (Importance: 70)
The harsh climate of Norilsk (long, brutal winters and strong winds) similarly results in peeling paint and a heavily worn facade.
Minimal vegetation, industrial feel (Importance: 50)
Norilsk is known for its heavy industry and challenging environment, reflected in sparse street vegetation and utilitarian architecture.
Norilsk is a close second to Yakutsk because it shares the extreme climate and Soviet architectural style, though it is typically known for even more industrial surroundings and a more restricted entry policy for outsiders.
3. Magadan (Magadan Oblast, Russia)
Soviet-era building design in a Far Eastern setting (Importance: 70)
Magadan also features typical Soviet five-story blocks, with pastel facades that have weathered due to harsh conditions.
Remote, cold-climate city with unpaved/dirt roads (Importance: 60)
The city’s coastal-subarctic climate results in difficult road maintenance and limited growing season, which can look similar to the photo.
Russian emergency signage (Importance: 40)
Like other Russian locales, old-style “01” fire department signs remain common.
Magadan shares many of the same permafrost and architectural attributes, but the raised-pile style is often more strongly associated with cities deeper in the permafrost zone (like Yakutsk and Norilsk), so it comes in third.
Summary
Top Guess: Yakutsk (the prevalence of permafrost stilt construction, extreme weather wear, and the particular mix of dusty roads and sparse vegetation make it the most likely).
Second Guess: Norilsk (similar climate and construction, but typically more industrial).
Third Guess: Magadan (Far Eastern city with harsh climate and Soviet architecture, but slightly less hallmark stilt usage).
In this instance, I know that the top choice is correct, that this photo is from Yakutsk. And the first thing I have to say is this is pretty slick. However it came to this conclusion, getting it in one is good, and my guess is that these models will soon be a part of reverse image search functionality.
The second thing I have to say is I have no idea how it actually made the call that this building is in Yakutsk. I know the reasons it tells me. I think some of those reasons are related to how it made its decision (I am guessing that at least part of the process produces a photo description [concrete piles, etc] and looks for geography terms in adjacent semantic space, but I honestly don’t even know that for sure).
The third thing I have to say is — at least for the purposes of verification — it doesn’t matter as much as you might think how it got to that decision. Just as it didn’t really matter in our previous example how people actually got to their impressions of power companies and AI. What matters is whether the outputted reasoning justifies the decision appropriately, in a verifiable way.
And it does that pretty well. The light green color is a typical Russian color, derived apparently from edicts of the Tsar in the 19th century, and carried forward in USSR design but not recent design. Five-story panel buildings of this sort, called khrushchyovka, were a common part of 1960s USSR architecture. The number 01 was the fire department number in the USSR (now 101). Some of these attributes you might find outside the historical USSR but it’s a lot to find together. More convincingly, you have this concrete piles design, which turns out to be heavily associated with Yakutsk’s approach to building on permafrost, where it’s really important for building heat to not thaw your foundation. There’s a limited number of large cities built in the historical USSR on permafrost; Yakutsk is the most prominent and the largest. A brief look through street view of Magadan and a few other Russian permafrost cities not mentioned shows that most buildings there don’t follow the concrete stilts approach of Yakutsk (at least visibly), though I have no idea why.
Eliminating Norilsk is much harder. It’s a closed city — you can only go there if approved by the Russian government and there is no land route to it (and during the colder months, of which there are many, there is no sea route to it either). There’s no Google Street View. Like Yakutsk, its older architecture favors some of these pastel green facades that o1 mentioned. The buildings I can see online, however, do not seem to have concrete stilts, at least displayed in such obvious ways. Without Street View, however, that may just be the selection of buildings that have been posted to Instagram, and non-representative. Or they may simply be obscured by the architecture or the point of view of the photo. I noticed in Yakutsk on Street View that newer buildings often had a panel that obscured most of the stilts, and you had to look at the very bottom to see them peeking through. Maybe that’s what’s going on.
I will say that as I looked through Norilsk and Yakutsk, I could sense a different vibe to the architecture. And our building fell on the Yakutsk side of the vibe. And maybe there’s something in that vibe that o1 picked up on that didn’t make the description. Or alternatively, maybe it’s just guessing that you’re far more likely to see a photo from an open city than a closed one. There are dozens of things it might be using to guess that it did not see fit to put in the reasoning.
But this is how it works — you’re given the reasoning, and whether or not it matches how the computer or human came to their conclusion, you see if it allows you to come to that conclusion. Reasoning isn’t necessarily a window into how one’s mental process functions (even if it can be). It’s a map from where the listener is to where you think they can get to.
In this case, I think it brings me to the desired conclusion, although I am of course hampered by knowing in advance what the correct answer is (a problem that could be easily addressed with a small research project and some student volunteers).
But aren’t you doing the work?
A question sometimes comes up at this point — well, look at all the work you had to do when you verified this. Isn’t it actually you who is doing the reasoning?
This question reflects a misunderstanding of cognition. In a sense, it’s always me doing the reasoning. Unless I can take an answer on authority — which is admittedly preferable on a question like this, but highly unlikely to be available for this niche a question — I’m going to have to reproduce the reasoning to check it. There’s not some property where reasons produced by a human can be directly transmitted into my brain and reasons produced by a computer must be “walked through” — you’ve got to walk through them regardless. That’s how you understand if the reasoning is solid. The question is how helpful the reasons given to you are in getting you in the vicinity of the truth.
Do the reasons provide adequate context? Surface relevant counter-arguments and counter-examples? Focus on the most determinative (and verifiable) evidence? Indicate certainty where certainty is warranted and hesitate where things are less clear? Use broad principles that can be applied elsewhere? Get associated facts right?
In other words, is the reasoning a good argument? You’re absolutely going to do the work, that’s how knowledge transfer works (again, outside being able to rely on authority). The test of reasoning is whether things make good (and I would say testable) arguments.
It’s quite clear that this current iteration of LLMs, in specific contexts, does very well at producing reasoning (the noun). And that they produce reasoning that normally would not be otherwise accessible, or which would be time consuming to obtain or produce. You can look at the amount of work I had to do to evaluate the assertion this was likely Yakutsk, and it’s quite a bit. But without this technology there is simply no way for me to start that journey at all. I’d maybe get the fact this building was Russian, I’d verify the 01 on the side and feel good about myself. I’d be able (maybe) to guess it was not from the really southern areas, based on vegetation or lack of it. But I’m not looking at that photo and going “Ah, yes, concrete stilt construction, clearly above the permafrost line” or noting the pastel green facade as significant. Those are not pathways that I can create on my own, and maybe I’m wrong, but I think they’re pathways that 99.999% of people not living in a Russian permafrost city are not going to get either.
Learning Critical Reasoning with Generative AI
The good news is we have many centuries of thought on how to analyze reasoning, and in particular a half-century of really solid thought from argumentation theory. We’ve also got models in education on how to assess reasoning. Together we could use these things to investigate the ways we can design and prompt AI to produce more useful reasoning, and teach students how to evaluate it.
In this example, the reasoning checked out. It won’t always. Sometimes you’ll need to push back — how widespread is that color green anyway? Did satellite republics use the same number for fire as Russia?
But when I look at what is going on here with these newer models, I just can’t see how this sort of assisted reasoning isn’t going to be the norm for a number of things. The way LLMs can find the intersections of two semantic spaces, like geographic locations and concrete stilts, turns out to be unusually useful. It’s most obvious here as an intersection, but this is at the core of what the model is doing when it looks at an article and generates a list of what it is evidence of and what it is not evidence of, or critiques a graph about administrative bloat.
Now note I am still very skittish about having generative AI produce outputs for students, largely because AI remains really unreliable with facts (it makes up a lot of bullshit) and because using it in that way doesn’t build the analytical skills students need.
But my experience is it models reasoning very well. And it seems to do well with different genres of reasoning, meaning it approaches (with a bit of prodding) a historical document like a historian and a geolocation task like an open-source investigator. Having o1 look at a number of photos, I started to note some patterns in its approach. Obviously, looking for posted signs that have language on them, or notable numbers. But other stuff as well. Is this an industrial area? What does the vegetation tell you? Colors turn out to be a bigger deal than one might think. Bologna, Italy is a red brick city, compared to some other Italian cities that are beige stucco. But in particular I was fascinated by how much climate could be deduced from vegetation and architecture. Sloping roofs for snowy regions. Low, widely spaced plants in arid regions. Telltale damage on roads that suffer frost heaves.
Putting photos up and walking through the reasoning, I began to look at the photos differently myself. I don’t think I was looking at them the way the LLM was looking at them. But I started to look at them in the way that the LLM presented its reasoning. If rural, is there a specific type of crop?
Note that the reasoning provided was partially a result of what I’d asked the model to use in its argument (cultural clues, climatic clues, etc.). If, talking to people who do this for a living, we found we needed to focus on other things, we could write the reasoning prompt differently. Focus on vehicles and we’d get more reasoning about car models, and the look of public transport, which is often distinct. Are there trolley wires? Lots of scooters? Are the trucks the flat-fronted cab-over-engine European trucks or the long-nosed American semis? What are the dimensions of the license plates?
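As a rough illustration of what I mean by a reasoning prompt, here is a minimal sketch using the OpenAI Python client. The prompt wording, clue categories, model name, and image URL are all placeholders of mine, not the exact prompt I used.

```python
# A sketch of steering the reasoning template toward particular clue categories.
# Everything here (prompt wording, categories, model name, URL) is illustrative.
from openai import OpenAI

client = OpenAI()

clue_categories = [
    "vehicles and public transport",
    "signage and language",
    "climate and vegetation",
    "architecture and materials",
]

prompt = (
    "Give your top three guesses for the city shown in the photo. "
    "For each guess, list the clues you relied on, grouped under these categories: "
    + ", ".join(clue_categories) + ". "
    "Assign each clue an importance score from 1 to 100 and explain how it "
    "supports or undercuts the guess."
)

response = client.chat.completions.create(
    model="o1",  # placeholder; use whatever vision-capable reasoning model is available
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # The photo is attached as an image part; the URL is a placeholder.
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```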
That is, we can have the reasoning follow, relatively well, any sort of reasoning template we wish to emulate. If you know how an anthropologist looks at housing, you can have it try to look at housing as an anthropologist; you can have it look at holiday decorations through the lens of economic history, or charts through a general statistical literacy frame. I know many faculty reading this post (and having read the godawful papers that LLMs produce) may doubt it can mimic reasoning in their discipline well. I can only speak to my experience running hundreds of experiments over the past few months using the disciplinary and professional lenses with which I’m familiar.
The newer releases of the LLMs still get a lot of facts wrong, still bullshit from time to time. They still often fail at facts, authority, provenance. They remain horrible search engines. But they mimic genres of reasoning remarkably well, in ways that I feel can absolutely show students how to think in these ways, and how to evaluate these arguments, and in ways that produce a shocking amount of analytical value straight out of the gate.
How these things produce results will remain a mystery — but in some ways that’s always been the case for argument. I don’t know how you actually came to your conclusion; I can only ask you to draw a map that shows me how to reproduce your result, in a way that makes sense to me. My guess is that teaching students to produce usable chains of reasoning from LLMs will have some direct analytical benefits, and at my most hopeful I think it may also introduce the norms and nature of argument more generally to a world that has retreated into everything-is-opinionism, and show people what making a case actually looks like. More on that part in a bit. But for now, let’s spend less time evaluating whether these technologies are reasoning, and more time evaluating — and learning to shape — the reasoning they produce.