The Signal Files · AI, Judgment & the Future of Work · Primary Research

AI made building free. Your judgment is the moat now.

Karpathy says the new programming language is English. The harder lesson: when anyone can build anything, the scarce asset is the mind doing the directing.

3,400 Words · Eighteen Cited Sources · Stop Trying To Be Invisible

Somewhere in the last two years, the act of making stopped being the hard part. Code, copy, a first draft of almost anything — a model will now produce it on request. Andrej Karpathy — a founding member of OpenAI and Tesla's former director of AI — compressed the shift into one line: "the hottest new programming language is English." Take it one step further and you reach the thesis worth defending: when execution gets cheap, the bottleneck moves up the stack — into the quality of your thinking.

Not the tool. The clarity of the mind directing the tool. We find this mostly right, dangerously incomplete in its popular form, and far more demanding than the LinkedIn version makes it sound. This is the operator's read — what the evidence supports, what most people get wrong, and why none of it is a free lunch.

Section One · What Karpathy actually said, and didn't

The thesis gets stretched into claims he never made, so the attributions matter. In his own words: "the hottest new programming language is English" — posted in January 2023, not 2025; half the internet misdates it. In his June 2025 Y Combinator talk he formalised the arc: Software 1.0 is code you write, 2.0 is neural-net weights you train, 3.0 is prompts in plain English, with the model reframed as a new kind of operating system. He coined "vibe coding," then called it a "throwaway tweet."

Karpathy, "The hottest new programming language is English," X, Jan 24, 2023; "Software Is Changing (Again)," Y Combinator, June 2025.

The part the evangelists skip: Karpathy is the thesis's own best skeptic. On the Dwarkesh Patel podcast in October 2025 he was blunt that for real, non-boilerplate code the agents produce "slop," that "autocomplete is my sweet spot," and that he stays "the architect." His prescription is an "autonomy slider" and keeping AI "on a tight leash" — small, verifiable chunks, a human in the loop. The director's advantage is real, and the man who named it keeps both hands on the controls. Hold both.

Section Two · Why the bottleneck really does move up

Strip the hype and there is a hard empirical core. Three findings carry most of the load.

AI levels execution: it raises the floor. In a field study of 5,179 customer-support agents, built on the staggered rollout of an AI assistant, resolved issues per hour rose 14% on average, but 34% for novices and near zero for the most skilled. In a GitHub Copilot randomised trial, developers finished a task 55.8% faster. The pattern repeats: AI compresses the skill gap on well-scoped work. Execution is commoditising in real time.

Brynjolfsson, Li & Raymond, "Generative AI at Work," NBER 2023 / QJE 2025 (n=5,179); Peng et al., GitHub Copilot RCT, 2023.

But judgment about where to apply it is where value concentrates. The study to internalise is BCG × Harvard, 758 consultants. Inside the AI's competence frontier: ~40% higher quality, more tasks completed. Outside it — on a problem deliberately chosen to be AI-unsuitable — consultants using AI were about 19 percentage points more likely to reach the wrong answer. The tool does not know its own edge. Ethan Mollick named the shape of it: the "jagged frontier" — invisible, irregular, navigable only by judgment.

Dell'Acqua et al., "Navigating the Jagged Technological Frontier," Harvard / BCG, 2023 (n=758); Mollick, "Centaurs and Cyborgs," 2023.

And the work itself changes shape — from generating to verifying. A Microsoft Research / Carnegie Mellon study of 319 knowledge workers found AI shifts effort from producing output to overseeing it. That is the whole thesis in a sentence: your job is becoming the editing, not the typing.

When execution is free, the residual is knowing what to execute — and whether it's any good.

This is why a chorus of operators converges on one word. Paul Graham: taste — "very exacting taste, plus the ability to gratify it" — matters more when anything can be built but the machine "can't tell you what's worth building." Jensen Huang: the durable skill is "being someone who's really good at asking questions." Naval Ravikant has said it for a decade: in an age of infinite leverage, judgment is the most important skill.

Section Three · "Cognitive architecture": the part that's actually rigorous

Here the popular thesis gets sloppy, because the appealing version — "learn a few mental models and you'll out-think everyone" — is contradicted by the science.

Good thinking is not a free-floating skill. It runs on deep, organised domain knowledge. This is one of the most robust findings in cognitive science. Daniel Willingham: "critical thinking is not a set of skills that can be deployed at any time, in any context… it very much depends on domain knowledge." Chess masters reconstruct real board positions far better than novices, then lose the entire advantage on random boards — their edge is stored patterns, not raw brainpower.

Willingham, "Critical Thinking," American Educator, 2007; Chase & Simon, chunking studies, 1973.

The uncomfortable corollary: far transfer is rare. There is no context-free "thinking skill" you carry everywhere; skills transfer mainly when domains share deep structure. So cognitive architecture cannot mean a tidy latticework of borrowed models. As Cedric Chin puts it, "the most valuable mental models do not survive codification" — you cannot acquire elite judgment by reading someone else's summary of it. It is tacit, built by doing, with feedback.

Analysis · This refinement strengthens the AI-era argument. If the scarce asset were codified knowledge, AI would already own it; it has read everything. The scarce asset is the tacit judgment of when, whether, and how to apply knowledge, which is exactly what doesn't fit in a prompt or sit in a training set in usable form.

One more pillar: writing is the stress test for thinking. Graham's sharpest recent essay, Writes and Write-Nots: "to write well you have to think clearly, and thinking clearly is hard… a world divided into writes and write-nots is more dangerous than it sounds. It will be a world of thinks and think-nots." When AI will draft the words for you, the temptation is to outsource the test — and with it, the thinking the test was forcing you to do.

Section Four · The strongest objections, steelmanned

Carry only the optimistic thesis into a real decision and you will make expensive mistakes. Three objections are serious enough that ignoring them is malpractice.

One: judgment is built through execution — so automating execution may eat the thing it's meant to elevate. You cannot direct well what you have never done; taste is downstream of reps. The deskilling evidence is measured, not hypothetical: after adopting AI polyp-detection, gastroenterologists' unassisted detection rates declined; shown incorrect AI scores, experienced radiologists' accuracy fell from 82% to 45.5%. This is Lisanne Bainbridge's 1983 "Ironies of Automation": the more reliable the automation, the more the operator's skill atrophies — right when you need it for the rare hard case the machine can't handle. The pipeline version has hard data: early-career workers (22–25) in the most AI-exposed jobs saw a 13% relative employment decline while older workers in the same roles held steady. If AI does the junior work, how does anyone become senior?

AI Review, deskilling meta-analysis, 2025; Stanford Digital Economy Lab, "Canaries in the Coal Mine," 2025.

Two: direction is not a permanent human moat — AI is climbing it. Evaluation, planning, even prompt-writing are partly automatable already: LLM-as-judge matches human agreement (~80%) on quality; machine-optimised prompts beat human-designed ones by up to 50% on hard benchmarks; the length of tasks AI can complete autonomously is doubling roughly every seven months. The honest framing is Shrivu Shankar's: taste is alpha, not a moat — a real but time-limited, relative edge.

Zheng et al., LLM-as-judge, NeurIPS 2023; Google DeepMind, prompt optimisation, 2023; METR task-horizon, 2025.
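
The seven-month doubling figure is easier to weigh as arithmetic. The sketch below is illustrative only: it assumes the doubling period stays constant, which the cited work does not guarantee, and `horizon_multiplier` and its parameters are hypothetical names invented here, not anything from METR.

```python
def horizon_multiplier(months: float, doubling_months: float = 7.0) -> float:
    """Growth factor of the autonomous task horizon after `months`.

    Assumption (not from the source): the doubling period is constant,
    so growth is 2 ** (elapsed time / doubling period).
    """
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Whole multiples of the doubling period give exact powers of two.
    for months in (7, 14, 28):
        print(f"{months:>2} months: x{horizon_multiplier(months):.0f}")
```

Under that assumption, 28 months is four doublings, or a sixteenfold longer task horizon. That is why "alpha, not a moat" is the cautious reading: an edge measured against today's tools is measured against a moving baseline.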

Three: the experts are the ones most likely to misjudge how much value is direction. The keystone result is brutal. Sixteen experienced open-source developers, working on their own mature repositories, were 19% slower with AI tools — and believed they were 20% faster. Perception moved opposite to reality. Layer on the Microsoft finding that higher confidence in AI correlates with less critical thinking, and the real risk appears: the "director" who rubber-stamps instead of directing, amplifying the machine's errors instead of catching them.

Analysis · Honesty check: METR's February 2026 follow-up found late-2025 tools no longer showed that slowdown, so −19% is not a permanent number. The durable lesson is the perception trap, not the figure. A thesis that only survives by hiding its objections isn't worth operating on.
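
To see the size of the perception trap, it helps to put the two METR percentages on the same clock. This is a rough sketch under stated assumptions: "19% slower" is read as tasks taking 19% longer, "20% faster" as developers believing tasks took 20% less time, and the 100-minute baseline is a hypothetical figure chosen only to make the arithmetic visible. The function name and parameters are likewise invented for illustration.

```python
def perception_gap(baseline_minutes: float,
                   measured_pct: int = 19,
                   perceived_pct: int = -20) -> dict:
    """Compare measured and perceived task time against a no-AI baseline.

    Assumptions (not from the source): percentages are changes in
    completion time, and the baseline is a hypothetical task length.
    """
    measured = baseline_minutes * (100 + measured_pct) / 100
    perceived = baseline_minutes * (100 + perceived_pct) / 100
    return {
        "measured": measured,          # what the stopwatch would show
        "perceived": perceived,        # what the developer believed
        "gap": measured - perceived,   # size of the miscalibration
    }


if __name__ == "__main__":
    result = perception_gap(100.0)
    print(f"measured {result['measured']:.0f} min, "
          f"perceived {result['perceived']:.0f} min, "
          f"gap {result['gap']:.0f} min")
```

On a hypothetical 100-minute task, that is 119 minutes on the stopwatch against 80 minutes in the developer's head: a 39-minute gap between confidence and measurement, which matters more than either number alone.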

Direction is the advantage. It is not a free one, and it is not self-renewing.

Section Five · Why education survives, and why "college is a scam" is the weakest take in the room

Start with the mechanism. Learning happens at the point of friction. "Desirable difficulties" — spacing, retrieval, struggle — feel harder and produce more durable, more transferable knowledge. Generating an answer beats re-reading it; the effort is the consolidation. Done with rigour, a degree measurably builds reasoning — roughly a standard deviation of critical-thinking gain, concentrated in the reading- and writing-heavy, demanding courses. The caveat is the point: it is the difficulty that does the work, not the attendance.

Bjork & Bjork, "Desirable Difficulties," 2011; Huber & Kuncel, critical-thinking gains review, 2016.

Now the threat, stated precisely: AI removes the friction exactly where the learning lives. An MIT Media Lab study found AI-assisted essay-writers showed the weakest neural engagement and often couldn't quote the essay they had just "written" — the authors call it "cognitive debt." A Wharton trial is cleaner on cause: students given unguarded GPT-4 to practise did 17% worse once it was taken away — but a "tutor" version that withheld answers erased the harm. Frictionless offloading produces fluency without mastery. That is not a reason to ban the tool; it is a reason to redesign where the friction sits.

Kosmyna et al., MIT Media Lab "cognitive debt," 2025 (preprint, n=54); Bastani et al., generative-AI tutors RCT, PNAS 2025.

And "college is obsolete"? The data doesn't cooperate. US undergraduate enrolment rose 3.5% in spring 2025. Skills-based hiring is mostly rhetoric — dropping degree requirements moved actual non-degree hiring by only a few points. And the AI twist cuts against obsolescence: when AI homogenises everyone's application materials and output, cheap signals lose value and employers lean harder on verified, hard-to-fake ability. What's happening isn't death. It's bifurcation and a collapse of trust — positive average returns but with roughly a third of programmes in negative-ROI territory, and recent-graduate underemployment near 41%.

NSC Research Center enrolment estimates, 2025; FREOPP ROI analysis, 2024; NY Fed labour-market data, 2026.

Section Six · So how does the work (and the schooling) have to change?

The shift is from information delivery to thinking architecture — from rewarding the product (which AI now generates) to building and assessing the judgment (which it can't fake). Concretely, and where there is evidence:

Move assessment from recall to defence. AI defeats the take-home essay; it does not defeat a viva. Oral exams are in a documented renaissance, and handwritten blue-book sales jumped roughly 80% at one university in two years. Grade the thinking, not the text.

Use AI as a Socratic tutor that preserves struggle. This is the one place with a peer-reviewed trial: a scaffolded AI tutor roughly doubled learning gains versus active in-class instruction at Harvard — while the unguarded-access version above eroded skill. The design choice — withhold the answer, force the generation — is everything.

Scaffolded AI tutor RCT, Scientific Reports (Harvard), 2025; contrast with Bastani et al., PNAS 2025.

Teach the jagged frontier explicitly. The core literacy is no longer "how to produce the artifact" but "how to judge whether the machine's artifact is right" — when to trust it, how to verify, where it confidently lies.

Protect the reps that build judgment. Deliberately preserve the hard, unaided practice that forges taste before handing over the tool that would have done it. Earn the autonomy slider. Education's job was always productive struggle — it just used to be able to assume the struggle, because there was no shortcut. Now the shortcut is one prompt away, so the friction has to be engineered back in on purpose.

In Closing · The actual problem

The thesis is right in direction, wrong if taken as a free lunch. Execution is commoditising; judgment, taste and problem-framing are where value concentrates — the empirical anchor is the jagged frontier, not a slogan. Your cognitive architecture is real leverage, but it is deep domain knowledge plus tacit, hard-won judgment plus the discipline of writing — not a borrowed latticework of models. The trap is invisible and self-flattering: the people most certain they're directing are often the ones rubber-stamping, feeling faster while getting slower. Keep AI on a leash. Verify. Stay the architect.

The bottleneck was never the tool. It is the clarity of the mind holding it. AI just removed every excuse that used to hide behind a lack of tools — and made that clarity the only thing left to compete on.

Figure 01 · Software, three ways
The same job, written in a new language each era
1.0: instructions, explicit · 2.0: learned parameters · 3.0: natural-language intent
Each era moves the human further from the keystrokes and closer to the intent. Source: Karpathy, "Software Is Changing (Again)," Y Combinator, June 2025.
Figure 02 · The jagged frontier
Inside versus outside the AI's competence frontier
Inside the frontier: quality rose about 40%, with more tasks done. Outside the frontier, on a task deliberately chosen to be unsuitable: consultants using AI were about 19 percentage points more likely to be wrong.
Same tool, opposite outcomes, and nothing on the surface tells you which side of the line a task sits on. Source: Dell'Acqua et al., Harvard / BCG, 2023 (n=758).
Figure 03 · The perception trap
Felt faster. Measured slower.
+20%, felt faster: developers believed AI made them about 20% faster on their own mature codebases.
−19%, were slower: the stopwatch said 19% slower. Confidence is not calibration; the gap is the whole risk.
Sixteen experienced open-source developers, own repositories. Source: METR, July 2025. (Honesty note: a Feb 2026 follow-up found the slowdown had faded with newer tools; the durable lesson is the perception gap, not the −19%.)
Figure 04 · Who the lift goes to
5,179 support agents — and AI helped the newest most
34% more resolved issues per hour for the least-experienced agents — against near-zero for the most skilled. AI raises the floor; it doesn't lift the ceiling.
Source: Brynjolfsson, Li & Raymond, NBER 2023 / QJE 2025 (n=5,179).
Illustration (stylized, not a screenshot) · The shift from producing output to directing and verifying it
Yesterday, you produce: the effort was the typing. Now, you direct and verify: a small amount of generation and a large amount of judging, framing and verifying, which is where the value now sits, with the autonomy slider deliberately kept short of full. Based on Microsoft Research / CMU (2025) and Karpathy's "keep AI on a leash."

Sources

  1. Karpathy, A. — "The hottest new programming language is English," X, Jan 24, 2023.
  2. Karpathy, A. — "Software Is Changing (Again)" (Software 1.0 / 2.0 / 3.0; LLM-as-OS), Y Combinator, June 2025.
  3. Karpathy, A. — interview, Dwarkesh Patel podcast, October 2025 ("autocomplete is my sweet spot"; "keep AI on a tight leash").
  4. Brynjolfsson, Li & Raymond — "Generative AI at Work," NBER w31161 (2023) / QJE 2025 (n=5,179; +14% avg, +34% novices).
  5. Peng et al. — GitHub Copilot randomised controlled trial, 2023 (task completed 55.8% faster).
  6. Dell'Acqua et al. — "Navigating the Jagged Technological Frontier," Harvard / BCG, 2023 (n=758).
  7. Mollick, E. — "Centaurs and Cyborgs on the Jagged Frontier," One Useful Thing, 2023.
  8. Microsoft Research / Carnegie Mellon — AI and critical thinking among knowledge workers, CHI 2025 (n=319).
  9. Willingham, D. — "Critical Thinking: Why Is It So Hard to Teach?", American Educator, 2007.
  10. Chase & Simon — chunking and chess expertise, 1973; Barnett & Ceci — transfer of learning, Psychological Bulletin, 2002.
  11. Chin, C. — "The Mental Model Fallacy," Commoncog.
  12. Graham, P. — "Taste for Makers" (2002); "Putting Ideas Into Words" (2022); "Writes and Write-Nots" (Oct 2024).
  13. METR — "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developers," July 2025 (n=16; −19% measured, +20% perceived); follow-up, Feb 2026.
  14. Stanford Digital Economy Lab — "Canaries in the Coal Mine," 2025 (early-career −13% relative employment).
  15. Deskilling meta-analysis — AI Review, 2025 (radiologist accuracy 82% → 45.5% on incorrect AI cues); Bainbridge, L. — "Ironies of Automation," 1983.
  16. Bjork & Bjork — "Desirable Difficulties," 2011; Huber & Kuncel — critical-thinking gains in college, 2016.
  17. Kosmyna et al. — MIT Media Lab, "cognitive debt" EEG study, 2025 (preprint, n=54); Bastani et al. — generative-AI tutors RCT, PNAS 2025 (−17% post-removal); scaffolded AI tutor, Scientific Reports (Harvard), 2025.
  18. NSC Research Center — enrolment estimates, 2025; FREOPP — college ROI, 2024; NY Fed — recent-graduate underemployment (~41%), 2026.

A note on method & honesty. Produced by the Stop Trying To Be Invisible research desk; edited and approved by a human who takes responsibility. Karpathy's quotes are verified against primary tweets and his Y Combinator talk. Two of the "AI harms thinking" studies (MIT cognitive debt; the deskilling meta-analysis) are early or correlational — credible, not settled. The METR slowdown was real in early 2025 and had faded by late 2025; the durable lesson is the perception trap, not the number. We cite the counter-evidence on purpose: a thesis that only survives by hiding its objections isn't worth operating on.
