
“Abstract form January 2026-1” by Alvesgaspar is licensed under CC BY-SA 4.0.
What Is Vibe Coding? And Why It's Not Enough.
In February 2025, Andrej Karpathy, co-founder of OpenAI and former director of AI at Tesla, posted a thread describing a new way of writing software. He called it "vibe coding": giving in to the vibes, talking to an AI tool with voice, accepting all the code it generates without reading it, and pasting error messages back in when things break. He described it as genuinely fun and surprisingly effective for small projects.
The term took off. Collins English Dictionary named it Word of the Year for 2025. Gartner projects that 60% of new code will be AI-generated by the end of 2026. Every board deck now mentions it. Every engineering team claims to be doing it.
The problem is that vibe coding, as Karpathy described it, is the lowest-sophistication way of using AI development tools. It is reactive, tactical, and inconsistent. It works brilliantly for prototypes and proofs of concept. It produces poor codebases and unpredictable results when applied to anything that needs to be maintained, scaled, or trusted in production.
For investors and business leaders, the distinction matters enormously. The question is not whether your engineering team is "using AI"; almost everyone is. The question is whether they are using it in a way that creates genuine, sustained productivity gains or just the feeling of productivity.
“There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”
— Andrej Karpathy, February 2025
The Perception Gap
In mid-2025, METR, a non-profit research institute, published the most rigorous study to date on AI coding tools and developer productivity: a randomised controlled trial with sixteen experienced open-source developers, people who had contributed to repositories averaging over 22,000 stars and a million lines of code. Each developer's tasks were randomly assigned to either allow AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) or forbid them.
The headline finding was striking: developers with AI tools were 19% slower than those without. But the developers themselves believed they were 20% faster. A perception gap of nearly 40 percentage points.
This does not mean AI tools are useless. An earlier controlled experiment by researchers at MIT and Microsoft found that developers completed an isolated, well-defined task 55% faster with GitHub Copilot. Both findings are true simultaneously, and the tension between them is the most important thing for business leaders to understand.
AI tools make simple, bounded tasks faster. But in complex, mature codebases (the kind that actually run businesses), the time saved generating code can be consumed by the overhead of reviewing, debugging, and reworking inconsistent output. The net effect depends entirely on how the tools are used. Reactive, unstructured usage (vibe coding) can make experienced developers slower; structured, deliberate usage can make them significantly faster.
The Maturity Spectrum
Karpathy himself recognised the distinction. By February 2026, he publicly moved away from the term he coined, proposing "agentic engineering" to describe the professional end of the spectrum. He broke it into two halves: "Agentic" because the default is orchestrating AI agents rather than writing code directly, and "Engineering" to emphasise that there is an art, science, and expertise to doing it well.
Simon Willison, co-creator of Django and one of the most respected voices on practical AI development, drew the sharpest line: if the LLM wrote the code but you reviewed it, tested it, and can explain how it works, that is software development. If you accepted everything without understanding it, that is something else entirely. He also observed something our experience confirms: good engineering practices (automated tests, clean documentation, CI/CD, well-factored code) make AI agents produce better results. The disciplines that mattered before AI matter more with it, not less.
From our experience training over a thousand engineers across multiple organisations, we see a clear spectrum of AI-assisted development maturity. Where a team sits on this spectrum determines whether AI tools are genuinely improving their output or just making them feel productive.
Level 1: Vibe Coding
What it looks like: The developer describes what they want in natural language, accepts most of what the AI generates, and iterates by pasting error messages back. Minimal code review, no structured approach to context, no consistent workflow.
Where it works: Prototypes, proofs of concept, throwaway scripts, personal projects, exploring ideas. Anywhere the code does not need to be maintained, understood by others, or trusted in production.
Where it breaks down: Production systems, team environments, anything with security requirements, regulated industries, codebases that need to survive beyond the next demo. The output is inconsistent: sometimes excellent, sometimes subtly wrong in ways that surface weeks later.
The business signal: If your engineering team describes their AI usage and it sounds like this (individual experimentation with no shared practices and no quality framework), they are capturing perhaps 20% of the available value. The METR study suggests they may actually be slower than they think.
Level 2: Structured Prompting
What it looks like: The developer writes deliberate, well-constructed prompts. They review AI output carefully, iterate on specific sections, and maintain awareness of what the code does. They have personal patterns that work for them.
Where it works: Individual productivity on familiar codebases. The developer is faster on routine tasks and uses AI as a sophisticated autocomplete.
Where it breaks down: Consistency across a team. Every developer has their own approach. Knowledge does not transfer. When someone leaves, their prompting patterns leave with them. Quality varies between individuals.
The business signal: The team is getting value from AI, but it is locked in individual habits rather than institutional capability. There is no multiplier effect.
Level 3: Context Engineering
What it looks like: The team writes specifications before writing code. They structure context deliberately (project documentation, architecture decisions, coding standards) so that AI tools operate with full understanding of the codebase and its conventions. They build repeatable workflows and shared context files. Quality is consistent because the inputs are consistent.
Where it works: Production systems, team environments, complex codebases. This is where AI-assisted development starts delivering the 1.2 to 2x productivity gains that the optimistic studies measure. The key insight: the engineering work is not writing code; it is writing the specifications and context that allow AI to write code well.
The business signal: The team has institutional AI capability. It survives personnel changes. It improves over time as shared context accumulates. This is where the sustained competitive advantage begins.
Level 4: Agentic Engineering
What it looks like: The team orchestrates multiple AI agents working in concert: one writing code, another reviewing it, another running tests, another updating documentation. Workflows are automated end-to-end. The developer's role shifts from writing code to directing and reviewing an AI-powered engineering pipeline.
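The generate-review-test loop described above can be sketched in a few lines. This is a deliberately simplified illustration, not a real framework: `generate`, `review`, and `run_tests` are hypothetical stand-ins for calls to model-backed agents and a CI runner.

```python
# A minimal sketch of an agentic pipeline: one agent generates code, a second
# reviews it, and a test run gates the result. Agent functions are stand-ins.
def pipeline(spec, generate, review, run_tests, max_rounds=3):
    """Loop generate -> review -> test until clean, or give up after max_rounds."""
    code = generate(spec)
    for _ in range(max_rounds):
        issues = review(code)        # reviewer agent returns a list of issues
        if not issues:
            if run_tests(code):      # tests pass: the pipeline converged
                return code
            issues = ["tests failed"]
        # Feed the issues back to the generator agent and retry
        code = generate(spec + "\nFix: " + "; ".join(issues))
    raise RuntimeError("pipeline did not converge")
```

The developer's job in this picture is writing the `spec` and deciding what the reviewer and test gates enforce, which is exactly why Level 3 is the prerequisite.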
Where it works: Mature teams with strong engineering foundations. The prerequisite is Level 3: without structured context and specifications, agents produce inconsistent results at scale. With them, a small team can operate with the output of a team several times its size.
The business signal: This is the frontier. Few teams operate here today, but the ones that do have a compounding advantage that widens every month.
What This Means for Investors and Business Leaders
When a portfolio company reports that their engineering team is "using AI," the follow-up question matters more than the headline. Are they vibe coding (individual experimentation that feels productive but may not be)? Or have they built the structured practices that translate AI capability into measurable business outcomes?
The practical indicators are straightforward. A team at Level 1 or 2 will describe AI usage in terms of tools: "we use Cursor" or "we have Copilot licences." A team at Level 3 or 4 will describe it in terms of practices: "we write specifications first," "we have shared context standards," "we measure AI-assisted delivery velocity against a baseline."
Andrew Ng, the Stanford professor and founder of DeepLearning.AI, has built open-source tooling specifically to help teams structure context for AI agents. His Context Hub project reflects the same insight: the teams that get the most from AI tools are the ones that invest in giving those tools structured, high-quality context to work with. The ad hoc approach (vibe coding) hits a ceiling quickly.
The investment in moving from Level 1 to Level 3 is not large. It is primarily training and practice: giving engineers the structured techniques they need and the time to embed them. The return, measured in sustained productivity improvement, is significant. Our data across multiple bootcamp deliveries shows 76% of attendees saving five or more hours per week within one week of training, with the gains sustained and compounding as shared practices mature.
“It has been fascinating to watch how so many of the techniques associated with high-quality software engineering (automated tests and linting and clear documentation and CI and CD and cleanly factored code) turn out to help coding agents produce better results as well.”
— Simon Willison
The Questions Worth Asking
If you are an operating partner evaluating AI adoption across a portfolio, or a CEO trying to understand whether your engineering investment in AI is paying off, these are the questions that reveal where a team actually sits:
"How do your engineers use AI tools?" Listen for whether the answer describes individual tool usage (Level 1-2) or team-wide practices and standards (Level 3-4).
"What happens when a new engineer joins the team?" At Level 1-2, they start from scratch with their own AI habits. At Level 3-4, they inherit shared context, specifications, and workflows that make them productive with AI tools from day one.
"Can you measure the productivity impact?" At Level 1-2, the answer is typically anecdotal: "it feels faster." At Level 3-4, there are baselines and metrics. The METR study should make any business leader sceptical of subjective assessments.
"What is your AI security posture?" Vibe coding and security governance rarely coexist. Teams at Level 1 are often using AI tools in ways that create compliance risk without realising it. Our piece on deploying AI at speed without cutting corners covers the security framework in detail.
The good news is that the gap between levels is closable. It takes deliberate training rather than simply more time, which is why a structured, intensive approach works better than letting teams figure it out on their own.
Related Reading
- AI Bootcamp - two-day intensive that moves engineering teams from vibe coding to structured AI development
- AI Enablement Services - strategy, assessment, and capability building for AI adoption
- AI Readiness Assessment: What Investors Should Actually Measure - how the maturity spectrum maps to portfolio-level readiness scoring
- How Mid-Market CEOs Can Deploy AI at Lightning Speed - the security framework for AI tool deployment
- What Can Intercom's £100M Bet Teach PE Operators? - lessons from a bold AI transformation
- From AI Chaos to Strategic Advantage - what boards need to understand about AI
References
- Andrej Karpathy. Vibe Coding. X (Twitter) (2025).
- Andrej Karpathy. Agentic Engineering. X (Twitter) (2026).
- METR. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. arXiv (2025).
- Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. MIT / Microsoft Research (2023).
- Simon Willison. Agentic Engineering Patterns. simonwillison.net (2026).
- Andrew Ng. Context Hub: Giving Coding Agents Better Context. DeepLearning.AI (2026).
Want to move your team beyond vibe coding?
Our AI bootcamp takes engineering teams from reactive prompting to structured, production-grade AI development in two days. Delivered by practising CTOs, using your own codebase.