Morning Overview

Vibe coding’s downsides are piling up, especially for open-source projects

A growing body of academic research warns that AI-assisted “vibe coding,” where language models assemble software from open-source components with minimal human oversight, is creating hidden costs for the projects it depends on. While the approach delivers short-term speed gains, studies now link it to security weaknesses, maintainer burnout, and a feedback loop that could erode the open-source ecosystem from the inside out.

What Vibe Coding Actually Means for Open Source

The term “vibe coding” gained traction after Andrej Karpathy used it to describe a workflow where developers guide AI agents through natural-language prompts rather than writing code line by line. A formal treatment of the concept, published on arXiv by Koren, Bekes, Hinz, and Lohmann, defines vibe coding as AI agents assembling software from open-source libraries with reduced direct user engagement. Their equilibrium model finds that while individual productivity rises, the bond between users and maintainers weakens. Fewer users transition into contributors, and the sharing incentives that sustain open-source projects degrade over time.

That dynamic matters because open-source software is not self-sustaining. Libraries survive on a cycle: users find bugs, file issues, submit patches, and eventually become maintainers themselves. When an AI agent silently pulls in a dependency, the human developer may never visit the project page, read its documentation, or understand its internals well enough to contribute back. The arXiv model predicts this pattern could reduce both entry into open-source development and the overall quality of available code, a conclusion that several empirical studies now support with hard numbers.

The research pipeline that surfaces these risks depends on its own shared infrastructure. Preprints and datasets about AI-assisted development often appear first on community-backed repositories such as arXiv, whose institutional members collectively fund servers, curation, and moderation. Those services rely in turn on operational support, individual donations, and clear submission guidelines that keep the flow of new work manageable. The same pattern holds at host universities like Cornell, where open infrastructure and open research reinforce each other. In that sense, vibe coding is not just a software story; it is a stress test for the broader culture of shared digital resources.

Security Flaws Baked Into AI-Generated Code

Speed without scrutiny introduces risk. An empirical study examining 733 AI-generated code snippets in real GitHub projects found that 29.5% of Python snippets and 24.2% of JavaScript snippets contained security weaknesses spanning 43 CWE categories. These are not hypothetical lab exercises; the researchers identified the snippets in production repositories where other developers and automated tools depend on them.

The vulnerabilities ranged from classic injection flaws to improper input validation and insecure cryptographic use. Many of the issues were subtle enough that they slipped past casual review, especially when the AI-produced code looked stylistically consistent with the surrounding project. The study’s authors warned that as more developers rely on AI suggestions, these weaknesses can accumulate into systemic risk across the dependency graph.

A separate analysis reported by Help Net Security, based on CodeRabbit’s review of 470 open-source GitHub pull requests, found that AI-assisted PRs generated roughly 1.7 times more issues during review than their human-written counterparts, with “major issues” increasing sharply. The pattern suggests that even when AI code passes initial checks, it carries a higher density of problems that surface only under closer inspection, shifting the burden onto reviewers who may already be stretched thin.

Agents Build Well but Maintain Poorly

Not all AI-generated contributions are equal, and the distinction matters for open-source health. An observational study of 567 pull requests created with Claude Code across 157 open-source projects found that 83.8% were accepted or merged, and 54.9% were merged without further modification. Task types included refactoring, documentation, and tests. On the surface, those acceptance rates look strong and suggest that maintainers are open to AI help when it appears to save time on routine work.

But a deeper look at breaking changes tells a different story. A study accepted at MSR 2026 compared 7,191 agent-generated pull requests against 1,402 human pull requests using the AIDev dataset. Agents introduced fewer breaking changes in straightforward code-generation tasks, such as implementing self-contained features with clear specifications. In maintenance work, including refactoring and routine chores, agents carried a higher breaking-change risk. The researchers identified a “confidence trap” in which consistent formatting and passing test suites mask deeper compatibility problems, especially in complex dependency chains and edge-case behavior.

For maintainers, that means AI contributions can look polished while quietly introducing regressions that only emerge downstream, after releases propagate into user environments. The cost of those regressions is rarely borne by the AI vendor; it lands on project volunteers who must triage bug reports, bisect history, and unwind changes that seemed harmless at review time.

Maintainers Are Already Paying the Price

The theoretical risks have already become operational headaches. Daniel Stenberg, creator and maintainer of cURL, announced that the project ended its HackerOne bug bounty effective January 31, 2026, citing an “explosion in AI slop reports.” By 2025, the confirmed-vulnerability rate had fallen below 5%, meaning the vast majority of incoming reports were plausible-sounding but ultimately false, consuming time and mental energy to debunk. For a security-critical tool like cURL, each bogus report still requires serious attention, because ignoring a real issue would be catastrophic.

The Godot game engine faces a parallel problem. Godot maintainer Rémi Verschelde has described a surge of low-quality AI-generated pull requests that force maintainers to assess not just the code itself but whether the person submitting it actually understands what it does, according to Game Developer’s reporting. That added cognitive load falls on volunteer teams who already operate without corporate backing. When triage time doubles but the ratio of useful contributions stays flat, the math works against project survival.

Maintainers also report a shift in contributor expectations. Some AI-assisted contributors treat open-source projects as free debugging services for their model outputs, rather than communities with norms and limited capacity. Issue templates, contribution guidelines, and code of conduct documents help, but they were not designed for a world where a single user can generate dozens of semi-coherent patches in an afternoon.

Even the Tooling Is Fragile

Projects that have adopted AI review tools face their own reliability concerns. GitHub’s December 2025 availability report disclosed a Copilot Code Review service degradation in which a large fraction of pull request review requests failed and had to be re-requested. When AI-powered review becomes a standard part of the merge pipeline, outages do not just slow things down; they stall entire workflows and force maintainers to either wait or review manually under time pressure.

This fragility compounds the broader problem. If vibe coding increases the volume of pull requests while simultaneously making the review tools those PRs depend on less reliable, maintainers absorb the gap. Small open-source teams, which lack the staffing to handle sudden spikes in review traffic, are especially vulnerable. A few days of degraded AI review can leave a backlog of unreviewed changes, security patches, and user-facing fixes, all competing for attention.

Designing for Sustainability, Not Just Speed

The emerging research does not argue for abandoning AI assistance altogether. Instead, it points toward guardrails that align vibe coding with the long-term health of the commons it draws from. Projects can require that AI-generated contributions be explicitly labeled, so reviewers know when to probe more deeply. Maintainers can prioritize issues and pull requests that demonstrate understanding—clear rationales, tests, and documentation—over raw volume.
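A labeling policy like this can be enforced mechanically. The sketch below is purely illustrative, not drawn from any project cited above: it assumes a hypothetical rule that every pull request description must carry an explicit "AI-assisted: yes/no" line, which a CI step could check before review begins.

```python
import re

# Hypothetical disclosure marker a project might require in PR descriptions.
# Neither the marker text nor the policy comes from any project cited above.
DISCLOSURE_PATTERN = re.compile(
    r"^AI-assisted:\s*(yes|no)\b", re.IGNORECASE | re.MULTILINE
)

def check_disclosure(pr_body: str) -> tuple[bool, str]:
    """Return (passes, message) for a simple AI-disclosure policy.

    The PR description must contain a line reading "AI-assisted: yes" or
    "AI-assisted: no", so reviewers know when to probe more deeply.
    """
    match = DISCLOSURE_PATTERN.search(pr_body)
    if match is None:
        return False, "missing 'AI-assisted: yes/no' line in PR description"
    if match.group(1).lower() == "yes":
        return True, "AI-assisted PR: route to deeper review"
    return True, "human-authored PR: standard review"
```

A check like this does not judge code quality; it only restores the information asymmetry that vibe coding removes, telling reviewers which submissions deserve extra scrutiny.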

On the tooling side, teams can treat AI services as helpful but optional layers rather than hard dependencies. If a code-review model fails, the process should gracefully degrade to human review with clear expectations, not collapse into chaos. And organizations that benefit heavily from AI-accelerated development can invest back into the open-source and research infrastructure that makes it possible, whether through funding, staffing, or governance support.
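That fallback behavior can be sketched in a few lines. Everything here is an assumption for illustration: `run_ai_review` stands in for whatever review service a project calls, and the fallback policy is hypothetical, not any vendor's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReviewResult:
    source: str          # "ai" or "human"
    comments: list[str]  # review feedback, whatever the source

def review_with_fallback(
    diff: str,
    run_ai_review: Callable[[str], list[str]],
    assign_human_reviewer: Callable[[str], list[str]],
) -> ReviewResult:
    """Treat the AI reviewer as an optional layer, not a hard dependency.

    If the AI service fails (outage, timeout, rate limit), degrade
    gracefully to the human review path instead of stalling the merge
    pipeline.
    """
    try:
        return ReviewResult(source="ai", comments=run_ai_review(diff))
    except Exception:
        # Service degraded: fall back to human review with clear expectations.
        return ReviewResult(source="human", comments=assign_human_reviewer(diff))
```

The design choice is the `except` branch: the pipeline's contract is "every PR gets reviewed," not "every PR gets AI review," so an outage changes who reviews, not whether review happens.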

The core tension of vibe coding is that it optimizes for local productivity while externalizing costs onto shared systems. The studies and maintainer accounts now emerging suggest those systems are already straining. Whether the ecosystem adapts will depend less on what AI can do in isolation and more on how its users choose to participate in the communities that make modern software—and modern research—possible.


*This article was researched with the help of AI, with human editors creating the final content.