Morning Overview

AI-generated passwords raise new security concerns, experts say

New research from computer scientists finds that large language models tasked with generating code are embedding hard-coded passwords and credentials into their outputs, even when developers never ask for them. The findings, drawn from static analysis of AI-generated code across multiple models, add fresh evidence to a growing body of work suggesting that AI tools designed to assist with security tasks may instead be quietly introducing new vulnerabilities. Federal standards from the National Institute of Standards and Technology already warn that passwords that merely mimic complexity often remain guessable, and the collision of these two problems threatens both individual users and enterprise software pipelines.

Hard-Coded Credentials Slip Into AI-Generated Code

A quantitative study analyzing LLM-generated code at scale used the static analysis tool SonarQube to scan outputs from several prominent models. The researchers found severe issues such as hard-coded passwords appearing across multiple models, a flaw that security teams have long flagged as one of the most dangerous vulnerabilities in production software. Hard-coded credentials bypass authentication controls entirely: anyone who reads the source code gains access. When an AI coding assistant inserts a placeholder password into a database configuration or API call, that string can persist through code review and into deployment if no one catches it.
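To make the risk concrete, here is a minimal sketch of the kind of check a static analyzer performs, run against the sort of snippet an AI assistant might emit unprompted. The database snippet, the regex patterns, and the placeholder credential are illustrative assumptions, not examples taken from the study itself.

```python
import re

# Illustrative patterns for credential assignments a scanner might flag.
SECRET_PATTERNS = [
    re.compile(r"""(?i)(password|passwd|pwd)\s*=\s*['"][^'"]+['"]"""),
    re.compile(r"""(?i)(api[_-]?key|token|secret)\s*=\s*['"][^'"]+['"]"""),
]

def find_hardcoded_secrets(source: str) -> list[str]:
    """Return the lines of `source` that look like hard-coded credentials."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(f"line {lineno}: {line.strip()}")
    return hits

# A hypothetical database configuration an assistant might generate:
generated = '''
import psycopg2
conn = psycopg2.connect(
    host="db.example.com",
    user="admin",
    password="ChangeMe123!",  # placeholder that can reach production
)
'''
for hit in find_hardcoded_secrets(generated):
    print(hit)
```

If no reviewer catches the flagged line, the placeholder string ships with the application, and anyone with read access to the repository holds a working credential.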

What makes the finding especially troubling is that the models introduced credentials without being prompted to do so. The study explicitly notes that LLM-assisted coding can embed passwords even when the developer’s request contains no mention of authentication. This means the risk is not limited to careless prompting. It is baked into the model’s learned behavior, likely inherited from training data that included code repositories where developers left secrets in plaintext. For organizations adopting AI pair-programming tools at scale, every generated file becomes a potential vector for credential leakage unless teams add automated secret-scanning steps to their continuous integration pipelines. The fact that this kind of work is often shared via open repositories and preprint servers such as arXiv’s research archive also means that insecure patterns, once generated, can propagate widely through tutorials, examples, and copied snippets.

Why “Complex” Passwords Still Fail

The problem extends beyond code generation. When users ask chatbots to create passwords, the outputs tend to follow the same predictable patterns that federal guidelines have warned against for years. NIST Special Publication 800-63B, the primary standards document governing digital identity and authentication, explicitly states that composition and complexity rules drive predictable user behavior. The publication cites “Password1!” as a textbook example of a substitution that satisfies uppercase, lowercase, number, and special-character requirements while remaining trivially guessable. AI models trained on massive text corpora absorb exactly these patterns, meaning their generated passwords are likely to cluster around the same predictable structures that attackers already target with dictionary and rule-based cracking tools.
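The gap between rule compliance and actual strength can be shown in a few lines. This sketch assumes a toy wordlist and a handful of mangling rules; real cracking tools apply thousands of such rules to much larger dictionaries.

```python
import re

def meets_complexity_rules(pw: str) -> bool:
    """Classic composition rules: length >= 8, upper, lower, digit, special."""
    return all([
        len(pw) >= 8,
        re.search(r"[A-Z]", pw),
        re.search(r"[a-z]", pw),
        re.search(r"\d", pw),
        re.search(r"[^A-Za-z0-9]", pw),
    ])

DICTIONARY = ["password", "welcome", "dragon"]  # tiny illustrative wordlist

def rule_based_guesses(words):
    """Yield mangled candidates the way cracking tools do: capitalize,
    leet-substitute, append common suffixes."""
    for w in words:
        for base in (w, w.capitalize(), w.replace("a", "@").capitalize()):
            for suffix in ("", "1", "1!", "123", "!"):
                yield base + suffix

guesses = set(rule_based_guesses(DICTIONARY))
print(meets_complexity_rules("Password1!"))   # True: satisfies every rule
print("Password1!" in guesses)                # True: trivially guessable anyway
```

The password passes every composition check and still falls inside the first few dozen candidates a rule-based attack would try.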

NIST’s guidance draws a sharp line between passwords that look complex and passwords that are cryptographically random. True randomness requires entropy sources that language models do not possess. LLMs predict the next likely token in a sequence, which is the opposite of randomness; it is statistical conformity. The federal standards framework, which also connects to NIST’s validation program for cryptographic modules, sets benchmarks for systems that handle sensitive data. Passwords produced by a text-prediction engine fall well outside those benchmarks, yet millions of users may trust AI-generated strings precisely because they appear to satisfy the old complexity rules that NIST itself has moved away from. Earlier empirical work on user authentication, including a 2017 usability and security study, has already shown that when people are nudged toward certain formats, they converge on a narrow band of predictable choices—precisely the kind of structure that language models are trained to reproduce.
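The contrast NIST draws can be illustrated with the standard library. A sketch of what cryptographically random generation looks like, using Python's `secrets` module as a stand-in for any CSPRNG-backed generator; the 16-character length and 94-symbol alphabet are illustrative choices.

```python
import math
import secrets
import string

# 94 printable ASCII symbols: letters, digits, punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length: int = 16) -> str:
    """Draw each character uniformly from the OS CSPRNG, the property
    that next-token prediction cannot provide."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string: length * log2(|alphabet|)."""
    return length * math.log2(alphabet_size)

pw = random_password()
print(pw)
print(f"{entropy_bits(16, len(ALPHABET)):.1f} bits")  # ~104.9 bits
```

Every candidate string of the same length is equally likely, so no dictionary or pattern-based attack gains an edge. A language model, by contrast, concentrates its probability mass on strings that resemble passwords it has seen.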

Users Share Sensitive Data With AI Chatbots

The risk compounds when users treat AI chatbots as trusted security advisors. An academic survey examining how people interact with conversational AI agents found that a non-trivial subset of regular users engages in behaviors that could enable attacks, including disclosure of sensitive information during routine conversations. Sharing existing passwords, account details, or personal identifiers with a chatbot creates a secondary exposure channel. Even if the chatbot does not store the data permanently, the interaction may be logged, cached, or used in future model training, depending on the platform’s data-retention policies. If logs are compromised or misused, they can provide attackers with both raw credentials and rich contextual clues about how a person chooses and manages passwords.

This behavioral pathway matters because it turns a password-generation request into a two-sided vulnerability. On one side, the AI produces a weak password. On the other, the user may have already fed the system enough personal context for an attacker who gains access to conversation logs to reconstruct or guess the credential. Cornell-affiliated researchers studying AI risk behaviors suggest the problem is not hypothetical: users routinely share data they would never type into a traditional web form, treating the chatbot interface as a private conversation rather than a networked service. That misperception lowers inhibitions, especially when chatbots adopt a friendly tone and reassure users about privacy without offering granular, verifiable details about storage, retention, and access controls.

LLMs Struggle With Password-Domain Tasks

Separate experiments testing whether LLMs can crack passwords offer an indirect but revealing lens on why they also fail at generating strong ones. A study evaluating LLMs in an adversarial password-guessing setting used synthetic profiles and measured performance using Hit@k-style accuracy metrics, a standard benchmark for how many correct guesses a model produces within its top-k attempts. The results provided concrete evidence of LLM limitations in password-domain tasks: the models performed poorly at guessing passwords, which might sound reassuring until you consider the implication. If a model cannot reliably distinguish strong passwords from weak ones in an adversarial context, it also cannot reliably generate passwords that would resist such attacks, because it lacks a grounded notion of what makes one candidate meaningfully stronger than another.
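The Hit@k metric the study relies on is straightforward to compute. This sketch uses made-up accounts and guesses purely to show the mechanics; the actual study used synthetic profiles of its own.

```python
def hit_at_k(guess_lists, true_passwords, k):
    """Fraction of accounts whose true password appears in the model's
    top-k guesses. guess_lists[i] is the ranked guess list for account i."""
    hits = sum(
        1 for guesses, truth in zip(guess_lists, true_passwords)
        if truth in guesses[:k]
    )
    return hits / len(true_passwords)

# Hypothetical example: 3 accounts, ranked guesses per account.
guesses = [
    ["password", "Password1!", "letmein"],
    ["qwerty", "123456", "dragon"],
    ["Summer2024!", "welcome1", "admin"],
]
truths = ["Password1!", "iloveyou", "Summer2024!"]

print(hit_at_k(guesses, truths, k=1))  # 0.333...: only account 3 hit at rank 1
print(hit_at_k(guesses, truths, k=3))  # 0.666...: accounts 1 and 3 hit in top 3
```

Low Hit@k scores mean the model's ranking of password candidates does not track real-world likelihood, which is the same deficiency that undermines its generation ability.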

The underlying mechanism is the same in both directions. Language models optimize for plausibility, not security. A password that “looks right” to a model trained on human text is, by definition, one that conforms to patterns humans have already used. That conformity is precisely what makes it vulnerable. Cryptographic password generators solve this by drawing from uniform random distributions, a process that has no analog in transformer-based text prediction. The gap between what LLMs produce and what true randomness requires is not a bug that better training data will fix; it is a structural limitation of the architecture itself. Regulatory discussions, such as those reflected in federal guidance on authentication, increasingly emphasize this distinction between human-memorable but patterned secrets and machine-generated random values, underscoring why delegating password creation to a text model is fundamentally misaligned with modern security expectations.

What Stronger Safeguards Would Require

Addressing these risks will demand changes at multiple levels. For developers, the immediate step is integrating automated secret-detection tools into any workflow that incorporates AI-generated code. Static analysis can catch hard-coded credentials before they reach production, but only if teams treat AI outputs with the same skepticism they would apply to code from an unvetted contractor. That means instituting mandatory reviews, scanning for secrets in both source files and configuration templates, and blocking deployments when tools flag potential passwords, API keys, or tokens. Organizations should also retrain engineering teams to see AI assistants as suggestion engines rather than authoritative sources, reinforcing that anything touching authentication or cryptography must be checked against vetted libraries and standards.
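A deployment-blocking check of the kind described above can be sketched as a small gate script. The file suffixes, regex ruleset, and directory layout are assumptions for illustration; production pipelines would use a dedicated scanner with a tuned ruleset.

```python
import re
from pathlib import Path

# Illustrative pattern for credential-like assignments in source and config files.
SECRET_RE = re.compile(
    r"""(?i)(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*['"][^'"]+['"]"""
)

def scan_tree(root: str, suffixes=(".py", ".yml", ".yaml", ".env", ".cfg")) -> list[str]:
    """Return 'path:line' locations of suspected hard-coded secrets."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.suffix in suffixes and path.is_file():
            lines = path.read_text(errors="ignore").splitlines()
            for lineno, line in enumerate(lines, start=1):
                if SECRET_RE.search(line):
                    findings.append(f"{path}:{lineno}")
    return findings

def gate(root: str = ".") -> int:
    """CI entry point: report findings, return nonzero to block the stage."""
    findings = scan_tree(root)
    for loc in findings:
        print("possible secret:", loc)
    return 1 if findings else 0
```

Wired into a pipeline as `sys.exit(gate())`, any flagged file fails the build, forcing a human to either remove the credential or explicitly allowlist a false positive.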

On the user side, security guidance needs to catch up with how people actually interact with chatbots. Enterprises deploying conversational AI should provide explicit, prominent warnings against pasting passwords or sensitive personal details into chat windows, and they should configure systems to redact or reject obvious secrets rather than echoing them back. Where password generation is unavoidable, organizations can pair LLM interfaces with dedicated cryptographic random generators, using the model only to explain password managers and multi-factor authentication while leaving the actual secret creation to tools designed for that purpose. At a policy level, aligning AI deployments with established frameworks like NIST’s digital identity standards and cryptographic validation programs can help ensure that convenience-driven features do not quietly erode the very security controls they are supposed to support.
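The redact-before-logging idea can be sketched as a filter in front of the chat pipeline. The three patterns below are illustrative assumptions covering only the most obvious secret shapes; a real deployment would rely on a vetted detection library with far broader coverage.

```python
import re

# Illustrative secret shapes a chat front end might redact before a
# message is logged, cached, or forwarded to the model.
API_KEY_RE = re.compile(r"\b(sk|pk)[-_][A-Za-z0-9]{16,}\b")
PASSWORD_FIELD_RE = re.compile(r"(?i)(password\s*[:=]\s*)(\S+)")
CARD_NUMBER_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def redact(message: str) -> str:
    """Replace likely secrets with placeholders before storage or model calls."""
    out = API_KEY_RE.sub("[REDACTED_KEY]", message)
    out = PASSWORD_FIELD_RE.sub(r"\1[REDACTED]", out)
    out = CARD_NUMBER_RE.sub("[REDACTED_NUMBER]", out)
    return out

print(redact("my password: Hunter2! and key sk-abcdef1234567890XYZ"))
# my password: [REDACTED] and key [REDACTED_KEY]
```

Because the substitution happens before logging, a compromised conversation log yields placeholders rather than working credentials, closing the secondary exposure channel described above.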

More from Morning Overview

*This article was researched with the help of AI, with human editors creating the final content.