Morning Overview

Companies retool websites to show up in AI search results

Companies across the web are rewriting technical configurations and restructuring content on their sites to appear in AI-powered search results, a shift driven by the rise of tools like ChatGPT search. The changes range from editing server-level crawl permissions to overhauling page layouts with machine-readable formatting. As generative AI tools increasingly mediate how people discover information online, the stakes for businesses that fail to adapt are growing sharper: reduced visibility, fewer referrals, and a shrinking share of audience attention.

What is verified so far

The clearest evidence of this shift comes directly from OpenAI, which has published specific technical requirements for websites that want to surface in ChatGPT search results. Site operators need to allow a dedicated crawler called OAI-SearchBot to access their pages, which typically means editing a file called robots.txt, the standard mechanism websites use to control which bots can index their content. OpenAI’s publisher guidance spells this out as a prerequisite for inclusion. Beyond the crawler itself, operators must also ensure their hosting provider or content delivery network permits traffic from OpenAI’s published IP addresses, according to the company’s search documentation. These are not optional best practices. They are gatekeeping conditions: fail to meet them, and a site simply will not appear.
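Whether a given robots.txt actually permits OpenAI's crawler can be checked mechanically. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt policy against the OAI-SearchBot user agent; the sample robots.txt content and the example.com URL are hypothetical, included only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that explicitly allows OpenAI's search crawler
# while keeping a private section off-limits to all other bots.
SAMPLE_ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

def allows_crawler(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A site operator could run this check against their own live robots.txt
# before expecting inclusion in ChatGPT search results.
print(allows_crawler(SAMPLE_ROBOTS_TXT, "OAI-SearchBot", "https://example.com/articles/geo"))
```

Note that this only verifies the robots.txt side of OpenAI's requirements; the separate condition that hosting providers or CDNs accept traffic from OpenAI's published IP ranges has to be confirmed at the network layer.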

OpenAI has also described partnerships with news organizations as part of its search ecosystem, signaling that the company views publisher cooperation as central to the product’s quality. The launch announcement for ChatGPT search framed these relationships as a way to ensure high-quality content reaches users through AI-generated answers rather than traditional link lists. In effect, OpenAI is asking publishers to trade some control over how their work is presented for the promise of continued visibility inside a new, conversational interface.

On the academic side, researchers have begun formalizing the practice of optimizing content for AI search engines under the term Generative Engine Optimization, or GEO. An arXiv preprint now in its third version provides a framework and experimental evidence showing that content changes can measurably improve how often a page is cited by generative engines. The paper, hosted on the preprint server maintained by Cornell University, treats GEO as a distinct discipline from traditional search engine optimization, with its own methods and metrics. It describes interventions such as rewriting introductions to foreground key facts, adding concise summaries, and clarifying entities, then measures how those edits affect whether AI systems choose a given page as a source.

A separate study published on arXiv, titled “Structural Feature Engineering for Generative Engine Optimization,” examines how specific page and content structures affect citation behavior by AI systems. That paper reports concrete citation-rate changes tied to structural features such as headings, summaries, and authoritative phrasing. Together, these two papers establish that GEO is not just marketing jargon but a measurable, experimentally testable field. They suggest that the way information is packaged (where headings are placed, how claims are framed, how clearly sections are labeled) can influence whether a generative engine surfaces a source at all.

These research efforts rely on access to large corpora and reproducible infrastructure. The preprints are distributed through Cornell Tech’s arXiv portal, which has become a central venue for early work on AI alignment, information retrieval, and now GEO. By making the experimental setups and datasets publicly available, the authors invite replication and critique, a step that distinguishes their findings from proprietary internal tests run by commercial platforms.

What remains uncertain

The biggest gap in the current evidence is the absence of on-the-record case studies from specific companies confirming they have changed their robots.txt files or whitelisted OpenAI’s IP addresses. OpenAI’s documentation tells publishers what to do, but no named business has publicly disclosed the results of making those changes. Without that, the story of corporate adaptation rests on inference from OpenAI’s guidance rather than verified corporate action. It is reasonable to assume that at least some high-traffic sites have implemented the recommended changes, but absent direct confirmation, those assumptions remain speculative.

The academic research, while rigorous in experimental design, has not yet been paired with institutional self-reporting. ArXiv’s own help pages, for instance, explain how the preprint service operates and how submissions are processed, but they do not discuss whether GEO techniques have changed how often its papers are cited in AI-generated answers. The experimental findings show what is possible in controlled settings, but real-world adoption data from publishers, retailers, or media companies is missing. Without logs from AI search providers or analytics from participating sites, it is difficult to know how often GEO-style optimizations translate into measurable gains.

Long-term effects on traffic and revenue also remain unquantified. A site that appears in a ChatGPT answer may gain brand exposure, but it may also lose a direct click if the AI summarizes the content well enough that users never visit the original page. No primary data source in the current reporting addresses this tension with hard numbers. ArXiv’s own pages about member institutions and donor support hint at the sustainability pressures facing open-access platforms, but they do not connect those pressures to AI-driven traffic patterns. For now, the economic impact of appearing in AI search results is more a matter of theory than of audited financial statements.

There is also an open question about whether GEO will reward genuinely useful content or simply reward content formatted to satisfy AI parsers. The structural engineering paper finds that fluent, authoritative phrasing increases citation likelihood, but “authoritative phrasing” is a style choice, not a guarantee of accuracy. If AI systems consistently prefer certain sentence patterns or heading structures, publishers may converge on a narrow set of templates, raising the risk that web content becomes more uniform and less distinctive over time. That homogenization could make it harder for readers to distinguish between careful reporting and confident-sounding speculation, especially when the AI interface already blurs the line between source and summary.

Another unresolved issue is how transparent AI search providers will be about their ranking criteria. Traditional search engines, while guarded about their algorithms, still expose link lists that analysts can study. ChatGPT search, by contrast, presents synthesized answers with only a small number of cited sources, making it harder to reverse-engineer why one page was chosen over another. Without independent audits or regulatory disclosure requirements, outside observers are left to infer ranking behavior from scattered examples and from the technical constraints that OpenAI has published.

How to read the evidence

The strongest evidence in this story comes from two categories: OpenAI’s own technical documentation and peer-reviewed or preprint academic research. OpenAI’s help center articles are primary sources in the strictest sense. They describe what the company requires, in its own words, with no intermediary interpretation. When OpenAI says a site must allow OAI-SearchBot, that is a verifiable policy statement, not a rumor or a leaked memo. These documents establish the minimum technical bar for participation in ChatGPT search and confirm that OpenAI expects publishers to take explicit steps if they want to be included.

The GEO research papers, distributed through the arXiv infrastructure, offer experimental evidence rather than anecdotal claims. They describe controlled tests, report measurable outcomes, and propose a vocabulary for discussing AI search optimization. That said, preprints have not undergone full peer review, and their findings should be treated as strong signals rather than settled science. The structural engineering paper’s citation-rate findings, for example, describe what happened in the researchers’ experiments, not necessarily what will happen across every AI search product. Different models, training data, or ranking layers could respond differently to the same structural tweaks.

What is notably absent from the evidence base is any independent audit of how ChatGPT search selects and ranks sources. OpenAI has not released comprehensive datasets showing which sites are most frequently cited, how inclusion correlates with robots.txt settings, or how ranking varies across topics. Without that, journalists and researchers are left to triangulate from public documentation, small-scale experiments, and the emerging GEO literature. The result is a picture that is suggestive but incomplete. We know what OpenAI says it wants from publishers, and we know that structural changes can influence AI citation behavior in lab settings, but we do not yet know how those pieces fit together at internet scale.

For readers and site operators, the most cautious interpretation is that a new optimization regime is forming, but its rules are still in flux. Allowing OAI-SearchBot and structuring content clearly are necessary steps for participation, not guarantees of prominence. GEO techniques may offer incremental advantages, yet they operate within opaque ranking systems that can change without notice. Until more organizations share concrete results (or regulators require greater transparency), any strategy for “AI search visibility” will rest on a mix of documented requirements, early research, and informed guesswork rather than on the kind of mature playbook that developed around traditional search engine optimization.

*This article was researched with the help of AI, with human editors creating the final content.*