Morning Overview

Is AI making streaming worse? How algorithms can flatten great TV

Streaming platforms promise to learn what viewers love and serve it back to them, but a growing body of peer-reviewed research suggests the same recommendation engines designed to keep audiences watching can create feedback loops that narrow what viewers end up encountering. The question is not whether algorithms shape viewing habits, but whether the feedback loops baked into these systems push audiences toward a relatively small set of already-popular titles while leaving less-visible shows harder to find. For viewers who rely on their home screen to discover new series, the stakes are higher than a bad suggestion: the entire definition of “great TV” may be shifting beneath their feet.

What is verified so far

The clearest evidence for how recommendation systems can flatten viewing comes from academic research that has accumulated over more than a decade. A widely cited working paper by Daniel Fleder and Kartik Hosanagar, available on SSRN, modeled how recommender algorithms can shift consumption toward blockbuster titles or toward the long tail, depending on their design. That foundational work established a key insight: the architecture of a recommendation engine is not neutral. It actively redistributes attention across a catalog, and research on popularity bias and feedback loops suggests that engagement-optimized designs concentrate that attention on already-popular content.

More recent work has sharpened that finding. A peer-reviewed survey in User Modeling and User-Adapted Interaction synthesized evidence on popularity bias in recommender systems, documenting the “rich-get-richer” dynamics that crowd out niche titles. Separately, research published in the Journal of Intelligent Information Systems quantified how iterative recommendation feedback amplifies concentration on “head” items and reduces long-tail exposure over repeated cycles. Together, these studies describe a self-reinforcing pattern: the more a system recommends popular shows, the more data it collects on those shows, which in turn makes them even more likely to be recommended again.

That loop was formally described in a paper presented at NeurIPS, titled “Deconvolving Feedback Loops in Recommender Systems,” which demonstrated how self-reinforcing feedback in recommendation engines can narrow the distribution of what users end up watching over time. The mechanism is straightforward: a system trained on engagement data will favor content that already generates high engagement, creating a cycle that is difficult to break without deliberate intervention. Once a show slips out of the recommendation spotlight, the system collects less information about it, making it even harder for that title to reappear in front of viewers.
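The dynamic the research describes can be illustrated with a toy simulation. Everything below is an illustrative assumption, not any platform's actual system: a hypothetical catalog of 100 titles, where each round the "recommender" surfaces one title with probability proportional to its accumulated watch count, and that watch feeds back into the next round's weights.

```python
import random

# Toy simulation of an engagement-driven feedback loop, loosely in the
# spirit of the feedback-loop literature. It is NOT any real platform's
# algorithm; catalog size and round counts are arbitrary assumptions.

random.seed(42)

CATALOG_SIZE = 100   # hypothetical catalog
ROUNDS = 10_000      # recommendation-and-watch cycles

watch_counts = [1] * CATALOG_SIZE  # every title starts with one watch

for _ in range(ROUNDS):
    # Popular titles are more likely to be recommended, so they are
    # more likely to be watched, so they become more popular still.
    title = random.choices(range(CATALOG_SIZE), weights=watch_counts)[0]
    watch_counts[title] += 1

top_share = sum(sorted(watch_counts, reverse=True)[:10]) / sum(watch_counts)
print(f"Top 10 of {CATALOG_SIZE} titles now hold {top_share:.0%} of watches")
```

Under a neutral system each title would expect roughly one percent of watches; in runs of this loop, a handful of titles that happened to get early boosts typically end up holding a large multiple of that. That concentration, arising from the loop alone rather than from any difference in quality, is the "rich-get-richer" pattern the studies measure.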

Netflix, the largest streaming service by subscriber count, has disclosed its own emphasis on algorithmic discovery. In its Form 10-K filing for fiscal year 2024 with the U.S. Securities and Exchange Commission, the company described its focus on improving the product experience at scale, including discovery and engagement drivers. That corporate language aligns with the academic critique: engagement is the metric that matters, and engagement can favor what is already familiar. When success is defined primarily by engagement signals, the safest bet is often to push what has worked before.

The flattening effect extends beyond which titles appear in a recommendation row. Peer-reviewed research in the journal Convergence analyzed Netflix’s personalized artwork strategy, showing how the platform tailors thumbnail images for each user to maximize the chance of a click. A drama might be marketed with an action-heavy frame for one viewer and a romantic still for another, depending on which genres that viewer has previously watched. The study argued that this algorithmic packaging reshapes what counts as “great TV” in the viewer’s mind before they press play, because the same show is effectively rebranded to match pre-existing tastes.
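The logic of genre-matched artwork can be sketched in a few lines. The title, file names, and selection rule below are hypothetical illustrations of the general idea, not Netflix's actual implementation:

```python
# Hypothetical sketch of genre-matched thumbnail selection. The title
# "midnight_harbor", the variant names, and the scoring rule are all
# invented for illustration; they do not describe any real system.

ARTWORK_VARIANTS = {
    "midnight_harbor": {
        "action": "thumb_chase_scene.jpg",
        "romance": "thumb_couple.jpg",
        "drama": "thumb_courtroom.jpg",
    }
}

def pick_artwork(title: str, watch_history: list[str]) -> str:
    """Choose the thumbnail variant matching the genre this viewer has
    watched most often, falling back to the first listed variant."""
    variants = ARTWORK_VARIANTS[title]
    # Count how often each available genre appears in the viewer's history.
    scores = {genre: watch_history.count(genre) for genre in variants}
    best_genre = max(scores, key=scores.get)
    return variants[best_genre]

# The same show is packaged differently for different viewing histories.
print(pick_artwork("midnight_harbor", ["action", "action", "drama"]))
print(pick_artwork("midnight_harbor", ["romance", "romance"]))
```

The point the sketch makes concrete is that the show itself never changes; only its packaging does, and the packaging is chosen to confirm what the viewer already likes.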

Netflix itself has acknowledged the influence of this approach, stating in an official blog post that artwork and packaging can determine whether someone watches or scrolls past a title. The convergence of recommendation rankings and hyper-targeted imagery means that the algorithm is not only deciding which shows deserve attention, but also how those shows should look in order to secure that attention. In practice, this can make different series feel interchangeable on the home screen, even when their creative ambitions diverge sharply.

What remains uncertain

The academic case for algorithmic flattening is strong in theory, but several gaps make it difficult to measure the real-world damage with precision. No major streaming platform has publicly released internal data showing the extent to which long-tail content is suppressed in user recommendations versus how often it would surface under a non-personalized or purely chronological system. The models cited in research papers simulate these effects, but they rely on assumptions about user behavior and catalog structure that may not perfectly mirror any single platform’s actual deployment.

There is also a competing argument that recommendation engines can increase diversity under certain conditions. The Fleder and Hosanagar working paper explicitly noted that the direction of the effect depends on design choices. A system tuned to surface unfamiliar titles alongside popular ones could, in principle, broaden rather than narrow consumption, especially if it deliberately injects randomness or promotes underexposed genres. Whether any major streaming service has implemented such tuning at scale is not confirmed by available public disclosures, leaving open the possibility that some platforms are mitigating the worst feedback loops behind the scenes.
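The randomness-injection idea can also be sketched as a toy comparison. Again, the model and its parameters are illustrative assumptions, not any platform's disclosed design: a popularity-weighted recommender where a fraction `epsilon` of picks is replaced by a uniformly random title.

```python
import random

# Toy comparison of an engagement-only recommender with one that
# deliberately injects randomness. The model, catalog size, and epsilon
# value are illustrative assumptions, not a documented platform feature.

def simulate(epsilon: float, seed: int = 7,
             catalog_size: int = 100, rounds: int = 10_000) -> float:
    """Return the share of watches held by the top 10 titles after
    `rounds` cycles of popularity-weighted recommendation, where a
    fraction `epsilon` of picks is uniformly random instead."""
    rng = random.Random(seed)
    counts = [1] * catalog_size
    for _ in range(rounds):
        if rng.random() < epsilon:
            title = rng.randrange(catalog_size)  # injected exploration
        else:
            title = rng.choices(range(catalog_size), weights=counts)[0]
        counts[title] += 1
    return sum(sorted(counts, reverse=True)[:10]) / sum(counts)

# Injected randomness typically spreads watches far more evenly.
print(f"no exploration:  top-10 share = {simulate(0.0):.0%}")
print(f"30% exploration: top-10 share = {simulate(0.3):.0%}")
```

In typical runs the exploring system ends far less concentrated than the engagement-only one, which is the design lever the Fleder and Hosanagar paper points to: the direction of the effect is a tuning choice, not a law of nature.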

Peer-reviewed research in Convergence has examined the operational logics of Netflix’s recommender, connecting its mechanics to cultural taste-making and the use of engagement as a proxy for taste. But that analysis, like most academic work on proprietary algorithms, was conducted from the outside. Researchers relied on interface observations, corporate statements, and limited testing rather than direct access to Netflix’s internal weighting systems or A/B testing results. The gap between what can be observed and what is actually happening inside the algorithm remains significant, and it constrains any firm conclusions about how much cultural diversity is being lost.

Another open question concerns user agency. Even in a heavily optimized recommendation environment, viewers can still search for specific titles, follow word-of-mouth tips, or browse genre menus. The extent to which audiences rely on the default home screen versus these more active discovery methods is not well documented in the peer-reviewed literature cited here. Without longitudinal surveys tracking how people navigate streaming interfaces over time, it is difficult to know whether algorithms are steering the average viewer’s choices or merely amplifying preferences they would have expressed anyway.

Regulatory and policy responses are also in flux. Communications regulators have begun to scrutinize how prominence and discoverability work on connected TV platforms, particularly where public service media or culturally significant content is concerned. However, existing consultations and draft codes focus more on the placement of entire apps or channels than on the inner workings of recommendation engines once a viewer opens a streaming service. That leaves a large portion of algorithmic influence effectively unregulated, governed instead by each company’s internal metrics and product experiments.

What is clear from the available research is that recommendation systems are powerful cultural intermediaries, not neutral delivery pipes. Studies on popularity bias, feedback loops, and personalized packaging all point in the same direction: left to their own engagement-optimized devices, algorithms tend to favor the already popular and the already familiar. What remains uncertain is how far that tendency has progressed in practice, how much it can be countered by careful design, and whether viewers or regulators will demand more transparency about the invisible choices shaping what appears on the screen.

For now, the best evidence comes from models, external audits, and close readings of platform behavior rather than from the platforms’ own data. That leaves researchers, policymakers, and viewers working with an incomplete picture. The risk is that by the time fuller information emerges, the cultural landscape of television may already have been quietly reshaped, with a narrower canon of “must-watch” shows standing in for the far broader range of stories that streaming once promised to deliver.


*This article was researched with the help of AI, with human editors creating the final content.