Consulting giants built thousands of AI agents and now question their value

The world’s largest consulting firms spent the past two years racing to build AI agents at scale, treating raw deployment numbers as proof of progress. Now, with thousands of these digital workers in operation, leaders across firms including McKinsey, PwC, EY, and BCG are increasingly confronting an uncomfortable question highlighted in recent reporting: do the agents actually deliver measurable business results? The shift from counting agents to proving their worth signals that the consulting industry’s AI experiment has entered a more demanding phase, one where hype must give way to hard evidence.

McKinsey’s Agent Army Grows Fast, but the Numbers Conflict

McKinsey has scaled its AI agent fleet faster than any rival, though the exact size depends on which account you trust. In one interview, CEO Bob Sternfels described the firm as having around 40,000 human employees working alongside roughly 20,000 AI agents embedded in its operations, up from about 3,000 agents just 18 months earlier. Sternfels has outlined a future in which every McKinsey employee is enabled by one or more agents, framing the technology as a core extension of the firm’s workforce rather than a back-office experiment or a marketing showcase.

A separate account complicates that picture. In another setting, Sternfels has characterized McKinsey’s total workforce as 60,000, with 25,000 of those being agents supporting client and internal work, again up from only a few thousand roughly 18 months prior. Whether the true agent count is closer to 20,000 or 25,000, and whether the firm’s human headcount is nearer 40,000 or the roughly 35,000 implied by a 60,000-strong total workforce, the discrepancy itself is telling. When a firm’s own leadership produces different totals in different settings, it suggests that defining what counts as an “agent” is still a moving target. That ambiguity matters because it makes value measurement harder: if you cannot precisely define the unit, you cannot precisely measure its output, its cost, or its contribution to client outcomes.

From Agent Counts to Outcome Metrics

The broader consulting industry is now grappling with the same tension between scale and substance. Leading firms have deployed thousands of AI agents across internal workflows and client-facing engagements, and the initial excitement of rapid deployment is giving way to a harder conversation about what those agents are worth. According to reporting on how major consultancies are rethinking their AI strategies, firms are shifting focus from raw deployment to demonstrated value, a transition that echoes how earlier technology waves played out. Companies that rushed to build websites in the late 1990s or launch mobile apps in the early 2010s eventually had to prove those investments generated revenue or cost savings, not just clicks or downloads. AI agents are entering that same accountability phase.

Each firm is taking a slightly different approach to the measurement problem. According to the Business Insider report on how major consultancies are rethinking their AI strategies, PwC is concentrating on adoption within clearly defined impact zones, focusing on functions such as tax, audit, and risk where automation and augmentation can be linked to specific financial outcomes. The report says EY is tracking key performance indicators month-to-month, building a longitudinal picture of how agent-assisted work compares to traditional delivery models, rather than relying on one-time pilot results. It also describes BCG as emphasizing time savings as a core metric, looking at how many hours agents remove from standard tasks such as research, slide creation, and data cleaning. These are all reasonable starting points, but none of them amount to a shared industry standard. Without common benchmarks, it will be difficult for clients or investors to compare agent performance across firms, and each consultancy’s internal metrics risk becoming self-serving narratives rather than objective evidence.

QuantumBlack and the Outcomes-Based Pivot

McKinsey’s internal structure offers a window into how one firm is trying to bridge the gap between deployment and results. QuantumBlack, McKinsey’s AI arm, sits at the center of its agent strategy and acts as a hub for tooling, governance, and experimentation. Rather than treating AI agents as a generic capability that every practice builds independently, McKinsey is channeling development through this dedicated unit with its own operational identity, technical leadership, and product roadmap. This organizational choice reflects a bet that concentrated AI expertise, shared platforms, and reusable components will produce better outcomes than a diffuse, firm-wide rollout where agents are built ad hoc by individual teams with varying levels of skill and risk tolerance.

At the same time, McKinsey is framing a broader strategic shift toward outcomes-based work. The firm has long talked about tying its fees more closely to client impact, but AI agents give that ambition new urgency. The aspiration is to move gradually away from billing clients for hours spent and toward billing for results delivered, with agents serving as the mechanism that makes that model economically viable. If agents can automate research, data synthesis, and preliminary analysis, the theory goes, consultants can spend more time on judgment-intensive work while still protecting margins. Yet this remains largely aspirational. There is no public evidence that McKinsey or its peers have fundamentally restructured their pricing models around agent-driven outcomes at scale, and most engagements still appear to rely on familiar combinations of time-based and milestone-based fees.

The Industry’s Measurement Gap

The absence of standardized value frameworks is the most significant unresolved problem in the consulting industry’s AI push. Each firm is building its own internal scorecard, but there is no external audit, no third-party benchmark, and no shared definition of what “value” means in this context. Time savings, for instance, are easy to measure but can be misleading. An agent that shaves two hours off a research task has limited value if the consultant still spends the same total time on the engagement because client decision cycles, data quality issues, or governance bottlenecks remain unchanged. Similarly, counting the number of agent-assisted projects can exaggerate impact if the agents are used only at the margins of the work rather than at its core.

Cost reduction is another common metric, but it risks incentivizing firms to replace human roles rather than augment them, potentially eroding the quality and nuance of advice over time. A more meaningful approach would involve tracking client outcomes directly: did the agent-assisted engagement produce a better result for the client than a comparable engagement without agents? That might mean higher revenue, faster time to market, improved compliance, or more accurate forecasting. Designing such controlled comparisons is difficult in practice, especially when every client situation is unique, but it is the only way to move beyond internal efficiency metrics and toward genuine proof of value. Consulting firms that figure out how to measure and demonstrate client-facing impact, rather than just internal productivity, will have a significant advantage in selling AI-augmented services. Those that cannot will find it increasingly hard to justify the cost of maintaining thousands of agents that look impressive on a slide deck but do not change the bottom line for the people paying the bills.

What the Reckoning Means for Clients

For the companies that hire McKinsey, PwC, EY, and BCG, this internal reckoning carries direct consequences. Clients are already being pitched on AI-augmented engagements, often at premium rates, with promises of faster delivery, richer insights, and continuous support from agents that keep working after the consultants leave. If the consultancies themselves are still figuring out whether these agents deliver net-new value or simply repackage existing capabilities, clients need to become more demanding buyers. That starts with asking for concrete evidence: not just how many agents will be deployed, but what specific tasks they will handle, how their performance will be measured, and how those measures tie to the client’s own financial and strategic objectives.

In practice, that means pushing for contracts that define success in terms of outcomes rather than activity, and for transparency into how agents are governed, updated, and monitored over time. Clients can ask to see before-and-after comparisons from similar projects, request access to dashboards that track agent contributions, and negotiate fee structures that share upside when promised gains materialize. As the consulting giants move from experimentation to accountability, the balance of power in AI-augmented work will tilt toward organizations that insist on evidence instead of accepting agent counts as a proxy for innovation. The firms that welcome that scrutiny are the ones most likely to have built AI capabilities that truly matter; the rest will have to decide whether to fix their measurement gap or quietly scale back an experiment that grew faster than their ability to prove it works.

*This article was researched with the help of AI, with human editors creating the final content.