Tag: enterprise-ai

  • Vibe Coding Enters the Enterprise. The Governance Question Follows It In.

    When Andrej Karpathy coined the term “vibe coding” in early 2025, he was describing something informal — a developer giving in to the flow of conversation with an AI assistant, accepting whatever code it generated, and iterating by feel rather than by specification. It was a shorthand for a new way of working that felt more like directing than engineering.

    Fourteen months later, the term has migrated from developer Twitter into enterprise press releases. Pegasystems announced this week that its Blueprint platform now offers an “end-to-end vibe coding experience” for designing mission-critical workflow applications. Salesforce has embedded similar capabilities into Agentforce. Gartner, in a May 2025 report titled Why Vibe Coding Needs to Be Taken Seriously, predicted that 40 per cent of new enterprise production software will be created using vibe coding techniques by 2028. What started as a solo developer’s guilty pleasure is being repackaged as an enterprise strategy.

    The question is whether the repackaging addresses the risks, or merely relabels them.

    From Slang to Sales Pitch

    The appeal of vibe coding in an enterprise context is straightforward. Natural language replaces formal specification. Business users can describe what they want in conversational terms — a workflow, an approval chain, a customer-facing process — and an AI assistant translates that intent into a working application. Development cycles that previously took months collapse into days or hours. Stakeholder alignment happens at the prototype stage rather than after months of requirements gathering.

    Pega’s implementation illustrates the model. Users converse with an AI assistant using text or speech to design applications, refine workflows, define data models, and build interfaces. They can switch between conversational input and traditional drag-and-drop modelling at any point. Completed designs deploy directly into Pega’s platform as live, governed workflows. The company’s chief product officer, Kerim Akgonul, framed it as “the excitement and speed of vibe coding” combined with “enterprise-grade governance, security, and predictability.”

    That framing is telling. Enterprise vendors are not adopting vibe coding wholesale — they are domesticating it. The original concept involved a developer accepting AI-generated code on trust, with minimal review. The enterprise version keeps the conversational interface but routes the output through structured frameworks, predefined best practices, and platform-level guardrails. Whether that still qualifies as vibe coding or is simply a new marketing label for low-code development with an AI front end is an open question.

    The Numbers Behind the Hype

    Gartner’s 40 per cent prediction is eye-catching, but it deserves scrutiny. The firm also projects that 90 per cent of enterprise software engineers will use AI coding assistants by 2028, up from under 14 per cent in early 2024. These are not niche forecasts — they describe a wholesale transformation of how software gets built.

    The market signals support the direction. Y Combinator reported that a quarter of its Winter 2025 startup cohort had codebases that were 95 per cent AI-generated. AI-native SaaS companies are achieving 100 per cent year-on-year growth rates compared with 23 per cent for traditional SaaS. Pega’s own Q4 2025 results showed 17 per cent annual contract value growth and a 33 per cent surge in cloud revenue, with management attributing much of the acceleration to Blueprint adoption.

    But there is a less comfortable set of numbers. A Veracode report from 2025 found that nearly 45 per cent of AI-generated code introduced at least one security vulnerability. Linus Torvalds, creator of Linux, publicly cautioned that vibe coding “may be a horrible idea from a maintenance standpoint” for production systems requiring long-term support. And Gartner’s own research acknowledges that only six per cent of organisations implementing AI become “high performers” achieving significant financial returns.

    The Shadow Already Has a Name

    For regular readers of maddaisy, these risks will sound familiar. When we examined shadow AI in February, the data showed 37 per cent of employees had already used AI tools without organisational permission — including coding assistants plugged into development environments without security review. Vibe coding, in its original ungoverned form, is essentially shadow AI with a better name.

    The enterprise vendors’ pitch — governed vibe coding, with guardrails — is a direct response to this problem. Rather than fighting the tide of developers and business users reaching for AI-assisted tools, platforms like Pega and Salesforce are channelling that energy through controlled environments. It is the same pattern that played out with cloud computing a decade ago: shadow IT became sanctioned cloud adoption once the governance frameworks caught up.

    The difference this time is speed. Cloud adoption played out over years. Vibe coding is moving in months. And as maddaisy’s coverage of agentic AI drift highlighted, AI-generated systems do not fail suddenly — they degrade gradually, in ways that are harder to detect than traditional software failures. An application built through conversational prompts, where the development team may not fully understand the underlying logic, amplifies that risk considerably.

    The Governance Gap Is the Real Story

    The enterprise vibe coding pitch rests on a critical assumption: that platform-level guardrails can substitute for developer-level understanding. In regulated industries — financial services, healthcare, government — this assumption will be tested quickly and publicly.

    The immediate challenge is not whether vibe coding works in a demo. It clearly does. The challenge is what happens six months into production, when the original conversational prompts have been refined dozens of times, the underlying models have been updated, and the people who designed the workflows have moved on. That is the maintenance problem Torvalds flagged, and it maps directly onto the agentic drift pattern: small, individually reasonable changes accumulating into a system whose behaviour no longer matches its original intent.

    Consultants and technology leaders evaluating vibe coding platforms should be asking three questions. First, can you audit the reasoning chain — not just the output, but why the system built what it built? Second, what happens when the AI model underneath is updated — does the application need to be revalidated? Third, who owns the maintenance burden when the person who “vibe coded” the application is no longer available?

    What to Watch

    Enterprise vibe coding is not a fad. The productivity gains are real, the vendor investment is substantial, and the Gartner forecasts — even if directionally approximate — point to a genuine shift in how software gets built. PegaWorld 2026, scheduled for June in Las Vegas, will likely showcase dozens of enterprise vibe coding implementations.

    But the narrative developing around it echoes the early days of every enterprise technology wave: speed first, governance second. The organisations that get this right will be those that treat vibe coding as a development interface, not a development shortcut — using the conversational speed to accelerate design while maintaining the engineering discipline to ensure what gets built can be understood, audited, and maintained over time.

    The vibes are entering the enterprise. The question is whether the rigour follows them in.

  • AI Inference Is the Enterprise Security Risk Most Organisations Are Not Addressing

    Most enterprise AI security conversations still focus on training — how models are built, what data goes in, how to prevent poisoning. But the greater operational exposure sits elsewhere: in inference, the moment a trained model processes a live query and produces an output. That is where proprietary logic, sensitive prompts, and business strategy become visible to anyone watching the traffic.

    A recent panel hosted by The Quantum Insider, featuring leaders from BMO, CGI, and 01Quantum, put the point bluntly: inference is AI working, and AI working is where risk accumulates. Nearly half of the audience polled during the session admitted they lacked confidence that their AI systems would meet anticipated 2026 security standards. That number is consistent with broader industry data: a Cloud Security Alliance survey found that only 27 per cent of organisations feel confident they can secure AI used in core business operations.

    This is not an abstract concern. It is the practical, operational end of the governance conversation maddaisy has been tracking for weeks.

    Why inference, not training, is the exposure point

    Training happens once (or periodically). Inference happens continuously — every API call, every chatbot interaction, every agentic workflow execution. As Tyson Macaulay of 01Quantum explained during the panel, inference models often contain the distilled intellectual property of an organisation. In expert systems, the model itself reflects proprietary training data, domain knowledge, and internal logic. Reverse engineering an inference endpoint can reveal insights about what the organisation knows and how it thinks.

    But the exposure runs in both directions. Prompts themselves reveal information — about individuals, strategy, and operational priorities. A medical query reveals personal health data. A corporate query may signal product development direction. The question, in other words, can be as sensitive as the model.

    When maddaisy examined CIOs’ non-AI priorities in February, cybersecurity topped the list — precisely because AI adoption was expanding the attack surface. Dmitry Nazarevich, CTO at Innowise, described security spending increases as “directly related to the increase in exposure and risk to data associated with the increased attack surface resulting from the introduction of generative AI.” Inference security is where that expanding surface is most exposed — and most neglected.

    The shadow AI dimension

    The problem is compounded by what organisations cannot see. Research suggests that roughly 70 per cent of organisations have shadow AI in use — employees running unauthorised tools outside IT oversight. Every unsanctioned ChatGPT or Claude query involving company data is an unmonitored inference event, pushing proprietary information through systems the organisation does not control.

    JetStream Security, a startup founded by veterans of CrowdStrike and SentinelOne, raised $34 million in seed funding last week to address precisely this gap. The company’s product, AI Blueprints, maps AI activity in real time — which agents are running, which models they use, what data they access. The premise is straightforward: you cannot secure what you cannot see.

    When maddaisy covered shadow AI in February, the focus was on governance and policy. Inference security adds a harder technical dimension. It is not enough to write policies about acceptable AI use if the organisation has no visibility into what models are being queried, by whom, and with what data.

    Real-world vulnerabilities are already surfacing

    The risks are not hypothetical. In February, LayerX Security published a report describing a critical vulnerability in Anthropic’s Claude Desktop Extensions — a malicious calendar invite could silently execute arbitrary code with full system privileges. The issue stemmed from an architectural choice: extensions ran unsandboxed with direct file system access, enabling tools to chain actions autonomously without user consent.

    The debate that followed was instructive. Anthropic argued the onus was on users to configure permissions properly. Security researchers countered that competitors like OpenAI and Microsoft restricted similar capabilities through sandboxing and permission gates. The real lesson for enterprises is that inference-layer vulnerabilities are architectural, not incidental — and they require controls before deployment, not after.

    As Rock Lambros of RockCyber put it: “Every enterprise deploying agents right now needs to answer — did we restrict tool chaining privileges before activation, or did we hand the intern the master key and go to lunch?”

    The governance gap has a security-shaped hole

    Maddaisy has covered the emerging agentic AI governance playbook extensively — the frameworks from regulators, the principles converging around least-privilege access and real-time monitoring. But frameworks are policy instruments. Inference security is the engineering layer that makes those policies enforceable.

    The numbers illustrate the disconnect. According to the latest governance statistics compiled from major 2025-26 surveys, 75 per cent of organisations report having a dedicated AI governance process — but only 26 per cent have comprehensive AI security policies. Fewer than one in 10 UK enterprises integrate AI risk reviews directly into development pipelines. Governance without security controls is aspiration without implementation.

    The financial services sector offers a partial model. Kristin Milchanowski, Chief AI and Data Officer at BMO, described her bank’s approach during the Quantum Insider panel: bringing large language models in-house where possible, ensuring that additional training on proprietary data remains contained, and treating responsible AI as a board-level cultural priority rather than a compliance exercise. But BMO operates under some of the strictest regulatory regimes globally. Most enterprises do not face equivalent pressure — yet.

    What practitioners should be doing now

    The practical agenda emerging from this convergence of research is specific and actionable:

    Audit inference endpoints. Map every production AI system, including shadow deployments. The JetStream model — real-time visibility into which models are running, what data they touch, and who is responsible — is becoming table stakes.

    Apply least-privilege to AI agents. The agentic governance frameworks maddaisy covered last week prescribe this. At the inference layer, it means restricting tool chaining, sandboxing execution environments, and requiring explicit permission gates for cross-system actions.
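    To make the prescription concrete, here is a minimal sketch of a permission gate for agent tool calls. It is hypothetical: the tool names, scopes, chaining cap, and approval hook are all illustrative assumptions, not any vendor's actual API.

```python
# Hypothetical least-privilege gate for agent tool calls.
# Tool names, scopes, and the chaining cap are illustrative only.

ALLOWED_TOOLS = {
    "search_tickets": {"scope": "read"},
    "draft_reply":    {"scope": "read"},
    "send_refund":    {"scope": "write", "needs_approval": True},
}

MAX_CHAIN_DEPTH = 3  # cap on tool-to-tool chaining


def gate_tool_call(tool_name, chain_depth, approver=None):
    """Allow a call only if it sits inside the agent's granted scope."""
    policy = ALLOWED_TOOLS.get(tool_name)
    if policy is None:
        # Default-deny: anything not explicitly granted is blocked.
        raise PermissionError(f"tool not in allowlist: {tool_name}")
    if chain_depth > MAX_CHAIN_DEPTH:
        # Restrict tool chaining: deep agent-to-tool-to-tool chains are cut off.
        raise PermissionError("tool chaining depth exceeded")
    if policy.get("needs_approval") and (approver is None or not approver(tool_name)):
        # Explicit permission gate for cross-system (write) actions.
        raise PermissionError(f"explicit approval required: {tool_name}")
    return True
```

    The design choice worth noting is default-deny: the agent gets no standing access, and anything outside the allowlist fails loudly rather than silently.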

    Build cryptographic agility into procurement. The Quantum Insider panel raised a forward-looking point: “harvest now, decrypt later” attacks — where encrypted inference traffic is collected today for decryption once quantum computing matures — are overtaking model drift as the top digital trust concern among infrastructure leaders. Embedding post-quantum cryptography expectations into vendor contracts now is practical and low-cost.

    Treat inference security as infrastructure. Not as a feature, not as an add-on. As the panel concluded: critical infrastructure must be secured before it is tested by failure.

    The operational layer matters most

    The governance conversation has matured rapidly. Frameworks exist. Principles are converging. Regulation is arriving. But between the policy layer and the production environment sits inference — the operational layer where AI actually works, where data flows through models, where prompts reveal strategy, and where the absence of controls creates the exposure that governance documents are supposed to prevent.

    Gartner projects spending on AI governance platforms will reach $492 million this year and surpass $1 billion by 2030. That money will be wasted if it funds policies without the engineering to enforce them. The organisations pulling ahead will be those that treat inference security not as a technical detail for the security team, but as the operational foundation on which their entire AI strategy depends.

  • The Agentic AI Governance Playbook Is Taking Shape. Most Enterprises Are Not Ready for It.

    The governance playbook for agentic AI is starting to take shape — and it looks nothing like the frameworks most enterprises currently rely on.

    Over the past month, a cluster of regulatory bodies, law firms, and industry coalitions has published guidance specifically addressing agentic AI systems. The UK Information Commissioner’s Office released its first Tech Futures report on agentic AI in January. Mayer Brown published a comprehensive governance framework in February. The Partnership on AI identified agentic governance as the top priority among six for 2026. And FINRA’s 2026 oversight report now includes a dedicated section on AI agents as an emerging threat to financial markets.

    Taken individually, none of these publications is remarkable. Taken together, they represent something maddaisy has been tracking since mid-February: the governance conversation is finally catching up to the deployment reality.

    The problem these frameworks are trying to solve

    When maddaisy examined the agentic AI governance gap in February, the numbers were stark: three-quarters of enterprises planning to deploy agentic AI, but only one in five with a mature governance model. Subsequent analysis of agentic drift — where systems degrade gradually without triggering alarms — and shadow AI adoption revealed that the risks are not hypothetical. They are already materialising in production environments.

    The core issue, as these new frameworks make explicit, is that agentic AI does not fit within existing oversight models. Traditional AI governance assumes a human reviews the output before acting on it. Agentic systems are the actor. They plan, execute, and adapt autonomously — booking appointments, approving procurement, triaging complaints, managing collections. Governance cannot happen after the fact. It must be embedded in real time.

    What the emerging frameworks have in common

    Despite coming from different jurisdictions and institutions, the recent guidance converges on several principles that practitioners should note.

    Least privilege by default. Mayer Brown’s framework emphasises restricting what an agent can access — not just what it can do. Agents should not have standing access to sensitive databases, trade secrets, or systems beyond their immediate task scope. This mirrors the zero-trust approach that cybersecurity teams have adopted over the past decade, now applied to autonomous software.

    Human checkpoints at decision boundaries, not everywhere. The emerging consensus rejects both extremes: fully autonomous operation with no oversight, and human approval for every action (which would negate the point of agentic AI). Instead, the frameworks advocate for defined boundaries — moments where human approval is required before the agent proceeds. These include irreversible actions, decisions in regulated domains such as healthcare or financial services, and any step that falls outside the agent’s defined scope.

    Real-time monitoring, not periodic audits. The ICO report and Mayer Brown both stress continuous behavioural monitoring after deployment. This addresses the drift problem directly: an agent that passed all review gates at launch may behave differently three months later after prompt adjustments, model updates, and tool changes. Logging the full chain of reasoning and actions — not just inputs and outputs — is becoming a baseline expectation.

    Transparency to users. Multiple frameworks now explicitly require that organisations disclose when a customer is interacting with an AI agent rather than a human. The ICO notes that agentic AI “magnifies existing risks from generative AI” because these systems can rapidly automate complex tasks and generate new personal information at scale. Users need to know what they are dealing with — and they need a route to a human at any point.

    Value-chain accountability. The Partnership on AI flags a gap that most enterprise governance programmes have not addressed: who is responsible when something goes wrong in a multi-agent system? If an agent calls another agent, which calls a third-party tool, which accesses a database — and the outcome is harmful — the liability chain is unclear. Their recommendation: establish an agreed taxonomy for the AI value chain before deployment, not after an incident.
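    Two of the principles above — human checkpoints at decision boundaries, and logging the full chain of reasoning rather than just inputs and outputs — can be combined in a single execution loop. The sketch below is illustrative only: the record fields, action names, and the set of "irreversible" actions are assumptions for the example, not drawn from any of the published frameworks.

```python
import time

# Illustrative checkpoint-and-log pattern: every step is recorded with
# its reasoning, and irreversible actions pause for human approval.

IRREVERSIBLE = {"wire_transfer", "delete_record", "send_external_email"}


def execute_step(agent_id, action, reasoning, log, human_approve=None):
    """Log every step with its reasoning; pause at decision boundaries."""
    record = {
        "ts": time.time(),
        "agent": agent_id,
        "action": action,
        "reasoning": reasoning,   # why the agent chose this step
        "approved_by_human": False,
    }
    if action in IRREVERSIBLE:
        # Decision boundary: proceed only with explicit human approval.
        if human_approve is None or not human_approve(record):
            record["outcome"] = "blocked_at_boundary"
            log.append(record)
            return False
        record["approved_by_human"] = True
    record["outcome"] = "executed"
    log.append(record)
    return True
```

    Note that blocked actions are logged too — the audit trail captures what the agent tried to do, not only what it was allowed to do.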

    Where the frameworks fall short

    For all the convergence, there are notable gaps. None of the published guidance adequately addresses the measurement problem. Adobe’s 2026 AI and Digital Trends Report found that only 31 per cent of organisations have implemented a measurement framework for agentic AI. Without clear metrics for what “good governance” looks like in practice, the frameworks risk becoming compliance theatre — policies that exist on paper but do not change how agents actually operate.

    There is also limited guidance on cross-border deployment. The Partnership on AI calls for international coordination, but the practical reality — as maddaisy’s analysis of America’s state-by-state regulatory fragmentation highlighted — is that even within a single country, compliance requirements vary dramatically. An agent deployed in New York now faces transparency requirements under the RAISE Act that do not apply in Texas, where the Responsible AI Governance Act takes a different approach. For multinational enterprises, the compliance surface is formidable.

    What this means for practitioners

    The practical takeaway is straightforward, if demanding. Organisations deploying or planning to deploy agentic AI should be doing four things now.

    First, audit existing governance frameworks against the agentic-specific requirements these publications outline. Most enterprises have AI policies designed for advisory systems. Those policies almost certainly do not cover autonomous execution, multi-agent coordination, or real-time behavioural monitoring.

    Second, define decision boundaries before deployment. Which actions require human approval? What constitutes an irreversible decision in your context? Where are the regulatory tripwires? These questions are easier to answer before an agent is in production than after it has been running for six months.

    Third, invest in observability infrastructure. As maddaisy noted in the agentic drift analysis, the systems that fail most dangerously are the ones that appear to be working. Full execution logging, behavioural baselines, and anomaly detection are not optional extras — they are the minimum viable governance stack for agentic systems.

    Fourth, assign clear ownership. Mayer Brown’s framework identifies four distinct governance roles: decision-makers who set policy, product teams who implement it, cybersecurity teams who integrate agents into security procedures, and frontline employees who can identify and escalate issues. Most organisations have not mapped these responsibilities for their agentic deployments.
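    The behavioural baseline mentioned in the third point can be sketched in a few lines. This toy example compares an agent's recent mix of actions against a recorded baseline using total variation distance; the distance measure and the threshold are illustrative choices for the sketch, not a prescribed method from any of the frameworks.

```python
from collections import Counter

# Toy behavioural-baseline check: has an agent's action mix drifted
# from the profile recorded at launch?


def action_distribution(actions):
    """Turn a list of action names into relative frequencies."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}


def drift_score(baseline, recent):
    """Total variation distance between two action distributions (0..1)."""
    keys = set(baseline) | set(recent)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - recent.get(k, 0.0)) for k in keys)


def drifted(baseline_actions, recent_actions, threshold=0.3):
    """Flag when recent behaviour diverges from the recorded baseline."""
    return drift_score(action_distribution(baseline_actions),
                       action_distribution(recent_actions)) > threshold
```

    In production the action stream would come from the execution logs described above, and the baseline itself would be refreshed under change control rather than left frozen at launch.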

    The governance race is just starting

    The gap between agentic AI deployment and agentic AI governance remains wide. But the direction of travel is now clear: regulators, industry bodies, and legal advisors are converging on a set of principles that will become the baseline expectation. Organisations that build these capabilities now — real-time monitoring, defined decision boundaries, value-chain accountability, and clear ownership — will be better positioned than those scrambling to retrofit governance after their first incident.

    The frameworks are not perfect. They will evolve. But the era of governing agentic AI with policies designed for chatbots is ending. For consultants and technology leaders advising enterprises on AI deployment, that shift should be shaping every engagement.

  • The Pentagon-Anthropic Standoff Exposes a New Category of AI Vendor Risk

    When maddaisy examined America’s fragmented AI regulation landscape last week, the focus was on states pulling in different directions while the federal government tried to impose order from above. That piece ended with the observation that organisations face a compliance labyrinth with no clear exit. Five days later, the labyrinth got a new wing — and this one has armed guards.

    On 27 February, President Trump ordered all federal agencies to immediately cease using Anthropic’s AI systems. Hours later, Defence Secretary Pete Hegseth moved to designate the company a “supply-chain risk to national security” — a label previously reserved for foreign adversaries. The same evening, OpenAI CEO Sam Altman announced that his company had struck a deal to deploy its models on the Pentagon’s classified networks.

    The sequence of events was not subtle. An American AI company refused to remove ethical guardrails from its military contract. The government threatened to destroy it. A rival stepped in to take the work. For anyone who manages AI vendor relationships — and that now includes most enterprise technology leaders — the implications are significant and immediate.

    What actually happened

    The conflict had been building for months. Anthropic holds a contract worth up to $200 million with the Pentagon and, through its partnership with Palantir, supplies one of only two frontier AI models cleared for use on classified defence networks. The arrangement worked — until the Pentagon insisted on access to Claude for “all lawful purposes,” without the ethical restrictions Anthropic had built into its acceptable use policy.

    Anthropic’s red lines were specific: no mass domestic surveillance and no fully autonomous weapons without human oversight. CEO Dario Amodei argued that current AI systems “are simply not reliable enough to power fully autonomous weapons” and that “using these systems for mass domestic surveillance is incompatible with democratic values.”

    The Pentagon disagreed, and the situation escalated rapidly. Defence Secretary Hegseth gave Anthropic an ultimatum: agree to the government’s terms by 5:01 p.m. on 27 February or face designation as a supply-chain risk and potential invocation of the Cold War-era Defence Production Act. Anthropic refused. Trump’s ban and Hegseth’s designation followed within hours.

    What makes this more than a contract dispute is the weapon the government chose. As former Trump AI policy adviser Dean Ball wrote, the supply-chain risk designation would cut Anthropic off from hardware and hosting partners — “effectively destroying the company.” Ball, hardly an opponent of the administration, called it “attempted corporate murder.”

    OpenAI steps in — with familiar language and different terms

    OpenAI’s Pentagon deal arrived with careful framing. Altman claimed the agreement included the same safety principles Anthropic had sought: prohibitions on mass surveillance and human responsibility for the use of force. “The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement,” he wrote.

    The critical difference is what “agreement” means in practice. Anthropic sought contractual guarantees — binding restrictions written into the terms of service. OpenAI’s approach permits “all lawful uses” and relies on the Pentagon’s existing policies and legal frameworks rather than company-imposed limitations. As TechCrunch noted, it remains unclear how — or whether — the safety measures in OpenAI’s deal differ substantively from the terms Anthropic rejected.

    When maddaisy covered OpenAI’s Frontier Alliance with McKinsey, BCG, Accenture, and Capgemini last week, the story was about a vendor that needed consulting partners to scale its enterprise business. The Pentagon deal adds a wholly different dimension to OpenAI’s ambitions — and a wholly different category of risk.

    The institutional gap

    The deeper issue is structural. For decades, Pentagon technology contracts were dominated by slow-moving, heavily regulated defence contractors — Raytheon, Lockheed Martin, Northrop Grumman. These companies built institutional muscle for navigating political transitions, managing classified programmes, and absorbing the long-term volatility of government work. They were not exciting. They were durable.

    AI startups are neither slow-moving nor heavily regulated. They operate on venture capital timelines, consumer brand logic, and talent markets where a single ethical controversy can trigger an employee exodus. OpenAI has already seen 11 of its own employees sign an open letter protesting the government’s treatment of Anthropic — even as their employer benefits from it.

    This is the mismatch that matters for practitioners. The AI companies building the tools that enterprises depend on are now also becoming national security infrastructure — but they have none of the institutional frameworks that role demands. They lack the political risk management, the bipartisan relationship-building, and the organisational resilience to weather what comes next.

    The vendor risk no one modelled

    For organisations that have built their AI strategies around Anthropic or OpenAI, the past week introduced a category of risk that does not appear in most vendor assessment frameworks: political risk.

    Anthropic’s designation as a supply-chain risk, if upheld, would prevent any military contractor or supplier from doing business with the company. Given how deeply Anthropic’s Claude is integrated with Palantir’s systems — which are themselves critical Pentagon infrastructure — the practical implications cascade well beyond the original dispute. The CIO analysis of the situation compared it to the FBI-Apple standoff over iPhone encryption in 2016, but noted that the current administration “seems less willing to be patient.”

    For enterprises in the defence supply chain, the risk is direct: continued use of Anthropic’s technology may become a contractual liability. For everyone else, the risk is precedential. If the federal government can threaten to destroy an American company for negotiating contract terms, the calculus changes for every AI vendor evaluating government work — and for every enterprise evaluating those vendors.

    What consultants and technology leaders should watch

    Three things matter in the weeks ahead.

    First, the legal challenge. Anthropic has stated it will contest the supply-chain designation in court. Legal analysts suggest the designation is unlikely to survive judicial scrutiny — it was designed for foreign adversaries, not domestic companies in active contract negotiations. But legal proceedings take time, and the commercial damage from even a temporary designation could be severe.

    Second, the talent signal. AI companies compete fiercely for a small pool of researchers and engineers. The Pentagon standoff has made the ethical positioning of AI labs a hiring issue, not just a branding one. OpenAI’s internal tension — benefiting commercially from the deal while employees publicly protest the government’s tactics — is a dynamic that affects product roadmaps, not just press coverage.

    Third, the precedent for vendor governance. As the Council on Foreign Relations observed, the Anthropic standoff raises a fundamental question about AI sovereignty: can a private firm constrain the government’s use of a decisive military technology, and should it? Whichever way that question resolves, it will reshape the terms on which AI vendors provide their products — to governments and to enterprises alike.

    The irony is hard to miss. Just days after maddaisy reported on America’s fragmented state-level AI regulation, the federal government demonstrated that its own approach to AI governance is no less chaotic — merely higher-stakes. For organisations navigating vendor relationships in this environment, the lesson is uncomfortable: the AI companies they depend on are now players in a political contest they did not design and cannot control.

  • Insurtech’s AI-Fuelled Five Billion Dollar Comeback — And the Question the Industry Has Not Answered

    Global insurtech funding reached $5.08 billion in 2025, up 19.5% from $4.25 billion the year before. It is the first annual increase since 2021 — and, according to Gallagher Re’s latest quarterly report, it marks a fundamentally different kind of recovery from the one the sector last enjoyed.

    The 2021 boom was driven by venture capital chasing consumer-facing disruptors. The 2025 comeback is driven by insurers and reinsurers themselves investing in operational AI. That distinction matters far more than the headline number.

    The money is coming from inside the house

    In 2025, insurers and reinsurers made 162 private technology investments into insurtechs — more than in any prior year on record. This is not outside capital speculating on disruption. It is the industry itself funding its own modernisation, a shift Gallagher Re describes as a “changing of the guard” in the insurtech investor community.

    The fourth quarter was particularly striking. Funding hit $1.68 billion — a 66.8% increase over Q3 and the strongest quarterly figure since mid-2022. More than 100 insurtechs raised capital for the first time since early 2024, and mega-rounds (deals exceeding $100 million) returned in force, with 11 such rounds totalling $1.43 billion for the full year, up from six in 2024.

    Property and casualty insurtech funding rebounded 34.9% to $3.49 billion, driven by companies like CyberCube, ICEYE, Creditas, Federato, and Nirvana, which collectively secured $663 million in Q4 alone. Life and health insurtech, by contrast, declined slightly — a 4.6% dip that underlines where the industry sees its most pressing operational gaps.

    Two-thirds of the money follows AI

    The most telling statistic in the report is this: two-thirds of all insurtech funding in 2025 — $3.35 billion across 227 deals — went to AI-focused firms. By Q4, that share had climbed to 78%.

    Andrew Johnston, Gallagher Re’s global head of insurtech, frames this as convergence rather than a trend: “Over time, we see AI becoming so integrated into insurtech that the two may well become synonymous — in much the same way as we could already argue that ‘insurtech’ is itself a meaningless label, because all insurers are technology businesses now.”

    That trajectory is visible in the deals themselves. mea, an AI-native insurtech, raised $50 million from growth equity firm SEP in February — its first external capital after years of profitable organic growth. The company’s platform, already processing more than $400 billion in gross written premium across 21 countries, automates end-to-end operations for carriers, brokers, and managing general agents. mea claims its AI can cut operating costs by up to 60%, targeting the roughly $2 trillion in annual industry operating expenses where manual workflows persist.

    At the seed stage, General Magic raised $7.2 million for AI agents that automate administrative tasks for insurance teams — reducing quote generation time from approximately 30 minutes to under three in early deployments with major insurers.

    Profitability, not just growth

    What separates the 2025 wave from the 2021 boom is that several insurtechs are now proving they can make money, not just raise it.

    Kin Insurance, which focuses on high-catastrophe-risk regions, reported $201.6 million in revenue for 2025 — a 29% increase — with a 49% operating margin and a 20.7% adjusted loss ratio. Hippo, another property-focused insurtech, reversed its 2024 net loss with $58 million in net income, driven by improved underwriting and a deliberate shift away from homeowners insurance toward more profitable lines.

    These are not unicorn-valuation stories. They are companies demonstrating operational discipline — the kind of results that explain why insurers and reinsurers, rather than venture capitalists, are now leading the investment.

    The B2B shift

    Gallagher Re’s data reveals another structural change worth watching. Nearly 60% of property and casualty deals in 2025 went to business-to-business insurtechs, 12 percentage points higher than during the 2021 funding boom. Meanwhile, the deal share for lead generators, brokers, and managing general agents fell to 35%, the lowest on record.

    The implication is clear: capital is flowing toward technology that improves how existing insurers operate, not toward new entrants trying to replace them. The disruptor narrative of the early 2020s has given way to something more pragmatic — and, arguably, more durable.

    This parallels a pattern visible across financial services. As maddaisy noted when examining Lloyds Banking Group’s AI programme, established institutions are increasingly treating AI not as an innovation experiment but as core operational infrastructure — and measuring it accordingly.

    The question the industry has not answered

    For all the funding momentum, Johnston raises a challenge that the sector has yet to confront seriously: the “so what” problem.

    “As the implementation of AI starts to deliver efficiency gains, it is imperative that the industry works out how to best use all of this newly freed up time and resource,” he writes.

    This is not a hypothetical. If mea can genuinely reduce operating costs by 60% for a carrier, that frees up a substantial portion of the 14 percentage points of combined ratio currently consumed by operations, on the order of eight points. The question is whether that freed capacity translates into better underwriting, deeper risk analysis, and improved customer outcomes — or whether it simply gets absorbed into margin without changing how insurance fundamentally works.

    The broker market is already feeling the tension. In February, insurance broker stocks dropped roughly 9% after OpenAI approved the first AI-powered insurance apps on ChatGPT, enabling consumers to receive quotes and purchase policies within the conversation. Most analysts called the selloff overdone — commercial broking remains complex enough to resist near-term disintermediation — but the episode illustrated how quickly market sentiment can shift when AI moves from back-office tooling to customer-facing distribution.

    What to watch

    The $5 billion figure is a milestone, but the real signal is in its composition. Insurtech funding is no longer a venture capital bet on disruption. It is the insurance industry’s own investment in operational AI — led by incumbents, focused on B2B infrastructure, and increasingly backed by profitability rather than just promise.

    Whether that investment translates into genuinely better insurance — not just cheaper operations — depends on how the industry answers Johnston’s question. The money is flowing. The efficiency gains are materialising. What the sector does with them will determine whether this comeback is a lasting structural shift or just the next chapter of doing the same things with fewer people.

  • Accenture Will Track AI Logins for Promotions. The Risk Is Measuring Compliance, Not Competence.

    Accenture has begun tracking how often senior employees log into its AI tools — and will factor that usage into promotion decisions. An internal email, reported by the Financial Times, put it plainly: “Use of our key tools will be a visible input to talent discussions.” For a firm that sells AI transformation to the world’s largest organisations, the message to its own workforce is unmistakable: adopt or stall.

    The policy is not without logic. Accenture has invested heavily in AI readiness — 550,000 employees trained in generative AI, up from just 30 in 2022, backed by $1 billion in annual learning and development spending. But training people and getting them to change how they work are two very different problems. The promotion-linked tracking is an attempt to close that gap by force.

    The credibility problem

    This move arrives at a moment when Accenture’s external positioning makes internal adoption a matter of commercial credibility. As maddaisy.com reported last week, Accenture is one of four firms named in OpenAI’s Frontier Alliance — tasked with building the data architecture, cloud infrastructure, and systems integration work needed to deploy AI agents at enterprise scale. It is difficult to sell that capability convincingly if your own senior managers are not using the tools.

    The policy applies specifically to senior managers and associate directors, with leadership roles now requiring what Accenture calls “regular adoption” of AI. The firm is tracking weekly logins to its AI platforms for certain senior staff, though employees in 12 European countries and those on US federal contracts are excluded — a pragmatic nod to varying data protection regimes and security requirements.

    The resistance is the interesting part

    What makes this story more than a policy announcement is the reaction. Some senior employees have questioned the value of the tools outright, with one describing them as “broken slop generators.” Another told the Financial Times they would “quit immediately” if the tracking applied to them.

    That resistance is worth taking seriously, not dismissing. It maps directly onto a pattern maddaisy.com has been tracking. Research published earlier this month found that only 34% of employees say their organisation has communicated AI’s workplace impact “very clearly” — a figure that drops to 12% among non-senior staff. When people do not understand why they are being asked to use a tool, mandating its use tends to produce compliance rather than competence.

    Harvard Business Review research, cited in that same analysis, identified three psychological needs that determine whether employees embrace or resist AI: competence (feeling effective), autonomy (feeling in control), and relatedness (maintaining meaningful connections with colleagues). A policy that monitors logins and ties them to career progression addresses none of these. It measures activity. It says nothing about whether that activity is useful.

    Logins are not outcomes

    This is the core tension. Accenture’s leadership knows that senior adoption is a bottleneck — industry observers note that older managers are often “less comfortable with technology and more wedded to established working methods.” CEO Julie Sweet has framed AI adoption as existential, telling analysts that the company is “exiting employees” in areas where reskilling is not possible. The 11,000 layoffs announced in September reinforced the point.

    But tracking logins conflates presence with productivity. A senior manager who logs in weekly to check a dashboard is counted the same as one who has genuinely integrated AI into client delivery. The metric captures the floor, not the ceiling.

    This echoes a broader concern maddaisy.com has documented. UC Berkeley researchers found that employees using AI tools worked faster but not necessarily better — absorbing more tasks, blurring work-life boundaries, and entering a cycle of acceleration that resembled productivity but often was not. If Accenture’s policy drives more tool usage without more thoughtful tool usage, it risks producing exactly this outcome at scale.

    What this tells the rest of the industry

    Accenture is not alone in struggling with senior AI adoption. The challenge is structural across professional services. Deloitte’s 2026 CSO Survey found that while 95% of chief strategy officers expect AI to reshape their priorities, only 28% co-lead their organisation’s AI decisions. The people with the authority to mandate change are often the furthest from understanding it.

    Accenture’s approach is at least direct. Rather than hoping adoption trickles up from junior staff — who typically adopt new tools faster — it is applying pressure from the top. And the numbers suggest some urgency: with 750,000 employees and $70 billion in revenue, Accenture has grown enormously from its 275,000-person, $29 billion base in 2013. Maintaining that trajectory while its competitors embed AI into delivery models requires its own workforce to be fluent, not just trained.

    The risk, though, is that the policy optimises for the wrong signal. Organisations that have navigated AI adoption most effectively — and maddaisy.com has covered several — tend to share a common trait: they measure what AI enables people to do differently, not how often people open the application. Accenture’s policy would be considerably more compelling if it tracked client outcomes improved through AI-assisted work, or time freed for higher-value tasks, rather than weekly platform logins.

    The precedent matters more than the policy

    Whatever one makes of the specifics, Accenture has done something that most large organisations have avoided: it has made AI adoption an explicit, measurable condition of career advancement for senior leaders. That is a significant signal. It tells clients that Accenture is serious about practising what it sells. It tells employees that AI fluency is no longer optional at the leadership level.

    Whether it works depends on what happens next. If the tracking evolves toward measuring genuine integration — how AI changes the quality of work, not just the frequency of logins — Accenture could set a useful template for the industry. If it remains a blunt instrument that rewards compliance over competence, it will likely produce exactly the kind of performative adoption that gives AI transformation programmes a bad name.

    For consultants and enterprise leaders watching from outside, the lesson is practical: mandating AI adoption is easy; mandating it well is the hard part. The metric you choose to track will shape the behaviour you get.

  • PwC Built an AI That Can Actually Read Enterprise Spreadsheets. Here Is Why That Matters.

    Most enterprise AI demonstrations involve chatbots, code generation, or image synthesis — capabilities that are impressive but often disconnected from the workflows where organisations actually make decisions. PwC has taken a different approach. On 19 February, the firm announced a frontier AI agent that can reliably reason across complex, multi-sheet enterprise spreadsheets — the kind of messy, formula-dense workbooks that underpin deals, risk assessments, and financial modelling across virtually every large organisation.

    The announcement would be easy to dismiss as incremental. It is, in fact, one of the more practically significant AI developments of the year so far.

    The Spreadsheet Problem No One Talks About

    AI has made rapid progress with text, images, and code. But enterprise spreadsheets have remained stubbornly resistant. The reason is structural: a typical enterprise workbook is not a neatly formatted data table. It is a sprawling, multi-sheet artefact containing hundreds of thousands of rows, cross-sheet formulas, hidden dependencies, embedded charts, and formatting inconsistencies accumulated over years of manual editing by multiple authors.

    Conventional AI systems — including the most advanced large language models — struggle with this complexity. They can process a clean CSV file or answer questions about a simple table. But ask them to trace a formula chain across five sheets in a workbook with 200,000 rows and inconsistent column headers, and accuracy collapses. For regulated industries where precision is non-negotiable — auditing, tax, financial due diligence — this limitation has kept spreadsheet analysis firmly in the domain of human practitioners.

    PwC’s agent addresses this directly. Combining multimodal pattern recognition with a retrieval-augmented architecture, the system can process up to 30 workbooks containing nearly four million cells. In internal benchmarks, it achieved roughly three times the accuracy of previously published methods while using 50% fewer computational tokens — a meaningful efficiency gain that reduces both cost and energy consumption.

    How It Works, Without the Hype

    The technical approach mirrors how experienced analysts actually work. Rather than attempting to ingest an entire workbook at once — a strategy that overwhelms even million-token context windows — the agent scans, indexes, and selectively retrieves relevant sections. It can jump across tabs, trace logic through formula chains, integrate visual elements like charts, and explain its reasoning with what PwC describes as “defensible precision.”

    Two internal use cases illustrate the practical impact. In engagement documentation, PwC teams work with large, nominally standardised workbooks that document business processes and controls. In practice, these files vary significantly — column names shift, fields appear in different orders, structures change between engagements. The agent handles this in two stages: first mapping the workbook’s structure, then extracting specific details using targeted retrieval rather than brute-force ingestion.

    In risk assessment, the agent replaces what was previously weeks of custom development work. Each new set of files could break existing programmatic approaches due to formatting variations. The agent indexes and extracts directly, regardless of these inconsistencies. PwC reports that what previously required weeks of configuration can now be completed in hours.
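    The two-stage pattern PwC describes — map the workbook's structure first, then retrieve specific details by header rather than by fixed position — can be sketched in a few lines. The following is a minimal pure-Python illustration (sheets modelled as lists of rows), not PwC's implementation, which layers retrieval-augmented LLM reasoning on top of this kind of indexing; the toy workbook and field names are invented for the example.

```python
# Illustrative two-stage workbook analysis: index the structure first,
# then retrieve selectively. Sheets are modelled as lists of rows; a real
# system would read .xlsx files and feed the index to a model as context.

def index_workbook(workbook):
    """Stage 1: structural map -- sheet names, row counts, and header rows.
    Cheap to compute and small, unlike the full cell contents."""
    return {
        name: {"rows": len(rows) - 1, "headers": rows[0]}
        for name, rows in workbook.items()
    }

def retrieve(workbook, sheet, columns):
    """Stage 2: targeted extraction -- resolve columns by header name, so a
    shifted column order between engagements does not break retrieval."""
    rows = workbook[sheet]
    header = rows[0]
    positions = [header.index(c) for c in columns]
    return [{header[i]: row[i] for i in positions} for row in rows[1:]]

# A toy engagement workbook whose column order differs from the "standard":
wb = {
    "Controls": [
        ["Owner", "Control ID", "Frequency"],   # nonstandard column order
        ["Finance", "C-101", "Monthly"],
        ["IT", "C-204", "Quarterly"],
    ],
}

print(index_workbook(wb)["Controls"]["rows"])       # 2
print(retrieve(wb, "Controls", ["Control ID"])[0])  # {'Control ID': 'C-101'}
```

    The design choice worth noting is that stage 1 never loads cell values in bulk: the index alone tells the system where to look, which is what keeps the approach within context limits on workbooks with millions of cells.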

    The ROI Connection

    The timing of this announcement is worth noting. Earlier this month, maddaisy examined PwC’s own 2026 Global CEO Survey, which found that 56% of chief executives could not point to measurable revenue gains from their AI investments. Only 12% reported achieving both revenue growth and cost reduction from AI programmes.

    The spreadsheet agent is, in a sense, PwC’s answer to its own data. Rather than pursuing the kind of ambitious, organisation-wide AI transformation that the survey suggests most companies are failing at, this tool targets a specific, bounded problem: making AI useful where decisions actually get made. Spreadsheets are unglamorous, but they remain the substrate of enterprise decision-making across every industry. If AI cannot work reliably with them, the ROI gap that PwC’s own research documented will persist.

    Matt Wood, PwC’s Commercial Technology and Innovation Officer, was notably direct about the origin: “This didn’t start as a research project. It started because our teams were spending weeks manually tracing logic through workbooks that no existing tool could handle.”

    A Broader Pattern: Consulting Firms as Technology Builders

    This development fits a pattern that maddaisy has been tracking across the consulting industry. Firms are not merely advising clients on AI — they are building proprietary capabilities that change the economics of their own delivery. McKinsey’s 25,000 AI agents. Accenture’s ongoing automation of delivery operations. Now PwC, with a tool that converts weeks of manual work into hours.

    The competitive implications are significant. A firm that can process complex financial workbooks in hours rather than weeks can bid more aggressively on engagements, take on more work with the same headcount, and offer the outcome-based pricing models that clients increasingly prefer. The spreadsheet agent is not just a productivity tool — it is a structural advantage in the shifting economics of professional services.

    What Practitioners Should Watch

    For consultants and enterprise leaders, the PwC announcement carries a practical message: the AI value gap may start closing not through headline-grabbing deployments, but through targeted tools that tackle specific bottlenecks in existing workflows.

    The broader FP&A landscape is moving in the same direction. IBM’s 2026 analysis of financial planning trends highlights that 69% of CFOs now consider AI integral to their finance transformation strategy, with the primary applications centring on data ingestion, budget analysis, and narrative generation — precisely the kind of spreadsheet-adjacent work that PwC’s agent addresses.

    The question is no longer whether AI can handle enterprise data complexity. It is whether organisations will deploy these capabilities against the right problems — the mundane, time-intensive, precision-critical workflows where the return on investment is most measurable and most immediate.

    PwC appears to have started there. Given the firm’s own data on the AI ROI crisis, that is arguably the most credible place to begin.

  • America’s Fragmented State-by-State AI Regulation Is Creating a Compliance Labyrinth

    When maddaisy examined the shift from AI principles to penalties earlier this month, the United States’ growing patchwork of state-level AI laws featured as one of three converging forces. That brief mention understated the scale of what is now unfolding. With more than 1,000 AI-related bills introduced across all 50 states in 2025 alone, the US is rapidly building the most fragmented AI regulatory landscape of any major economy, overseen by a federal government that wants to stop the fragmentation but may lack the tools to do so.

    The federal-state standoff

    The tension is now explicit. In December 2025, the White House released a framework titled “Ensuring a National Policy Framework for Artificial Intelligence”, arguing that diverging state AI rules risk creating a harmful compliance patchwork that undermines American competitiveness. An accompanying fact sheet went further, suggesting the administration may consider litigation and funding mechanisms to counter state AI regimes it considers obstructive.

    This follows Executive Order 14179, issued in January 2025, which rescinded the Biden administration’s 2023 AI order and established a competitiveness-first federal posture. The July 2025 AI Action Plan outlined more than 90 measures to support AI infrastructure and innovation. The message from Washington is consistent: the federal government wants to set the rules, and it wants states to fall in line.

    The states, however, are not listening.

    Four approaches, one country

    What makes the US situation distinctive is not simply that states are regulating AI — it is that they are doing so in fundamentally different ways. Four models are emerging, each with different implications for organisations operating across state lines.

    Colorado: the comprehensive framework. Colorado’s AI Act (SB 24-205) remains the most ambitious state-level attempt at broad AI governance. It requires deployers of high-risk AI systems to conduct impact assessments, provide consumer transparency, and exercise reasonable care against algorithmic discrimination. Although enforcement was delayed to June 2026 through SB25B-004, the statutory framework is intact. Legal analysts note that the delay is strategic, not a retreat — Colorado is buying time to refine its approach while preserving regulatory leverage.

    California: regulation through existing powers. Rather than passing a single comprehensive AI act, California is weaving AI governance into its existing regulatory apparatus. The California Privacy Protection Agency finalised new regulations in September 2025 covering cybersecurity audits, risk assessments, and automated decision-making technology (ADMT), with phased effective dates running through January 2027. The ADMT obligations are particularly significant: they convert what many organisations treat as aspirational governance practices — impact assessments, decision transparency, audit trails — into enforceable system design requirements.

    Washington: the multi-bill expansion. Washington has taken a fragmented-by-design approach, advancing separate bills for high-risk AI (HB 2157), transparency (HB 1168), training data regulation (HB 2503), and consumer protection in automated systems (SB 6284). The logic is pragmatic: AI risk does not fit into a single statutory box, so Washington is addressing it through multiple, targeted legislative vehicles. For compliance teams, this means tracking not one law but several, each with its own scope and requirements.

    Texas: government-first regulation. The Responsible AI Governance Act (HB 149), effective since January 2026, governs AI use within state government, prohibits certain high-risk applications, and establishes an advisory council. It is narrower than Colorado or California’s approaches but signals that even politically conservative states see a role for AI regulation — just a different one.

    The preemption problem

    The White House clearly wants federal preemption — a single national framework that would override state laws. But there are reasons to doubt this will materialise quickly, if at all.

    First, there is no comprehensive federal AI legislation on the table. The administration’s tools are executive orders and agency guidance, which carry less legal weight than statute and are vulnerable to court challenges. Second, the pattern from privacy law is instructive: the US still lacks a federal privacy law, and state laws like the California Consumer Privacy Act have effectively set national standards by default. AI regulation appears to be following the same trajectory.

    Third, states have institutional momentum. According to the National Conference of State Legislatures, all 50 states, Washington DC, and US territories introduced AI legislation in 2025. That level of legislative activity does not simply stop because the federal government asks it to. As one legal analysis from The Beckage Firm observes, states are not ignoring the federal signal — they are interpreting it differently, rolling out requirements more slowly or in smaller pieces, but maintaining control rather than ceding it.

    What this means for organisations

    For any company deploying AI across multiple US states — which is to say, most enterprises of any scale — the practical implications are significant and immediate.

    Multi-jurisdictional compliance is now unavoidable. Organisations cannot build a single AI governance programme around one state’s requirements and assume it covers the rest. Colorado’s impact assessment obligations differ from California’s ADMT rules, which differ from New York City’s bias audit requirements for automated employment tools. Each demands different documentation, different processes, and in some cases, different technical controls.

    Effective dates are stacking up. Colorado’s enforcement begins June 2026. California’s ADMT obligations take full effect January 2027. New York City’s AEDT rules are already being actively enforced. Texas’s government-focused requirements are live now. Organisations treating these as isolated events rather than overlapping compliance waves risk being caught unprepared.

    The privacy playbook applies. Companies that built adaptable privacy programmes — mapping data flows, documenting processing purposes, conducting impact assessments — when GDPR and the CCPA emerged are better positioned than those that treated each regulation as a separate exercise. The same principle holds for AI governance. The organisations that will manage this landscape most effectively are those building flexible frameworks capable of mapping risk assessments, accountability structures, and documentation across multiple jurisdictions simultaneously.
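    One way to operationalise that kind of flexible framework is a single registry of per-jurisdiction obligations and effective dates, queried per deployment. The sketch below is a hypothetical structure, not any vendor's product; the obligation labels follow the article, and the exact dates are approximate placeholders — the registry shape, not its entries, is the point.

```python
# A minimal multi-jurisdiction compliance registry. Dates are approximate
# illustrations of the "stacking effective dates" described above.
from datetime import date

REQUIREMENTS = [
    {"jurisdiction": "CO",  "obligation": "high-risk AI impact assessment",      "effective": date(2026, 6, 30)},
    {"jurisdiction": "CA",  "obligation": "ADMT risk assessment and transparency", "effective": date(2027, 1, 1)},
    {"jurisdiction": "NYC", "obligation": "AEDT bias audit",                     "effective": date(2023, 7, 5)},
    {"jurisdiction": "TX",  "obligation": "government AI use restrictions",      "effective": date(2026, 1, 1)},
]

def applicable(jurisdictions, as_of):
    """Return the obligations already in force for a given footprint,
    so each deployment can be checked against one shared registry."""
    return [
        r for r in REQUIREMENTS
        if r["jurisdiction"] in jurisdictions and r["effective"] <= as_of
    ]

# A deployment spanning Colorado, California, and New York City in mid-2026:
print(sorted(r["jurisdiction"] for r in applicable({"CO", "CA", "NYC"}, date(2026, 7, 1))))
# ['CO', 'NYC']  -- California's ADMT obligations are not yet in force
```

    The value of the pattern is that adding a new state law means adding registry rows, not rebuilding the governance programme — the same move that served adaptable privacy teams when GDPR and the CCPA arrived.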

    The consulting opportunity — and obligation

    For consultants and practitioners, the US regulatory fragmentation creates both demand and responsibility. Demand, because multi-state AI compliance is genuinely complex and most organisations lack in-house expertise to navigate it. Responsibility, because the temptation to oversimplify — to sell a single “AI compliance solution” that claims to cover all jurisdictions — is real and should be resisted. The details matter, and they vary state by state.

    As maddaisy has noted in covering shadow AI governance challenges and the agentic AI governance gap, the operational machinery for AI oversight is still immature in most enterprises. Adding a fragmented regulatory landscape on top of that immaturity does not just increase cost — it increases the likelihood that organisations will get something materially wrong.

    The US may eventually get a federal AI framework. But the states are not waiting, and neither should the organisations that operate in them. The compliance labyrinth is already being built, one statehouse at a time.

  • OpenAI’s Frontier Alliance Confirms What Consultants Already Knew: AI Vendors Cannot Scale Alone

    OpenAI announced on 23 February that it has formed multi-year “Frontier Alliances” with McKinsey, Boston Consulting Group, Accenture, and Capgemini. The four firms will help sell, implement, and scale OpenAI’s Frontier platform — an enterprise system for building, deploying, and governing AI agents across an organisation’s technology stack.

    For readers who have been following maddaisy’s coverage of the consulting industry’s AI pivot, this is not a surprise. It is the logical next step in a pattern that has been building for months — and it tells us more about the limits of AI vendors than about the ambitions of consulting firms.

    The vendor cannot scale alone

    The most revealing line in the announcement came from Capgemini’s chief strategy officer, Fernando Alvarez: “If it was a walk in the park, OpenAI would have done it by themselves, so it’s recognition that it takes a village.”

    That candour is worth pausing on. OpenAI’s enterprise business accounts for roughly 40% of revenue, with expectations of reaching 50% by the end of the year. The company has already signed enterprise deals with Snowflake and ServiceNow this year and appointed Barret Zoph to lead enterprise sales. Yet it still needs consulting firms — with their existing client relationships, implementation expertise, and organisational change capabilities — to get its technology into production at scale.

    This is not a story about OpenAI’s generosity in sharing the enterprise market. It is an admission that the gap between a capable AI platform and a working enterprise deployment remains stubbornly wide. As maddaisy reported last week, PwC’s 2026 CEO Survey found that 56% of chief executives still cannot point to measurable revenue gains from their AI investments. The technology is not the bottleneck. Integration, governance, and organisational readiness are.

    A clear division of labour

    The alliance structure reveals how OpenAI sees the enterprise AI value chain. McKinsey and BCG are positioned as strategy and operating model partners — helping leadership teams determine where agents should be deployed and how workflows need to be redesigned. BCG CEO Christoph Schweizer noted that AI must be “linked to strategy, built into redesigned processes, and adopted at scale with aligned incentives.”

    Accenture and Capgemini take the systems integration role: data architecture, cloud infrastructure, security, and the unglamorous work of connecting Frontier to the CRM platforms, HR systems, and internal tools that enterprises actually run on. Each firm is building dedicated practice groups and certifying teams on OpenAI technology. OpenAI’s own forward-deployed engineers will sit alongside them in client engagements.

    This two-tier model — strategy at the top, integration at the bottom — maps neatly onto the consulting industry’s existing hierarchy. It also creates a clear dependency: OpenAI provides the platform, the consultancies provide the last mile.

    The maddaisy continuity thread

    This announcement intersects with several stories maddaisy has been tracking. When we examined McKinsey’s 25,000 AI agent deployment, the question was whether the firm’s aggressive internal build-out was a first-mover advantage or an expensive experiment. The Frontier Alliance suggests McKinsey is now positioning that internal capability as a credential — evidence that it can deploy agentic AI at scale, which it can now offer to clients through the OpenAI partnership.

    Similarly, when maddaisy covered the shift from billable hours to outcome-based consulting, the question was how firms would make the economics work. Vendor alliances like this provide part of the answer: the consulting firm brings the implementation expertise, the AI vendor provides the platform, and the client pays for outcomes rather than hours. The risk is shared across the chain.

    And Capgemini’s dual bet — adding 82,300 offshore workers while simultaneously investing in AI — now makes more strategic sense. The offshore delivery capacity is precisely what is needed to operationalise Frontier at enterprise scale. The bodies and the bots are not competing; they are complementary.

    The SaaS vendors should be nervous

    As Fortune noted, the Frontier Alliance creates a specific tension for established software-as-a-service vendors. Salesforce, Microsoft, Workday, and ServiceNow all depend on these same consulting firms to market and deploy their products. Now those consultants will also be actively promoting an alternative platform — one that positions itself as a “semantic layer” sitting above the traditional SaaS stack.

    The consulting firms are not choosing sides. They are hedging. Accenture, for instance, signed a multi-year partnership with Anthropic in December 2025 and is now a Frontier Alliance member. The firms will sell whichever platform best fits a given client’s needs, which gives them leverage over the AI vendors rather than the other way around.

    For the SaaS incumbents, however, having McKinsey and BCG actively evangelise an AI-native alternative to C-suite buyers is a development they will not welcome. Investor anxiety in this space is already elevated — shares of several enterprise software companies have been punished over concerns that customers will choose AI-native platforms over traditional offerings.

    What to watch

    The Frontier Alliance is a partnership announcement, not a set of outcomes. The real test is whether this model — AI vendor plus consulting firm — can close the deployment gap that has kept enterprise AI adoption stubbornly below expectations.

    Three things matter from here. First, whether the certified practice groups produce measurably better outcomes than the piecemeal implementations enterprises have been attempting on their own. Second, whether Frontier’s “semantic layer” architecture genuinely simplifies agent deployment or simply adds another platform layer to an already complex stack. And third, whether the consulting firms’ simultaneous alliances with competing AI vendors — OpenAI, Anthropic, Google — create genuine client value or just a more complicated sales cycle.

    For practitioners, the immediate signal is clear: the enterprise AI market is consolidating around a vendor-plus-integrator model. If your organisation is planning an agentic AI deployment, the question is no longer which model to use. It is which combination of platform, integrator, and operating model redesign will actually get agents into production — and keep them there.

  • Agentic AI Drift: The Silent Production Risk No One Is Measuring

    When maddaisy examined the agentic AI governance gap last week, the focus was on a structural mismatch: three-quarters of enterprises planning to deploy agentic AI, but only one in five with a mature governance model. That gap remains wide. But a more specific — and arguably more dangerous — operational risk is now coming into focus: agentic AI systems do not fail suddenly. They drift.

    A recent analysis published by CIO makes the case plainly. Unlike earlier generations of AI, which tend to produce identifiable errors — a wrong classification, a hallucinated fact — agentic systems degrade gradually. Their behaviour evolves incrementally as models are updated, prompts are refined, tools are added, and execution paths adapt to real-world conditions. For long stretches, everything appears fine. KPIs hold. No alarms fire. But underneath, the system’s risk posture has already shifted.

    The Problem with Demo-Driven Confidence

    Most organisations still evaluate agentic AI the way they evaluate any software feature: through demonstrations, curated test scenarios, and human judgment of output quality. In controlled settings, this looks adequate. Prompts are fresh, tools are stable, edge cases are avoided, and execution paths are short and predictable.

    Production is different. Prompts evolve. Dependencies fail intermittently. Execution depth varies. New behaviours emerge over time. Research from Stanford and Harvard has examined why many agentic systems perform convincingly in demonstrations but struggle under sustained real-world use — a gap that grows wider the longer a system runs.

    The result is a pattern that will be familiar to anyone who has managed complex software in production: a system passes all its review gates, earns early trust, and then becomes brittle or inconsistent months later, without any single change that clearly broke it. The difference with agentic AI is that the degradation is harder to detect, because the system’s outputs can still look reasonable even as the reasoning behind them has shifted.

    What Drift Actually Looks Like

    The CIO analysis includes a telling case study from a credit adjudication pilot. An agent designed to support high-risk lending decisions initially ran an income verification step consistently before producing recommendations. Over time, a series of small, individually reasonable changes — prompt adjustments for efficiency, a new tool for an edge case, a model upgrade, tweaked retry logic — caused the verification step to be skipped in 20 to 30 per cent of cases.

    No single run produced an obviously wrong result. Reviewers often agreed with the recommendations. But the way the agent arrived at those recommendations had fundamentally changed. In a credit context, that difference carries real financial and regulatory consequences.
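    The kind of check that would have surfaced this is not exotic. If execution traces record which steps an agent actually invoked, the skipped verification shows up as a falling execution rate for that step. A minimal sketch of the idea, under assumptions of our own (the `Trace` schema, step names, and thresholds are hypothetical, not from the CIO case study):

    ```python
    from dataclasses import dataclass

    @dataclass
    class Trace:
        """Minimal execution trace: the steps an agent actually invoked in one run."""
        run_id: str
        steps: list[str]

    def step_rate(traces: list[Trace], step: str) -> float:
        """Fraction of runs in which a given step was executed."""
        if not traces:
            return 0.0
        return sum(step in t.steps for t in traces) / len(traces)

    def check_required_step(traces, step, baseline=1.0, tolerance=0.05):
        """Flag drift when a required step's execution rate falls
        meaningfully below its baselined rate."""
        rate = step_rate(traces, step)
        return {"rate": rate, "drifted": rate < baseline - tolerance}

    # Illustrative traces: the verification step silently skipped in one run.
    recent = [
        Trace("r1", ["fetch_application", "verify_income", "recommend"]),
        Trace("r2", ["fetch_application", "verify_income", "recommend"]),
        Trace("r3", ["fetch_application", "recommend"]),  # step skipped
        Trace("r4", ["fetch_application", "verify_income", "recommend"]),
    ]
    result = check_required_step(recent, "verify_income")
    ```

    The point is not the specific metric but where it looks: at the agent’s behaviour across runs, not at whether any individual recommendation seemed reasonable.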

    This is the nature of agentic drift: it is not a bug. It is the predictable outcome of complex, adaptive systems operating in changing environments. Two executions of the same agent with the same inputs can legitimately differ — that stochasticity is inherent to how modern agentic systems work. But it also means that point-in-time evaluation, one-off tests, and spot checks are structurally insufficient for production risk management.

    From Policy to Diagnostics

    When maddaisy covered the shadow AI governance challenge earlier this month, one theme was clear: governance frameworks are necessary but not sufficient. They define ownership, policies, escalation paths, and controls. What they often lack is an operational mechanism to answer a deceptively simple question: has the agent’s behaviour actually changed?

    Without that evidence, governance operates in the dark. Policy defines what should happen. Diagnostics establish what is actually happening. When measurement is absent, controls develop blind spots in precisely the live systems where agentic risk tends to accumulate.

    The Cloud Security Alliance has begun framing this as “cognitive degradation” — a systemic risk that emerges gradually rather than through sudden failure. Carnegie Mellon’s Software Engineering Institute has similarly emphasised the need for continuous testing and evaluation discipline in complex AI-enabled systems, drawing parallels to how other high-risk software domains manage operational risk.

    What Practitioners Should Watch For

    The emerging consensus points toward several operational principles for managing agentic drift:

    Behavioural baselines over output checks. No single execution is representative. What matters is how behaviour shows up across repeated runs under similar conditions. Organisations need to establish baselines — not for what an agent should do in the abstract, but for how it has actually behaved under known conditions — and then monitor for sustained deviations.

    Separate configuration changes from behavioural evidence. Prompt updates, tool additions, and model upgrades are important signals, but they are not evidence of drift on their own. What matters is persistence: transient deviations are often noise in stochastic systems, while sustained behavioural shifts across time and conditions are where risk begins to emerge.
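    The persistence test above can be made concrete with a consecutive-window check: ignore one-off excursions, alert only when a metric stays out of tolerance across several monitoring windows in a row. A sketch with illustrative thresholds (the window length and tolerance are assumptions, to be tuned per system):

    ```python
    def sustained_deviation(window_rates, baseline, tolerance=0.05, min_windows=3):
        """Return True only when the metric has deviated from baseline for at
        least `min_windows` consecutive windows, treating shorter excursions
        as stochastic noise."""
        streak = 0
        for rate in window_rates:
            if abs(rate - baseline) > tolerance:
                streak += 1
                if streak >= min_windows:
                    return True
            else:
                streak = 0
        return False

    # A single noisy window is ignored...
    transient = sustained_deviation([1.0, 0.9, 1.0, 1.0], baseline=1.0)
    # ...but three consecutive deviating windows trigger an alert.
    sustained = sustained_deviation([1.0, 0.85, 0.80, 0.78], baseline=1.0)
    ```

    A design like this encodes the principle directly: a prompt tweak or model upgrade is a reason to watch the windows more closely, but only the sustained behavioural shift itself counts as evidence of drift.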

    Treat agent behaviour as an operational signal. Internal audit teams are asking new questions about control and traceability. Regulators are paying closer attention to AI system behaviour. Platform teams are under growing pressure to demonstrate stability in live environments. “It looked fine in testing” is no longer a defensible operational posture, particularly in sectors — financial services, healthcare, compliance — where subtle behavioural changes carry real consequences.

    The Observability Gap

    This is, ultimately, the next chapter in the governance story maddaisy has been tracking. The first chapter — covered in the enforcement era analysis — was about moving from principles to rules. The second, examined through Deloitte’s enterprise data, was the gap between strategic confidence and operational readiness. This third chapter is more specific and more technical: the gap between having governance frameworks and having the observability infrastructure to make them work.

    The goal is not to eliminate drift. Drift is inevitable in adaptive systems. The goal is to detect it early — while it is still measurable, explainable, and correctable — rather than discovering it through incidents, audits, or post-mortems. Organisations that build this capability will be better positioned to deploy agentic AI at scale with confidence. Those that do not will continue to be surprised by systems that appeared stable, until they were not.

    For consultants advising on enterprise AI deployments, the implication is practical: governance reviews that stop at policy documentation are incomplete. The question to ask is not just whether a client has an AI governance framework, but whether they can tell you how their agents are behaving today compared to three months ago. If the answer is silence, that is where the work begins.