What does agent-native mean?

Agent-native describes a system built so an AI agent can read state, decide, call a tool, and act across multiple steps on a single unified data layer, without a human reviewing each answer. It is a sharper test than AI-native because the agent acts on enterprise data directly, with no reviewer in the loop. As IBM puts it, calling a tool by itself does not make an LLM an agent: the agent has to decide which tool and when.

Why do AI agents fail on data instead of the model?

Chatbots buffered bad data behind a human reviewer on every turn. Agents remove that reviewer, so a stale or conflicting record becomes a decision, and errors compound at every step in the chain. The blocker is almost always the data layer the agent acts on, not the intelligence of the model. Fix the foundation, not the frontier model.

How does an enterprise get its data agent-ready in 90 days?

Do not try to unify everything. Pick one high-frequency decision, map every system feeding it, name the metric it moves, measure decision latency as a baseline, unify the source of truth for that one workflow, then put an agent on it read-only first before giving it approval-gated authority. One decision, one clean data layer, then the agent.

Enterprise AI agents aren't failing on the model. They're failing on the data underneath them.

The shift: agents moved from pilot to production faster than the data did

Something changed in the last year, and most operators felt it before they could name it. The thing on the other side of the chat box stopped just answering and started doing.

That’s the line IBM draws between an assistant and an agent. An assistant is reactive. It waits for a prompt at every turn, recommends rather than acts, and forgets the moment the window closes. An agent is proactive. It runs autonomously once it is kicked off, designs its own workflow, and decides which tool to call and when. IBM’s own framing is blunt about where people get this wrong: “the ability to call on tools by itself does not make an LLM an agent.” The shift is autonomy over multiple steps, not a better answer to a single question.

And the market moved on it fast. An IDC study commissioned by AWS, covering more than 900 organizations across 15 industries, found that 50% of organizations already have 10 or more agents in production in 2025, and 65% expect to reach full deployment of agentic AI within two years. Deloitte’s “From Ambition to Activation,” published 21 January 2026 and built on a survey of 3,235 leaders across 24 countries, puts close to three-quarters of companies on a path to deploy agentic AI within two years.

So the adoption curve is steep. The readiness curve is not. The same Deloitte release found only 21% of those companies report a mature model for agent governance. The same IDC study for AWS found just 3% of organizations are scaling agentic AI across departments, and fewer than 7% are in full production with even one use case. Trade coverage of that report put the inverse plainly: 97% have yet to figure out how to scale agents across their organization.

Enterprises are not behind on buying agents. They are behind on the layer those agents stand on.

Takeaway: agents went mainstream in months, but the data foundation they depend on did not. The gap between deploying an agent and getting value from one is now the whole game.

The proof: the value gap is widening, and data readiness is where it splits

Adoption without value is the dominant story in the data right now. The numbers are consistent across independent studies, and they all point at the same fracture line.

Start with the value gap itself. BCG’s “The Widening AI Value Gap,” from its Build for the Future 2025 study of more than 1,250 firms, found only 5% of companies are achieving AI value at scale, while 60% are not achieving material value at all despite substantial investment. The segment that’s scaling and seeing returns rose from 22% in 2024 to 35% in 2025, a 13-point jump. The middle is thinning. Companies are sorting into winners and everyone else.

Now look at where the split comes from. The pattern is data, not models.

Data readiness is rare. Only 7% of enterprises say their data is completely ready for AI, and more than a quarter (27%) say it is not very or not at all ready (Cloudera and Harvard Business Review Analytic Services, “Taming the Complexity of AI Data Readiness,” 5 March 2026). A separate benchmark put the fully-ready figure at 19% (AI Markets Group, “Enterprise AI 2026,” April 2026).
Access is the bottleneck, not the algorithm. Nearly 80% of enterprises say their AI and data initiatives are still constrained by limited data access across environments, and fewer than one in five (18%) say their data is fully governed (Cloudera, “The Data Readiness Index,” 14 April 2026).
Confidence outruns reality. 88% of senior data and analytics leaders report confidence in their data readiness, yet 43% name data readiness as the single biggest barrier to aligning AI with business objectives (Precisely and Drexel University’s LeBow College, “2026 State of Data Integrity and AI Readiness,” 21 January 2026).
Pilots stall before production. Two-thirds (67%) of data leaders say they have not moved even half of their generative AI pilots into production, with lack of trust in data quality cited by 38% as a reason value stays out of reach (Informatica, “CDO Insights 2025,” 28 January 2025).
No ROI at scale. 79% of organizations report no measurable EBIT impact from AI despite 87% adoption (AI Markets Group, “Enterprise AI 2026”). In Australia specifically, enterprises spend an average of $28M a year on AI, 72% report no measurable ROI, and only 24% have AI-ready data architectures (ADAPT Research, “State of Data & AI in Australia 2025,” 3 September 2025).

Then look at who wins, and notice they aren’t running smarter models. They built the foundation first.

Walmart cut time to value by 90% and costs by $5.6M annually after standardising on a unified data platform (Databricks, “Data Intelligence in Action,” 9 July 2025).
AT&T reduced fraud by up to 80% with more than 100 fraud-detection models in production on a single platform that ingests structured and unstructured data (Databricks customer story, AT&T).
Virgin Australia posted a 44% reduction in mishandled bags, 90% faster model deployment, and a 75% increase in near real-time data availability after consolidating onto one platform (Databricks customer story, Virgin Australia).
Sinyi Realty, a property firm, drove a 20% increase in property closing rates off a single source of truth feeding its recommendation system, live since May 2022 (Databricks customer story, Sinyi Realty).

The mechanism shows up in the research too. BCG and Google’s “Any Company Can Become a Resilient Data Champion” found data champions were more than twice as likely as laggards to grow revenue by more than 10%, a 2.5x gap, and 70% of all companies said data challenges were severe enough to threaten their AI use cases. Stanford’s Digital Economy Lab studied 51 successful deployments and found strategic scalers were far more likely to possess a large, accurate dataset (61% versus 38% for companies stuck in proof of concept). One telecom executive in that study said what every number above is pointing at: “All the hard work is in process documentation and data architecture.” Get those two things right, the executive added, and everything else is quite simple.

Takeaway: the studies disagree on the exact percentages and agree completely on the mechanism. The companies getting value built the data layer. The companies stuck did not.

The counter-intuitive part: agents don’t fail on intelligence, they fail on the data they’re pointed at

Here’s where most teams have the diagnosis backwards.

When an agent program underperforms, the instinct is to reach for a better model. Swap to the newest release. Add reasoning. Tune the prompt. The assumption underneath every one of those moves is that the blocker is intelligence.

It almost never is. The blocker is the data layer the agent is acting on. And the reason this stays hidden is that chatbots forgave bad data in a way agents won’t.

Think about what a chatbot actually did. It pulled a passage, summarised it, and handed the answer to a human who read it, judged it, and decided what to do next. A person stood between the data and the action. If the underlying record was stale or duplicated or contradicted another system, the human caught it, or at least owned the call. The data quality problem was real, but it was buffered by a reviewer on every single turn.

An agent removes that reviewer from the middle of the loop. As Galileo’s team put it, “AI agents often make decisions that directly impact users with limited opportunity for human verification before the interaction occurs,” and “inconsistent information forces agents to make dangerous assumptions during real-time operations, while multiple integrated data sources compound quality issues exponentially.” The agent reads the enterprise’s data, decides, calls a tool, reads the result, decides again, and acts. The bad record doesn’t get summarised for a human anymore. It becomes a decision.

The mechanism gets worse with every step. Anthropic’s own engineering guidance describes the autonomy problem directly: the LLM will potentially operate for many turns, demanding a real level of trust in its decision-making, and “during execution, it’s crucial for the agents to gain ‘ground truth’ from the environment at each step.” If the ground truth is wrong, every subsequent step is built on it. Anthropic’s team also documents “context rot”: as the token count in the window grows, the model’s ability to accurately recall information from that context decreases. Feed an agent fragmented, conflicting data and the inputs aren’t just bad. The one resource it needs to reason at all is being degraded.

Then there’s compounding. O’Reilly’s analysis of agentic failure lays out the arithmetic. At 98% per-agent reliability, a single agent is 98% accurate, three chained agents drop to about 94%, five to about 90%, and ten to roughly 82%, under the product rule for independent events. The author’s line is the one to remember: most multi-agent systems don’t fail because the models are bad, they fail because the agents get composed as if probability doesn’t compound. A November 2025 arXiv paper on agentic reliability frames the same risk as cascading failures across the chain (Xing and Lin, arXiv:2511.11921). Every handoff multiplies the chance that one bad data point poisons the whole run.
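The arithmetic above is easy to reproduce. The sketch below applies the product rule under the same assumption the paragraph states, that each agent succeeds or fails independently with the same probability; the function name chain_reliability is illustrative and does not come from the O’Reilly piece.

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes each step is independent and equally reliable (product rule).
    """
    return per_step ** steps


for n in (1, 3, 5, 10):
    print(f"{n:>2} chained agents at 98% each: {chain_reliability(0.98, n):.1%}")
```

Real chains are rarely independent, because one bad upstream record tends to affect several downstream steps at once, so treat these figures as a rough guide rather than a guarantee.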

This is the whole reason agent-native is a sharper test than AI-native. A reactive assistant tolerates a messy stack because a human absorbs the error. An agent that acts autonomously, across many steps, on top of fragmented data, surfaces every flaw an enterprise used to be able to ignore. As one technology evangelist writing in TechRadar Pro put it about a lending example: if the financial data from scanned forms is outdated, the agent could approve a high-risk applicant. The model didn’t fail. The data did, and the agent simply executed on it.

So when an agent program stalls, resist the upgrade reflex. The model is probably fine. The foundation underneath it is what’s exposed.

Takeaway: chatbots hid the data debt behind a human reviewer. Agents act on that data directly and at every step, so a weak data layer that was survivable yesterday becomes a wrong decision today. Fix the foundation, not the frontier model.

What this means for ops leaders at a mid-market property, finance, or construction business

The studies are enterprise-wide. The decision is the operator’s, and it’s specific.

The biggest costs from a bad data layer are invisible, which is exactly why they don’t get budget. Stanford’s playbook found 77% of the hardest challenges in AI deployment were “invisible costs,” with change management and adoption (33%) and data quality and architecture (17%) the two largest categories. These never show up as a line item, so they never get funded, so they never get fixed. Meanwhile the deployed agent quietly makes worse calls than the team would have.

The fix: pick one decision the team makes every week and write down, today, exactly which systems hold the data behind it. If the answer is three spreadsheets, two SaaS tools, and someone’s inbox, that’s the real cost surfaced.

An operator’s confidence in their own data is probably wrong, and it’s costing the business. Recall the Precisely figure: 88% of leaders feel confident about data readiness while 43% name it as the top barrier to AI. The gap between feeling ready and being ready is where pilots die. For a property or finance business, that gap is the difference between an agent that prices, routes, or approves correctly and one that does it on a customer record another system contradicts.

The fix: run one read-only test. Point an agent at a single real decision and have it explain its reasoning without taking any action. If it cites stale or conflicting data, readiness has been measured for the price of an afternoon.
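A rough version of that check can be run on the data itself before any agent is involved. The sketch below is illustrative only: the record layout (source, updated_at, credit_limit), the 30-day staleness threshold, and the sample values are all assumptions made for the example, not anything taken from the studies cited here.

```python
from datetime import datetime, timedelta


def audit_records(records, field, now, max_age_days=30):
    """Return (conflicting_values, stale_sources) for one field across systems.

    Each record is a dict with hypothetical keys "source" and "updated_at",
    plus the field being checked.
    """
    values = {r[field] for r in records}
    # More than one distinct value means the systems disagree.
    conflicts = sorted(values) if len(values) > 1 else []
    cutoff = now - timedelta(days=max_age_days)
    stale = [r["source"] for r in records if r["updated_at"] < cutoff]
    return conflicts, stale


# Made-up example: one customer's credit limit as held by three systems.
now = datetime(2026, 1, 31)
records = [
    {"source": "crm", "updated_at": datetime(2026, 1, 28), "credit_limit": 50000},
    {"source": "erp", "updated_at": datetime(2025, 9, 2), "credit_limit": 35000},
    {"source": "spreadsheet", "updated_at": datetime(2026, 1, 15), "credit_limit": 50000},
]
conflicts, stale = audit_records(records, "credit_limit", now)
print("conflicting values:", conflicts)
print("stale sources:", stale)
```

If either list comes back non-empty for the decision in question, the agent would have been reasoning on exactly the kind of record a human reviewer used to catch.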

Spending on the model before the foundation is how an enterprise joins the 79% with no measurable return. The order of operations is the whole bet. Trinity Industries’ chief data officer said it directly to Databricks: “The data layer is the strategy. Not the model, not the agent, not the dashboard. The foundation.” After building that layer first, Trinity saw a 15% increase in on-time material delivery and a model 50% more accurate than the industry’s own estimates.

The fix: before approving another model or seat-based AI contract, ask one question. Will this run on a unified source of truth, or on the same fragmented stack that’s already producing no ROI? If it’s the second, the spend is dead on arrival.

How an enterprise gets its data agent-ready in 90 days

There’s no need to unify everything. The job is to unify the data behind one decision and put an agent on that. Here’s the sequence ASI builds with, and it’s deliberately narrow.

1. Audit the current state for one workflow, not the whole company. Pick a single high-frequency decision, then map every system that feeds it. Where does the data live, who owns it, and how often is it wrong? This is the process-documentation work the Stanford telecom executive called the hard part. Most of the value is in finally seeing it written down.
2. Name the decision and the number it moves. Not “improve operations.” Something like “approve or decline a supplier invoice” or “price a listing.” Attach the metric that decision controls. Without a named number, there’s no way to measure whether an agent helped, and the enterprise ends up among the 72% of Australian businesses reporting no ROI.
3. Measure decision latency before any AI gets touched. How long does that decision take today, start to finish, including the time spent reconciling conflicting data? This is the baseline. Without it, “faster” is a feeling, not a result, and the lift never gets proven.
4. Unify the source of truth for that one decision. Build the clean, integrated, governed data layer behind that single workflow. No AI in this step. This is pure infrastructure, the warehouse or lakehouse work, and it’s where the winners spend. Databricks’ Virgin Australia case shows what a real foundation returns: 44% fewer mishandled bags, 75% more real-time data availability, 90% faster model deployment. The agents came after.
5. Only now put an agent on it, read-only first. Let it observe and recommend before it acts. Watch it on the clean data for real decisions. When its reasoning holds up, give it the authority to act with human approval on each move. Then measure latency again against the step-three baseline. That delta is the only proof that counts. Stanford found escalation models, where AI handles 80% or more autonomously and humans review exceptions, delivered 71% median productivity gains versus 30% for approval-only models, so the read-only-then-escalate path isn’t just safer, it pays more.
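The baseline-then-remeasure comparison in the sequence above comes down to a small calculation. This is a minimal sketch, assuming each decision is logged as a (requested, decided) timestamp pair; the function name and every timestamp below are made up for illustration.

```python
from datetime import datetime
from statistics import median


def median_latency_hours(decisions):
    """Median hours from request to decision; each item is (requested, decided)."""
    return median((end - start).total_seconds() / 3600 for start, end in decisions)


# Hypothetical log before the data layer was unified.
baseline = [
    (datetime(2026, 1, 5, 9), datetime(2026, 1, 6, 15)),
    (datetime(2026, 1, 7, 10), datetime(2026, 1, 8, 6)),
    (datetime(2026, 1, 12, 8), datetime(2026, 1, 14, 8)),
]
# Hypothetical log after the agent went live on the clean data.
after = [
    (datetime(2026, 4, 6, 9), datetime(2026, 4, 6, 15)),
    (datetime(2026, 4, 7, 10), datetime(2026, 4, 7, 14)),
    (datetime(2026, 4, 8, 8), datetime(2026, 4, 8, 18)),
]

before_h = median_latency_hours(baseline)
after_h = median_latency_hours(after)
reduction = 1 - after_h / before_h
print(f"median latency: {before_h:.0f}h -> {after_h:.0f}h ({reduction:.0%} faster)")
```

The median is used rather than the mean so that one unusually slow decision does not distort the baseline; that is a design choice for the sketch, not something the sequence prescribes.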

Five steps, one decision, 90 days. The point isn’t to boil the ocean. It’s to prove the foundation-first sequence on something small enough to ship and important enough to matter.

The window for buying the model and skipping the foundation is closing

The gap is widening on a clock. BCG’s scaling segment jumped from 22% to 35% in a single year while 60% of companies captured no material value at all. Every quarter that split compounds, because the companies on the right side of it are building a data advantage that gets harder to catch.

Those are the competitive stakes in plain terms. The advantage isn’t the agent. Any competitor can buy the same model. The advantage is the clean, unified foundation the agent runs on, and that takes time to build. Trinity’s chief data officer was candid that the migration took close to a year, with another six to eight months to shore everything up. The firms starting now will own a foundation the laggards can’t buy their way out of later. Inaction isn’t holding steady. It’s falling behind a moving line.

Here’s how ASI takes the risk out of getting started, and how every operator should think about it too. The enterprise approves every move the agent makes. The agent runs read-only first, observing and recommending before it has any authority to act. Every action sits behind a full audit log, so the business can see exactly what the agent did and why. And no AI spend happens until the data foundation is built, which means nobody is ever paying for a model to run on a stack that guarantees no return. The order of operations is the risk reduction. Data foundation, then read-only agent, then approved action, then measured lift.
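One way to picture the read-only, approval-gated, audit-logged pattern is the sketch below. It is a hypothetical illustration, not ASI’s implementation: the GatedAgent class, the Mode values, and the sample action are all invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read_only"            # observe and recommend only
    APPROVAL_GATED = "approval_gated"  # act only with a named approver


@dataclass
class GatedAgent:
    mode: Mode = Mode.READ_ONLY
    audit_log: list = field(default_factory=list)

    def propose(self, action, reason, approved_by=None):
        """Log every proposal; execute only in gated mode with an approver."""
        executed = self.mode is Mode.APPROVAL_GATED and approved_by is not None
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "reason": reason,
            "approved_by": approved_by,
            "executed": executed,
        })
        return executed


agent = GatedAgent()
# Read-only: the recommendation is logged but never executed.
first = agent.propose("approve supplier invoice", "matches purchase order", "ops lead")
agent.mode = Mode.APPROVAL_GATED
# Gated: still nothing happens without a named approver.
second = agent.propose("approve supplier invoice", "matches purchase order")
third = agent.propose("approve supplier invoice", "matches purchase order", "ops lead")
print(first, second, third, len(agent.audit_log))
```

Every proposal is logged whether or not it runs, so the log shows what the agent would have done in read-only mode as well as what it did once approved.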

The agents were never the problem. ASI builds the layer underneath them first.