This past week I’ve been peppered with questions about the 95% failure rate of AI projects reported in the paper “The GenAI Divide: State of AI in Business 2025” (not public yet; requires signup and approval) from an MIT team working on Project NANDA.
Side Note: NANDA is MIT’s framework for building distributed, capable agent systems, using standards like MCP and A2A — part of what they call the emerging ‘Agentic Web.’
MIT report: 95% of generative AI pilots at companies are failing
It’s definitely a headline grabber. I got hold of the actual paper, so let’s dig into it.
The actual statement in the paper is:
“Despite $30–40 billion in enterprise investment into GenAI, this report uncovers a surprising result in that 95% of organizations are getting zero return.”
Followed by:
“Tools like ChatGPT and Copilot are widely adopted. Over 80 percent of organizations have explored or piloted them, and nearly 40 percent report deployment. But these tools primarily enhance individual productivity, not P&L performance.”
AI technology itself works well: employees already see daily returns from “shadow AI” tools like ChatGPT, Claude, and Copilot. (The “shadow” part is where employees use the tooling without corporate accounts, contracts, BAAs, etc.)
The failures lie in enterprise-scale AI projects: ambitious pilots, unnecessarily complex systems, and poorly aligned tools that lack scalability and adaptability.
The actual story isn't about "95% failure"—it's that "95% of spending is being misdirected toward the wrong types of AI initiatives."
From the report and broader industry experience, three key issues emerge:
The MIT NANDA team is coining this the “GenAI Divide” (at least, that’s my reading of it).
Lastly, experience: throughout the article, user feedback described internal solutions as lacking capability (e.g., memory, flexibility, and adaptability), fueling user rejection.
For many, that drove them to adopt shadow AI, such as personal ChatGPT accounts. ChatGPT is a product built with memory, tooling, and adaptable agents (search, deep research, images, etc.), all of which tap into powerful GPT LLMs.
Hundreds of millions have been spent developing it; don’t try to build your own.
Trust me, I’ve built several of my own.
Most of the failing efforts are engineering- or data-science-led, with little product rigor. That means:
A strong AI product manager would ask:
Without that discipline, companies build demos, not durable tools.
The report shows patterns in the 5% that succeed, albeit with a low sample size (N = 300 / K = 15).
Front office AI can succeed, but only when designed with a realistic understanding of AI’s capabilities. The overlooked back office, however, often delivers faster and clearer ROI.
I have worked on far too many recovery or tiger-team projects where we found “AI” meant LLMs, and product gaps were filled with naive chatbots and prompts. That’s not the way to do it.
For companies stuck on the wrong side of the GenAI Divide:
Remember, you are in a time of change: you can’t have adoption without transformation.
Originally linked from https://www.linkedin.com/pulse/95-failure-patrick-o-leary-csd6e