ORLANDO, Fla. — As more companies build artificial intelligence systems, the quality of the data behind those systems is becoming a bigger business concern.
AI models increasingly rely on large amounts of text, images, audio, video, code and other forms of information. But for enterprise teams, collecting that data is only part of the challenge. The data also has to be cleaned, labeled, reviewed, secured and evaluated before it can be used in production systems.
Abaka.AI announced $8 million in funding to expand its enterprise data platform, which is designed to help companies prepare and evaluate multimodal AI data.
The company said the funding will support development of the Abaka AI Data Platform, including expanded production pipelines, more automation for complex supervision workflows and stronger model-in-the-loop evaluation. A portion of the funding will also support benchmark and interoperability work through the 2077AI Foundation, an open-source initiative co-founded by Abaka.AI.
Yunfei Zhao, co-founder and chief operating officer of Abaka.AI, said companies building AI systems need stronger data infrastructure as models become more complex.
“The Abaka AI Data Platform is built to give enterprises of any size access to production-grade datasets and evaluation pipelines,” Zhao said. “We want to support the best possible AI outcomes, and those are only as strong as the datasets behind them.”
The need for better data infrastructure has grown as companies move beyond basic AI tools and toward systems that can process multiple types of information at once. A business building a model for healthcare, finance, robotics or autonomous systems may need more specialized data workflows than a company working only with general text.
That can include different annotation standards, privacy controls, quality checks and review processes depending on how the AI system will be used.
Abaka.AI said its platform supports data across text, image, audio, video, code, 3D and other formats. The company said its quality-assurance process combines expert annotators, domain-specific training, consensus labeling, sampling audits and automated error detection.
For companies in regulated industries, traceability matters: enterprise teams may need to know who reviewed a dataset, how labels were applied, whether errors were caught and whether the data can be used safely in a production system.
Abaka.AI said customers can use its platform in public cloud, on-premises or hybrid environments, with encrypted storage and role-based access control.
“We understand the bar for rigor and privacy is extremely high,” Zhao said. “This funding lets us expand our platform and the services around it while pushing the industry toward open, high-quality practical standards.”
The company said its near-term priorities include more model-assisted labeling, stronger on-premises and hybrid deployment options, and continued support for open benchmark and interoperability work.
The broader issue for AI companies is that model performance is increasingly tied to data quality. A powerful model can still fail if the data used to train, test or evaluate it is incomplete, inconsistent or poorly labeled.
That is why more AI infrastructure companies are focusing on the full data pipeline, from collection and annotation to quality review and evaluation.
With the new funding, Abaka.AI plans to expand its work with enterprises building conversational, reasoning and vision systems.
As AI systems become more complex, companies may need more reliable ways to turn messy multimodal inputs into production-ready data pipelines. Abaka.AI is betting that enterprise data infrastructure will become a larger part of how businesses build and evaluate AI.
©2026 Cox Media Group