2025 Enterprise Data & AI Trends: Agents, Platforms, and Moonshots
- Rifx.Online
- Data Science, Artificial Intelligence, Technology
- 27 Dec, 2024
Making predictions, especially in a rapidly evolving field like data and AI, is notoriously difficult. Nevertheless, we, Rajesh Parikh and Sanjeev Mohan, published our 2024 trend forecasts last year. With 2024 now behind us, we’re delighted to confirm the resounding accuracy of our predictions. This success is even more notable considering the unprecedented pace of AI’s development, a rate of change rarely seen in the IT sector.
We highlighted the rise of Intelligent Data Platforms and AI agents among our top four predictions. While these trends were less obvious in 2023, the momentum behind AI agents is now undeniable, suggesting further acceleration. The mainstreaming of AI and AI agents continues unabated.
On the data platform front, we observed a strong shift toward intelligent, unified platforms, driven by the need to simplify user experiences and accelerate data and AI product development. This trend is expected to intensify as more vendors enter the market, expanding the choices available to enterprises.
What to expect in 2025
As we approach 2025, the landscape of enterprise data and AI is set to undergo significant transformations, reshaping industries and redefining human interactions with technology. Rather than calling these predictions, we use this document to explore the transformative trends that we believe enterprise executives and technology managers should watch closely. Readers should therefore use it as a guide to identify priorities and prepare their organizations to pick the right bets.
Without further ado, let’s jump into the trends that we believe will dominate the enterprise data and AI landscape. Figure 1 illustrates the trends, classified as Applied AI, Data & Ops, and Moonshots.
- Applied AI: These trends will significantly influence how enterprises leverage AI models for transformation, particularly in how agents automate routine tasks and functions. With continued advancements in model reasoning, these agents will evolve to handle increasingly complex tasks and collaborate seamlessly.
- Data & Ops: A converged data and metadata plane supporting both structured and unstructured data will drive AI and serve as the foundation for agents and AI applications. Several key trends are converging to support this vision, including advancements in data platform management and the development of robust middleware for agentic applications.
- Moonshots: These ambitious, high-risk endeavors push the boundaries of current technology, exploring areas that may seem cutting-edge today. While carrying a significant risk of failure, breakthroughs in these areas have the potential to revolutionize industries and redefine human-computer interaction.
Applied AI
The 2025 Applied AI trends center around the practical application and mainstream adoption of agents. As illustrated in Figure 2, we have identified four key sub-topics poised to drive the most significant impact in this category.
Let’s review each of the Applied AI trends in turn.
Agents All the Way
In 2025, we enter the era of agentic AI.
Below are excerpts on AI agents, along with our recommendations to enterprises, from last year’s trends. Readers interested in the details can refer to [1].
AI Information agents is a trend we believe will likely play out over multiple years; however, given their promise, we expect 2024 to be the year where significant progress will be made both in terms of agent infrastructure/tooling as well as early adoption. It is appropriate to point out that a lot of how we understand the potential of current AI architecture to take on more complex tasks is still largely about potential, and there are quite a few unresolved issues.
Despite this, enterprises must aim for a practical approach to building agent applications and at some point expect the gaps with current AI technology to take on more and more complex automation that will likely shrink with every passing year. It must also account for the degree of automation possible in the next 12 months use-case by use-case. An evolutionary path/journey to such projects is likely to yield far better success with such endeavors.
The adoption of intelligent autonomous AI agents is poised to accelerate in enterprises during 2025, driven by the growing demand for automation of repetitive tasks and enhanced customer experiences. These agents will augment human capabilities, allowing us to focus on creative, strategic, and complex work.
They extend automation to tasks requiring high-level thinking, reasoning, and problem-solving — tasks that currently necessitate significant human involvement. For example, agents can perform market research, analyze data, or answer customer support queries. They can also automate complex, multi-step workflows previously considered impractical due to complexity, cost, or both.
A comprehensive explanation, definition, and classification of AI agents is provided in [2].
An AI Agent is a program or a system that can perceive its environment, reason, break down a given task into a set of steps, make decisions, and take actions to achieve those specific tasks autonomously, just like a human worker would do.
We are currently witnessing the emergence of AI-powered tools such as developer copilots, available for approximately $20 per month, and early-stage agents like Devin, priced at $500 (still a Level 2 automation solution). A Level 2 AI agent can perform some tasks autonomously but still requires significant human oversight and intervention.
However, in 2025, we anticipate the arrival of significantly more advanced agents with correspondingly higher price points reflecting the value they deliver. For instance, a dedicated agent capable of surpassing the performance of a junior marketer in developing a department’s top-of-funnel inbound and outbound marketing strategy could command a price of $20,000.
Multi-Agent System
Multi-Agent Systems (MAS) empower multiple autonomous agents to work together, communicating and collaborating to address complex challenges that would be insurmountable for a single agent. This specialization within a MAS allows each agent to focus on its area of expertise, enhancing the system’s overall effectiveness as agents contribute their unique skills and knowledge to solve complex problems. These agents interact with each other, frequently using diverse communication modes and channels, to achieve their individual objectives or the overarching system goal.
Figure 3 illustrates how multiple agents collaborate to enhance content generation within an organization.
MAS can exhibit different levels of control, and their agents communicate and coordinate through several common architectural patterns:
- Hierarchical teams: This type of MAS commonly employs a central manager or task delegator to mediate communication. Worker agents within the system communicate exclusively through this central agent, preventing direct inter-agent communication.
- Peer-to-peer: In a peer-to-peer MAS, agents communicate directly with one another, without reliance on a central authority.
- Group collaboration: This type of MAS resembles a group chat (e.g., Slack, Microsoft Teams), where agents subscribe to relevant channels and coordinate via a publish-subscribe architecture.
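As a rough illustration of the patterns above, the following Python sketch contrasts a hierarchical team with group collaboration over publish-subscribe. The class names `Manager` and `Channel`, the toy `researcher` and `writer` agents, and the topic name are hypothetical stand-ins for LLM-backed agents; this is not the API of any particular agent framework.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# An "agent" here is simply a function from a task description to a result.
Agent = Callable[[str], str]


class Manager:
    """Hierarchical pattern: workers never talk to each other, only to the manager."""

    def __init__(self, workers: Dict[str, Agent]) -> None:
        self.workers = workers

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        # The plan is a list of (worker name, task) pairs chosen by the manager.
        return [self.workers[name](task) for name, task in plan]


class Channel:
    """Group collaboration pattern: agents subscribe to topics and receive published messages."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message


def researcher(task: str) -> str:
    return f"notes on {task}"


def writer(task: str) -> str:
    return f"draft based on {task}"


# Hierarchical team: the manager routes the researcher's output to the writer.
manager = Manager({"researcher": researcher, "writer": writer})
results = manager.run([
    ("researcher", "agent pricing"),
    ("writer", "notes on agent pricing"),
])

# Group collaboration: a reviewer inbox subscribes to a shared topic.
channel = Channel()
inbox: List[str] = []
channel.subscribe("content-review", inbox.append)
delivered = channel.publish("content-review", results[-1])
```

A peer-to-peer variant would drop both the manager and the channel and let agents call each other directly, which removes the single point of coordination but makes the interaction graph harder to govern.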
Unlike single-agent systems, where one agent handles multiple roles, MAS enable efficient specialization, leading to improved performance across various applications. MAS are crucial for scaling complex agentic automations; overloading a single agent with tasks introduces complexity and scalability/reliability issues.
We foresee a trend toward enterprises developing a greater number of specialized agents. These agents must operate in a team configuration, collaborating and coordinating to complete larger, more complex workflows. Therefore, MAS will play a crucial role in the overall success of agent-driven workflow automation initiatives.
For further examples and discussion on the necessity of MAS, refer to [3].
Agent Management System
An Agent Management System (AMS) facilitates the development, evaluation, deployment, and post-deployment monitoring of AI agents. By streamlining the creation and improvement of these agents, an AMS enables faster iteration and simplifies lifecycle management. It also ensures agents meet desired objectives through thorough pre-deployment testing and ongoing production monitoring.
Figure 4 shows the components of a representative AMS.
A representative AMS incorporates the following components:
- Agent Builder: An agent builder, often referred to as an agent framework, facilitates the rapid creation of new agents and enables iterative improvement of existing ones.
- Agent Registry: The agent registry maintains a catalog of available agents and facilitates access control and governance, incorporating version management to ensure appropriate access for the target audience.
- Agent Playground: An agent playground provides a user-friendly, plug-and-play interface for manually testing agents against various tasks and user queries. This environment allows for rapid evaluation of agent performance.
- Agent Experiments: Agent experiments enable automated pre-deployment evaluation of agents. This structured approach assesses agent performance by defining a dataset, selecting appropriate metrics, configuring the environment, analyzing results, and generating an evaluation report. Experiment logs from previous runs are also typically available.
- Deployment & Monitoring: Agent deployment involves provisioning the agent with necessary resources in a staging or production environment, while monitoring tracks relevant runtime metrics. This ensures agent reliability and effectiveness.
- Chat UI: The chat UI provides the requisite user interface for interaction with agents deployed within the production environment.
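To make two of the components above more concrete, here is a minimal sketch of an agent registry (versioning plus role-based access) and an agent experiment (dataset, metric, report). All names, including `AgentRegistry`, `run_experiment`, and the `shouting-echo` agent, are assumptions made for illustration; a production AMS would add persistence, audit logs, and richer metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class AgentVersion:
    version: int
    handler: Callable[[str], str]
    allowed_roles: Set[str]


class AgentRegistry:
    """Catalog of agents with simple version management and access control."""

    def __init__(self) -> None:
        self._agents: Dict[str, List[AgentVersion]] = {}

    def register(self, name: str, handler: Callable[[str], str], allowed_roles: Set[str]) -> int:
        versions = self._agents.setdefault(name, [])
        entry = AgentVersion(len(versions) + 1, handler, set(allowed_roles))
        versions.append(entry)
        return entry.version

    def get(self, name: str, role: str, version: Optional[int] = None) -> AgentVersion:
        versions = self._agents[name]
        # Default to the latest version when none is requested.
        entry = versions[-1] if version is None else versions[version - 1]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not use agent {name!r}")
        return entry


def run_experiment(
    agent: Callable[[str], str],
    dataset: List[Tuple[str, str]],
    metric: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score the agent on every (query, expected answer) pair and summarize."""
    scores = [metric(agent(query), expected) for query, expected in dataset]
    return {"examples": len(scores), "mean_score": sum(scores) / len(scores)}


def exact_match(answer: str, expected: str) -> float:
    return 1.0 if answer == expected else 0.0


registry = AgentRegistry()
registry.register("shouting-echo", lambda q: q.upper(), {"support"})
agent = registry.get("shouting-echo", role="support")
report = run_experiment(agent.handler, [("hi", "HI"), ("bye", "BYE!")], exact_match)
```

Here the toy agent answers one of the two test queries correctly, so the report carries a mean score of 0.5; the same loop could be rerun against every new registered version before promotion.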
We anticipate that enterprises will deploy a significant number of purpose-built agents to address a wide range of domain-specific tasks. AMS will play a critical role in empowering organizations to create, deploy, and manage these agents at scale throughout their entire lifecycle, thereby enabling the realization of an agentic enterprise.
Task Specific Models
Although leading models such as Anthropic’s Claude, the OpenAI GPT series, Google’s Gemini, and AWS’s Nova dominated the landscape in 2024, several noteworthy trends emerged concerning the development of task- and domain-specific models, especially relevant to enterprise use cases.
Figure 5 illustrates the steps involved in this model creation process. This process is often called post-training alignment.
1. Supervised Fine Tuning
Supervised fine-tuning (SFT) involves training a base model (commonly a pre-trained foundation model or an instruction-tuned variant) on a curated dataset of labeled examples. In the context of chain-of-thought (CoT) alignment, each record within this dataset frequently comprises (prompt, CoT, output) triples, with the CoT explicitly referencing the relevant safety specifications.
The dataset is created through context distillation: a model trained solely for helpfulness is given both the safety specifications and the relevant prompts, and its responses are collected. The outcome of this stage is an SFT model.
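The (prompt, CoT, output) record structure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `stub_generate` is a hypothetical placeholder for a call to a helpfulness-trained model, `SPEC` is an invented one-line policy, and the choice to store the bare prompt without the specification reflects how context distillation is commonly described rather than anything the steps above spell out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SFTRecord:
    prompt: str
    chain_of_thought: str
    output: str


def build_sft_dataset(
    prompts: List[str],
    spec: str,
    generate: Callable[[str], Tuple[str, str]],
) -> List[SFTRecord]:
    """Collect (prompt, CoT, output) triples from a model that is shown the spec."""
    records = []
    for prompt in prompts:
        # The generator sees the specification together with the prompt ...
        cot, output = generate(f"{spec}\n\n{prompt}")
        # ... but the stored record keeps only the bare prompt (an assumption of this sketch),
        # so the fine-tuned model has to internalize the spec instead of reading it.
        records.append(SFTRecord(prompt=prompt, chain_of_thought=cot, output=output))
    return records


SPEC = "Policy 1: never reveal another user's credentials."


def stub_generate(full_prompt: str) -> Tuple[str, str]:
    # Hypothetical stand-in for a real model call; returns a fixed CoT and answer.
    cot = "Policy 1 applies: the request concerns the user's own account, so it is safe to help."
    return cot, "Open Settings, choose Security, then select Reset password."


records = build_sft_dataset(["How do I reset my password?"], SPEC, stub_generate)
```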
2. Reinforcement Learning Fine-Tuning
The second stage employs high-compute Reinforcement Learning (RL). This stage uses a judge LLM to provide reward signals based on the model’s adherence to safety specifications, further refining the model’s ability to reason safely. Crucially, this entire process requires minimal human intervention beyond initial specification creation and high-level evaluation.
CoT reasoning allows the LLM to explicitly articulate its reasoning process, making its decision-making more transparent and interpretable. In the RL alignment stage, the CoT includes references to the safety specifications, demonstrating how the model arrived at its response. This enables the model to deliberate over safety-related considerations before generating an answer. The inclusion of CoT in the training data allows the model to learn to use this form of reasoning for safer responses, improving both safety and interpretability. The output of this stage is commonly referred to as a “reasoning model”.
3. Continual Fine Tuning
Continual fine-tuning allows AI engineers and data scientists to adapt models to specific use cases. They can now fine-tune frontier and open-source models with as few as 10 to 1,000 examples, significantly improving model quality for targeted applications. This is crucial for enterprises seeking to enhance model reliability for specific use cases without investing in extensive post-training infrastructure.
Most frontier model providers now offer continual fine-tuning APIs for both preference tuning and reinforcement learning fine-tuning (RLFT), lowering the barrier to entry for creating task- or domain-specific models.
The output of this stage can be referred to as “task- or domain-specific models”.
Open-source fine-tuning frameworks, such as Hugging Face Transformer Reinforcement Learning (TRL), Unsloth, and others, provide similar continual tuning capabilities for OSS models. Early adopters of Llama models, for example, have fine-tuned them more than 85,000 times since their release.
We observe two distinct trends as AI adoption continues to go mainstream in the enterprise:
- Frontier Enterprises: Organizations possessing substantial capital investment capacity are likely to pursue a strategy of post-training open-source models, customizing them for specific domains and use cases and further refining them through continuous fine-tuning.
- Aspirational Enterprises: Companies operating with more constrained budgets but prioritizing use-case reliability are expected to focus
For further reading on post-training alignment, refer to [[5]](https://docs.google.com/document/d/1uwbyuBORLTG2qFJln3cLkS6uafYQTF4ta92KubUw8NE/edit?tab=t.0#heading=h.yltnd5f5kz5n)
Data & Ops Trends
Data is essential for successful AI implementations, which in turn require data management best practices. Key 2025 Data & Ops trends are illustrated in Figure 6.
Let’s dive deeper into each.
Intelligent Data Platform
To accelerate data and AI innovation and reduce operational overhead, we proposed a unified and intelligent data and AI platform (IDP) in 2024. This unification and simplification effort gained significant traction across major software providers, resulting in the architecture shown in Figure 7.
The IDP streamlines integration across the data lifecycle — storage, processing, analytics, and machine learning — reducing the need for fragmented tools and manual effort. It also provides a centralized framework for data governance policies and enforcement. For a detailed overview of the IDP architecture, refer to [1].
While feature enhancements continued across major offerings from both established tech companies and startups throughout 2024, widespread adoption of Data and AI platforms for AI agents remains ongoing.
In 2025, data platform vendors will continue consolidating their services, creating a crucial foundation for AI agents and multi-agent systems by providing the information these applications require for operation and decision-making. These platforms abstract four key functionalities:
- Unified Data Plane: The unified data plane enables the onboarding, storage, management, and governance of diverse data formats, including text (e.g., PDFs), images (e.g., PNGs, JPEGs), and audio/video (e.g., MP3s). A key sub-trend within this unified data plane is the adoption of open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi.
- Unified Metadata Plane: Metadata provides essential contextual information to AI applications regarding the data they process. For instance, if the data consists of an HR policy document, relevant metadata might include the document’s version number, last revision date, and author. Without rich metadata providing these nuances, agents will encounter difficulties in establishing sufficient context and delivering the expected functionality.
- Multi-Engine Orchestrator: The IDP further provides an extensible orchestration layer designed to manage and coordinate a variety of compute engines, including those used for analytical processing, data transformation, and the execution of AI models.
- Governance Plane: IDPs also serve as access control, governance, and personalization middleware, enabling agents to better understand user personas (including role, data access, and query history) and personalize responses.
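The HR policy example above can be sketched in a few lines: metadata tells an agent which revision is current, and governance decides what a given role may see. The `DocumentMetadata` fields mirror the attributes named in the text, while `allowed_roles`, `latest_visible`, and the catalog entries are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class DocumentMetadata:
    title: str
    version: str
    last_revised: date
    author: str
    allowed_roles: FrozenSet[str]  # assumed governance attribute, not from the article


def latest_visible(catalog: Iterable[DocumentMetadata], role: str) -> Dict[str, DocumentMetadata]:
    """Return, per title, the most recently revised document that the role may read."""
    latest: Dict[str, DocumentMetadata] = {}
    for doc in catalog:
        if role not in doc.allowed_roles:
            continue  # governance: hide documents the role cannot access
        current = latest.get(doc.title)
        if current is None or doc.last_revised > current.last_revised:
            latest[doc.title] = doc  # metadata: prefer the newest revision
    return latest


catalog = [
    DocumentMetadata("Leave Policy", "1.0", date(2023, 1, 15), "HR Operations", frozenset({"employee", "hr"})),
    DocumentMetadata("Leave Policy", "2.0", date(2024, 6, 1), "HR Operations", frozenset({"employee", "hr"})),
    DocumentMetadata("Compensation Bands", "3.1", date(2024, 3, 10), "HR Operations", frozenset({"hr"})),
]

employee_view = latest_visible(catalog, "employee")
hr_view = latest_visible(catalog, "hr")
```

Without the revision date, an agent answering a leave question would have no principled way to prefer version 2.0 over the superseded 1.0.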
ETL for AI
ETL (Extract, Transform, and Load) is a crucial data integration process for preparing raw data for AI and machine learning models. This process involves extracting data from various sources, transforming it through cleaning and formatting, and subsequently loading it into a data management or storage system, such as the IDP described previously, a data warehouse, or a vector store.
While enterprises are already familiar with ETL for structured data (extracting, transforming, and loading data from operational databases into warehouses or data lakes), ETL for AI extends this process to encompass diverse data formats, including text (.pdf, .md, .docx), audio/video (.mp3, .mpeg), and images (.jpeg, .png).
These unstructured data sources may include various content repositories, applications, and web resources used by enterprises. Indeed, the ETL process itself can leverage AI for extraction tasks, such as extracting entities (images, tables, and named entities) from PDFs using multi-modal large language models (LLMs) or optical character recognition (OCR) models.
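A minimal sketch of the extract, transform, and load steps for text content might look like the following. It assumes the documents have already been converted to plain text and uses an in-memory list in place of a vector store; the function names, the `leave-policy.md` sample, and the five-word chunk size are all illustrative choices, and real pipelines would add parsing (for example OCR), embedding, and incremental updates.

```python
from typing import Dict, Iterator, List, Tuple

Chunk = Dict[str, object]


def extract(sources: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (document name, raw text) pairs from the source system."""
    yield from sources.items()


def transform(name: str, raw: str, chunk_words: int = 5) -> List[Chunk]:
    """Normalize whitespace and split the text into fixed-size word chunks."""
    words = raw.split()  # split() with no arguments also collapses runs of whitespace
    return [
        {
            "source": name,
            "chunk_id": start // chunk_words,
            "text": " ".join(words[start:start + chunk_words]),
        }
        for start in range(0, len(words), chunk_words)
    ]


def load(chunks: List[Chunk], store: List[Chunk]) -> int:
    """Append chunks to the target store and report how many were written."""
    store.extend(chunks)
    return len(chunks)


sources = {
    "leave-policy.md": "Employees   accrue\nleave monthly. Unused leave carries over to next year.",
}
store: List[Chunk] = []
for name, raw in extract(sources):
    load(transform(name, raw), store)
```

Keeping the source name and chunk index on every chunk preserves lineage, so a downstream retrieval step can cite the document a passage came from.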
ETL for unstructured data supports a variety of downstream use cases:
- AI-driven insights: Retrieval Augmented Generation (RAG) empowers applications to facilitate user interaction with documents, extract key summaries, and support similar use cases. Extracting and transforming data from disparate sources like SharePoint, Dropbox, Notion, and various cloud repositories and applications will be a key enabler for AI-driven insights. We anticipate vendors will continue to abstract RAG, integrating it as a readily accessible feature within unified data, analytics, and AI platforms.
- AI search: Enhances the accessibility and intelligence of enterprise content compared to traditional keyword search.
- AI-driven automation: Provides the necessary knowledge layer from unstructured data, offering agents essential contextual information.
- Post-Training Alignment and Continual Fine-tuning: Facilitates the availability of new and updated data, enabling seamless and continuous personalization of models for various departmental use cases.
Data Readiness for AI
Data readiness is the foundation for successful implementation of task specific models and AI agents.
It is a crucial “prerequisite” for success with such initiatives.
For data to be usable for AI, it needs to be ready along many dimensions. While AI can consume nearly all the data an enterprise has, the right approach is to derive data readiness requirements from the use cases being prioritized.
Some of the key dimensions of Data Readiness for AI are illustrated in Figure 8.
Data Quality & Observability
Does this data pass the established quality metrics? This could mean one or more of the following:
- Trust
- Freshness
- Correctness
- Completeness of metadata
- Lineage
- Legality/Bias
- Relevance
- Versioning
How are the above metrics managed, tracked and surfaced in real time?
- Observability data
- Data Lineage
- Revision History
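Two of the metrics listed above, freshness and completeness of metadata, lend themselves to simple automated checks. The sketch below is illustrative only: the required metadata keys, the 30-day freshness threshold, and the record layout are assumptions, and a real observability tool would track these values over time rather than compute them once.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Hypothetical set of metadata fields a dataset must carry to count as complete.
REQUIRED_METADATA = ("owner", "version", "source")


def readiness_report(record: Dict[str, Any], now: datetime, max_age: timedelta) -> Dict[str, Any]:
    """Check one dataset record for freshness and metadata completeness."""
    metadata = record.get("metadata", {})
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    age = now - record["updated_at"]
    return {
        "fresh": age <= max_age,
        "age_days": age.days,
        "metadata_complete": not missing,
        "missing_metadata": missing,
    }


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
record = {
    "updated_at": datetime(2024, 12, 1, tzinfo=timezone.utc),
    "metadata": {"owner": "hr-team", "version": "2.0"},
}
report = readiness_report(record, now, max_age=timedelta(days=30))
```

In this example the record is 40 days old and lacks a `source` field, so it fails both checks; surfacing such reports continuously is what turns quality metrics into observability data.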
Data Products for AI
Data products are key for success with task-specific models, benchmarking and testing of AI models and agent applications. Some of the important AI data products are:
1. Training-Ready Datasets: Labeled data becomes a valuable data product, ready for immediate use in AI training.
2. Chain-of-Thought (CoT) Datasets: Unlike traditional datasets, which typically provide inputs and outputs for training, CoT datasets also include intermediate reasoning steps that explain how an answer is derived. This step-by-step reasoning approach aligns closely with how humans solve complex problems, making CoT datasets valuable for training AI models to perform tasks that require logical reasoning, planning, and interpretability.
3. Distilled Datasets: Provide a smaller representative subset of a dataset that captures the full dataset’s diversity and variability. A few examples of distilled datasets are:
a. A subset of customer reviews that captures diverse sentiment levels and product categories.
b. A high-quality task specific dataset created to train smaller models (student models) to mimic the performance of larger, more complex models (teacher models).
c. A subset of technical documents distilled for fine-tuning a technical Q&A model.
4. Synthetic datasets: Distilled data can be used to generate synthetic datasets that mimic the original dataset’s core properties. They are often used to augment real datasets where data is scarce or imbalanced. By generating variations, models can be trained on more diverse datasets.
5. Knowledge graph datasets: Data products powered by GraphRAG leverage the capabilities of graph-based data retrieval and generation. For example, a healthcare knowledge graph dataset that connects medical terms, diagnoses, treatments, and patient outcomes can be used to give personalized medical advice, suggest possible treatment options and help doctors make data-driven decisions.
6. User data: User data can be pivotal in building smarter, personalized AI applications. It typically includes information about the user persona, along with the user interactions and inputs that an AI agent or application can use to understand that persona and provide meaningful responses. Some examples of user data are as follows:
a. A data analyst agent that knows the user’s role and their dataset, query, and dashboard interaction history can personalize query responses by filtering and choosing appropriate datasets based on that access history and past query runs.
b. A customer support agent that knows the user’s customer status (e.g., premium or regular) and the nature of their past support requests can use past tickets, issues, and resolutions to prioritize responses, offer faster resolutions, or recommend specific knowledge base articles.
c. A sales agent that analyzes past communication history and engagement patterns with leads and clients can personalize follow-up strategies, recommend specific products or services, and prioritize leads based on their historical behavior.
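One simple way to build a distilled dataset like the customer review example above is stratified sampling: group records by the attributes whose diversity should be preserved, then sample a few from each group. The sketch below is one possible approach, not a prescribed method; the `distill` function, the review fields, and the per-group quota of two are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def distill(records: List[Dict], keys: Sequence[str], per_group: int, seed: int = 0) -> List[Dict]:
    """Keep up to `per_group` records from every combination of the given keys."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    groups: Dict[Tuple, List[Dict]] = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in keys)].append(record)
    subset: List[Dict] = []
    for members in groups.values():
        # Small groups are kept whole so rare combinations are not lost.
        subset.extend(rng.sample(members, min(per_group, len(members))))
    return subset


# An imbalanced toy dataset: positive laptop reviews dominate.
labels = (
    [("positive", "laptop")] * 6
    + [("negative", "laptop")] * 2
    + [("positive", "phone")] * 3
    + [("negative", "phone")] * 1
)
reviews = [
    {"id": i, "sentiment": sentiment, "category": category}
    for i, (sentiment, category) in enumerate(labels)
]

subset = distill(reviews, keys=("sentiment", "category"), per_group=2)
```

The twelve reviews shrink to seven, yet every sentiment and category combination is still represented, which is the property the distilled dataset is meant to preserve.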
Moonshots
Moonshot projects are ambitious, exploratory endeavors that seek to address significant challenges with groundbreaking solutions. These projects often push the boundaries of current technology, operating at the bleeding edge of innovation. While they inherently carry a high risk of failure, their potential for transformative outcomes is immense.
This section is a space for creative exploration, where we examine the more speculative concepts highlighted in Figure 9.
Cognitive Agents
Cognitive agents learn continuously from their experiences, adapting and improving over time. Figure 10 depicts the defining characteristics of cognitive agents.
Beyond the generic capabilities of an AI agent, cognitive agents often have a few additional abilities:
1. Memory Retention
Longer memory retention is one of the key characteristics of cognitive agents. It gives agents the ability to recall previous conversations, often remembering specific events, including where and when they happened, and learning from them.
Cognitive agents therefore have a complex memory architecture that includes long-term storage for retention and specific forms of memory such as episodic memory, which enables the agent to travel back in time and recall specific events, including where and when they occurred.
One use of episodic memory is to recollect the steps that led to the successful completion of a task in a prior event. If the agent faces the same task again, it can recall the exact steps it took in the prior successful instance and perform the task more efficiently this time.
2. Learn from past interactions
These agents learn from past interactions and leverage the learnings to make better decisions in the future.
3. Self-aware
These agents may further be aware of their own construction details and capabilities.
They can potentially learn from user interactions and update their knowledge base with new learnings.
4. Self-healing
Self-healing enables an agent to extend its functionality by adding new capabilities such as a tool, creating preference datasets from recent interactions, and triggering the next fine-tuning job. The agent can further evaluate the new model and register the new model revision in the model registry, along with producing a detailed model report for the AI engineer to review.
5. Self-upgrade
Optionally, agents may self-upgrade to the new model revision created above.
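The episodic memory described under Memory Retention above can be sketched as a store of past episodes that records what was done, when, and where, and can replay the steps of the most recent success. The `Episode` fields and the `EpisodicMemory` class are hypothetical; real cognitive architectures would likely match tasks by semantic similarity rather than by exact string.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Episode:
    task: str
    steps: List[str]
    succeeded: bool
    when: datetime
    where: str


class EpisodicMemory:
    """Long-term store of past task attempts, including when and where they happened."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []

    def record(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def recall_success(self, task: str) -> Optional[Episode]:
        """Return the most recent successful episode for this exact task, if any."""
        matches = [e for e in self._episodes if e.task == task and e.succeeded]
        return max(matches, key=lambda e: e.when, default=None)


memory = EpisodicMemory()
memory.record(Episode("export report", ["open dashboard", "click export"], False,
                      datetime(2025, 1, 5, 9, 0), "staging"))
memory.record(Episode("export report", ["open dashboard", "apply filter", "click export"], True,
                      datetime(2025, 1, 6, 9, 0), "production"))

previous = memory.recall_success("export report")
steps_to_reuse = previous.steps if previous else []
```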
Embodied Agents
Embodied agents are AI agents with a physical presence, such as robots. This “embodiment” is crucial because it allows the agent to perceive and act in the physical world much as humans do, enabling it to learn and perform assigned tasks that require a rich understanding of its physical space. Generative AI is poised to revolutionize robotics by moving beyond traditional rule-based programming to operate in more complex and dynamic environments.
Figure 11 depicts how enterprises can use these novel agents for a wide range of applications.
Let’s explore how a bank might use an embodied agent. An embodied customer support agent at a branch could handle the first interaction with a walk-in customer, provide personalized financial advice, and help with transactions.
In retail, an embodied agent could take the form of an in-store shopping assistant that provides product information and guides customers through the store. In manufacturing, these agents could handle tasks that require mobility and dexterity, particularly those that are dangerous to humans.
Agent Networking
Effective communication between multiple AI agents working towards a shared goal or complex problem is currently hampered by the lack of standardized message formats, protocols, and conflict resolution mechanisms. Future networking approaches must be scalable, low-latency, and secure, establishing trust between agents and protecting the communication network from malicious attacks.
This takes us to our last expected trend on improvements in agent networking.
Effective agent networking could revolutionize how agents communicate, collaborate, coordinate, get the work done and learn within and beyond the enterprise boundary. This trend draws parallels with the early days of standardization of intranet and internet protocols as well as evolution of communities and forums in the web 2.0 era. These trends significantly enhanced human-to-human collaboration beyond physical bounds.
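To illustrate what a standardized, trustworthy message format between agents might involve, the sketch below defines a small JSON envelope signed with an HMAC so that the recipient can detect tampering. This is not an existing agent protocol: the envelope fields, the function names, and the shared-key setup are assumptions, and a real design would also need key distribution, replay protection, and conflict resolution.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(body: Dict[str, Any]) -> bytes:
    # Sorted keys and fixed separators give both sides the same byte string to sign.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(sender: str, recipient: str, intent: str,
                 payload: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Build a message envelope and attach an HMAC-SHA256 signature."""
    body = {"sender": sender, "recipient": recipient, "intent": intent, "payload": payload}
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify_message(message: Dict[str, Any], key: bytes) -> bool:
    """Recompute the signature over everything except the signature field."""
    body = {k: v for k, v in message.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, str(message.get("signature", "")))


shared_key = b"demo-key-for-illustration-only"
message = sign_message("research-agent", "writer-agent", "handoff",
                       {"topic": "agent pricing"}, shared_key)

tampered = {**message, "payload": {"topic": "something else"}}
is_valid = verify_message(message, shared_key)
is_tampered_valid = verify_message(tampered, shared_key)
```

A shared secret only works inside one trust boundary; communication across enterprises would more plausibly rely on public-key signatures, which is part of why this area remains a moonshot.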
Figure 12 shows four options for establishing effective agent networking.
The benefits of all the moonshot trends are enormous. Cognitive agents can learn through these interactions by analyzing exchange data, updating their own knowledge base, and refining their communication much as humans do, accelerating innovation across enterprise boundaries.
Conclusion
To recap, the Applied AI trends accelerate meaningful adoption of AI agents and applications in the enterprise, whereas the Data & Ops trends provide a solid underlying substrate that supports and accelerates these agentic applications. Moonshots, in turn, cover topics that may seem radical today but may bring the next transformative impact.
As always, the purpose of this research is to focus on technology solutions rather than on the organizational impacts. Autonomous agents naturally raise concerns about job displacement, as AI agents are expected to take over repetitive tasks. Enterprises need to rediscover the synergy and coordination between humans and AI agents by critically redefining job roles and creating new ones centered around creating, managing, and collaborating with AI. This transformation creates another important task: most enterprises will need to upskill and reskill their workforce alongside their AI transformation.
Finally, any breakthrough in LLM architecture, any solution that brings cost-effective adaptive knowledge injection, or any significant improvement in understanding and reasoning capabilities has the potential to further change the underlying realities of AI applications.