AIGC Market Tracker

Major multimodal model launches, enterprise deployment and large funding in video/multimodal AI

Multimodal Models & Video AI

2024: A Breakthrough Year in Multimodal AI and Autonomous Ecosystems

Multimodal artificial intelligence is growing quickly in 2024, driven by three developments: large model launches, the spread of persistent on-device agents, and large funding rounds for video and multimodal AI startups. Together they are changing how AI systems interact with media, enterprise workflows, and personal environments, and they point toward AI ecosystems that are more autonomous, privacy-preserving, and regionally controlled.


Major Multimodal Model Launches and Enterprise Integration

A leading example is NVIDIA’s Nemotron 3 Super, a large multimodal model with 120 billion parameters in a hybrid Mixture of Experts (MoE) architecture. Nemotron 3 Super supports context windows of up to 1 million tokens, enabling real-time, multi-step autonomous reasoning across text, images, and video. Its Multi-Token-Prediction (MTP) capability lets the model predict several tokens at once instead of one at a time, which reduces inference latency and speeds up multimedia synthesis and reasoning.
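
To make the latency argument concrete, here is a minimal sketch of why predicting several tokens at once reduces the number of sequential decoding passes. It is an idealized count and not NVIDIA's implementation. The function `decode_steps` and its parameters `tokens_per_step` and `acceptance_rate` are hypothetical names introduced for illustration. The sketch assumes every forward pass costs the same and that a fixed fraction of the extra predicted tokens is kept on average.

```python
import math


def decode_steps(num_tokens: int, tokens_per_step: int = 1,
                 acceptance_rate: float = 1.0) -> int:
    """Estimate the sequential forward passes needed to emit num_tokens.

    Simplifying assumption: each pass always yields one token, and each of
    the (tokens_per_step - 1) extra predicted tokens is kept with
    probability acceptance_rate, treated here as an average.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")

    expected_per_pass = 1.0 + (tokens_per_step - 1) * acceptance_rate
    return math.ceil(num_tokens / expected_per_pass)


if __name__ == "__main__":
    print(decode_steps(1000))          # one token per pass
    print(decode_steps(1000, 4))       # four tokens per pass, all kept
    print(decode_steps(1000, 4, 0.5))  # half of the extra tokens kept
```

Under these assumptions the pass count falls roughly in proportion to the expected tokens kept per pass. Real systems verify predicted tokens in order, so actual savings depend on the model and workload.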

Key features include:

  • Agentic reasoning optimization: NVIDIA emphasizes that Nemotron 3 Super is tailored for autonomous reasoning tasks in dense technical and multimedia environments.
  • Enterprise deployment via OCI: NVIDIA has integrated Nemotron 3 Super into Oracle Cloud Infrastructure (OCI), so organizations can import, customize, and scale the model for applications ranging from content creation to complex automation workflows.

This model's deployment signals a strategic shift toward democratizing access to powerful multimodal AI, enabling enterprises to leverage customized, scalable systems that can operate autonomously and securely within their infrastructure.
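
For readers unfamiliar with the Mixture of Experts design mentioned above, the following is a minimal sketch of generic top-k expert routing, the idea that lets a model hold many parameters while running only a few experts per input. It is not Nemotron's router. The names `route_top_k` and `moe_forward`, the toy experts, and the assumption that each expert returns a vector the same length as its input are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    if not 1 <= k <= len(gate_scores):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(gate_scores)),
                   key=lambda i: gate_scores[i], reverse=True)
    chosen = order[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_scores: Sequence[float], k: int = 2) -> List[float]:
    """Combine the outputs of the selected experts; the rest never run."""
    out = [0.0] * len(x)
    for index, weight in route_top_k(gate_scores, k):
        expert_out = experts[index](x)
        out = [acc + weight * value for acc, value in zip(out, expert_out)]
    return out


if __name__ == "__main__":
    # Toy experts that simply scale the input by a constant factor.
    toy_experts = [lambda v, f=f: [f * a for a in v] for f in (1.0, 2.0, 3.0, 4.0)]
    print(moe_forward([2.0], toy_experts, [0.0, 5.0, 5.0, -1.0], k=2))
```

Only k experts execute per input, so compute per token scales with k rather than with the total expert count. In a trained model the gate scores come from a learned router rather than being passed in by hand.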


The Rise of Persistent, On-Device AI Agents

A prominent trend in 2024 is the emergence of persistent, on-device AI agents—software entities capable of continuous operation while maintaining strong privacy guarantees. Companies like Perplexity have pioneered this approach with offerings such as their "Personal Computer", an always-on AI assistant that runs locally on edge hardware like Mac Minis and can be controlled remotely via smartphones.

Advantages of these agents include:

  • Privacy preservation: processing occurs locally, minimizing data exposure.
  • Low latency: requests avoid a round trip to cloud servers.
  • Secure, autonomous workflows: suited to healthcare, defense, and enterprise automation.

Policy incentives are further accelerating this trend. States like Louisiana are offering tax breaks and infrastructure support for building local AI facilities, promoting data sovereignty and compliance. Additionally, initiatives like the Model Context Protocol (MCP) are fostering region-specific AI systems that adhere to local standards, ensuring trustworthiness and legal compliance.


Funding Boom in Multimodal, Video, and AI Startups

Investment activity in this space remains robust. Notably:

  • PixVerse, a startup specializing in video AI and multimodal content generation, raised $300 million in a Series C round led by Alibaba, highlighting industry confidence in AI-driven entertainment, advertising, and virtual production.
  • Replit, a platform for AI-powered coding automation, achieved a $9 billion valuation after its $400 million Series D, expanding its Replit Agent platform to enable multi-step automation workflows across enterprises.
  • Wonderful AI Inc. secured $150 million to develop persistent, privacy-preserving on-device agents, unlocking personal and enterprise applications.
  • Kai, a cybersecurity startup, raised $125 million to develop agent-driven threat detection and response platforms.

Adding to this momentum, Alibaba-backed Moonshot AI secured $1 billion in funding at a targeted valuation of approximately $18 billion. This influx of capital underscores the strategic importance of multimodal and video AI as core components of future digital ecosystems.


Broader Implications: Creativity, Trust, and Control

These technological strides are transforming creative workflows by enabling more sophisticated multimedia synthesis and autonomous content generation. The deployment of regionally sovereign AI infrastructure—anchored by standards like the Model Context Protocol (MCP)—promotes security, auditability, and regional compliance.

As AI-generated content approaches hyper-realism, trust and safety mechanisms become critical. Companies such as Microsoft are deploying digital watermarking and metadata tracking to authenticate AI-created media, combating misinformation and deepfake proliferation. Regulatory provisions like Article 12 of the EU AI Act, which requires high-risk systems to keep automatic event logs, emphasize traceability and transparency, pushing AI developers toward ethical and accountable deployment.
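
The metadata-tracking idea above can be illustrated with a minimal sketch: hash the media bytes, record the claims, and sign the record so that later edits are detectable. This is not Microsoft's method or any published provenance standard. The functions `build_manifest` and `verify_manifest` and the field names are hypothetical, and a shared-secret HMAC stands in for the public-key signatures a production system would more likely use. It covers metadata only, not pixel-level watermarking.

```python
import hashlib
import hmac
import json


def _sign(claims: dict, secret_key: bytes) -> str:
    """Sign a canonical JSON encoding of the claims with HMAC-SHA256."""
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def build_manifest(media: bytes, generator: str, secret_key: bytes) -> dict:
    """Create a signed provenance record for a piece of generated media."""
    claims = {
        "sha256": hashlib.sha256(media).hexdigest(),
        "generator": generator,  # hypothetical label for the producing model
        "ai_generated": True,
    }
    return {"claims": claims, "signature": _sign(claims, secret_key)}


def verify_manifest(media: bytes, manifest: dict, secret_key: bytes) -> bool:
    """Return True only if the claims are untampered and match the media."""
    expected = _sign(manifest["claims"], secret_key)
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False  # the claims were altered after signing
    actual_hash = hashlib.sha256(media).hexdigest()
    return hmac.compare_digest(actual_hash, manifest["claims"]["sha256"])


if __name__ == "__main__":
    key = b"demo-key"  # placeholder secret for illustration only
    frame = b"example video frame bytes"
    record = build_manifest(frame, "example-model", key)
    print(verify_manifest(frame, record, key))
    print(verify_manifest(frame + b"edited", record, key))
```

A record like this detects changes to either the media or its claims, but it travels alongside the file and can be stripped. That is one reason embedded watermarks are used in addition to metadata.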

Privacy and regional control are now central to AI adoption, with governments incentivizing local infrastructure and on-device processing. These efforts aim to balance innovation with societal trust, ensuring AI systems are autonomous, privacy-preserving, and regionally compliant.


Outlook: Toward an Integrated, Trustworthy AI Ecosystem

The convergence of next-generation multimodal models, edge and on-device deployment, and large-scale investment is shaping an interconnected AI ecosystem positioned for rapid growth. These advancements will enable:

  • More autonomous systems capable of complex reasoning.
  • Enhanced multimedia synthesis for creative industries.
  • Privacy-preserving on-device workflows vital for sensitive sectors.
  • Regionally controlled AI that respects local laws and standards.

By late 2024, AI systems are expected to become more autonomous, trustworthy, and seamlessly integrated into enterprise operations, public infrastructure, and personal environments. This evolution promises to transform industries, advance creative expression, and safeguard societal trust in AI-generated content, marking a pivotal year in the ongoing AI revolution.


Recent Highlight: Alibaba-Backed Moonshot AI Funding

Alibaba-backed Moonshot AI secured $1 billion in a funding round led by Alibaba Group (BABA), targeting a valuation of approximately $18 billion. This investment underscores the strategic importance of multimodal AI in Chinese and global tech ecosystems, especially as Alibaba seeks to position itself at the forefront of video, multimodal content, and autonomous AI systems. The capital is expected to accelerate innovation and deployment across enterprise, consumer, and industrial sectors.


In conclusion, 2024 stands as a landmark year—where technological breakthroughs, strategic investments, and regulatory shifts collectively drive the evolution toward autonomous, privacy-preserving, and trustworthy multimodal AI ecosystems. These advancements are not just enhancing AI capabilities but are fundamentally redefining the relationship between humans, media, and intelligent systems.

Updated Mar 16, 2026