Major multimodal model launches, enterprise deployment and large funding in video/multimodal AI

2024: A Breakthrough Year in Multimodal AI and Autonomous Ecosystems

Multimodal artificial intelligence is growing rapidly in 2024, driven by new model launches, the spread of persistent on-device agents, and large funding rounds for video and multimodal AI startups. These developments are reshaping how AI systems interact with media, enterprise workflows, and personal environments, and they point toward AI ecosystems that are more autonomous, privacy-preserving, and regionally controlled.

Major Multimodal Model Launches and Enterprise Integration

A leading example is NVIDIA's Nemotron 3 Super, a large-scale multimodal model built for agentic reasoning. It packs 120 billion parameters into a hybrid Mixture of Experts (MoE) architecture and supports context windows of up to 1 million tokens, enabling multi-step autonomous reasoning across diverse media: text, images, and video. Its Multi-Token Prediction (MTP) capability lets the model predict several tokens at once rather than one at a time, which reduces inference latency and speeds up multimedia synthesis and reasoning.
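To make the latency argument concrete, here is a minimal sketch of why emitting several tokens per forward pass cuts the number of passes. It is an illustration under stated assumptions, not NVIDIA's implementation: `toy_predict` is a hypothetical stand-in for a model, and every predicted token is accepted. Real MTP systems typically verify the extra tokens and may reject some, so the practical speedup is usually smaller than the tokens-per-pass figure.

```python
from typing import Callable, List, Tuple

# A "model" here is any function that, given the context and a count n,
# returns the next n tokens. One call stands in for one forward pass.
PredictFn = Callable[[List[int], int], List[int]]


def toy_predict(context: List[int], n: int) -> List[int]:
    """Hypothetical stand-in model: continues a counting sequence."""
    last = context[-1]
    return [last + i + 1 for i in range(n)]


def generate(predict: PredictFn, prompt: List[int], max_new: int,
             tokens_per_pass: int) -> Tuple[List[int], int]:
    """Generate max_new tokens, requesting tokens_per_pass tokens per pass.

    Returns the full sequence and the number of forward passes used.
    """
    if tokens_per_pass < 1:
        raise ValueError("tokens_per_pass must be at least 1")
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        remaining = max_new - (len(out) - len(prompt))
        # Never request more tokens than are still needed.
        out.extend(predict(out, min(tokens_per_pass, remaining)))
        passes += 1
    return out, passes


# Same 8 new tokens, produced one at a time and four at a time.
single_tokens, single_passes = generate(toy_predict, [0], 8, tokens_per_pass=1)
multi_tokens, multi_passes = generate(toy_predict, [0], 8, tokens_per_pass=4)
print(single_passes, multi_passes)  # prints "8 2"
```

Both runs produce the same sequence; only the number of passes differs, which is where the latency saving comes from in this simplified picture.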

Key features include:

Agentic reasoning optimization: NVIDIA emphasizes that Nemotron 3 Super is tailored for autonomous reasoning tasks in dense technical and multimedia environments.
Enterprise deployment via OCI: Nemotron 3 Super can be run on Oracle Cloud Infrastructure (OCI), broadening access for organizations to import, customize, and scale the model for applications ranging from content creation to complex automation workflows.

This model's deployment signals a strategic shift toward democratizing access to powerful multimodal AI, enabling enterprises to leverage customized, scalable systems that can operate autonomously and securely within their infrastructure.

The Rise of Persistent, On-Device AI Agents

A prominent trend in 2024 is the emergence of persistent, on-device AI agents—software entities capable of continuous operation while maintaining strong privacy guarantees. Companies like Perplexity have pioneered this approach with offerings such as their "Personal Computer", an always-on AI assistant that runs locally on edge hardware like Mac Minis and can be controlled remotely via smartphones.

Advantages of these agents include:

Privacy preservation: processing occurs locally, minimizing data exposure.
Low latency: local processing avoids round trips to cloud servers.
Secure, autonomous workflows: well suited to healthcare, defense, and enterprise automation.

Public incentives are further accelerating this trend. In Louisiana, for example, tech companies could receive large tax breaks as data centers begin construction, which encourages building AI infrastructure locally and supports data sovereignty and compliance. Additionally, open standards like the Model Context Protocol (MCP), which specifies how AI applications connect to external tools and data sources, give locally deployed agents a common interface to the tools they use and make region-specific deployments easier to standardize.
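For readers unfamiliar with MCP, the protocol exchanges JSON-RPC 2.0 messages between an AI application and a tool server. The sketch below builds what a tool invocation request roughly looks like. The tool name `search_notes` and its arguments are hypothetical, and the sketch omits the transport and the initialization handshake that a real session needs.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool_name: str,
                   arguments: Dict[str, Any]) -> str:
    """Serialize an MCP-style `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",          # JSON-RPC version marker
        "id": request_id,          # lets the caller match the response
        "method": "tools/call",    # MCP method for invoking a tool
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, for illustration only.
message = make_tool_call(1, "search_notes", {"query": "quarterly report"})
print(message)
```

Because the request is plain JSON, the same message format works whether the tool server runs on the local machine or elsewhere.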

Funding Boom in Multimodal, Video, and AI Startups

Investment activity in this space remains robust. Notably:

PixVerse, a startup specializing in video AI and multimodal content generation, raised $300 million in a Series C round led by Alibaba, highlighting industry confidence in AI-driven entertainment, advertising, and virtual production.
Replit, a platform for AI-powered coding automation, achieved a $9 billion valuation after its $400 million Series D, expanding its Replit Agent platform to enable multi-step automation workflows across enterprises.
Wonderful AI Inc. secured $150 million to develop persistent, privacy-preserving on-device agents, unlocking personal and enterprise applications.
Kai, a cybersecurity startup, raised $125 million to develop agent-driven threat detection and response platforms.

Adding to this momentum, Alibaba-backed Moonshot AI is targeting a valuation of approximately $18 billion in a $1 billion funding round. Capital on this scale underscores the strategic importance of multimodal and video AI as core components of future digital ecosystems.

Broader Implications: Creativity, Trust, and Control

These technological strides are transforming creative workflows by enabling more sophisticated multimedia synthesis and autonomous content generation. Regionally sovereign AI infrastructure, combined with open integration standards like the Model Context Protocol (MCP), supports security, auditability, and regional compliance.

As AI-generated content approaches hyper-realism, trust and safety mechanisms become critical. Companies such as Microsoft are deploying digital watermarking and metadata tracking to authenticate AI-created media, combating misinformation and deepfake proliferation. Regulatory provisions like Article 12 of the EU AI Act, which requires high-risk AI systems to support automatic logging of events, emphasize traceability and transparency, pushing AI developers toward ethical and accountable deployment.
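As a rough illustration of the metadata-tracking idea (not watermarking, and not Microsoft's implementation), the sketch below attaches a signed provenance record to a media file and later checks it. The function names, the `generator` field, and the shared secret key are assumptions made for the example. Production provenance systems generally rely on public-key signatures and certificates rather than a shared HMAC key.

```python
import hashlib
import hmac
import json
from typing import Dict


def _sign(claim: Dict[str, str], key: bytes) -> str:
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding of the claim."""
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_manifest(media: bytes, generator: str, key: bytes) -> Dict[str, object]:
    """Build a provenance record binding the media hash to its generator."""
    claim = {
        "sha256": hashlib.sha256(media).hexdigest(),
        "generator": generator,
    }
    return {"claim": claim, "signature": _sign(claim, key)}


def verify_manifest(media: bytes, manifest: Dict[str, object], key: bytes) -> bool:
    """Check that the record is authentic and matches these exact bytes."""
    claim = manifest["claim"]
    signature_ok = hmac.compare_digest(_sign(claim, key), manifest["signature"])
    hash_ok = claim["sha256"] == hashlib.sha256(media).hexdigest()
    return signature_ok and hash_ok


# Hypothetical key and media bytes, for illustration only.
key = b"demo-secret-key"
media = b"synthetic image bytes"
manifest = make_manifest(media, "example-image-model", key)

print(verify_manifest(media, manifest, key))              # True
print(verify_manifest(media + b"edited", manifest, key))  # False
```

The second check fails because any change to the media bytes changes the hash, which is the property that lets a verifier detect altered or mislabeled content.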

Privacy and regional control are now central to AI adoption, with governments incentivizing local infrastructure and on-device processing. These efforts aim to balance innovation with societal trust, ensuring AI systems are autonomous, privacy-preserving, and regionally compliant.

Outlook: Toward an Integrated, Trustworthy AI Ecosystem

The convergence of next-generation multimodal models, edge and on-device deployment, and massive investment is shaping an interconnected AI ecosystem poised for exponential growth. These advancements will enable:

More autonomous systems capable of complex reasoning.
Enhanced multimedia synthesis for creative industries.
Privacy-preserving on-device workflows vital for sensitive sectors.
Regionally controlled AI that respects local laws and standards.

By late 2024, AI systems are expected to become more autonomous, trustworthy, and seamlessly integrated into enterprise operations, public infrastructure, and personal environments. This evolution promises to transform industries, advance creative expression, and safeguard societal trust in AI-generated content, marking a pivotal year in the ongoing AI revolution.

Recent Highlight: Alibaba-Backed Moonshot AI Funding

In a notable development, Moonshot AI, which is backed by Alibaba Group (ticker: BABA), is targeting a valuation of approximately $18 billion in a $1 billion funding round. The investment underscores the strategic importance of multimodal AI in China's and global tech ecosystems, especially as Alibaba seeks to position itself at the forefront of video, multimodal content, and autonomous AI systems. The capital is expected to accelerate innovation and deployment across enterprise, consumer, and industrial sectors.

In conclusion, 2024 stands as a landmark year—where technological breakthroughs, strategic investments, and regulatory shifts collectively drive the evolution toward autonomous, privacy-preserving, and trustworthy multimodal AI ecosystems. These advancements are not just enhancing AI capabilities but are fundamentally redefining the relationship between humans, media, and intelligent systems.

Updated Mar 16, 2026