Google to merge Gemini and Veo AI models, DeepMind CEO confirms

Hassabis Reveals Long-Term Vision for Unified AI Model

Google is planning to merge its flagship Gemini AI models with its Veo video-generation models, a step that could bring the company closer to its goal of developing a universal multimodal digital assistant. The strategy was confirmed by DeepMind CEO Demis Hassabis during a recent appearance on the Possible podcast, co-hosted by LinkedIn co-founder Reid Hoffman.

“We’ve always built Gemini, our foundation model, to be multimodal from the beginning,” Hassabis explained. “We have a vision for this idea of a universal digital assistant — one that actually helps you in the real world.”

Toward "Omni" AI Models

Hassabis’ comments reflect a broader trend in the AI industry toward “omni” models — systems capable of processing and generating text, images, audio, and video in a unified framework. Google’s latest Gemini updates already support image, audio, and text generation, while OpenAI’s ChatGPT has integrated image creation and Amazon has announced its own “any-to-any” model slated for release later in 2025.

Combining Gemini with Veo could improve the resulting model's understanding of physical and visual information, which in turn would make AI assistants more intuitive in real-world applications.

YouTube Data Likely Fueling Video Model Development

Hassabis suggested that video data from YouTube — owned by Google — is a key component in training Veo.
“Basically, by watching YouTube videos — a lot of YouTube videos — [Veo 2] can figure out, you know, the physics of the world,” he said.

Although Google has been cautious in confirming the extent of YouTube content used for AI training, the company previously told TechCrunch that its models "may be" trained on "some" YouTube content, subject to the platform’s terms and its agreements with creators.

Notably, Google updated its terms of service in 2023 to expand its data usage rights, reportedly in part to enable broader AI model training across its platforms.

Implications for the Future of AI Assistants

The integration of Gemini and Veo is expected to yield a more context-aware AI capable of navigating not only digital but also physical interactions. While timelines remain unspecified, Hassabis' remarks suggest that such a merger is on Google's development roadmap.

“We’re trying to build a model that understands and helps you across modalities — not just with text prompts, but by interacting more deeply with the world,” Hassabis said.


As major tech companies race to develop the next generation of multimodal AI, Google’s approach — rooted in vast content libraries like YouTube and grounded in deep scientific modeling — could give it a decisive edge in building powerful, real-world digital assistants.

Stay tuned to The Horizons Times for the latest in AI innovation, emerging technologies, and breakthroughs in machine learning.
