Video models are zero-shot learners and reasoners
Google DeepMind
* Joint leads.
TL;DR
Veo 3 shows emergent zero-shot abilities across many visual tasks, indicating that video models are on a path to becoming vision foundation models—just like LLMs became foundation models for language.
Abstract
The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation eme...
Read more at video-zero-shot.github.io