{"href":"https://api.simplecast.com/oembed?url=https%3A%2F%2Fgoogle-ai-release-notes.simplecast.com%2Fepisodes%2Fgeminis-multimodality-VctGtJJi","width":444,"version":"1.0","type":"rich","title":"Gemini's Multimodality","thumbnail_width":300,"thumbnail_url":"https://image.simplecastcdn.com/images/95190ff2-3221-4589-bc23-6d49f11be41f/b9555480-c416-4b1b-a5fa-3cd3fd92e9c5/rnp-logo.jpg","thumbnail_height":300,"provider_url":"https://simplecast.com","provider_name":"Simplecast","html":"<iframe src=\"https://player.simplecast.com/7d5d0c57-f399-42f4-9dc6-42212886fa05\" height=\"200\" width=\"100%\" title=\"Gemini&apos;s Multimodality\" frameborder=\"0\" scrolling=\"no\"></iframe>","height":200,"description":"Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where \"everything is vision.\" Learn about the differences between video and image understanding and token representations, higher FPS video sampling, and more.\n\nChapters:\n0:00 - Intro\n1:12 - Why Gemini is natively multimodal\n2:23 - The technology behind multimodal models\n5:15 - Video understanding with Gemini 2.5\n9:25 - Deciding what to build next\n13:23 - Building new product experiences with multimodal AI\n17:15 - The vision for proactive assistants\n24:13 - Improving video usability with variable FPS and frame tokenization\n27:35 - What’s next for Gemini’s multimodal development\n31:47 - Deep dive on Gemini’s document understanding capabilities\n37:56 - The teamwork and collaboration behind Gemini\n40:56 - What’s next with model behavior"}