Google Launches Gemini 3.1 Flash Live With 90.8% Task Accuracy
Google released Gemini 3.1 Flash Live on Thursday, its most capable audio AI model to date, with significant improvements in multi-step task execution and conversational quality.
The model scored 90.8% on ComplexFuncBench Audio, a benchmark measuring multi-step function calling with various constraints—a notable jump from previous Gemini versions. On Scale AI's Audio MultiChallenge test, which evaluates instruction following amid real-world audio interruptions, 3.1 Flash Live hit 36.1% with reasoning features enabled, leading the category.
Where It's Available
Google is rolling out 3.1 Flash Live across three tiers: developers can access it via the Gemini Live API in Google AI Studio (currently in preview), enterprises through Gemini Enterprise for Customer Experience, and general users via Search Live and Gemini Live.
The enterprise angle matters here. Verizon, LiveKit, and The Home Depot have already tested the model in production workflows, with Google citing positive feedback on conversation naturalness. For companies building voice-based customer service or internal tools, the improved tonal recognition, which detects frustration or confusion and adjusts responses accordingly, addresses a persistent weakness in earlier voice AI systems.
Technical Improvements
Beyond raw benchmark scores, Google highlights better acoustic nuance detection compared to 2.5 Flash Native Audio. The model reads pitch and pace more accurately, which translates to less robotic-sounding interactions.
For Gemini Live users specifically, Google claims faster response times and doubled conversation memory—the model can now track conversational threads twice as long as before. That's meaningful for extended brainstorming sessions or complex multi-turn queries where context drift typically degrades output quality.
Global Expansion
The multilingual capabilities of 3.1 Flash Live enabled Google to expand Search Live to over 200 countries and territories this week. Users can now conduct real-time, multimodal conversations with Search in their preferred language.
All audio output carries SynthID watermarking—Google's imperceptible marker for detecting AI-generated content. The company positions this as a misinformation safeguard, though its practical enforcement remains an open question as AI audio proliferates.
Developers interested in building voice-first applications can access the model immediately through Google AI Studio; enterprise pricing and availability details are offered through Gemini Enterprise for Customer Experience.
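For orientation, a session with the Gemini Live API generally looks like the sketch below, using Google's `google-genai` Python SDK (`pip install google-genai`). The model ID string and the exact preview naming are assumptions based on the article, not confirmed identifiers; check Google AI Studio for the actual preview ID before use.

```python
"""Minimal sketch of a Gemini Live API audio session, assuming the
google-genai SDK. MODEL_ID is a guess from the article's naming."""
import asyncio
import os

MODEL_ID = "gemini-3.1-flash-live"  # assumed ID, not confirmed by Google
CONFIG = {"response_modalities": ["AUDIO"]}  # ask for spoken replies

async def run_session(prompt: str) -> None:
    # Imported inside the function so the sketch loads without the SDK.
    from google import genai

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    async with client.aio.live.connect(model=MODEL_ID, config=CONFIG) as session:
        # Send one user turn; real apps would stream microphone audio instead.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": prompt}]}
        )
        async for message in session.receive():
            if message.data is not None:
                pass  # message.data carries raw audio bytes to play back

if __name__ == "__main__":
    asyncio.run(run_session("Summarize today's weather in one sentence."))
```

The bidirectional session object is the key design point: instead of one-shot request/response calls, the client holds a long-lived connection and streams turns in both directions, which is what enables the interruption handling and extended conversation memory the article describes.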