Google DeepMind Blog·Frontier Labs·2 Apr 2026
Hype 4/10EXPLORE
New ways to balance cost and reliability in the Gemini API
Google adds Flex (lower cost, higher latency) and Priority (low latency, higher cost) tiers to the Gemini API.
Why it matters
Tiered inference pricing gives enterprise architects a direct lever to optimise AI workload economics — batch analytics and async processing move to Flex, while customer-facing or time-critical workflows justify Priority pricing. For banks running high-volume document processing or compliance screening at scale, the cost differential between tiers can materially shift the ROI calculation on Gemini-based deployments.