Topic
#inference optimization
2 articles

Google AI / DeepMind · Apr 5, 2026
Google's TurboQuant Algorithm Slashes LLM Memory Usage by 6x, Opening the Door to On-Device AI
A new compression technique from Google cuts large language model memory requirements by a factor of more than six, potentially bringing frontier-class AI to phones and laptops.
3 min read

Google AI / DeepMind · Apr 1, 2026
Google's TurboQuant Compresses AI Memory 6x With Zero Accuracy Loss, Rattles Chip Industry
Google Research unveils TurboQuant, an algorithm that shrinks large language model memory usage by 6x with no accuracy loss and no retraining, wiping billions of dollars off memory chip makers' market value in the process.
4 min read