2026-06-11 google
Google open-sourced the first mainstream text diffusion model. The real story isn't 'fast'. It's that the local decode bottleneck moves from memory bandwidth to compute, with bidirectional attention generating 256 tokens at once. The cost: quality, experimental status, and the 26B MoE trade-offs.
2026-06-10 deepseek
DeepSeek V4 matters because it turns 1M context from a capability demo into a cost, routing, and product-default problem for builders.
2026-06-10 deepseek
DeepSeek V4 pressures closed frontier models by pairing open weights with same-day API availability, compatibility, and a clear migration path.
2026-06-10 xiaomi
MiMo-V2.5-Pro-UltraSpeed's 1000 tps claim matters less as a speed stunt than as a change in long-output, parallel-sampling, and real-time interaction economics.
2026-06-10 xiaomi
MiMo UltraSpeed is a strong signal for real-time agents, but limited capacity and controlled access make it a premium path rather than a universal production backend.
2026-06-08 xiaomi
MiMo-V2.5-Pro-UltraSpeed decodes a trillion-parameter model past 1000 tps on a single 8-GPU commodity node. The real signal is that model-system codesign broke the 'extreme speed needs custom silicon' equation, not the operating-room marketing wrapped around it.