Cloudflare Optimizes Large Language Model Hosting
Cloudflare Blog
Cloudflare is tuning its Workers AI platform to host extra-large language models efficiently, including Moonshot's Kimi K2.5, which the company says now runs three times faster. The work covers both hardware and software and targets the heavy demands of AI models behind agentic products. Two techniques stand out: disaggregating the prefill and decode stages, so that prompt processing and token generation are served separately and GPUs are better utilized, and prompt caching for long-context agent use cases. Together these changes aim to reduce latency and increase throughput as demand for powerful AI applications grows.
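To make the prompt caching idea concrete, here is a minimal Python sketch of prefix-based caching. It is an illustration only and not Cloudflare's implementation: the `PrefixCache` class, the `BLOCK` size, and the hashing scheme are all assumptions chosen for the example. The idea is that an agent usually resends a long, unchanged context with a small new suffix, so the server can skip prefill work for every leading block of tokens it has already seen.

```python
import hashlib

BLOCK = 4  # tokens per cache block (hypothetical value for the example)


class PrefixCache:
    """Toy prompt cache keyed by hashes of token-block prefixes (illustrative only)."""

    def __init__(self):
        self._seen = set()

    def _keys(self, tokens):
        # One key per full block; each key depends on the entire prefix up to
        # and including that block, so a hit implies all earlier tokens match.
        h = hashlib.sha256()
        keys = []
        full = len(tokens) - len(tokens) % BLOCK
        for i in range(0, full, BLOCK):
            h.update(" ".join(map(str, tokens[i:i + BLOCK])).encode())
            h.update(b"|")
            keys.append(h.copy().hexdigest())
        return keys

    def lookup_and_store(self, tokens):
        """Return the number of leading tokens already cached, then cache all full blocks."""
        keys = self._keys(tokens)
        hit_blocks = 0
        for key in keys:
            if key not in self._seen:
                break
            hit_blocks += 1
        self._seen.update(keys)
        return hit_blocks * BLOCK


cache = PrefixCache()
first = cache.lookup_and_store(list(range(10)))                     # cold request
second = cache.lookup_and_store(list(range(8)) + [99, 100, 101, 102])  # shared 8-token prefix
third = cache.lookup_and_store([5, 1, 2, 3, 4, 5, 6, 7])            # differs at token 0
```

In a real inference server the cached value would be the attention key/value state for each block rather than a set membership flag; the sketch tracks only how many prompt tokens could skip prefill.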
Tags
ai
product
Original Source
Cloudflare Blog — blog.cloudflare.com