

How to Turn GPU Resources into an Inference API
The Distributed GPU Cloud Story and Why Ray Is at the Center of It 💡 Core Message: "A GPU that never serves a request has no business value." Air Cloud connects everything from the runtime layer to the platform layer, so hardware actually reaches users as a real service. Introduction When people talk about AI infrastructure today, the conversation usually starts with GPU scarcity. How many H100s did you lock in? Is B200 supply going to loosen up? Does your data center have e
May 29


Two Technologies That Reduce AI Model Deployment Costs: Quantization and Prefix Caching
Hi, I'm Jinbeom Kim, a Software Developer on the AIEEV Dev Team. I studied computer science through both undergrad and graduate school, and I've been with AIEEV since the early days of the company, working on how we can operate distributed GPU resources more efficiently within Air Cloud 😊 In this post, I want to walk through two techniques we regularly evaluate when thinking about how to deploy AI models more efficiently. The first is Quantization, a method for reducing me
May 7


One Command, Done: Integrating Air API with a ClawHub Plugin
Hi, I’m CY Lee from the DevOps/SRE team. With the launch of Air API, we’ve been building out our internal infrastructure monitoring system. Along the way, we developed an OpenClaw plugin, and in this post, I’d like to walk you through what we built and why it matters. 🙂 Before We Start If you’ve used OpenClaw for a while, you’ve probably experienced something like this at least once. The moment you try to connect an external model provider, you find yourself going through the
Apr 16
