jpreagan 13 hours ago

I've been using LLMPerf for a while to evaluate the performance of our inference servers (vLLM, SGLang, etc.).

It works great, but I was running into memory constraints while testing a large number of concurrent users on some servers, and I didn't always find the specific Python version requirements convenient.

So I rewrote the benchmarking part of LLMPerf in Rust to get an easy single-line install.

I hope it's useful to others as well, and I'd love to hear any feedback or suggestions for improvement.