
# AI/ML Ops

vLLM, Ray, MLflow, vector DBs — running ML in production without burning the org down.

Self-hosted inference is finally cheaper than the API tier for most production workloads above ~100M tokens/month. vLLM with multi-LoRA can serve 5+ specialist adapters from a single 8GB GPU. The vector DB war is mostly over (pgvector + a good index strategy beats most dedicated solutions for under 10M vectors).
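To make the multi-LoRA point concrete, here is a minimal sketch of serving several adapters on one base model with vLLM's offline Python API. The base model, adapter name, path, and flag values are assumptions for illustration, and the exact parameters can vary between vLLM versions.

```python
# Sketch: multiple LoRA adapters sharing one base model in vLLM.
# Model, adapter names, and paths below are placeholder assumptions.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the shared base model once; LoRA weights are loaded per adapter
# and cached, so each specialist costs adapter weights, not a full model.
llm = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # assumed small base model
    enable_lora=True,
    max_loras=5,        # adapters that can be resident on the GPU at once
    max_lora_rank=16,
)

sampling = SamplingParams(temperature=0.2, max_tokens=256)

# Each request names the adapter it wants; other requests in the same
# batch can target different adapters against the same base weights.
out = llm.generate(
    ["Summarize this incident report: ..."],
    sampling,
    lora_request=LoRARequest("summarizer", 1, "/models/lora/summarizer"),
)
print(out[0].outputs[0].text)
```

The same idea applies to the OpenAI-compatible server: start it with LoRA enabled and route requests to adapters by model name, so one GPU fronts several specialists.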

These guides cover real deployments — quantization, batching strategies, eval pipelines — not 'here's how to call OpenAI' tutorials.

## Guides & Reviews