Predibase lets you run fine-tuned models at the same price, on a per-token basis. 25c/MTok up to 21B models. That's sames as Claude 3 Haiku, but with fine-tuning. #markdown#pricing
RunPod's vLLM endpoint lets you run any HuggingFace LLM with an OpenAI API priced on usage (serverless) not on idle time. "Autoscaling to 0". #llm-ops