litellm
LiteLLM is a Python SDK and proxy server that unifies 100+ LLM APIs behind the OpenAI request/response format. It abstracts away provider differences and adds operational features such as cost tracking, load balancing, and guardrails.
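A minimal sketch of the unified interface described above, using LiteLLM's documented `completion` call. It assumes the `litellm` package is installed and that the relevant provider API key (e.g. `OPENAI_API_KEY`) is set in the environment; the model names are illustrative.

```python
from litellm import completion

# The same OpenAI-style call works across providers: swapping the
# model string (e.g. to "anthropic/claude-3-haiku-20240307") routes
# the request to a different backend without changing any other code.
response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)

# Responses follow the OpenAI schema regardless of provider.
print(response.choices[0].message.content)
```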
vllm
vLLM is an LLM hosting framework designed for fast and efficient LLM inference and serving. Features such as continuous batching and automatic prefix caching significantly improve throughput for both online and offline workloads.
Visit docs.vllm.ai →
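A minimal offline-inference sketch using vLLM's documented `LLM` and `SamplingParams` classes. It assumes the `vllm` package is installed on a machine with a supported GPU; the model name is a small example chosen for illustration.

```python
from vllm import LLM, SamplingParams

# Load a small model; vLLM handles batching and KV-cache management
# (including automatic prefix caching) internally.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, max_tokens=64)

# Prompts submitted together are continuously batched for throughput.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)

for out in outputs:
    print(out.outputs[0].text)
```
For online serving, the same models can instead be exposed through vLLM's OpenAI-compatible HTTP server (`vllm serve <model>`); see docs.vllm.ai for details.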