litellm
LiteLLM is a Python SDK and proxy server that unifies 100+ LLM APIs behind the OpenAI call format. It is a key tool for abstracting away LLM providers, with features like cost tracking, load balancing, and guardrails.
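A minimal sketch of the unified call shape, assuming `pip install litellm` and a provider API key (e.g. `OPENAI_API_KEY`) are in place before the commented calls are actually run; the executable part just builds the OpenAI-format message list that LiteLLM normalizes every provider to:

```python
# The OpenAI chat format LiteLLM uses for every provider:
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize LiteLLM in one sentence."},
]

# With LiteLLM installed, the same completion() call works across
# providers; only the model string changes, with an optional prefix
# selecting the backend:
#
#   from litellm import completion
#   resp = completion(model="gpt-4o-mini", messages=messages)
#   resp = completion(model="anthropic/claude-3-5-sonnet-20240620",
#                     messages=messages)
#   print(resp.choices[0].message.content)
```

Because every provider is mapped onto this one request/response shape, swapping backends is a one-string change rather than a rewrite.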
OpenInference provides OpenTelemetry instrumentation for AI observability. It traces AI applications, covering LLM calls and the components around them, and exports those traces to any OpenTelemetry-compatible backend for inspection of runtime behavior.
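A hedged setup sketch: the instrumentation itself is commented out because it assumes `pip install opentelemetry-sdk openinference-instrumentation-openai`; the executable part only sets the standard OTLP endpoint variable (the localhost collector address is a placeholder assumption):

```python
import os

# Point any OpenTelemetry exporter at a compatible backend
# (assumption: a collector listening on localhost:4317):
os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")

# With the packages installed, instrumenting OpenAI-client calls
# is a few lines:
#
#   from opentelemetry.sdk.trace import TracerProvider
#   from opentelemetry.sdk.trace.export import (SimpleSpanProcessor,
#                                               ConsoleSpanExporter)
#   from openinference.instrumentation.openai import OpenAIInstrumentor
#
#   provider = TracerProvider()
#   provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
#   OpenAIInstrumentor().instrument(tracer_provider=provider)
```

After `instrument()` is called, subsequent LLM calls emit spans automatically; no per-call changes are needed.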
llama.cpp is my go-to C/C++ library for efficiently running LLaMA and other LLMs locally on consumer hardware, especially for Mac users, where it shines on Apple Silicon. It really democratizes local AI inference.
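A sketch of a typical local invocation, assuming a llama.cpp build that provides the `llama-cli` binary and a locally downloaded GGUF model (both paths below are placeholders); the command is only assembled and printed here, not executed:

```python
import shlex

# Placeholder path to a quantized GGUF model file (assumption):
model = "./models/llama-3-8b-instruct.Q4_K_M.gguf"

# llama.cpp's CLI: -m selects the model, -p gives the prompt,
# -n caps the number of tokens to generate.
cmd = ["./llama-cli", "-m", model,
       "-p", "Explain quantization in one sentence.",
       "-n", "128"]
print(shlex.join(cmd))

# To actually run it: subprocess.run(cmd, check=True)
```

On Apple Silicon, builds enable the Metal backend by default, so this same command offloads to the GPU with no extra flags.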
vLLM is a hosting framework designed for fast, efficient LLM inference and serving. Features like continuous batching and automatic prefix caching significantly improve throughput for both online and offline workloads.
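A hedged sketch of why automatic prefix caching helps: the vLLM calls are commented out because running them requires `pip install vllm` and a GPU, and the "ACME" prompt text is a made-up placeholder. The executable part just builds requests that share a long prefix, whose KV cache vLLM can reuse instead of recomputing:

```python
# Requests that share a long system/instruction prefix (placeholder text):
shared_prefix = ("You are a support assistant for ACME Corp. "
                 "Answer briefly and politely.\n\nQuestion: ")
prompts = [shared_prefix + q for q in (
    "How do I reset my password?",
    "Where can I download my invoice?",
)]

# With vLLM installed, the shared prefix is computed once and cached:
#
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct",
#             enable_prefix_caching=True)
#   outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
#   for out in outputs:
#       print(out.outputs[0].text)
```

Continuous batching then keeps the GPU busy by admitting new requests as running ones finish, instead of waiting for a whole batch to drain.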