# LiteLLM Integration
A comprehensive integration of LiteLLM for unified LLM API access. Includes a proxy server, usage examples, and configuration templates.
## Overview
LiteLLM is a library that provides a unified interface for calling various LLM APIs (OpenAI, Anthropic, Google, Ollama, etc.) using the OpenAI format. This project includes:
- Proxy Server: FastAPI-based proxy server for LLM requests
- Usage Examples: Comprehensive examples for different providers
- Configuration: YAML configuration for model management
## Features
- Unified API: Call any LLM provider using OpenAI-compatible format
- Multiple Providers: Support for OpenAI, Anthropic, Google, Ollama, and more
- Proxy Server: HTTP proxy for centralized LLM access
- Streaming Support: Real-time streaming responses (streaming and async usage are sketched after this list)
- Async Support: Asynchronous API calls
- Easy Configuration: YAML-based configuration
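Streaming and async map to `stream=True` and `acompletion` in the Python API. A minimal sketch, assuming `OPENAI_API_KEY` is set and using `gpt-3.5-turbo` as a placeholder model:

```python
import asyncio
from litellm import completion, acompletion

# Streaming: pass stream=True and iterate over the returned chunks
stream = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content  # OpenAI-style streaming delta
    if delta:
        print(delta, end="", flush=True)

# Async: acompletion is the awaitable counterpart of completion
async def main():
    response = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```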
## Installation

### Basic Installation

```bash
pip install -r requirements.txt
```

### With Proxy Extras

```bash
pip install 'litellm[proxy]'
```

## Quick Start
### 1. Basic Usage

```python
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
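The same `completion` call covers other providers by switching the model string; a hedged sketch (model names are examples, and each provider needs its key from the Configuration section):

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello!"}]

# Anthropic (reads ANTHROPIC_API_KEY)
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)

# Google Gemini (the gemini/ prefix reads GEMINI_API_KEY; adjust to your Google setup)
response = completion(model="gemini/gemini-pro", messages=messages)

print(response.choices[0].message.content)
```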
### 2. Run Proxy Server

```bash
python proxy_server.py
```

The server will start on http://localhost:8000.
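Once the proxy is up, any OpenAI-compatible client can point at it. A hedged sketch, assuming proxy_server.py mirrors the standard OpenAI /chat/completions route (as the LiteLLM proxy does) and that the `openai` package is installed:

```python
from openai import OpenAI

# The API key is validated by the proxy (if at all), not by OpenAI
client = OpenAI(base_url="http://localhost:8000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
print(response.choices[0].message.content)
```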
## 🏭 Production Deployment

### Deployment Strategy
For high-throughput environments, deploy the LiteLLM Proxy as a scalable microservice.
### Docker Deployment
- Dockerfile:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
RUN pip install 'litellm[proxy]'
COPY config.yaml .
CMD ["litellm", "--config", "config.yaml", "--port", "8000", "--host", "0.0.0.0"]
```
- Run Container (build the image first, e.g. `docker build -t litellm-proxy .`):

```bash
docker run -d -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  -v $(pwd)/config.yaml:/app/config.yaml \
  litellm-proxy:latest
```
### Scaling & Load Balancing
- Horizontal Scaling: Run multiple instances behind a load balancer (Nginx, AWS ALB). LiteLLM Proxy is stateless.
- Internal Load Balancing: Configure LiteLLM’s Router to balance traffic across multiple API keys or deployments (e.g., multiple Azure deployments):
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-east
      api_base: https://east-us.api.cognitive.microsoft.com/
      api_key: os.environ/AZURE_KEY_1
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-west
      api_base: https://west-us.api.cognitive.microsoft.com/
      api_key: os.environ/AZURE_KEY_2
```

### Cost Monitoring & Control
- Budgeting: Set monthly budgets per user or key in config.yaml (a hedged key-budget sketch follows the database snippet below).
- Database: Connect a PostgreSQL database to track spend and usage logs.
```bash
# Set DATABASE_URL environment variable
export DATABASE_URL="postgresql://user:pass@db:5432/litellm"
```
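With the database connected, per-key budgets can be issued through the proxy's key-management endpoint. A hedged sketch, assuming the standard LiteLLM proxy /key/generate route, a master key configured for the proxy, and the `requests` package (field names may vary between LiteLLM versions):

```python
import requests

# Create a virtual key limited to 25 USD per 30-day window (values are illustrative)
resp = requests.post(
    "http://localhost:8000/key/generate",
    headers={"Authorization": "Bearer sk-master-key"},  # placeholder master key
    json={
        "models": ["gpt-3.5-turbo"],  # models this key may call
        "max_budget": 25.0,           # spend cap in USD
        "budget_duration": "30d",     # reset window (monthly)
    },
    timeout=30,
)
print(resp.json())  # response includes the generated virtual key and its limits
```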
### Observability
- Logging: Configure callbacks for LangFuse, Helicone, or custom logging (see the sketch after this list).
- Metrics: Expose Prometheus metrics at /metrics to monitor latency, error rates, and request volume.
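Enabling a logging callback in the Python SDK is a one-liner. A minimal sketch, assuming a LangFuse account with LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY exported and the `langfuse` package installed (other callbacks such as "helicone" follow the same pattern):

```python
import litellm
from litellm import completion

# Send request/response logs to LangFuse on success and failure
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
```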
### Production Readiness Checklist
- [ ] API keys supplied via environment variables, never hardcoded
- [ ] config.yaml mounted with the models you intend to serve
- [ ] DATABASE_URL set so spend and usage are tracked
- [ ] Budgets configured per user or key
- [ ] Logging callbacks and /metrics monitoring enabled
- [ ] Multiple stateless instances running behind a load balancer
## Configuration

### Environment Variables
Set API keys as environment variables:
```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
```

### Configuration File
Use config.yaml for advanced configuration:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
```

Start proxy with config:

```bash
litellm --config config.yaml
```

## Advanced Features
### Router (Load Balancing)

```python
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "openai/gpt-3.5-turbo"}},
        {"model_name": "claude-3-sonnet", "litellm_params": {"model": "anthropic/claude-3-sonnet-20240229"}},
    ]
)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### Fallbacks
```python
from litellm import completion

response = completion(
    model=["gpt-4", "gpt-3.5-turbo", "claude-3-sonnet"],  # Try in order
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Troubleshooting
### API Key Issues
Ensure API keys are set:
```bash
echo $OPENAI_API_KEY
```

### Ollama Connection
Check if Ollama is running:
```bash
curl http://localhost:11434/api/tags
```
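If Ollama responds, LiteLLM can route to it with the ollama/ prefix. A quick sketch, assuming a locally pulled llama2 model (substitute whatever `ollama list` shows):

```python
from litellm import completion

response = completion(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434",  # default local Ollama endpoint
)
print(response.choices[0].message.content)
```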
## License

See main repository LICENSE file.