Follow-up to: Installing vLLM as a Service on Ubuntu Server 24.04.3 LTS
This guide adds HTTPS access to your vLLM deployment using Caddy as a reverse proxy. It assumes you’ve completed the base vLLM installation.
Security Note: This guide adds transport security (HTTPS/TLS), firewall configuration, and restricts exposed endpoints to /v1/* only. However, this is NOT yet a production-hardened deployment. For production use, you must also implement:
- Rate limiting (prevent API abuse and resource exhaustion)
- Request size limits
- Additional systemd security directives
- Monitoring and logging
- Certificate renewal monitoring
- Backup and recovery procedures
- IP allowlisting or geographic restrictions (if applicable)
Carefully review all security considerations in the Notes section below before deploying to production.
Part I: Firewall Configuration
Allow SSH before enabling firewall:
sudo ufw allow 22/tcp comment 'SSH'
Note: If using a non-standard SSH port, replace 22 with your actual port number.
Allow HTTP and HTTPS:
sudo ufw allow 80/tcp comment 'HTTP'
sudo ufw allow 443/tcp comment 'HTTPS'
Enable UFW:
sudo ufw enable
Verify configuration:
sudo ufw status verbose
Note: vLLM remains bound to 127.0.0.1:8000 and is not directly accessible from the network. Only the reverse proxy will reach it locally.
Part II: Verify vLLM Network Binding
Edit the launch script from Part IV of the base installation guide:
sudo -u vllm nano /opt/vllm/start-vllm.sh
Ensure the script passes --host 127.0.0.1 (not --host 0.0.0.0). If you changed it, restart the service:
sudo systemctl restart vllm.service
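As an additional check, you can confirm the listening socket without reopening the script (ss ships with iproute2 on Ubuntu):

```shell
# Show the process listening on port 8000; the Local Address column
# should read 127.0.0.1:8000, not 0.0.0.0:8000.
sudo ss -tlnp | grep ':8000'
```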
Part III: Install Caddy
Add Caddy Repository
Install dependencies:
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
Add Caddy GPG key:
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
Add Caddy repository:
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
Install Caddy
sudo apt update
sudo apt install -y caddy
Verify Installation
caddy version
sudo systemctl status caddy
Part IV: Configure Caddy
Option A: Self-Signed Cert (Internal/Dev)
For internal networks or development environments without a public domain.
Get server IP address:
SERVER_IP=$(hostname -I | awk '{print $1}')
echo "Server IP: $SERVER_IP"
Create Caddyfile:
sudo tee /etc/caddy/Caddyfile >/dev/null <<EOF
{
    auto_https disable_redirects
}

https://${SERVER_IP} {
    tls internal
    encode zstd gzip

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "no-referrer"
    }

    handle /v1/* {
        reverse_proxy 127.0.0.1:8000 {
            transport http {
                read_timeout 3600s
                write_timeout 3600s
            }
        }
    }
}
EOF
Note: Clients will see certificate warnings and must manually trust the certificate. For larger deployments, consider internal CA infrastructure or Let’s Encrypt DNS-01 validation.
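For clients you control, one way to remove those warnings is to trust Caddy's locally generated root CA. A sketch, assuming the Debian/Ubuntu package's default data directory (/var/lib/caddy) and a Debian/Ubuntu client; verify the paths on your system:

```shell
# On the server: print Caddy's local root CA certificate
# (path assumes the packaged caddy service's default data directory).
sudo cat /var/lib/caddy/.local/share/caddy/pki/authorities/local/root.crt

# On the client: save that output as root.crt, then install it into the
# system trust store.
sudo cp root.crt /usr/local/share/ca-certificates/vllm-caddy-root.crt
sudo update-ca-certificates
```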
Option B: Let’s Encrypt (Internet-Facing)
For public domains with internet-accessible servers. Requires ports 80/443 open and DNS pointing to your server.
Create Caddyfile:
sudo tee /etc/caddy/Caddyfile >/dev/null <<'EOF'
{
    email you@example.com
}

http://your-domain.com {
    redir https://{host}{uri} permanent
}

https://your-domain.com {
    encode zstd gzip

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "no-referrer"
    }

    handle /v1/* {
        reverse_proxy 127.0.0.1:8000 {
            transport http {
                read_timeout 3600s
                write_timeout 3600s
            }
        }
    }
}
EOF
Replace you@example.com and your-domain.com with your actual email and domain.
Note: Let’s Encrypt automatically obtains and renews certificates. DNS must resolve to your server’s public IP and ports 80/443 must be accessible from the internet.
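Before restarting Caddy with this configuration, it is worth confirming that DNS already points at the server; otherwise the ACME challenge will fail. A quick check (ifconfig.me is one of several public IP echo services; any equivalent works):

```shell
# The two addresses should match before requesting a certificate.
# dig is in the dnsutils package on Ubuntu.
dig +short your-domain.com
curl -4 -s https://ifconfig.me; echo
```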
Option C: Custom Certs
Follow these steps if you have a commercial CA or internal PKI.
Create Caddyfile:
sudo tee /etc/caddy/Caddyfile >/dev/null <<'EOF'
http://your-domain.com {
    redir https://{host}{uri} permanent
}

https://your-domain.com {
    tls /path/to/certificate.pem /path/to/private-key.pem
    encode zstd gzip

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "no-referrer"
    }

    handle /v1/* {
        reverse_proxy 127.0.0.1:8000 {
            transport http {
                read_timeout 3600s
                write_timeout 3600s
            }
        }
    }
}
EOF
Replace your-domain.com with your domain and update the certificate paths.
Note: Ensure certificate files are readable by the caddy user. If your certificate includes a chain, combine it with your certificate in the .pem file.
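A sketch of permissions that satisfy that requirement while keeping the private key out of reach of other users (paths are the placeholders from the Caddyfile above):

```shell
# Let the caddy group read both files; keep the key unreadable by others.
sudo chown root:caddy /path/to/certificate.pem /path/to/private-key.pem
sudo chmod 644 /path/to/certificate.pem
sudo chmod 640 /path/to/private-key.pem
```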
Restart Caddy
sudo systemctl restart caddy
sudo systemctl status caddy
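To catch Caddyfile mistakes before they take the proxy down, you can validate the configuration first, and use reload instead of restart to apply later changes without dropping connections:

```shell
# Check the Caddyfile for syntax errors without starting anything.
caddy validate --config /etc/caddy/Caddyfile

# Apply a changed configuration gracefully.
sudo systemctl reload caddy
```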
Configuration Notes
- Timeout settings: 3600s (1 hour) prevents timeouts during long LLM inference runs
- Path handling: handle /v1/* preserves the /v1 prefix (required for vLLM routing). Do not use handle_path, as it strips the prefix
- Security headers: HSTS, MIME-sniffing protection, and referrer policy enabled
- Endpoint restriction: Only /v1/* is exposed; other endpoints like /health and /metrics are blocked
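You can spot-check the endpoint restriction: a request outside /v1/* should never be forwarded to vLLM (use -k only with the self-signed option; the exact status Caddy returns for unmatched paths may vary):

```shell
# Probe a blocked endpoint through the proxy; the response should not
# contain vLLM's normal /health output.
curl -ks -o /dev/null -w '%{http_code}\n' https://YOUR_SERVER_IP/health
```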
Part V: Testing
Retrieve API Key
Retrieve API key:
sudo grep VLLM_API_KEY /etc/vllm/vllm.env | cut -d '=' -f 2
Browser Test
You can test endpoint access from an ordinary browser:
Option A (Self-Signed): https://YOUR_SERVER_IP/v1/models
Option B/C (Domain): https://your-domain.com/v1/models
Expected response:
{"error":"Unauthorized"}
Note: For self-signed certificates (Option A), accept the browser security warning.
API Test with curl
Now test API access from another system:
Option A (Self-Signed):
curl -k https://YOUR_SERVER_IP/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "phi-4-mini",
    "messages": [
      {"role": "user", "content": "Say hello in French"}
    ],
    "max_tokens": 50,
    "temperature": 0
  }'
Option B/C (Domain):
curl https://your-domain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "phi-4-mini",
    "messages": [
      {"role": "user", "content": "Say hello in French"}
    ],
    "max_tokens": 50,
    "temperature": 0
  }'
Replace YOUR_SERVER_IP, your-domain.com, and YOUR_API_KEY with your actual values.
Note: The -k flag bypasses certificate validation for self-signed certificates. For production use, configure clients to trust the certificate.
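As a sketch of that trust configuration, curl can be pointed at a specific root certificate instead of disabling verification (root.crt here is a placeholder for whatever CA signed your server certificate, e.g. Caddy's local root exported in Option A):

```shell
# Validate the server certificate against an explicitly trusted root
# instead of using -k (root.crt is a placeholder path).
curl --cacert root.crt https://YOUR_SERVER_IP/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```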
Successful response:
{"id":"chatcmpl-bfefd2e69a5269b0","object":"chat.completion","created":1768651488,"model":"phi-4-mini","choices":[{"index":0,"message":{"role":"assistant","content":"Bonjour!","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":200020,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":7,"total_tokens":10,"completion_tokens":3,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
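The raw response is hard to read; one way to extract just the assistant's reply is to pipe it through python3 (preinstalled on Ubuntu Server). The request mirrors the Option B/C test; adjust the domain, key, and model name to your setup:

```shell
# Send the same chat request and print only the assistant's message.
curl -s https://your-domain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "phi-4-mini", "messages": [{"role": "user", "content": "Say hello in French"}], "max_tokens": 50}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```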
Notes
- Certificate Management: Let’s Encrypt certificates (Option B) auto-renew via Caddy. Self-signed certificates (Option A) do not expire but require manual trust on each client. Custom certificates (Option C) must be renewed manually before expiration.
- Endpoint Restriction: Only /v1/* endpoints are exposed. Direct access to /health, /metrics, and other vLLM endpoints is blocked by the proxy configuration. This prevents information disclosure but may complicate monitoring.
- Rate Limiting: This configuration does not implement rate limiting. For production deployments, especially internet-facing servers, rate limiting is critical to prevent abuse and resource exhaustion. This will be covered in the production hardening guide.
- Client Configuration: For self-signed certificates (Option A), clients must either use the -k/--insecure flag (not recommended for production) or add the certificate to their trusted certificate store. Python clients can use verify=False (development only) or provide the certificate path.
- Firewall Changes: UFW is now enabled and will persist across reboots. To allow additional services, use sudo ufw allow PORT/PROTOCOL comment 'DESCRIPTION' before the service starts.
- Caddy Logs: View Caddy logs with sudo journalctl -u caddy -f. Logs include access requests, errors, and certificate operations.
- Performance: Compression (encode zstd gzip) reduces bandwidth but adds CPU overhead. For high-throughput deployments on constrained hardware, consider removing compression.
- Multiple Models: To serve multiple models on the same server, configure additional vLLM services on different ports (e.g., 8001, 8002) and add corresponding handle blocks in the Caddyfile.
- Proxy Bypass: Clients on the same server can still access vLLM directly at http://127.0.0.1:8000 without going through the proxy. This is useful for local testing, but ensure applications use the proxied endpoint for consistency.
- Troubleshooting: Common issues include DNS not resolving (Let's Encrypt), certificate path errors (Option C), port conflicts (check sudo ss -tlnp | grep :443), and firewall blocking (verify sudo ufw status).
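The multiple-model note can be sketched in the Caddyfile. Note that handle_path, which Part IV warns against for the single-model case, is exactly what is wanted here: the client calls /phi/v1/..., the /phi prefix is stripped, and the backend still receives the /v1/... path vLLM expects (prefixes and ports are hypothetical):

```
https://your-domain.com {
    handle_path /phi/* {
        reverse_proxy 127.0.0.1:8000
    }
    handle_path /llama/* {
        reverse_proxy 127.0.0.1:8001
    }
}
```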