Follow-up to: Installing vLLM as a Service on Ubuntu Server 24.04.3 LTS
This guide adds HTTPS access to your vLLM deployment using Caddy as a reverse proxy. It assumes you’ve completed the base vLLM installation.
Security Note: This guide adds transport security (HTTPS/TLS), firewall configuration, and restricts exposed endpoints to /v1/* only. However, this is NOT yet a production-hardened deployment. For production use, you must also implement:
- Rate limiting (prevent API abuse and resource exhaustion)
- Request size limits
- Additional systemd security directives
- Monitoring and logging
- Certificate renewal monitoring
- Backup and recovery procedures
- IP allowlisting or geographic restrictions (if applicable)
Carefully review all security considerations in the Notes section below before deploying to production.
Part I: Firewall Configuration
Allow SSH before enabling firewall:
sudo ufw allow 22/tcp comment 'SSH'
Note: If using a non-standard SSH port, replace 22 with your actual port number.
Allow HTTP and HTTPS:
sudo ufw allow 80/tcp comment 'HTTP'
sudo ufw allow 443/tcp comment 'HTTPS'
Enable UFW:
sudo ufw enable
Verify configuration:
sudo ufw status verbose
Note: vLLM remains bound to 127.0.0.1:8000 and is not directly accessible from the network. Only the reverse proxy will reach it locally.
Part II: Verify vLLM Network Binding
Edit the launch script from Part IV of the base installation guide:
sudo -u vllm nano /opt/vllm/start-vllm.sh
Ensure the script passes --host 127.0.0.1 (not --host 0.0.0.0). If you changed it, restart the service:
sudo systemctl restart vllm.service
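As an additional check, you can confirm the listening socket without reopening the script (ss ships with iproute2 on Ubuntu):

```shell
# Show the process listening on port 8000; the Local Address column
# should read 127.0.0.1:8000, not 0.0.0.0:8000.
sudo ss -tlnp | grep ':8000'
```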
Part III: Install Caddy
Add Caddy Repository
Install dependencies:
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
Add Caddy GPG key:
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
Add Caddy repository:
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
Install Caddy
sudo apt update
sudo apt install -y caddy
Verify Installation
caddy version
sudo systemctl status caddy
Part IV: Configure Caddy
Option A: Self-Signed Cert (Internal/Dev)
For internal networks or development environments without a public domain.
Get server IP address:
SERVER_IP=$(hostname -I | awk '{print $1}')
echo "Server IP: $SERVER_IP"
Create Caddyfile:
sudo tee /etc/caddy/Caddyfile >/dev/null <<EOF
{
    auto_https disable_redirects
}

https://${SERVER_IP} {
    tls internal
    encode zstd gzip

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "no-referrer"
    }

    handle /v1/* {
        reverse_proxy 127.0.0.1:8000 {
            transport http {
                read_timeout 3600s
                write_timeout 3600s
            }
        }
    }
}
EOF
Note: Clients will see certificate warnings and must manually trust the certificate. For larger deployments, consider internal CA infrastructure or Let’s Encrypt DNS-01 validation.
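For clients you control, one way to remove those warnings is to trust Caddy's locally generated root CA. A sketch, assuming the Debian/Ubuntu package's default data directory (/var/lib/caddy) and a Debian/Ubuntu client; verify the paths on your system:

```shell
# On the server: print Caddy's local root CA certificate
# (path assumes the packaged caddy service's default data directory).
sudo cat /var/lib/caddy/.local/share/caddy/pki/authorities/local/root.crt

# On the client: save that output as root.crt, then install it into the
# system trust store.
sudo cp root.crt /usr/local/share/ca-certificates/vllm-caddy-root.crt
sudo update-ca-certificates
```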
Option B: Let’s Encrypt (Internet-Facing)
For public domains with internet-accessible servers. Requires ports 80/443 open and DNS pointing to your server.
Create Caddyfile:
sudo tee /etc/caddy/Caddyfile >/dev/null <<'EOF'
{
    email you@example.com
}

http://your-domain.com {
    redir https://{host}{uri} permanent
}

https://your-domain.com {
    encode zstd gzip

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "no-referrer"
    }

    handle /v1/* {
        reverse_proxy 127.0.0.1:8000 {
            transport http {
                read_timeout 3600s
                write_timeout 3600s
            }
        }
    }
}
EOF
Replace you@example.com and your-domain.com with your actual email and domain.
Note: Let’s Encrypt automatically obtains and renews certificates. DNS must resolve to your server’s public IP and ports 80/443 must be accessible from the internet.
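Before restarting Caddy with this configuration, it is worth confirming that DNS already points at the server; otherwise the ACME challenge will fail. A quick check (ifconfig.me is one of several public IP echo services; any equivalent works):

```shell
# The two addresses should match before requesting a certificate.
# dig is in the dnsutils package on Ubuntu.
dig +short your-domain.com
curl -4 -s https://ifconfig.me; echo
```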
Option C: Custom Certs
Follow these steps if you have a commercial CA or internal PKI.
Create Caddyfile:
sudo tee /etc/caddy/Caddyfile >/dev/null <<'EOF'
http://your-domain.com {
    redir https://{host}{uri} permanent
}

https://your-domain.com {
    tls /path/to/certificate.pem /path/to/private-key.pem
    encode zstd gzip

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "no-referrer"
    }

    handle /v1/* {
        reverse_proxy 127.0.0.1:8000 {
            transport http {
                read_timeout 3600s
                write_timeout 3600s
            }
        }
    }
}
EOF
Replace your-domain.com with your domain and update the certificate paths.
Note: Ensure certificate files are readable by the caddy user. If your certificate includes a chain, combine it with your certificate in the .pem file.
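A sketch of permissions that satisfy that requirement while keeping the private key out of reach of other users (paths are the placeholders from the Caddyfile above):

```shell
# Let the caddy group read both files; keep the key unreadable by others.
sudo chown root:caddy /path/to/certificate.pem /path/to/private-key.pem
sudo chmod 644 /path/to/certificate.pem
sudo chmod 640 /path/to/private-key.pem
```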
Restart Caddy
sudo systemctl restart caddy
sudo systemctl status caddy
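To catch Caddyfile mistakes before they take the proxy down, you can validate the configuration first, and use reload instead of restart to apply later changes without dropping connections:

```shell
# Check the Caddyfile for syntax errors without starting anything.
caddy validate --config /etc/caddy/Caddyfile

# Apply a changed configuration gracefully.
sudo systemctl reload caddy
```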
Configuration Notes
- Timeout settings: 3600s (1 hour) prevents timeouts during long LLM inference runs
- Path handling: handle /v1/* preserves the /v1 prefix (required for vLLM routing). Do not use handle_path, as it strips the prefix
- Security headers: HSTS, MIME-sniffing protection, and referrer policy enabled
- Endpoint restriction: Only /v1/* is exposed; other endpoints like /health and /metrics are blocked
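You can spot-check the endpoint restriction: a request outside /v1/* should never be forwarded to vLLM (use -k only with the self-signed option; the exact status Caddy returns for unmatched paths may vary):

```shell
# Probe a blocked endpoint through the proxy; the response should not
# contain vLLM's normal /health output.
curl -ks -o /dev/null -w '%{http_code}\n' https://YOUR_SERVER_IP/health
```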
Part V: Testing
Retrieve API Key
Retrieve API key:
sudo grep VLLM_API_KEY /etc/vllm/vllm.env | cut -d '=' -f 2
Browser Test
You can test endpoint access from an ordinary browser:
Option A (Self-Signed): https://YOUR_SERVER_IP/v1/models
Option B/C (Domain): https://your-domain.com/v1/models
Expected response:
{"error":"Unauthorized"}
Note: For self-signed certificates (Option A), accept the browser security warning.
API Test with curl
Now test API access from another system:
Option A (Self-Signed):
curl -k https://YOUR_SERVER_IP/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "phi-4-mini",
    "messages": [
      {"role": "user", "content": "Say hello in French"}
    ],
    "max_tokens": 50,
    "temperature": 0
  }'
Option B/C (Domain):
curl https://your-domain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "phi-4-mini",
    "messages": [
      {"role": "user", "content": "Say hello in French"}
    ],
    "max_tokens": 50,
    "temperature": 0
  }'
Replace YOUR_SERVER_IP, your-domain.com, and YOUR_API_KEY with your actual values.
Note: The -k flag bypasses certificate validation for self-signed certificates. For production use, configure clients to trust the certificate.
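As a sketch of that trust configuration, curl can be pointed at a specific root certificate instead of disabling verification (root.crt here is a placeholder for whatever CA signed your server certificate, e.g. Caddy's local root exported in Option A):

```shell
# Validate the server certificate against an explicitly trusted root
# instead of using -k (root.crt is a placeholder path).
curl --cacert root.crt https://YOUR_SERVER_IP/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```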
Successful response:
{"id":"chatcmpl-bfefd2e69a5269b0","object":"chat.completion","created":1768651488,"model":"phi-4-mini","choices":[{"index":0,"message":{"role":"assistant","content":"Bonjour!","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":200020,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":7,"total_tokens":10,"completion_tokens":3,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
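The raw response is hard to read; one way to extract just the assistant's reply is to pipe it through python3 (preinstalled on Ubuntu Server). The request mirrors the Option B/C test; adjust the domain, key, and model name to your setup:

```shell
# Send the same chat request and print only the assistant's message.
curl -s https://your-domain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "phi-4-mini", "messages": [{"role": "user", "content": "Say hello in French"}], "max_tokens": 50}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```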
Notes
- Certificate Management: Let’s Encrypt certificates (Option B) auto-renew via Caddy. Self-signed certificates (Option A) do not expire but require manual trust on each client. Custom certificates (Option C) must be renewed manually before expiration.
- Endpoint Restriction: Only /v1/* endpoints are exposed. Direct access to /health, /metrics, and other vLLM endpoints is blocked by the proxy configuration. This prevents information disclosure but may complicate monitoring.
- Rate Limiting: This configuration does not implement rate limiting. For production deployments, especially internet-facing servers, rate limiting is critical to prevent abuse and resource exhaustion. This will be covered in the production hardening guide.
- Client Configuration: For self-signed certificates (Option A), clients must either use the -k/--insecure flag (not recommended for production) or add the certificate to their trusted certificate store. Python clients can use verify=False (development only) or provide the certificate path.
- Firewall Changes: UFW is now enabled and will persist across reboots. To allow additional services, use sudo ufw allow PORT/PROTOCOL comment 'DESCRIPTION' before the service starts.
- Caddy Logs: View Caddy logs with sudo journalctl -u caddy -f. Logs include access requests, errors, and certificate operations.
- Performance: Compression (encode zstd gzip) reduces bandwidth but adds CPU overhead. For high-throughput deployments on constrained hardware, consider removing compression.
- Multiple Models: To serve multiple models on the same server, configure additional vLLM services on different ports (e.g., 8001, 8002) and add corresponding handle blocks in the Caddyfile.
- Proxy Bypass: Clients on the same server can still access vLLM directly at http://127.0.0.1:8000 without going through the proxy. This is useful for local testing, but ensure applications use the proxied endpoint for consistency.
- Troubleshooting: Common issues include DNS not resolving (Let's Encrypt), certificate path errors (Option C), port conflicts (check sudo ss -tlnp | grep :443), and firewall blocking (verify sudo ufw status).
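The multiple-model note can be sketched in the Caddyfile. Note that handle_path, which Part IV warns against for the single-model case, is exactly what is wanted here: the client calls /phi/v1/..., the /phi prefix is stripped, and the backend still receives the /v1/... path vLLM expects (prefixes and ports are hypothetical):

```
https://your-domain.com {
    handle_path /phi/* {
        reverse_proxy 127.0.0.1:8000
    }
    handle_path /llama/* {
        reverse_proxy 127.0.0.1:8001
    }
}
```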