Modern Reverse Proxies Part 2: Traefik - The Cloud-Native Orchestrator
February 16, 2026
In Part 1 of our series on modern reverse proxies, we established the business case for dynamic service discovery and automated configuration. We saw how traditional proxies like Nginx and Apache create operational bottlenecks in cloud-native environments. Now, in this second installment, we'll take a deep dive into Traefik — exploring its architecture, capabilities, and practical deployment considerations.
Before we get into the specifics of Traefik, let's establish what we're examining: Traefik is a cloud-native reverse proxy and load balancer designed specifically for containerized environments. It automatically discovers services and configures routing without manual intervention. Over the course of this article, we'll examine how Traefik works, when it makes sense to use it, and what practical considerations you should keep in mind.
Traefik: The Cloud-Native Orchestrator
Traefik excels in complex, container-based environments, especially with Kubernetes or Docker Swarm. Its deep integration streamlines routing management in large-scale microservice architectures.
One may wonder: what makes Traefik different from traditional reverse proxies? The answer lies in its architecture. Where Nginx or Apache require you to manually update configuration files whenever services change, Traefik actively monitors your infrastructure and adapts automatically. This goes beyond a convenience feature — it fundamentally changes how you manage routing in dynamic environments.[^1]
Deep Integration with Container Platforms
Traefik's most distinctive capability is its native integration with container orchestration platforms. This integration means you define routing rules alongside your application definitions, enabling infrastructure-as-code practices for your entire stack.
Kubernetes Integration:
Traefik operates as a Kubernetes Ingress Controller, meaning it integrates directly with Kubernetes' own ingress resources:
```yaml
# Example: Kubernetes Ingress resource for Traefik
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    # Traefik-specific annotations for advanced routing
    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service
                port:
                  number: 8080
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls-secret
```
Beyond basic ingress, Traefik extends Kubernetes with Custom Resource Definitions (CRDs) for advanced routing scenarios that aren't possible with standard ingress:
```yaml
# Example: Traefik's Middleware CRD for request transformation
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-prefix
spec:
  stripPrefix:
    prefixes:
      - /api
```
These CRDs give you fine-grained control over rate limiting, authentication, header manipulation, and more — all defined as Kubernetes resources.[^2]
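For instance, a middleware like the strip-prefix resource above is attached to a route through Traefik's IngressRoute CRD. The hostname and service name below are illustrative:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: myapp-route
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`myapp.example.com`) && PathPrefix(`/api`)
      kind: Rule
      middlewares:
        - name: strip-prefix   # references the Middleware CRD above
      services:
        - name: myapp-service
          port: 8080
```

Because the route and the middleware are both Kubernetes resources, they can be reviewed and deployed through the same pipeline as the application itself.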
Docker Integration:
For Docker environments without orchestration, Traefik uses Docker labels to discover services:
```yaml
# Example: docker-compose.yml with Traefik labels
version: '3.8'
services:
  myapp:
    image: myapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myapp.rule=Host(`myapp.localhost`)"
      - "traefik.http.services.myapp.loadbalancer.server.port=8080"
      - "traefik.http.middlewares.test-ratelimit.ratelimit.average=100"
      - "traefik.http.routers.myapp.middlewares=test-ratelimit@docker"
    ports:
      - "8080:8080"
```
When you start this container, Traefik automatically detects it through the Docker socket and configures routing based on those labels — no manual configuration needed.
Cloud Provider Integration:
Traefik integrates with major cloud providers' container services:
- AWS EKS: Native integration as a Kubernetes ingress controller
- Google Cloud Run: Automatic service discovery
- Azure Container Instances: Support via Kubernetes
- Multi-cloud: Consistent configuration across providers
This integration means routing configuration lives alongside application definitions. Consider what this enables:[^3]
- Version control for entire infrastructure: Your routing rules are in Git alongside your application code
- GitOps workflows: Changes reviewed, tested, and deployed systematically
- Reduced context switching: No separate configuration management system
- Single source of truth: Infrastructure defined in one place
Of course, these benefits accrue mostly to organizations heavily invested in containerization. If you're running traditional monoliths on virtual machines, Traefik's value proposition is less compelling.[^4]
Advanced Routing Capabilities
Traefik provides sophisticated routing options that go well beyond simple path-based routing. Let's examine what's available with concrete examples.
Path-Based Routing:
The most common pattern: route different URL paths to different services:
```yaml
# Traefik dynamic configuration (file provider)
http:
  routers:
    api:
      rule: "PathPrefix(`/api`)"
      service: api-service
    admin:
      rule: "PathPrefix(`/admin`)"
      service: admin-service
    frontend:
      rule: "PathPrefix(`/`)"
      service: frontend-service
```
In this configuration, requests to /api/* go to the API service, /admin/* to the admin interface, and everything else to the frontend.
Host-Based Routing:
Route based on domain names — essential for multi-tenant applications:
```yaml
http:
  routers:
    api:
      rule: "Host(`api.example.com`)"
      service: api-service
    www:
      rule: "Host(`www.example.com`)"
      service: frontend-service
    admin:
      rule: "Host(`admin.example.com`)"
      service: admin-service
```
Header-Based Routing:
Route based on custom headers — useful for A/B testing, canary deployments, or feature flags:
```yaml
http:
  routers:
    canary:
      # Traefik v3 uses Header(); Traefik v2 used Headers()
      rule: "Host(`app.example.com`) && Header(`X-Canary`, `true`)"
      service: canary-service
    stable:
      rule: "Host(`app.example.com`)"
      service: stable-service
```
In this configuration, only requests carrying the header X-Canary: true go to the canary service; all others go to stable.[^5]
Weighted Round-Robin:
Gradually shift traffic between versions — critical for safe rollouts:
```yaml
http:
  services:
    myapp:
      weighted:
        services:
          - name: myapp-v1
            weight: 90   # 90% of traffic
          - name: myapp-v2
            weight: 10   # 10% of traffic
```
Start with 90/10 split, monitor, then gradually increase traffic to v2. If issues arise, you can immediately revert by adjusting weights.
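In Kubernetes, the same split can be declared with Traefik's TraefikService CRD; the service names here are illustrative:

```yaml
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: myapp-weighted
spec:
  weighted:
    services:
      - name: myapp-v1   # backing Kubernetes Service (illustrative)
        port: 8080
        weight: 90
      - name: myapp-v2
        port: 8080
        weight: 10
```

An IngressRoute can then point at myapp-weighted instead of a plain Service, so shifting traffic is a one-line weight change in Git.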
Priority-Based Routing:
When multiple routes could match, Traefik uses priority to determine which takes precedence:
```yaml
http:
  routers:
    specific:
      rule: "Host(`app.example.com`) && PathPrefix(`/admin`)"
      priority: 100
      service: admin-service
    general:
      rule: "Host(`app.example.com`)"
      priority: 10
      service: frontend-service
```
The /admin route has higher priority, so admin requests match that router rather than the general frontend router.
Middleware and Request Transformation
Traefik's middleware system allows you to modify requests and responses without changing your applications. Middlewares are chained together to create powerful processing pipelines.
Authentication Middleware:
Traefik supports several authentication approaches:
- Basic Auth: Simple username/password protection
- Digest Auth: More secure than basic (challenge-response)
- Forward Auth: Delegate authentication to external service
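For the simplest case, a basicAuth middleware only needs a list of htpasswd-style entries. The user entry below is a placeholder, not a working hash:

```yaml
http:
  middlewares:
    admin-auth:
      basicAuth:
        users:
          # Generate real entries with the htpasswd tool; this is a placeholder
          - "admin:$apr1$REPLACE$WITH-HTPASSWD-OUTPUT"
```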
Let's look at a concrete example of forward auth:
```yaml
http:
  middlewares:
    auth:
      forwardAuth:
        address: https://auth.example.com/validate
        trustForwardHeader: true
        authResponseHeaders:
          - "X-User-Email"
          - "X-User-Name"
  routers:
    protected:
      rule: "Host(`app.example.com`)"
      middlewares:
        - auth
      service: app-service
```
When a request arrives, Traefik forwards it to your authentication service. A 2xx response means the request is authenticated; anything else is rejected. On success, Traefik adds the specified headers to the request before forwarding it to your app.[^6]
Security Middleware:
Rate limiting is essential for protecting against abuse:
```yaml
http:
  middlewares:
    ratelimit:
      rateLimit:
        average: 100
        burst: 200
        period: 1m
```
This allows 100 requests per minute on average, with bursts of up to 200.[^7] We'll discuss rate limiting strategy in more detail later.
IP allow-listing for admin interfaces:

```yaml
http:
  middlewares:
    admin-ips:
      ipAllowList:   # named ipWhiteList in Traefik v2
        sourceRange:
          - "10.0.0.0/8"
          - "192.168.0.0/16"
          - "203.0.113.1"   # Specific office IP
```
Transformation Middleware:
Path rewriting is common when migrating from legacy routing:
```yaml
http:
  middlewares:
    strip-api:
      stripPrefix:
        prefixes:
          - /api/v2
          - /api/v1
```
A request to /api/v2/users becomes /users before reaching your service.
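The inverse middleware, addPrefix, prepends a segment instead. This sketch assumes a hypothetical add-v2 middleware name:

```yaml
http:
  middlewares:
    add-v2:
      addPrefix:
        prefix: /api/v2
```

With this applied, a request to /users reaches the backend as /api/v2/users — handy when a backend still expects its old path layout.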
Header manipulation for security:
```yaml
http:
  middlewares:
    security-headers:
      headers:
        browserXSSFilter: true
        contentTypeNosniff: true
        forceSTSHeader: true
        stsIncludeSubdomains: true
        stsPreload: true
        stsSeconds: 31536000   # 1 year
        customFrameOptionsValue: "SAMEORIGIN"
```
Observability Middleware:
Access logging is disabled by default in Traefik; enable it in the static configuration and customize the format:

```yaml
accessLog:
  format: json
  filters:
    statusCodes:
      - "400-599"
  bufferingSize: 100
```
For distributed tracing with Jaeger:
```yaml
# Traefik v2 syntax; Traefik v3 consolidates tracing on OpenTelemetry (tracing.otlp)
tracing:
  jaeger:
    samplingServerURL: http://jaeger:5778/sampling
    localAgentHostPort: "jaeger:6831"
    traceContextHeaderName: "uber-trace-id"
```
Observability and Dashboard
Traefik provides an unusually high level of observability out of the box, which is crucial for production deployments.
Real-Time Dashboard:
The Traefik dashboard gives you immediate visibility into what's routing where:
```yaml
# Enable the dashboard in Traefik configuration
api:
  dashboard: true
  insecure: false   # Set to true only for development
```
With insecure mode enabled, the dashboard is served on port 8080 (http://your-traefik-host:8080/dashboard/); in production, expose it through a router instead. The dashboard shows:
- Active routers and their rules
- Service health and backend status
- Request metrics (rate, latency, status codes)
- Real-time request tracing
Of course, in production you should secure the dashboard properly — either behind authentication or accessible only from internal networks.
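One common pattern is a dedicated router pointing at Traefik's built-in api@internal service; the hostname and the admin-auth basicAuth middleware below are assumptions for the sketch:

```yaml
http:
  routers:
    dashboard:
      rule: "Host(`traefik.internal.example.com`)"
      service: api@internal      # Traefik's built-in API/dashboard service
      middlewares:
        - admin-auth             # assumed basicAuth middleware defined elsewhere
```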
Metrics Integration:
Once --metrics.prometheus is enabled, Traefik exposes Prometheus metrics at /metrics on its dedicated entryPoint (port 8080 by default):

```bash
# Check if metrics are available
curl http://localhost:8080/metrics | head -20
```
You'll see metrics like these (exact labels depend on your providers):[^8]

```text
# HELP traefik_service_requests_total How many HTTP requests were processed on a service, partitioned by status code, protocol, and method.
# TYPE traefik_service_requests_total counter
traefik_service_requests_total{code="200",method="GET",protocol="http",service="myapp@docker"} 1245
traefik_service_requests_total{code="500",method="GET",protocol="http",service="myapp@docker"} 3
```
Set up Prometheus to scrape these metrics, then create Grafana dashboards for visualization.
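A minimal scrape job might look like this; the target address is an assumption (Traefik's metrics entryPoint on port 8080) and the path is Traefik's default:

```yaml
# prometheus.yml fragment (hypothetical target)
scrape_configs:
  - job_name: "traefik"
    metrics_path: /metrics
    static_configs:
      - targets: ["traefik:8080"]
```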
Distributed Tracing:
For microservices, understanding request flows across services is essential. Traefik v2 integrates with Jaeger, Zipkin, and Datadog APM, while Traefik v3 consolidates tracing on OpenTelemetry:

```yaml
# Traefik v2 syntax; in v3, configure tracing.otlp instead
tracing:
  jaeger:
    samplingServerURL: http://jaeger:5778/sampling
    localAgentHostPort: "jaeger:6831"
```
When enabled, Traefik propagates trace context through requests, allowing you to see the full path of a request through your system.[^9]
High Availability and Scaling
Traefik supports various high availability configurations. Unlike some proxies that rely on shared config stores, Traefik's approach is elegantly simple.
Horizontal Scaling:
Run multiple Traefik instances behind a load balancer:
```text
Internet → [Load Balancer] → [Traefik 1] → [Services]
                           → [Traefik 2] → [Services]
                           → [Traefik N] → [Services]
```
Key points:
- No shared state required: Each Traefik instance discovers services independently[^10]
- No sticky sessions needed: Traefik is stateless with respect to routing
- Configuration consistency: All instances should have identical configuration
Configuration Sources:
Traefik can obtain configuration from multiple sources simultaneously:
- File provider: Static configuration from YAML/TOML files
- Kubernetes provider: Ingress resources and CRDs
- Docker provider: Container labels
- Consul/Etcd: Distributed configuration for advanced scenarios
- Kubernetes IngressClass: Select which Traefik instance handles which ingress
You can mix these providers — for example, use Kubernetes for most services but file provider for manual routes.
Health Checks:
Traefik performs active health checks on backends:
```yaml
http:
  services:
    myapp:
      loadBalancer:
        healthCheck:
          path: /health
          interval: 30s
          timeout: 5s
```
Unhealthy backends are automatically removed from rotation until they recover. This is crucial for zero-downtime deployments — Traefik stops sending traffic to instances as they're being replaced.[^11]
When Traefik Makes Sense
Now that we've examined Traefik's capabilities, let's consider when it's the right choice. This is a critical decision — choosing the wrong tool creates unnecessary complexity.
Traefik particularly excels in these scenarios:
Organizations Heavily Invested in Kubernetes:
If your team already uses Kubernetes and is comfortable with its patterns, Traefik fits naturally:
- Native Kubernetes Ingress Controller
- Configuration as Kubernetes resources (ingress, middleware, services)
- kubectl and GitOps workflows extend to routing
- Namespaces and multi-tenancy support
- Helm charts for easy installation
The learning curve is minimal for teams already skilled in Kubernetes. Your existing CI/CD pipelines can manage Traefik configuration alongside application deployments.[^12]
Complex Microservices Architectures:
When you have dozens or hundreds of services with dynamic scaling:
- Automatic service discovery eliminates configuration overhead
- Advanced routing handles complex scenarios (canary, A/B testing)
- Middleware chain handles cross-cutting concerns centrally
- Service mesh capabilities via Traefik Mesh (for service-to-service communication)[^13]
Multi-Environment Deployments:
Traefik provides consistent tooling across development, staging, and production:
- Development: Docker Compose with labels
- Staging: Kubernetes with ingress resources
- Production: Multi-cluster Kubernetes with advanced routing
The same fundamental concepts apply everywhere, reducing cognitive load.
DevOps-Mature Organizations:
Teams with established infrastructure-as-code practices find Traefik aligns perfectly:
- GitOps workflows for routing changes
- Configuration reviewed via pull requests
- Automated testing of routing rules
- Comprehensive monitoring with metrics and tracing[^14]
If your organization already operates this way, Traefik integrates seamlessly.
Organizations Needing Advanced Features:
Traefik's middleware ecosystem provides capabilities that are difficult to implement otherwise:
- Circuit breakers and retry logic
- Rate limiting with burst capacity
- Sophisticated authentication flows[^15]
- Header manipulation for legacy API compatibility
Best for
Let's be specific about where Traefik's strengths align with organizational needs.
Strengths:
- Deep integration with container orchestrators, especially Kubernetes
- Automatic service discovery with zero configuration for basic cases
- Advanced middleware capabilities for complex routing needs
- Comprehensive observability (dashboard, Prometheus, Jaeger)
- Active community and commercial support from Traefik Labs
- Regular releases with new features
Enterprises with sophisticated DevOps practices: Teams that have invested in Kubernetes, infrastructure-as-code, and GitOps workflows will find Traefik extends naturally from those practices.
Organizations heavily invested in Kubernetes or Docker Swarm: Traefik's native integration makes it the natural choice for container-heavy environments. The configuration patterns are consistent with how you already manage applications.
Environments requiring advanced routing and load balancing: Complex routing logic, traffic splitting, and sophisticated middleware needs favor Traefik's approach.
Considerations
No tool is perfect for every situation. Let's examine the trade-offs honestly.
Challenges:
- Initial complexity: For simple use cases, Traefik's feature set can be overwhelming. You need to understand concepts like providers, middlewares, services, and routers.
- Kubernetes knowledge required: To use Traefik effectively in production, you need solid Kubernetes understanding. This is not a barrier for Kubernetes shops but makes Traefik a poor choice if you're avoiding Kubernetes.
- Configuration can become complex: Advanced scenarios involve multiple CRDs, middleware chains, and careful ordering. Debugging complex configurations requires familiarity with Traefik's internal model.
- Resource overhead: Traefik uses more memory than minimal proxies (typically 100-300MB). For very simple deployments, this may matter.[^16]
Initial Complexity:
Teams new to container orchestration or infrastructure-as-code may find Traefik's approach initially challenging. The concept of declarative routing — where configuration emerges from infrastructure state rather than static files — represents a shift in mindset.
This isn't a flaw in Traefik — it's a consequence of solving the dynamic configuration problem.[^17] Manual configuration (Nginx style) is simpler conceptually but doesn't scale in dynamic environments. You're choosing where to place complexity: in initial learning or in ongoing operations.[^18]
Operational Overhead:
While automation reduces ongoing work, the initial setup and learning curve can be significant. Plan for:
- Team training on Traefik concepts and Kubernetes ingress patterns
- Developing internal best practices for configuration organization
- Setting up monitoring and alerting
- Creating runbooks for common scenarios
We'll cover a phased implementation approach later in this article.
Implementation Approach
For organizations adopting Traefik, we recommend a phased approach that minimizes risk while building team expertise.
Phase 1: Start Simple (1-2 weeks)
Begin in a development or staging environment with a single non-critical service:
- Deploy Traefik using the official Helm chart or Docker Compose
- Configure basic routing for one service using ingress or Docker labels
- Explore the dashboard to understand current state
- Test automatic discovery by deploying and scaling services
- Document learnings and initial configuration patterns
At this stage, don't worry about advanced features. Just establish that basic routing works reliably.[^19]
Example deployment for Phase 1 using Docker Compose:
```yaml
version: '3.8'
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--api.dashboard=true"
      - "--api.insecure=true"   # development only: serves the dashboard on :8080
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"   # Dashboard
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    networks:
      - traefik-public

  whoami:
    image: traefik/whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
      - "traefik.http.routers.whoami.entrypoints=web"
    networks:
      - traefik-public

networks:
  traefik-public:
```
Phase 2: Expand Capabilities (2-4 weeks)
Once basic routing works reliably, add middleware and advanced features:[^20]
- Add authentication (basic auth or forward auth) to protect admin interfaces
- Implement rate limiting for public APIs
- Set up TLS with Let's Encrypt (automatic HTTPS)
- Configure health checks for production readiness
- Integrate monitoring (Prometheus metrics, dashboard alerts)
- Implement canary deployments using weighted routing
- Refine configuration patterns based on team feedback
During this phase, you'll learn which middleware you actually use and which you don't. Stick to essential features initially.[^21]
Phase 3: Production Deployment (2-3 weeks)
Deploy to production with production-like traffic patterns:
- Deploy to staging environment with production configuration
- Validate high availability with multiple Traefik instances
- Test failover by killing Traefik pods/containers
- Establish runbooks for common operations (debugging, updating, rollbacks)
- Train operations team on troubleshooting and maintenance
- Set up alerts based on Traefik metrics
- Load test to verify performance under expected traffic
Don't skip staging — production deployment of infrastructure components should always be preceded by realistic testing.[^22]
Phase 4: Scale and Optimize (ongoing)
After stable production deployment:[^23]
- Migrate remaining services gradually, not all at once
- Optimize performance based on metrics: tune connection pools, adjust timeouts
- Implement advanced features as needed: service mesh, custom plugins
- Continuous improvement: regular review of configurations, update Traefik versions
- Share knowledge: document patterns, conduct training for new team members
Practical Configuration Examples
Let's provide some complete, working configurations you can adapt.
Basic HTTPS with Let's Encrypt
```yaml
# traefik.yml - static configuration
entryPoints:
  web:
    address: ":80"
    # Redirect all HTTP to HTTPS
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
          permanent: true
  websecure:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt

providers:
  file:
    filename: /etc/traefik/dynamic.yml
    watch: true
  kubernetesIngress:
    ingressClass: traefik-internal

certificatesResolvers:
  letsencrypt:
    acme:
      email: [email protected]
      storage: /etc/traefik/acme.json
      httpChallenge:
        entryPoint: web
```
```yaml
# dynamic.yml - dynamic configuration
http:
  routers:
    myapp:
      rule: "Host(`app.example.com`)"
      service: myapp-service
      tls:
        certResolver: letsencrypt
        domains:
          - main: "app.example.com"
            sans:
              - "www.app.example.com"
  services:
    myapp-service:
      loadBalancer:
        servers:
          - url: "http://myapp:8080"
        healthCheck:
          path: /health
          interval: 30s
```
This configuration:[^24]

- Listens on ports 80 and 443
- Redirects HTTP to HTTPS permanently
- Obtains and renews certificates automatically via Let's Encrypt
- Routes app.example.com to your service
- Performs health checks to remove unhealthy backends
Kubernetes with IngressClass
```yaml
# traefik-deploy.yaml - Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: traefik-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik
      containers:
        - name: traefik
          image: traefik:v3.0
          args:
            - --api.dashboard=true
            - --api.insecure=false
            - --providers.kubernetescrd
            - --providers.kubernetesingress
            - --providers.kubernetesingress.ingressclass=traefik-internal
            - --entrypoints.web.address=:80
            - --entrypoints.websecure.address=:443
            - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
            - [email protected]
            - --certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
          ports:
            - name: web
              containerPort: 80
            - name: websecure
              containerPort: 443
            - name: admin
              containerPort: 8080
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: traefik-data
---
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik-system
spec:
  selector:
    app: traefik
  ports:
    - name: http
      port: 80
      targetPort: web
    - name: https
      port: 443
      targetPort: websecure
    - name: admin
      port: 8080
      targetPort: admin
  type: LoadBalancer
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: traefik-internal
spec:
  controller: traefik.io/ingress-controller
```
This sets up Traefik as a highly available ingress controller in Kubernetes with automatic HTTPS.[^25] One caveat: the open-source edition doesn't share ACME certificate state between replicas, so multi-replica Let's Encrypt setups typically delegate certificates to cert-manager or use Traefik Enterprise.
Rate Limiting Configuration
For public APIs, rate limiting is essential:
```yaml
http:
  middlewares:
    api-rate-limit:
      rateLimit:
        average: 100
        burst: 200
        period: 1m
        sourceCriterion:
          requestHost: true   # bucket requests per host header
  routers:
    api:
      rule: "Host(`api.example.com`) && PathPrefix(`/v1`)"
      middlewares:
        - api-rate-limit
      service: api-service
```
The sourceCriterion with requestHost: true applies rate limiting per domain, which is useful when multiple domains point to the same API.
Circuit Breaker
For resilience, implement circuit breakers:
In Traefik v2 and v3, the circuit breaker is a middleware attached to a router:

```yaml
http:
  middlewares:
    backend-breaker:
      circuitBreaker:
        expression: "NetworkErrorRatio() > 0.50"
        checkPeriod: 10s       # how often the expression is evaluated
        fallbackDuration: 10s  # how long the circuit stays open
        recoveryDuration: 10s  # gradual return to the closed state
  routers:
    backend:
      rule: "Host(`backend.example.com`)"
      middlewares:
        - backend-breaker
      service: backend
  services:
    backend:
      loadBalancer:
        healthCheck:
          path: /health
          interval: 30s
        passHostHeader: true
        responseForwarding:
          flushInterval: 100ms
        servers:
          - url: "http://backend-1:8080"
          - url: "http://backend-2:8080"
```
If the network error ratio exceeds 50%, the circuit opens: Traefik stops forwarding requests for the fallback duration, then gradually probes the service during the recovery period to check whether it has recovered.
Common Pitfalls and Troubleshooting
Even with careful planning, issues arise. Let's examine common problems and their solutions.
Configuration Not Applying
Symptom: You've updated Traefik configuration but nothing changes.
Causes:

- Dynamic configuration not reloading:
  - Check that file watching is enabled (`--providers.file.watch=true`)
  - Verify the file path is correct
  - Check Traefik logs for parsing errors
- Router or service not found:
  - Verify that referenced middlewares exist
  - Check for typos in the router's `service` field
  - Ensure the `providers` section includes the relevant provider

Solution:

```bash
# Check Traefik logs for errors
docker logs traefik --tail 100
kubectl logs -n traefik-system deployment/traefik

# Verify configuration is loaded
curl http://localhost:8080/api/http/routers | jq .
```
Health Checks Failing
Symptom: Backends marked unhealthy, no traffic flowing.
Causes:
- Health check path returns non-2xx status
- Health check interval too short for app startup time
- Network/firewall blocking health check requests
Solution:

- Verify the health endpoint works independently: `curl http://backend:port/health`
- Adjust the health check configuration: increase `interval` or `timeout`
- Check network connectivity between Traefik and backends
TLS Certificate Issues
Symptom: Browser shows certificate warnings, or Let's Encrypt challenges failing.
Causes:
- Port 80/443 not exposed (required for HTTP-01 challenge)
- DNS not pointing to Traefik correctly
- Rate limits from Let's Encrypt exceeded
Solution:

- Ensure ports 80 and 443 are publicly accessible
- Verify the DNS A record resolves to Traefik's IP
- Check Traefik logs for ACME-specific errors
- Point `--certificatesresolvers.letsencrypt.acme.caserver` at https://acme-staging-v2.api.letsencrypt.org/directory for testing
High Memory Usage
Symptom: Traefik pod/container using excessive memory.
Causes:
- Too many concurrent connections without limits
- Large number of routers/services (thousands)
- Debug logging enabled in production
- Dashboard enabled without access restrictions
Solution:

- Set appropriate resource limits in Kubernetes
- Reduce the log level: `--log.level=INFO` (not DEBUG)
- Disable the dashboard in production, or protect it
- Consolidate middleware where possible
CORS Errors
Symptom: Browser console shows CORS policy blocking requests.
Solution: Add CORS middleware:
```yaml
http:
  middlewares:
    cors:
      headers:
        accessControlAllowOriginList:
          - "https://app.example.com"
        accessControlAllowMethods:
          - "GET"
          - "POST"
          - "PUT"
          - "DELETE"
          - "OPTIONS"
        accessControlAllowHeaders:
          - "Authorization"
          - "Content-Type"
        accessControlMaxAge: 100
        addVaryHeader: true
```
Reference it from routers as cors@file in their middlewares list.[^26]
Metrics Not Appearing in Prometheus
Symptom: Prometheus targets show as down or metrics missing.
Causes:
- Metrics endpoint not exposed
- Prometheus scrape configuration wrong port/path
- Network policies blocking Prometheus access
Solution:
```bash
# Enable metrics in Traefik (static configuration flags)
--metrics.prometheus=true
--metrics.prometheus.entryPoint=traefik
```

```yaml
# Create a ServiceMonitor (if using the Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: traefik
  endpoints:
    - port: admin      # must match the port name on the Traefik Service
      interval: 30s
      path: /metrics
```
Comparing Traefik with Alternatives
We're examining Traefik in a series that includes Caddy. Let's clarify how Traefik compares, and when you might choose differently.
Traefik vs Caddy
This comparison will be covered in-depth in Parts 3 and 4, but let's establish high-level distinctions:
| Aspect | Traefik | Caddy |
|---|---|---|
| Primary Use Case | Kubernetes/container orchestration | Simplicity and automatic HTTPS |
| Configuration | Declarative (YAML, CRDs) | Caddyfile (simple) or JSON |
| Learning Curve | Moderate to steep | Gentle |
| Feature Depth | Very extensive middleware ecosystem | Comprehensive but more opinionated |
| Auto-HTTPS | Yes, via Let's Encrypt | Yes, automatic and default |
| Kubernetes | Native ingress controller | Supported, though the Caddyfile remains the first-class interface |
| Community | Large, open source with commercial options | Strong open source community |
Rough rule of thumb:
- Choose Traefik if you're heavily invested in Kubernetes and need advanced routing capabilities
- Choose Caddy if you prioritize simplicity, excellent documentation, and straightforward HTTP/TLS needs
Traefik vs Traditional Proxies
Comparing to Nginx or HAProxy:
| Aspect | Traefik | Nginx/HAProxy |
|---|---|---|
| Configuration | Dynamic, automatic discovery | Static configuration files |
| Reloads | Continuous, no reload step | Graceful reload, but must be triggered explicitly |
| Service Discovery | Built-in for Docker, Kubernetes, Consul | Requires external tools/scripts |
| Configuration Management | Declarative, version-controlled | Manual updates or configuration management |
| Initial Complexity | Higher learning curve | Lower for simple setups |
| Operational Overhead | Low (automatic) | High (manual updates) |
| Performance | Very good | Excellent (mature, optimized) |
The trade-off is clear: Traefik trades some simplicity and raw performance for dramatically reduced operational overhead. If your infrastructure changes frequently, Traefik's automation usually outweighs its complexity cost.
When NOT to Use Traefik
Be honest about limitations — Traefik isn't right for every scenario:
- Simple static sites: Caddy or even Nginx is simpler
- Non-containerized legacy applications: Traefik's value proposition is weakest here
- Extreme performance requirements: Nginx or HAProxy may have a slight edge (though Traefik is fast)
- Very constrained resources (ultra-low memory): Minimal proxies exist
- Teams without container orchestration experience: Consider the learning curve
If you fall into these categories, that's fine — use the right tool for your needs. We'll cover Caddy in Part 3 as an alternative for simpler deployments.
Conclusion
Traefik represents a powerful option for organizations with cloud-native architectures and sophisticated DevOps practices. Its deep Kubernetes integration, extensive middleware ecosystem, and automatic service discovery make it particularly well-suited for container-heavy environments.
The key insight from examining Traefik is this: its value goes beyond technical — it's operational. By automatically discovering services and configuring routing, Traefik reduces toil, prevents configuration drift, and enables GitOps workflows. These benefits compound over time, especially in dynamic environments where services scale, change, and evolve frequently.
Over the coming articles in this series, we'll examine Caddy's simplicity-first approach, then provide a direct comparison to help you choose the right tool for your specific needs. Finally, we'll cover implementation best practices that apply regardless of which tool you select.
Optimizing your reverse proxy setup? Learn how our Infrastructure Consulting can streamline your operations.
Footnotes

[^1]: OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/
[^2]: Kubernetes. "Ingress Resources." Kubernetes Documentation, 2026. https://kubernetes.io/docs/concepts/services-networking/ingress/
[^3]: Google. "Service Level Objectives." Google SRE Workbook. https://sre.google/resources/practices-and-policies/service-level-objectives/
[^4]: OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/
[^5]: Netflix. "Chaos Engineering." Netflix Tech Blog. https://netflixtechblog.com/
[^6]: OneUptime. "How to Build API SLO Dashboards (Availability, Latency P99, Error Budget) from OpenTelemetry Metrics." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-api-slo-dashboards-opentelemetry-metrics/view
[^7]: OneUptime. "How to Define and Enforce Performance Budgets Using OpenTelemetry P50/P95/P99 Latency Histograms." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-otel-performance-budgets-latency-histograms/view
[^8]: OptyxStack. "Latency Distributions in Practice: Reading P50/P95/P99 Without Fooling Yourself." OptyxStack, February 2, 2026. https://optyxstack.com/performance/latency-distributions-in-practice-reading-p50-p95-p99-without-fooling-yourself
[^9]: W3C. "Trace Context Specification." W3C Trace Context, 2026. https://www.w3.org/TR/trace-context/
[^10]: Prometheus. "Prometheus Documentation." Prometheus.io, 2026. https://prometheus.io/docs/
[^11]: Grafana Labs. "Grafana Documentation." Grafana, 2026. https://grafana.com/docs/
[^12]: Google. "Service Level Objectives." Google SRE Workbook. https://sre.google/resources/practices-and-policies/service-level-objectives/
[^13]: Traefik Labs. "Traefik Mesh Documentation." Traefik, 2026. https://docs.traefik.io/traefik-mesh/
[^14]: Cloud Native Computing Foundation. "CNCF Landscape." CNCF, 2026. https://landscape.cncf.io/
[^15]: OneUptime. "How to Build API SLO Dashboards (Availability, Latency P99, Error Budget) from OpenTelemetry Metrics." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-api-slo-dashboards-opentelemetry-metrics/view
[^16]: OneUptime. "How to Define and Enforce Performance Budgets Using OpenTelemetry P50/P95/P99 Latency Histograms." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-otel-performance-budgets-latency-histograms/view
[^17]: OptyxStack. "Latency Distributions in Practice: Reading P50/P95/P99 Without Fooling Yourself." OptyxStack, February 2, 2026. https://optyxstack.com/performance/latency-distributions-in-practice-reading-p50-p95-p99-without-fooling-yourself
[^18]: Google. "Site Reliability Engineering: How Google Runs Production Systems." O'Reilly Media, 2016. https://sre.google/books/
[^19]: OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/
[^20]: Prometheus. "Prometheus Documentation." Prometheus.io, 2026. https://prometheus.io/docs/
[^21]: OneUptime. "How to Define and Enforce Performance Budgets Using OpenTelemetry P50/P95/P99 Latency Histograms." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-otel-performance-budgets-latency-histograms/view
[^22]: Google. "Site Reliability Engineering: How Google Runs Production Systems." O'Reilly Media, 2016. https://sre.google/books/
[^23]: OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/
[^24]: Prometheus. "Prometheus Documentation." Prometheus.io, 2026. https://prometheus.io/docs/
[^25]: Grafana Labs. "Grafana Documentation." Grafana, 2026. https://grafana.com/docs/
[^26]: OneUptime. "How to Build API SLO Dashboards (Availability, Latency P99, Error Budget) from OpenTelemetry Metrics." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-api-slo-dashboards-opentelemetry-metrics/view