Modern Reverse Proxies Part 2: Traefik - The Cloud-Native Orchestrator

February 16, 2026

In Part 1 of our series on modern reverse proxies, we established the business case for dynamic service discovery and automated configuration. We saw how traditional proxies like Nginx and Apache create operational bottlenecks in cloud-native environments. Now, in this second installment, we'll take a deep dive into Traefik — exploring its architecture, capabilities, and practical deployment considerations.

Before we get into the specifics of Traefik, let's establish what we're examining: Traefik is a cloud-native reverse proxy and load balancer designed specifically for containerized environments. It automatically discovers services and configures routing without manual intervention. Over the course of this article, we'll examine how Traefik works, when it makes sense to use it, and what practical considerations you should keep in mind.

Traefik: The Cloud-Native Orchestrator

Traefik excels in complex, container-based environments, especially with Kubernetes or Docker Swarm. Its deep integration streamlines routing management in large-scale microservice architectures.

One may wonder: what makes Traefik different from traditional reverse proxies? The answer lies in its architecture. Where Nginx or Apache require you to manually update configuration files whenever services change, Traefik actively monitors your infrastructure and adapts automatically. This goes beyond a convenience feature — it fundamentally changes how you manage routing in dynamic environments.

That integration spans Kubernetes, Docker, and managed cloud container platforms: Traefik discovers services and configures routing automatically, making it especially valuable for organizations heavily invested in containerization.

Deep Integration with Container Platforms

Traefik's most distinctive capability is its native integration with container orchestration platforms. This integration means you define routing rules alongside your application definitions, enabling infrastructure-as-code practices for your entire stack.

Kubernetes Integration:

Traefik operates as a Kubernetes Ingress Controller, meaning it integrates directly with Kubernetes' own ingress resources:

# Example: Kubernetes Ingress resource for Traefik
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    # Traefik-specific annotations for advanced routing
    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-service
            port:
              number: 8080
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls-secret

Beyond basic ingress, Traefik extends Kubernetes with Custom Resource Definitions (CRDs) for advanced routing scenarios that aren't possible with standard ingress:

# Example: Traefik's Middleware CRD for request transformation
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-prefix
spec:
  stripPrefix:
    prefixes:
      - /api

These CRDs give you fine-grained control over rate limiting, authentication, header manipulation, and more — all defined as Kubernetes resources.

Docker Integration:

For Docker environments without orchestration, Traefik uses Docker labels to discover services:

# Example: docker-compose.yml with Traefik labels
version: '3.8'
services:
  myapp:
    image: myapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myapp.rule=Host(`myapp.localhost`)"
      - "traefik.http.services.myapp.loadbalancer.server.port=8080"
      - "traefik.http.middlewares.test-ratelimit.ratelimit.avg=100"
      - "traefik.http.routers.myapp.middlewares=test-ratelimit@docker"
    ports:
      - "8080:8080"

When you start this container, Traefik automatically detects it through the Docker socket and configures routing based on those labels — no manual configuration needed.
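To make the label convention concrete, here is a minimal sketch (in Python, purely illustrative; `routers_from_labels` is a name invented for this example) of the mapping Traefik performs: every `traefik.http.routers.<name>.rule` label becomes a router with that rule.

```python
# Sketch: derive router rules from Docker labels the way Traefik's Docker
# provider does conceptually. Hypothetical helper, not Traefik's actual code.

def routers_from_labels(labels: list[str]) -> dict:
    """Map 'traefik.http.routers.<name>.rule=<rule>' labels to {name: rule}."""
    routers = {}
    for label in labels:
        key, _, value = label.partition("=")
        parts = key.split(".")
        if key.startswith("traefik.http.routers.") and parts[-1] == "rule":
            routers[parts[3]] = value
    return routers

labels = [
    "traefik.enable=true",
    "traefik.http.routers.myapp.rule=Host(`myapp.localhost`)",
    "traefik.http.services.myapp.loadbalancer.server.port=8080",
]
print(routers_from_labels(labels))  # {'myapp': 'Host(`myapp.localhost`)'}
```

The real provider handles many more label namespaces (services, middlewares, TLS), but the pattern is the same: structured label keys are parsed into a routing model.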

Cloud Provider Integration:

Traefik integrates with major cloud providers' container services:

  • AWS EKS: Runs as a standard Kubernetes ingress controller
  • Google GKE and Azure AKS: The same ingress controller model on other managed Kubernetes services
  • Amazon ECS: Dedicated Traefik provider for label-based discovery
  • Multi-cloud: Consistent configuration across providers

This integration means routing configuration lives alongside application definitions. Consider what this enables:

  • Version control for entire infrastructure: Your routing rules are in Git alongside your application code
  • GitOps workflows: Changes reviewed, tested, and deployed systematically
  • Reduced context switching: No separate configuration management system
  • Single source of truth: Infrastructure defined in one place

Of course, this benefits most organizations heavily invested in containerization. If you're running traditional monoliths on virtual machines, Traefik's value proposition is less compelling.

Advanced Routing Capabilities

Traefik provides sophisticated routing options that go well beyond simple path-based routing. Let's examine what's available with concrete examples.

Path-Based Routing:

The most common pattern: route different URL paths to different services:

# Traefik static configuration (file provider)
http:
  routers:
    api:
      rule: "PathPrefix(`/api`)"
      service: api-service
    admin:
      rule: "PathPrefix(`/admin`)"
      service: admin-service
    frontend:
      rule: "PathPrefix(`/`)"
      service: frontend-service

In this configuration, requests to /api/* go to the API service, /admin/* to the admin interface, and everything else to the frontend.

Host-Based Routing:

Route based on domain names — essential for multi-tenant applications:

http:
  routers:
    api:
      rule: "Host(`api.example.com`)"
      service: api-service
    www:
      rule: "Host(`www.example.com`)"
      service: frontend-service
    admin:
      rule: "Host(`admin.example.com`)"
      service: admin-service

Header-Based Routing:

Route based on custom headers — useful for A/B testing, canary deployments, or feature flags:

http:
  routers:
    canary:
      rule: "Host(`app.example.com`) && Headers(`X-Canary`, `true`)"
      service: canary-service
    stable:
      rule: "Host(`app.example.com`)"
      service: stable-service

In this configuration, only requests with header X-Canary: true go to the canary service; all others go to stable.

Weighted Round-Robin:

Gradually shift traffic between versions — critical for safe rollouts:

http:
  services:
    myapp:
      weighted:
        services:
          - name: myapp-v1
            weight: 90  # 90% of traffic
          - name: myapp-v2
            weight: 10  # 10% of traffic

Start with 90/10 split, monitor, then gradually increase traffic to v2. If issues arise, you can immediately revert by adjusting weights.
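The weighted split above can be sketched as proportional sampling (a simplification; the hypothetical `pick` helper below just draws backends in proportion to their weights, without the smoothing a real weighted round-robin applies):

```python
# Sketch: proportional backend selection for a 90/10 weighted split.
# Illustrative only; not Traefik's actual balancing algorithm.
import random

def pick(services: list[tuple[str, int]], rng: random.Random) -> str:
    """Choose a backend name with probability proportional to its weight."""
    names, weights = zip(*services)
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)
sample = [pick([("myapp-v1", 90), ("myapp-v2", 10)], rng) for _ in range(1000)]
print(sample.count("myapp-v2"))  # roughly 100 of 1000 requests land on v2
```

Adjusting the weights shifts the expected share immediately, which is exactly why weight changes make for fast rollbacks.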

Priority-Based Routing:

When multiple routes could match, Traefik uses priority to determine which takes precedence:

http:
  routers:
    specific:
      rule: "Host(`app.example.com`) && PathPrefix(`/admin`)"
      priority: 100
      service: admin-service
    general:
      rule: "Host(`app.example.com`)"
      priority: 10
      service: frontend-service

The /admin route has higher priority, so admin requests match that router rather than the general frontend router.
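Router selection can be modelled simply: among all routers whose rule matches, the highest priority wins (when you omit priority, Traefik defaults it to the rule's length, so more specific rules tend to win anyway). A minimal sketch, with a hypothetical `select` helper:

```python
# Sketch: priority-based router selection. Illustrative model only.

def select(routers: list[dict], host: str, path: str) -> str:
    """Return the service of the highest-priority matching router."""
    matches = [
        r for r in routers
        if r.get("host", host) == host and path.startswith(r.get("prefix", "/"))
    ]
    return max(matches, key=lambda r: r["priority"])["service"]

routers = [
    {"host": "app.example.com", "prefix": "/admin", "priority": 100, "service": "admin-service"},
    {"host": "app.example.com", "priority": 10, "service": "frontend-service"},
]
print(select(routers, "app.example.com", "/admin/users"))  # admin-service
print(select(routers, "app.example.com", "/home"))         # frontend-service
```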

Middleware and Request Transformation

Traefik's middleware system allows you to modify requests and responses without changing your applications. Middlewares are chained together to create powerful processing pipelines.

Authentication Middleware:

Traefik supports several authentication approaches:

  • Basic Auth: Simple username/password protection
  • Digest Auth: More secure than basic (challenge-response)
  • Forward Auth: Delegate authentication to external service

Let's look at a concrete example of forward auth:

http:
  middlewares:
    auth:
      forwardAuth:
        address: https://auth.example.com/validate
        trustForwardHeader: true
        authResponseHeaders:
          - "X-User-Email"
          - "X-User-Name"
  
  routers:
    protected:
      rule: "Host(`app.example.com`)"
      middlewares:
        - auth
      service: app-service

When a request arrives, Traefik forwards it to your authentication service. That service returns 200 if authenticated, 401 if not. If authenticated, Traefik adds specified headers to the request before forwarding to your app.
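The contract your auth service must honor is small enough to sketch as a pure function (names, the token check, and the header values below are all hypothetical stand-ins for a real verifier):

```python
# Sketch of the forward-auth contract: given the forwarded request headers,
# answer 200 plus identity headers, or 401. Hypothetical example only.

def validate(headers: dict) -> tuple[int, dict]:
    """Return (status_code, response_headers) as a forward-auth service would."""
    token = headers.get("Authorization", "")
    if token == "Bearer valid-token":  # stand-in for real token verification
        return 200, {"X-User-Email": "user@example.com", "X-User-Name": "Jane"}
    return 401, {}

status, extra = validate({"Authorization": "Bearer valid-token"})
print(status, extra.get("X-User-Email"))  # 200 user@example.com
```

Only the headers listed in authResponseHeaders are copied onto the request Traefik forwards to your app, so the contract stays explicit.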

Security Middleware:

Rate limiting is essential for protecting against abuse:

http:
  middlewares:
    ratelimit:
      rateLimit:
        average: 100
        burst: 200
        period: 1m

This allows 100 requests per minute on average, with bursts up to 200. We'll discuss rate limiting strategy in more detail later.
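The average/burst semantics are those of a token bucket: tokens refill at the average rate, the bucket holds at most burst tokens, and each request spends one. A minimal sketch (a simplified model, not Traefik's implementation):

```python
# Sketch: token-bucket rate limiting with average=100/period=60s, burst=200.

class TokenBucket:
    def __init__(self, average: float, burst: float, period: float):
        self.rate = average / period   # tokens added per second
        self.capacity = burst          # maximum stored tokens
        self.tokens = burst            # start full: the burst is available
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(average=100, burst=200, period=60)
results = [bucket.allow(0.0) for _ in range(201)]
print(results.count(True))  # 200: the burst absorbs 200 requests, the 201st is rejected
```

After the burst is spent, sustained traffic is capped at the average rate, which is why burst should cover legitimate spikes without enabling abuse.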

IP allowlisting for admin interfaces (the middleware is named ipAllowList in Traefik v3; v2 called it ipWhiteList):

http:
  middlewares:
    admin-ips:
      ipAllowList:
        sourceRange:
          - "10.0.0.0/8"
          - "192.168.0.0/16"
          - "203.0.113.1"  # Specific office IP

Transformation Middleware:

Path rewriting is common when migrating from legacy routing:

http:
  middlewares:
    strip-api:
      stripPrefix:
        prefixes:
          - /api/v2
          - /api/v1

A request to /api/v2/users becomes /users before reaching your service.
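The transformation is simple enough to state as a function (a sketch of the behavior, with the detail that the forwarded path always keeps a leading slash; `strip_prefix` is a name invented here):

```python
# Sketch: what stripPrefix does to the request path. Prefixes are checked
# in order; the first match is removed. Illustrative model only.

def strip_prefix(path: str, prefixes: list[str]) -> str:
    for p in prefixes:
        if path == p or path.startswith(p + "/"):
            stripped = path[len(p):]
            return stripped if stripped.startswith("/") else "/" + stripped
    return path

print(strip_prefix("/api/v2/users", ["/api/v2", "/api/v1"]))  # /users
print(strip_prefix("/health", ["/api/v2", "/api/v1"]))        # /health (no match)
```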

Header manipulation for security:

http:
  middlewares:
    security-headers:
      headers:
        browserXSSFilter: true
        contentTypeNosniff: true
        forceSTSHeader: true
        stsIncludeSubdomains: true
        stsPreload: true
        stsSeconds: 31536000  # 1 year
        customFrameOptionsValue: "SAMEORIGIN"

Observability Middleware:

Access logging is enabled by default in Traefik, but you can customize formats:

accessLog:
  format: json
  filters:
    statusCodes:
      - "400-599"
  bufferingSize: 100

For distributed tracing, Traefik v3 exports spans via OpenTelemetry (OTLP); a Jaeger instance can ingest them directly:

tracing:
  otlp:
    http:
      endpoint: http://jaeger:4318/v1/traces

Observability and Dashboard

Traefik provides an unusually high level of observability out of the box, which is crucial for production deployments.

Real-Time Dashboard:

The Traefik dashboard gives you immediate visibility into what's routing where:

# Enable the dashboard in Traefik configuration
api:
  dashboard: true
  insecure: false  # Set to true only for development

Access it at the /dashboard/ path. With insecure: true it is served directly on port 8080; with insecure: false (recommended) you expose it through a router, typically behind authentication middleware. The dashboard shows:

  • Active routers and their rules
  • Service health and backend status
  • Request metrics (rate, latency, status codes)
  • Real-time request tracing

Of course, in production you should secure the dashboard properly — either behind authentication or accessible only from internal networks.

Metrics Integration:

Traefik exposes Prometheus metrics at /metrics on its internal traefik entrypoint (port 8080 by default):

# Check if metrics are available
curl http://localhost:8080/metrics | head -20

You'll see metrics like:

# HELP traefik_service_requests_total How many HTTP requests processed on a service, partitioned by status code, protocol, and method.
# TYPE traefik_service_requests_total counter
traefik_service_requests_total{code="200",method="GET",protocol="http",service="myapp@docker"} 1245
traefik_service_requests_total{code="500",method="GET",protocol="http",service="myapp@docker"} 3

Set up Prometheus to scrape these metrics, then create Grafana dashboards for visualization.
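Each exposition line is just a metric name, a label set, and a value. A tiny parser (hypothetical helper, for illustrating the structure your scraper consumes):

```python
# Sketch: parse one Prometheus exposition-format sample line.
import re

def parse_sample(line: str) -> tuple[str, dict, float]:
    """Split 'name{k="v",...} value' into (name, labels, value)."""
    m = re.match(r'(\w+)\{([^}]*)\}\s+(\S+)', line)
    name, labels_raw, value = m.group(1), m.group(2), float(m.group(3))
    labels = dict(re.findall(r'(\w+)="([^"]*)"', labels_raw))
    return name, labels, value

name, labels, value = parse_sample(
    'traefik_service_requests_total{code="200",service="myapp"} 1245'
)
print(name, labels["code"], value)  # traefik_service_requests_total 200 1245.0
```

In practice you never parse this by hand; Prometheus does it for you, and you query the resulting series in PromQL.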

Distributed Tracing:

For microservices, understanding request flows across services is essential. Traefik v3 emits traces via OpenTelemetry, which backends such as Jaeger, Zipkin, and Datadog can consume:

tracing:
  otlp:
    http:
      endpoint: http://jaeger:4318/v1/traces

When enabled, Traefik propagates trace context through requests, allowing you to see the full path of a request through your system.

High Availability and Scaling

Traefik supports various high availability configurations. Unlike some proxies that rely on shared config stores, Traefik's approach is elegantly simple.

Horizontal Scaling:

Run multiple Traefik instances behind a load balancer:

Internet → [Load Balancer] → [Traefik 1] → [Services]
                           → [Traefik 2] → [Services]
                           → [Traefik N] → [Services]

Key points:

  • No shared state required: Each Traefik instance discovers services independently
  • No sticky sessions needed: Traefik is stateless with respect to routing
  • Configuration consistency: All instances should have identical configuration

Configuration Sources:

Traefik can obtain configuration from multiple sources simultaneously:

  • File provider: Static and dynamic configuration from YAML/TOML files
  • Kubernetes providers: Ingress resources (with IngressClass selection) and Traefik CRDs
  • Docker provider: Container labels
  • Consul/Etcd: Distributed key-value configuration for advanced scenarios

You can mix these providers — for example, use Kubernetes for most services but file provider for manual routes.

Health Checks:

Traefik performs active health checks on backends:

http:
  services:
    myapp:
      loadBalancer:
        healthCheck:
          path: /health
          interval: 30s
          timeout: 5s

Unhealthy backends are automatically removed from rotation until they recover. This is crucial for zero-downtime deployments — Traefik stops sending traffic to instances as they're being replaced.
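Conceptually the health-check loop is just a filter over the backend pool: probe each server, keep the ones that respond healthily. A deliberately small sketch (the `probe` callable stands in for an HTTP GET against the configured path):

```python
# Sketch: active health checking as a filter over the backend pool.
# `probe` is a hypothetical stand-in for an HTTP health-endpoint request.

def rotation(backends: list[str], probe) -> list[str]:
    """Return only the backends whose probe currently succeeds."""
    return [b for b in backends if probe(b)]

health = {"10.0.1.5:8080": True, "10.0.1.6:8080": False}
print(rotation(list(health), health.get))  # ['10.0.1.5:8080']
```

The real implementation re-probes on the configured interval and re-admits a backend as soon as its probe succeeds again.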

When Traefik Makes Sense

Now that we've examined Traefik's capabilities, let's consider when it's the right choice. This is a critical decision — choosing the wrong tool creates unnecessary complexity.

Traefik particularly excels in these scenarios:

Organizations Heavily Invested in Kubernetes:

If your team already uses Kubernetes and is comfortable with its patterns, Traefik fits naturally:

  • Native Kubernetes Ingress Controller
  • Configuration as Kubernetes resources (ingress, middleware, services)
  • kubectl and GitOps workflows extend to routing
  • Namespaces and multi-tenancy support
  • Helm charts for easy installation

The learning curve is minimal for teams already skilled in Kubernetes. Your existing CI/CD pipelines can manage Traefik configuration alongside application deployments.

Complex Microservices Architectures:

When you have dozens or hundreds of services with dynamic scaling:

  • Automatic service discovery eliminates configuration overhead
  • Advanced routing handles complex scenarios (canary, A/B testing)
  • Middleware chain handles cross-cutting concerns centrally
  • Service mesh capabilities via Traefik Mesh (for service-to-service communication)

Multi-Environment Deployments:

Traefik provides consistent tooling across development, staging, and production:

  • Development: Docker Compose with labels
  • Staging: Kubernetes with ingress resources
  • Production: Multi-cluster Kubernetes with advanced routing

The same fundamental concepts apply everywhere, reducing cognitive load.

DevOps-Mature Organizations:

Teams with established infrastructure-as-code practices find Traefik aligns perfectly:

  • GitOps workflows for routing changes
  • Configuration reviewed via pull requests
  • Automated testing of routing rules
  • Comprehensive monitoring with metrics and tracing

If your organization already operates this way, Traefik integrates seamlessly.

Organizations Needing Advanced Features:

Traefik's middleware ecosystem provides capabilities that are difficult to implement otherwise:

  • Circuit breakers and retry logic
  • Rate limiting with burst capacity
  • Sophisticated authentication flows
  • Header manipulation for legacy API compatibility

Best For

Let's be specific about where Traefik's strengths align with organizational needs.

Strengths:

  • Deep integration with container orchestrators, especially Kubernetes
  • Automatic service discovery with zero configuration for basic cases
  • Advanced middleware capabilities for complex routing needs
  • Comprehensive observability (dashboard, Prometheus, Jaeger)
  • Active community and commercial support from Traefik Labs
  • Regular releases with new features

Enterprises with sophisticated DevOps practices: Teams that have invested in Kubernetes, infrastructure-as-code, and GitOps workflows will find Traefik extends naturally from those practices.

Organizations heavily invested in Kubernetes or Docker Swarm: Traefik's native integration makes it the natural choice for container-heavy environments. The configuration patterns are consistent with how you already manage applications.

Environments requiring advanced routing and load balancing: Complex routing logic, traffic splitting, and sophisticated middleware needs favor Traefik's approach.

Considerations

No tool is perfect for every situation. Let's examine the trade-offs honestly.

Challenges:

  • Initial complexity: For simple use cases, Traefik's feature set can be overwhelming. You need to understand concepts like providers, middlewares, services, and routers.
  • Kubernetes knowledge required: To use Traefik effectively in production, you need solid Kubernetes understanding. This is not a barrier for Kubernetes shops but makes Traefik a poor choice if you're avoiding Kubernetes.
  • Configuration can become complex: Advanced scenarios involve multiple CRDs, middleware chains, and careful ordering. Debugging complex configurations requires familiarity with Traefik's internal model.
  • Resource overhead: Traefik uses more memory than minimal proxies (typically 100-300MB). For very simple deployments, this may matter.

Initial Complexity:

Teams new to container orchestration or infrastructure-as-code may find Traefik's approach initially challenging. The concept of declarative routing — where configuration emerges from infrastructure state rather than static files — represents a shift in mindset.

This isn't a flaw in Traefik — it's a consequence of solving the dynamic configuration problem. Manual configuration (Nginx style) is simpler conceptually but doesn't scale in dynamic environments. You're choosing where to place complexity: in initial learning or in ongoing operations.

Operational Overhead:

While automation reduces ongoing work, the initial setup and learning curve can be significant. Plan for:

  • Team training on Traefik concepts and Kubernetes ingress patterns
  • Developing internal best practices for configuration organization
  • Setting up monitoring and alerting
  • Creating runbooks for common scenarios

We'll cover a phased implementation approach later in this article.

Implementation Approach

For organizations adopting Traefik, we recommend a phased approach that minimizes risk while building team expertise.

Phase 1: Start Simple (1-2 weeks)

Begin in a development or staging environment with a single non-critical service:

  1. Deploy Traefik using the official Helm chart or Docker Compose
  2. Configure basic routing for one service using ingress or Docker labels
  3. Explore the dashboard to understand current state
  4. Test automatic discovery by deploying and scaling services
  5. Document learnings and initial configuration patterns

At this stage, don't worry about advanced features. Just establish that basic routing works reliably.

Example deployment for Phase 1 using Docker Compose:

version: '3.8'
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--api.dashboard=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"  # Dashboard
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    networks:
      - traefik-public

  whoami:
    image: traefik/whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
      - "traefik.http.routers.whoami.entrypoints=web"
    networks:
      - traefik-public

networks:
  traefik-public:

Phase 2: Expand Capabilities (2-4 weeks)

Once basic routing works reliably, add middleware and advanced features:

  1. Add authentication (basic auth or forward auth) to protect admin interfaces
  2. Implement rate limiting for public APIs
  3. Set up TLS with Let's Encrypt (automatic HTTPS)
  4. Configure health checks for production readiness
  5. Integrate monitoring (Prometheus metrics, dashboard alerts)
  6. Implement canary deployments using weighted routing
  7. Refine configuration patterns based on team feedback

During this phase, you'll learn which middleware you actually use and which you don't. Stick to essential features initially.

Phase 3: Production Deployment (2-3 weeks)

Deploy to production with production-like traffic patterns:

  1. Deploy to staging environment with production configuration
  2. Validate high availability with multiple Traefik instances
  3. Test failover by killing Traefik pods/containers
  4. Establish runbooks for common operations (debugging, updating, rollbacks)
  5. Train operations team on troubleshooting and maintenance
  6. Set up alerts based on Traefik metrics
  7. Load test to verify performance under expected traffic

Don't skip staging — production deployment of infrastructure components should always be preceded by realistic testing.

Phase 4: Scale and Optimize (ongoing)

After stable production deployment:

  1. Migrate remaining services gradually, not all at once
  2. Optimize performance based on metrics: tune connection pools, adjust timeouts
  3. Implement advanced features as needed: service mesh, custom plugins
  4. Continuous improvement: regular review of configurations, update Traefik versions
  5. Share knowledge: document patterns, conduct training for new team members

Practical Configuration Examples

Let's provide some complete, working configurations you can adapt.

Basic HTTPS with Let's Encrypt

# traefik.yml - static configuration
entryPoints:
  web:
    address: ":80"
    # Redirect all HTTP to HTTPS
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
          permanent: true

  websecure:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt

providers:
  file:
    filename: /etc/traefik/dynamic.yml
    watch: true
  kubernetesIngress:
    ingressClass: traefik-internal

certificatesResolvers:
  letsencrypt:
    acme:
      email: [email protected]
      storage: /etc/traefik/acme.json
      httpChallenge:
        entryPoint: web

# dynamic.yml - dynamic configuration
http:
  routers:
    myapp:
      rule: "Host(`app.example.com`)"
      service: myapp-service
      tls:
        certResolver: letsencrypt
        domains:
          - main: "app.example.com"
            sans:
              - "www.app.example.com"

  services:
    myapp-service:
      loadBalancer:
        servers:
          - url: "http://myapp:8080"
        healthCheck:
          path: /health
          interval: 30s

This configuration:

  • Listens on port 80 and 443
  • Redirects HTTP to HTTPS permanently
  • Obtains and renews certificates automatically via Let's Encrypt
  • Routes app.example.com to your service
  • Performs health checks to remove unhealthy backends

Kubernetes with IngressClass

# traefik-deploy.yaml - Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: traefik-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik
      containers:
      - name: traefik
        image: traefik:v3.0
        args:
        - --api.dashboard=true
        - --api.insecure=false
        - --providers.kubernetescrd
        - --providers.kubernetesingress
        - --providers.kubernetesingress.ingressclass=traefik-internal
        - --entrypoints.web.address=:80
        - --entrypoints.websecure.address=:443
        - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
        - [email protected]
        - --certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
        ports:
        - name: web
          containerPort: 80
        - name: websecure
          containerPort: 443
        - name: admin
          containerPort: 8080
        volumeMounts:
        - name: data
          mountPath: /data
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: traefik-data
---
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik-system
spec:
  selector:
    app: traefik
  ports:
  - name: http
    port: 80
    targetPort: web
  - name: https
    port: 443
    targetPort: websecure
  - name: admin
    port: 8080
    targetPort: admin
  type: LoadBalancer
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: traefik-internal
spec:
  controller: traefik.io/ingress-controller

This sets up Traefik as a highly available ingress controller in Kubernetes with automatic HTTPS. One caveat: open-source Traefik does not synchronize ACME certificate state across replicas, so with replicas: 3 consider delegating certificate management to cert-manager instead of the built-in ACME resolver.

Rate Limiting Configuration

For public APIs, rate limiting is essential:

http:
  middlewares:
    api-rate-limit:
      rateLimit:
        average: 100
        burst: 200
        period: 1m
        sourceCriterion:
          requestHost: true

  routers:
    api:
      rule: "Host(`api.example.com`) && PathPrefix(`/v1`)"
      middlewares:
        - api-rate-limit
      service: api-service

The sourceCriterion with requestHost: true buckets requests by the request's Host rather than by client IP, which is useful when multiple domains point to the same API. (The default criterion is the client IP.)
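The per-source idea above amounts to keeping one counter per key. A sketch using a fixed window per key (simpler than the token bucket Traefik actually uses; `PerKeyLimiter` is a name invented for this example):

```python
# Sketch: per-key rate limiting with a fixed window. Each distinct key
# (e.g. the request's Host) gets its own counter. Illustrative only.
from collections import defaultdict

class PerKeyLimiter:
    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)
        self.window_start = defaultdict(float)

    def allow(self, key: str, now: float) -> bool:
        """Reset the key's window if expired, then count this request."""
        if now - self.window_start[key] >= self.window:
            self.window_start[key] = now
            self.counts[key] = 0
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = PerKeyLimiter(limit=100, window=60.0)
a = [limiter.allow("api.example.com", 0.0) for _ in range(101)]
b = limiter.allow("other.example.com", 0.0)
print(a.count(True), b)  # 100 True: one host exhausted, the other unaffected
```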

Circuit Breaker

For resilience, implement circuit breakers:

http:
  middlewares:
    my-circuit-breaker:
      circuitBreaker:
        expression: "NetworkErrorRatio() > 0.50"
        checkPeriod: 10s
        fallbackDuration: 10s
        recoveryDuration: 10s

  routers:
    backend:
      rule: "Host(`app.example.com`)"
      middlewares:
        - my-circuit-breaker
      service: backend

  services:
    backend:
      loadBalancer:
        servers:
          - url: "http://backend-1:8080"
          - url: "http://backend-2:8080"

The circuit breaker is a middleware in Traefik. If the network error ratio exceeds 50% (evaluated every checkPeriod), the circuit opens and Traefik returns errors immediately for the fallback duration, then probes during the recovery duration to test whether the service has recovered.
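The state machine behind this is small: closed (traffic flows), open (requests rejected until a recovery deadline). A rough sketch of that behaviour, with hypothetical names and a simplified trip condition:

```python
# Sketch: a minimal circuit breaker. When the observed error ratio over a
# batch of requests exceeds the threshold, the circuit opens for `recovery`
# seconds. Illustrative model, not Traefik's implementation.

class CircuitBreaker:
    def __init__(self, threshold: float, recovery: float):
        self.threshold = threshold   # error ratio that trips the circuit
        self.recovery = recovery     # seconds to stay open once tripped
        self.open_until = 0.0
        self.errors = 0
        self.total = 0

    def record(self, ok: bool, now: float):
        """Record a request outcome; trip the circuit on a bad batch."""
        self.total += 1
        self.errors += 0 if ok else 1
        if self.total >= 10 and self.errors / self.total > self.threshold:
            self.open_until = now + self.recovery
            self.errors = self.total = 0

    def allows(self, now: float) -> bool:
        return now >= self.open_until

cb = CircuitBreaker(threshold=0.5, recovery=10.0)
for _ in range(10):
    cb.record(ok=False, now=0.0)   # a batch of failures trips the circuit
print(cb.allows(5.0), cb.allows(12.0))  # False True
```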

Common Pitfalls and Troubleshooting

Even with careful planning, issues arise. Let's examine common problems and their solutions.

Configuration Not Applying

Symptom: You've updated Traefik configuration but nothing changes.

Causes:

  1. Dynamic configuration not reloading:

    • Check if provider file watching is enabled (--providers.file.watch=true)
    • Verify file path is correct
    • Check Traefik logs for parsing errors
  2. Router/service not found:

    • Verify middleware references exist
    • Check for typos in router service field
    • Ensure providers section includes the provider source

Solution:

# Check Traefik logs for errors
docker logs traefik --tail 100
kubectl logs -n traefik-system deployment/traefik

# Verify configuration is loaded
curl http://localhost:8080/api/http/routers | jq .

Health Checks Failing

Symptom: Backends marked unhealthy, no traffic flowing.

Causes:

  1. Health check path returns non-2xx status
  2. Health check interval too short for app startup time
  3. Network/firewall blocking health check requests

Solution:

  • Verify health endpoint works independently: curl http://backend:port/health
  • Adjust health check configuration: increase interval, timeout, or threshold
  • Check network connectivity between Traefik and backends

TLS Certificate Issues

Symptom: Browser shows certificate warnings, or Let's Encrypt challenges failing.

Causes:

  1. Port 80 not publicly reachable (required for the HTTP-01 challenge; TLS-ALPN-01 uses 443)
  2. DNS not pointing to Traefik correctly
  3. Rate limits from Let's Encrypt exceeded

Solution:

  • Ensure ports 80 and 443 are publicly accessible
  • Verify DNS A record resolves to Traefik's IP
  • Check Traefik logs for ACME-specific errors
  • Use --certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v2.api.letsencrypt.org/directory for testing

High Memory Usage

Symptom: Traefik pod/container using excessive memory.

Causes:

  1. Too many concurrent connections without limits
  2. Large number of routers/services (thousands)
  3. Debug logging enabled in production
  4. Dashboard enabled without access restrictions

Solution:

  • Set appropriate resource limits in Kubernetes
  • Reduce log level: --log.level=INFO (not DEBUG)
  • Disable dashboard in production or protect it
  • Consolidate middleware where possible

CORS Errors

Symptom: Browser console shows CORS policy blocking requests.

Solution: Add CORS middleware:

http:
  middlewares:
    cors:
      headers:
        accessControlAllowOrigin:
          - "https://app.example.com"
        accessControlAllowMethods:
          - "GET"
          - "POST"
          - "PUT"
          - "DELETE"
          - "OPTIONS"
        accessControlAllowHeaders:
          - "Authorization"
          - "Content-Type"
        accessControlMaxAge: 100
        addVaryHeader: true

Apply it to routers with middlewares: cors@file.

Metrics Not Appearing in Prometheus

Symptom: Prometheus targets show as down or metrics missing.

Causes:

  1. Metrics endpoint not exposed
  2. Prometheus scrape configuration wrong port/path
  3. Network policies blocking Prometheus access

Solution:

# Enable metrics in Traefik
--metrics.prometheus=true
--metrics.prometheus.entryPoint=traefik

# Create ServiceMonitor (if using Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - traefik-system
  selector:
    matchLabels:
      app: traefik
  endpoints:
  - port: admin
    interval: 30s
    path: /metrics

Comparing Traefik with Alternatives

We're examining Traefik in a series that includes Caddy. Let's clarify how Traefik compares, and when you might choose differently.

Traefik vs Caddy

This comparison will be covered in-depth in Parts 3 and 4, but let's establish high-level distinctions:

Aspect           | Traefik                                | Caddy
-----------------|----------------------------------------|---------------------------------------------
Primary use case | Kubernetes/container orchestration     | Simplicity and automatic HTTPS
Configuration    | Declarative (YAML, CRDs, labels)       | Caddyfile (simple) or JSON
Learning curve   | Moderate to steep                      | Gentle
Feature depth    | Very extensive middleware ecosystem    | Comprehensive but more opinionated
Auto-HTTPS       | Yes, via Let's Encrypt                 | Yes, automatic by default
Kubernetes       | Native ingress controller              | Supported, but the Caddyfile is first-class
Community        | Large, with commercial support options | Strong open source community

Rough rule of thumb:

  • Choose Traefik if you're heavily invested in Kubernetes and need advanced routing capabilities
  • Choose Caddy if you prioritize simplicity, excellent documentation, and straightforward HTTP/TLS needs

Traefik vs Traditional Proxies

Comparing to Nginx or HAProxy:

Aspect                   | Traefik                               | Nginx/HAProxy
-------------------------|---------------------------------------|------------------------------------------
Configuration            | Dynamic, automatic discovery          | Static configuration files
Reloads                  | Hot reload, zero downtime             | Master process reload (brief downtime risk)
Service discovery        | Built-in (Docker, Kubernetes, Consul) | Requires external tools/scripts
Configuration management | Declarative, version-controlled       | Manual updates or config management
Initial complexity       | Higher learning curve                 | Lower for simple setups
Operational overhead     | Low (automatic)                       | High (manual updates)
Performance              | Very good                             | Excellent (mature, optimized)

The trade-off is clear: Traefik trades some simplicity and raw performance for dramatically reduced operational overhead. If your infrastructure changes frequently, Traefik's automation usually outweighs its complexity cost.
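That trade-off is easiest to see with Docker labels. In this hypothetical Compose fragment (image name and domain are placeholders), routing is declared on the container itself, so scaling or replacing the service never touches a central proxy configuration — whereas the equivalent Nginx change means editing nginx.conf and reloading:

```yaml
# Hypothetical docker-compose fragment: Traefik reads these labels from
# the Docker API and builds the router and service automatically.
services:
  api:
    image: ghcr.io/example/api:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.routers.api.entrypoints=websecure"
      - "traefik.http.services.api.loadbalancer.server.port=8080"
```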

When NOT to Use Traefik

Let's be honest about limitations — Traefik isn't right for every scenario:

  • Simple static sites: Caddy or even Nginx is simpler
  • Non-containerized legacy applications: Traefik's value proposition is weakest here
  • Extreme performance requirements: Nginx or HAProxy may have a slight edge (though Traefik is fast)
  • Very constrained resources (ultra-low memory): Minimal proxies exist
  • Teams without container orchestration experience: Consider the learning curve

If you fall into these categories, that's fine — use the right tool for your needs. We'll cover Caddy in Part 3 as an alternative for simpler deployments.

Conclusion

Traefik represents a powerful option for organizations with cloud-native architectures and sophisticated DevOps practices. Its deep Kubernetes integration, extensive middleware ecosystem, and automatic service discovery make it particularly well-suited for container-heavy environments.

The key insight from examining Traefik is this: its value is as much operational as technical. By automatically discovering services and configuring routing, Traefik reduces toil, prevents configuration drift, and enables GitOps workflows. These benefits compound over time, especially in dynamic environments where services scale, change, and evolve frequently.

Over the coming articles in this series, we'll examine Caddy's simplicity-first approach, then provide a direct comparison to help you choose the right tool for your specific needs. Finally, we'll cover implementation best practices that apply regardless of which tool you select.

Optimizing your reverse proxy setup? Learn how our Infrastructure Consulting can streamline your operations.

Footnotes

  1. OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/

  2. Kubernetes. "Ingress Resources." Kubernetes Documentation, 2026. https://kubernetes.io/docs/concepts/services-networking/ingress/

  3. Google. "Service Level Objectives." Google SRE Workbook. https://sre.google/resources/practices-and-policies/service-level-objectives/

  4. OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/

  5. Netflix. "Chaos Engineering." Netflix Tech Blog. https://netflixtechblog.com/

  6. OneUptime. "How to Build API SLO Dashboards (Availability, Latency P99, Error Budget) from OpenTelemetry Metrics." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-api-slo-dashboards-opentelemetry-metrics/view

  7. OneUptime. "How to Define and Enforce Performance Budgets Using OpenTelemetry P50/P95/P99 Latency Histograms." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-otel-performance-budgets-latency-histograms/view

  8. OptyxStack. "Latency Distributions in Practice: Reading P50/P95/P99 Without Fooling Yourself." OptyxStack, February 2, 2026. https://optyxstack.com/performance/latency-distributions-in-practice-reading-p50-p95-p99-without-fooling-yourself

  9. W3C. "Trace Context Specification." W3C Trace Context, 2026. https://www.w3.org/TR/trace-context/

  10. Prometheus. "Prometheus Documentation." Prometheus.io, 2026. https://prometheus.io/docs/

  11. Grafana Labs. "Grafana Documentation." Grafana, 2026. https://grafana.com/docs/

  12. Google. "Service Level Objectives." Google SRE Workbook. https://sre.google/resources/practices-and-policies/service-level-objectives/

  13. Traefik Labs. "Traefik Mesh Documentation." Traefik, 2026. https://docs.traefik.io/traefik-mesh/

  14. Cloud Native Computing Foundation. "CNCF Landscape." CNCF, 2026. https://landscape.cncf.io/

  15. OneUptime. "How to Build API SLO Dashboards (Availability, Latency P99, Error Budget) from OpenTelemetry Metrics." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-api-slo-dashboards-opentelemetry-metrics/view

  16. OneUptime. "How to Define and Enforce Performance Budgets Using OpenTelemetry P50/P95/P99 Latency Histograms." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-otel-performance-budgets-latency-histograms/view

  17. OptyxStack. "Latency Distributions in Practice: Reading P50/P95/P99 Without Fooling Yourself." OptyxStack, February 2, 2026. https://optyxstack.com/performance/latency-distributions-in-practice-reading-p50-p95-p99-without-fooling-yourself

  18. Google. "Site Reliability Engineering: How Google Runs Production Systems." O'Reilly Media, 2016. https://sre.google/books/

  19. OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/

  20. Prometheus. "Prometheus Documentation." Prometheus.io, 2026. https://prometheus.io/docs/

  21. OneUptime. "How to Define and Enforce Performance Budgets Using OpenTelemetry P50/P95/P99 Latency Histograms." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-otel-performance-budgets-latency-histograms/view

  22. Google. "Site Reliability Engineering: How Google Runs Production Systems." O'Reilly Media, 2016. https://sre.google/books/

  23. OpenTelemetry. "OpenTelemetry Specifications." OpenTelemetry Documentation, 2026. https://opentelemetry.io/docs/specs/

  24. Prometheus. "Prometheus Documentation." Prometheus.io, 2026. https://prometheus.io/docs/

  25. Grafana Labs. "Grafana Documentation." Grafana, 2026. https://grafana.com/docs/

  26. OneUptime. "How to Build API SLO Dashboards (Availability, Latency P99, Error Budget) from OpenTelemetry Metrics." OneUptime Blog, February 6, 2026. https://oneuptime.com/blog/post/2026-02-06-api-slo-dashboards-opentelemetry-metrics/view

traefik reverse-proxy kubernetes