Canary Deployment

A canary deployment releases a new version gradually by controlling the percentage of traffic it receives.

Concept

       100% traffic
           |
     +-----+-----+
     |           |
   90% v1.0.0  10% v1.1.0-rc1   <- Initial canary
     |           |
   50% v1.0.0  50% v1.1.0-rc1   <- Ramping up
     |           |
     +-----+-----+
           |
     100% v1.1.0               <- Promoted

When to Use

  • Risky changes
  • Large new features
  • Infrastructure changes
  • Any deploy that needs validation in production

Manual Flow

1. Deploy the Canary Version

# Regular deploy (this becomes the canary version)
runner deploy meu-app --version v1.1.0-rc1

At this point:

  • Both versions are running
  • v1.0.0 receives 100% of the traffic
  • v1.1.0-rc1 receives 0%

2. Start at 10%

runner weight meu-app production v1.1.0-rc1 10

Distribution:

  • v1.0.0: 90%
  • v1.1.0-rc1: 10%

3. Monitor

Wait and monitor:

  • Error logs
  • Performance metrics
  • User feedback
  • Health checks

# Show the current distribution
runner weights meu-app -i production

# Show logs
docker logs meu-app_production_v1.1.0-rc1 --tail 100
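
As a rough aid for the log check above, the sketch below counts log lines that look like errors. The function name count_error_lines and the match pattern are assumptions made for illustration; they are not part of Runner, and the pattern should be adapted to the application's real log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that look like errors.
# Reads log text on stdin. The pattern is an assumption; adjust it to your log format.
count_error_lines() {
    # grep -c prints the number of matching lines. It exits 1 when nothing
    # matches, so "|| true" keeps the function usable under "set -e".
    grep -ciE 'error|exception|fatal' || true
}

# Usage (container name taken from the command above):
#   docker logs meu-app_production_v1.1.0-rc1 --tail 100 2>&1 | count_error_lines
```

Comparing this count before and after each weight change gives a quick signal, but it does not replace real error-rate metrics.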

4. Increase to 50%

If everything looks healthy:

runner weight meu-app production v1.1.0-rc1 50

5. Promote to 100%

If the canary remains healthy:

runner promote meu-app -i production v1.1.0-rc1

This command:

  1. Routes 100% of the traffic to v1.1.0-rc1
  2. Removes v1.0.0 from routing
  3. Cleans up the old container (optional)

Automatic Canary (v1.2.0)

Starting with v1.2.0, Runner supports automatic promotion of canary deployments.

Configuration in .deploy.yml

project: meu-app
routing_mode: traefik_dynamic    # REQUIRED for canary

instances:
  production:
    domain: app.meusite.com.br
    source:
      type: dist
    canary:
      enabled: true
      auto_promote:
        enabled: true
        initial_weight: 10       # Initial weight
        increment: 10            # Increment per step
        interval: 5m             # Interval between steps
        final_weight: 100        # Final weight
        health_check_before: true
      abort_on:
        - health_check_failed
        - error_rate_above:5%
        - latency_p99_above:500ms

Promotion Timeline

t=0:    Deploy canary v1.2.4 → weight=10%
t=5m:   Health OK → weight=20%
t=10m:  Health OK → weight=30%
...
t=40m:  Health OK → weight=90%
t=45m:  Health OK → weight=100% → PROMOTED
        → Removes the old version
        → Sends a success notification
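
The schedule above follows directly from the auto_promote values (10% initial, +10% per step, 5 minutes per step, 100% final). The sketch below reproduces that arithmetic. The function canary_timeline is hypothetical and only illustrates the calculation; how the real scheduler handles a final step that would overshoot is not documented here, so the clamping is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the weight schedule implied by auto_promote settings.
# Arguments: initial weight, increment, interval in minutes, final weight.
canary_timeline() {
    local weight=$1 increment=$2 interval=$3 final=$4
    local t=0
    if (( increment <= 0 )); then
        echo "increment must be positive" >&2
        return 1
    fi
    while (( weight < final )); do
        echo "t=${t}m weight=${weight}%"
        t=$(( t + interval ))
        weight=$(( weight + increment ))
    done
    # Assumption: the last step is clamped so it never exceeds the final weight.
    echo "t=${t}m weight=${final}%"
}
```

Calling canary_timeline 10 10 5 100 prints ten steps, starting at t=0m with 10% and ending at t=45m with 100%.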

Control Commands

# Show scheduler status
runner canary status meu-app -i production

# Pause promotion (keeps the current weight)
runner canary pause meu-app -i production

# Resume promotion
runner canary resume meu-app -i production

# Abort and roll back
runner canary abort meu-app -i production

Starting the Scheduler

The scheduler runs via runner serve:

runner serve                    # Auto-detects the mode
runner serve --mode cron        # Scheduler only
runner serve --tick-interval 30 # Tick every 30s

Or as a systemd service:

# /etc/systemd/system/runner-serve.service
[Unit]
Description=Runner Canary Scheduler
After=network.target

[Service]
Type=simple
ExecStart=/opt/runner/runner serve
WorkingDirectory=/opt/runner
Restart=always

[Install]
WantedBy=multi-user.target

Scheduler State

State is persisted under /opt/runner/state/canary/:

# /opt/runner/state/canary/meu-app_production.yml
project: meu-app
instance: production
stable_version: v1.2.3
canary_version: v1.2.4
current_weight: 30
state: running              # running | paused | promoted | aborted
started_at: "2026-02-16T14:00:00Z"
last_step_at: "2026-02-16T14:10:00Z"
next_step_at: "2026-02-16T14:15:00Z"
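
Because the state file is a flat list of key: value pairs, a script can read a single field without a YAML library. The sketch below is one way to do that. The function state_field is hypothetical, it is not a YAML parser, and it assumes the simple layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one top-level key from a flat "key: value" state file.
# This is not a YAML parser; it assumes the simple layout shown above.
state_field() {
    local file=$1 key=$2
    awk -v key="$key" '
        {
            # Split at the first ":" so timestamps containing ":" stay intact.
            i = index($0, ":")
            if (i == 0 || substr($0, 1, i - 1) != key) next
            value = substr($0, i + 1)
            sub(/[ \t]+#.*$/, "", value)          # drop a trailing comment
            gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim surrounding whitespace
            gsub(/^"|"$/, "", value)              # strip surrounding quotes
            print value
            exit
        }
    ' "$file"
}

# Usage:
#   state_field /opt/runner/state/canary/meu-app_production.yml current_weight
```

For anything beyond quick inspection, runner canary status remains the safer source, since the file layout may change between versions.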

Abort Conditions

abort_on:
  - health_check_failed         # Health check failed
  - error_rate_above:5%         # Error rate > 5%
  - latency_p99_above:500ms     # p99 latency > 500ms

When a condition is met:

  1. The scheduler stops immediately
  2. The canary weight is set to zero
  3. 100% of the traffic returns to the stable version
  4. An abort notification is sent
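
To make the three abort conditions concrete, here is a minimal sketch of how they could be evaluated against sampled metrics. The function abort_reason and its argument format are assumptions for illustration; Runner's actual evaluation logic and metric sources are not described in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a canary should abort.
# Arguments: health ("ok" or anything else), error rate in percent,
# p99 latency in milliseconds. Thresholds mirror the abort_on example (5%, 500ms).
# Prints the first matching condition and returns 0, or returns 1 if none match.
abort_reason() {
    local health=$1 error_rate=$2 p99_ms=$3
    if [[ $health != ok ]]; then
        echo "health_check_failed"
        return 0
    fi
    # awk handles fractional values such as 5.3; bash arithmetic is integer-only.
    if awk -v v="$error_rate" 'BEGIN { exit !(v > 5) }'; then
        echo "error_rate_above:5%"
        return 0
    fi
    if awk -v v="$p99_ms" 'BEGIN { exit !(v > 500) }'; then
        echo "latency_p99_above:500ms"
        return 0
    fi
    return 1
}
```

The comparisons are strict, so an error rate of exactly 5% or a p99 of exactly 500ms does not trigger an abort in this sketch.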

Canary Rollback

If something goes wrong at any stage:

Option 1: Reduce the Weight

# Back to 0%
runner weight meu-app production v1.1.0-rc1 0

Option 2: Abort (when using the scheduler)

runner canary abort meu-app -i production

Option 3: Destroy the Version

# Removes the canary version completely
runner destroy meu-app -i production --version v1.1.0-rc1

How It Works

Runner uses Traefik weighted services:

# /etc/traefik/dynamic/meu-app.yml (generated automatically)
http:
  services:
    meu-app-production:
      weighted:
        services:
          - name: meu-app-production-v1-0-0
            weight: 90
          - name: meu-app-production-v1-1-0-rc1
            weight: 10

When you adjust the weight:

  1. Runner updates the YAML file
  2. Traefik hot-reloads the configuration (~5s)
  3. Traffic is redistributed

Zero downtime: no container is restarted.
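
The generated file is simple enough to reproduce by hand, which helps when checking what Runner wrote. The sketch below renders the same structure for a given canary weight. Both functions are hypothetical, and the naming rule (dots in the version become hyphens) is inferred from the example file rather than confirmed by Runner's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that reproduce the layout of the generated file above.
# Assumption: service names replace the dots in the version with hyphens.
service_name() {
    local project=$1 instance=$2 version=$3
    echo "${project}-${instance}-${version//./-}"
}

# Arguments: project, instance, stable version, canary version, canary weight.
# Assumes exactly two versions, so the stable weight is 100 minus the canary weight.
render_weighted() {
    local project=$1 instance=$2 stable=$3 canary=$4 canary_weight=$5
    cat <<EOF
http:
  services:
    ${project}-${instance}:
      weighted:
        services:
          - name: $(service_name "$project" "$instance" "$stable")
            weight: $(( 100 - canary_weight ))
          - name: $(service_name "$project" "$instance" "$canary")
            weight: ${canary_weight}
EOF
}
```

Running render_weighted meu-app production v1.0.0 v1.1.0-rc1 10 produces the 90/10 split shown above, which can be compared against the file under /etc/traefik/dynamic/.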

Requirement: routing_mode

IMPORTANT: canary deployment requires routing_mode: traefik_dynamic.

# .deploy.yml
routing_mode: traefik_dynamic   # REQUIRED

instances:
  production:
    canary:
      enabled: true
      # ...

If canary.enabled: true is set without routing_mode: traefik_dynamic, Runner returns an error.

Reason: container labels (traefik_labels) do not support weighted routing.
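
The same requirement can be checked before a deploy, for example in CI. The sketch below is a hypothetical pre-flight check and is not part of Runner. It uses plain grep instead of a YAML parser, so it assumes routing_mode sits at the top level and that enabled: true appears on the line directly after canary:, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of Runner: fail early when a config
# enables canary without routing_mode: traefik_dynamic.
# Uses plain grep rather than a YAML parser, so it assumes the layout shown above.
check_routing_mode() {
    local file=$1
    # No "canary:" key followed by "enabled: true": nothing to verify.
    if ! grep -A1 -E '^[[:space:]]*canary:' "$file" | grep -qE 'enabled:[[:space:]]*true'; then
        return 0
    fi
    if grep -qE '^routing_mode:[[:space:]]*traefik_dynamic([[:space:]]|#|$)' "$file"; then
        return 0
    fi
    echo "canary.enabled is true but routing_mode is not traefik_dynamic" >&2
    return 1
}

# Usage:
#   check_routing_mode .deploy.yml || exit 1
```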

Complete Example

Manual

# Initial state
$ runner versions meu-app -i production
Versoes de meu-app (production):
  v1.0.0 (current, 100%)

# Deploy the canary
$ runner deploy meu-app --version v1.1.0-rc1
Deploy concluido: v1.1.0-rc1

# Start the canary
$ runner weight meu-app production v1.1.0-rc1 10
Peso atualizado:
  v1.0.0: 90%
  v1.1.0-rc1: 10%

# After monitoring
$ runner weight meu-app production v1.1.0-rc1 50
Peso atualizado:
  v1.0.0: 50%
  v1.1.0-rc1: 50%

# Promote
$ runner promote meu-app -i production v1.1.0-rc1
Versao v1.1.0-rc1 promovida para 100%

# Final state
$ runner versions meu-app -i production
Versoes de meu-app (production):
  v1.1.0-rc1 (current, 100%)
  v1.0.0

Automatic (v1.2.0)

# Configure in .deploy.yml (canary.auto_promote.enabled: true)

# The deploy starts the scheduler automatically
$ runner deploy meu-app --version v1.2.0
Deploy concluido: v1.2.0
Canary scheduler iniciado (10% → 100% em steps de 10%, 5m cada)

# Follow the progress
$ runner canary status meu-app -i production
Canary Status: meu-app (production)
  Stable:  v1.1.0
  Canary:  v1.2.0
  Weight:  30%
  State:   running
  Next:    +5m 40%
  ETA:     35m para 100%

# Pause if needed
$ runner canary pause meu-app -i production
Scheduler pausado em 30%

# Resume
$ runner canary resume meu-app -i production
Scheduler retomado

# Or abort
$ runner canary abort meu-app -i production
Canary abortado, rollback para v1.1.0

Best Practices

Version Naming

Use suffixes to mark canary versions:

  • v1.1.0-rc1 - Release candidate 1
  • v1.1.0-rc2 - Release candidate 2
  • v1.1.0 - Final version
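
A deploy script can enforce this convention before pushing a version to production. The sketch below classifies a tag with bash regular expressions. The function version_kind is hypothetical, and the accepted pattern (vMAJOR.MINOR.PATCH with an optional -rcN suffix) is an assumption based on the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a version tag by the suffix convention above.
# Assumes tags look like vMAJOR.MINOR.PATCH with an optional -rcN suffix.
version_kind() {
    local version=$1
    if [[ $version =~ ^v[0-9]+\.[0-9]+\.[0-9]+-rc[0-9]+$ ]]; then
        echo "release-candidate"
    elif [[ $version =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
        echo "final"
    else
        echo "unknown"
    fi
}

# Usage:
#   [ "$(version_kind "$VERSION")" = "unknown" ] && { echo "bad tag" >&2; exit 1; }
```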

Monitoring Time (Manual)

  • 10% → 50%: at least 15 minutes
  • 50% → 100%: at least 1 hour
  • Critical production: 24 hours at each stage

Auto-Promotion Interval

  Environment              Suggested Interval
  Staging                  1-2 minutes
  Production, low risk     5 minutes
  Production, critical     30-60 minutes

Metrics to Watch

  • Error rate (should stay the same)
  • Latency (should not increase)
  • CPU/memory (check consumption)
  • Error logs (watch for new errors)

Rollback Criteria

Roll back if:

  • The error rate increases
  • Latency increases significantly
  • Users report problems
  • Health checks fail

Script Automation

To script the manual canary flow:

#!/bin/bash
# canary.sh - steps a canary through fixed weights, rolling back on a failed health check
set -euo pipefail

# Require the version to deploy as the first argument
VERSION=${1:?usage: canary.sh <version>}
STEPS=(10 30 50 70 100)

runner deploy meu-app --version "$VERSION"

for weight in "${STEPS[@]}"; do
    echo "Setting weight to $weight%"
    runner weight meu-app production "$VERSION" "$weight"

    if [ "$weight" -lt 100 ]; then
        echo "Waiting 30 minutes..."
        sleep 1800

        # Check health before moving to the next step
        if ! runner health --app /data/apps/meu-app; then
            echo "Health check failed, rolling back!"
            runner weight meu-app production "$VERSION" 0
            exit 1
        fi
    fi
done

echo "Promoting to 100%"
runner promote meu-app -i production "$VERSION"

Notifications

Configure notifications for canary events:

# .deploy.yml
notify:
  channels:
    - integration: telegram
      events:
        - canary_weight_changed
        - canary_promoted
        - health_check_failed

Available events:

  • canary_weight_changed - Weight changed
  • canary_promoted - Promoted to 100%
  • health_check_failed - Health check failed (possible rollback)
By Borlot.com.br on 16/02/2026