Canary Deployment
A canary deployment releases a new version gradually by controlling the percentage of traffic it receives.
Concept
        100% traffic
             |
       +-----+-----+
       |           |
  90% v1.0.0   10% v1.1.0-rc1   <- Initial canary
       |           |
  50% v1.0.0   50% v1.1.0-rc1   <- Ramping up
       |           |
       +-----+-----+
             |
        100% v1.1.0             <- Promoted
When to Use
- Risky changes
- Large new features
- Infrastructure changes
- Any deploy that needs validation in production
Manual Flow
1. Deploy the Canary Version
# Normal deploy (this becomes the canary version)
runner deploy meu-app --version v1.1.0-rc1
At this point:
- Both versions are running
- v1.0.0 receives 100% of the traffic
- v1.1.0-rc1 receives 0%
2. Start at 10%
runner weight meu-app production v1.1.0-rc1 10
Distribution:
- v1.0.0: 90%
- v1.1.0-rc1: 10%
3. Monitor
Wait and monitor:
- Error logs
- Performance metrics
- User feedback
- Health checks
# Show the current distribution
runner weights meu-app -i production
# Show logs
docker logs meu-app_production_v1.1.0-rc1 --tail 100
4. Increase to 50%
If everything looks healthy:
runner weight meu-app production v1.1.0-rc1 50
5. Promote to 100%
If everything is still healthy:
runner promote meu-app -i production v1.1.0-rc1
This command:
- Routes 100% of the traffic to v1.1.0-rc1
- Removes v1.0.0 from routing
- Cleans up the old container (optional)
Automatic Canary (v1.2.0)
As of v1.2.0, Runner supports automatic promotion of canary deployments.
Configuration in .deploy.yml
project: meu-app
routing_mode: traefik_dynamic  # REQUIRED for canary
instances:
  production:
    domain: app.meusite.com.br
    source:
      type: dist
    canary:
      enabled: true
      auto_promote:
        enabled: true
        initial_weight: 10  # Initial weight
        increment: 10       # Increment per step
        interval: 5m        # Interval between steps
        final_weight: 100   # Final weight
        health_check_before: true
        abort_on:
          - health_check_failed
          - error_rate_above:5%
          - latency_p99_above:500ms
Promotion Timeline
t=0:   Deploy canary v1.2.4 → weight=10%
t=5m:  Health OK → weight=20%
t=10m: Health OK → weight=30%
...
t=40m: Health OK → weight=90%
t=45m: Health OK → weight=100% → PROMOTED
       → Remove the old version
       → Send success notification
Control Commands
# Show the scheduler status
runner canary status meu-app -i production
# Pause promotion (keeps the current weight)
runner canary pause meu-app -i production
# Resume promotion
runner canary resume meu-app -i production
# Abort and roll back
runner canary abort meu-app -i production
Starting the Scheduler
The scheduler runs via runner serve:
runner serve                     # Auto-detects the mode
runner serve --mode cron         # Scheduler only
runner serve --tick-interval 30  # Tick every 30s
Or as a systemd service:
# /etc/systemd/system/runner-serve.service
[Unit]
Description=Runner Canary Scheduler
After=network.target
[Service]
Type=simple
ExecStart=/opt/runner/runner serve
WorkingDirectory=/opt/runner
Restart=always
[Install]
WantedBy=multi-user.target
Scheduler State
The state is persisted under /opt/runner/state/canary/:
# /opt/runner/state/canary/meu-app_production.yml
project: meu-app
instance: production
stable_version: v1.2.3
canary_version: v1.2.4
current_weight: 30
state: running  # running | paused | promoted | aborted
started_at: "2026-02-16T14:00:00Z"
last_step_at: "2026-02-16T14:10:00Z"
next_step_at: "2026-02-16T14:15:00Z"
Abort Conditions
abort_on:
  - health_check_failed      # Health check failed
  - error_rate_above:5%      # Error rate > 5%
  - latency_p99_above:500ms  # p99 latency > 500ms
When a condition is met:
- The scheduler stops immediately
- The canary weight is set to zero
- 100% of the traffic returns to the stable version
- An abort notification is sent
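The abort conditions above amount to a set of threshold checks evaluated in order. The following is only a sketch of that logic: `should_abort` is a hypothetical helper, not a Runner command, and it assumes the metrics are already available as a health status, an error rate in whole percent, and a p99 latency in milliseconds. How Runner actually collects these values is not described here.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a canary should abort, mirroring the abort_on list.
# should_abort is a hypothetical helper (assumption), not part of Runner.
# Arguments: health status ("ok" or anything else), error rate in whole
# percent, p99 latency in milliseconds.
should_abort() {
  local health=$1 error_rate_pct=$2 latency_p99_ms=$3
  local max_error_rate_pct=5 max_latency_p99_ms=500

  if [ "$health" != "ok" ]; then
    echo "health_check_failed"
    return 0
  fi
  if [ "$error_rate_pct" -gt "$max_error_rate_pct" ]; then
    echo "error_rate_above:${max_error_rate_pct}%"
    return 0
  fi
  if [ "$latency_p99_ms" -gt "$max_latency_p99_ms" ]; then
    echo "latency_p99_above:${max_latency_p99_ms}ms"
    return 0
  fi
  return 1  # no condition met, promotion can continue
}

# Example: healthy service, 7% errors, 320 ms p99 latency.
if reason=$(should_abort ok 7 320); then
  echo "abort: $reason"
else
  echo "continue"
fi
```

The function prints the first condition that matched and returns success, so a caller can branch on the exit status and log the reason. Values exactly at the threshold (5%, 500 ms) do not trigger an abort, matching the "above" wording of the conditions.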
Canary Rollback
If something goes wrong at any stage:
Option 1: Reduce the Weight
# Back to 0%
runner weight meu-app production v1.1.0-rc1 0
Option 2: Abort (when using the scheduler)
runner canary abort meu-app -i production
Option 3: Destroy the Version
# Completely removes the canary version
runner destroy meu-app -i production --version v1.1.0-rc1
How It Works
Runner uses Traefik weighted services:
# /etc/traefik/dynamic/meu-app.yml (generated automatically)
http:
  services:
    meu-app-production:
      weighted:
        services:
          - name: meu-app-production-v1-0-0
            weight: 90
          - name: meu-app-production-v1-1-0-rc1
            weight: 10
When you adjust the weight:
- Runner updates the YAML file
- Traefik hot-reloads the configuration (~5s)
- Traffic is redistributed
Zero downtime: no container is restarted.
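To make the generated file less opaque, here is a rough sketch of how a block like the one above could be rendered from a project, an instance, two versions, and a canary weight. `render_weights` is a hypothetical helper written for illustration. Runner generates the real file itself, and the naming rule used here (dots in the version replaced by hyphens) is inferred from the example only.

```shell
#!/usr/bin/env bash
# Sketch: render a Traefik weighted-service block like the generated file.
# render_weights is a hypothetical helper (assumption), not part of Runner.
# Arguments: project, instance, stable version, canary version, canary weight.
render_weights() {
  local project=$1 instance=$2 stable=$3 canary=$4 canary_weight=$5
  # The stable version receives whatever the canary does not.
  local stable_weight=$((100 - canary_weight))
  # Service names in the example replace dots with hyphens (v1.0.0 -> v1-0-0).
  local stable_svc="${project}-${instance}-${stable//./-}"
  local canary_svc="${project}-${instance}-${canary//./-}"
  cat <<EOF
http:
  services:
    ${project}-${instance}:
      weighted:
        services:
          - name: ${stable_svc}
            weight: ${stable_weight}
          - name: ${canary_svc}
            weight: ${canary_weight}
EOF
}

render_weights meu-app production v1.0.0 v1.1.0-rc1 10
```

Only the two weight values change between steps, which is why a weight adjustment needs a file rewrite and a Traefik reload but no container restart.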
Requirement: routing_mode
IMPORTANT: Canary deployment requires routing_mode: traefik_dynamic.
# .deploy.yml
routing_mode: traefik_dynamic  # REQUIRED
instances:
  production:
    canary:
      enabled: true
      # ...
If canary.enabled: true is set without routing_mode: traefik_dynamic, Runner returns an error.
Reason: container labels (traefik_labels) do not support weighted routing.
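Since Runner rejects this combination anyway, a check in CI can surface the problem before a deploy is attempted. The sketch below is one possible preflight check, under clear assumptions: `check_routing_mode` is a hypothetical helper, it uses plain grep rather than a YAML parser, and it treats the mere presence of an indented canary: key as "canary is configured".

```shell
#!/usr/bin/env bash
# Sketch: fail early if a canary block exists without traefik_dynamic routing.
# check_routing_mode is a hypothetical helper (assumption), not part of Runner.
# It does not parse YAML; it only looks for the keys as written in this document.
check_routing_mode() {
  local file=$1
  # Nothing to verify when the file has no canary block.
  grep -Eq '^[[:space:]]+canary:' "$file" || return 0
  # routing_mode must be a top-level key set to traefik_dynamic.
  if grep -Eq '^routing_mode:[[:space:]]*traefik_dynamic([[:space:]]|#|$)' "$file"; then
    return 0
  fi
  echo "canary requires routing_mode: traefik_dynamic ($file)" >&2
  return 1
}

# Run the check only when a .deploy.yml exists in the current directory.
if [ -f .deploy.yml ]; then
  check_routing_mode .deploy.yml && echo "routing_mode OK"
fi
```

A text match like this is deliberately simple. A pipeline that already has a YAML parser available would be better served by reading the two keys properly.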
Complete Example
Manual
# Initial state
$ runner versions meu-app -i production
Versoes de meu-app (production):
  v1.0.0 (current, 100%)
# Deploy the canary
$ runner deploy meu-app --version v1.1.0-rc1
Deploy concluido: v1.1.0-rc1
# Start the canary
$ runner weight meu-app production v1.1.0-rc1 10
Peso atualizado:
  v1.0.0: 90%
  v1.1.0-rc1: 10%
# After monitoring
$ runner weight meu-app production v1.1.0-rc1 50
Peso atualizado:
  v1.0.0: 50%
  v1.1.0-rc1: 50%
# Promote
$ runner promote meu-app -i production v1.1.0-rc1
Versao v1.1.0-rc1 promovida para 100%
# Final state
$ runner versions meu-app -i production
Versoes de meu-app (production):
  v1.1.0-rc1 (current, 100%)
  v1.0.0
Automatic (v1.2.0)
# Configure in .deploy.yml (canary.auto_promote.enabled: true)
# The deploy starts the scheduler automatically
$ runner deploy meu-app --version v1.2.0
Deploy concluido: v1.2.0
Canary scheduler iniciado (10% → 100% em steps de 10%, 5m cada)
# Track progress
$ runner canary status meu-app -i production
Canary Status: meu-app (production)
  Stable: v1.1.0
  Canary: v1.2.0
  Weight: 30%
  State: running
  Next: +5m → 40%
  ETA: 35m para 100%
# Pause if needed
$ runner canary pause meu-app -i production
Scheduler pausado em 30%
# Resume
$ runner canary resume meu-app -i production
Scheduler retomado
# Or abort
$ runner canary abort meu-app -i production
Canary abortado, rollback para v1.1.0
Best Practices
Version Naming
Use suffixes to mark canary versions:
- v1.1.0-rc1: Release candidate 1
- v1.1.0-rc2: Release candidate 2
- v1.1.0: Final version
Monitoring Time (Manual)
- 10% → 50%: at least 15 minutes
- 50% → 100%: at least 1 hour
- Critical production: 24 hours at each stage
Auto-Promotion Interval
| Environment | Suggested Interval |
|---|---|
| Staging | 1-2 minutes |
| Low-risk production | 5 minutes |
| Critical production | 30-60 minutes |
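The interval directly determines how long a full rollout takes, so it helps to work out the schedule before choosing one. The sketch below prints the weight at each step for a given initial weight, increment, interval, and final weight. `canary_schedule` is a hypothetical helper, not a Runner command, and capping the last step at the final weight is an assumption about how uneven increments would be handled.

```shell
#!/usr/bin/env bash
# Sketch: print the weight schedule implied by auto_promote settings.
# canary_schedule is a hypothetical helper (assumption), not part of Runner.
# Arguments: initial weight, increment, interval in minutes, final weight.
canary_schedule() {
  local weight=$1 increment=$2 interval_min=$3 final=$4
  local elapsed=0
  # A non-positive increment would never reach the final weight.
  [ "$increment" -gt 0 ] || return 1
  while [ "$weight" -lt "$final" ]; do
    echo "t=${elapsed}m weight=${weight}%"
    weight=$((weight + increment))
    elapsed=$((elapsed + interval_min))
  done
  # Assumption: the last step is capped at the final weight.
  echo "t=${elapsed}m weight=${final}%"
}

# Settings from the configuration example: 10% start, +10% every 5 minutes.
canary_schedule 10 10 5 100
```

With the example settings the last line is reached at 45 minutes, which matches the promotion timeline. Moving the same configuration to a 30-minute interval stretches the rollout to 4.5 hours, which is worth knowing before applying the critical-production row of the table.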
Metrics to Watch
- Error rate (should stay the same)
- Latency (should not increase)
- CPU/memory (check consumption)
- Error logs (look for new errors)
Rollback Criteria
Roll back if:
- The error rate increases
- Latency increases significantly
- Users report problems
- Health checks fail
Script Automation
To script the manual canary flow:
#!/bin/bash
# canary.sh: step the canary through fixed weights, checking health between steps
set -euo pipefail
VERSION=${1:?usage: canary.sh <version>}
STEPS=(10 30 50 70 100)
runner deploy meu-app --version "$VERSION"
for weight in "${STEPS[@]}"; do
  echo "Setting weight to $weight%"
  runner weight meu-app production "$VERSION" "$weight"
  if [ "$weight" -lt 100 ]; then
    echo "Waiting 30 minutes..."
    sleep 1800
    # Check health; on failure send all traffic back to the stable version
    if ! runner health --app /data/apps/meu-app; then
      echo "Health check failed, rolling back!"
      runner weight meu-app production "$VERSION" 0
      exit 1
    fi
  fi
done
echo "Promoting to 100%"
runner promote meu-app -i production "$VERSION"
Notifications
Configure notifications for canary events:
# .deploy.yml
notify:
  channels:
    - integration: telegram
      events:
        - canary_weight_changed
        - canary_promoted
        - health_check_failed
Available events:
- canary_weight_changed: Weight changed
- canary_promoted: Promoted to 100%
- health_check_failed: Health check failed (possible rollback)