Building Real AI Agents with the Copilot SDK in Go

If you follow the GitHub Copilot ecosystem, you have probably heard of *.agent.md files. They are great for simple things, basically a boosted prompt that runs inside Copilot. But when you need a real agent that calls APIs, queries databases, applies permission policies, injects context through RAG, and runs in production as a microservice… Markdown is not enough.

That is where the GitHub Copilot SDK comes in.

In this post, I will show how to build an incident response agent, from scratch to a deploy-ready container, using the SDK in Go. The idea is that, by the end, you will have a complete view of everything the SDK offers and why it exists.

What does the Copilot SDK do that an agent.md does not?

Before diving into the code, it is worth understanding the gap:

Capability | *.agent.md | Copilot SDK
Custom tools in native code (without the overhead of an external MCP) | ⚠️ via a separate MCP server |
Hooks (intercept prompts, tool calls, errors) | ⚠️ PostToolUse via shell (Preview in VS Code) |
Permission control (approve/deny by type) | |
Streaming with delta events | |
Elicitation UI (forms, selections) | |
Infinite sessions with automatic compaction | |
Granular system prompt control (by section) | |
Programmatic multi-session orchestration | ⚠️ subagents + declarative handoffs in VS Code |
BYOK (Bring Your Own Key) | |
Telemetry (native OpenTelemetry) | |
Embedding in a backend/CLI/worker | |
Custom slash commands | |
Built-in tool overrides | |

The agent.md is great for static instructions. The SDK is for when you need real logic.

Architecture: how the SDK works

The Copilot SDK follows a two-process architecture. Your application talks to the SDK, which communicates with the Copilot CLI over JSON-RPC:

Your Application (API/Worker/CLI)
  SDK Client
       ↓ JSON-RPC (stdio or TCP)
  Copilot CLI (headless mode)
  ☁️ GitHub Copilot / model provider
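
To make the JSON-RPC layer less abstract, here is a minimal sketch of what a JSON-RPC 2.0 request and response look like when encoded in Go. It only illustrates the envelope format from the JSON-RPC 2.0 specification. The method name `example.ping` and its parameters are hypothetical, and the real wire framing between the SDK and the CLI is handled for you and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// rpcError is the error object a JSON-RPC 2.0 response may carry.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// rpcResponse is a JSON-RPC 2.0 response envelope.
type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      int             `json:"id"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// encodeRequest serializes one call. The method names used with it here are
// hypothetical, not the SDK's real ones.
func encodeRequest(id int, method string, params any) ([]byte, error) {
	return json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
}

// decodeResponse parses a reply and converts a JSON-RPC error into a Go error.
func decodeResponse(raw []byte) (json.RawMessage, error) {
	var resp rpcResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if resp.Error != nil {
		return nil, fmt.Errorf("rpc error %d: %s", resp.Error.Code, resp.Error.Message)
	}
	return resp.Result, nil
}

func main() {
	req, err := encodeRequest(1, "example.ping", map[string]string{"message": "health"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))

	result, err := decodeResponse([]byte(`{"jsonrpc":"2.0","id":1,"result":{"ok":true}}`))
	fmt.Println(string(result), err)
}
```

The point of the two-process split is that your code never builds these envelopes by hand: the SDK client does it and exposes typed Go calls instead.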

The Go SDK has an important advantage over the other SDKs: it can embed the CLI binary directly into your Go binary using //go:embed. This means your final artifact is a single static binary with no external dependencies, perfect for distroless containers.

What we are going to build

An Incident Commander Agent, an agent that on-call engineers can trigger during a production incident. It:

  • Queries metrics in Prometheus automatically

  • Fetches runbooks from a knowledge base

  • Injects context from past incidents through RAG

  • Scales services in Kubernetes and rolls back deploys

  • Requires confirmation before destructive actions

  • Records everything in an audit log

  • Exports traces through OpenTelemetry

All of this is packaged in a single Go binary inside a container of roughly 60-90MB.

Setting up the project

mkdir incident-commander && cd incident-commander
go mod init github.com/seu-usuario/incident-commander

# Add the SDK
go get github.com/github/copilot-sdk/go

# Add the bundler as a tool (one-time setup)
go get -tool github.com/github/copilot-sdk/go/cmd/bundler

Project structure

incident-commander/
├── main.go                       # Entry point + HTTP server
├── agent.go                      # Session, permissions, hooks
├── tools.go                      # Custom tools
├── webhooks.go                   # Webhooks (PagerDuty, Alertmanager)
├── slack.go                      # Slack integration
├── cron.go                       # Proactive health checks
├── go.mod
├── go.sum
├── Dockerfile
├── docker-compose.yml
├── zcopilot_*_linux_amd64.zst      # ← generated by the bundler
├── zcopilot_*_linux_amd64.license  # ← generated by the bundler
└── zcopilot_linux_amd64.go         # ← generated by the bundler

Custom tools

This is where the SDK shines. Each tool is a typed Go function that the model can call when needed:

package main

import (
	"fmt"
	copilot "github.com/github/copilot-sdk/go"
)

type QueryPrometheusParams struct {
	Query    string `json:"query" jsonschema:"PromQL query string"`
	Duration string `json:"duration,omitempty" jsonschema:"Time range (e.g. 5m, 1h). Default: 5m"`
}

type GetRunbookParams struct {
	Service string `json:"service" jsonschema:"Service name to look up runbook for"`
}

type ScaleServiceParams struct {
	Service  string `json:"service" jsonschema:"Kubernetes service/deployment name"`
	Replicas int    `json:"replicas" jsonschema:"Target replica count"`
}

type RollbackDeployParams struct {
	Service string `json:"service" jsonschema:"Service to rollback"`
	Version string `json:"version,omitempty" jsonschema:"Target version. If empty, rolls back to previous."`
}

func incidentTools() []copilot.Tool {
	queryPrometheus := copilot.DefineTool("query_prometheus",
		"Query Prometheus for metrics. Use for CPU, memory, error rates, latency percentiles, and Kafka consumer lag.",
		func(params QueryPrometheusParams, inv copilot.ToolInvocation) (any, error) {
			duration := params.Duration
			if duration == "" {
				duration = "5m"
			}
			result, err := prometheusQuery(params.Query, duration)
			if err != nil {
				return nil, fmt.Errorf("prometheus query failed: %w", err)
			}
			return result, nil
		})
	queryPrometheus.SkipPermission = true // read-only, auto-approve

	getRunbook := copilot.DefineTool("get_runbook",
		"Retrieve the incident runbook for a service from the knowledge base.",
		func(params GetRunbookParams, inv copilot.ToolInvocation) (any, error) {
			runbook, err := fetchRunbook(params.Service)
			if err != nil {
				return nil, err
			}
			return runbook, nil
		})
	getRunbook.SkipPermission = true

	scaleService := copilot.DefineTool("scale_service",
		"Scale a Kubernetes deployment. Requires engineer confirmation.",
		func(params ScaleServiceParams, inv copilot.ToolInvocation) (any, error) {
			if err := kubeScale(params.Service, params.Replicas); err != nil {
				return nil, err
			}
			return fmt.Sprintf("Scaled %s to %d replicas", params.Service, params.Replicas), nil
		})

	rollbackDeploy := copilot.DefineTool("rollback_deploy",
		"Rollback a service deployment via the CD pipeline. DESTRUCTIVE, requires confirmation.",
		func(params RollbackDeployParams, inv copilot.ToolInvocation) (any, error) {
			version, err := triggerRollback(params.Service, params.Version)
			if err != nil {
				return nil, err
			}
			return fmt.Sprintf("Rollback of %s to %s initiated", params.Service, version), nil
		})

	return []copilot.Tool{queryPrometheus, getRunbook, scaleService, rollbackDeploy}
}

Notice SkipPermission = true on the read tools. Tools that only query data do not need to ask for permission. Meanwhile, scale_service and rollback_deploy will go through the permission handler, which we will see next.

DefineTool uses generics and the jsonschema-go package to generate the JSON Schema automatically from the Go structs. The jsonschema:"..." tags become the parameter descriptions for the model.
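
To show the idea behind schema generation, here is a hand-rolled sketch that walks a struct with reflection and turns the `json` and `jsonschema` tags into a small JSON Schema. It is not how the jsonschema-go package is implemented, and it only handles flat structs with string, integer, and boolean fields. The `schemaFor`, `objectSchema`, and `property` names are made up for this example, and the rule that fields without `omitempty` are required is an assumption of the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// ScaleServiceParams is copied from the tool definitions above.
type ScaleServiceParams struct {
	Service  string `json:"service" jsonschema:"Kubernetes service/deployment name"`
	Replicas int    `json:"replicas" jsonschema:"Target replica count"`
}

// QueryPrometheusParams is copied from the tool definitions above.
type QueryPrometheusParams struct {
	Query    string `json:"query" jsonschema:"PromQL query string"`
	Duration string `json:"duration,omitempty" jsonschema:"Time range (e.g. 5m, 1h). Default: 5m"`
}

// property and objectSchema are hypothetical types covering a tiny subset of JSON Schema.
type property struct {
	Type        string `json:"type"`
	Description string `json:"description,omitempty"`
}

type objectSchema struct {
	Type       string              `json:"type"`
	Properties map[string]property `json:"properties"`
	Required   []string            `json:"required,omitempty"`
}

// schemaFor builds a schema for a flat struct type T from its field tags.
// Fields without omitempty are treated as required (an assumption of this sketch).
func schemaFor[T any]() objectSchema {
	var zero T
	t := reflect.TypeOf(zero)
	s := objectSchema{Type: "object", Properties: map[string]property{}}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, opts, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "" {
			name = f.Name
		}
		p := property{Description: f.Tag.Get("jsonschema")}
		switch f.Type.Kind() {
		case reflect.String:
			p.Type = "string"
		case reflect.Int, reflect.Int32, reflect.Int64:
			p.Type = "integer"
		case reflect.Bool:
			p.Type = "boolean"
		default:
			p.Type = "object"
		}
		s.Properties[name] = p
		if !strings.Contains(opts, "omitempty") {
			s.Required = append(s.Required, name)
		}
	}
	return s
}

func main() {
	out, err := json.MarshalIndent(schemaFor[ScaleServiceParams](), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The practical consequence is that the struct is the single source of truth: renaming a field or editing a tag changes what the model sees, with no separate schema file to keep in sync.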

Permissions: fine-grained control over what the agent can do

The permission handler is mandatory in the SDK. You need to explicitly decide what to approve or deny. This is intentional design: production agents need guardrails.

package main

import (
	"log"
	copilot "github.com/github/copilot-sdk/go"
)

func incidentPermissionHandler() copilot.PermissionHandlerFunc {
	return func(req copilot.PermissionRequest, inv copilot.PermissionInvocation) (copilot.PermissionRequestResult, error) {
		switch req.Kind {
		case "shell":
			// Never allow shell command execution
			log.Printf("[PERMISSION DENIED] Shell command blocked: %v", req.FullCommandText)
			return copilot.PermissionRequestResult{
				Kind: copilot.PermissionRequestResultKindDeniedByRules,
			}, nil

		case "custom_tool":
			toolName := ""
			if req.ToolName != nil {
				toolName = *req.ToolName
			}

			switch toolName {
			case "query_prometheus", "get_runbook":
				return copilot.PermissionRequestResult{
					Kind: copilot.PermissionRequestResultKindApproved,
				}, nil
			case "scale_service", "rollback_deploy":
				// Approve, but log for the audit trail. Note: this does not prompt the
				// engineer; add a confirmation step here if one is required.
				log.Printf("[AUDIT] Destructive tool approved: %s (call: %v)", toolName, req.ToolCallID)
				return copilot.PermissionRequestResult{
					Kind: copilot.PermissionRequestResultKindApproved,
				}, nil
			}
		}

		// Default: approve
		return copilot.PermissionRequestResult{
			Kind: copilot.PermissionRequestResultKindApproved,
		}, nil
	}
}

This is impossible with agent.md. There, the agent runs with Copilot’s default permissions. Here, you define the policy.
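
The handler above approves anything it does not recognize. A stricter variation is to drive decisions from a table and deny by default. The sketch below shows only that decision logic with standard-library types. The `policy` and `decision` types are hypothetical and not part of the SDK, and you would still map the outcome to a `PermissionRequestResult` inside your handler.

```go
package main

import "fmt"

// decision is a hypothetical outcome type for this sketch.
type decision int

const (
	deny decision = iota
	approve
	approveWithAudit
)

// policy maps request kinds and custom tool names to decisions.
// Anything not listed is denied.
type policy struct {
	kinds map[string]decision
	tools map[string]decision
}

// decide returns the decision for a request; unknown kinds and tools are denied.
func (p policy) decide(kind, tool string) decision {
	if kind == "custom_tool" {
		if d, ok := p.tools[tool]; ok {
			return d
		}
		return deny
	}
	if d, ok := p.kinds[kind]; ok {
		return d
	}
	return deny
}

// incidentPolicy mirrors the rules from the handler above, minus the approve-by-default fallback.
func incidentPolicy() policy {
	return policy{
		kinds: map[string]decision{"shell": deny},
		tools: map[string]decision{
			"query_prometheus": approve,
			"get_runbook":      approve,
			"scale_service":    approveWithAudit,
			"rollback_deploy":  approveWithAudit,
		},
	}
}

func main() {
	p := incidentPolicy()
	fmt.Println(p.decide("custom_tool", "get_runbook") == approve)
	fmt.Println(p.decide("custom_tool", "rollback_deploy") == approveWithAudit)
	fmt.Println(p.decide("custom_tool", "drop_database") == deny)
}
```

Keeping the rules in data rather than in a switch makes the policy easy to review and to unit test, at the cost of having to list every tool you add.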

Hooks: RAG, auditing, and error handling

Hooks are interceptors that run at specific points in the session lifecycle. We will use three:

  1. OnUserPromptSubmitted: injects context from past incidents (RAG) before the model processes it

  2. OnPostToolUse: logs every tool execution

  3. OnErrorOccurred: defines the retry/skip/abort strategy

package main

import (
	"fmt"
	"log"
	copilot "github.com/github/copilot-sdk/go"
)

func incidentHooks() *copilot.SessionHooks {
	return &copilot.SessionHooks{
		// RAG: enrich each prompt with similar past incidents
		OnUserPromptSubmitted: func(input copilot.UserPromptSubmittedHookInput, inv copilot.HookInvocation) (*copilot.UserPromptSubmittedHookOutput, error) {
			// Named pastIncidents rather than context to avoid shadowing the standard package name.
			pastIncidents, err := ragSearch(input.Prompt)
			if err != nil {
				log.Printf("[RAG] Search failed (non-fatal): %v", err)
				return &copilot.UserPromptSubmittedHookOutput{
					ModifiedPrompt: input.Prompt,
				}, nil
			}

			enriched := fmt.Sprintf(
				"%s\n\n<past_incidents>\n%s\n</past_incidents>",
				input.Prompt, pastIncidents,
			)
			return &copilot.UserPromptSubmittedHookOutput{
				ModifiedPrompt: enriched,
			}, nil
		},

		// Audit: log every tool execution
		OnPostToolUse: func(input copilot.PostToolUseHookInput, inv copilot.HookInvocation) (*copilot.PostToolUseHookOutput, error) {
			log.Printf("[TOOL] %s completed (session: %s)", input.ToolName, inv.SessionID)
			return &copilot.PostToolUseHookOutput{}, nil
		},

		// Errors: retry automatically, intended for transient failures
		OnErrorOccurred: func(input copilot.ErrorOccurredHookInput, inv copilot.HookInvocation) (*copilot.ErrorOccurredHookOutput, error) {
			log.Printf("[ERROR] %s in context %s", input.Error, input.ErrorContext)
			return &copilot.ErrorOccurredHookOutput{
				ErrorHandling: "retry",
			}, nil
		},
	}
}

func ragSearch(query string) (string, error) {
	// In practice: query your vector DB (pgvector, Pinecone, Weaviate, etc.)
	return "Similar incident (INC-2024-0312): p99 latency spike caused by connection pool exhaustion. Resolved by scaling to 8 replicas and restarting pods.", nil
}

The RAG hook is particularly powerful. Every time the engineer sends a message, the hook intercepts the prompt before it reaches the model, queries a vector database of past postmortems, and injects the most relevant results as additional context. The model receives the engineer’s message together with similar incidents, without the engineer having to search manually. This is completely transparent to the person using it.

Assembling the agent session

Now we put everything together: tools, hooks, permissions, and system prompt configuration:

package main

import (
	"context"
	copilot "github.com/github/copilot-sdk/go"
)

func createIncidentSession(ctx context.Context, client *copilot.Client, sessionID, prompt string) (*copilot.Session, error) {
	return client.CreateSession(ctx, &copilot.SessionConfig{
		SessionID: sessionID,
		Model:     "gpt-4.1",
		Streaming: true,
		Tools:     incidentTools(),
		Hooks:     incidentHooks(),
		SystemMessage: &copilot.SystemMessageConfig{
			Mode: "customize",
			Sections: map[string]copilot.SectionOverride{
				copilot.SectionIdentity: {
					Action:  "replace",
					Content: "You are an Incident Commander assistant. You help on-call engineers diagnose and resolve production incidents.",
				},
				copilot.SectionCodeChangeRules: {Action: "remove"},
				copilot.SectionGuidelines: {
					Action: "append",
					Content: `
* Always check metrics before suggesting a remediation.
* Never execute a rollback without explicit engineer confirmation.
* Log every destructive action.`,
				},
			},
			Content: "Focus on incident triage, diagnosis, and remediation for a microservices platform.",
		},
		OnPermissionRequest: incidentPermissionHandler(),
		InfiniteSessions: &copilot.InfiniteSessionConfig{
			Enabled:                       copilot.Bool(true),
			BackgroundCompactionThreshold: copilot.Float64(0.80),
		},
	})
}

Notice the SystemMessage with Mode: "customize". Instead of replacing the entire prompt or only concatenating text at the end, the SDK lets you control it section by section: replace the identity, remove code rules (irrelevant for this agent), and add specific guidelines. The other sections (safety, tool instructions, etc.) are preserved automatically.

InfiniteSessions with BackgroundCompactionThreshold: 0.80 makes the SDK automatically compact the history in the background when the context reaches 80% of the window, without the engineer noticing. This is essential for long incidents with hundreds of messages.
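
As a worked example of what the 0.80 threshold means, the sketch below computes the fill ratio of a context window and compares it to the threshold. The SDK's actual trigger logic is internal and not shown in this post. The `shouldCompact` function and the 128,000-token window are assumptions chosen for illustration.

```go
package main

import "fmt"

// shouldCompact reports whether the used fraction of the context window has
// reached the threshold. This is illustrative arithmetic, not SDK code.
func shouldCompact(usedTokens, windowTokens int, threshold float64) bool {
	if windowTokens <= 0 {
		return false
	}
	return float64(usedTokens)/float64(windowTokens) >= threshold
}

func main() {
	const window = 128000 // hypothetical context window size in tokens
	for _, used := range []int{64000, 90000, 110000} {
		ratio := float64(used) / float64(window)
		fmt.Printf("used=%d ratio=%.2f compact=%v\n", used, ratio, shouldCompact(used, window, 0.80))
	}
}
```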

Triggers: when and how is the agent triggered?

The agent is an HTTP server that keeps running. The question is: who calls it, and when? There are several patterns, and in practice you combine more than one.

PagerDuty webhook (main trigger)

When an alert fires, PagerDuty sends a webhook. The agent receives it, creates a session, runs the initial diagnosis, and posts to Slack:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	copilot "github.com/github/copilot-sdk/go"
)

type PagerDutyWebhook struct {
	Event struct {
		EventType string `json:"event_type"`
		Data      struct {
			ID      string `json:"id"`
			Title   string `json:"title"`
			Service struct {
				Name string `json:"name"`
			} `json:"service"`
			Urgency   string `json:"urgency"`
			Assignees []struct {
				Summary string `json:"summary"`
			} `json:"assignees"`
		} `json:"data"`
	} `json:"event"`
}

func handlePagerDutyWebhook(ctx context.Context, client *copilot.Client, w http.ResponseWriter, r *http.Request) {
	var webhook PagerDutyWebhook
	if err := json.NewDecoder(r.Body).Decode(&webhook); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	if webhook.Event.EventType != "incident.triggered" {
		w.WriteHeader(http.StatusOK)
		return
	}

	data := webhook.Event.Data
	sessionID := fmt.Sprintf("incident-%s", data.ID)

	prompt := fmt.Sprintf(
		`A new incident has been triggered:
- **Incident**: %s
- **Service**: %s
- **Urgency**: %s
- **ID**: %s

Please:
1. Query Prometheus for the current health of service "%s" (error rate, p99 latency, CPU, memory)
2. Fetch the runbook for this service
3. Provide an initial triage summary with likely root cause and recommended actions`,
		data.Title, data.Service.Name, data.Urgency, data.ID, data.Service.Name,
	)

	session, err := createIncidentSession(ctx, client, sessionID, prompt)
	if err != nil {
		log.Printf("[WEBHOOK] Failed to create session: %v", err)
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	response, err := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: prompt})
	if err != nil {
		log.Printf("[WEBHOOK] Agent failed: %v", err)
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	if d, ok := response.Data.(*copilot.AssistantMessageData); ok {
		postToSlack(data.Service.Name, data.ID, sessionID, d.Content)
	}

	w.WriteHeader(http.StatusAccepted)
}
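
Because this endpoint can start an agent that has remediation tools, it is worth rejecting requests that do not come from PagerDuty. The sketch below verifies an HMAC-SHA256 signature over the raw request body. It assumes the sender puts one or more comma-separated `v1=<hex digest>` values in a signature header, which is how I understand PagerDuty's V3 webhooks to work; confirm the header name and format against their documentation. The `validSignature` and `sign` helpers are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign returns the "v1=<hex>" signature for a body. The sender computes this;
// it is included here so the example can be run end to end.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "v1=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature reports whether any v1 signature in the header matches the
// HMAC-SHA256 of the body. The header format is an assumption of this sketch.
func validSignature(secret, body []byte, header string) bool {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	want := mac.Sum(nil)

	for _, part := range strings.Split(header, ",") {
		sig, ok := strings.CutPrefix(strings.TrimSpace(part), "v1=")
		if !ok {
			continue
		}
		got, err := hex.DecodeString(sig)
		// hmac.Equal compares in constant time to avoid leaking timing information.
		if err == nil && hmac.Equal(got, want) {
			return true
		}
	}
	return false
}

func main() {
	secret := []byte("webhook-signing-secret") // hypothetical secret
	body := []byte(`{"event":{"event_type":"incident.triggered"}}`)

	header := sign(secret, body)
	fmt.Println(validSignature(secret, body, header))
	fmt.Println(validSignature(secret, []byte("tampered"), header))
}
```

In the handler, you would read the whole body with io.ReadAll, run this check, and only then decode the JSON from the bytes you verified.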

Slack Bot (interactive conversation)

After the initial diagnosis, the engineer continues the conversation through Slack. Each message is a SendAndWait in the same session:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"strings"
	"sync"

	copilot "github.com/github/copilot-sdk/go"
)

var threadToSession sync.Map

type SlackEvent struct {
	Type      string `json:"type"`
	Challenge string `json:"challenge"`
	Event     struct {
		Type     string `json:"type"`
		Text     string `json:"text"`
		Channel  string `json:"channel"`
		ThreadTS string `json:"thread_ts"`
		TS       string `json:"ts"`
		User     string `json:"user"`
	} `json:"event"`
}

func handleSlackEvent(ctx context.Context, client *copilot.Client, w http.ResponseWriter, r *http.Request) {
	var event SlackEvent
	if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	if event.Type == "url_verification" {
		json.NewEncoder(w).Encode(map[string]string{"challenge": event.Challenge})
		return
	}

	threadTS := event.Event.ThreadTS
	if threadTS == "" {
		threadTS = event.Event.TS
	}

	message := stripBotMention(event.Event.Text)
	if message == "" {
		w.WriteHeader(http.StatusOK)
		return
	}

	sessionIDVal, ok := threadToSession.Load(threadTS)
	if !ok {
		w.WriteHeader(http.StatusOK)
		return
	}
	sessionID := sessionIDVal.(string)

	go func() {
		session, err := client.ResumeSession(ctx, sessionID, &copilot.ResumeSessionConfig{
			OnPermissionRequest: incidentPermissionHandler(),
		})
		if err != nil {
			log.Printf("[SLACK] Failed to resume session %s: %v", sessionID, err)
			return
		}
		defer session.Disconnect()

		response, err := session.SendAndWait(ctx, copilot.MessageOptions{
			Prompt: fmt.Sprintf("[Engineer %s]: %s", event.Event.User, message),
		})
		if err != nil {
			log.Printf("[SLACK] Agent failed for session %s: %v", sessionID, err)
			return
		}

		if d, ok := response.Data.(*copilot.AssistantMessageData); ok {
			postToSlackThread(event.Event.Channel, threadTS, d.Content)
		}
	}()

	w.WriteHeader(http.StatusOK)
}

func stripBotMention(text string) string {
	if idx := strings.Index(text, "> "); idx != -1 {
		return strings.TrimSpace(text[idx+2:])
	}
	return strings.TrimSpace(text)
}
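
The handler reads from threadToSession, but something has to write to it when the first diagnosis is posted. Here is a minimal sketch of that bookkeeping as a small wrapper around sync.Map. The `threadRegistry` type and its methods are hypothetical; in the code above, the natural place to call Register would be wherever postToSlack learns the timestamp of the message it created.

```go
package main

import (
	"fmt"
	"sync"
)

// threadRegistry maps a Slack thread timestamp to an agent session ID.
// It lives in memory, so it is lost on restart and not shared between pods.
type threadRegistry struct {
	m sync.Map
}

// Register records which session a thread belongs to.
func (r *threadRegistry) Register(threadTS, sessionID string) {
	r.m.Store(threadTS, sessionID)
}

// Lookup returns the session ID for a thread, if one was registered.
func (r *threadRegistry) Lookup(threadTS string) (string, bool) {
	v, ok := r.m.Load(threadTS)
	if !ok {
		return "", false
	}
	id, ok := v.(string)
	return id, ok
}

// Forget removes a thread, for example once the incident is resolved.
func (r *threadRegistry) Forget(threadTS string) {
	r.m.Delete(threadTS)
}

func main() {
	var reg threadRegistry
	reg.Register("1700000000.000100", "incident-PABC123") // hypothetical values

	id, ok := reg.Lookup("1700000000.000100")
	fmt.Println(id, ok)

	reg.Forget("1700000000.000100")
	_, ok = reg.Lookup("1700000000.000100")
	fmt.Println(ok)
}
```

With more than one replica, this mapping would need to move to shared storage so that any pod can resume the right session.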

Alertmanager webhook (directly from alerts)

You can skip PagerDuty and react directly to Prometheus alerts:

# alertmanager.yml
route:
  receiver: incident-commander
  group_by: ['alertname', 'service']
  group_wait: 30s

receivers:
  - name: incident-commander
    webhook_configs:
      - url: 'http://incident-commander:8080/api/webhook/alertmanager'
        send_resolved: true

Cron (proactive health checks)

The agent does not have to be only reactive. It can run periodically and find problems before they become incidents:

package main

import (
	"context"
	"fmt"
	"log"
	"strings"
	"time"

	copilot "github.com/github/copilot-sdk/go"
)

func StartHealthCheckCron(ctx context.Context, client *copilot.Client, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			runProactiveCheck(ctx, client)
		}
	}
}

func runProactiveCheck(ctx context.Context, client *copilot.Client) {
	sessionID := fmt.Sprintf("healthcheck-%d", time.Now().Unix())

	// createIncidentSession only configures the session, so the same
	// instructions are also sent as the first message below.
	prompt := `Run a proactive health check across critical services:
1. Check error rates for the main services
2. Check p99 latency for each
3. Check Kafka consumer lag for payment processing topics
4. Flag anything anomalous with severity and recommended action`

	session, err := createIncidentSession(ctx, client, sessionID, prompt)
	if err != nil {
		log.Printf("[CRON] Failed: %v", err)
		return
	}
	// Health-check sessions are throwaway: delete them even when the run fails.
	defer client.DeleteSession(ctx, sessionID)

	response, err := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: prompt})
	if err != nil {
		log.Printf("[CRON] Agent failed: %v", err)
		return
	}

	if d, ok := response.Data.(*copilot.AssistantMessageData); ok {
		// Crude keyword match on the reply: only post when something looks off.
		content := strings.ToLower(d.Content)
		if strings.Contains(content, "anomal") ||
			strings.Contains(content, "elevated") ||
			strings.Contains(content, "degraded") {
			postToSlack("platform", "proactive-check", sessionID, d.Content)
		}
	}
}

The entry point: putting it all together

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	copilot "github.com/github/copilot-sdk/go"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	// No CLIPath: the embedded CLI is used automatically
	client := copilot.NewClient(&copilot.ClientOptions{
		LogLevel: "error",
		Telemetry: &copilot.TelemetryConfig{
			OTLPEndpoint: os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
			SourceName:   "incident-commander",
		},
	})

	if err := client.Start(ctx); err != nil {
		log.Fatalf("Failed to start Copilot client: %v", err)
	}
	defer client.Stop()

	log.Println("Copilot client started with embedded CLI")

	mux := http.NewServeMux()

	// Webhooks
	mux.HandleFunc("POST /api/webhook/pagerduty", func(w http.ResponseWriter, r *http.Request) {
		handlePagerDutyWebhook(ctx, client, w, r)
	})

	// Slack
	mux.HandleFunc("POST /api/slack/events", func(w http.ResponseWriter, r *http.Request) {
		handleSlackEvent(ctx, client, w, r)
	})

	// Direct API
	mux.HandleFunc("POST /api/incident", func(w http.ResponseWriter, r *http.Request) {
		handleIncident(ctx, client, w, r)
	})
	mux.HandleFunc("POST /api/incident/{sessionID}/message", func(w http.ResponseWriter, r *http.Request) {
		handleMessage(ctx, client, w, r)
	})

	// Health check
	mux.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) {
		if _, err := client.Ping(ctx, "health"); err != nil {
			http.Error(w, "unhealthy", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	// Background: proactive health checks every 15 minutes
	go StartHealthCheckCron(ctx, client, 15*time.Minute)

	srv := &http.Server{Addr: ":8080", Handler: mux}
	go func() {
		<-ctx.Done()
		srv.Shutdown(context.Background())
	}()

	log.Println("Incident Commander listening on :8080")
	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatal(err)
	}
}

The embedded CLI: how it works

The Go SDK has a bundler that automates the whole process of embedding the Copilot CLI into your binary.

What happens when you run go tool bundler:

  1. Reads go.mod to detect the SDK version

  2. Looks up the corresponding CLI version in the npm registry

  3. Downloads the platform-specific binary for the target platform (for example, linux/amd64)

  4. Compresses it with zstd

  5. Generates a Go file with the //go:embed directive

The generated file looks roughly like this:

// Code generated by copilot-sdk bundler; DO NOT EDIT.
package main

import (
    _ "embed"
    "github.com/github/copilot-sdk/go/embeddedcli"
)

//go:embed zcopilot_0.25.0_linux_amd64.zst
var localEmbeddedCopilotCLI []byte

func init() {
    embeddedcli.Setup(embeddedcli.Config{
        Cli:     cliReader(), // decompresses the zst
        Version: "0.25.0",
        CliHash: mustDecodeBase64("sha256-hash..."),
    })
}

At runtime, when NewClient detects there is no CLIPath and no COPILOT_CLI_PATH, it uses the embedded blob: decompresses it into a cache directory and verifies the SHA-256.

The complete build looks like this:

#!/bin/bash
set -euo pipefail

# 1. Download and embed the CLI for the target platform
go tool bundler --platform linux/amd64

# 2. Build the final binary
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o incident-commander .

Dockerfile: multi-stage with distroless

# Stage 1: Build
FROM golang:1.24-bookworm AS builder

WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download

COPY . .

# Download and embed the CLI
RUN go tool bundler --platform linux/amd64

# Build a static binary
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-s -w" -o /incident-commander .

# Stage 2: Runtime (distroless for a minimal attack surface)
FROM gcr.io/distroless/static-debian12:nonroot

ENV HOME=/home/nonroot

COPY --from=builder /incident-commander /incident-commander

# Sessions persist here: mount a volume in production
VOLUME /home/nonroot/.copilot/session-state

EXPOSE 8080

ENTRYPOINT ["/incident-commander"]

Why distroless/static? Since the Go binary is fully static (CGO_ENABLED=0) and the embedded Copilot CLI is also static, there are no OS dependencies. No apt, no bash, no glibc. The final image is around 60-90MB (your binary + compressed CLI). This also means fewer CVEs to worry about.

Docker Compose for local development

version: "3.8"

services:
  incident-commander:
    build: .
    ports:
      - "8080:8080"
    environment:
      - COPILOT_GITHUB_TOKEN=${COPILOT_GITHUB_TOKEN}
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
    volumes:
      - session-data:/home/nonroot/.copilot/session-state
    depends_on:
      - otel-collector
    restart: unless-stopped

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4318:4318"
    volumes:
      - ./otel-config.yaml:/etc/otelcol-contrib/config.yaml

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"

volumes:
  session-data:

Deploying on Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-commander
spec:
  replicas: 2
  selector:
    matchLabels:
      app: incident-commander
  template:
    metadata:
      labels:
        app: incident-commander
    spec:
      containers:
        - name: agent
          image: ghcr.io/seu-usuario/incident-commander:latest
          ports:
            - containerPort: 8080
          env:
            - name: COPILOT_GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: copilot-secrets
                  key: github-token
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.observability:4318"
          volumeMounts:
            - name: session-state
              mountPath: /home/nonroot/.copilot/session-state
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
              cpu: "500m"
      volumes:
        - name: session-state
          persistentVolumeClaim:
            claimName: incident-commander-sessions

You deploy this agent exactly like any other Go microservice. The only extra detail is the persistent volume for session state.

CI/CD with GitHub Actions

name: Build & Push

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.24"

      - uses: actions/cache@v4
        with:
          path: zcopilot_*
          key: copilot-cli-${{ hashFiles('go.sum') }}-linux-amd64

      - name: Bundle CLI
        run: go tool bundler --platform linux/amd64

      - name: Test
        run: go test ./...

      - name: Build & Push Container
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build -t ghcr.io/seu-usuario/incident-commander:${{ github.sha }} .
          docker push ghcr.io/seu-usuario/incident-commander:${{ github.sha }}

The zcopilot_* cache avoids re-downloading the CLI on every build. It only changes when the SDK is updated (reflected in go.sum).

Production considerations

Some important things for keeping this running for real:

Concern | Recommendation
Storage | Mount /home/nonroot/.copilot/session-state/ on a persistent volume
Secrets | Use K8s Secrets or Vault for COPILOT_GITHUB_TOKEN
Health check | client.Ping() as a liveness probe, restart if it does not respond
Session cleanup | Cron to delete sessions older than 24h (+ 30min idle auto-cleanup)
Locking | Redis SETNX if multiple pods access the same session
Observability | OpenTelemetry → OTLP collector → Grafana/Jaeger
Graceful shutdown | Drain active sessions before stopping the CLI
Rate limiting | Apply it in your API layer; the SDK does not limit

Conclusion

The Copilot SDK turns Copilot from a code assistant into an agent runtime. You define behavior in code (not Markdown), control permissions, intercept the flow with hooks, inject dynamic context, and deploy it like any microservice.

The Go SDK in particular has a very elegant proposition: the bundler + //go:embed generates a single static binary with the CLI included. No sidecar, no Docker-in-Docker, no external dependency. go build, and you are done.

The agent we built here is a real use case: it receives webhooks from monitoring systems, diagnoses incidents autonomously, keeps conversations with engineers going through Slack, and executes remediations with permission gates. Everything is observable through OpenTelemetry.

This is not an AI “Hello, World!”. It is a production microservice that happens to have an LLM inside.


The code and examples in this post are based on the Copilot SDK documentation. The SDK is in public preview; check the repository for the latest version.