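terraform plan Review and CI Integration
Core question: how do you bring infrastructure changes into the code review process, so that every change is approved, recorded, and traceable?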
Reading terraform plan Output
terraform plan is Terraform's most important safety mechanism: preview first, then apply.
terraform plan
# Example output (abridged, illustrative)
Terraform will perform the following actions:
  # aws_instance.web[0] will be updated in-place
  ~ resource "aws_instance" "web" {
        id            = "i-0a1b2c3d"
      ~ instance_type = "t3.small" -> "t3.medium"   # ~ means in-place update
        tags          = {...}
    }
  # aws_security_group.db will be destroyed
  - resource "aws_security_group" "db" {            # - means delete
        id   = "sg-0x1y2z"
        name = "db-sg"
    }
  # aws_rds_cluster.main must be replaced
-/+ resource "aws_rds_cluster" "main" {             # -/+ means destroy, then recreate!
      ~ engine_version = "14.9" -> "15.4" # forces replacement
    }
# The replacement counts as one add plus one destroy, hence 2 to destroy
Plan: 1 to add, 1 to change, 2 to destroy.
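Change Symbols
| Symbol | Meaning | Risk |
|---|---|---|
| + | Create a resource | Low |
| ~ | Update attributes in place | Medium |
| - | Delete a resource | High (data may be lost) |
| -/+ | Destroy and recreate (replace) | Very high (downtime risk for RDS, EKS node groups, etc.) |
| <= | Read a data source | None |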
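The same classification can be applied to machine-readable plans. The sketch below maps the `change.actions` list that appears in `terraform show -json` output to the symbols and risk levels in the table above. The `classify` helper and the risk labels are illustrative assumptions; in particular, treating `+/-` (create the replacement first, then destroy) as equally risky as `-/+` is a conservative choice made here, not something Terraform prescribes.

```python
from typing import Sequence, Tuple

# Keys are the `change.actions` lists from the JSON plan representation.
# Values are (symbol shown by `terraform plan`, risk label from the table).
_ACTION_TABLE = {
    ("create",): ("+", "low"),
    ("update",): ("~", "medium"),
    ("delete",): ("-", "high"),
    ("delete", "create"): ("-/+", "very high"),
    # create_before_destroy replacements; risk label is an assumption.
    ("create", "delete"): ("+/-", "very high"),
    ("read",): ("<=", "none"),
    ("no-op",): ("", "none"),
}


def classify(actions: Sequence[str]) -> Tuple[str, str]:
    """Return (symbol, risk) for one resource change's action list."""
    key = tuple(actions)
    if key not in _ACTION_TABLE:
        raise ValueError(f"unrecognized action list: {list(actions)!r}")
    return _ACTION_TABLE[key]


if __name__ == "__main__":
    for actions in (["update"], ["delete"], ["delete", "create"]):
        symbol, risk = classify(actions)
        print(f"{symbol:>3}  {risk}")
```

Raising on an unknown action list, rather than defaulting to "low", keeps a review script from silently under-reporting risk if the plan format gains new action combinations.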
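Saving and Applying a Plan File
# Save the plan to a file (so that apply executes exactly what was reviewed)
# Note: a plan file can contain sensitive values in plain text; treat it as a secret
terraform plan -out=tfplan.bin
# Inspect the saved plan (human-readable)
terraform show tfplan.bin
# Apply the saved plan (no confirmation prompt)
terraform apply tfplan.bin
# Plan as JSON (for machine parsing)
terraform show -json tfplan.bin | jq '.resource_changes'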
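As a rough illustration of what the JSON form makes possible, the sketch below recomputes a `Plan: ...` summary line from the `resource_changes` array. It assumes the documented shape of each entry (`mode`, `change.actions`); the `summarize` function and the `SAMPLE` data are made up for this example, and the real CLI summary may account for cases this sketch ignores.

```python
from collections import Counter

# Hand-written stand-in for the output of `terraform show -json tfplan.bin`,
# reduced to the fields this sketch reads.
SAMPLE = {
    "resource_changes": [
        {"address": "aws_instance.web[0]", "mode": "managed",
         "change": {"actions": ["update"]}},
        {"address": "aws_security_group.db", "mode": "managed",
         "change": {"actions": ["delete"]}},
        {"address": "aws_rds_cluster.main", "mode": "managed",
         "change": {"actions": ["delete", "create"]}},
    ]
}


def summarize(plan: dict) -> str:
    """Count create/update/delete actions across managed resources."""
    counts = Counter()
    for rc in plan.get("resource_changes", []):
        if rc.get("mode") == "data":
            continue  # data source reads are not part of the summary
        # A replacement lists both "delete" and "create", so it is
        # counted once toward add and once toward destroy.
        counts.update(rc["change"]["actions"])
    return "Plan: {} to add, {} to change, {} to destroy.".format(
        counts["create"], counts["update"], counts["delete"]
    )


if __name__ == "__main__":
    print(summarize(SAMPLE))  # Plan: 1 to add, 1 to change, 2 to destroy.
```

In a pipeline, the input would come from `json.load(sys.stdin)` with `terraform show -json tfplan.bin` piped in, instead of the hard-coded sample.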
Atlantis: A Pull Request-Driven Terraform Workflow
Atlantis is one of the most widely used open-source tools for automating Terraform through pull requests:
sequenceDiagram
    participant Dev as Engineer
    participant GH as GitHub PR
    participant ATL as Atlantis Server
    participant AWS as AWS
    Dev->>GH: Open PR
    GH->>ATL: Webhook fires
    ATL->>AWS: terraform plan
    ATL->>GH: Post plan output as a PR comment
    Dev->>GH: Code review + comment atlantis apply
    GH->>ATL: Webhook fires
    ATL->>AWS: terraform apply
    ATL->>GH: Comment apply result, merge PR only if automerge is enabled
atlantis.yaml Configuration
# atlantis.yaml (place in the repository root)
version: 3
automerge: false
delete_source_branch_on_merge: false
projects:
  - name: prod-vpc
    dir: infra/environments/production/vpc
    workspace: default
    terraform_version: v1.6.0
    autoplan:
      when_modified: ["*.tf", "../../modules/vpc/**/*.tf"]   # paths are relative to dir
      enabled: true
    apply_requirements:      # the server-side repo config must allow this override
      - approved             # PR must be approved
      - mergeable            # PR must be mergeable (no conflicts, required checks passing)
  - name: prod-eks
    dir: infra/environments/production/eks
    depends_on: [prod-vpc]   # apply prod-vpc first
    apply_requirements:
      - approved
      - mergeable
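The effect of `depends_on` is an ordering constraint between projects. The sketch below shows the idea with a topological sort from the Python standard library; it illustrates the ordering concept only and is not how Atlantis itself is implemented. The `apply_order` helper is hypothetical, and the project list mirrors the configuration above.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List


def apply_order(projects: List[Dict]) -> List[str]:
    """Return project names ordered so that dependencies come first.

    Each project is a dict with a "name" and an optional "depends_on" list,
    as in the atlantis.yaml `projects` section.
    """
    graph = {p["name"]: set(p.get("depends_on", [])) for p in projects}
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    projects = [
        {"name": "prod-eks", "depends_on": ["prod-vpc"]},
        {"name": "prod-vpc"},
    ]
    print(apply_order(projects))  # ['prod-vpc', 'prod-eks']

    try:
        apply_order([{"name": "a", "depends_on": ["b"]},
                     {"name": "b", "depends_on": ["a"]}])
    except CycleError:
        print("cyclic depends_on detected")
```

A check like this can run in CI to catch circular `depends_on` declarations before they reach the Atlantis server.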
GitHub Actions: A Complete Terraform CI Pipeline
# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
  pull_request:
    paths: ['infra/**']
  push:
    branches: [main]
    paths: ['infra/**']
env:
  TF_VERSION: "1.6.4"
jobs:
  validate:
    name: Terraform format and validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - name: terraform fmt check
        run: terraform fmt -check -recursive infra/
      - name: terraform validate
        run: |
          for dir in infra/environments/*/; do
            echo "Validating $dir"
            # -chdir avoids cd bookkeeping; a failure in either command fails the step
            terraform -chdir="$dir" init -backend=false -input=false
            terraform -chdir="$dir" validate
          done
  security-scan:
    name: Security scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aquasecurity/trivy-action@master   # pin to a release tag or commit SHA for reproducible runs
        with:
          scan-type: 'config'
          scan-ref: 'infra/'
          exit-code: '1'
          severity: 'HIGH,CRITICAL'
  plan:
    name: Terraform Plan
    needs: [validate, security-scan]
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    permissions:
      id-token: write
      contents: read
      pull-requests: write
    strategy:
      matrix:
        env: [staging, production]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars[format('{0}_ROLE_ARN', matrix.env)] }}
          aws-region: ap-southeast-1
      - name: Terraform Init
        working-directory: infra/environments/${{ matrix.env }}
        run: terraform init -input=false
      - name: Terraform Plan
        id: plan
        working-directory: infra/environments/${{ matrix.env }}
        run: |
          # After a pipeline, $? is tee's status; terraform's is in PIPESTATUS[0]
          set +e
          terraform plan -no-color -input=false -out=tfplan 2>&1 | tee plan_output.txt
          code=${PIPESTATUS[0]}
          echo "exitcode=$code" >> "$GITHUB_OUTPUT"
          exit "$code"
        env:
          TF_VAR_db_password: ${{ secrets[format('{0}_DB_PASSWORD', matrix.env)] }}
        continue-on-error: true   # keep going so the plan output is still posted
      - name: Comment plan result on the PR
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infra/environments/${{ matrix.env }}/plan_output.txt', 'utf8');
            const truncated = plan.length > 65000 ? plan.substring(0, 65000) + '\n... (truncated)' : plan;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan: \`${{ matrix.env }}\`\n\`\`\`\n${truncated}\n\`\`\``
            });
      - name: Fail the job if the plan failed
        if: steps.plan.outcome == 'failure'
        run: exit 1
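  apply:
    name: Terraform Apply
    # plan runs only on pull_request, and a job that needs a skipped job is
    # skipped as well, so apply depends on the checks that also run on push
    needs: [validate, security-scan]
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: ${{ matrix.env }}   # set Environment protection rules (required reviewers) on production
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    strategy:
      max-parallel: 1   # one environment at a time: staging, then production
      matrix:
        env: [staging, production]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars[format('{0}_ROLE_ARN', matrix.env)] }}
          aws-region: ap-southeast-1
      - name: Terraform Init + Apply
        working-directory: infra/environments/${{ matrix.env }}
        run: |
          terraform init -input=false
          # Re-plans at apply time; it does not reuse the plan reviewed in the PR
          terraform apply -auto-approve -input=false
        env:
          TF_VAR_db_password: ${{ secrets[format('{0}_DB_PASSWORD', matrix.env)] }}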
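The comment step in the workflow cuts long plan output at 65000 characters, which drops the final `Plan: ...` summary line, often the first thing a reviewer looks for. The sketch below shows one way to truncate while keeping that line. The `truncate_plan` function and the marker text are illustrative; the 65000 limit is taken from the workflow, and the underlying GitHub comment size limit is assumed rather than verified here.

```python
LIMIT = 65000  # same margin the workflow's comment step uses
MARKER = "\n... (truncated)\n"


def truncate_plan(text: str, limit: int = LIMIT) -> str:
    """Shorten plan output to `limit` characters, keeping the summary line."""
    if len(text) <= limit:
        return text
    # The summary is the last line that starts with "Plan:", if there is one.
    summary = next(
        (line for line in reversed(text.splitlines()) if line.startswith("Plan:")),
        "",
    )
    head_len = max(0, limit - len(MARKER) - len(summary))
    return text[:head_len] + MARKER + summary


if __name__ == "__main__":
    long_plan = "  + resource line\n" * 10000 + "Plan: 10000 to add, 0 to change, 0 to destroy.\n"
    shortened = truncate_plan(long_plan)
    print(len(shortened) <= LIMIT, shortened.splitlines()[-1])
```

The same logic is straightforward to port into the `actions/github-script` step if staying in JavaScript is preferable.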
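Best-Practice Checklist
Before the change:
□ terraform fmt: format the code
□ terraform validate: check syntax and internal consistency
□ tfsec / checkov security scan (see Chapter 12)
□ terraform plan: read the output carefully
Plan review points:
□ Are there any unexpected - (delete) operations?
□ Are there any -/+ replacements (especially databases and EKS node groups)?
□ Is the number of affected resources reasonable?
□ Are resource names correct (avoid naming drift)?
After apply:
□ Verify that the application is healthy (health checks)
□ Confirm the state was updated (terraform state list)
□ Record the change (CHANGELOG or PR description)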
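Parts of the plan review checklist can be automated against the JSON plan. The sketch below flags deletions and replacements and warns when the number of changed resources exceeds a threshold. The `PROTECTED_TYPES` set, the `BLOCK`/`WARN` levels, and the default threshold of 20 are hypothetical policy choices for illustration, not recommendations from the source.

```python
from typing import Dict, List

# Hypothetical policy: replacing or deleting these types blocks the change.
PROTECTED_TYPES = {"aws_rds_cluster", "aws_eks_node_group"}


def review_findings(plan: Dict, max_changes: int = 20) -> List[str]:
    """Return human-readable findings for a `terraform show -json` plan."""
    findings = []
    changed = 0
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue
        changed += 1
        if "delete" in actions:
            kind = "replace" if "create" in actions else "delete"
            level = "BLOCK" if rc["type"] in PROTECTED_TYPES else "WARN"
            findings.append(f"{level}: {kind} {rc['address']}")
    if changed > max_changes:
        findings.append(f"WARN: {changed} resources change (threshold {max_changes})")
    return findings


if __name__ == "__main__":
    sample = {"resource_changes": [
        {"address": "aws_security_group.db", "type": "aws_security_group",
         "change": {"actions": ["delete"]}},
        {"address": "aws_rds_cluster.main", "type": "aws_rds_cluster",
         "change": {"actions": ["delete", "create"]}},
    ]}
    for finding in review_findings(sample):
        print(finding)
```

A CI step could exit non-zero when any finding starts with `BLOCK`, leaving `WARN` entries as information for the human reviewer.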
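Common Errors
| Error | Cause | Fix |
|---|---|---|
| CI apply fails with insufficient permissions | The IAM role is missing permissions | Find the denied actions in CloudTrail and add them incrementally |
| Plan and apply differ | Someone changed cloud resources manually between the two steps | Save the plan with -out=tfplan and pass that file to apply |
| Apply succeeds but the application misbehaves | Terraform only ensures resources exist; it does not check application health | Add a health check after the apply step in CI |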
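A post-apply health check, as suggested in the last row, can be as small as a retry loop around an HTTP probe. The sketch below is one minimal way to do it with the standard library; the function names, the retry defaults, and the example URL are all assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the service's real health URL.
    healthy = wait_until_healthy(lambda: http_ok("https://example.com/healthz"), attempts=3, delay=2.0)
    raise SystemExit(0 if healthy else 1)
```

Passing the probe and the sleep function as parameters keeps the retry logic testable without network access, and the non-zero exit code lets the CI step fail when the service never becomes healthy.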