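State Files and Remote Backends
Core questions: How does Terraform remember what currently exists in the cloud? When several people collaborate, how do you stop two of them from changing the infrastructure at the same time and causing conflicts?
What Is State
Terraform State (terraform.tfstate) is the database Terraform uses to track the resources it manages: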
graph LR
    TF[Terraform code<br/>desired state]
    STATE[State file<br/>last known state]
    CLOUD[Cloud platform API<br/>actual state]
    TF -->|terraform plan| DIFF[Compute diff]
    STATE --> DIFF
    CLOUD -->|refresh| STATE
    DIFF -->|terraform apply| CLOUD
    CLOUD -->|record result| STATE
The State file is JSON and records the ID and every attribute of each managed resource:
{
  "version": 4,
  "terraform_version": "1.6.0",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_vpc",
      "name": "main",
      "instances": [
        {
          "attributes": {
            "id": "vpc-0a1b2c3d4e5f",
            "cidr_block": "10.0.0.0/16",
            "arn": "arn:aws:ec2:ap-southeast-1:123456789:vpc/vpc-0a1b2c3d4e5f"
          }
        }
      ]
    }
  ]
}
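Why State Should Not Be Kept Locally
| Problem | Consequence |
|---|---|
| Several people run terraform apply at the same time | The copies of State drift apart, and resources get created twice or deleted |
| State is stored in a Git repository | It contains sensitive data (passwords, certificates), which is a security risk |
| The local file is lost | Terraform loses track of the cloud resources, which makes later operations dangerous |
| CI/CD cannot share State | Every CI run starts from scratch and cannot compute the diff correctly |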
Remote Backend: S3 + DynamoDB (recommended setup for AWS)
# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-company-tf-state"          # S3 bucket name
    key            = "prod/myapp/terraform.tfstate" # Path inside the bucket (differs per environment)
    region         = "ap-southeast-1"
    encrypt        = true                           # S3 server-side encryption (SSE-S3)
    dynamodb_table = "terraform-state-lock"         # DynamoDB table (State Lock)
  }
}
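Because only the key changes between environments, one way to avoid copying the whole block is a partial backend configuration. The sketch below is an illustration, not the author's setup: it assumes the per-environment value lives in a separate file whose name, prod.s3.tfbackend, is hypothetical.

```hcl
# backend.tf: settings shared by every environment.
# "key" is deliberately left out and supplied at init time.
terraform {
  backend "s3" {
    bucket         = "my-company-tf-state"
    region         = "ap-southeast-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

# prod.s3.tfbackend (hypothetical file name) would contain just this line:
#   key = "prod/myapp/terraform.tfstate"
```

With that layout, the environment is chosen when the working directory is initialized, for example with terraform init -backend-config=prod.s3.tfbackend.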
Bootstrap script: create the S3 bucket and the DynamoDB table
# Create the S3 bucket (one-time, manual step)
aws s3api create-bucket \
  --bucket my-company-tf-state \
  --region ap-southeast-1 \
  --create-bucket-configuration LocationConstraint=ap-southeast-1
# Enable versioning (keeps State history and allows rollback)
aws s3api put-bucket-versioning \
  --bucket my-company-tf-state \
  --versioning-configuration Status=Enabled
# Enable default encryption
aws s3api put-bucket-encryption \
  --bucket my-company-tf-state \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'
# Block public access
aws s3api put-public-access-block \
  --bucket my-company-tf-state \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Create the DynamoDB table (State Lock)
aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-southeast-1
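If you would rather keep these resources under version control, the same bootstrap could be written in Terraform. The following is a minimal sketch under stated assumptions: it lives in a separate, hypothetical bootstrap configuration that keeps its own State locally (it cannot use the bucket it is about to create), it targets AWS provider v4 or later, and the resource labels tf_state and tf_lock are made up for illustration.

```hcl
provider "aws" {
  region = "ap-southeast-1"
}

# The bucket that will hold every State file.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-company-tf-state"
}

# Versioning keeps State history and allows rollback.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Default server-side encryption (SSE-S3).
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Block every form of public access.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  ignore_public_acls      = true
  block_public_policy     = true
  restrict_public_buckets = true
}

# Lock table; the S3 backend expects a string hash key named LockID.
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

The trade-off is that this bootstrap configuration has a State file of its own that must be kept safe, which is why many teams accept the one-time manual script instead.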
How State Lock Works
sequenceDiagram
    participant A as Engineer A
    participant B as Engineer B
    participant DDB as DynamoDB<br/>(Lock table)
    participant S3 as S3<br/>(State file)
    A->>DDB: Acquire Lock (write LockID)
    DDB-->>A: Lock acquired
    A->>S3: Read State
    B->>DDB: Try to acquire Lock
    DDB-->>B: Error: already locked by A
    A->>S3: Update State
    A->>DDB: Release Lock (delete LockID)
    B->>DDB: Retry acquiring Lock
    DDB-->>B: Lock acquired
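Force Unlock (when a Lock was not released cleanly)
# Remove a stale Lock by its ID (the ID is printed in the "Error acquiring the state lock" message)
terraform force-unlock LOCK_ID
# Example
terraform force-unlock "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
⚠️ Use with care: force-unlock only after confirming that no other terraform apply is still running.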
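Other Backend Options
| Backend | Best for | Pros | Cons |
|---|---|---|---|
| S3 + DynamoDB | AWS environments | Most mature, native AWS integration | The bucket and table must be bootstrapped by hand |
| GCS | GCP environments | Built-in locking, no extra resources needed | GCP only |
| Azure Blob Storage | Azure environments | Integrates with Azure | Locking relies on Blob leases |
| Terraform Cloud / HCP Terraform | Commercial offering | Built-in UI, approval workflows, team permissions | Cost (the free tier is limited) |
| local | Solo projects / learning | No configuration needed | No multi-user collaboration |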
GCS Backend (GCP)
terraform {
  backend "gcs" {
    bucket = "my-company-tf-state"
    prefix = "prod/myapp"
  }
}
Azure Blob Backend
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "mycompanytfstate"
    container_name       = "tfstate"
    key                  = "prod/myapp/terraform.tfstate"
  }
}
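Common State Operations
# List every resource in the State
terraform state list
# Show the detailed State of one resource (quoted so the shell does not expand [0])
terraform state show 'aws_instance.web[0]'
# Remove a resource from the State (the cloud resource is not deleted; Terraform just forgets it)
terraform state rm aws_instance.legacy
# Import an existing cloud resource into the State (so Terraform starts managing it)
terraform import aws_instance.web i-0a1b2c3d4e5f6789
# Move a resource within the State (to rename it during a refactor)
terraform state mv aws_instance.web aws_instance.app_server
# Push a local State file to the remote Backend
terraform state push terraform.tfstate
# Pull the remote State (for debugging)
terraform state pull > current.tfstate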
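The rename and import commands above edit State immediately and leave no trace in code review. As an alternative, recent Terraform versions can express both operations declaratively. This sketch assumes Terraform 1.1 or later for moved blocks and 1.5 or later for import blocks; the two snippets are independent and are not meant to be applied to the same resource together.

```hcl
# Declarative counterpart of:
#   terraform state mv aws_instance.web aws_instance.app_server
# The resource block itself must already be renamed to "app_server".
moved {
  from = aws_instance.web
  to   = aws_instance.app_server
}

# Declarative counterpart of:
#   terraform import aws_instance.web i-0a1b2c3d4e5f6789
# A matching resource "aws_instance" "web" block must exist in the configuration.
import {
  to = aws_instance.web
  id = "i-0a1b2c3d4e5f6789"
}
```

Both blocks take effect on the next terraform plan and terraform apply, so the change shows up in the plan output before State is modified.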
Organizing State across Projects and Environments
S3 bucket: my-company-tf-state
├── global/
│   └── iam/terraform.tfstate    # Global IAM resources
├── staging/
│   ├── vpc/terraform.tfstate
│   ├── eks/terraform.tfstate
│   └── rds/terraform.tfstate
└── production/
    ├── vpc/terraform.tfstate
    ├── eks/terraform.tfstate
    └── rds/terraform.tfstate
Each component has its own State. Components share values across States by reading each other's outputs through data "terraform_remote_state".
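As an illustration of that cross-State lookup, the sketch below lets the production RDS component read subnet IDs published by the production VPC component. The file paths, the output name private_subnet_ids, and the resources aws_subnet.private and aws_db_subnet_group.main are all hypothetical; only the bucket, key layout, and region come from the examples above.

```hcl
# production/vpc/outputs.tf (hypothetical): the VPC component publishes a value.
# Assumes that configuration declares aws_subnet.private with count or for_each.
output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# production/rds/main.tf (hypothetical): the RDS component reads the VPC State.
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "my-company-tf-state"
    key    = "production/vpc/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

# Root-module outputs of the other State are exposed under ".outputs".
resource "aws_db_subnet_group" "main" {
  name       = "myapp-db"
  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
}
```

Only root-module outputs are visible this way, and the reader needs read access to the whole VPC State object, so publish only the values other components actually need.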
Minimal IAM Permissions for a Production Backend
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-company-tf-state/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-company-tf-state"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:ap-southeast-1:*:table/terraform-state-lock"
    }
  ]
}
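Common Errors
| Error | Cause | Fix |
|---|---|---|
| Error acquiring the state lock | A previous apply exited abnormally and the Lock was never released | After confirming that no other process is running, run terraform force-unlock <ID> |
| Error: state data in S3 does not have the expected content | The State object in S3 does not match the checksum recorded in the DynamoDB table, usually because the file is corrupted or was edited by hand | Restore the previous version from the S3 version history |
| Error: Backend initialization required | The Backend configuration was changed | Run terraform init -migrate-state to migrate the State |
| state contains ... which no longer exist | A cloud resource was deleted by hand | Run terraform apply -refresh-only to sync the State, or terraform state rm to clean it up |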