State Files and Remote Backends


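Core question: how does Terraform remember what currently exists in the cloud? And when several people collaborate, how do you stop two of them from changing the infrastructure at the same time and causing conflicts?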


What Is State

Terraform State (terraform.tfstate) is the database Terraform uses to track the resources it manages:

graph LR
  TF[Terraform code<br/>desired state]
  STATE[State file<br/>last known state]
  CLOUD[Cloud platform API<br/>actual state]
  TF -->|terraform plan| DIFF[Compute diff]
  STATE --> DIFF
  CLOUD -->|refresh| STATE
  DIFF -->|terraform apply| CLOUD
  CLOUD -->|record result| STATE

The State file is JSON and records the ID and every attribute of each resource:

{
  "version": 4,
  "terraform_version": "1.6.0",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_vpc",
      "name": "main",
      "instances": [
        {
          "attributes": {
            "id": "vpc-0a1b2c3d4e5f",
            "cidr_block": "10.0.0.0/16",
            "arn": "arn:aws:ec2:ap-southeast-1:123456789:vpc/vpc-0a1b2c3d4e5f"
          }
        }
      ]
    }
  ]
}

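Why State Should Not Stay Local

| Problem | Consequence |
|---|---|
| Several people run terraform apply at the same time | The two copies of State diverge, and resources get created twice or deleted |
| State is committed to a Git repository | State contains sensitive data (passwords, certificates), which is a security risk |
| The local file is lost | Terraform loses track of the cloud resources, and later operations become dangerous |
| CI/CD cannot share State | Every CI run starts from scratch and cannot compute the diff correctly |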

Remote Backend: S3 + DynamoDB (Recommended on AWS)

# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-company-tf-state"          # S3 bucket name
    key            = "prod/myapp/terraform.tfstate" # path inside the bucket (different for each environment)
    region         = "ap-southeast-1"
    encrypt        = true                           # S3 server-side encryption (SSE-S3)
    dynamodb_table = "terraform-state-lock"         # DynamoDB table (State Lock)
  }
}
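Because a backend block cannot reference variables, one way to give each environment its own key is partial configuration: leave key out of the block and supply it at terraform init time. The following is a minimal sketch under that assumption; the file name prod.s3.tfbackend is hypothetical, and the two files are shown in one listing only for brevity.

```hcl
# backend.tf: settings shared by every environment (key is deliberately omitted)
terraform {
  backend "s3" {
    bucket         = "my-company-tf-state"
    region         = "ap-southeast-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

# prod.s3.tfbackend (hypothetical file name, kept as a separate file):
# per-environment values, loaded with
#   terraform init -backend-config=prod.s3.tfbackend
key = "prod/myapp/terraform.tfstate"
```

A staging environment would then use its own file with a different key, while backend.tf stays identical across environments.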

Bootstrap Script: Create the S3 Bucket and DynamoDB Table

# Create the S3 bucket (one-time, manual step)
aws s3api create-bucket \
  --bucket my-company-tf-state \
  --region ap-southeast-1 \
  --create-bucket-configuration LocationConstraint=ap-southeast-1
# Enable versioning (keeps State history and allows rollback)
aws s3api put-bucket-versioning \
  --bucket my-company-tf-state \
  --versioning-configuration Status=Enabled
# Enable default encryption
aws s3api put-bucket-encryption \
  --bucket my-company-tf-state \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'
# Block public access
aws s3api put-public-access-block \
  --bucket my-company-tf-state \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Create the DynamoDB table (State Lock)
aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-southeast-1
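The same bootstrap resources can also be described in Terraform itself and applied once with local state, before any other configuration points its backend at them. This is a sketch rather than the author's method: the bootstrap/ directory layout and the resource labels tf_state and tf_lock are hypothetical, the prevent_destroy guard is an addition, and it assumes AWS provider v4 or later, where bucket settings are separate resources.

```hcl
# bootstrap/main.tf (hypothetical layout): apply once with local state.
provider "aws" {
  region = "ap-southeast-1"
}

resource "aws_s3_bucket" "tf_state" {
  bucket = "my-company-tf-state"

  lifecycle {
    prevent_destroy = true # added guard: refuse plans that would delete the state bucket
  }
}

# Versioning keeps State history and allows rollback.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Default server-side encryption (SSE-S3).
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Block every form of public access to the bucket.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  ignore_public_acls      = true
  block_public_policy     = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string hash key named LockID.
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

The trade-off is that this bootstrap configuration needs a home for its own State; keeping it local or migrating it into the bucket afterwards are both common choices.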

How State Lock Works

sequenceDiagram
  participant A as Engineer A
  participant B as Engineer B
  participant DDB as DynamoDB<br/>(Lock table)
  participant S3 as S3<br/>(State file)
  A->>DDB: Acquire Lock (write LockID)
  DDB-->>A: Lock acquired
  A->>S3: Read State
  B->>DDB: Try to acquire Lock
  DDB-->>B: Error: already locked by A
  A->>S3: Update State
  A->>DDB: Release Lock (delete LockID)
  B->>DDB: Retry acquiring Lock
  DDB-->>B: Lock acquired

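Force Unlock (When a Lock Was Not Released Cleanly)

# Remove a stale lock; LOCK_ID is the ID printed in the "Error acquiring the state lock" message
terraform force-unlock LOCK_ID
# Example
terraform force-unlock "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"

⚠️ Use with care: force-unlock only after confirming that no other terraform apply is still running.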


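Other Backend Options

| Backend | Best for | Pros | Cons |
|---|---|---|---|
| S3 + DynamoDB | AWS environments | Most mature, native AWS integration | Bucket and table must be bootstrapped by hand |
| GCS | GCP environments | Built-in locking, no extra resources | GCP only |
| Azure Blob Storage | Azure environments | Integrates with Azure | Locking relies on Blob leases |
| Terraform Cloud / HCP Terraform | Commercial option | Built-in UI, approval workflows, team permissions | Cost (the free tier is limited) |
| local | Solo projects / learning | No configuration needed | No multi-user collaboration |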

GCS Backend (GCP)

terraform {
  backend "gcs" {
    bucket = "my-company-tf-state"
    prefix = "prod/myapp"
  }
}

Azure Blob Backend

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "mycompanytfstate"
    container_name       = "tfstate"
    key                  = "prod/myapp/terraform.tfstate"
  }
}

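Common State Operations

# List every resource in State
terraform state list
# Show the detailed State of one resource (quote the address so the shell does not interpret the brackets)
terraform state show 'aws_instance.web[0]'
# Remove a resource from State (the cloud resource is not deleted; Terraform just forgets it)
terraform state rm aws_instance.legacy
# Import an existing cloud resource into State (so Terraform starts managing it)
terraform import aws_instance.web i-0a1b2c3d4e5f6789
# Move a resource within State (renaming a resource during a refactor)
terraform state mv aws_instance.web aws_instance.app_server
# Push a local State file to the remote Backend (this overwrites the remote State, so use with care)
terraform state push terraform.tfstate
# Pull the remote State (for debugging)
terraform state pull > current.tfstate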
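Several of the commands above have declarative counterparts that live in the configuration and go through the normal plan and apply review. The blocks below are a sketch of those alternatives, not a replacement the source prescribes; the address aws_instance.imported is hypothetical, and the stated minimum Terraform versions apply to each block.

```hcl
# Rename during a refactor (Terraform 1.1+).
# Same effect as: terraform state mv aws_instance.web aws_instance.app_server
moved {
  from = aws_instance.web
  to   = aws_instance.app_server
}

# Adopt an existing cloud resource (Terraform 1.5+).
# Counterpart of: terraform import <address> i-0a1b2c3d4e5f6789
# A matching resource "aws_instance" "imported" block must also exist; it can be
# written by hand or drafted with: terraform plan -generate-config-out=generated.tf
import {
  to = aws_instance.imported # hypothetical address used for illustration
  id = "i-0a1b2c3d4e5f6789"
}

# Forget a resource without destroying it (Terraform 1.7+).
# Counterpart of: terraform state rm aws_instance.legacy
# The resource "aws_instance" "legacy" block itself must be deleted from the code.
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false # keep the cloud resource, only drop it from State
  }
}
```

These blocks can be deleted once every environment that shares the configuration has applied them.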

Organizing State Across Projects and Environments

S3 bucket: my-company-tf-state
├── global/
│   └── iam/terraform.tfstate        # global IAM resources
├── staging/
│   ├── vpc/terraform.tfstate
│   ├── eks/terraform.tfstate
│   └── rds/terraform.tfstate
└── production/
    ├── vpc/terraform.tfstate
    ├── eks/terraform.tfstate
    └── rds/terraform.tfstate

Each component has its own State; one component reads another's output values through data "terraform_remote_state".
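As an illustration of that cross-State reference, the sketch below shows the EKS component reading subnet IDs from the production VPC State. The file name eks/main.tf and the output name private_subnet_ids are assumptions; the VPC configuration would have to declare that output in its root module.

```hcl
# eks/main.tf (hypothetical): read outputs published by the VPC component's State
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "my-company-tf-state"
    key    = "production/vpc/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

# Assumes the VPC root module declares: output "private_subnet_ids" { ... }
locals {
  private_subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
}
```

Only root-module outputs are visible this way, and the reader needs read access to the entire VPC State object, not just the values it consumes.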


Minimal IAM Permissions for a Production Backend

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-company-tf-state/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-company-tf-state"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:ap-southeast-1:*:table/terraform-state-lock"
    }
  ]
}

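Common Errors

| Error | Cause | Fix |
|---|---|---|
| Error acquiring the state lock | A previous apply exited abnormally and left the Lock in place | After confirming no other process is running, run terraform force-unlock <ID> |
| Error: state data in S3 does not have the expected content | The State object in S3 does not match the checksum recorded in the DynamoDB table, typically because the file was corrupted or edited by hand | Restore the previous version from S3 version history |
| Error: Backend initialization required | The Backend configuration was changed | Run terraform init -migrate-state to migrate State |
| state contains ... which no longer exist | Cloud resources were deleted by hand | Run terraform apply -refresh-only to sync State, or terraform state rm to clean up |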
