锁机制与死锁排查
High Contrast
Dark Mode
Light Mode
Sepia
Forest
2 min read340 words

锁机制与死锁排查

PostgreSQL 的锁是保证并发安全的基础,但也是生产故障的常见来源。理解行锁、表锁的获取时机,能让你设计出既安全又高吞吐的数据库操作——并在出现锁等待或死锁时快速定位根因。


锁的层次结构

数据库级
└── 表级锁(AccessShareLock, RowShareLock, ExclusiveLock...)
└── 行级锁(FOR UPDATE, FOR SHARE, FOR NO KEY UPDATE...)
└── 页级锁(内部使用,应用层不感知)

表级锁(常见类型)

锁类型 获取场景 与其他锁冲突
ACCESS SHARE SELECT 只与 ACCESS EXCLUSIVE 冲突
ROW SHARE SELECT ... FOR UPDATE/SHARE 与 EXCLUSIVE、ACCESS EXCLUSIVE 冲突
ROW EXCLUSIVE INSERT/UPDATE/DELETE 与 SHARE 及更高级别冲突
SHARE UPDATE EXCLUSIVE VACUUM、CREATE INDEX CONCURRENTLY 与自身及更高级别冲突
SHARE CREATE INDEX(非 CONCURRENTLY) 与 ROW EXCLUSIVE 及更高级别冲突
ACCESS EXCLUSIVE ALTER TABLE、DROP TABLE、TRUNCATE 与所有锁冲突
-- 这些操作会获取 ACCESS EXCLUSIVE 锁(阻塞所有查询!)
ALTER TABLE orders ADD COLUMN notes TEXT;          -- ACCESS EXCLUSIVE
ALTER TABLE orders ALTER COLUMN price TYPE BIGINT; -- ACCESS EXCLUSIVE
DROP TABLE temp_data;                              -- ACCESS EXCLUSIVE
TRUNCATE orders;                                   -- ACCESS EXCLUSIVE
-- 这些操作只获取低级别锁(允许并发读写)
CREATE INDEX CONCURRENTLY idx_orders_user ON orders (user_id);  -- SHARE UPDATE EXCLUSIVE
VACUUM orders;                                     -- SHARE UPDATE EXCLUSIVE

行级锁(SELECT FOR UPDATE)

-- SELECT FOR UPDATE:读取时加行锁,防止其他事务修改
-- 常用于"读取-检查-更新"模式
BEGIN;
-- 锁定库存行,防止其他事务并发扣减
SELECT available_qty
FROM inventory
WHERE product_id = 42
FOR UPDATE;  -- 加排他行锁
-- 检查库存
-- 应用层检查 available_qty >= order_qty
-- 扣减库存
UPDATE inventory
SET available_qty = available_qty - 1
WHERE product_id = 42;
COMMIT;
-- SELECT FOR UPDATE SKIP LOCKED:跳过被锁定的行(任务队列场景)
-- 多个 worker 并发处理任务,每个 worker 取一批未被处理的任务
BEGIN;
SELECT id, payload
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 5
FOR UPDATE SKIP LOCKED;  -- 跳过其他 worker 正在处理的行
-- 处理任务...
UPDATE job_queue SET status = 'done' WHERE id IN (...);
COMMIT;
-- SELECT FOR NO KEY UPDATE:比 FOR UPDATE 弱,不阻塞 FOR SHARE
-- 适合不涉及主键/唯一键的更新
-- SELECT FOR SHARE:允许其他事务也加 FOR SHARE,但阻止 UPDATE
SELECT * FROM users WHERE id = 1 FOR SHARE;

查看当前锁等待

-- 方法一:pg_locks + pg_stat_activity 联查(最详细)
SELECT
blocked.pid         AS blocked_pid,
blocked.query       AS blocked_query,
blocking.pid        AS blocking_pid,
blocking.query      AS blocking_query,
blocked.wait_event  AS wait_event,
now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;
-- 方法二:pg_blocking_pids(PostgreSQL 9.6+,最简洁)
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
-- 方法三:查看所有活跃锁
SELECT
l.pid,
l.locktype,
l.relation::regclass AS table_name,
l.mode,
l.granted,
a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted  -- 只看等待中的锁
ORDER BY l.pid;

死锁:原因与排查

死锁发生在两个事务互相等待对方持有的锁:

事务 A:
1. 锁定 accounts WHERE id=1(想锁 id=2)
事务 B:
2. 锁定 accounts WHERE id=2(想锁 id=1)
→ 互相等待,形成死锁
-- 死锁复现示例:
-- 会话 A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- 等待会话 B 提交...
-- 会话 B:
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;  -- 等待会话 A
-- 回到会话 A:
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- 等待会话 B
-- 结果:PostgreSQL 检测到死锁,自动终止其中一个事务:
-- ERROR: deadlock detected
-- DETAIL: Process 12345 waits for ShareLock on transaction 67890;
--   blocked by process 67890.
--   Process 67890 waits for ShareLock on transaction 12345;
--   blocked by process 12345.

查看死锁日志

# postgresql.conf 中开启死锁日志
log_lock_waits = on      # 记录锁等待超过 deadlock_timeout 的事件
deadlock_timeout = 1s    # 默认 1 秒,超时后检查死锁
# 在日志中查找死锁
grep "deadlock detected" /var/log/postgresql/postgresql-*.log

死锁预防:统一加锁顺序

-- ✅ 预防死锁:按固定顺序操作行
-- 规则:总是按 id 从小到大的顺序更新
-- 转账场景:from_id=5, to_id=3
-- 规则:先锁 id=3,再锁 id=5(按 id 升序)
BEGIN;
-- 用 FOR UPDATE 预锁,ORDER BY id 确保顺序
SELECT id, balance FROM accounts
WHERE id IN (3, 5)
ORDER BY id  -- 关键:统一顺序
FOR UPDATE;
-- 然后执行更新
UPDATE accounts SET balance = balance - 100 WHERE id = 5;
UPDATE accounts SET balance = balance + 100 WHERE id = 3;
COMMIT;

Advisory Locks(应用级锁)

Advisory Lock 是 PostgreSQL 提供的应用语义锁,不与任何表或行关联:

-- 场景:防止定时任务并发运行(分布式锁替代方案)
-- pg_try_advisory_lock:尝试获取锁,获取失败立即返回 false(非阻塞)
SELECT pg_try_advisory_lock(12345);
-- 返回 true:成功获取锁
-- 返回 false:锁已被其他会话持有
-- 典型用法:
DO $$
BEGIN
IF NOT pg_try_advisory_lock(42) THEN
RAISE NOTICE '任务已在运行,跳过';
RETURN;
END IF;
-- 执行只需要单实例运行的任务
PERFORM run_daily_report();
PERFORM pg_advisory_unlock(42);
END;
$$;
-- pg_advisory_lock:阻塞式,等待直到获取锁
SELECT pg_advisory_lock(42);
-- ... 执行任务 ...
SELECT pg_advisory_unlock(42);
-- 事务级 Advisory Lock(事务结束自动释放)
SELECT pg_advisory_xact_lock(42);
-- 不需要手动 unlock,事务提交/回滚时自动释放
-- 查看当前 Advisory Lock
SELECT pid, classid, objid, mode, granted
FROM pg_locks
WHERE locktype = 'advisory';

强制终止阻塞进程

-- 查找阻塞时间超过 5 分钟的进程
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
AND now() - query_start > INTERVAL '5 minutes'
ORDER BY duration DESC;
-- 取消查询(不杀死连接,只取消当前查询)
SELECT pg_cancel_backend(12345);
-- 强制终止进程(谨慎使用)
SELECT pg_terminate_backend(12345);
-- 批量终止所有空闲时间超过 30 分钟的连接
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < now() - INTERVAL '30 minutes'
AND datname = 'myapp_production';

下一节乐观锁与悲观锁选型——SELECT FOR UPDATE 是悲观锁,适合冲突频繁的场景;版本号乐观锁适合冲突少的高并发场景。如何选择和实现?