窗口函数:ROW_NUMBER、RANK、LAG、LEAD
High Contrast
Dark Mode
Light Mode
Sepia
Forest
2 min read399 words

窗口函数:ROW_NUMBER、RANK、LAG、LEAD

窗口函数(Window Function)是 PostgreSQL 最强大的分析工具之一。与 GROUP BY 不同,窗口函数在计算聚合值的同时保留了所有原始行——你既能看到每一行的细节,又能看到它在分组中的位置或与相邻行的关系。


窗口函数 vs GROUP BY

-- GROUP BY:每组聚合为一行(原始行消失)
SELECT user_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY user_id;
-- 窗口函数:每行都保留,同时添加聚合计算
SELECT
id,
user_id,
total_amount,
SUM(total_amount) OVER (PARTITION BY user_id) AS user_total_spent,
COUNT(*) OVER (PARTITION BY user_id) AS user_order_count
FROM orders;
-- 输出:每个订单行都有,同时有该用户的汇总统计

窗口函数的语法结构

function_name() OVER (
[PARTITION BY 分组列]    -- 可选:分组(窗口的边界)
[ORDER BY 排序列]        -- 可选:行顺序(影响 RANK、ROW_NUMBER、LAG 等)
[ROWS/RANGE BETWEEN ...]-- 可选:窗口帧(滑动窗口)
)

排名函数

ROW_NUMBER — 不重复的序号

-- 找出每个用户消费最高的前 3 笔订单
WITH ranked_orders AS (
SELECT
id,
user_id,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY total_amount DESC
) AS rn
FROM orders
)
SELECT id, user_id, total_amount
FROM ranked_orders
WHERE rn <= 3;

RANK vs DENSE_RANK — 处理相同值

SELECT
email,
total_spent,
RANK()       OVER (ORDER BY total_spent DESC) AS rank,        -- 并列后有跳跃(1,2,2,4)
DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank,  -- 并列后无跳跃(1,2,2,3)
PERCENT_RANK() OVER (ORDER BY total_spent DESC) AS pct_rank   -- 百分位排名(0~1)
FROM user_spending;

分析函数:LAG 和 LEAD

LAG 和 LEAD 让你能访问"前一行"或"后一行"的数据,常用于计算环比变化:

-- 计算每月订单数的环比增长
WITH monthly_orders AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS order_count
FROM orders
GROUP BY 1
)
SELECT
month,
order_count,
LAG(order_count) OVER (ORDER BY month) AS prev_month_count,
order_count - LAG(order_count) OVER (ORDER BY month) AS mom_change,
ROUND(
100.0 * (order_count - LAG(order_count) OVER (ORDER BY month))
/ NULLIF(LAG(order_count) OVER (ORDER BY month), 0),
1
) AS mom_pct_change
FROM monthly_orders
ORDER BY month;
-- LEAD:访问后一行(预测下一期)
SELECT
month,
order_count,
LEAD(order_count) OVER (ORDER BY month) AS next_month_count
FROM monthly_orders;

聚合窗口函数

聚合函数也可以用 OVER 变成窗口函数:

-- 累计总额(Running Total)
SELECT
created_at::date AS date,
total_amount,
SUM(total_amount) OVER (ORDER BY created_at) AS running_total
FROM orders
ORDER BY created_at;
-- 滑动 7 天移动平均
SELECT
created_at::date AS date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_revenue_summary;
-- 每个分组内的占比
SELECT
user_id,
total_amount,
ROUND(
100.0 * total_amount / SUM(total_amount) OVER (PARTITION BY user_id),
1
) AS pct_of_user_total
FROM orders;

实用:FIRST_VALUE、LAST_VALUE、NTH_VALUE

-- 每个用户的第一笔和最近一笔订单金额
SELECT
user_id,
total_amount,
created_at,
FIRST_VALUE(total_amount) OVER (
PARTITION BY user_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_order_amount,
LAST_VALUE(total_amount) OVER (
PARTITION BY user_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS latest_order_amount
FROM orders;

DISTINCT ON:PostgreSQL 特有的去重取首行

这不是窗口函数,但经常和窗口函数一起用,用于取每组的第一行:

-- 取每个用户最近的一笔订单(最简洁的写法)
SELECT DISTINCT ON (user_id)
user_id, id AS order_id, total_amount, created_at
FROM orders
ORDER BY user_id, created_at DESC;
-- DISTINCT ON 取的是 ORDER BY 后的第一行
-- 注意:ORDER BY 必须以 DISTINCT ON 的列开头

窗口函数速查表

函数 说明 需要 ORDER BY
ROW_NUMBER() 每组内不重复的序号
RANK() 并列时序号有跳跃
DENSE_RANK() 并列时序号无跳跃
PERCENT_RANK() 百分位排名(0~1)
NTILE(n) 把每组分成 n 个桶
LAG(col, n) 取前第 n 行的值
LEAD(col, n) 取后第 n 行的值
FIRST_VALUE(col) 窗口内第一行的值 可选
LAST_VALUE(col) 窗口内最后一行的值 可选
SUM() OVER 累计求和 / 分组求和 可选
AVG() OVER 移动平均 / 分组平均 可选
COUNT() OVER 分组内计数 可选

下一节横转纵与纵转横(Pivot)——如何把多行数据转成多列?如何把多列转成多行?