从一个游戏SQL的优化说起‐4 - l1t1/note GitHub Wiki
通过查看执行计划,绝大多数的时间都花在“生成所有可能的组合”这步,可见关键是优化这个子查询。仔细看关联条件ON a.category = b.category在这个例子中实际相当于按玩家分区,剩余的就是笛卡尔积。 再看过滤条件,a.n < b.n AND b.n < c.n AND e.n < c.n用于去重复,ARRAY_LENGTH(ARRAY_DISTINCT(ARRAY_CONCAT(a.n, ARRAY_CONCAT(b.n, ARRAY_CONCAT(c.n, d.n+e.n))))) = 14用于筛选结果中的14张牌组合。 容易想到,把过滤条件移到关联条件,可以实现剪枝,仅关联不重复的3张和对子组合,而14张牌只要把左连接改为内连接自然就能保证。因为数组比较的开销较大,设想通过把牌序号用位图表示,求它们按位与(bit_and)的方法来判断。改写结果如下:
WITH
-- games as(with t as (select chr(126983+i::int) pic from generate_series(0,26)t(i))select 2, string_agg(pic,''order by pic)j from t,generate_series(1,4) ),
-- 将108张牌分配给4个人,用轮流、顺序分配的方案
tmp0 AS (
SELECT
a.i AS category,
b.i AS seq,
SUBSTRING(c.j, (a.i * 27) + b.i, 1) AS chr,
ASCII(chr) - 126982 AS n,
(n - 1) // 9 AS hs,
CASE n % 9 WHEN 0 THEN 9 ELSE n % 9 END AS pd
FROM
range(4) a(i),
range(1, 28) b(i),
games c
),
-- 第一步,能凑成对子、2张顺子情况
tmp1 AS (
SELECT
a.*,
MIN(b.seq) FILTER (where a.pd = b.pd AND b.seq > a.seq) AS dz,
MIN(b.seq) FILTER (where a.pd = b.pd + 1 AND b.seq > a.seq) AS sz1,
MIN(b.seq) FILTER (where a.pd = b.pd - 1 AND b.seq > a.seq) AS sz2
FROM
tmp0 a
LEFT JOIN
tmp0 b ON a.category = b.category AND a.hs = b.hs
GROUP BY
a.category, a.seq, a.chr, a.n, a.hs, a.pd
ORDER BY
a.category, a.seq
),
-- 第二步,加能凑成三张顺子的情况
tmp2 AS (
SELECT
a.*,
MIN(b.seq) FILTER (where a.pd = b.pd + 2 AND b.seq > a.seq AND sz1 IS NOT NULL) AS sz11,
MIN(b.seq) FILTER (where a.pd = b.pd - 2 AND b.seq > a.seq AND sz2 IS NOT NULL) AS sz22,
MIN(b.seq) FILTER (where a.pd = b.pd AND b.seq > a.dz AND dz IS NOT NULL) AS sz
FROM
tmp1 a
LEFT JOIN
tmp0 b ON a.category = b.category AND a.hs = b.hs
GROUP BY
a.category, a.seq, a.chr, a.n, a.hs, a.pd, a.dz, a.sz1, a.sz2
ORDER BY
a.category, a.seq
),
-- 生成所有可以形成3张组合的牌
tmp_3z AS (select *,(1<<n[1])+(1<<n[2])+(1<<n[3])b from(
SELECT
category,
LIST_SORT([seq, sz1, sz11]) AS n
FROM
tmp2
WHERE
sz11 IS NOT NULL
UNION
SELECT
category,
LIST_SORT([seq, sz2, sz22]) AS n
FROM
tmp2
WHERE
sz22 IS NOT NULL
UNION
SELECT
category,
LIST_SORT([seq, dz, sz]) AS n
FROM
tmp2
WHERE
sz IS NOT NULL)
),
-- 生成所有对子组合
tmp_dz AS (
SELECT
category,
LIST_SORT([seq, dz]) AS n,(1<<seq)+(1<<dz)b
FROM
tmp1
WHERE
dz IS NOT NULL
)
-- 生成所有可能的组合
,tmp_all AS (
SELECT
a.category,
a.n AS sz1,
b.n AS sz2,
c.n AS sz3,e.n AS sz4,
d.n AS dz,
LIST_REVERSE_SORT(a.n+b.n+c.n+d.n+e.n) AS n
FROM
tmp_3z a
JOIN
tmp_3z b ON a.category = b.category and a.b < b.b and a.b & b.b =0
JOIN
tmp_3z c ON a.category = c.category and a.b & c.b =0 and b.b & c.b =0 and b.b < c.b
JOIN
tmp_3z e ON a.category = e.category and a.b & e.b =0 and b.b & e.b =0 and c.b & e.b =0 and c.b<e.b
JOIN
tmp_dz d ON a.category = d.category and a.b & d.b =0 and b.b & d.b =0 and c.b & d.b =0 and e.b & d.b =0
)
-- 用最少摸牌次数能胡牌的组合
,tmp_min AS (
SELECT
MIN(n) AS n ,category
FROM
tmp_all group by category order by 1,2 limit 1
),
-- 最终结果
final_result AS (
SELECT
STRING_AGG(a.chr, '' ORDER BY a.chr) AS ret
FROM
tmp1 a
JOIN
tmp_min m ON a.category = m.category and a.seq IN (SELECT UNNEST(m.n))
)
SELECT * FROM final_result;
│ 🀈🀉🀊🀑🀒🀓🀕🀕🀚🀛🀜🀟🀠🀡 │
Run Time (s): real 0.072 user 0.108000 sys 0.044000
D truncate table games;
D with t as (select chr(126983+i::int) pic from generate_series(0,26)t(i))insert into games select 2, string_agg(pic,''order by pic)j from t,generate_series(1,4) ;
D .read mj5.txt
│ 🀇🀇🀇🀇🀈🀈🀈🀈🀉🀉🀉🀉🀊🀊 │
Run Time (s): real 0.188 user 0.484000 sys 0.052000
可见,针对极端数据和随机数据的表现都比较好。 优化过程中,尝试过用array_intersect()和array_has_any()等函数放入on条件,虽然也有较大提升,但数组比较的开销还是比整数大,因此弃用。 后记:程宁同学把我最后的SQL丢给AI去分析,它给出如下结果,真的无所不知,无语了。
3.总结
1 << n[1]
。将数组中的每个元素转换为一个唯一的二进制位。
。通过相加得到一个整数,表示数组的元素组合。
a.b & d.b = 0
。检查两个数组是否有重复元素。
。如果结果为0,说明两个数组的元素没有重复。
作用:
。这些位运算用于高效地表示和检查数组中的元素是否重复,避免了复杂的数组操作和循环,提升了代码的性能。