从一个游戏SQL的优化说起‐4 - l1t1/note GitHub Wiki

通过查看执行计划,绝大多数的时间都花在“生成所有可能的组合”这步,可见关键是优化这个子查询。仔细看关联条件ON a.category = b.category在这个例子中实际相当于按玩家分区,剩余的就是笛卡尔积。 再看过滤条件,a.n < b.n AND b.n < c.n AND e.n < c.n用于去重复,ARRAY_LENGTH(ARRAY_DISTINCT(ARRAY_CONCAT(a.n, ARRAY_CONCAT(b.n, ARRAY_CONCAT(c.n, d.n+e.n))))) = 14用于筛选结果中的14张牌组合。 容易想到,把过滤条件移到关联条件,可以实现剪枝,仅关联不重复的3张和对子组合,而14张牌只要把左连接改为内连接自然就能保证。因为数组比较的开销较大,设想通过把牌序号用位图表示,求它们按位与(bit_and)的方法来判断。改写结果如下:

WITH 
-- games as(with t as (select chr(126983+i::int) pic from generate_series(0,26)t(i))select 2, string_agg(pic,''order by pic)j from t,generate_series(1,4) ),
-- 将108张牌分配给4个人,用轮流、顺序分配的方案
tmp0 AS (
    SELECT 
        a.i AS category,
        b.i AS seq,
        SUBSTRING(c.j, (a.i * 27) + b.i, 1) AS chr,
        ASCII(chr) - 126982 AS n,
        (n - 1) // 9 AS hs,
        CASE n % 9 WHEN 0 THEN 9 ELSE n % 9 END AS pd
    FROM 
        range(4) a(i),
        range(1, 28) b(i),
        games c
),
-- 第一步,能凑成对子、2张顺子情况
tmp1 AS (
    SELECT 
        a.*,
        MIN(b.seq) FILTER (where a.pd = b.pd AND b.seq > a.seq) AS dz,
        MIN(b.seq) FILTER (where a.pd = b.pd + 1 AND b.seq > a.seq) AS sz1,
        MIN(b.seq) FILTER (where a.pd = b.pd - 1 AND b.seq > a.seq) AS sz2
    FROM 
        tmp0 a
    LEFT JOIN 
        tmp0 b ON a.category = b.category AND a.hs = b.hs
    GROUP BY 
        a.category, a.seq, a.chr, a.n, a.hs, a.pd
    ORDER BY 
        a.category, a.seq
),
-- 第二步,加能凑成三张顺子的情况
tmp2 AS (
    SELECT 
        a.*,
        MIN(b.seq) FILTER (where a.pd = b.pd + 2 AND b.seq > a.seq AND sz1 IS NOT NULL) AS sz11,
        MIN(b.seq) FILTER (where a.pd = b.pd - 2 AND b.seq > a.seq AND sz2 IS NOT NULL) AS sz22,
        MIN(b.seq) FILTER (where a.pd = b.pd AND b.seq > a.dz AND dz IS NOT NULL) AS sz
    FROM 
        tmp1 a
    LEFT JOIN 
        tmp0 b ON a.category = b.category AND a.hs = b.hs
    GROUP BY 
        a.category, a.seq, a.chr, a.n, a.hs, a.pd, a.dz, a.sz1, a.sz2
    ORDER BY 
        a.category, a.seq
),
-- 生成所有可以形成3张组合的牌
tmp_3z AS (select *,(1<<n[1])+(1<<n[2])+(1<<n[3])b from(
    SELECT 
        category, 
        LIST_SORT([seq, sz1, sz11]) AS n 
    FROM 
        tmp2 
    WHERE 
        sz11 IS NOT NULL
    UNION  
    SELECT 
        category, 
        LIST_SORT([seq, sz2, sz22]) AS n 
    FROM 
        tmp2 
    WHERE 
        sz22 IS NOT NULL
    UNION 
    SELECT 
        category, 
        LIST_SORT([seq, dz, sz]) AS n 
    FROM 
        tmp2 
    WHERE 
        sz IS NOT NULL)
),
-- 生成所有对子组合
tmp_dz AS (
    SELECT 
        category, 
        LIST_SORT([seq, dz]) AS n,(1<<seq)+(1<<dz)b 
    FROM 
        tmp1 
    WHERE 
        dz IS NOT NULL
)
-- 生成所有可能的组合
,tmp_all AS (
    SELECT
        a.category,
        a.n AS sz1,
        b.n AS sz2,
        c.n AS sz3,e.n AS sz4,
        d.n AS dz,
        LIST_REVERSE_SORT(a.n+b.n+c.n+d.n+e.n) AS n
    FROM
        tmp_3z a
    JOIN
        tmp_3z b ON a.category = b.category and a.b < b.b and a.b & b.b =0
    JOIN
        tmp_3z c ON a.category = c.category and a.b & c.b =0 and b.b & c.b =0 and b.b < c.b
    JOIN
        tmp_3z e ON a.category = e.category and a.b & e.b =0 and b.b & e.b =0 and c.b & e.b =0 and c.b<e.b
    JOIN
        tmp_dz d ON a.category = d.category and a.b & d.b =0 and b.b & d.b =0 and c.b & d.b =0 and e.b & d.b =0
)
-- 用最少摸牌次数能胡牌的组合
,tmp_min AS (
    SELECT
        MIN(n) AS n ,category
    FROM
        tmp_all group by category order by 1,2 limit 1
),
-- 最终结果
final_result AS (
    SELECT
        STRING_AGG(a.chr, '' ORDER BY a.chr) AS ret
    FROM
        tmp1 a
    JOIN
        tmp_min m ON a.category = m.category and a.seq IN (SELECT UNNEST(m.n))
)
SELECT * FROM  final_result;

│ 🀈🀉🀊🀑🀒🀓🀕🀕🀚🀛🀜🀟🀠🀡 │

Run Time (s): real 0.072 user 0.108000 sys 0.044000
D truncate table games;

D with t as (select chr(126983+i::int) pic from generate_series(0,26)t(i))insert into games select 2, string_agg(pic,''order by pic)j from t,generate_series(1,4) ;

D .read mj5.txt
│ 🀇🀇🀇🀇🀈🀈🀈🀈🀉🀉🀉🀉🀊🀊 │

Run Time (s): real 0.188 user 0.484000 sys 0.052000

可见,针对极端数据和随机数据的表现都比较好。 优化过程中,尝试过用array_intersect()和array_has_any()等函数放入on条件,虽然也有较大提升,但数组比较的开销还是比整数大,因此弃用。 后记:程宁同学把我最后的SQL丢给AI去分析,它给出如下结果,真的无所不知,无语了。

3.总结
1 << n[1]
。将数组中的每个元素转换为一个唯一的二进制位。
。通过相加得到一个整数,表示数组的元素组合。 
a.b & d.b = 0
。检查两个数组是否有重复元素。
。如果结果为0,说明两个数组的元素没有重复。
作用:
。这些位运算用于高效地表示和检查数组中的元素是否重复,避免了复杂的数组操作和循环,提升了代码的性能。