在WSL中试用pg_mooncake插件的docker镜像 - l1t1/note GitHub Wiki
pg_mooncake插件刚刚发布了0.1版,最简单的试用方法是利用他们发布的docker镜像。 1.拉取镜像
docker pull docker.1ms.run/mooncakelabs/pg_mooncake
Trying to pull docker.1ms.run/mooncakelabs/pg_mooncake:latest...
Getting image source signatures
Copying blob 23db180a1f67 done
Copying blob 7cebbe7849b3 done
Copying blob af302e5c37e9 done
Copying blob dc59dd9c8eb3 done
Copying blob 4dd47a683737 done
Copying blob aec09e638045 done
Copying blob dc4330b02129 done
Copying blob 498cc40b9fe9 done
Copying blob 8f14f34d54d3 done
Copying blob 6d3411bb4696 done
Copying blob 88d4f7416643 done
Copying blob e91ad5cfb8d0 done
Copying blob e0c4d5055fb9 done
Copying blob 254ee626d709 done
Copying blob d8d4b0c4a2ef done
Copying blob aeea401801ad done
Copying config 8d085fd5b2 done
Writing manifest to image destination
Storing signatures
8d085fd5b274ccca4d160239c79082306ca602167744621e42f63aedaac4b50a
2.查看镜像ID
docker image list -a
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.1ms.run/mooncakelabs/pg_mooncake latest 8d085fd5b274 4 days ago 630 MB
3.在后台运行服务器端容器
docker run --cgroup-manager=cgroupfs --events-backend=file --net=host -it --name mooncake-demo -e POSTGRES_HOST_AUTH_METHOD=trust -d 8d085fd5b274
013eb0cab9f67f03005ecb8458b93dbd3e6fdce61ef61167d342176cb0d49950
4.连接到服务器端,并执行psql
docker exec -it 013eb0cab9f67f03 /bin/bash
root@DESKTOP-59T6U68:/# psql -h localhost -U postgres
psql (17.2 (Debian 17.2-1.pgdg120+1))
Type "help" for help.
postgres=# CREATE EXTENSION pg_mooncake;
CREATE EXTENSION
postgres=# CREATE TABLE user_activity(
user_id BIGINT,
activity_type TEXT,
activity_timestamp TIMESTAMP,
duration INT
) USING columnstore;
CREATE TABLE
postgres=# INSERT INTO user_activity VALUES
(1, 'login', '2024-01-01 08:00:00', 120),
(2, 'page_view', '2024-01-01 08:05:00', 30),
(3, 'logout', '2024-01-01 08:30:00', 60),
(4, 'error', '2024-01-01 08:13:00', 60);
INSERT 0 4
postgres=# SELECT * from user_activity;
user_id | activity_type | activity_timestamp | duration
---------+---------------+---------------------+----------
1 | login | 2024-01-01 08:00:00 | 120
2 | page_view | 2024-01-01 08:05:00 | 30
3 | logout | 2024-01-01 08:30:00 | 60
4 | error | 2024-01-01 08:13:00 | 60
(4 rows)
postgres=# SELECT * FROM mooncake.columnstore_tables;
table_name | path
---------------+---------------------------------------------------------------------------------------
user_activity | /var/lib/postgresql/data/mooncake_local_tables/mooncake_postgres_user_activity_16441/
(1 row)
postgres=# \q
root@DESKTOP-59T6U68:/# ls -la /var/lib/postgresql/data/mooncake_local_tables/mooncake_postgres_user_activity_16441/
total 16
drwx------ 3 postgres postgres 4096 Jan 25 00:57 .
drwx------ 3 postgres postgres 4096 Jan 25 00:56 ..
drwx------ 2 postgres postgres 4096 Jan 25 00:57 _delta_log
-rw------- 1 postgres postgres 699 Jan 25 00:57 f02a351a-d530-49b8-87f6-c6275ebf90d6.parquet
root@DESKTOP-59T6U68:/# psql -h localhost -U postgres
psql (17.2 (Debian 17.2-1.pgdg120+1))
Type "help" for help.
postgres=# INSERT INTO user_activity select i,'a'||i,now(),i*10%200 from generate_series(5,10000000)t(i);
INSERT 0 9999996
postgres=# \timing on
Timing is on.
postgres=# select sum(i) from user_activity;
ERROR: column "i" does not exist
LINE 1: select sum(i) from user_activity;
^
Time: 0.319 ms
postgres=# select sum(user_id) from user_activity;
sum
----------------
50000005000000
(1 row)
Time: 16.652 ms
postgres=# select sum(i) from generate_series(5,10000000)t(i);
sum
----------------
50000004999990
(1 row)
Time: 793.756 ms
postgres=# explain analyze select sum(i) from generate_series(5,10000000)t(i);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=124999.95..124999.96 rows=1 width=8) (actual time=1030.718..1030.719 rows=1 loops=1)
-> Function Scan on generate_series t (cost=0.00..99999.96 rows=9999996 width=4) (actual time=374.282..763.039 rows=9999996 loops=1)
Planning Time: 0.180 ms
JIT:
Functions: 5
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.567 ms (Deform 0.094 ms), Inlining 0.000 ms, Optimization 0.096 ms, Emission 1.426 ms, Total 2.089 ms
Execution Time: 1037.091 ms
(8 rows)
Time: 1038.131 ms (00:01.038)
postgres=# explain analyze select sum(user_id) from user_activity;
Time: 24.427 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Custom Scan (MooncakeDuckDBScan) (cost=0.00..0.00 rows=0 width=0) (actual time=12.070..12.110 rows=1 loops=1)
DuckDB Execution Plan:
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT sum(user_id) AS sum FROM pgmooncake.public.user_activity
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.0095s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ UNGROUPED_AGGREGATE │
│ ──────────────────── │
│ Aggregates: sum(#0) │
│ │
│ 1 Rows │
│ (0.01s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ user_id │
│ │
│ 10000000 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ TABLE_SCAN │
│ ──────────────────── │
│ Function: │
│ COLUMNSTORE_SCAN │
│ │
│ Projections: user_id │
│ │
│ │
│ 10000000 Rows │
│ (0.10s) │
└───────────────────────────┘
Planning Time: 1.085 ms
Execution Time: 12.410 ms
(54 rows)
(END)
可见,pg_mooncake的列存格式表在后台采取parquet文件和_delta_log的形式,执行查询则调用duckdb库,执行计划也是复用duckdb的计划。 这个镜像是基于postgresql 17.2官方镜像改造的。