[关闭]
@SovietPower 2022-05-01T15:03:37.000000Z 字数 20856 阅读 904

数据库系统 实验5 存储与索引

DB



概念

Database (数据库)

Tablespace (表空间)

Schema (模式)

表空间管理命令

创建表空间
可指定owner,其拥有该表空间的所有权限。
要指定的相对位置是相对安装目录下的位置var/lib/opengauss/data。可指定最大大小。

  1. CREATE TABLESPACE tablespace_name [ OWNER user_name ]
  2. RELATIVE LOCATION 'directory' [ MAXSIZE 'space_size' ] [ with_option_clause ];

重命名表空间

  1. ALTER TABL ESPACE tablespace_name RENAME TO new_tablespace_name;

修改表空间所有者

  1. ALTER TABLESPACE tablespace_name OWNER TO new_owner;

修改表空间限额

  1. ALTER TABL ESPACE tablespace_name RESIZE MAXSIZE { UNLIMITED I'space_size'};

删除表空间
删除前表空间中不能有任何数据库对象

  1. DROP TABLESPACE [ IF EXISTS ] tablespace_name;

模式管理命令

用指定名称创建模式

  1. CREATE SCHEMA schema_name [ AUTHORIZATION user_name ] [ schema_element [...] ];

用用户名创建模式

  1. CREATE SCHEMA AUTHORIZATION username [ schema_element [...] ];

修改模式的名称

  1. ALTER SCHEMA schema_name RENAME TO new_name;

修改模式的所有者

  1. ALTER SCHEMA schema_name OWNER TO new_owner;

删除模式
若该模式下有其它对象,CASCADE为全删掉;RESTRICT为不允许删除该模式。

  1. DROP SCHEMA [IF EXISTS ] schema_name [, ...] [ CASCADE | RESTRICT ];

索引

  1. CREATE [UNIQUE] INDEX [ CONCURRENTLY ] [ [schema_name.]index_name ] ON table_name [ USING method ]
  2. ({{ col_name | ( expression)} [COLLATE collation ][ opclass ][ ASC | DESC][ NULLS { FIRST | LAST}]}[,..])
  3. [WITH ( {storage_parameter = value}[,..])]
  4. [ TABLESPACE tablespace_name ]
  5. [ WHERE predicate ];

unique表示不允许该索引值有重复。CONCURRENTLY为并行创建。

method:

  1. btree:类似于B+树
  2. hashhash索引
  3. gin:倒排索引
  4. gist:适用于几何和地理等多维数据
  5. Psort:针对列存表进行局部排序索引
  6. ubtree:多版本B-tree索引
  1. DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] index name [,..]
  2. [ CASCADE | RESTRICT ];

查看索引:

  1. select * from pg_indexes where [schemaname='...' and ] tablename='...';

实验

首先运行之前创建的openGauss容器,连接数据库db2022。

  1. PS H:\> docker exec -it opengauss3 bash
  2. root@97375c3acbd1:/# su omm
  3. omm@97375c3acbd1:/$ gsql -d db2022 -r

创建一个名为 myspace 的表空间

  1. db2022=# create tablespace myspace RELATIVE LOCATION 'tablespace/myspace';
  2. CREATE TABLESPACE

创建一个名为 myschema 的模式

同时在该模式下创建数据表 mytable 和视图 myview。

  1. db2022=# create schema myschema authorization gxb
  2. db2022-# create table mytable (id int, value int, primary key (id))
  3. db2022-# create view myview as select * from public.takes;
  4. NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "mytable_pkey" for table "mytable"
  5. CREATE SCHEMA

练习在查询中使用模式

记录每条语句返回的结果并解释其完成的操作

  1. db2022=# show search_path;
  2. search_path
  3. ----------------
  4. "$user", public
  5. (1 row)
  6. 当前模式的search_path
  7. db2022=# insert into mytable values(1,100);
  8. ERROR: relation "mytable" does not exist on gaussdb
  9. LINE 1: insert into mytable values(1,100);
  10. 之前创建的表位于 myschema,但当前 search_path 不包含 myschema,故不指定 myschema 则找不到该表。
  11. ^
  12. db2022=# create table mytable (id int, value int, primary key (id));
  13. NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "mytable_pkey" for table "mytable"
  14. CREATE TABLE
  15. 创建表 public.mytable
  16. db2022=# insert into mytable values(1,100);
  17. INSERT 0 1
  18. search_path 包含 public,故可找到表 public.mytable
  19. db2022=# insert into myschema.mytable values(1,100);
  20. INSERT 0 1
  21. 指定模式,故可找到表 myschema.mytable
  22. db2022=# SET search_path TO myschema,public;
  23. SET
  24. 修改 search_path
  25. db2022=# insert into mytable values(2,200);
  26. INSERT 0 1
  27. 因为 myschema 包含该表,所以向表 myschema.mytable 中插入。
  28. db2022=# insert into myschema.mytable values(2,200);
  29. ERROR: duplicate key value violates unique constraint "mytable_pkey"
  30. DETAIL: Key (id)=(2) already exists.
  31. 重复插入。此时 myschema.mytable 就是 mytable
  32. db2022=# insert into public.mytable values(2,200);
  33. INSERT 0 1
  34. 此时使用 public.mytable 需指定模式名 public

观察索引对查询性能的影响

记录每条语句返回的结果并解释其完成的操作

  1. db2022=# create table takes (like public.takes) tablespace myspace;
  2. CREATE TABLE
  3. 创建 myschema.takes,结构同之前的takes,使用自定义表空间。
  4. db2022=# insert into takes select * from public.takes;
  5. INSERT 0 30094
  6. 导入之前的 takes 中的数据
  7. db2022=# select * from pg_indexes where schemaname='public' and tablename='takes';
  8. schemaname | tablename | indexname | tablespace | indexdef
  9. ------------+-----------+------------+------------+---------------------------------------------------------------------
  10. ----------------------------------------------
  11. public | takes | takes_pkey | | CREATE UNIQUE INDEX takes_pkey ON takes USING btree (id, course_id,
  12. sec_id, semester, year) TABLESPACE pg_default
  13. (1 row)
  14. 之前的 takes 包含各字段上的 btree 索引
  15. db2022=# select * from pg_indexes where schemaname='myschema' and tablename='takes';
  16. schemaname | tablename | indexname | tablespace | indexdef
  17. ------------+-----------+-----------+------------+----------
  18. (0 rows)
  19. 新创建的 takes 不含索引
  20. db2022=# \timing on
  21. Timing is on.
  22. 显示每一次 sql 执行所花的时间
  23. db2022=# select * from myschema.takes where id='79446' and course_id='748' and sec_id='1' and semester='Fall' and year=2
  24. 003;
  25. id | course_id | sec_id | semester | year | grade
  26. -------+-----------+--------+----------+------+-------
  27. 79446 | 748 | 1 | Fall | 2003 | A
  28. (1 row)
  29. Time: 99.561 ms
  30. db2022=# select * from public.takes where id='79446' and course_id='748' and sec_id='1' and semester='Fall' and year=200
  31. 3;
  32. id | course_id | sec_id | semester | year | grade
  33. -------+-----------+--------+----------+------+-------
  34. 79446 | 748 | 1 | Fall | 2003 | A
  35. (1 row)
  36. Time: 45.003 ms
  37. 该单值查询中,有索引的表比无索引的表快1倍多

新建索引再次对比(在两个表中,后续的查询均比之前的查询快得多,可能是缓存了?但索引依旧有效)

  1. db2022=# create unique index takes_pk on myschema.takes (id,course_id,sec_id,semester,year) tablespace myspace;
  2. CREATE INDEX
  3. Time: 411.531 ms
  4. 在新表中也建相同的索引。
  5. db2022=# select * from myschema.takes where id='79446' and course_id='748' and sec_id='1' and semester='Fall' and year=2
  6. 003;
  7. id | course_id | sec_id | semester | year | grade
  8. -------+-----------+--------+----------+------+-------
  9. 79446 | 748 | 1 | Fall | 2003 | A
  10. (1 row)
  11. Time: 1.068 ms
  12. db2022=# select * from public.takes where id='79446' and course_id='748' and sec_id='1' and semester='Fall' and year=200
  13. 3;
  14. id | course_id | sec_id | semester | year | grade
  15. -------+-----------+--------+----------+------+-------
  16. 79446 | 748 | 1 | Fall | 2003 | A
  17. (1 row)
  18. Time: 1.008 ms
  19. 此时两个表中查询时间基本一致。
  20. db2022=# drop index myschema.takes_pk;
  21. DROP INDEX
  22. Time: 239.175 ms
  23. 删除索引
  24. db2022=# select * from myschema.takes where id='79446' and course_id='748' and sec_id='1' and semester='Fall' and year=2003;
  25. id | course_id | sec_id | semester | year | grade
  26. -------+-----------+--------+----------+------+-------
  27. 79446 | 748 | 1 | Fall | 2003 | A
  28. (1 row)
  29. Time: 4.761 ms
  30. db2022=# select * from public.takes where id='79446' and course_id='748' and sec_id='1' and semester='Fall' and year=2003;
  31. id | course_id | sec_id | semester | year | grade
  32. -------+-----------+--------+----------+------+-------
  33. 79446 | 748 | 1 | Fall | 2003 | A
  34. (1 row)
  35. Time: 0.885 ms
  36. 删除索引后,两表间查询时间又有了明显差距

观察索引对更新性能的影响

记录每条语句返回的结果并解释其完成的操作。
可能是电脑问题,用时浮动比较大。

无索引更新:

  1. db2022=# insert into myschema.takes values ('00000','000','0','Fall',2022,'Z');
  2. INSERT 0 1
  3. Time: 208.899 ms

对每个属性建立btree索引:
用时为无索引的3到4倍。

  1. db2022=# create index takes_1 on myschema.takes (id) tablespace myspace;
  2. CREATE INDEX
  3. Time: 567.284 ms
  4. db2022=# create index takes_2 on myschema.takes (course_id) tablespace myspace;
  5. CREATE INDEX
  6. Time: 347.021 ms
  7. db2022=# create index takes_3 on myschema.takes (sec_id) tablespace myspace;
  8. CREATE INDEX
  9. Time: 383.941 ms
  10. db2022=# create index takes_4 on myschema.takes (semester) tablespace myspace;
  11. CREATE INDEX
  12. Time: 453.346 ms
  13. db2022=# create index takes_5 on myschema.takes (year) tablespace myspace;
  14. CREATE INDEX
  15. Time: 787.605 ms
  16. db2022=# create index takes_6 on myschema.takes (grade) tablespace myspace;
  17. CREATE INDEX
  18. Time: 2343.505 ms
  19. db2022=# select * from pg_indexes where schemaname='myschema' and tablename='takes';
  20. schemaname | tablename | indexname | tablespace | indexdef
  21. ------------+-----------+-----------+------------+----------------------------------------------------------------------
  22. ----
  23. myschema | takes | takes_1 | myspace | CREATE INDEX takes_1 ON takes USING btree (id) TABLESPACE myspace
  24. myschema | takes | takes_2 | myspace | CREATE INDEX takes_2 ON takes USING btree (course_id) TABLESPACE mysp
  25. ace
  26. myschema | takes | takes_3 | myspace | CREATE INDEX takes_3 ON takes USING btree (sec_id) TABLESPACE myspace
  27. myschema | takes | takes_4 | myspace | CREATE INDEX takes_4 ON takes USING btree (semester) TABLESPACE myspa
  28. ce
  29. myschema | takes | takes_5 | myspace | CREATE INDEX takes_5 ON takes USING btree (year) TABLESPACE myspace
  30. myschema | takes | takes_6 | myspace | CREATE INDEX takes_6 ON takes USING btree (grade) TABLESPACE myspace
  31. (6 rows)
  32. Time: 102.131 ms
  33. db2022=# insert into myschema.takes values ('11111','111','1','Fall',2022,'Z');
  34. INSERT 0 1
  35. Time: 5397.037 ms
  36. db2022=# delete myschema.takes where id='11111' and course_id='111';
  37. DELETE 1
  38. Time: 545.066 ms
  39. db2022=# insert into myschema.takes values ('11111','111','1','Fall',2022,'Z');
  40. INSERT 0 1
  41. Time: 290.331 ms
  42. 删除之后重新插入会更快,可能也是缓存的问题?
  43. db2022=# insert into myschema.takes values ('11111','1110','1','Fall',2022,'Z');
  44. INSERT 0 1
  45. Time: 829.721 ms
  46. db2022=# insert into myschema.takes values ('11111','1120','1','Fall',2022,'Z');
  47. INSERT 0 1
  48. Time: 767.316 ms

对每个属性建立hash索引:
用时依旧为无索引的3到4倍。

  1. db2022=# create index takes_7 on myschema.takes using hash (id) tablespace myspace;
  2. CREATE INDEX
  3. Time: 994.802 ms
  4. db2022=# create index takes_8 on myschema.takes using hash (course_id) tablespace myspace;
  5. CREATE INDEX
  6. Time: 675.056 ms
  7. db2022=# create index takes_9 on myschema.takes using hash (sec_id) tablespace myspace;
  8. CREATE INDEX
  9. Time: 624.564 ms
  10. db2022=# create index takes_10 on myschema.takes using hash (semester) tablespace myspace;
  11. CREATE INDEX
  12. Time: 639.372 ms
  13. db2022=# create index takes_11 on myschema.takes using hash (year) tablespace myspace;
  14. CREATE INDEX
  15. Time: 356.669 ms
  16. db2022=# create index takes_12 on myschema.takes using hash (grade) tablespace myspace;
  17. CREATE INDEX
  18. Time: 3122.687 ms
  19. db2022=# insert into myschema.takes values ('22222','222','2','Fall',2022,'Z');
  20. INSERT 0 1
  21. Time: 723.869 ms
  22. db2022=# insert into myschema.takes values ('22221','227','2','Fall',2022,'Z');
  23. INSERT 0 1
  24. Time: 845.770 ms

对每个属性建立hash索引:
用时依旧为无索引的3到4倍。
前几次 insert 用时浮动极大,不知道原因。

  1. db2022=# create index takes_13 on myschema.takes (id,course_id) tablespace myspace;
  2. CREATE INDEX
  3. Time: 2344.680 ms
  4. db2022=# create index takes_14 on myschema.takes (course_id,sec_id) tablespace myspace;
  5. CREATE INDEX
  6. Time: 1309.072 ms
  7. db2022=# create index takes_15 on myschema.takes (sec_id,semester) tablespace myspace;
  8. CREATE INDEX
  9. Time: 1117.506 ms
  10. db2022=# create index takes_16 on myschema.takes (semester,year) tablespace myspace;
  11. CREATE INDEX
  12. Time: 1349.810 ms
  13. db2022=# create index takes_17 on myschema.takes (year,grade) tablespace myspace;
  14. CREATE INDEX
  15. Time: 625.121 ms
  16. db2022=# create index takes_18 on myschema.takes (grade,id) tablespace myspace;
  17. CREATE INDEX
  18. Time: 2183.103 ms
  19. db2022=# insert into myschema.takes values ('33333','333','3','Fall',2022,'Z');
  20. INSERT 0 1
  21. Time: 4466.036 ms
  22. db2022=# insert into myschema.takes values ('33444','355','4','Fall',2022,'Z');
  23. INSERT 0 1
  24. Time: 103.846 ms
  25. db2022=# insert into myschema.takes values ('44444','555','4','Fall',2022,'Z');
  26. INSERT 0 1
  27. Time: 378.363 ms
  28. db2022=# insert into myschema.takes values ('55555','555','4','Fall',2022,'Z');
  29. INSERT 0 1
  30. Time: 803.185 ms
  31. db2022=# insert into myschema.takes values ('66666','666','1','Fall',2022,'Z');
  32. INSERT 0 1
  33. Time: 836.411 ms
  34. db2022=# insert into myschema.takes values ('11166','116','3','Fall',2022,'Z');
  35. INSERT 0 1
  36. Time: 837.816 ms

删除本实验中所创建的模式和表空间

  1. db2022=# drop schema myschema;
  2. ERROR: cannot drop schema myschema because other objects depend on it
  3. DETAIL: table mytable depends on schema myschema
  4. view myview depends on schema myschema
  5. table takes depends on schema myschema
  6. HINT: Use DROP ... CASCADE to drop the dependent objects too.
  7. Time: 873.420 ms
  8. db2022=# drop tablespace myspace;
  9. ERROR: tablespace "myspace" is not empty
  10. Time: 2811.570 ms
  11. db2022=# drop schema myschema CASCADE;
  12. NOTICE: drop cascades to 3 other objects
  13. DETAIL: drop cascades to table mytable
  14. drop cascades to view myview
  15. drop cascades to table takes
  16. DROP SCHEMA
  17. Time: 1311.816 ms
  18. db2022=# drop tablespace myspace;
  19. DROP TABLESPACE
  20. Time: 3746.748 ms
  21. db2022=# select count(*) from myschema.takes;
  22. ERROR: schema "myschema" does not exist
  23. LINE 1: select count(*) from myschema.takes;
  24. ^
  25. Time: 0.941 ms

索引性能测试

测试维度

测试环境

  1. db2022=# show server_version;
  2. server_version
  3. ----------------
  4. 9.2.4
  5. (1 row)
  6. db2022=# select version();
  7. version
  8. (openGauss 2.1.0 build 590b0f8e) compiled at 2021-09-30 14:29:04 commit 0 last mr on x86_64-unknown-linux-gnu, compil
  9. ed by g++ (GCC) 7.3.0, 64-bit
  10. (1 row)

数据集
使用:prereq包含100条数据,takes包含约3w条数据,mytakes包含约100w条数据。
100条数据的不测了,比较麻烦,结论基本就是随数据集规模增长,索引查询优势会更大。
为方便删除index,使用新建的与takes相同的表takes1

下面是mytakes的生成方式,会包含大量idcourse_id相同的元组。

  1. db2022=# create table mytakes (like takes);
  2. CREATE TABLE
  3. Time: 5100.183 ms
  4. db2022=# insert into mytakes select * from takes;
  5. INSERT 0 30094
  6. Time: 1527.305 ms
  7. db2022=# create table takes1 (like takes);
  8. CREATE TABLE
  9. Time: 1974.941 ms
  10. db2022=# insert into takes1 select * from takes;
  11. INSERT 0 30094
  12. Time: 466.372 ms
  13. db2022=# update takes1 set course_id = course_id||'x';
  14. UPDATE 30094
  15. Time: 334.094 ms
  16. db2022=# insert into mytakes select * from takes1;
  17. INSERT 0 30094
  18. Time: 2277.785 ms
  19. db2022=# delete takes1;
  20. DELETE 30094
  21. Time: 12980.146 ms
  22. db2022=# insert into takes1 select * from takes;
  23. INSERT 0 30094
  24. Time: 11817.119 ms
  25. db2022=# update takes1 set course_id = course_id||'y';
  26. UPDATE 30094
  27. Time: 13041.597 ms
  28. db2022=# insert into mytakes select * from takes1;
  29. INSERT 0 30094
  30. Time: 15906.420 ms
  31. db2022=# delete takes1
  32. db2022-# ;
  33. DELETE 30094
  34. Time: 18279.577 ms
  35. db2022=# insert into takes1 select * from mytakes;
  36. INSERT 0 90282
  37. db2022=# update takes1 set sec_id=sec_id||'a';
  38. UPDATE 90282
  39. db2022=# insert into mytakes select * from takes1;
  40. INSERT 0 90282
  41. Time: 36853.872 ms
  42. db2022=# update takes1 set sec_id=sec_id||'b';
  43. UPDATE 90282
  44. Time: 654.162 ms
  45. db2022=# insert into mytakes select * from takes1;
  46. INSERT 0 90282
  47. Time: 532.678 ms
  48. db2022=# delete takes1;
  49. DELETE 90282
  50. Time: 3595.527 ms
  51. db2022=# select count(*) from mytakes;
  52. count
  53. --------
  54. 270846
  55. (1 row)
  56. Time: 50.027 ms
  57. db2022=# insert into takes1 select * from mytakes;
  58. INSERT 0 270846
  59. Time: 6825.950 ms
  60. db2022=# update takes1 set sec_id=sec_id||'c';
  61. UPDATE 270846
  62. db2022=# insert into mytakes select * from takes1;
  63. INSERT 0 270846
  64. db2022=# update takes1 set sec_id=sec_id||'d';
  65. UPDATE 270846
  66. db2022=# insert into mytakes select * from takes1;
  67. INSERT 0 270846
  68. db2022=# update takes1 set sec_id=sec_id||'e';
  69. UPDATE 270846
  70. db2022=# insert into mytakes select * from takes1;
  71. INSERT 0 270846
  72. db2022=# select count(*) from mytakes;
  73. count
  74. ---------
  75. 1083384
  76. (1 row)

无索引

3w条数据测试 等值查询

  1. db2022=# select * from pg_indexes where tablename='takes1';
  2. schemaname | tablename | indexname | tablespace | indexdef
  3. ------------+-----------+-----------+------------+----------
  4. (0 rows)
  5. Time: 2.366 ms
  6. db2022=# select * from takes1 where id='28361' and course_id='960' and sec_id='1';
  7. id | course_id | sec_id | semester | year | grade
  8. -------+-----------+--------+----------+------+-------
  9. 28361 | 960 | 1 | Fall | 2009 | C+
  10. (1 row)
  11. Time: 65.290 ms
  12. db2022=# select * from takes1 where id='259' and course_id='692' and sec_id='1';
  13. id | course_id | sec_id | semester | year | grade
  14. -----+-----------+--------+----------+------+-------
  15. 259 | 692 | 1 | Spring | 2010 | B-
  16. (1 row)
  17. Time: 67.276 ms

3w条数据测试 范围查询

  1. db2022=# select * from takes1 where course_id='476';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 44551 | 476 | 1 | Fall | 2010 | C-
  5. 51008 | 476 | 1 | Fall | 2010 | B+
  6. ...
  7. Time: 1076.589 ms
  8. db2022=# select * from takes1 where course_id='949';
  9. id | course_id | sec_id | semester | year | grade
  10. -------+-----------+--------+----------+------+-------
  11. 49450 | 949 | 1 | Fall | 2007 | A-
  12. 54728 | 949 | 1 | Fall | 2007 | C
  13. ...
  14. Time: 887.121 ms

100w条数据测试 等值查询

  1. db2022=# select * from mytakes where id='37339' and course_id='237y' and sec_id='1abc';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 37339 | 237y | 1abc | Spring | 2008 | B
  5. (1 row)
  6. Time: 192.314 ms
  7. db2022=# select * from mytakes where id='37339' and course_id='237y' and sec_id='1abcd';
  8. id | course_id | sec_id | semester | year | grade
  9. -------+-----------+--------+----------+------+-------
  10. 37339 | 237y | 1abcd | Spring | 2008 | B
  11. (1 row)
  12. Time: 193.495 ms

100w条数据测试 范围查询

  1. db2022=# select * from mytakes where course_id='486y';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 123 | 486y | 1 | Fall | 2009 | C
  5. 827 | 486y | 1 | Fall | 2009 | B
  6. 86573 | 486y | 1 | Fall | 2009 | B
  7. ...
  8. Time: 1230.652 ms
  9. db2022=# select * from mytakes where course_id='704y';
  10. id | course_id | sec_id | semester | year | grade
  11. -------+-----------+--------+----------+------+-------
  12. 55940 | 704y | 1 | Spring | 2008 | B-
  13. 17128 | 704y | 1 | Spring | 2008 | C+
  14. 60366 | 704y | 1 | Spring | 2008 | B
  15. ...
  16. Time: 850.035 ms

唯一btree索引

3w条数据测试 等值查询

  1. db2022=# create unique index takes1_pk on takes1 (id,course_id,sec_id,semester,year);
  2. CREATE INDEX
  3. Time: 446.259 ms
  4. db2022=# select * from pg_indexes where tablename='takes1';
  5. schemaname | tablename | indexname | tablespace | indexdef
  6. ------------+-----------+-----------+------------+----------------------------------------------------------------------
  7. ---------------------------------------------
  8. public | takes1 | takes1_pk | | CREATE UNIQUE INDEX takes1_pk ON takes1 USING btree (id, course_id, s
  9. ec_id, semester, year) TABLESPACE pg_default
  10. (1 row)
  11. Time: 12.272 ms
  12. db2022=# select * from takes where id='28361' and course_id='960' and sec_id='1';
  13. id | course_id | sec_id | semester | year | grade
  14. -------+-----------+--------+----------+------+-------
  15. 28361 | 960 | 1 | Fall | 2009 | C+
  16. (1 row)
  17. Time: 1.225 ms
  18. db2022=# select * from takes where id='259' and course_id='692' and sec_id='1';
  19. id | course_id | sec_id | semester | year | grade
  20. -----+-----------+--------+----------+------+-------
  21. 259 | 692 | 1 | Spring | 2010 | B-
  22. (1 row)
  23. Time: 1.095 ms

3w条数据测试 范围查询

  1. db2022=# select * from takes where course_id='476';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 44551 | 476 | 1 | Fall | 2010 | C-
  5. 51008 | 476 | 1 | Fall | 2010 | B+
  6. ...
  7. Time: 290.466 ms
  8. db2022=# select * from takes where course_id='949';
  9. id | course_id | sec_id | semester | year | grade
  10. -------+-----------+--------+----------+------+-------
  11. 49450 | 949 | 1 | Fall | 2007 | A-
  12. 54728 | 949 | 1 | Fall | 2007 | C
  13. ...
  14. Time: 200.423 ms

100w条数据测试 等值查询

  1. db2022=# select * from pg_indexes where tablename='mytakes';
  2. schemaname | tablename | indexname | tablespace | indexdef
  3. ------------+-----------+-----------+------------+----------
  4. (0 rows)
  5. Time: 3.418 ms
  6. db2022=# create unique index mytakes_pk on mytakes (id,course_id,sec_id,semester,year);
  7. CREATE INDEX
  8. Time: 11262.171 ms
  9. db2022=# select * from mytakes where id='37339' and course_id='237y' and sec_id='1abc';
  10. id | course_id | sec_id | semester | year | grade
  11. -------+-----------+--------+----------+------+-------
  12. 37339 | 237y | 1abc | Spring | 2008 | B
  13. (1 row)
  14. Time: 1.326 ms
  15. db2022=# select * from mytakes where id='37339' and course_id='237y' and sec_id='1abcd';
  16. id | course_id | sec_id | semester | year | grade
  17. -------+-----------+--------+----------+------+-------
  18. 37339 | 237y | 1abcd | Spring | 2008 | B
  19. (1 row)
  20. Time: 0.888 ms

100w条数据测试 范围查询

  1. db2022=# select * from mytakes where course_id='486y';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 123 | 486y | 1 | Fall | 2009 | C
  5. 827 | 486y | 1 | Fall | 2009 | B
  6. 86573 | 486y | 1 | Fall | 2009 | B
  7. ...
  8. Time: 1238.876 ms
  9. db2022=# select * from mytakes where course_id='704y';
  10. id | course_id | sec_id | semester | year | grade
  11. -------+-----------+--------+----------+------+-------
  12. 55940 | 704y | 1 | Spring | 2008 | B-
  13. 17128 | 704y | 1 | Spring | 2008 | C+
  14. 60366 | 704y | 1 | Spring | 2008 | B
  15. ...
  16. Time: 779.980 ms

非唯一btree索引

3w条数据测试 等值查询

  1. db2022=# drop index takes1_pk;
  2. DROP INDEX
  3. Time: 824.120 ms
  4. db2022=# create index takes1_pk on takes1 (id,course_id,sec_id,semester,year);
  5. CREATE INDEX
  6. Time: 1577.301 ms
  7. db2022=# select * from pg_indexes where tablename='takes1';
  8. schemaname | tablename | indexname | tablespace | indexdef
  9. ------------+-----------+-----------+------------+----------------------------------------------------------------------
  10. --------------------------------------
  11. public | takes1 | takes1_pk | | CREATE INDEX takes1_pk ON takes1 USING btree (id, course_id, sec_id,
  12. semester, year) TABLESPACE pg_default
  13. (1 row)
  14. Time: 4.266 ms
  15. db2022=# select * from takes where id='28361' and course_id='960' and sec_id='1';
  16. id | course_id | sec_id | semester | year | grade
  17. -------+-----------+--------+----------+------+-------
  18. 28361 | 960 | 1 | Fall | 2009 | C+
  19. (1 row)
  20. Time: 0.942 ms
  21. db2022=# select * from takes where id='259' and course_id='692' and sec_id='1';
  22. id | course_id | sec_id | semester | year | grade
  23. -----+-----------+--------+----------+------+-------
  24. 259 | 692 | 1 | Spring | 2010 | B-
  25. (1 row)
  26. Time: 0.969 ms

3w条数据测试 范围查询

  1. db2022=# select * from takes where course_id='476';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 44551 | 476 | 1 | Fall | 2010 | C-
  5. 51008 | 476 | 1 | Fall | 2010 | B+
  6. ...
  7. Time: 321.893 ms
  8. db2022=# select * from takes where course_id='949';
  9. id | course_id | sec_id | semester | year | grade
  10. -------+-----------+--------+----------+------+-------
  11. 49450 | 949 | 1 | Fall | 2007 | A-
  12. 54728 | 949 | 1 | Fall | 2007 | C
  13. ...
  14. Time: 351.266 ms

100w条数据测试 等值查询

  1. db2022=# drop index mytakes_pk;
  2. DROP INDEX
  3. Time: 526.671 ms
  4. db2022=# select * from pg_indexes where tablename='mytakes';
  5. schemaname | tablename | indexname | tablespace | indexdef
  6. ------------+-----------+-----------+------------+----------
  7. (0 rows)
  8. Time: 2.022 ms
  9. db2022=# create index mytakes_pk on mytakes (id,course_id,sec_id,semester,year);
  10. CREATE INDEX
  11. Time: 15270.766 ms
  12. db2022=# select * from mytakes where id='37339' and course_id='237y' and sec_id='1abc';
  13. id | course_id | sec_id | semester | year | grade
  14. -------+-----------+--------+----------+------+-------
  15. 37339 | 237y | 1abc | Spring | 2008 | B
  16. (1 row)
  17. Time: 1.112 ms
  18. db2022=# select * frommytakes where id='37339' and course_id='237y' and sec_id='1abcd';
  19. id | course_id | sec_id | semester | year | grade
  20. -------+-----------+--------+----------+------+-------
  21. 37339 | 237y | 1abcd | Spring | 2008 | B
  22. (1 row)
  23. Time: 0.926 ms

100w条数据测试 范围查询

  1. db2022=# select * from mytakes where course_id='486y';
  2. id | course_id | sec_id | semester | year | grade
  3. -------+-----------+--------+----------+------+-------
  4. 123 | 486y | 1 | Fall | 2009 | C
  5. 827 | 486y | 1 | Fall | 2009 | B
  6. 86573 | 486y | 1 | Fall | 2009 | B
  7. ...
  8. Time: 999.292 ms
  9. db2022=# select * from mytakes where course_id='704y';
  10. id | course_id | sec_id | semester | year | grade
  11. -------+-----------+--------+----------+------+-------
  12. 55940 | 704y | 1 | Spring | 2008 | B-
  13. 17128 | 704y | 1 | Spring | 2008 | C+
  14. 60366 | 704y | 1 | Spring | 2008 | B
  15. ...
  16. Time: 774.836 ms

总结

范围查询的用时与结果有关,取两次平均值。
因为结果比较简单,就不做图了。

3w数据集

查询方式 无索引 唯一索引 (无索引比例) 非唯一索引 (无索引比例)
等值查询 66 1.16 (1.76%) 0.95 (1.44%)
范围查询 981 245 (25.0%) 336 (34.25%)

100w数据集

查询方式 无索引 唯一索引 (无索引比例) 非唯一索引 (无索引比例)
等值查询 192 1.11 (0.58%) 1.02 (0.53%)
范围查询 1040 1010 (97.1%) 887 (85.3%)

结论
索引可大幅提高查询效率,但影响更新效率,且当结果集很大时,与无索引查询的差距会变小。
唯一索引和非唯一索引没有明显差别。

读取性能:
唯一索引:查到目标值后,即可停止查询,返回结果。
普通索引:查到目标值后,继续查找直至与目标值不相等时,停止查询,返回结果。
对比:每次查询,都将磁盘中的一个数据页读取到内存中,故即使唯一索引只查询一个值,也要读取整个数据页,所以两者读取性能相差不大。

更新性能:
唯一索引:将数据页读入内存(因为要判断有没有冲突,故不能直接用 change buffer),更新数据。
普通索引:将更新的数据缓存到 change buffer 中,更新结束,后续执行vmerge。
对比:除了更新完后要立刻查询的操作,普通索引因为不需要读入内存,性能更优.
故一般更推荐用普通索引。

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注