Project

General

Profile

Development #88960

ResourceLog: utiliser un index BRIN sur timestamp

Added by Benjamin Dauvergne 16 days ago. Updated 9 days ago.

Status:
Solution proposée
Priority:
Normal
Target version:
-
Start date:
02 April 2024
Due date:
% Done:

0%

Estimated time:
Patch proposed:
No
Planning:
No

Description

En remplacement de l'index B-Tree sur appname, -timestamp.


Related issues

Related to Passerelle - Development #88761: Partitionner la table ResourceLogNouveau27 March 2024

Actions

History

#1

Updated by Robot Gitea 16 days ago

  • Status changed from Nouveau to En cours

Benjamin Dauvergne (bdauvergne) a ouvert une pull request sur Gitea concernant cette demande :

#2

Updated by Robot Gitea 16 days ago

  • Status changed from En cours to Solution proposée
#3

Updated by Benjamin Dauvergne 16 days ago

#4

Updated by Benjamin Dauvergne 16 days ago

Pour la pagination c'est kif kif avec l'index existant :

passerelle=# begin;
BEGIN
passerelle=*# update pg_index set indisvalid = false where indexrelid = 'base_resour_appname_298cbc_idx'::regclass
;

passerelle=*# explain analyze select id from base_resourcelog where timestamp between '2024-04-01' and '2024-04-02' and appname = 'toulouse-maelis' limit 30 offset 4000;
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7913.75..7972.54 rows=30 width=4) (actual time=32.321..32.444 rows=30 loops=1)
   ->  Bitmap Heap Scan on base_resourcelog  (cost=76.12..304372.39 rows=155300 width=4) (actual time=20.184..27.494 rows=4030 loops=1)
         Recheck Cond: (("timestamp" >= '2024-04-01 00:00:00+02'::timestamp with time zone) AND ("timestamp" <= '2024-04-02 00:00:00+02'::timestamp with time zone))
         Rows Removed by Index Recheck: 761
         Filter: ((appname)::text = 'toulouse-maelis'::text)
         Rows Removed by Filter: 132
         Heap Blocks: lossy=772
         ->  Bitmap Index Scan on log_timestanp_idx  (cost=0.00..37.29 rows=399044 width=0) (actual time=20.076..20.078 rows=2769280 loops=1)
               Index Cond: (("timestamp" >= '2024-04-01 00:00:00+02'::timestamp with time zone) AND ("timestamp" <= '2024-04-02 00:00:00+02'::timestamp with time zone))
 Planning Time: 0.116 ms
 Execution Time: 32.508 ms
(11 lignes)

passerelle=*# rollback;
ROLLBACK
passerelle=# explain analyze select id from base_resourcelog where timestamp between '2024-04-01' and '2024-04-02' and appname = 'toulouse-maelis' limit 30 offset 4000;
                                                                                                    QUERY PLAN                                                                                                     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2172.74..2189.04 rows=30 width=4) (actual time=15.111..15.249 rows=30 loops=1)
   ->  Index Scan using base_resour_appname_298cbc_idx on base_resourcelog  (cost=0.56..84335.80 rows=155300 width=4) (actual time=0.044..9.547 rows=4030 loops=1)
         Index Cond: (((appname)::text = 'toulouse-maelis'::text) AND ("timestamp" >= '2024-04-01 00:00:00+02'::timestamp with time zone) AND ("timestamp" <= '2024-04-02 00:00:00+02'::timestamp with time zone))
 Planning Time: 0.201 ms
 Execution Time: 15.316 ms
(5 lignes)

Par contre l'index fait 180Ko au lieu de 800Mo :

passerelle=# SELECT pg_size_pretty(pg_relation_size('log_timestanp_idx'));
 pg_size_pretty 
----------------
 168 kB
(1 ligne)

passerelle=# SELECT pg_size_pretty(pg_relation_size('base_resour_appname_298cbc_idx'));
 pg_size_pretty 
----------------
 759 MB
(1 ligne)

L'index sur transaction id fait 200Mo et la table resource_log elle même fait 4,5Go (sur parsifal, peut-être un exemple extrême, sur toodego la table fait 1Go et les index 100Mo et 50 Mo, mais c'est aussi de l'espace gagné dans le WAL sur les mises à jour d'index je pense) :

passerelle=# SELECT pg_size_pretty(pg_relation_size('base_resourcelog'));
 pg_size_pretty 
----------------
 4556 MB
(1 ligne)

Le taux de corrélation n'est que de 0.25 pour les colonnes id et timestamp :

passerelle=*# SELECT correlation FROM pg_stats
WHERE tablename = 'base_resourcelog' AND schemaname = 'passerelle_famille_loisirs_eservices_toulouse_metropole_fr' AND attname = 'timestamp';
 correlation 
-------------
  0.24036705
(1 ligne)

passerelle=*# SELECT correlation FROM pg_stats
WHERE tablename = 'base_resourcelog' AND schemaname = 'passerelle_famille_loisirs_eservices_toulouse_metropole_fr' AND attname = 'id';
 correlation 
-------------
  0.24036705
(1 ligne)

Je ne sais pas à quel point c'est mauvais ou bon, j'espère que le fait de faire un vacuum analyze et de supprimer en un seul DELETE améliorera ça.

#5

Updated by Robot Gitea 9 days ago

Benjamin Dauvergne (bdauvergne) a demandé une relecture de Pierre Ducroquet (pducroquet) sur une pull request sur Gitea concernant cette demande :

Also available in: Atom PDF