Project

General

Profile

Actions

Développement #113422

open

Réduire la consommation d'espace en revoyant le schéma pour les données payer_, user_ et event

Added by Benjamin Dauvergne 4 days ago. Updated 4 days ago.

Status:
Nouveau
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
09 January 2026
Due date:
% Done:

0%

Estimated time:
Patch proposed:
No
Planning:
No

Description

Il y a beaucoup de redondance dans les modèles suivants:
  • invoicing.DraftJournalLine
  • invoicing.JournalLine
  • invoicing.DraftInvoiceLine
  • invoicing.InvoiceLine

sur les colonnes dont le préfixe est payer_ ou user_ et la colonne @event@abdallah.bouhal

J'ai mesuré l'impact et il est notable:

bdauvergne@front1.prod:~ (PROD) $ sudo -u lingo lingo-manage tenant_command runscript -d paiement.mes-demarches.nimes.fr ./script-lingo.py 
Model invoicing.DraftJournalLine Size 35 GB Count 11623703
  user_ ['user_external_id', 'user_first_name', 'user_last_name'] ... 
    > SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))), COUNT(*) FROM (SELECT * FROM (SELECT user_external_id, user_first_name, user_last_name FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (user_external_id, user_first_name, user_last_name)) AS V
   Size 614 734 082
   Unique size 732 140 for 13 798 rows (0.1 % of original size)
  payer_ ['payer_external_id', 'payer_first_name', 'payer_last_name', 'payer_address', 'payer_email', 'payer_phone', 'payer_direct_debit'] ... 
    > SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))), COUNT(*) FROM (SELECT * FROM (SELECT payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit)) AS V
   Size 1 235 711 024
   Unique size 1 806 171 for 16 181 rows (0.1 % of original size)
  JSON field event ['event'] ... 
    > SELECT SUM(pg_column_size((event))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((event))), COUNT(*) FROM (SELECT * FROM (SELECT event FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (event)) AS V
   Size 6 438 827 198
   Unique size 64 488 916 for 116 537 rows (1.0 % of original size)
  JSON field booking ['booking'] ... 
    > SELECT SUM(pg_column_size((booking))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((booking))), COUNT(*) FROM (SELECT * FROM (SELECT booking FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (booking)) AS V
   Size 1 729 290 049
   Unique size 1 277 124 829 for 2 123 664 rows (73.9 % of original size)
  JSON field pricing_data ['pricing_data'] ... 
    > SELECT SUM(pg_column_size((pricing_data))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((pricing_data))), COUNT(*) FROM (SELECT * FROM (SELECT pricing_data FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (pricing_data)) AS V
   Size 3 159 958 213
   Unique size 795 791 546 for 1 172 672 rows (25.2 % of original size)
Model invoicing.JournalLine Size 12 GB Count 9414155
  user_ ['user_external_id', 'user_first_name', 'user_last_name'] ... 
    > SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))), COUNT(*) FROM (SELECT * FROM (SELECT user_external_id, user_first_name, user_last_name FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (user_external_id, user_first_name, user_last_name)) AS V
   Size 497 776 455
   Unique size 730 266 for 13 762 rows (0.1 % of original size)
  payer_ ['payer_external_id', 'payer_first_name', 'payer_last_name', 'payer_address', 'payer_email', 'payer_phone', 'payer_direct_debit'] ... 
    > SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))), COUNT(*) FROM (SELECT * FROM (SELECT payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit)) AS V
   Size 954 666 774
   Unique size 1 801 850 for 16 142 rows (0.2 % of original size)
  JSON field event ['event'] ... 
    > SELECT SUM(pg_column_size((event))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((event))), COUNT(*) FROM (SELECT * FROM (SELECT event FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (event)) AS V
   Size 5 179 627 964
   Unique size 57 334 712 for 103 783 rows (1.1 % of original size)
  JSON field booking ['booking'] ... 
    > SELECT SUM(pg_column_size((booking))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((booking))), COUNT(*) FROM (SELECT * FROM (SELECT booking FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (booking)) AS V
   Size 1 347 475 684
   Unique size 1 227 391 462 for 2 047 346 rows (91.1 % of original size)
  JSON field pricing_data ['pricing_data'] ... 
    > SELECT SUM(pg_column_size((pricing_data))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((pricing_data))), COUNT(*) FROM (SELECT * FROM (SELECT pricing_data FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (pricing_data)) AS V
   Size 2 574 575 475
   Unique size 763 618 651 for 1 119 945 rows (29.7 % of original size)
Model invoicing.DraftInvoiceLine Size 875 MB Count 623313
  JSON field details ['details'] ... 
    > SELECT SUM(pg_column_size((details))) FROM (SELECT * FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((details))), COUNT(*) FROM (SELECT * FROM (SELECT details FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T GROUP BY (details)) AS V
   Size 154 825 509
   Unique size 72 836 025 for 213 551 rows (47.0 % of original size)
  user_ ['user_external_id', 'user_first_name', 'user_last_name'] ... 
    > SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))) FROM (SELECT * FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T
    > SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))), COUNT(*) FROM (SELECT * FROM (SELECT user_external_id, user_first_name, user_last_name FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T GROUP BY (user_external_id, user_first_name, user_last_name)) AS V
   Size 32 975 549
   Unique size 949 724 for 17 767 rows (2.9 % of original size)

On peut récupérer une dizaines de Go sur une base de 54Go dont 31Go pour DraftJournalLine et 12Go pour JournalLine.

La colonne pricing_data donne aussi un gain mais il faudrait la découper en ses différents éléments, ça impliquerait plusieurs modèles plutôt qu'un seul, certains étant plus redondant que d'autres, notamment booking_details, calculation_details.criterias et calculation_details.reduction_rate:

lingo=# select count(distinct pricing_data->>'booking_details'), count(distinct pricing_data->'calculation_details'->>'criterias'), count(distinct pricing_data->'calculation_details'->>'reduction_rate') from (select * from invoicing_journalline limit 10000000) as t;
 count | count | count 
-------+-------+-------
    45 |    28 | 14115
(1 ligne)


Files

script-lingo.py (2.41 KB) script-lingo.py Benjamin Dauvergne, 09 January 2026 09:36 AM
Actions

Also available in: Atom PDF