Actions
Développement #113422
openRéduire la consommation d'espace en revoyant le schéma pour les données payer_, user_ et event
Status:
Nouveau
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
09 January 2026
Due date:
% Done:
0%
Estimated time:
Patch proposed:
No
Planning:
No
Description
Il y a beaucoup de redondance dans les modèles suivants:
- invoicing.DraftJournalLine
- invoicing.JournalLine
- invoicing.DraftInvoiceLine
- invoicing.InvoiceLine
sur les colonnes dont le préfixe est payer_ ou user_ et la colonne @event@abdallah.bouhal
J'ai mesuré l'impact et il est notable:
bdauvergne@front1.prod:~ (PROD) $ sudo -u lingo lingo-manage tenant_command runscript -d paiement.mes-demarches.nimes.fr ./script-lingo.py
Model invoicing.DraftJournalLine Size 35 GB Count 11623703
user_ ['user_external_id', 'user_first_name', 'user_last_name'] ...
> SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))), COUNT(*) FROM (SELECT * FROM (SELECT user_external_id, user_first_name, user_last_name FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (user_external_id, user_first_name, user_last_name)) AS V
Size 614 734 082
Unique size 732 140 for 13 798 rows (0.1 % of original size)
payer_ ['payer_external_id', 'payer_first_name', 'payer_last_name', 'payer_address', 'payer_email', 'payer_phone', 'payer_direct_debit'] ...
> SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))), COUNT(*) FROM (SELECT * FROM (SELECT payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit)) AS V
Size 1 235 711 024
Unique size 1 806 171 for 16 181 rows (0.1 % of original size)
JSON field event ['event'] ...
> SELECT SUM(pg_column_size((event))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((event))), COUNT(*) FROM (SELECT * FROM (SELECT event FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (event)) AS V
Size 6 438 827 198
Unique size 64 488 916 for 116 537 rows (1.0 % of original size)
JSON field booking ['booking'] ...
> SELECT SUM(pg_column_size((booking))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((booking))), COUNT(*) FROM (SELECT * FROM (SELECT booking FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (booking)) AS V
Size 1 729 290 049
Unique size 1 277 124 829 for 2 123 664 rows (73.9 % of original size)
JSON field pricing_data ['pricing_data'] ...
> SELECT SUM(pg_column_size((pricing_data))) FROM (SELECT * FROM invoicing_draftjournalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((pricing_data))), COUNT(*) FROM (SELECT * FROM (SELECT pricing_data FROM invoicing_draftjournalline LIMIT 1000000000) AS T GROUP BY (pricing_data)) AS V
Size 3 159 958 213
Unique size 795 791 546 for 1 172 672 rows (25.2 % of original size)
Model invoicing.JournalLine Size 12 GB Count 9414155
user_ ['user_external_id', 'user_first_name', 'user_last_name'] ...
> SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))), COUNT(*) FROM (SELECT * FROM (SELECT user_external_id, user_first_name, user_last_name FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (user_external_id, user_first_name, user_last_name)) AS V
Size 497 776 455
Unique size 730 266 for 13 762 rows (0.1 % of original size)
payer_ ['payer_external_id', 'payer_first_name', 'payer_last_name', 'payer_address', 'payer_email', 'payer_phone', 'payer_direct_debit'] ...
> SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit))), COUNT(*) FROM (SELECT * FROM (SELECT payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (payer_external_id, payer_first_name, payer_last_name, payer_address, payer_email, payer_phone, payer_direct_debit)) AS V
Size 954 666 774
Unique size 1 801 850 for 16 142 rows (0.2 % of original size)
JSON field event ['event'] ...
> SELECT SUM(pg_column_size((event))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((event))), COUNT(*) FROM (SELECT * FROM (SELECT event FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (event)) AS V
Size 5 179 627 964
Unique size 57 334 712 for 103 783 rows (1.1 % of original size)
JSON field booking ['booking'] ...
> SELECT SUM(pg_column_size((booking))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((booking))), COUNT(*) FROM (SELECT * FROM (SELECT booking FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (booking)) AS V
Size 1 347 475 684
Unique size 1 227 391 462 for 2 047 346 rows (91.1 % of original size)
JSON field pricing_data ['pricing_data'] ...
> SELECT SUM(pg_column_size((pricing_data))) FROM (SELECT * FROM invoicing_journalline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((pricing_data))), COUNT(*) FROM (SELECT * FROM (SELECT pricing_data FROM invoicing_journalline LIMIT 1000000000) AS T GROUP BY (pricing_data)) AS V
Size 2 574 575 475
Unique size 763 618 651 for 1 119 945 rows (29.7 % of original size)
Model invoicing.DraftInvoiceLine Size 875 MB Count 623313
JSON field details ['details'] ...
> SELECT SUM(pg_column_size((details))) FROM (SELECT * FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((details))), COUNT(*) FROM (SELECT * FROM (SELECT details FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T GROUP BY (details)) AS V
Size 154 825 509
Unique size 72 836 025 for 213 551 rows (47.0 % of original size)
user_ ['user_external_id', 'user_first_name', 'user_last_name'] ...
> SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))) FROM (SELECT * FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T
> SELECT SUM(pg_column_size((user_external_id, user_first_name, user_last_name))), COUNT(*) FROM (SELECT * FROM (SELECT user_external_id, user_first_name, user_last_name FROM invoicing_draftinvoiceline LIMIT 1000000000) AS T GROUP BY (user_external_id, user_first_name, user_last_name)) AS V
Size 32 975 549
Unique size 949 724 for 17 767 rows (2.9 % of original size)
On peut récupérer une dizaines de Go sur une base de 54Go dont 31Go pour DraftJournalLine et 12Go pour JournalLine.
La colonne pricing_data donne aussi un gain mais il faudrait la découper en ses différents éléments, ça impliquerait plusieurs modèles plutôt qu'un seul, certains étant plus redondant que d'autres, notamment booking_details, calculation_details.criterias et calculation_details.reduction_rate:
lingo=# select count(distinct pricing_data->>'booking_details'), count(distinct pricing_data->'calculation_details'->>'criterias'), count(distinct pricing_data->'calculation_details'->>'reduction_rate') from (select * from invoicing_journalline limit 10000000) as t;
count | count | count
-------+-------+-------
45 | 28 | 14115
(1 ligne)
Files
Updated by Benjamin Dauvergne 4 days ago
- File script-lingo.py script-lingo.py added
Actions