When we delete or update a record in postgreSql, it hides the old data. These datas are called dead rows. PostgreSQL doesn’t physically remove the old row from the table but puts a “marker” on it so that queries don’t return that row. A vacuum is used for recovering space occupied by “dead tuples” in a table.
Let’s examine the derivatives of the “Vacuum” operation, which is a very important maintenance item that takes care of the planner’s access to the data in the most efficient way, especially marking the dead rows formed in this way or bringing them to disk, and what they do.
VACUUM
Vacuum is a process in PostgreSQL that reclaims storage occupied by dead rows, which are rows that have been deleted or updated. When a row is deleted or updated, PostgreSQL marks the old version of the row as dead but does not immediately remove it from the disk. Over time, as more operations occur on the table, dead rows can accumulate, leading to inefficient disk space utilization. Running Vacuum manually or allowing Autovacuum to handle it automatically ensures that dead rows are efficiently removed, and the space they occupied is made available for new data.
VACUUM FULL
The space occupied by dead rows badly affects our application’s performance. If you have not vacuum full into your tables for a long time and you have tables that are updated very often, your DV size will decrease after this operation because it saves dead rows to disk.
However, during this process, the table with vacuum full applied is exclusive lock and because a new copy of the table is created without these dirty areas, it takes a long time according to the size of the table, so you cannot get answers to your queries about the table.
From official documentation:
VACUUM FULL
rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires anACCESS EXCLUSIVE
lock on each table while it is being processed.
VACUUM ANALYZE
Vacuum Analyze combines the functionality of both Vacuum and Analyze into a single process. Analyze is used to update the statistics of the table, which are essential for the query planner to generate efficient query plans. By running Vacuum Analyze together, you not only reclaim disk space occupied by dead rows but also update the table statistics, leading to better query performance.
AUTOVACUUM
Autovacuum is a built-in feature in PostgreSQL that automatically manages the Vacuum and Vacuum Analyze processes. It continuously monitors the database’s activity and initiates Vacuum operations as needed to prevent dead row bloat and maintain data consistency. Autovacuum is enabled by default, and its configuration parameters can be adjusted to fine-tune its behavior based on the workload and requirements of the database.
Best Practices for Vacuum, Vacuum Analyze, and Autovacuum:
- Enable Autovacuum: It is recommended to keep Autovacuum enabled, as it automatically manages Vacuum operations, ensuring your database remains healthy and performs optimally.
- Monitor Autovacuum Activity: Regularly monitor Autovacuum’s activity using PostgreSQL logs and performance monitoring tools to ensure it is functioning as expected and not causing any performance issues.
- Set Appropriate Autovacuum Parameters: Adjust Autovacuum configuration parameters, such as
autovacuum_vacuum_scale_factor
andautovacuum_analyze_scale_factor
, to suit the specific needs of your database and workload. - Vacuum Regularly: For databases with high write activity, consider scheduling regular Vacuum operations during periods of low database activity to prevent the accumulation of dead rows.
- Analyze Tables: In addition to Vacuum, ensure regular Analyze operations to keep table statistics up-to-date, which aids the query planner in generating optimal query plans.
- Consider Manual Vacuum for Large Tables: For large tables or tables with a high volume of updates and deletes, consider running Vacuum manually during maintenance windows to have more control over the process.
- Avoid Overlapping Autovacuum Processes: Adjust the
autovacuum_vacuum_cost_limit
parameter to avoid multiple Autovacuum processes overlapping, which can lead to performance degradation.