Thursday, July 6, 2017

postgresql corruption: ERROR: tuple concurrently update (duplicate key in primary key column)


I am not exactly sure how to classify this type of issue. It seems like a pg_toast out of sync issue or typle concurrently update error. Either way, the table have an exact records where PK has been defined.

The workaround is to delete one of the duplicate record from the table.

The only way to delete the record is with using CTID instead of any ID defined in the column with PK as the record itself has 2 identical PK key, so it would not worked. It will wiped out both records and one is needed.


Typical errors
delete from vpx_vm where id=2398;
ERROR:  tuple concurrently updated


There are times, select on the table may be fine but the example id could return 2 of the identical records. The workaround is to delete one of them so it would not have duplicates. The trick is which on to delete ?

select ctid, id from vpx_vm where id = 1433; -- this may return 2 records with 2 different CTIDs. One of the 2 records will result of unexpected chunk number.







Example: 
select * from vpx_vm where ctid = (8,2);   fine
select * from vpx_vm where ctid = (69,2); - error out with pg_toast missing chunk 

delete the bad ctid
delete from vpx_vm where ctid = (69,2),    track that vm from vpx_host too.


Select each of the rows with their corresponding CTID and one of them will fail. Delete the one that is failed and leave the one return output alone.
select  * from vpx_vm where ctid = xxxx


Technique that have been used to address this issue and I found out that it may not be that efficient as there is a chance that the good rescord to be deleted and the corrupt record left in the table.

delete from vpx_vm where ctid=(select ctid from vpx_vm where id=2398 limit 1);  

The above issue is fairly common, typically found right after disk filled up and database crashed.