Thursday, July 6, 2017

postgresql corruption: ERROR: tuple concurrently update (duplicate key in primary key column)


I am not exactly sure how to classify this type of issue. It seems like a pg_toast out of sync issue or typle concurrently update error. Either way, the table have an exact records where PK has been defined.

The workaround is to delete one of the duplicate record from the table.

The only way to delete the record is with using CTID instead of any ID defined in the column with PK as the record itself has 2 identical PK key, so it would not worked. It will wiped out both records and one is needed.


Typical errors
delete from vpx_vm where id=2398;
ERROR:  tuple concurrently updated


There are times, select on the table may be fine but the example id could return 2 of the identical records. The workaround is to delete one of them so it would not have duplicates. The trick is which on to delete ?

select ctid, id from vpx_vm where id = 1433; -- this may return 2 records with 2 different CTIDs. One of the 2 records will result of unexpected chunk number.







Example: 
select * from vpx_vm where ctid = (8,2);   fine
select * from vpx_vm where ctid = (69,2); - error out with pg_toast missing chunk 

delete the bad ctid
delete from vpx_vm where ctid = (69,2),    track that vm from vpx_host too.


Select each of the rows with their corresponding CTID and one of them will fail. Delete the one that is failed and leave the one return output alone.
select  * from vpx_vm where ctid = xxxx


Technique that have been used to address this issue and I found out that it may not be that efficient as there is a chance that the good rescord to be deleted and the corrupt record left in the table.

delete from vpx_vm where ctid=(select ctid from vpx_vm where id=2398 limit 1);  

The above issue is fairly common, typically found right after disk filled up and database crashed.

Saturday, January 7, 2017

postgresql: corruption: ERROR: unexpected chunk number 0 (expected 1) for toast value XXXXXXXXX in pg_toast_XXXX


This issue is very common but luckily, they can be resolved majority of the time (it was once it did not worked for me). The corruption is a bit difficult to description and I am using a real scenario here.

The vcenter crashed and in the logs, the error reported was.


2017-01-05T15:51:06.640Z [7FC61081D700 error 'Default' opID=SWI-3a1cc46] [VdbStatement] SQLError was thrown: "ODBC error: (XX000) - ERROR: unexpected chunk number 0 (ex
pected 1) for toast value 25270341 in pg_toast_16875;
--> Error while executing the query" is returned when executing SQL statement "SELECT IS_CONSOLIDATE_NEEDED, DAS_PROTECTED_ACTUAL, DAS_PROTECTED_DESIRED, MANAGED_BY_TYP
E, MANAGED_BY_EXT_KEY, STORAGE_SPACE_UPDATED_TIME, AGGR_UNSHARED_STORAGE_SPACE, AGGR_UNCOMMITED_STORAGE_SPACE, AGGR_COMMITED_STORAGE_SPACE, RECORD_REPLAY_STATE, FAULT_T
OLERANCE_STATE, PENDING_ANNOT_SET_FLG, PENDING_NAME, ONLINE_STANDBY, MKS_CONNECTIONS, SCREEN_WIDTH, SCREEN_HEIGHT, TOOLS_MOUNTED, MEMORY_OVERHEAD, SUSPEND_INTERVAL, BOO
T_TIME, SUSPEND_TIME, UUID_INSTANCE, UUID_BIOS, NUM_DISK, NUM_NIC, NUM_VCPU, MEMORY_RES"


When trying to delete the offending entry from the vpx_vm using the id, it throws a different errors.

ERROR: tuple concurrently updated

Backing up the table will throw the same problem happens.

ERROR: unexpected chunk number 0 (expected 1) for toast value 25270341 in pg_toast_16875;


Steps to resolve the issue for VCSA.


1: Back up the database first.

2: Stop vpxd & vdcs service:
     service vmware-vpxd stop
     service vmware-vdcs stop

3: Connect to VCDB with:
   /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

4: Execute below query:
   select pg_toast_16875::regclass;

   Above query should give you output something like:
   regclass
   --------
   VPX_VM

5: Now re-index the table with below queries:
   REINDEX table pg_toast.pg_toast_16875;
   REINDEX table VPX_VM; (Table name should be whatever is returned from step#4)

6: Do vacuum and analyze for vpx_vm table with following query:
    VACUUM ANALYZE VPX_VM; (Table name should be whatever is returned from step#4)

7: \q

8: Start vpxd & vdcs service:
   service vmware-vpxd start
   service vmware-vdcs start

NOTE: The steps can lead to some data loss from table.

The key is to locate the pg_toast then reindex the pg_toast and its corresponding table.