Monday, January 26, 2015

Postgresql: Corruption: WAL files filled up

As last resort, environment was down and there is no transaction logs backup available for restoration. Resetting the WAL and this will result of losing data.

In this case, the transaction log of 0000000100000039000000A8 was missing.


From postgresql.log


Beginning archiving of 0000000100000039000000A8
0000000100000039000000A8 not found yet in archives, continuing
Last backup time found: 2014-10-24 00:00:01
Last backup path: /var/vmware/vpostgres/9.3-archive/backup/20141024/000001/arclog
grep: write error
ls: write error: Broken pipe
Beginning archiving of 0000000100000039000000A8
0000000100000039000000A8 not found yet in archives, continuing
Last backup time found: 2014-10-24 00:00:01
Last backup path: /var/vmware/vpostgres/9.3-archive/backup/20141024/000001/arclog
grep: write error
ls: write error: Broken pipe
Beginning archiving of 0000000100000039000000A8
0000000100000039000000A8 not found yet in archives, continuing
Last backup time found: 2014-10-24 00:00:01
Last backup path: /var/vmware/vpostgres/9.3-archive/backup/20141024/000001/arclog
grep: write error
From vpxd.log
2015-01-21 14:48:01.967 UTC,,,42343,,54bfbc21.a567,2,,2015-01-21 14:48:01 UTC,,0,LOG,00000,"invalid primary checkpoint record",,,,,,,,,""
2015-01-21 14:48:01.977 UTC,,,42343,,54bfbc21.a567,3,,2015-01-21 14:48:01 UTC,,0,LOG,00000,"invalid secondary checkpoint record",,,,,,,,,""
2015-01-21 14:48:01.977 UTC,,,42343,,54bfbc21.a567,4,,2015-01-21 14:48:01 UTC,,0,PANIC,XX000,"could not locate a valid checkpoint record",,,,,,,,,""


Stop all application services the reset the WAL.
sudo -u postgres /opt/vmware/vpostgres/current/bin/pg_resetxlog /var/vmware/vpostgres/current/pgdata/



There will be some data losses but the situation is better than database not coming back up. This typically happened after disk filed up.

Friday, January 23, 2015

Postgres: vacuum specific tables

VACUUM FULL can be intense running on production environment. However, majority of the problem I have heard of are database keeps growing. So, I would one address the space issue without jeopardizing the production environment too much ? Int his case, I know there are a few times that constantly get flush and fill. I know that the following 2 tables always problematic even in Oracle or SQL Server.

VCDB-# \d vpx_event; \d vpx_event_arg
                     Table "vpx.vpx_event"
        Column        |            Type             | Modifiers
----------------------+-----------------------------+-----------
 event_id             | integer                     | not null
 chain_id             | integer                     |
 event_type           | character varying(255)      |
 extended_class       | numeric(1,0)                |
 create_time          | timestamp without time zone |
 username             | character varying(255)      |
 category             | character varying(255)      |
 vm_id                | integer                     |
 vm_name              | character varying(255)      |
 host_id              | integer                     |
 host_name            | character varying(255)      |
 computeresource_id   | integer                     |
 computeresource_type | integer                     |
 computeresource_name | character varying(255)      |
 datacenter_id        | integer                     |
 datacenter_name      | character varying(255)      |
 datastore_id         | integer                     |
 datastore_name       | character varying(255)      |
 network_id           | integer                     |
 network_name         | character varying(255)      |
 network_type         | character(1)                |
 dvs_id               | integer                     |
 dvs_name             | character varying(255)      |
 storagepod_id        | integer                     |
 storagepod_name      | character varying(255)      |
 change_tag_id        | integer                     |
Indexes:
    "pk_vpx_event" PRIMARY KEY, btree (event_id)
    "vpx_event_f1" btree (computeresource_type)
    "vpxi_change_tag" btree (change_tag_id)
    "vpxi_event_category" btree (category)
    "vpxi_event_chain_id" btree (chain_id)
    "vpxi_event_computeresource_id" btree (computeresource_id)
    "vpxi_event_create_time" btree (create_time)
    "vpxi_event_datacenter_id" btree (datacenter_id)
    "vpxi_event_datastore_id" btree (datastore_id)
    "vpxi_event_dvs_id" btree (dvs_id)
    "vpxi_event_event_type" btree (event_type)
    "vpxi_event_extended_class" btree (extended_class)
    "vpxi_event_host_id" btree (host_id)
    "vpxi_event_network_id" btree (network_id)
    "vpxi_event_username" btree (username)
    "vpxi_event_vm_id" btree (vm_id)
Foreign-key constraints:
    "fk_vpx_change_tag" FOREIGN KEY (change_tag_id) REFERENCES vpx_change_tag(change_tag_id) ON DELETE SET NULL
    "fk_vpx_event_ref_computeres" FOREIGN KEY (computeresource_type) REFERENCES vpx_object_type(id)
Referenced by:
    TABLE "vpx_event_arg" CONSTRAINT "fk_vpx_event_arg_ref_event" FOREIGN KEY (event_id) REFERENCES vpx_event(event_id) ON DELETE CASCADE
    TABLE "vpx_entity_last_event" CONSTRAINT "fk_vpx_last_event_event" FOREIGN KEY (last_event_id) REFERENCES vpx_event(event_id) ON DELETE CASC
ADE


               Table "vpx.vpx_event_arg"
       Column       |          Type          | Modifiers
--------------------+------------------------+-----------
 event_id           | integer                | not null
 arg_id             | integer                | not null
 arg_type           | character varying(255) |
 arg_data           | text                   |
 obj_type           | integer                |
 obj_name           | character varying(255) |
 vm_id              | integer                |
 host_id            | integer                |
 computeresource_id | integer                |
 datastore_id       | integer                |
 network_id         | integer                |
 network_type       | character(1)           |
 datacenter_id      | integer                |
 resourcepool_id    | integer                |
 dvs_id             | integer                |
 folder_id          | integer                |
 alarm_id           | integer                |
 scheduledtask_id   | integer                |
Indexes:
    "pk_vpx_event_arg" PRIMARY KEY, btree (event_id, arg_id)
    "vpx_event_arg_f1" btree (obj_type)
    "vpx_event_arg_f2" btree (alarm_id)
    "vpx_event_arg_f3" btree (scheduledtask_id)
    "vpxi_event_arg_event_id" btree (event_id)
Foreign-key constraints:
    "fk_vpx_event_arg_ref_entity" FOREIGN KEY (obj_type) REFERENCES vpx_object_type(id)
    "fk_vpx_event_arg_ref_event" FOREIGN KEY (event_id) REFERENCES vpx_event(event_id) ON DELETE CASCADE


VCDB=# vacuum FULL ANALYZE VERBOSE vpx_event;
INFO:  vacuuming "vpx.vpx_event"
INFO:  "vpx_event": found 0 removable, 284990 nonremovable row versions in 6003 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.21s/0.70u sec elapsed 1.83 sec.
INFO:  analyzing "vpx.vpx_event"
INFO:  "vpx_event": scanned 6003 of 6003 pages, containing 284990 live rows and 0 dead rows; 30000 rows in sample, 284990 estimated total rows
VACUUM
VCDB=# vacuum FULL ANALYZE VERBOSE vpx_event_arg;
INFO:  vacuuming "vpx.vpx_event_arg"
INFO:  "vpx_event_arg": found 0 removable, 1289100 nonremovable row versions in 18360 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.44s/2.14u sec elapsed 5.92 sec.
INFO:  analyzing "vpx.vpx_event_arg"
INFO:  "vpx_event_arg": scanned 18360 of 18360 pages, containing 1289100 live rows and 0 dead rows; 30000 rows in sample, 1289100 estimated total rows
VACUUM
VCDB=#

By vacuum-ing only 2 of these tables, I would have to touch every single tables in the database.