Tuesday, August 11, 2015

Postgresql: could not fork autovacuum worker process: Cannot allocate memory


Problem statement

Postgresql crashes randomly. Increasing more memory do not helped much but only delaying the crashing.  There is no particularl activities that causing or caused to crash. It can be any DMLs or vacuum. Out of Memory was pretty consistent but do not always crashed the instance.


Errors from the logs


.................
      3 tm:2015-08-06 03:52:33.326 UTC db:VCDB pid:19666 STATEMENT:  UPDATE VPX_DVPORT SET 

      1 tm:2015-08-06 03:52:33.520 UTC db:VCDB pid:15745 LOG:  unexpected EOF on client connection
      1 tm:2015-08-06 03:52:33.524 UTC db:VCDB pid:8630 LOG:  could not receive data from client: Connection reset by peer
      2 tm:2015-08-06 03:52:33.524 UTC db:VCDB pid:8630 LOG:  unexpected EOF on 

..........................................

  could not receive data from client: Connection reset by peer
: could not fork autovacuum worker process: Cannot allocate memory
: could not fork autovacuum worker process: Cannot allocate memory

.......................................................................


ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
      1 tm:2015-08-06 03:53:27.838 UTC db:VCDB pid:1234 ERROR:  out of memory
      2 tm:2015-08-06 03:53:27.838 UTC db:VCDB pid:1234 DETAIL:  Failed on request of size 32.
      3 tm:2015-08-06 03:53:27.838 UTC db:VCDB pid:1234 CONTEXT:  SQL statement "SELECT 1 FROM ONLY "vpx"."vpx_vdevice_file_backing" x WHERE "backing_id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"


Upon looking at the VM, I realized the swap was 0 even though the memory usage were very high. At such a high memory usage, I would expect swap to be used.

esxp-vc01:/var/log/vmware/vpx # free -m
             total       used       free     shared    buffers     cached
Mem:         24161      21849       2312          0        414       7797
-/+ buffers/cache:      13637      10524
Swap:            0          0          0


At that point, I suggested my customer to check with their administrator of why the Swap was at 0. Was it allocated? Was it turned on?

To get an overview of the swap, look at the fstab and swapon.

cat /etc/fstab
swapon -s


Conclusion


In ideal scenario, if swap is configured and turned on, it should have something similar as the following.

[root@Cloud3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/mapper/VolGroup00-LogVol01         partition       2064376 0       -1

[root@Cloud3 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         16050      13949       2101          0        313      13160
-/+ buffers/cache:        475      15575
Swap:         2015          0       2015


[root@Cloud3 ~]# cat /etc/fstab|grep swap
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0


[root@Cloud3 ~]# blkid|grep swap
/dev/mapper/VolGroup00-LogVol01: TYPE="swap"
/dev/VolGroup00/LogVol01: TYPE="swap"


Now that, the Postgresql: Cannot allocate memory, can be triggered by anything that causing the memory pool to exhaust. It basically saying, the box has ran out of memory.