Out-Of-Memory in LAMP server - CentOS 5.9 x86 - Why now?
Re: Out-Of-Memory in LAMP server - CentOS 5.9 x86 - Why now?
I thought top's -m showed virtual memory (as in RSS + swap). Surely you are most interested in RSS?
Re: Out-Of-Memory in LAMP server - CentOS 5.9 x86 - Why now?
aks wrote:I thought top's -m showed virtual memory (as in RSS + swap). Surely you are most interested in RSS?
It is my understanding that the -m parameter in CentOS's version of top (which differs from other tops, for example Debian's) sorts the process list by the RES value, i.e., by physical RAM usage.
I guess that whatever process spills into swap, consuming all of it, should also be the process with the highest physical RAM consumption. Yeah, I know it does not necessarily have to be that way, but I think the odds of it being so are high.
What do you think?
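To check that assumption directly (rather than inferring swap use from RES), per-process swap can be read from /proc/&lt;pid&gt;/status on kernels 2.6.34 and later, which expose a VmSwap field. A stock CentOS 5 kernel (2.6.18) does not have it, so this is a sketch for comparison on newer systems:

```shell
# List the top swap consumers by reading VmSwap from /proc/<pid>/status.
# NOTE: VmSwap exists only on kernels >= 2.6.34; on CentOS 5's 2.6.18
# kernel this field is absent and the loop prints nothing.
for f in /proc/[0-9]*/status; do
    awk '/^Name:/   {name=$2}
         /^VmSwap:/ {printf "%8d kB  %s\n", $2, name}' "$f" 2>/dev/null
done | sort -rn | head -10
```

If a process other than the top-RES one dominates that list, the "highest RES == highest swap" assumption would not hold.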
Re: Out-Of-Memory in LAMP server - CentOS 5.9 x86 - Why now?
From a CentOS 5 VM:
man top:
-m : VIRT/USED toggle
Reports USED (sum of process rss and swap total count) instead of VIRT
Re: Out-Of-Memory in LAMP server - CentOS 5.9 x86 - Why now?
aks wrote:From a CentOS 5 VM:
man top:
-m : VIRT/USED toggle
Reports USED (sum of process rss and swap total count) instead of VIRT
From my CentOS 5.9 x86 server:
Code: Select all
(man top)
-m : Sort by memory usage
This switch makes top to sort the processes by allocated memory
Code: Select all
(man top)
SORTING of task window
For compatibility, this top supports most of the former top sort keys. Since this
is primarily a service to former top users, these commands do not appear on any help screen.
command sorted field supported
A start time (non-display) No
M %MEM Yes
N PID Yes
P %CPU Yes
T TIME+ Yes
Also, I've moved these two commands to a plain user's crontab, as I see that root permissions are not needed to run them:
Code: Select all
$ crontab -l
*/5 * * * * top -b -n1 -m | head -30 > /tmp/top-sample_`date +\%Y-\%m-\%d_\%H-\%M-\%S`_.txt
56 20 * * * find /tmp/top-sample_* -type f -mtime +2 -print0 | xargs -r -0 rm
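If only the memory ranking is needed, ps can take an equivalent snapshot sorted by RSS directly, which avoids depending on top's distribution-specific -m flag (a sketch; column choice is an assumption, adjust to taste):

```shell
# One-shot snapshot of the biggest physical-memory consumers,
# sorted by resident set size (kB), largest first.
ps -eo pid,user,rss,vsz,comm --sort=-rss | head -15
```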
So far the server is mostly idle; this is the latest report, from just a few minutes ago:
Code: Select all
$ cat /tmp/top-sample_2015-08-21_10-00-01_.txt
top - 10:00:01 up 3 days, 10:30, 6 users, load average: 0.02, 0.07, 0.08
Tasks: 91 total, 1 running, 90 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.1%us, 1.2%sy, 0.0%ni, 88.5%id, 3.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 2075016k total, 2015992k used, 59024k free, 63016k buffers
Swap: 2097144k total, 72k used, 2097072k free, 1081352k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13948 apache 18 0 137m 61m 8464 S 0.0 3.0 1:35.08 httpd
2400 root 18 0 94216 60m 50m S 0.0 3.0 0:11.67 httpd
12772 apache 15 0 133m 56m 8536 S 0.0 2.8 1:57.91 httpd
15021 apache 15 0 133m 56m 8380 S 0.0 2.8 1:11.29 httpd
15032 apache 15 0 132m 56m 8516 S 0.0 2.8 1:14.92 httpd
12773 apache 18 0 132m 56m 8456 S 0.0 2.8 1:50.49 httpd
13971 apache 15 0 132m 56m 8452 S 0.0 2.8 1:37.29 httpd
17695 apache 15 0 132m 56m 8220 S 0.0 2.8 0:18.12 httpd
15009 apache 16 0 132m 55m 8372 S 0.0 2.8 1:09.46 httpd
13967 apache 15 0 132m 55m 8404 S 0.0 2.7 1:44.37 httpd
15027 apache 15 0 131m 55m 8408 S 0.0 2.7 1:09.44 httpd
13969 apache 17 0 131m 55m 8456 S 0.0 2.7 1:42.23 httpd
11846 apache 15 0 130m 53m 8572 S 0.0 2.7 2:12.39 httpd
15004 apache 15 0 128m 52m 8448 S 0.0 2.6 1:08.06 httpd
15008 apache 17 0 128m 51m 8464 S 0.0 2.6 1:11.67 httpd
15020 apache 15 0 127m 51m 8416 S 0.0 2.5 1:00.51 httpd
17414 apache 18 0 124m 49m 8196 S 0.0 2.5 0:20.22 httpd
17415 apache 15 0 124m 49m 8196 S 0.0 2.5 0:21.10 httpd
17413 apache 18 0 126m 49m 8228 S 0.0 2.5 0:23.77 httpd
17416 apache 15 0 124m 49m 8120 S 0.0 2.4 0:29.15 httpd
17427 apache 20 0 126m 49m 8372 S 0.0 2.4 0:29.11 httpd
5176 mysql 15 0 151m 33m 4908 S 0.0 1.7 30:47.08 mysqld
2169 ntp 15 0 4532 4528 3516 S 0.0 0.2 0:00.12 ntpd
Re: Out-Of-Memory in LAMP server - CentOS 5.9 x86 - Why now?
InitOrNot wrote:Here is a pastebin of the relevant logs, can anyone spot something out of the ordinary in them?
http://pastebin.com/raw.php?i=V3Ps2vNC
I've been digging around, and it may be that I have hit a bug in the Linux kernel: the system stalls in an infinite loop when reaching an OOM condition because the OOM killer cannot finish killing its candidate processes. It appears there is a design flaw in the kernel's memory-management subsystem that can trigger this problem in certain corner cases. This kernel problem is still unresolved.
More info here: https://lwn.net/Articles/627419/
If you review my pasted logs above, you will see that the OOM killer is triggered and starts killing processes, apparently with success (the Postfix master, several Apache httpd children), until it tries to kill mysqld (at 01:53:12) but cannot complete that kill. From that point onwards the system stalls at 100% utilization (from 03:23:49 on, with the message "INFO: task mysqld:9402 blocked for more than 120 seconds" repeating several times), until I had to power cycle the system at 14:52:14.
So that explains why the OOM killer did NOT return the system to a usable condition (albeit with its main application services killed).
I still have to find out which process, MySQL or Apache, consumed all the RAM... The problem has not recurred since I upped the RAM to 2 GB.
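Until the culprit is identified, one possible mitigation (not from this thread, just a sketch) is to steer the OOM killer away from mysqld so a memory spike kills an httpd child first. On 2.6.18-era kernels the per-process knob is /proc/&lt;pid&gt;/oom_adj (range -17 to 15, where -17 exempts the process); newer kernels use oom_score_adj instead:

```shell
# Exempt all running mysqld processes from the OOM killer.
# Requires root; must be re-applied whenever mysqld restarts
# (e.g. from the init script), since the setting is per-PID.
for pid in $(pidof mysqld); do
    echo -17 > "/proc/$pid/oom_adj"
done
```

Note that exempting mysqld only shifts the kill order; it does not fix the deadlock described in the LWN article if the allocation loop itself wedges.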
Additional info: https://lwn.net/Articles/627436/
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 22 Dec 2014 17:16:43 -0500
Subject: [patch] mm: page_alloc: avoid page allocation vs. OOM killing
deadlock
The page allocator per default does not ever give up on allocations up
to order 3, and instead it keeps the allocating task in a loop running
direct reclaim and invoking the OOM killer. The assumed reason behind
this decade-old behavior is that the system is unusable once orders of
such small size start to fail, and the allocator might as well keep
killing processes one-by-one until the situation is rectified.
However, the allocating task itself might be holding locks that the
OOM victim might need to exit, and, to preserve the emergency memory
reserves, the OOM killer doesn't move on to the next victim until the
first choice has exited. The result is a deadlock between the task
that is trying to allocate and the OOM victim that can't be resolved
without a third party exiting or volunteering unreclaimable memory.
(...snip...)