Monday, January 17, 2011

Free memory Grid Control Agent

One problem we had with our Linux systems is: “where has all the memory gone?”

We’ve been there in the past and it looks like the Grid Control (GC) Agent suffers from exactly this misunderstanding.

In our case we have:

- Each of our PROD boxes has 32GB (GiB to be precise) of physical memory

- They all have 16GB of swap space

- None of them is really using the swap space

- Two out of three report less than 1GB of free memory

o one 166MB

o two 720MB

o three 3978MB

- All three are using 20GB or more memory for caching

o one 23GB

o two 21GB

o three 20GB

So, where has the memory gone? Caching of course!

Checking memory by looking only at the free figure from the free command, or by reading the MemFree line in /proc/meminfo, is not good enough on Linux systems.

If there is any memory available, the kernel will take it for I/O caching. If another process requests more memory, the kernel will take it out of the chunk used for I/O caching.

Thus, in Linux systems, the memory available to applications is the memory reported to be free PLUS the memory used for I/O caching.

So if we add the two values:

grep ^MemFree /proc/meminfo | awk '{ print $2 }'

grep ^Cached /proc/meminfo | awk '{ print $2 }'

we get the effective memory available to applications (memory free).
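As a rough sketch, the two lookups can be folded into a single awk call that prints the sum directly. The function name mem_available_kb and its optional file argument are made up for illustration; the values in /proc/meminfo are reported in kB. For box one above, the result would come out at roughly 23GB (166MB free plus 23GB cached).

```shell
#!/bin/sh
# Hypothetical helper: print MemFree + Cached (in kB) from a meminfo-style file.
# The file argument is an assumption for easier testing; it defaults to /proc/meminfo.
mem_available_kb() {
    awk '
        /^MemFree:/ { free = $2 }      # memory the kernel reports as free
        /^Cached:/  { cached = $2 }    # page cache; the anchor keeps SwapCached out
        END         { print free + cached }
    ' "${1:-/proc/meminfo}"
}

# Only print a value when /proc/meminfo is actually readable (i.e. on Linux).
if [ -r /proc/meminfo ]; then
    mem_available_kb
fi
```

Calling mem_available_kb with no argument reads the live system; passing a saved copy of /proc/meminfo gives the figure for that snapshot.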

Looks like this is the bit where the GC Agent gets confused.

What can we do?

- We could flush the I/O cache and monitor

- We could check with Oracle what the GC Agent is supposed to be reporting (and/or report a bug)

- Just monitor MemFree and Cached and wait for the next GC alert (monitoring should be done by TSG; they are already flooding our system logs with SNMP daemon messages)

- A combination of any of the above
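For the "just monitor" option, a minimal sketch could log MemFree, Cached and their sum with a timestamp, so the numbers can be compared against the next GC alert. The names log_mem_line and monitor_mem, the output format and the 60-second default interval are all assumptions, not anything the GC Agent or TSG provides.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line with MemFree, Cached and their sum (kB).
# The optional file argument defaults to /proc/meminfo.
log_mem_line() {
    awk -v ts="$(date '+%Y-%m-%dT%H:%M:%S')" '
        /^MemFree:/ { free = $2 }
        /^Cached:/  { cached = $2 }
        END {
            printf "%s MemFree=%dkB Cached=%dkB FreePlusCached=%dkB\n",
                   ts, free, cached, free + cached
        }
    ' "${1:-/proc/meminfo}"
}

# Hypothetical loop: emit one line every N seconds (default 60) until interrupted.
monitor_mem() {
    while true; do
        log_mem_line
        sleep "${1:-60}"
    done
}
```

Running monitor_mem 300 >> /tmp/memlog.txt (an example path) would append a line every five minutes.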
