We are using a custom target board based on the RZ/A1 processor. The Linux kernel version is 4.9.123 (downloaded from the Renesas site).
We are referring to the user manual r01uh0403ej0200_rz_a1h (PDF).
1) The platform is a low-memory platform with 64 MB of RAM.
2) We transfer around 45 MB of data over TCP from a PC to the target using the netcat utility.
On the target, a process receives the data over a socket and writes it to the flash disk.
3) At the start of the data transfer, we explicitly drop the Linux kernel's cached memory by running echo 3 > /proc/sys/vm/drop_caches.
4) During the TCP data transfer, free -m shows "free" dropping to almost 1 MB, with most of the memory appearing as "cached":
# free -m
             total       used       free     shared    buffers     cached
Mem:            57         56          1          0          2         42
-/+ buffers/cache:         12         45
Swap:            0          0          0
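As a cross-check of the numbers above: in this classic free layout, the "-/+ buffers/cache" row is derived from the "Mem:" row by subtracting buffers and cached from "used" and adding them to "free". A minimal shell sketch of that arithmetic follows; the function name adjusted_row is only for illustration.

```shell
#!/bin/sh
# Recompute the "-/+ buffers/cache" row of the classic free(1) layout.
# Arguments (in MB): used free buffers cached
# adjusted_row is a hypothetical helper name, not part of any tool.
adjusted_row() {
    used=$1; free=$2; buffers=$3; cached=$4
    # Print "used-without-cache free-with-cache".
    echo "$((used - buffers - cached)) $((free + buffers + cached))"
}

# Values taken from the "Mem:" row above.
adjusted_row 56 1 2 42
```

So about 45 MB is nominally reclaimable. As far as we understand, however, dirty pages counted under "cached" must be written back before they can be freed, and a GFP_ATOMIC allocation cannot wait for that.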
5) Sometimes the kernel runs out of memory: a page allocation failure occurs in the kernel and, after the backtrace below is printed, the target reboots:
# [ 775.947949] nc.traditional: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
[ 775.956362] CPU: 0 PID: 1288 Comm: nc.traditional Tainted: G O 4.9.123-pic6-g31a13de-dirty #19
[ 775.966085] Hardware name: Generic R7S72100 (Flattened Device Tree)
[ 775.972501] [<c0109829>] (unwind_backtrace) from [<c010796f>] (show_stack+0xb/0xc)
[ 775.980118] [<c010796f>] (show_stack) from [<c0151de3>] (warn_alloc+0x89/0xba)
[ 775.987361] [<c0151de3>] (warn_alloc) from [<c0152043>] (__alloc_pages_nodemask+0x1eb/0x634)
[ 775.995790] [<c0152043>] (__alloc_pages_nodemask) from [<c0152523>] (__alloc_page_frag+0x39/0xde)
[ 776.004685] [<c0152523>] (__alloc_page_frag) from [<c03190f1>] (__netdev_alloc_skb+0x51/0xb0)
[ 776.013217] [<c03190f1>] (__netdev_alloc_skb) from [<c02c1b6f>] (sh_eth_poll+0xbf/0x3c0)
[ 776.021342] [<c02c1b6f>] (sh_eth_poll) from [<c031fd8f>] (net_rx_action+0x77/0x170)
[ 776.029051] [<c031fd8f>] (net_rx_action) from [<c011238f>] (__do_softirq+0x107/0x160)
[ 776.036896] [<c011238f>] (__do_softirq) from [<c0112589>] (irq_exit+0x5d/0x80)
[ 776.044165] [<c0112589>] (irq_exit) from [<c012f4db>] (__handle_domain_irq+0x57/0x8c)
[ 776.052007] [<c012f4db>] (__handle_domain_irq) from [<c01012e1>] (gic_handle_irq+0x31/0x48)
[ 776.060362] [<c01012e1>] (gic_handle_irq) from [<c0108025>] (__irq_svc+0x65/0xac)
[ 776.067835] Exception stack(0xc1cafd70 to 0xc1cafdb8)
[ 776.072876] fd60: 0002751c c1dec6a0 0000000c 521c3be5
[ 776.081042] fd80: 56feb08e f64823a6 ffb35f7b feab513d f9cb0643 0000056c c1caff10 ffffe000
[ 776.089204] fda0: b1f49160 c1cafdc4 c180c677 c0234ace 200e0033 ffffffff
[ 776.095816] [<c0108025>] (__irq_svc) from [<c0234ace>] (__copy_to_user_std+0x7e/0x430)
[ 776.103796] [<c0234ace>] (__copy_to_user_std) from [<c0241715>] (copy_page_to_iter+0x105/0x250)
[ 776.112503] [<c0241715>] (copy_page_to_iter) from [<c0319aeb>] (skb_copy_datagram_iter+0xa3/0x108)
[ 776.121469] [<c0319aeb>] (skb_copy_datagram_iter) from [<c03443a7>] (tcp_recvmsg+0x3ab/0x5f4)
[ 776.130045] [<c03443a7>] (tcp_recvmsg) from [<c035e249>] (inet_recvmsg+0x21/0x2c)
[ 776.137576] [<c035e249>] (inet_recvmsg) from [<c031009f>] (sock_read_iter+0x51/0x6e)
[ 776.145384] [<c031009f>] (sock_read_iter) from [<c017795d>] (__vfs_read+0x97/0xb0)
[ 776.152967] [<c017795d>] (__vfs_read) from [<c01781d9>] (vfs_read+0x51/0xb0)
[ 776.159983] [<c01781d9>] (vfs_read) from [<c0178aab>] (SyS_read+0x27/0x52)
[ 776.166837] [<c0178aab>] (SyS_read) from [<c0105261>] (ret_fast_syscall+0x1/0x54)
6) Please help us investigate this issue. We have the following questions:
a) How did the kernel memory get exhausted? Under low-memory conditions, are the kernel flusher threads, which should write dirty pages from the page cache to the flash disk, not running at the right time? We observed that page reclaim in the kernel proceeds slowly.
It looks to us as if the mechanism that should write pages from the kernel disk cache to the MMC disk does not run in time. Under low-memory
conditions, the dirty pages in the disk cache are then not synced to disk soon enough, and the kernel runs out of memory when it tries to allocate
a page on behalf of the nc process. Are there any parameters in the Linux memory subsystem with which the reclaim procedure can be fine-tuned?
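The writeback parameters we are aware of are the dirty_* files under /proc/sys/vm. The sketch below only prints their current values so that they can be attached to a report. The function name show_writeback_knobs is hypothetical, and the values in the commented-out lines are untested placeholders, not recommendations.

```shell
#!/bin/sh
# Print the writeback-related sysctls found under a directory.
# The directory defaults to /proc/sys/vm; show_writeback_knobs is a
# hypothetical helper name used only for this sketch.
show_writeback_knobs() {
    dir=${1:-/proc/sys/vm}
    for knob in dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        # Skip knobs that are missing or unreadable.
        if [ -r "$dir/$knob" ]; then
            printf '%s=%s\n' "$knob" "$(cat "$dir/$knob")"
        fi
    done
}

show_writeback_knobs

# Lowering the thresholds should make background writeback start earlier.
# Placeholder values only (assumption, not validated on our board); run as root:
# echo 5   > /proc/sys/vm/dirty_background_ratio
# echo 10  > /proc/sys/vm/dirty_ratio
# echo 100 > /proc/sys/vm/dirty_writeback_centisecs
```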
b) Can we reserve some amount of free memory that the Linux kernel does not use for caching, so that the kernel can use it for other page allocations it needs, such as those made on behalf of the nc process above?
Can some tuning be done in the Linux memory subsystem, e.g. by using /proc/sys/vm/min_free_kbytes, to achieve this?
We want to ensure that the disk cache does not consume all of the available free memory, and that some memory always remains available for other page allocations.
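One experiment we could try is raising min_free_kbytes. The sketch below computes a candidate value as a percentage of MemTotal from /proc/meminfo. The 5% default is an arbitrary placeholder that we have not validated, and the function name suggest_min_free_kbytes is hypothetical.

```shell
#!/bin/sh
# Suggest a min_free_kbytes value as a percentage of MemTotal.
# Arguments: [meminfo file, default /proc/meminfo] [percent, default 5]
# Both the helper name and the 5% default are assumptions for illustration.
suggest_min_free_kbytes() {
    meminfo=${1:-/proc/meminfo}
    percent=${2:-5}
    # MemTotal is reported in kB; the result is truncated to an integer.
    awk -v p="$percent" \
        '$1 == "MemTotal:" { printf "%d\n", $2 * p / 100 }' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    echo "candidate min_free_kbytes: $(suggest_min_free_kbytes)"
fi

# To apply a value, write it as root (not done here on purpose):
# echo <value> > /proc/sys/vm/min_free_kbytes
```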
c) Can you give us further pointers on how to debug this out-of-memory condition in the kernel?
What other causes are possible for this low-memory condition in the kernel?
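To collect more data for debugging, we could sample /proc/meminfo while the transfer runs and watch how Dirty and Writeback evolve relative to MemFree. A minimal sketch, assuming the standard /proc/meminfo field names; the function name meminfo_snapshot is hypothetical.

```shell
#!/bin/sh
# Print one line with MemFree, Cached, Dirty and Writeback (all in kB).
# Argument: [meminfo file, default /proc/meminfo]
# meminfo_snapshot is a hypothetical helper name used for this sketch.
meminfo_snapshot() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemFree:"   { free = $2 }
        $1 == "Cached:"    { cached = $2 }
        $1 == "Dirty:"     { dirty = $2 }
        $1 == "Writeback:" { wb = $2 }
        END {
            printf "free=%d cached=%d dirty=%d writeback=%d\n",
                   free, cached, dirty, wb
        }
    ' "$meminfo"
}

# Take a single sample here; during a transfer this could be run in a
# loop, for example: while true; do meminfo_snapshot; sleep 1; done
if [ -r /proc/meminfo ]; then
    meminfo_snapshot
fi
```

If Dirty stays high while MemFree approaches zero, that would support the suspicion that writeback to the flash disk is not keeping up with the incoming data.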
Regards
Amit