memory leak/OOM kill

Posted: Wed Jun 05, 2013 8:02 pm
by inu
Hi. After noticing that my box was running cool and the system was unusually slow (everything had been moved to swap), I found that the kernel had killed all the running eOn clients (I have 4 GB of memory and 2 GB of swap).

kern.log:

Jun  5 12:09:44 cygnus kernel: [2974759.069810] Killed process 6271 (eonclient_5.00_) total-vm:1063100kB, anon-rss:842092kB, file-rss:208kB
Jun  5 12:12:08 cygnus kernel: [2974898.801129] Killed process 6272 (eonclient_5.00_) total-vm:1063100kB, anon-rss:911912kB, file-rss:0kB
Jun  5 12:13:31 cygnus kernel: [2974986.922215] Killed process 6296 (eonclient_5.00_) total-vm:1063100kB, anon-rss:725892kB, file-rss:0kB
Jun  5 12:13:31 cygnus kernel: [2974986.990299] Killed process 6299 (eonclient_5.00_) total-vm:1063100kB, anon-rss:725916kB, file-rss:0kB
Jun  5 12:13:31 cygnus kernel: [2974987.125096] Killed process 6297 (eonclient_5.00_) total-vm:1063100kB, anon-rss:720416kB, file-rss:8kB
Jun  5 12:16:14 cygnus kernel: [2975139.441500] Killed process 6315 (eonclient_5.00_) total-vm:774156kB, anon-rss:514640kB, file-rss:0kB
1 GB seems awfully large; normally, I think they use 100-300 MB. My BOINC version is "6.12.40 x86_64-pc-linux-gnu", with eOn as "eonclient_5.00_x86_64-pc-linux-gnu". I use the stock downloaded binary for eOn. I have been using these two together flawlessly for some time now (since 5.0's release), so I guess something is wrong with the workunits.
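
For anyone who wants to tally entries like the ones above, here is a minimal sketch that pulls the PID and memory figures out of "Killed process" lines. The regular expression is written against the format shown in this post only; other kernel versions may print these lines differently, and the function and field names are made up for illustration.

```python
import re

# Pattern matched against the "Killed process" lines quoted in this post.
# Assumption: other kernels may format these messages differently.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]*)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per OOM-kill line; lines that do not match are skipped."""
    kills = []
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            kills.append({
                "pid": int(m["pid"]),
                "name": m["name"],
                "total_vm_kb": int(m["total_vm"]),
                "anon_rss_kb": int(m["anon_rss"]),
                "file_rss_kb": int(m["file_rss"]),
            })
    return kills


# Two lines copied from the kern.log excerpt above.
SAMPLE = [
    "Jun  5 12:09:44 cygnus kernel: [2974759.069810] Killed process 6271 "
    "(eonclient_5.00_) total-vm:1063100kB, anon-rss:842092kB, file-rss:208kB",
    "Jun  5 12:16:14 cygnus kernel: [2975139.441500] Killed process 6315 "
    "(eonclient_5.00_) total-vm:774156kB, anon-rss:514640kB, file-rss:0kB",
]

kills = parse_oom_kills(SAMPLE)
for k in kills:
    print(
        f"pid {k['pid']} ({k['name']}): "
        f"total-vm {k['total_vm_kb'] / 1024:.0f} MiB, "
        f"anon-rss {k['anon_rss_kb'] / 1024:.0f} MiB"
    )
```

To run it against a real log, pass an open file object (for example `open("/var/log/kern.log")`) instead of `SAMPLE`; the log path is an assumption and varies by distribution.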

Task pages:
http://eon.ices.utexas.edu/eon2/result. ... =229032325
http://eon.ices.utexas.edu/eon2/result. ... =229032310
http://eon.ices.utexas.edu/eon2/result. ... =229031981
http://eon.ices.utexas.edu/eon2/result. ... =229031508

This also happened for four other tasks.
In case they get removed from the database, the "error message" on the workunit page says "Too many total results".

EDIT:
After some rough testing: the client sometimes climbs to 1-1.3 GiB over a few minutes, then drops back down to a few KB. When it doesn't, everything works as expected. Limiting the amount of memory BOINC is allowed to use prevents the extreme sluggishness and the OOM kills, but the tasks still error out with code -177 (0xffffffffffffff4f):

<core_client_version>6.12.40</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (2 frames):
[0x83ae89e]
[0xf77bb400]

Exiting...
SIGSEGV: segmentation violation
Stack trace (2 frames):
[0x83ae89e]
[0xf772e400]

Exiting...
SIGSEGV: segmentation violation
Stack trace (2 frames):
[0x83ae89e]
[0xf77c8400]

Exiting...
SIGSEGV: segmentation violation
Stack trace (2 frames):
[0x83ae89e]
[0xf77d8400]

Exiting...

</stderr_txt>
]]>
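
The two forms of the exit code quoted above, -177 and 0xffffffffffffff4f, are the same value: the hex form is the 64-bit two's-complement encoding of the negative number. A minimal sketch of the conversion follows; the helper name `to_signed64` is made up for illustration.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement signed integer."""
    value &= MASK64
    # Values with the top bit set represent negative numbers.
    return value - (1 << 64) if value >= (1 << 63) else value


signed_code = to_signed64(0xFFFFFFFFFFFFFF4F)
hex_code = hex(-177 & MASK64)

print(signed_code)
print(hex_code)
```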

Re: memory leak/OOM kill

Posted: Tue Jun 18, 2013 1:03 am
by Augustine
Same here.

Please advise.

Re: memory leak/OOM kill

Posted: Tue Jun 18, 2013 3:32 pm
by stauff
Hi guys,

We are trying to resolve this issue. Can either of you provide a workunitid or resultid for jobs that have crashed? This will greatly speed up our search for the source of this issue.

Re: memory leak/OOM kill

Posted: Tue Jun 18, 2013 7:30 pm
by Augustine
I aborted these when I noticed that they had grown to about 1GB of virtual memory and were suspended waiting for other projects to finish: http://bit.ly/11JCyV4. Others finished with a SIGSEGV, like this one.

HTH

Re: memory leak/OOM kill

Posted: Fri Jun 21, 2013 3:57 pm
by Conan
ALL current eOn work units are failing on my 32-bit Windows machine.

They reach 1 GB in 1 minute and 2 GB in 3 minutes, then fail. Chewing up 2 GB of RAM on a 32-bit computer leaves 1 GB for every other process the computer runs, which doesn't work.

Conan

Re: memory leak/OOM kill

Posted: Fri Jun 21, 2013 7:11 pm
by mickydl*

Re: memory leak/OOM kill

Posted: Sat Jun 22, 2013 12:40 am
by Conan
Just noticed that my Linux machine with 8 GB RAM got very sluggish.
Checking System Monitor showed that an eOn task consumed up to 6.1 GB of memory and 6.9 GB of virtual memory after just 11 minutes.
It then dropped back to 24.4 MB of memory and 39.4 MB of virtual memory and started to run normally. The computer then became as responsive as it should normally be.

I can't run work units like this: they only just fit on my 8 GB machine, all my other machines have only 4 GB, and the 2 Windows machines have only 3 GB available, so all work units on my Windows host are failing.

Conan

Re: memory leak/OOM kill

Posted: Sat Jun 22, 2013 5:30 pm
by Yacob
Same here with Windows 7 64 bits.

232158948 	241325083 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	231.74 	--- 	eOn Client v5.00
232158904 	241325039 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	232.60 	--- 	eOn Client v5.00
232158903 	241325038 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	230.69 	--- 	eOn Client v5.00
232158902 	241325037 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Error while computing 	277.82 	230.82 	--- 	eOn Client v5.00
232158901 	241325036 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	230.93 	--- 	eOn Client v5.00
232158900 	241325035 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	231.13 	--- 	eOn Client v5.00

Re: memory leak/OOM kill

Posted: Sun Jun 23, 2013 4:30 am
by Conan
Well, that's it for eOn on my 32-bit Windows computers: NNW. I'm not requesting any more work for this project.

I have been running this project for ages, as it is one of my favorites and hardly ever has a problem.
However, after wrestling for control of my computer for over an hour because eOn tasks were consuming all of its memory, to the point that the computer no longer did anything, I have stopped getting new tasks.
As I stated before, I only have 3 GB of memory available, but these eOn tasks had a Windows Commit Charge of almost 5 GB and were trying to go higher. They couldn't get any more memory, so the computer froze.

Until recently this project has been very reliable, so I don't know what has changed; there has not been an application update that I can see.

My 64-bit Linux computer with 8 GB of RAM can handle a couple of these eOn tasks running together, but my Windows computers cannot.

Conan

Re: memory leak/OOM kill

Posted: Sun Jun 23, 2013 9:31 am
by Yacob
Not requesting any more work for this project.
Me too!!
Please post an update when the issues are fixed so we can resume work on this project.

Thanks!!!

Re: memory leak/OOM kill

Posted: Thu Jul 04, 2013 4:32 pm
by Sebastien
stauff wrote:Hi guys,

We are trying to resolve this issue. Can either of you provide a workunitid or resultid for jobs that have crashed? This will greatly speed up our search for the source of this issue.
The problem seems to be located in the function ParallelReplicaJob::dephase()
100,000 dephaseSteps is too high.

Re: memory leak/OOM kill

Posted: Fri Jul 05, 2013 3:59 am
by felixonmars
stauff wrote:Hi guys,

We are trying to resolve this issue. Can either of you provide a workunitid or resultid for jobs that have crashed? This will greatly speed up our search for the source of this issue.
http://eon.ices.utexas.edu/eon2/result. ... =233022466

My box has 16 GB of RAM, and I also enabled 16 GB of zram, which let some of the WUs complete without an error. It's still not stable, though, so I have selected "no new tasks" for now. Waiting for this issue to be fixed :)

Re: memory leak/OOM kill

Posted: Mon Jul 08, 2013 4:14 pm
by felixonmars
ZPC2THLgate wrote:The client binary has been updated and should now work.
Thanks! It works fine here.

Re: memory leak/OOM kill

Posted: Mon Aug 19, 2013 2:23 am
by losyguy
Yacob wrote:Same here with Windows 7 64 bits.
Here is the code

232158948 	241325083 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	231.74 	--- 	eOn Client v5.00
232158904 	241325039 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	232.60 	--- 	eOn Client v5.00
232158903 	241325038 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	230.69 	--- 	eOn Client v5.00
232158902 	241325037 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Error while computing 	277.82 	230.82 	--- 	eOn Client v5.00
232158901 	241325036 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	230.93 	--- 	eOn Client v5.00
232158900 	241325035 	33859 	22 Jun 2013 16:47:31 UTC 	22 Jun 2013 17:17:04 UTC 	Aborted by user 			277.82 	231.13 	--- 	eOn Client v5.00
Hello Yacob,

I am having the same problem and I am also on a Windows 7 machine. Did you ever figure out a solution to this? Thanks so much.