Runaway WUs

Posted: Tue May 03, 2011 9:53 pm
by Paratima
Yesterday morning, I found these two WUs, both running on the same system.

82347095 82343175

Both had run times of over 6 hours. Neither was using any CPU time; they were just idling along and preventing other WUs from being processed.
I aborted both. Haven't seen any others on my three active systems running EON.
I will cheerfully provide any supporting information you would like.

Re: Runaway WUs

Posted: Fri May 06, 2011 11:50 am
by Paratima
Caught and aborted another one this morning, after one hour and fifty minutes.
WU# 84912243
Same symptoms: run length over an hour and not using any CPU time.

If the admins are aware of this problem and have sufficient samples and info to solve it, just let me know and I'll shut up about it.

Re: Runaway WUs

Posted: Thu May 12, 2011 12:35 pm
by Paratima
Yet another. WU# 88214266
Aborted after 2+ hours.

Re: Runaway WUs

Posted: Thu May 26, 2011 11:34 am
by Paratima
Still another. WU# 96019272.
Aborted after 2 hours, 39 minutes.
Just spinning - no work being done.

Re: Runaway WUs

Posted: Thu Jun 02, 2011 9:51 pm
by Paratima
Two more tasks found running for 3+ hours, using no CPU time, just idling.
Getting REALLY tired of having to check all my machines every few hours.

Is there a fix for this or am I just wasting my time reporting it?

I'll CHEERFULLY post WU ID numbers and any other desired information.

Re: Runaway WUs

Posted: Mon Jun 13, 2011 2:33 pm
by upstatelabs
Same issue here.

Here is one WU that I aborted after 14 hrs:

Time was running but CPU was not in use for the WU.

Re: Runaway WUs

Posted: Thu Jun 16, 2011 2:58 pm
by Tex1954
I get one of those sometimes, but lately the bigger problem has been the server aborting a lot of WUs and also getting overloaded. Many times I get this...

519 eon2 6/16/2011 9:53:32 AM Started upload of 985262977_14643_201901525_0_0
520 eon2 6/16/2011 9:53:32 AM Started upload of 453769022_18500_197012431_0_0
521 eon2 6/16/2011 9:53:34 AM [error] Error reported by file upload server: can't parse config file
522 eon2 6/16/2011 9:53:34 AM [error] Error reported by file upload server: can't parse config file
523 eon2 6/16/2011 9:53:34 AM Temporarily failed upload of 985262977_14643_201901525_0_0: transient upload error