No Work

eOn code for long time scale dynamics

jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

No Work

Post by jon »

Y'all ain't sent no stuff for 12 hours. What new atoms will we be working on next?
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

36 hours, guys... Will there be any new exotic molecules involved?
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

48 hours (?). Are we going to work on improving an existing process, or on discovering a new one? What's it for?
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

72 hours, is everything ok?
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

Yeah! Back up and going.
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

Ran out of wus at 0135(L) CST, 01 Dec 13.
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

YEAH! 5 hours and going strong (from 1700(L) 1 Dec 2013). Is this a new problem? The last problem (5.1 million wus) was much shorter than the previous one, but it reached a seeming record of 4000 active cores (results in progress over 2) at its peak. This one is only running at 2500 cores currently?
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

Oh, and thank you for stoking the server on your Thanksgiving Sunday, you are awesome!
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

My machines are having a hard time getting wus, or are not getting them at all.
Work Units: *1404_41_* & *8141_*
I noticed:
About 6000 spare wus (vs. about 1900 usually), decreasing by a hundred or so at 18 hour intervals over the last few days,
web page displays that are slower than usual, or do not load at all, when Server Status is clicked,
gaps in all traces from 1140(L) to 1210(L), CST, 13 Jan 2014,
followed by the Spare Work Unit trace dropping and then spiking (I believe there was one yesterday as well),
followed by another gap in all traces from 1740(L) to 1830(L), CST, 13 Jan 2014,
followed by a plunge of the 'FLOPS' and '# of Computers' traces beginning about 1830(L).
I did manage to get one wus and a server status update; the web page displayed the following message:
"Warning: mysql_pconnect(): Too many connections in /home/eon/projects/eon2/html/inc/db.inc on line 39 The database server is not accessible".
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

One of my machines managed to upload and report its two wus (it's a single core machine),
then got issued 500598096_1500721 & 500598096_1500724;
each ran for 2 to 3 seconds and gave a 'Computational error'.
The Event Log says, for the last wus (the first was similar):
1/13/2014 9:34:45 PM | eon2 | Computation for task 500598096_1500724_0 finished
1/13/2014 9:34:45 PM | eon2 | Output file 500598096_1500724_0_0 for task 500598096_1500724_0 absent
1/13/2014 9:38:34 PM | eon2 | Sending scheduler request: To fetch work.
1/13/2014 9:38:34 PM | eon2 | Reporting 2 completed tasks, requesting new tasks for CPU
1/13/2014 9:38:36 PM | eon2 | Scheduler request completed: got 0 new tasks
1/13/2014 9:38:36 PM | eon2 | Server can't open database,
'Server can't open database' has been a recurring message since about 1830(L),
and the web page for Server Status will still not display.
Oops, the Server Status web page just displayed (I had been waiting on it since my last message). It says:
Results ready to send 1,054
Results in progress 4,525
Workunits waiting for validation 3,508
Workunits waiting for assimilation 3
Workunits waiting for deletion 1
Results waiting for deletion 0
Transitioner backlog (hours) 1.
Is *1404_* a wus series we worked on before?
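
For anyone wanting to check their own client for the same symptom, the Event Log lines quoted above can be scanned with a short script. This is only a rough sketch: the line layout (timestamp | project | message) is assumed from the lines pasted in this post, and the names `parse_event_log` and `SAMPLE` are made up for illustration, not part of BOINC.

```python
import re

# Layout assumed from the pasted lines:
# "<M/D/YYYY> <H:MM:SS> <AM|PM> | <project> | <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) \| "
    r"(?P<project>[^|]+?) \| (?P<message>.*)$"
)

def parse_event_log(text):
    """Return (timestamp, project, message) tuples for lines that match."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            # Drop the trailing comma that appears in the pasted text.
            entries.append((m["stamp"], m["project"], m["message"].rstrip(",")))
    return entries

# The six lines quoted in the post, copied as-is.
SAMPLE = """\
1/13/2014 9:34:45 PM | eon2 | Computation for task 500598096_1500724_0 finished
1/13/2014 9:34:45 PM | eon2 | Output file 500598096_1500724_0_0 for task 500598096_1500724_0 absent
1/13/2014 9:38:34 PM | eon2 | Sending scheduler request: To fetch work.
1/13/2014 9:38:34 PM | eon2 | Reporting 2 completed tasks, requesting new tasks for CPU
1/13/2014 9:38:36 PM | eon2 | Scheduler request completed: got 0 new tasks
1/13/2014 9:38:36 PM | eon2 | Server can't open database,
"""

entries = parse_event_log(SAMPLE)

# Messages that point at the server-side database problem.
db_errors = [e for e in entries if "can't open database" in e[2]]

# Messages that point at a task finishing without its output file.
missing_output = [e for e in entries if e[2].startswith("Output file") and e[2].endswith("absent")]

for stamp, project, message in db_errors + missing_output:
    print(f"{stamp} [{project}] {message}")
```

Pasting a longer stretch of the Event Log into `SAMPLE` would show how often the database message recurs and when it stops.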
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

The server quit reporting 'Server can't open database' about 2230(L) and now just says it does not have any work.
The latest server status (as of this message; it took a while to display the web page) says:
Results ready to send 0
Results in progress 2,603
Workunits waiting for validation 380
Workunits waiting for assimilation 3
Workunits waiting for deletion 0
Results waiting for deletion 0
Transitioner backlog (hours) 0.
Is *8096_15xxxxx* an old wus series with a new name, since the first wus started at 1.5 million?
jon
Posts: 16
Joined: Sun Nov 10, 2013 7:38 pm

Re: No Work

Post by jon »

Yeah! 5 hours and going strong (from 1020(L) 15 Jan 2014). Thank you!
Some i7s are at the top of the list, vs. the usual 1, 2, 3, ... of the huge 48 core Opterons for the last problem (series of wus)?
graeme
Site Admin
Posts: 2291
Joined: Tue Apr 26, 2005 4:25 am

Re: No Work

Post by graeme »

Jon, thanks for the note. We were a little slow fixing things this time, but yes, we're back up.