Random fpops_est

eOn code for long time scale dynamics

Ananas
Posts: 8
Joined: Sun Sep 12, 2010 8:04 am

Random fpops_est

Post by Ananas

The fpops_est (used to calculate the estimated runtime) seems to be quite random here:

    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>156749973470.000000</rsc_fpops_est>
    <rsc_fpops_est>151272401416.000000</rsc_fpops_est>
    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>161091959979.000000</rsc_fpops_est>
    <rsc_fpops_est>1706232249440.000000</rsc_fpops_est>

This, together with the extremely short deadline, often forces BOINC into panic mode,
i.e. EDF (Earliest Deadline First) scheduling.
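To put numbers on that spread, here is a rough sketch that turns the eight rsc_fpops_est values above into estimated runtimes. It assumes the client's estimate is roughly fpops_est × DCF / benchmark speed; the 3 GFLOPS benchmark and the DCF of 1.0 are hypothetical values, not taken from this thread.

```python
# Sketch: estimated runtimes for the rsc_fpops_est values quoted above.
# Hypothetical assumptions: a 3 GFLOPS benchmark and a DCF of 1.0.

FPOPS_EST = [
    369909759747.0,
    369909759747.0,
    156749973470.0,
    151272401416.0,
    369909759747.0,
    369909759747.0,
    161091959979.0,
    1706232249440.0,
]

BENCHMARK_FLOPS = 3e9  # hypothetical host speed, floating-point ops per second
DCF = 1.0              # hypothetical duration correction factor


def estimated_runtime_s(fpops_est: float, flops: float, dcf: float) -> float:
    """Rough client-side estimate: operations / speed, scaled by the DCF."""
    return fpops_est / flops * dcf


estimates = [estimated_runtime_s(f, BENCHMARK_FLOPS, DCF) for f in FPOPS_EST]
for f, est in zip(FPOPS_EST, estimates):
    print(f"{f:>20.0f} ops -> {est / 60:6.1f} min")

# Ratio between the longest and the shortest estimate in this batch.
spread = max(estimates) / min(estimates)
print(f"largest / smallest estimate: {spread:.1f}x")
```

Whatever benchmark value is plugged in, the longest estimate in this batch is a bit more than 11 times the shortest one.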

24 hours is the critical lower deadline limit for panic mode; 6 more hours of deadline and
more consistent fpops_est values would avoid this "selfishness" of the eOn workunits.
chill
Posts: 96
Joined: Tue Jul 28, 2009 9:04 pm

Re: Random fpops_est

Post by chill

The rsc_fpops_est varies per WU because WUs belong to different simulations, and each simulation uses a simple algorithm to guess the average fpops per WU.
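The post does not say what that per-simulation algorithm is. As a purely hypothetical illustration of "a simple algorithm for guessing the average fpops per WU", the sketch below keeps a running mean of the fpops that completed WUs of each simulation actually used and falls back to a default for simulations with no history. The class and method names are invented for this example and are not eOn's code.

```python
class FpopsEstimator:
    """Hypothetical per-simulation fpops estimator (not eOn's actual code)."""

    def __init__(self, default_fpops: float) -> None:
        self.default_fpops = default_fpops
        self.total: dict[str, float] = {}  # summed fpops per simulation
        self.count: dict[str, int] = {}    # completed WUs per simulation

    def record(self, simulation: str, fpops_used: float) -> None:
        """Add the fpops one completed WU of `simulation` actually consumed."""
        self.total[simulation] = self.total.get(simulation, 0.0) + fpops_used
        self.count[simulation] = self.count.get(simulation, 0) + 1

    def estimate(self, simulation: str) -> float:
        """rsc_fpops_est for the next WU: the running mean, or the default."""
        n = self.count.get(simulation, 0)
        if n == 0:
            return self.default_fpops
        return self.total[simulation] / n


estimator = FpopsEstimator(default_fpops=3.7e11)
estimator.record("sim_a", 1.5e11)
estimator.record("sim_a", 1.6e11)
print(estimator.estimate("sim_a"), estimator.estimate("sim_b"))
```

A mean like this is only as stable as the WUs behind it: a simulation with few completed WUs, or with a wide spread of WU sizes, will hand out estimates that jump around, much like the values in the first post.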

I have increased the delay bound to 30 hours. I hope this helps with the panic mode.
Ananas
Posts: 8
Joined: Sun Sep 12, 2010 8:04 am

Re: Random fpops_est

Post by Ananas

Yes, thanks - that helps, no panic mode anymore :-)

About the fpops_est thing:

The results usually take between 1.5 and 5 minutes on my box, but now and then a result with a far too low fpops_est value apparently comes along. It runs much longer than estimated, which drives the duration correction factor (DCF) up, so the estimated runtime of the following results (roughly fpops_est × DCF / benchmark) shoots up to something like 35 hours (I have even seen 128 hours once).
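A minimal sketch of that mechanism, assuming the client estimates runtime as fpops_est × DCF / benchmark speed (so a DCF above 1 means results run longer than predicted). The 3 GFLOPS benchmark, the typical fpops_est of 3.7e11 and the outlier being 50 times too low are all hypothetical numbers, and the DCF update is a simplified version of the "ran longer than estimated" rule only.

```python
# Sketch: one under-estimated result inflates later runtime estimates.
# Hypothetical numbers: 3 GFLOPS benchmark, typical fpops_est of 3.7e11,
# one outlier whose fpops_est is 50x too low.

BENCHMARK_FLOPS = 3e9


def estimated_runtime_s(fpops_est: float, dcf: float) -> float:
    """Rough client estimate: fpops_est scaled by the DCF, divided by speed."""
    return fpops_est * dcf / BENCHMARK_FLOPS


typical_fpops_est = 3.7e11
dcf = 1.0  # assume the estimates were accurate so far
before = estimated_runtime_s(typical_fpops_est, dcf)

# The outlier does as much work as a typical WU but claims 50x fewer operations.
outlier_fpops_est = typical_fpops_est / 50
outlier_actual_s = estimated_runtime_s(typical_fpops_est, 1.0)
raw_ratio = outlier_actual_s / estimated_runtime_s(outlier_fpops_est, 1.0)

# It ran longer than predicted, so the new DCF comes straight from this result.
if raw_ratio > dcf:
    dcf = raw_ratio

# Every following typical WU now gets an estimate 50x too long.
after = estimated_runtime_s(typical_fpops_est, dcf)
print(f"typical WU estimate before: {before / 60:.1f} min, after: {after / 60:.1f} min")
```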

The BOINC client updates the DCF like this:

- the result runs longer than estimated from the benchmark and fpops_est => the new DCF is calculated directly from this one result

- the result runs a bit shorter than estimated from the benchmark and fpops_est => this result gets a 10% weight in the new DCF

- the result runs much shorter than estimated from the benchmark and fpops_est => this result gets just a 1% weight in the new DCF
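The three rules above can be sketched as a small update function. The 1.1 and 0.1 switch-over thresholds are assumptions for illustration (the post only says "longer", "a bit shorter" and "much shorter"), and the function and parameter names are hypothetical, not BOINC's own.

```python
def update_dcf(dcf: float, actual_s: float, estimate_no_dcf_s: float,
               high: float = 1.1, low: float = 0.1) -> float:
    """Sketch of the DCF update rules described above.

    raw_ratio: actual runtime relative to the estimate made WITHOUT the DCF.
    adj_ratio: actual runtime relative to the estimate that includes the DCF.
    `high` and `low` are assumed thresholds, not confirmed BOINC values.
    """
    raw_ratio = actual_s / estimate_no_dcf_s
    adj_ratio = raw_ratio / dcf
    if adj_ratio > high:
        # Ran longer than predicted: take the new DCF straight from this result.
        return raw_ratio
    if adj_ratio < low:
        # Finished much earlier than predicted: this result only gets 1% weight.
        return 0.99 * dcf + 0.01 * raw_ratio
    # Finished somewhat earlier than (or roughly as) predicted: 10% weight.
    return 0.9 * dcf + 0.1 * raw_ratio


print(update_dcf(1.0, 200.0, 100.0))   # ran twice as long: DCF jumps to 2.0
print(update_dcf(2.0, 150.0, 100.0))   # a bit shorter: DCF eases towards 1.5
print(update_dcf(50.0, 100.0, 100.0))  # much shorter: DCF barely moves
```

The asymmetry is deliberate on the client's side: overestimating runtimes is harmless for deadlines, underestimating them is not, so the DCF rises fast and falls slowly.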

So if fpops_est is far too low just once, it takes a great many results to bring the DCF back to a correct value, because results that finish much earlier than estimated only move it by 1% each. This affects the client-side cache handler as well as the work scheduler.
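A rough worked example of "a great many results", under hypothetical assumptions: every later result has the same true ratio r (actual runtime over the estimate without DCF), one bad fpops_est has pushed the DCF to d_0 = 50r, the 1% weight applies while a result finishes in under 10% of its inflated estimate (r/d_n < 0.1, i.e. d_n > 10r), and the 10% weight applies after that. The 10% switch-over point is an assumption.

```latex
% d_n: DCF after n further results; r: true ratio of every later result.
\begin{align*}
\text{1\% regime:}\quad d_{n+1} &= 0.99\,d_n + 0.01\,r
  \;\Rightarrow\; d_n - r = 0.99^{\,n}\,(d_0 - r) \\
0.99^{\,n}\cdot 49r \le 9r
  &\;\Rightarrow\; n \ge \frac{\ln(9/49)}{\ln 0.99} \approx 168.6
  \;\to\; 169 \text{ results} \\
\text{10\% regime:}\quad d_{n+1} &= 0.9\,d_n + 0.1\,r
  \;\Rightarrow\; d_n - r = 0.9^{\,n}\,(d_0' - r), \quad d_0' \approx 10r \\
0.9^{\,n}\cdot 9r \le 0.1r
  &\;\Rightarrow\; n \ge \frac{\ln(1/90)}{\ln 0.9} \approx 42.7
  \;\to\; 43 \text{ results}
\end{align*}
```

That is roughly 212 results before the DCF is back within 10% of its correct value; at 1.5 to 5 minutes per result, about 5 to 18 hours of crunching to recover from a single bad estimate.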
Ananas
Posts: 8
Joined: Sun Sep 12, 2010 8:04 am

Re: Random fpops_est

Post by Ananas

Currently it works like a charm, no jumpy estimated runtimes anymore and no panic mode :-)