NEB running on multiple nodes

Posted: Thu Feb 27, 2014 8:21 pm
by ng00822
Hi guys,

I am currently doing an NEB run with one image per node, where each node has 16 cores (2 sockets, 8 cores per socket). For normal VASP optimization runs I found that KPAR=2 and NPAR=1 seem to work best, along with the mpirun options "-bysocket -bind-to-socket" (the mpirun options may be superfluous; I am still gathering data). When using the same settings with VTST, some images show nearly equal performance, while others are much worse (the DAV iterations are hit harder than the RMM-DIIS ones). I think the problem may lie with the nodes themselves. Is there a way to tell which node each image was assigned to? Do you have any recommendations on MPI-specific parameters to use (or not use)?
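
To put numbers on "much worse", one option is to compare the electronic-step timings of the images directly. The following is only a rough sketch. It assumes each image directory (01, 02, ...) holds an OUTCAR, and that the timing lines look like `LOOP:  cpu time   12.3456: real time   12.4000`; both the directory layout and the line format are assumptions, and the helper names `loop_real_times` and `slowdown_by_image` are made up for illustration. It does not tell you which node an image ran on, only which images are slow.

```python
import glob
import os
import re
from statistics import mean

# Assumed OUTCAR timing line format (not confirmed in this thread):
#   LOOP:  cpu time   12.3456: real time   12.4000
# "LOOP+:" lines (ionic steps) are deliberately not matched.
LOOP_RE = re.compile(
    r"^\s*LOOP:\s+cpu time\s*([0-9.]+):\s*real time\s*([0-9.]+)", re.M
)


def loop_real_times(outcar_text):
    """Return the wall-clock time of every electronic step found in the text."""
    return [float(m.group(2)) for m in LOOP_RE.finditer(outcar_text)]


def slowdown_by_image(times_by_image):
    """Mean step time of each image divided by that of the fastest image.

    Images without any timing data are skipped.
    """
    means = {img: mean(t) for img, t in times_by_image.items() if t}
    fastest = min(means.values())
    return {img: m / fastest for img, m in means.items()}


if __name__ == "__main__":
    # Hypothetical layout: run from the NEB directory containing 01/, 02/, ...
    times = {}
    for path in sorted(glob.glob("[0-9][0-9]/OUTCAR")):
        with open(path) as fh:
            times[os.path.dirname(path)] = loop_real_times(fh.read())
    if any(times.values()):
        for img, ratio in sorted(slowdown_by_image(times).items()):
            print(f"image {img}: {ratio:.1f}x the fastest image")
```

Run from the NEB directory, this prints one ratio per image. Since DAV and RMM-DIIS steps are averaged together here, the ratios only give a coarse picture of which images are affected.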

Thanks!

Re: NEB running on multiple nodes

Posted: Sun Mar 02, 2014 12:35 am
by ng00822
I think I have narrowed the problem down to bad behavior by sge_execd, part of the UGE/SGE cluster management system. It is causing a ~30x slowdown during the DAV iterations and a ~8x slowdown during the RMM-DIIS iterations. I need to get that investigated.