NEB running on multiple nodes

ng00822 · Post by **ng00822** » Thu Feb 27, 2014 8:21 pm

Hi guys,

I am currently doing an NEB run with one image per node, where each node has 16 cores (2 sockets, 8 cores/socket). For normal VASP optimization runs I found that KPAR=2 and NPAR=1 seem to work best, along with the "-bysocket -bind-to-socket" mpirun runtime options (the mpirun options may be superfluous; I'm still gathering data...). When I use the same settings with VTST, some images show nearly equal performance, while others are much worse (the DAV iterations are hit harder than the RMM-DIIS ones). I think the problem may lie with the nodes themselves. Is there a way to tell which node each image was assigned to? Do you have any recommendations on MPI-specific parameters to use (or avoid)?
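One way to answer the "which node did each image land on" question is to combine the host list the scheduler hands out with the way the MPI ranks are divided among images. The sketch below is only an illustration under stated assumptions, not VASP's or SGE's actual code: it assumes the job's hosts come from an SGE-style PE_HOSTFILE (one line per host: hostname, slot count, queue, processor range), that the launcher fills every slot on one host before moving to the next (no -bynode round-robin), and that VASP gives each image a contiguous, equal-sized block of MPI ranks. The function names `expand_pe_hostfile` and `images_to_hosts` are hypothetical helpers.

```python
def expand_pe_hostfile(text):
    """Expand SGE-style PE_HOSTFILE lines ("host nslots queue range") into
    one hostname per MPI slot, in file order.

    ASSUMPTION: the launcher fills all slots on a host before moving on
    (by-slot style placement), so rank i runs on the i-th entry returned.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.extend([fields[0]] * int(fields[1]))
    return hosts


def images_to_hosts(rank_hosts, n_images):
    """Map each NEB image directory ("01", "02", ...) to the hosts its ranks use.

    ASSUMPTION: ranks are split into n_images contiguous, equal-sized blocks
    (ranks 0..k-1 -> image 01, ranks k..2k-1 -> image 02, ...).
    """
    n_ranks = len(rank_hosts)
    if n_images <= 0 or n_ranks % n_images != 0:
        raise ValueError("rank count must be a positive multiple of the image count")
    per_image = n_ranks // n_images
    mapping = {}
    for img in range(n_images):
        block = rank_hosts[img * per_image:(img + 1) * per_image]
        # dict.fromkeys keeps first-seen order while dropping duplicate hosts
        mapping[f"{img + 1:02d}"] = list(dict.fromkeys(block))
    return mapping


if __name__ == "__main__":
    # Hypothetical two-node allocation with 16 slots per node.
    pe_hostfile = "node-a 16 all.q@node-a UNDEFINED\nnode-b 16 all.q@node-b UNDEFINED\n"
    ranks = expand_pe_hostfile(pe_hostfile)
    for image, hosts in images_to_hosts(ranks, 2).items():
        print(image, hosts)
```

With one image per 16-core node, every image should map to exactly one host; an image whose list contains two hosts (or two images sharing a host that also runs something else) is a candidate for the uneven per-image timings. The printed mapping is only as reliable as the placement assumptions above, so it is worth cross-checking against what the MPI launcher itself reports.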

Thanks!

ng00822 · Post by **ng00822** » Sun Mar 02, 2014 12:35 am

I think I have narrowed the problem down to bad behavior by sge_execd, the execution daemon of the UGE/SGE cluster management system. It causes a ~30x slowdown during the DAV iterations and a ~8x slowdown during RMM-DIIS. I need to get that investigated.

UT theoretical chemistry code forum
