Dear Sir,
When I run c-NEB, the job terminates abnormally with the following error message:
p0_8544: p4_error: interrupt SIGx: 15
bm_list_8545: p4_error: interrupt SIGx: 15
p0_8544: (864031.140625) net_send: could not write to fd=4, errno = 32
What's wrong with it?
Many thanks!
NEB interrupts abnormally
Re: NEB interrupts abnormally
This looks like an MPI error.
Is the job running normally and then timing out? You can get errors like this if your queueing system starts killing processes when the wall-clock time limit is reached.
Or does it crash at a particular point in the calculation? If it is the latter, and it looks like an issue with our code, I would like to investigate it further.
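For reference, the numbers in those messages can be decoded with a few lines of Python. This is only a minimal sketch, assuming the job ran on Linux with the usual signal and errno numbering; the helper names describe_signal and describe_errno are made up for illustration and are not part of any NEB or MPI tool.

```python
import errno
import os
import signal


def describe_signal(signum):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(signum).name


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    # Values taken from the log lines "interrupt SIGx: 15" and "errno = 32".
    print(describe_signal(15))
    print(describe_errno(32))
```

On Linux, signal 15 is SIGTERM and errno 32 is EPIPE. That would be consistent with the processes being asked to terminate from outside (for example by the queueing system) and then failing to write to a connection whose other end had already gone away, but it does not by itself say which process stopped first or why.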
Re: NEB interrupts abnormally
Dear Sir,
Thanks for your kind reply!
Here is another question. Sometimes my NEB run is interrupted after several steps, and the error message is:
rm_l_2_7375: p4_error: interrupt SIGx: 15
rm_l_2_7375: (2015.175781) net_send: could not write to fd=5, errno = 32
bm_list_7370: p4_error: interrupt SIGx: 15
rm_l_1_7374: p4_error: interrupt SIGx: 15
rm_l_1_7374: (2015.691406) net_send: could not write to fd=5, errno = 32
After resubmission, the job sometimes continues for several more steps, but sometimes it quits after only one step.
What is wrong with it? How can I solve this problem?
Many many thanks!