Topic: Parallel job problem

c00jsh00

Posted on: June 07, 2012, 04:31:55 am
Hi,

I am having difficulty running parallel Turbomole jobs in MPI mode. I get the following error messages:

 chem@alps5:/work/chem/P7H3.SCF.E1> dscf
Parallel program dscf_mpi will be taken out of the TURBODIR directory.
 dscf ended normally
STARTING dscf ON 4 PROCESSORS!
RUNNING PROGRAM /pkg/chem/turbomole64/bin/x86_64-unknown-linux-gnu_mpi/dscf_mpi.
PLEASE WAIT UNTIL dscf HAS FINISHED.
Look for the output in slave1.output.
dscf_mpi: Rank 0:2: MPI_Init: psm_ep_connect() failed
dscf_mpi: Rank 0:2: MPI_Init: Internal Error: Processes cannot connect to rdma device
dscf_mpi: Rank 0:3: MPI_Init: psm_ep_connect() failed
dscf_mpi: Rank 0:1: MPI_Init: psm_ep_connect() failed
dscf_mpi: Rank 0:4: MPI_Init: psm_ep_connect() failed
dscf_mpi: Rank 0:3: MPI_Init: Internal Error: Processes cannot connect to rdma device
dscf_mpi: Rank 0:1: MPI_Init: Internal Error: Processes cannot connect to rdma device
dscf_mpi: Rank 0:4: MPI_Init: Internal Error: Processes cannot connect to rdma device
dscf_mpi: Rank 0:0: MPI_Init: psm_ep_connect() failed
dscf_mpi: Rank 0:0: MPI_Init: Internal Error: Processes cannot connect to rdma device
MPI Application rank 2 exited before MPI_Finalize() with status 1
No file slave1.output found?


I have set the following environment variables:

PARA_ARCH=MPI
PARNODES=4
HOST_FILE=./hosts
OMP_NUM_THREADS=1
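Variables like these only reach the dscf script and the MPI processes it starts if they are exported to the environment of the launching shell (for example `export PARA_ARCH=MPI` in bash, or `setenv PARA_ARCH MPI` in csh); a plain shell assignment is not inherited by child processes. As a minimal sketch, assuming Python 3 is available on the node, the hypothetical helper below reports which of the four variables listed above are missing from the current environment, so a launch script can stop early with a clear message.

```python
import os

# The four variables listed in the post; the names are copied from it.
REQUIRED = ("PARA_ARCH", "PARNODES", "HOST_FILE", "OMP_NUM_THREADS")


def missing_parallel_vars(env=None):
    """Return the names from REQUIRED that are unset or empty in env.

    Hypothetical helper: env defaults to os.environ, i.e. only the
    variables that were actually exported to this process.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_parallel_vars()
    if missing:
        print("not exported:", ", ".join(missing))
    else:
        print("all four variables are exported")
```

Passing an explicit dictionary instead of relying on `os.environ` keeps the check easy to test without touching the real environment.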

Am I missing something? Please help.


JS

The file hosts has the following contents:

alps5
alps5
alps5
alps5

"alps5" is the name of the compute node I am running on.
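The hosts file above lists the same node once per MPI process, so its four lines match PARNODES=4. A minimal sketch, assuming Python 3, of how such a file could be generated and its process slots counted; `write_hosts_file` and `count_hosts` are hypothetical helpers, not part of Turbomole.

```python
import socket
import tempfile
from pathlib import Path


def write_hosts_file(path, nprocs, hostname=None):
    """Write one line per MPI process, all naming the same node.

    Hypothetical helper: hostname defaults to socket.gethostname(),
    which may include a domain suffix on some systems.
    """
    hostname = hostname or socket.gethostname()
    Path(path).write_text((hostname + "\n") * nprocs)


def count_hosts(path):
    """Count non-blank lines, i.e. process slots, in a hosts file."""
    return sum(1 for line in Path(path).read_text().splitlines() if line.strip())


if __name__ == "__main__":
    # Demo in a scratch directory so an existing ./hosts is not overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        hosts = Path(tmp) / "hosts"
        write_hosts_file(hosts, 4, "alps5")
        print(hosts.read_text(), end="")
        print(count_hosts(hosts))
```

Comparing `count_hosts(...)` with the value of PARNODES is one simple way to confirm that the two settings agree before starting a job.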