Opened 2 years ago

Closed 2 years ago

#2265 closed help (fixed)

Postprocessing error in NEMO/CICE

Reported by: apm Owned by: ros
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version:

Description

[Ros note: Created from email]

I have started getting a new failure mode with my NEMO/CICE model running on Archer. The latest cycle of my Rose suite u-ap795 has repeatedly failed during the postprocessing stage with this error:

cray-netcdf/4.3.2(34):ERROR:102: Tcl command execution failed: conflict cray-netcdf-hdf5parallel
 
[FAIL]  main_pp.py - Error during import of model CICE
No module named netCDF4
[FAIL] Terminating PostProc...
[FAIL] main_pp.py nemo cice # return-code=1
Received signal ERR
cylc (scheduler - 2017-09-07T09:13:14Z): CRITICAL Task job script received signal ERR at 2017-09-07T09:13:14Z
cylc (scheduler - 2017-09-07T09:13:14Z): CRITICAL failed at 2017-09-07T09:13:14Z

This job, as far as I know, uses exactly the same postprocessing scripts as my other suites on Monsoon (e.g. u-ao868), which are proceeding with no problem at all. I can’t see how the scripts might have changed mid-run?

Thanks,
Alex

Change History (4)

comment:1 Changed 2 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Summary of responses so far:

Hi Alex,

It looks like you have edited your .bashrc file recently on ARCHER to add module load cray-netcdf-hdf5parallel. This conflicts with the modules required by the postprocessing.

Cheers,
Ros.

----
Hi Ros,

I was having trouble reading some of my forcing fields (which turned out to have been corrupted here at NOC before I copied them to Archer), and was wondering whether I had the right version of NetCDF set up. I don’t remember which version I commented out, though - I have these:

#module load cray-hdf5-parallel/1.8.13
#module load cray-netcdf-hdf5parallel/4.3.2

Which would be the correct NetCDF module for the postprocessing script?

----
Hi Alex,

You can't have either of them loaded, because the nco module loads cray-netcdf/4.3.2, which conflicts with any of the cray-*-parallel modules.
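
As a rough way to spot this kind of conflict, the sketch below scans the text of a shell startup file for uncommented module load lines that name a cray-*parallel module. The function name and the regular expression are illustrative assumptions and are not part of the postprocessing scripts. It only sees literal module load lines, so modules loaded indirectly (for example by another script) would be missed.

```python
import re

# Hypothetical pattern: an active "module load" of any cray-*parallel module.
_PARALLEL_LOAD = re.compile(r"^\s*module\s+load\s+.*\bcray-\S*parallel\S*")


def conflicting_module_loads(text):
    """Return the uncommented lines in `text` that load a cray-*parallel module."""
    hits = []
    for line in text.splitlines():
        # Crude comment handling: ignore everything from the first '#' onwards.
        code = line.split("#", 1)[0]
        if _PARALLEL_LOAD.match(code):
            hits.append(line.strip())
    return hits
```

To use it, read the file contents yourself (for example with open() on your own ~/.bashrc) and pass the text in. An empty list means no active cray-*parallel load line was found in that file.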

----
Hi Ros,

I commented out those lines in my .bashrc, but my postprocessing still fails. Could you have another look at my .bashrc to check everything is as it should be?

Error in the postproc log:

[FAIL]  main_pp.py - Error during import of model CICE
         No module named netCDF4
[FAIL] Terminating PostProc...
[FAIL] main_pp.py nemo cice # return-code=1

This is without any module load commands in ~/.bashrc.

comment:2 Changed 2 years ago by ros

Hi Alex,

In the suite.rc file can you please try changing the module load line for postproc to be:

module load nco; module load anaconda/2.2.0-python2; ulimit -s unlimited; module list

It looks like the new ARCHER anaconda module does not have the python netCDF4 package installed. I'm already in conversation with ARCHER about the anaconda modules so will ask them to fix this too.

Cheers,
Ros.
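
A quick way to confirm whether a given Python environment provides the netCDF4 package is to try importing it under the same modules the postprocessing uses. The helper below is a minimal sketch and its name is an assumption, not something taken from main_pp.py. It relies only on the standard import machinery, so it should behave the same under Python 2 and Python 3.

```python
def can_import(name):
    """Return True if the top-level module `name` imports in this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True
```

Calling can_import("netCDF4") after loading the anaconda module should return False in an environment that would produce the "No module named netCDF4" error shown above.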

comment:3 Changed 2 years ago by apm

Hi Ros,

The postprocessing is now running, and didn't crash immediately, so I guess that means your fix worked!

Many thanks,

Alex

comment:4 Changed 2 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed