Opened 4 years ago

Closed 4 years ago

#1894 closed help (fixed)

ARCHER compilation related to ticket #1877

Reported by: Leighton_Regayre
Owned by: ros
Component: ARCHER
Keywords:
Cc:
Platform: ARCHER
UM Version: 8.4

Description

I've had difficulty compiling jobs which compiled without issue at the start of the year.

I've read the solution described in ticket #1877 and subsequently added this line to my ~/.profile file:

. /etc/bash.bashrc > /dev/null 2>&1

The remainder of the advised addition was already included in my .profile. I sourced the .profile file and attempted to compile again, unsuccessfully.

The compilation appears to produce the same error messages that Nick reported in ticket #1877.

Thank you,

Leighton.

Change History (12)

comment:1 Changed 4 years ago by Leighton_Regayre

  • Component changed from UM Model to ARCHER
  • Platform set to ARCHER
  • UM Version changed from <select version> to 8.4

comment:2 Changed 4 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Leighton,

Please could you give us details of the job id(s) that are exhibiting this problem so that we can take a look. And also point us to the output file that contains the error messages - I had a quick look in your ~/output directory but couldn't find anything.

Regards,
Ros.

comment:3 Changed 4 years ago by Leighton_Regayre

Hi Ros,

xmcnn-161112220 initially caused the compilation issue despite being a small change to an existing job. I then copied the original job (which worked at the start of the year) to xmcno-161114356. Both listed jobs have the same compilation issue.

The output is now in my output directory but does not contain the error as printed to screen (which was identical to the content described in ticket #1877).

Thanks,

Leighton.

comment:4 Changed 4 years ago by ros

Hi Leighton,

I suspect the problem is that you are using an incompatible version of the cray-netcdf module. ARCHER updated this module recently and you have to use a specific version.

Is there a reason why you are not using our standard compiling environment, which makes sure that the correct combination of modules is loaded? You have the hand-edit ~grenville/umui_jobs/hand_edits/remove_loadcomp.ed included, which strips out that standard environment. This hand-edit should only be used if you need a different version of the Cray compiler from the default and are happy to set the other required module versions manually. If you turn this hand-edit off, I think the build will find the required .mod files.
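
To see which cray-netcdf version is actually loaded in your shell, something like the following may help. This is only a sketch: show_loaded is a hypothetical helper, not part of the UM or ARCHER tooling, and it assumes the usual `module list` command is available.

```shell
# Hypothetical helper: list loaded modules whose name matches a pattern.
show_loaded() {
    pattern=$1
    if command -v module >/dev/null 2>&1; then
        # Many module implementations print `module list` on stderr,
        # so merge stderr into stdout before filtering.
        module list 2>&1 | grep -i -- "$pattern" \
            || echo "no loaded module matches $pattern"
    else
        echo "module command not available in this shell"
    fi
}

show_loaded cray-netcdf
```

If more than one version shows up, or none at all, the environment is probably being changed somewhere else, for example in a login script.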

Also, is there a reason why you are compiling interactively and not in the serial queues? I see that you have Luke's manual_comp_ARCHER.ed hand-edit enabled, which is used for the training course. We would recommend that you turn off this hand-edit too.

Regards,
Ros.

comment:5 Changed 4 years ago by grenville

Leighton

Just try turning it off (I can't recall why it's in your job).

Grenville

comment:6 Changed 4 years ago by Leighton_Regayre

Hi Ros, Grenville,

I've removed the hand-edit and also removed a line in my .bashrc for loading the cray-netcdf module.

The job xmcnn-162123323 still doesn't compile.

Leighton.

comment:7 Changed 4 years ago by ros

Hi Leighton,

It hasn't rebuilt everything. Please turn on "Force full build" in window FCM Configuration → FCM Extract and build directories.

Also, if you are going to continue compiling on the command line, please direct the error messages to the output file too; otherwise it's impossible for us to diagnose any problems ( ./umuisubmit_compile > name_of_outfile 2>&1 )
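
As a rough illustration of what that redirection does, here is a small sketch. fake_compile is a hypothetical stand-in for ./umuisubmit_compile, and the messages it prints are made up for the example.

```shell
# Hypothetical stand-in for ./umuisubmit_compile:
# writes one line to stdout and one line to stderr.
fake_compile() {
    echo "building..."
    echo "ERROR: something failed" >&2
}

outfile=$(mktemp)

# '>' sends stdout to the file; '2>&1' then points stderr at the same file,
# so both normal output and error messages end up in one place.
fake_compile > "$outfile" 2>&1

cat "$outfile"
```

The order matters: writing `2>&1` before the `>` would leave the error messages on the screen.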

Cheers,
Ros.

comment:8 Changed 4 years ago by Leighton_Regayre

Thanks Ros,

Switching on "Force full build" has allowed the compilation to complete but the reconfiguration failed with message:
'Failed in qsrecon in job xmcnn'

Output file for compilation is:
/home/n02/n02/lre/output/xmcnn-162132141_outfile (produced using your suggested command)

and for the reconfiguration:
/home/n02/n02/lre/output/xmcnn000.xmcnn.d16162.t132203.rcf.leave

Cheers,

Leighton.

comment:9 Changed 4 years ago by Leighton_Regayre

Further to this,

there were error messages in the compilation relating to cray-mpich:

cray-mpich/7.2.6(29):ERROR:150: Module 'cray-mpich/7.2.6' conflicts with the currently loaded module(s) 'cray-mpich/7.1.1'
cray-mpich/7.2.6(29):ERROR:102: Tcl command execution failed: conflict cray-mpich

I'm not sure what these refer to or why the conflicts arise.

Leighton

comment:10 Changed 4 years ago by ros

Hi Leighton,

If you look further down the .rcf.leave file you will see that it couldn't find the recon exec.

aprun: file /work/n02/n02/lre/um/xmcnn/bin/qxreconf not found

You need to compile the reconfiguration exec.
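
A quick way to check for this before submitting is sketched below. check_exec is a hypothetical helper written for this example; the path is the one from the aprun message above.

```shell
# Hypothetical helper: report whether an executable exists at the given path.
check_exec() {
    if [ -x "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Path taken from the aprun error in the .rcf.leave file.
check_exec /work/n02/n02/lre/um/xmcnn/bin/qxreconf \
    || echo "compile the reconfiguration executable first"
```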

Cheers,
Ros.

comment:11 Changed 4 years ago by Leighton_Regayre

Hi Ros,

Thanks very much. That was the problem.

Leighton.

comment:12 Changed 4 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed