Opened 9 months ago

Closed 5 months ago

#2134 closed help (fixed)

Unrecognised flag failure on ARCHER during "fcm build" to generate RADAERv2 small executables for v7.3

Reported by: gmann Owned by: gmann
Priority: normal Component: UM Model
Keywords: ukca Cc: ee10hp, grenville
Platform: ARCHER UM Version: 7.3

Description

Dear NCAS-CMS helpdesk,

Yesterday I was working to generate the necessary
4 small executables (qxsetup, qxcombine, qxhistrep, qxpickup)
which are required to be "pre-built" (on the target machine
architecture, ARCHER) before v7.3 UM-UKCA simulations can run
with RADAERv2 translating the GLOMAP aerosol properties into
3D aerosol optical properties (Qscat, Qext, asymmetry) to enable
their radiative effects to be enacted in the atmosphere model.

This required the execution (on PUMA) of 4 "build" scripts (one
for each of those required executables such as build_qxsetup.sh)
which first extract the required code from the FCM branch on PUMA
(in this case v7.3_HG3r2_mergCJ_nprim_Radv2) to
$HOME/um/um_extracts/myqx*_build/ and then copy over to
ARCHER (to /work/n02/n02/<user>).

Once those PUMA extract-and-copy scripts have been executed, the
final step required to generate the 4 files (qxsetup, qxcombine,
qxhistrep and qxpickup) is then to (on ARCHER) actually run an
"fcm build" command from within the myqxsetup_build directory
(for example) which should, as explained by Mohit Dalvi
from his instructions [based on what Nicolas Bellouin did],
then produce the required corresponding small executable in
the bin sub-directory (in this case the file is qxsetup).

The syntax advised by Mohit was as follows:

fcm build -j 4 > bld.out 2>&1

so that the file "bld.out" then contained the log file on
how the build progressed.

I did this last night and it seemed to begin to progress fine,
the "fcm build" being successfully recognised.

However, the command failed reporting that the "-P" flag on the
compilation was not recognised.
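When a build log is redirected to a file like this, the first failing line is easy to miss among the progress messages. A minimal sketch for pulling the failures out, assuming they are marked the way they are in the log pasted below ("ERROR", "[FAIL]" or "failed (N)"); the function name is hypothetical and is not part of FCM:

```shell
# Hypothetical helper (not part of FCM): list the lines of a build log
# that look like failures, together with their line numbers.
# Assumption: failure lines contain "ERROR", "[FAIL]" or "failed (N)",
# as they do in the bld.out pasted in this ticket.
show_build_failures() {
    log_file="$1"
    grep -n -E 'ERROR|\[FAIL\]|failed \([0-9]+\)' "$log_file"
}
```

For example, `show_build_failures bld.out` would print only the matching lines of the log.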

I expect this error is either a case of me needing to run
some initial command to enable some different functionality
of the compiler from the command line, or alternatively it
may be that the build will only work if it is submitted to
the serial nodes with the required headers and
environment-variables set as needed.

I have pasted below the output from the bld.out file.

Is it just a case of setting something differently in
my .profile for a module to be loaded or similar to
enable the required functionality of the compiler?

Or perhaps a different compiler operation is set by
default at the command line compared to running in
the serial queue with the settings as specified
within the UM scripts?

Thanks a lot for your help with this,

Regards
Graham Mann

Build command started on Tue Mar 28 23:37:39 2017.
→Parse configuration: start
Config file (bld): /fs2/n02/n02/gmann/myqxsetup_build/cfg/bld.cfg
→Parse configuration: 4 seconds
→Setup destination: start
Destination: gmann@eslogin006:/fs2/n02/n02/gmann/myqxsetup_build
→Setup destination: 0 second
→Setup build: start
→Setup build: 97 seconds
→Pre-process: start
No. of files scanned for PP dependency: 2889
ftn-2105 crayftn: ERROR in command line

"-P" is an invalid command-line option.

[FAIL] ftn -P -E -DC_LOW_U=c_low_u -DFRL8=frl8 -DLITTLE_END=little_end -DC_LONG_INT=c_long_int -DMPP=mpp -DLINUX=linux -DXT4=xt4 -DSETUP=setup -DUTILHIST=utilhist -DUTILS=utils -I/fs2/n02/n02/gmann/myqxsetup_build/inc rcf_parvars_mod.F90 failed (1) at /fs2/y07/y07/umshared/software/fcm-2016.12.0/bin/../lib/FCM1/BuildSrc.pm line 751

Change History (9)

comment:1 Changed 8 months ago by grenville

Graham

The bld.cfg file refers to /work/n02/n02/hum/gcom/pathscale/gcom3.1/lib — so must be quite old (I'm guessing) — when did you last do this?

Grenville
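
A quick way to see how much of a bld.cfg still points at the old PathScale setup is to search it for the compiler name. A minimal sketch, assuming stale settings can be recognised by the string "pathscale" appearing in a path or value; the function name is made up for illustration:

```shell
# Hypothetical helper: show every line of a bld.cfg that still mentions
# pathscale, with line numbers.
# Assumption: out-of-date settings contain the string "pathscale"
# (in any letter case), as the gcom path quoted above does.
find_pathscale_refs() {
    cfg_file="$1"
    # -n prints line numbers, -i ignores letter case
    grep -n -i 'pathscale' "$cfg_file"
}
```

Any line it reports is a candidate for replacement with the equivalent cce setting.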

comment:2 Changed 8 months ago by gmann

Hi Grenville

I just tried to reply to this message directly from Outlook
but it bounced with not recognising the noreply@ceda email address.

I was expecting a direct email reply to be translated into a Wiki-page
reply-post as this seems to work OK for the Redmine query-tracking
software system we use for GLOMAP queries in Leeds.

But that doesn't seem to work for the Trac system (or not when I tried it
anyway)

So I just forwarded that reply directly to you (see below).

And I'm posting it here with traceability considerations in mind.

Cheers
Graham


From: Graham Mann
Sent: 04 April 2017 12:24
To: 'NCAS Computational Modelling Services' <no-reply@…>
Cc: Dalvi, Mohit (mohit.dalvi@…) <mohit.dalvi@…>
Subject: RE: [NCAS Computational Modelling Services] #2134: Unrecognised flag failure on ARCHER during "fcm build" to generate RADAERv2 small executables for v7.3

Hi Grenville,

OK. Well, the build scripts that I described (in the NCAS-CMS helpdesk ticket) were put together by Mohit in 2012.

There are 4 of these,

1) build_qxsetup.sh
2) build_qxpickup.sh
3) build_qxhistrep.sh
4) build_qxcombine.sh

The originals of these (as used in Nov 2012) can be found on PUMA alongside the updated ones which have the extension "_ARCHER" in the directory "/home/gmann/test/"

I realise that was the HECToR phase 3 era, but I had thought the various paths were still
in the same place underneath /work/n02/n02

The only change I made to use them on ARCHER was to change the UM_RHOST environment variable

46c46,47
< export UM_RHOST=phase3.hector.ac.uk
---
> #export UM_RHOST=phase3.hector.ac.uk
> export UM_RHOST=login.archer.ac.uk

Re: compilers I checked this and could see in the build_qxsetup.sh this seems to be taken care of via the "UM_MACHINE" environment variable:

export UM_MACHINE=hector-pathscale

I checked in one of my current ARCHER UM v7.3 jobs and, from what I could work out, even though we're now on ARCHER, the various compiler packaging etc. is still all set up via this same "hector-pathscale" setting (from looking in the umui_jobs directory for one of my v7.3 jobs on PUMA).

I basically did a grep for UM_MACHINE and found it was present in FCM_EXTR_SCR from which I made the deduction that that environment-variable was still set to "hector-pathscale" even though we're now working on ARCHER:

[12:18:07 gmann@puma xncwz]$ grep -i 'UM_MACHINE' *
FCM_EXTR_SCR:export UM_MACHINE=hector-pathscale

That of course may have been a false deduction on my part….

Still all seemed to proceed to the last step where it did not recognise that "-P"
flag when I executed the fcm_build command on ARCHER (after apparently successfully doing the build on PUMA to extract over to ARCHER).

Perhaps I have done something not quite right here or the compiler options need to be changed.

But my understanding was that the only issue was that when you run the fcm_build from the command line it (for some reason) didn't seem to understand what the "-P"
meant.

Please can you let me know if I was on the right track here. To me it felt like I'd got further than when I tried to do this back in Nov 2012 or so; at that time I don't think the fcm commands were being recognised when run on ARCHER so that last "fcm_build"
to get the RADAER files set up could not be completed.

And the fact that I got further this time suggested to me things might be different this time.

Anyway. I can provide more clarification if needed or maybe easiest to have a quick chat on the phone if any other details are needed.

Cheers
Graham

Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…

comment:3 Changed 8 months ago by grenville

Graham - I think all you need do is put in the compiler options appropriate for cce.

/home/n02/n02/grenvill/xjpgz/ummodel/cfg/bld.cfg has flags for a 7.3 job. It looks like you'll need to change the path to gcom too (as above). We have libgrib.a under /work/n02/n02/hum/lib/cce, but it's from 2013. I'd be surprised if it is needed, but I'm not familiar with your specific needs.

Grenville

Copy of related email for the record:

Hi Graham,
Cc Grenville

The build scripts were originally used for the IBM and worked out-of-the-box in our internal system up to vn7.7, so I am not aware of the FCM settings or any changes required to these. The executables were then directly usable on MONSooN, being an identical HPC system.
I believe the HECToR build was actually done by NCAS-CMS at that time.

As mentioned, the purpose of re-building the executables was to allow changes in namelists read by these executables, to pass RADAER filenames to the model.
At UM9.x the filenames themselves were moved to the main model (RUN_UKCA) namelist to be read at runtime and I have recently replicated this change for a vn7.3 job ported to XSC-C (http://puma.nerc.ac.uk/trac/UM/log/UM/branches/dev/mdalvi/vn7.3_ukca_ph_test_xcsc), thus removing the need for re-building these.

—-
Mohit

comment:4 Changed 8 months ago by gmann

Hi Grenville,

Thanks a lot for this.

I followed your advice and edited the bld.cfg file at:

/work/n02/n02/gmann/myqxsetup_build/cfg/bld.cfg

to make all the compiler options (and the gcom path) match those as set in your:

/home/n02/n02/grenvill/xjpgz/ummodel/cfg/bld.cfg

I then proceeded to run the "fcm build" command again with those compiler
options and gcom path updated as required for the cce compiler.

That then seemed to be working as it no longer complained about not
recognising the "-P" flag.

So that part seems to have worked — great!

However, it's still not quite working as there is a gmake error
message which says "No rule to make target `qxsetup', needed by `all'.  Stop."

I looked in the Makefile I can see it says "FCM_BLD_TARGETS = qxsetup" and
"all : $(FCM_BLD_TARGETS)" so the make all seems at least to be pointing to
qxsetup.

But for some reason it doesn't seem to know what to do when told to make
that "qxsetup".

Maybe there is something out of sequence or missing in the Makefile?

I don't have much experience of dealing with Makefiles, but usually
you have something about objects and executables that go with the target.

Probably there is something that's not quite right here?

Perhaps the build script is not quite right to make these small executables?

Please can you take a look and see if you can see what the issue is?

Full error message is shown below.

Thanks a lot,

Cheers
Graham

gmann@eslogin001:/work/n02/n02/gmann/myqxsetup_build> fcm build
Build command started on Fri Apr  7 09:58:11 2017.
->Parse configuration: start
Config file (bld): /fs2/n02/n02/gmann/myqxsetup_build/cfg/bld.cfg
->Parse configuration: 34 seconds
->Setup destination: start
Destination: gmann@eslogin001:/fs2/n02/n02/gmann/myqxsetup_build
->Setup destination: 0 second
->Setup build: start
->Setup build: 2 seconds
->Pre-process: start
No. of pre-processed files: 2232
->Pre-process: 185 seconds
->Scan dependency: start
No. of files scanned for dependency: 2931
/fs2/n02/n02/gmann/myqxsetup_build/Makefile: updated
->Scan dependency: 98 seconds
->Generate Fortran interface: start
->Generate Fortran interface: 1 second
->Make: start
gmake: *** No rule to make target `qxsetup', needed by `all'.  Stop.
gmake -f /fs2/n02/n02/gmann/myqxsetup_build/Makefile -j 1 -s all failed (2) at /fs2/y07/y07/umshared/software/fcm-2016.12.0/bin/../lib/FCM1/Build.pm line 611
->Make: 0 second
->TOTAL: 320 seconds
Build failed on Fri Apr  7 10:03:31 2017.

comment:5 Changed 8 months ago by gmann

I looked up on the web the exact meaning of this error message,
and see that the website:

https://www.gnu.org/software/make/manual/html_node/Error-Messages.html

explains that this "No rule to make target xxx, needed by yyy" means:

This means that make decided it needed to build a target, but then 
couldn’t find any instructions in the makefile on how to do that, 
either explicit or implicit (including in the default rules database). 

If you want that file to be built, you will need to add a rule to 
your makefile describing how that target can be built. Other 
possible sources of this problem are typos in the makefile (if 
that file name is wrong) or a corrupted source tree (if that file 
is not supposed to be built, but rather only a prerequisite). 

So it looks like there should be some "rules" for how to make the target
executable "qxsetup" in the Makefile that somehow have not appeared?
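
Whether a Makefile contains an explicit rule for a given target can be checked mechanically. A minimal sketch, assuming the rule is written with the bare target name at the start of a line (a generated Makefile may instead spell the target with a variable or a path prefix, in which case this would miss it); the function name is hypothetical:

```shell
# Hypothetical helper: succeed if the Makefile has a rule line that starts
# with the given target name followed by a colon ("name:" or "name :").
# The ([^=]|$) part stops a variable assignment such as "name := value"
# from being mistaken for a rule.
# Assumption: the target name contains no regular-expression characters.
has_make_rule() {
    target="$1"
    makefile="$2"
    grep -q -E "^${target}[[:space:]]*:([^=]|\$)" "$makefile"
}
```

With the Makefile described above, `has_make_rule all Makefile` would succeed, while `has_make_rule qxsetup Makefile` would fail, which matches the gmake message.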

Do you know what it could be that could be missing?

I will also check Mohit's extract script that is run from PUMA

Last edited 8 months ago by gmann

comment:6 Changed 8 months ago by gmann

Checking Mohit's script "build_qxsetup_ARCHER.sh" that I ran on PUMA, I see
this has the section:

# Setup where to find the container template file in FCM

export UM_SVN_URL=svn://puma/UM_svn/UM/trunk
export UM_SVN_BIND=svn://puma/UM_svn/UM/branches/dev/um/VN7.3_machine_cfg/src/configs/bindings
export UM_CNTR=$UM_SVN_BIND/container.cfg

I'm wondering if the "rules" for these UM small executables (qxsetup, qxcombine etc.)
might be set there in that VN7.3_machine_cfg

And I notice there is a capital "VN" there.

Not sure whether that is a problem or not but I know that at
some point there was that migration from branches having
capital "V" and capital "N" to lower case "v" and lower case "n".

Probably that's not the issue but it might be something like this
I'm thinking — see on PUMA the 4 build scripts are at:

/home/gmann/test/build_qxsetup_ARCHER.sh
/home/gmann/test/build_qxpickup_ARCHER.sh
/home/gmann/test/build_qxcombine_ARCHER.sh
/home/gmann/test/build_qxhistrep_ARCHER.sh

and the 4 generated .cfg files are at:

/home/gmann/test/my_qxsetup.cfg
/home/gmann/test/my_qxpickup.cfg
/home/gmann/test/my_qxcombine.cfg
/home/gmann/test/my_qxhistrep.cfg

comment:7 Changed 8 months ago by gmann

Email-exchanges with Mohit and Grenville helped solve
the problem.

Whether or not the problem would have been solved without
the involvement of both MO and NCAS-CMS staff is an
interesting question, as is whether or not email or Wiki
type communication is more effective at identifying or
encouraging interaction or co-operation.

Anyway — see below for what eventually identified the
source of the problem.

Cheers
Graham


From: Graham Mann G.W.Mann@…
Sent: 10 April 2017 14:45
To: 'Grenville Lister'
Cc: Dalvi, Mohit
Subject: RE: RADAERv2 small executables for v7.3

Hi Grenville,

Looking again at the error message I pasted into "comment 4" on the NCAS-CMS helpdesk thread:

http://cms.ncas.ac.uk/ticket/2134#__msie303:comment:4

I see that the actual error message says:

gmake: *** No rule to make target `qxsetup', needed by `all'.  Stop.

From what I typed this morning, I think I have now learned what that means.

I'm thinking it likely that the fact that I had not added in the "SETUP=setup" to the fppkeys may well have been the main reason the small-executable-build failed.

It seems highly likely to me that the error message is entirely consistent with the absence of that SETUP=setup compiler flag/qualifier.

Anyway, I will try following exactly what you did as that seems to have worked.

Thanks again for your help,

Cheers
Graham

Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…


From: Graham Mann
Sent: 10 April 2017 11:26
To: 'Grenville Lister' <grenville.lister@…>
Cc: 'Dalvi, Mohit (mohit.dalvi@…)' <mohit.dalvi@…>
Subject: RE: RADAERv2 small executables for v7.3

Oh there was an 8th one as well which was that in the ldflags I noticed that Grenville your build had "-l_grib" which mine didn’t.

Thanks again

Cheers
Graham


From: Graham Mann
Sent: 10 April 2017 11:24
To: 'Grenville Lister' <grenville.lister@…>
Cc: Dalvi, Mohit (mohit.dalvi@…) <mohit.dalvi@…>
Subject: RE: RADAERv2 small executables for v7.3

Hi Grenville,

Thanks a lot for this.

I just went through your updated version of the bld.cfg file and compared it against the one I had tried to update to match the one you pointed me to from the xjpgz job.

I could see that my attempt was not too far off, with most of the changes I had implemented also being implemented in your revised version of the bld.cfg

But I can see that there were still a few differences, one of which (I guess) will explain why my attempt at producing the qxsetup executable (by running "fcm_build") did not work.

I thought it worth jotting these down here for reference in a list.

Anyway I'm assuming that it is one of these that was causing the issue, and I expect when I proceed to make those adjustments to the bld.cfg that particular build command will be able to do what I'm hoping it can do effectively and produce the required executable.

I'll try that a bit later and will let you know how it goes (I have a meeting at 11.30).

Many thanks for your help with this over the weekend — looks like this may well be sorted now.

Cheers
Graham

1) fppkeys: I could see that you had added "SETUP=setup", which looks like it is pretty important to include for the required functionality in the qxsetup executable. I had not included that extra flag.

2) fppkeys: I had matched exactly the settings from the xjpgz bld.cfg, with the following keys (or keywords?) also specified to be set:

a) CONTROL=control
b) REPROD=reprod
c) ATMOS=atmos
d) GLOBAL=global
e) BREXIT=brexit (sorry that's just my joke, not really)
f) A04_ALL=a04_all
g) A01_3A=a01_3a (and loads of others like that).

3) fppflags: I'd added in the "-traditional -cpp" option but that wasn't present in yours.

4) ldflags: I'd implemented the -lgcom_buffered_mpi from the xjpgz job whereas in yours you had specified it as -lgcom_serial.

5) ldflags: I'd put the path for gcom as -L ${UMDIR}/lib/cce whereas you had put the path explicitly (no env variable) as -L /work/n02/n02/hum/lib/cce. This one is probably not important as the UMDIR should point to that same place I guess.

6) cc compiler: I'd changed this from "cc" to "gcc" as was the case in xjpgz but in your bld.cfg you had not changed that.

7) I'd put in some Tab symbols in mine which may have caused an issue (or perhaps not).
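
A comparison like the list above can be made less error-prone by diffing only the tool declarations of the two bld.cfg files. A minimal sketch, assuming each declaration sits on a single line beginning with "tool::" (continuation lines are not handled); both function names are hypothetical:

```shell
# Hypothetical helper: print the "tool::" declarations of a bld.cfg with
# tabs turned into spaces, runs of spaces squeezed, and the result sorted,
# so that layout differences do not show up in a comparison.
tool_settings() {
    grep -E '^[[:space:]]*tool::' "$1" \
        | tr '\t' ' ' | tr -s ' ' \
        | sed 's/^ //; s/ $//' \
        | sort
}

# Show the declarations that differ between two bld.cfg files.
# Lines marked "<" occur only in the first file, ">" only in the second.
diff_tool_settings() {
    a=$(mktemp)
    b=$(mktemp)
    tool_settings "$1" > "$a"
    tool_settings "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Because tabs are normalised before the comparison, stray Tab characters such as those in item 7 would not be reported as differences.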


From: Grenville Lister grenville.lister@…
Sent: 09 April 2017 10:47
To: Graham Mann <G.W.Mann@…>
Cc: Dalvi, Mohit (mohit.dalvi@…) <mohit.dalvi@…>
Subject: Re: RADAERv2 small executables for v7.3

Graham

I don't confess to really follow all this, but I created a qxsetup with the configuration in

/work/n02/n02/grenvill/myqxsetup_build

which was generated by your script build_qxsetup_ARCHER.sh.

I changed a couple of things in
/work/n02/n02/grenvill/myqxsetup_build/cfg/bld.cfg

a few paths… and

tool::fpp cpp
tool::fppflags -C -P
tool::fflags -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc

tool::ldflags -L. -L /work/n02/n02/hum/gcom/cce/gcom3.8/lib -Wl,--warn-unresolved-symbols -Wl,-z,muldefs -s real64 -s integer64 -lgcom_serial -L /work/n02/n02/hum/lib/cce -lgrib

and commented out a bunch of existing tool::fflags.

There were a lot of preprocessor warnings, which I didn't investigate.

Best

Grenville

On 04/07/17 15:05, Graham Mann wrote:

Hi Grenville,

Further to my NCAS-CMS helpdesk query, I also asked Mohit about this
as he wrote those scripts.

He has spotted that (as I had suspected) the Makefile generated by the
build script was missing an important set of object file linking commands.

I will try to enact the way forward suggested by Mohit (to the best of my
ability) but the relationship between the qx* executables and the
required object files is not immediately apparent to me.

I will do a diff between the Makefile from that
hum/vn7.3/cce/exec_build/ directory that Mohit is pointing me to and
see if I can figure out what the top-level paths are that he is explaining need to be changed.

If I'm understanding correctly (Mohit please reply if I'm getting this
wrong at all) the actual script that does the extract is missing the .f90 files required by the "CPP step".

Do you have any pointers as to why this might be (FCM path wrong or out of date)?

I'm thinking it is almost certainly some issue in the FCM branches
specified in the 4 build scripts on PUMA:

/home/gmann/test/build_qxsetup_ARCHER.sh
/home/gmann/test/build_qxpickup_ARCHER.sh
/home/gmann/test/build_qxcombine_ARCHER.sh
/home/gmann/test/build_qxhistrep_ARCHER.sh

I will look into this but if you spot anything, please do let me know.

Thanks a lot,

Cheers
Graham

Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…


From: Dalvi, Mohit mohit.dalvi@…
Sent: 07 April 2017 13:52
To: Graham Mann <G.W.Mann@…>
Subject: RE: [NCAS Computational Modelling Services] #2134: Unrecognised flag failure on ARCHER during "fcm build" to generate RADAERv2 small executables for v7.3

Hi Graham,

As per make/gmake rules, the Makefile would need an additional line saying:

qxsetup : file1.o file2.o incfile.h, etc..

which tells it which files are to be compiled to form part of the qxsetup executable (See /work/n02/n02/hum/vn7.3/cce/exec_build/qxsetup/Makefile). I am not sure why that line is missing in your Makefile/ build-setup.
Also, the myqxsetup_build/ppsrc/UM/utility/qxsetup/*.f90 files are empty, which means the CPP step has not taken place at all.

You could try to copy the above hum/vn7.3/cce/exec_build/qxsetup/Makefile over, change the top-level paths and just run 'gmake' in the folder.
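
The empty preprocessed files mentioned above can be listed directly rather than inspected one by one. A minimal sketch; the function name is hypothetical, and the directory to scan (for example the ppsrc tree of the build) is passed in as an argument:

```shell
# Hypothetical helper: list zero-length .f90 files anywhere under a directory.
# "-size 0" matches only files that contain no data at all.
list_empty_f90() {
    src_dir="$1"
    find "$src_dir" -type f -name '*.f90' -size 0
}
```

If this prints every file under the qxsetup source directory, the preprocessing step produced no usable output for that executable.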

Apart from this, I am not familiar enough with the FCM build process to understand what is going on. This is why I decided to back-port the RADAER namelist changes for Steve, rather than fiddle with the small executable building on Monsoon2.


Mohit


From: Graham Mann G.W.Mann@…
Sent: 07 April 2017 10:57
To: Dalvi, Mohit
Subject: FW: [NCAS Computational Modelling Services] #2134: Unrecognised flag failure on ARCHER during "fcm build" to generate RADAERv2 small executables for v7.3

Hi Mohit,

I know you advised to try implementing the "new way" of linking to the RADAER pcalc and LUTs but I still felt it important to try to get this initial approach to use the built small executables to work first on ARCHER.

In my original message I came across a problem with the "fcm build"
command on ARCHER not recognising the flag "-P"

Grenville explained in a reply yesterday that he thought I just needed to change the compiler options and gcom path in the bld.cfg file for each of the 4 required executables and it should work OK then (see his message below).

I did that by editing the bld.cfg and matching the flags for the cce compiler as in Grenville's job xjpgz, as at:

/home/n02/n02/grenvill/xjpgz/ummodel/cfg/bld.cfg

When I re-ran the "fcm build" from inside the myqxsetup_build directory on ARCHER it seemed to be progressing OK but eventually fell over with a different error.
It's giving a gmake error "No rule to make target 'qxsetup', needed by 'all'. Stop."

I'm not sure whether this is because I edited the bld.cfg file directly or if there is something missing within the original "build_qxsetup.sh" that I ran on PUMA, see:

/home/gmann/test/build_qxsetup_ARCHER.sh

Maybe something is not quite right in the "my_qxsetup.cfg" that is generated first on PUMA:

/home/gmann/test/my_qxsetup.cfg

Please can you have a quick look at this and let me know if you can see anything.

Also please read what I have written in the NCAS-CMS helpdesk ticket which includes the full log from the fcm build on ARCHER:

http://cms.ncas.ac.uk/ticket/2134#__msie303:comment:6

Thanks a lot for your help with this.

Cheers
Graham

Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…

comment:8 Changed 8 months ago by gmann


From: Graham Mann
Sent: 13 April 2017 12:10
To: 'Dalvi, Mohit'
Cc: 'Nicolas Bellouin (n.bellouin@…)'; 'Grenville Lister'
Subject: RE: Read error when reading pcalc file on ARCHER (but good news is RADAER v2 small executables for v7.3 now working)

Just to confirm, changing that format statement as in the previous post did indeed solve that particular problem and the xncwm and xncwk both now successfully progress past that previous read-fail and are apparently reading in the RADAER pcalc and LUTs successfully.

The jobs then crashed with missing STASH (I'd not included a couple of requests in the RADAER hand-edit [actually for the GLOMAP-dust-extension rather than the GLOMAP-nitrate-extension]) but those are added now, and at least we can now "close" this issue with the small executables and the reading of the pcalc file.

I will update the NCAS-CMS ticket with this email-train.

Cheers
Graham

Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…


From: Graham Mann
Sent: 13 April 2017 09:28
To: 'Dalvi, Mohit'
Cc: 'Nicolas Bellouin (n.bellouin@…)'; 'Grenville Lister'
Subject: RE: Read error when reading pcalc file on ARCHER (but good news is RADAER v2 small executables for v7.3 now working)

Hi Mohit,

OK so I tried using (my job xncwk) the _new versions of the RADAER LUTs in the directory you suggested:

/work/n02/n02/ukca/spectral/radv2

but unfortunately the job still crashed with exactly the same error as in my original job xncwm which used the standard versions in that same directory.

As I'd written in my email, I was recalling that there may have been some change to the format statement that had to be made (something to do with the difference in the compiler on the two machines) — I just had a look back through my emails and found that back in December 2012 I'd raised an NCAS CMS helpdesk query about this

http://cms.ncas.ac.uk/ticket/991

You have to scroll down to see this pcalc issue — see comment 11 within this ticket.
I've pasted that in full below for info, but basically I explained that to get the model to run on ARCHER I had to change the format statement '(37x,3(i,1x))' to instead say '(37x,1i3,1x,1i3,1x,i)'. I had thought this was actually the compiler not liking the 3 around the i,1x in brackets, needing it to be set out there in full.

But actually, I see now that it's not just a difference in the interpretation of that statement. It's more than that: the format has actually been changed to give more space (3 characters) for each integer, whereas the other statement just says "i". Maybe the default is 3 characters on one machine and not on the other; that's probably the distinction.
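
For reference, an "i" edit descriptor with no width is not standard Fortran (the standard form is "iw" with an explicit width), so the width assumed for it is each compiler's own choice, which would fit the behaviour differing between machines. The sketch below shows, in shell, how an explicit-width format carves the line up by column. It assumes the label occupies exactly 37 columns and that each integer sits in a 3-column field followed by one separator column, and it treats the last field as 3 columns wide too; the real pcalc file layout has not been checked here, and the function name is hypothetical:

```shell
# Hypothetical helper: pull the three LUT dimensions out of a header line
# by fixed columns, mimicking a format like (37x,i3,1x,i3,1x,i3):
#   columns  1-37  skipped (the text label)
#   columns 38-40  first integer
#   column  41     separator
#   columns 42-44  second integer
#   column  45     separator
#   columns 46-48  third integer
# Assumption: this column layout is illustrative, not taken from the file.
lut_dims_fixed_columns() {
    line="$1"
    d1=$(printf '%s\n' "$line" | cut -c38-40)
    d2=$(printf '%s\n' "$line" | cut -c42-44)
    d3=$(printf '%s\n' "$line" | cut -c46-48)
    # Left unquoted on purpose so that padding blanks are dropped.
    echo $d1 $d2 $d3
}
```

If a field were read with a wider default width instead, it would run on into the comma that follows the number, which is the kind of input that produces an "invalid character" read error.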

Anyway — I think this means then that I will need to make an actual change to the branch to implement this.

I will also refer to the branch that I implemented this into when I was originally trying to get this model ported to HECToR back in December 2012.

I'll keep you posted with how this goes.

Cheers
Graham


Hi Willie,

To update on the above.

I fixed the issue with the read statement within RADAER. There's a line of code reading from that pcalc file that seems to fail on HECToR whereas it completed OK on MONSOON. The failure is when trying to read a line that specifies the resolution of the UKCA look-up tables for the Mie calculations, like:

UKCA accum. aerosol LUT dimensions: 51, 51, 51

which are in the file:

/work/n02/n02/gmann/RADAERv2/pcalc_hadgem_v2.ukca

There is a formatted read statement in "ukca_radaer_read_precalc.F90" to do this as below with the formatting set as '(37x,3(i,1x))'.

Basically, that format statement seems to read OK on MONSOON but not on HECToR.

I changed that format statement in that line in that routine in my branch on PUMA to be more explicit — instead set to be '(37x,1i3,1x,1i3,1x,i)'.

That then reads in the info correctly on MONSOON and the job proceeds past there…..

However, unfortunately the job then crashes with a segmentation fault in Atm_Step (before even reaching UKCA, i.e. before reaching the call to UKCA_MAIN).

Willie — do you have any idea what could be causing the crash here — I guess it could be memory issue. This job is only running on 32 processors so I guess each core will be operating on quite a large sub-domain so the memory footprint could be higher.

Thanks for any help you can give here,

Cheers

Graham


Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…


From: Graham Mann
Sent: 12 April 2017 12:10
To: Dalvi, Mohit
Cc: Nicolas Bellouin (n.bellouin@…); Grenville Lister
Subject: RE: Read error when reading pcalc file on ARCHER (but good news is RADAER v2 small executables for v7.3 now working)

Unfortunately ARCHER is down for maintenance this morning so I will do this tomorrow (or possibly tonight).

But sounds like this is basically sorted now (fingers crossed).

Thanks all for your help with this.

Cheers
Graham


From: Dalvi, Mohit mohit.dalvi@…
Sent: 11 April 2017 10:13
To: Graham Mann <G.W.Mann@…>
Subject: RE: Read error when reading pcalc file on ARCHER (but good news is RADAER v2 small executables for v7.3 now working)

In fact, they seem to have been subsequently moved from Karthee's area to "/work/n02/n02/ukca/spectral/radv2".

—-
M


From: Graham Mann
Sent: 12 April 2017 07:48
To: Dalvi, Mohit <mohit.dalvi@…>
Cc: Nicolas Bellouin (n.bellouin@…) <n.bellouin@…>; Grenville Lister <grenville.lister@…>
Subject: Re: Read error when reading pcalc file on ARCHER (but good news is RADAER v2 small executables for v7.3 now working)

Hi Mohit
Great — thanks a lot.
I was out of the office all day yesterday but will try that this morning.
Cheers
Graham

On 11 Apr 2017, at 10:14, Dalvi, Mohit <mohit.dalvi@…> wrote:

Hi Graham,

Yes, this might be due to formatting in the text files that the Cray compiler does not like.
If I remember correctly, someone called Karthee had created copies of RADAER files to match the format, so see if there is still an area /work/n02/n02/karthee and the files still available. You can diff the default and fixed versions to make sure the difference is only in formatting and not actual values.

Cheers
—-
Mohit
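
The check suggested above, that two versions of a file differ only in formatting, can be sketched as a comparison of their whitespace-separated tokens. This assumes "formatting" means spacing and line breaks only; a value written differently (say 1.0 against 1.00, or with a comma attached) would still count as a difference. Both function names are hypothetical:

```shell
# Hypothetical helper: print a file as one whitespace-separated token per line.
tokens() {
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}

# Succeed if two files contain the same tokens in the same order,
# regardless of how they are spaced or split across lines.
same_values_ignoring_layout() {
    [ "$(tokens "$1")" = "$(tokens "$2")" ]
}
```

A failure here does not say which values differ; a plain diff of the two token lists would show that.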


From: Graham Mann G.W.Mann@…
Sent: 11 April 2017 08:36
To: Dalvi, Mohit; Nicolas Bellouin (n.bellouin@…)
Cc: Grenville Lister
Subject: Read error when reading pcalc file on ARCHER (but good news is RADAER v2 small executables for v7.3 now working)

Hi Mohit,
Cc: Nicolas, Grenville

Nicolas, I've cc'd you in on this one. I've been making progress to get the v7.3 nitrate-extended UM-UKCA job (with RADAERv2) working on ARCHER.
The job is basically the same as the one Steve Turnock used for his 1960-2010 hindcast except that it now has the extension of GLOMAP to simulate nitrate and ammonium aerosol changes, so will have an improved representation of the evolution of regional aerosol radiative forcings over that time-period.

Nicolas — I am now encountering a read-error when reading the pcalc and I think this is some subtle issue about the FORTRAN format statement or the way the compiler interprets that differently on ARCHER than MONSOON.

Mohit, thanks for spotting that yesterday. That was indeed the reason why the "fcm build" was not able to make the targets qxpickup, qxcombine and qxhistrep.

I had made the change to the line for the cppkeys (changing SETUP=setup to PICK=pick) but I had not spotted that there was also a 2nd line that needed updating to similarly change the option for the fppkeys.

Once I did that the "fcm build" *did* then successfully build the required executables.

So last night I proceeded to copy these ARCHER-built executables into the directory that the script-insert then copies-in to the user's run directory when the job is run.

I had previously been running job xncwm which was failing when it tried to run the small executables that I'd copied over from MONSOON (incorrect architecture).

Now that I've got the right small executables in that directory (that were built on the right architecture) then this should proceed past this point.

Last night I re-submitted that xncwm job and sure enough it then proceeded past that point and is now reaching the point where it reads in the pcalc files for RADAER.

So this is good news — this is progress and it is now proceeding to get to the point where it needs to read the information from the pcalc and LUTs for RADAER.

However, the bad news is that job is now failing when it is trying to read the RADAER pcalc file — the error it gives is:

lib-4190 : UNRECOVERABLE library error
A numeric input field contains an invalid character.
Encountered during a sequential formatted READ from unit 162
Fortran unit 162 is connected to a sequential formatted text file:

"/work/n02/n02/gmann/ANCILS/RADAER/LUTs/pcalc_hadgem_v2.ukca"

Current format: (37x,3(i,1x))
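
One quick thing to rule out with an error like this is stray non-printing characters in the text file. The sketch below is only an assumption about what might be wrong: the thread does not say which character is at fault, and tabs and carriage returns (for example from a file edited on Windows) are just two candidates that a numeric read can trip over. The function name find_odd_chars is hypothetical.

```shell
# find_odd_chars FILE
# Prints "line-number:line" for every line containing a tab or a carriage
# return. Exits 0 if any such line was found, 1 if the file is clean.
find_odd_chars() {
    # Build the bracket expression [<TAB><CR>] portably via printf
    pattern=$(printf '[\t\r]')
    grep -n "$pattern" "$1"
}

# Usage:
#   find_odd_chars /work/n02/n02/gmann/ANCILS/RADAER/LUTs/pcalc_hadgem_v2.ukca
```

A clean result does not prove the file is fine. It only removes tabs and carriage returns from the list of suspects, leaving column alignment against the (37x,3(i,1x)) format as the next thing to check.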


There is also a warning that seems to be complaining that there is no data for aerosols:

* warning: the sw spectrum contains no data for aerosols.
* warning: the lw spectrum contains no data for aerosols.

Having been in my NCAS position continuously, I think I have enough experience to recognise and remember that this error is due to a subtle difference between the way Fortran formatted read statements are interpreted by the Cray compiler on ARCHER and by the compiler on MONSOON.

My recollection is that we had to change the pcalc file, removing one column of spaces (or maybe it was to change the format statement syntax; I can't quite remember).

Mohit, Nicolas: can you remember this issue? I will look back at my emails later today and try to find it, but I remember having to create a different version of the pcalc when running the model on ARCHER or MONSOON.

Does this ring a bell with you Mohit, Nicolas?

It's something to do with this format statement (37x,3(i,1x)) I think?

Thanks a lot for your help,

Cheers
Graham

Dr. Graham Mann, NCAS Senior Research Scientist
Institute for Climate & Atmospheric Science T: +44 0113 3431660
Room 10.108, School of Earth & Environment F: +44 0113 3435259
University of Leeds, Leeds, LS2 9JT, U.K. E: G.W.Mann@…


From: Graham Mann
Sent: 10 April 2017 17:36
To: Dalvi, Mohit
Cc: Grenville Lister
Subject: Re: Works for qxsetup but not yet for qxpickup, qxcombine and qxhistrep (RADAER v2 small executables for v7.3)

Hi Mohit
Ok great — thanks a lot.
I will try that and hopefully that will then solve the issue.
I'll let you know how it goes.
Many thanks for your help.
Cheers
Graham

On 10 Apr 2017, at 17:20, Dalvi, Mohit <mohit.dalvi@…> wrote:

Hi Graham,

In your myqxpickup(combine)_build/cfg/bld.cfg the "tool::fppkeys" line still contains SETUP=setup. The PICK, COMB keywords should probably be added at this line?

This setting in the Fortran PreProcessor (fpp) command is related to the "if defined" line at the top of the main program of the corresponding executables (http://puma.nerc.ac.uk/trac/UM/browser/UM/trunk/src/utility/qxpickup/pickup.F90?rev=1678).
The code will be read/extracted only if the fpp command options match the "if defined" statement.
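
A rough way to check both key lines in one go is sketched below. It assumes that each of the "tool::cppkeys" and "tool::fppkeys" declarations sits on a single line of bld.cfg with the keys separated by spaces; the function name check_keys is hypothetical, and keys spread over continuation lines would not be seen.

```shell
# check_keys CFG KEY
# For every tool::cppkeys / tool::fppkeys line in CFG, report whether KEY
# (e.g. PICK=pick) appears as a whole word. Exits non-zero if any such line
# lacks KEY, or if CFG has no such line at all.
check_keys() {
    awk -v key="$2" '
        $1 ~ /^tool::(cpp|fpp)keys$/ {
            seen = 1
            found = 0
            for (i = 2; i <= NF; i++) if ($i == key) found = 1
            print $1 ": " (found ? "has " : "MISSING ") key
            if (!found) bad = 1
        }
        END { exit (bad || !seen) ? 1 : 0 }
    ' "$1"
}

# Usage:
#   check_keys myqxpickup_build/cfg/bld.cfg PICK=pick
```

Checking both lines matters because a key present on only one of them is easy to overlook when editing bld.cfg by hand.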

Cheers
--
Mohit


From: Graham Mann G.W.Mann@…
Sent: 10 April 2017 17:04
To: 'Grenville Lister'
Cc: Dalvi, Mohit
Subject: Works for qxsetup but not yet for qxpickup, qxcombine and qxhistrep (RADAER v2 small executables for v7.3)

Hi Grenville,
Cc: Mohit,

OK well I managed to successfully create the qxsetup executable by matching the bld.cfg that you showed worked (which had the SETUP=setup flag that was not in the one I was trying).

However, whereas I had thought I could just proceed by following the same procedure again and make the equivalent change to the bld.cfg for the qxpickup, qxcombine and qxhistrep (matching the differences between the bld.cfg files as they get extracted when you run the build script on PUMA) that did not seem to work (for me at least).

It is giving the same type of error I was getting before: it doesn't seem to be picking up the PICK=pick for qxpickup, the COMB=comb for qxcombine and HPRT=hprt for qxhistrep.

For the qxpickup generation it gives the error "No rule to make target qxpickup" (see log below).

Grenville, please can you try making up these other 3 executables (I have tried but seem to have failed)? There's just the other 3 needed, which are qxpickup, qxcombine and qxhistrep.

I tried to do this carefully (even trying a clean extract from PUMA again) but clearly something has not quite worked — for some reason it just doesn't know what to do to build the other 3 executables.

The log from the failing build for qxpickup is shown below and it looks like it must be something simple with the compiler flags not recognising PICK=pick for some reason.

Please can you have a go at this (in the same way as you did for qxsetup) and see if you get the same error as me (maybe something not quite set up right or I missed something important?).

Thanks a lot for your help,

Cheers
Graham

As well as making the above change to the PICK/COMB/HPRT options, I've also made the changes so that "exe_name" is set to pickup rather than setup and "target" to pickup rather than setup.

With the option for PICK=pick added in instead of SETUP=setup I thought this was going to work.

See the "fcm build" command worked fine for qxsetup (as it had for you) like this:

gmann@eslogin004:/work/n02/n02/gmann/myqxsetup_build> fcm build
Build command started on Mon Apr 10 16:10:07 2017.
→Parse configuration: start
Config file (bld): /fs2/n02/n02/gmann/myqxsetup_build/cfg/bld.cfg
→Parse configuration: 8 seconds
→Setup destination: start
Destination: gmann@eslogin004:/fs2/n02/n02/gmann/myqxsetup_build
→Setup destination: 0 second
→Setup build: start
→Setup build: 245 seconds
→Pre-process: start
No. of files scanned for PP dependency: 2889
No. of pre-processed files: 2232
→Pre-process: 299 seconds
→Scan dependency: start
No. of files scanned for dependency: 2931
/fs2/n02/n02/gmann/myqxsetup_build/Makefile: updated
→Scan dependency: 29 seconds
→Generate Fortran interface: start
→Generate Fortran interface: 1 second
→Make: start
ftn -o setup.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/utility/qxsetup/setup.f90
gcc -o pio_data_conv.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -O3 -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/c_code/pio_data_conv.c
gcc -o pio_io_timer.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -O3 -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/c_code/pio_io_timer.c
gcc -o portio2a.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -O3 -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/c_code/portio2a.c
ftn -o ereport.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/misc/ereport.f90
ftn -o um_fort_flush.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/misc/um_fort_flush.f90
ftn -o initchst.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/top_level/initchst.f90
ftn -o readmhis.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/utility/qxsetup/readmhis.f90
ftn -o temphist.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/control/top_level/temphist.f90
ftn -o writftxx.o -I/fs2/n02/n02/gmann/myqxsetup_build/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom3.8/archer_cce_mpp/inc -c /fs2/n02/n02/gmann/myqxsetup_build/ppsrc/UM/utility/qxcombine/writftxx.f90
ar: creating /fs2/n02/n02/gmann/myqxsetup_build/tmp/libfcmqxsetup.a
ftn -o qxsetup /fs2/n02/n02/gmann/myqxsetup_build/obj/setup.o -L/fs2/n02/n02/gmann/myqxsetup_build/lib -lfcmqxsetup -L. -L /work/n02/n02/hum/gcom/cce/gcom3.8/lib -Wl,--warn-unresolved-symbols -Wl,-z,muldefs -s real64 -s integer64 -lgcom_serial -L /work/n02/n02/hum/lib/cce -lgrib
→Make: 28 seconds
→TOTAL: 610 seconds
Build command finished on Mon Apr 10 16:20:17 2017.

But when I ran the "fcm build" with the bld.cfg updated to match the one used for qxsetup (but with PICK=pick added to the fppkeys and the exe_name and target changed to "pickup" rather than "setup"), I got the "No rule to make target qxpickup" error:

gmann@eslogin004:/work/n02/n02/gmann/myqxpickup_build> fcm build
Build command started on Mon Apr 10 16:26:54 2017.
→Parse configuration: start
Config file (bld): /fs2/n02/n02/gmann/myqxpickup_build/cfg/bld.cfg
→Parse configuration: 9 seconds
→Setup destination: start
Destination: gmann@eslogin004:/fs2/n02/n02/gmann/myqxpickup_build
→Setup destination: 0 second
→Setup build: start
→Setup build: 107 seconds
→Pre-process: start
No. of files scanned for PP dependency: 2889
No. of pre-processed files: 2232
→Pre-process: 365 seconds
→Scan dependency: start
No. of files scanned for dependency: 2931
/fs2/n02/n02/gmann/myqxpickup_build/Makefile: updated
→Scan dependency: 31 seconds
→Generate Fortran interface: start
→Generate Fortran interface: 1 second
→Make: start
gmake: *** No rule to make target `qxpickup', needed by `all'.  Stop.
gmake -f /fs2/n02/n02/gmann/myqxpickup_build/Makefile -j 1 -s all failed (2) at /fs2/y07/y07/umshared/software/fcm-2016.12.0/bin/../lib/FCM1/Build.pm line 611
→Make: 0 second
→TOTAL: 513 seconds
Build failed on Mon Apr 10 16:35:27 2017.
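
When the build output is redirected to a file (as with "fcm build -j 4 > bld.out 2>&1"), the outcome can be pulled out of the log without reading it all. This is a small sketch that relies only on the two closing messages visible in the logs above, "Build command finished on" and "Build failed on"; the function name build_status is hypothetical.

```shell
# build_status LOG
# Prints "ok" if LOG contains fcm's "Build command finished on" line.
# Otherwise prints "failed", followed by any gmake "No rule to make target"
# lines (with line numbers), and returns non-zero.
build_status() {
    if grep -q '^Build command finished on ' "$1"; then
        echo ok
    else
        echo failed
        grep -n 'No rule to make target' "$1"
        return 1
    fi
}

# Usage:
#   build_status bld.out
```

A "No rule to make target" line in the output points at the kind of failure seen here, where the preprocessor keys did not match the target being built.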


From: Graham Mann
Sent: 10 April 2017 14:45
To: 'Grenville Lister' <grenville.lister@…>
Cc: 'Dalvi, Mohit (mohit.dalvi@…)' <mohit.dalvi@…>
Subject: RE: RADAERv2 small executables for v7.3

Hi Grenville,

Looking again at the error message I pasted into "comment 4" on the NCAS-CMS helpdesk thread:

http://cms.ncas.ac.uk/ticket/2134#__msie303:comment:4

I see that the actual error message says:

gmake: *** No rule to make target `qxsetup', needed by `all'.  Stop.

From what I typed this morning, I think I have now learned what that means.

I'm thinking it likely that the fact that I had not added in the "SETUP=setup" to the fppkeys may well have been the main reason the small-executable-build failed.

It seems highly likely to me that this error message is entirely consistent with the absence of that SETUP=setup compiler flag/qualifier.

Anyway, I will try following exactly what you did as that seems to have worked.

Thanks again for your help,

Cheers
Graham


comment:9 Changed 5 months ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed