Opened 5 weeks ago

Last modified 4 weeks ago

#2967 accepted help

missing fluxnet sites

Reported by: awright Owned by: pmcguire
Component: JULES Keywords: missing fluxnet sites, Rose/Cylc
Cc: Platform: JASMIN
UM Version:

Description

Hello Patrick,

I ran my suite (roses/BC_all_JulesStandard_NOTRIFFID_NEW) for all fluxnet sites, and I didn't get any errors in the run. However, when I look at my output files (/work/scratch/azin/fluxnet/run11a/jules_output/BC_all_JulesStandard_NOTRIFFID_NEW), some of the fluxnet sites are missing from the output, such as AT-Neu, CN-Dan, GF-Guy, US-Los and US-Wkg. I looked in /home/users/azin/cylc-run/u-bh383-JulesStandard-NOTRIFFID/log/job/1 and there are no files corresponding to the missing sites. I ran a similar suite last night and also had some missing sites, but they were different from the ones missing now. I ran these same suites successfully a few months ago. Do you know what the problem might be?
All the best,

Azin
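A quick way to list exactly which sites produced no output is to compare the expected site IDs against the files in the output directory. The helper below is a hypothetical sketch, not part of JULES or Rose; it assumes each site's output file name begins with its fluxnet site ID (e.g. a file starting with `AT-Neu`), which may not match the suite's actual naming scheme.

```python
"""Hypothetical helper: report fluxnet sites that have no output file.

Assumption: each site's output file name starts with its site ID
(e.g. "AT-Neu..."). Adjust the matching to the suite's real naming.
"""
import os
from typing import Iterable, List


def missing_sites(expected: Iterable[str], output_dir: str) -> List[str]:
    """Return the sorted site IDs that have no file in output_dir."""
    files = os.listdir(output_dir)
    missing = []
    for site in expected:
        # A site counts as present if any file name begins with its ID.
        if not any(name.startswith(site) for name in files):
            missing.append(site)
    return sorted(missing)
```

For example, `missing_sites(["AT-Neu", "CN-Dan", "US-Los"], output_dir)` returns only the IDs with no matching file, which gives a site list to re-run.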

Attachments (2)

Screen Shot 2019-07-26 at 13.37.32.png (257.0 KB) - added by awright 4 weeks ago.
error message
Screen Shot 2019-07-26 at 13.37.52.png (28.3 KB) - added by awright 4 weeks ago.
address


Change History (10)

comment:1 Changed 4 weeks ago by pmcguire

  • Keywords sites, Rose/Cylc added; sites removed

comment:2 Changed 4 weeks ago by pmcguire

  • Status changed from new to accepted

comment:3 Changed 4 weeks ago by pmcguire

Hi Azin
Do you see those missing sites listed in the JULES section of the Cylc GUI when you run it?
Do they turn red after all the other sites have turned green when they finished running?
If they are there and if they turn red, what do you see in the error logs when you right click on the failed site?

I note that in your file:
~azin/cylc-run/u-bh383-JulesStandard-NOTRIFFID/log/suite/log.20190725T112150+01
there are some 'task not killable' failure messages listed for the missing sites.
But for the old suite:
~azin/cylc-run/u-bh383-JulesStandard-NOTRIFFID/log/suite/log.20190510T171146+01
there are no such failure messages.

Does this help?
Patrick
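To pull the affected tasks out of a long suite log, one can filter for the 'task not killable' message mentioned above. This is a minimal sketch under two assumptions: that the message appears on the same line as the task name, and that JULES task names follow the `jules_<site>_...` pattern seen elsewhere in this ticket (e.g. `jules_at_new_presc0`). The exact Cylc log line format is not confirmed here.

```python
import re
from typing import Iterable, List

# Assumed task-name pattern, based on names like "jules_at_new_presc0".
TASK_RE = re.compile(r"\bjules_\w+")


def unkillable_tasks(log_lines: Iterable[str]) -> List[str]:
    """Return sorted, de-duplicated task names on 'task not killable' lines."""
    tasks = set()
    for line in log_lines:
        if "task not killable" in line:
            tasks.update(TASK_RE.findall(line))
    return sorted(tasks)
```

Usage: open the suite log (e.g. the log.20190725T112150+01 file above) and pass the file object directly, since iterating over it yields lines.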

Changed 4 weeks ago by awright

error message

Changed 4 weeks ago by awright

address

comment:4 Changed 4 weeks ago by awright

Hello Patrick,

Thank you for your answer, but I still don't know why some sites fail. Was it because I tried to kill the job, and that resulted in an error? All the sites ran successfully a few months ago.

None of the sites run in the 'u-bh383-14layer' (or '14layer') suites: these are JULES runs that I ran successfully a few months ago, but they fail now. I looked in cylc-run/u-bh383-14layer/log/job/1/jules_at_new_presc0/01, and the error (attached) said that I needed to link my code to a compatible netCDF C library in /apps/libs/netCDF/intel14/4.3.2/
I tried to run the suite again, and this time nothing was created (there is only an empty directory: /home/users/azin/cylc-run/u-bh383-14layer/log.20190726T124150Z/suite). JULES compiles successfully but does not run at all.
All the best,

Azin

comment:5 Changed 4 weeks ago by pmcguire

Hi Azin
About your second problem, I ran a copy of your suite. The copy is in:
~pmcguire/roses/u-bh383-14layer
This suite also fails, with this message
[FATAL ERROR] init_soil: Error reading namelist JULES_SOIL (IOSTAT=17 IOMSG=syntax error in NAMELIST input, unit 1, file /work/scratch/pmcguire/cylc-run/u-bh383-14layer/work/1/jules_cn_sw2_presc0/./jules_soil.nml

Your jules_soil namelist is not set up correctly. I don't know how JULES interprets inline comments marked with # symbols in the middle of a line.

/work/scratch/pmcguire/cylc-run/u-bh383-14layer/work/1/jules_cn_sw2_presc0/./jules_soil.nml

Patrick
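In standard Fortran (95 and later), comments in namelist input start with `!`; `#` is not a comment character, which is consistent with the syntax error above. A small, hypothetical checker (not part of JULES or Rose) can flag lines of a generated `.nml` file where `#` appears outside a quoted string, so those comments can be deleted or changed to `!`.

```python
from typing import List, Optional, Tuple


def hash_outside_quotes(line: str) -> bool:
    """True if '#' occurs outside a single- or double-quoted string."""
    quote: Optional[str] = None  # the quote character currently open, if any
    for ch in line:
        if quote:
            # Inside a string: only the matching quote closes it.
            # A doubled quote ('') simply closes and reopens, which is fine.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "!":
            return False  # the rest of the line is already a Fortran comment
        elif ch == "#":
            return True
    return False


def find_hash_comments(nml_text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) pairs containing a stray '#'."""
    return [
        (num, line)
        for num, line in enumerate(nml_text.splitlines(), start=1)
        if hash_outside_quotes(line)
    ]
```

Usage: read the generated jules_soil.nml (e.g. the file under work/1/jules_cn_sw2_presc0/ above) into a string and pass it to `find_hash_comments`; each hit is a line to fix in the suite's soil settings.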

comment:6 Changed 4 weeks ago by awright

Hello Patrick,

Thank you, I will have a look at the # symbols.
How can I run all fluxnet sites in one go, like I used to? How can I stop some sites from failing, e.g. regarding BC_all_JulesStandard_NOTRIFFID_NEW?

Many thanks,

Azin

comment:7 Changed 4 weeks ago by pmcguire

Hi Azin
I got rid of your # comments in the jules_soil section, and the JULES runs are running now. The new copy is in:
~pmcguire/roses/u-bh383-14layer2.

I ran it from the jasmin_cylc virtual machine.

According to the Cylc GUI window, several of the JULES tasks have already finished, some are still running, and a few are still waiting to start. It took a while for some of the JULES jobs to go from the queued state to the running state. But the US_Los log files are there, for example.

Feel free to look at my cylc-run directory for this suite, ~pmcguire/cylc-run/u-bh383-14layer2. After the suite finishes, there may be more information about the US_Los JULES task, for example.

I can't reproduce the error you were having with the other suite for US_Los, for example. If you have a failure in the future, I would suggest that you just try again, or at least study the Cylc GUI window, as I suggested, to see why any of the sites failed.

When I do a diff -r ~azin/roses/u-bh383-JulesStandard-NOTRIFFID ~pmcguire/roses/u-bh383-14layer2, I don't see any obvious reason why the u-bh383-JulesStandard-NOTRIFFID suite failed for some of the sites while the new 14-layer suite doesn't. Keep trying! It could just have been a machine failure on that particular day.
Patrick
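`diff -r` lists every differing file between two suite directories. If a structured list is more convenient (for example, to filter out generated files before comparing), a minimal sketch with Python's standard `filecmp` module is below. `filecmp.dircmp` finds files present on only one side; common files are then re-checked byte by byte with `filecmp.cmpfiles(..., shallow=False)`, because `dircmp`'s own comparison is shallow. The `diff_tree` name and its output prefixes are illustrative, not an existing tool.

```python
import filecmp
import os
from typing import List


def diff_tree(left: str, right: str, rel: str = "") -> List[str]:
    """Recursively list paths that differ between two directory trees.

    Each entry is prefixed with 'only-left:', 'only-right:' or 'differs:'.
    """
    lpath, rpath = os.path.join(left, rel), os.path.join(right, rel)
    cmp = filecmp.dircmp(lpath, rpath)
    out = [f"only-left: {os.path.join(rel, n)}" for n in sorted(cmp.left_only)]
    out += [f"only-right: {os.path.join(rel, n)}" for n in sorted(cmp.right_only)]
    # Re-check common files by content, not just size and modification time.
    _, mismatch, errors = filecmp.cmpfiles(
        lpath, rpath, cmp.common_files, shallow=False
    )
    out += [f"differs: {os.path.join(rel, n)}" for n in sorted(mismatch + errors)]
    # Recurse into subdirectories present on both sides.
    for sub in sorted(cmp.common_dirs):
        out += diff_tree(left, right, os.path.join(rel, sub))
    return out
```

Usage: `diff_tree(os.path.expanduser("~azin/roses/u-bh383-JulesStandard-NOTRIFFID"), os.path.expanduser("~pmcguire/roses/u-bh383-14layer2"))`.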

comment:8 Changed 4 weeks ago by awright

Hello Patrick,
Thank you very much for your help!
All the best,

Azin
