Opened 9 months ago

Last modified 11 days ago

#2527 accepted task

audit/profiling of JULES on CEDA JASMIN

Reported by: pmcguire Owned by: pmcguire
Priority: normal Component: JULES
Keywords: JULES, audit, profiling, CEDA JASMIN Cc:
Platform: Other UM Version:

Description

Can you work with the POP service people to do an audit and profiling of JULES on CEDA JASMIN?

Change History (9)

comment:1 Changed 9 months ago by pmcguire

The POP service person has been working on this. He can run JULES on CEDA JASMIN with their profiling tools. I last heard from him in March. I will contact the CEDA JASMIN people to see if they can investigate the delay.

comment:2 Changed 9 months ago by pmcguire

  • Status changed from new to accepted

comment:3 Changed 9 months ago by pmcguire

I contacted Fatima Chami at CEDA, who in turn contacted POP. Judit Gimenez from POP then wrote back to say that they can't reach Martin either. Judit proposed that somebody else from their consortium restart this JULES on JASMIN study from scratch in 2-3 weeks, and I accepted their proposal.

comment:4 Changed 5 months ago by pmcguire

Judit Gimenez at POP is continuing to work on the study of the JULES global suite on CEDA JASMIN.

comment:5 Changed 4 months ago by pmcguire

Judit Gimenez at POP was successful in getting a POP-instrumented gridded JULES run working on CEDA JASMIN.

comment:6 Changed 2 months ago by pmcguire

Judit Gimenez at POP has delivered a preliminary profiling analysis of JULES on CEDA JASMIN. We plan to discuss this by phone soon.

comment:7 Changed 5 weeks ago by pmcguire

I discussed the situation with Judit Gimenez from POP several weeks ago. We decided that it would be best to repeat the analysis with a run over the whole globe instead of only over the UK.

comment:8 Changed 11 days ago by pmcguire

Hello Patrick,

After some unsuccessful tests, on Monday I submitted a job that does not
instrument the application, just to check whether I also get the
coredumps that way. In fact, I do not get the coredumps, but it seems it
does not work either.

I had a look at the logs, but I am not sure what the problem is.

I saw in the log that it fails:

[judit@jasmin-cylc 01]$ pwd
/home/users/judit/cylc-run/u-ar790-16/log/job/1/jules/01

[judit@jasmin-cylc 01]$ more job-activity.log
[jobs-submit ret_code] 0
[jobs-submit out] 2019-03-04T13:14:56Z|1/jules/01|0|9098241
2019-03-04T13:14:56Z [STDOUT] Job <9098241> is submitted to queue <par-multi>.
[(('event-mail', 'failed'), 1) ret_code] 0

But the error does not give me any clue. Maybe I am looking in the wrong file?

[judit@jasmin-cylc suite]$ pwd
/home/users/judit/cylc-run/u-ar790-16/log/suite
[judit@jasmin-cylc suite]$ tail log.20190304T130657Z
2019-03-04T13:14:02Z INFO - [fcm_make.1] -(current:running)(polled) started at 2019-03-04T13:06:59Z
2019-03-04T13:14:55Z INFO - [fcm_make.1] -(current:running)> succeeded at 2019-03-04T13:14:55Z
2019-03-04T13:14:56Z INFO - [jules.1] -submit-num=1, owner@host=localhost
2019-03-04T13:14:57Z INFO - [jules.1] -(current:ready) submitted at 2019-03-04T13:14:56Z
2019-03-04T13:14:57Z INFO - [jules.1] -health check settings: submission timeout=None, polling intervals=PT1M,…
2019-03-04T13:14:59Z INFO - [jules.1] -(current:submitted)> started at 2019-03-04T13:14:57Z
2019-03-04T13:14:59Z INFO - [jules.1] -health check settings: execution timeout=None, polling intervals=PT1M,…
2019-03-04T13:15:05Z CRITICAL - [jules.1] -(current:running)> failed/EXIT at 2019-03-04T13:15:04Z
2019-03-04T13:15:05Z CRITICAL - [jules.1] -job(01) failed
2019-03-04T13:15:06Z WARNING - suite stalled
[judit@jasmin-cylc suite]$

Thanks!

judit
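
The suite log excerpt above only records that jules.1 failed, not why. As a minimal sketch for pulling the failure events out of a longer suite log, the Python below keeps only WARNING, ERROR and CRITICAL entries. It assumes every event starts with "<timestamp>Z <LEVEL> - <message>", as in the excerpt, and treats any other non-empty line as a continuation of the previous event. The name severe_events is hypothetical and is not part of cylc or rose.

```python
import re

# Assumption: each event starts with "<ISO timestamp>Z <LEVEL> - <message>",
# as in the excerpt above; any other line continues the previous event.
EVENT = re.compile(r"^(\S+Z) (DEBUG|INFO|WARNING|ERROR|CRITICAL) - (.*)$")
SEVERE = {"WARNING", "ERROR", "CRITICAL"}


def severe_events(log_text):
    """Return (timestamp, level, message) tuples for WARNING-or-worse events."""
    events = []
    keep_current = False
    for raw in log_text.splitlines():
        line = raw.strip()
        match = EVENT.match(line)
        if match:
            timestamp, level, message = match.groups()
            keep_current = level in SEVERE
            if keep_current:
                events.append((timestamp, level, message))
        elif keep_current and line:
            # Append a continuation line to the event it belongs to.
            timestamp, level, message = events[-1]
            events[-1] = (timestamp, level, f"{message} {line}")
    return events
```

Called on the text of a suite log file, this returns only the failure-related entries. It does not explain the failure itself, which still has to be found in the job's own error output.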

comment:9 Changed 11 days ago by pmcguire

Hi Judit
For this rose/cylc suite, we have redirected the main error logs to the /work/scratch/judit/logs directory. One of your log files (/work/scratch/judit/logs/err_16.txt) reports an "Error reading namelist JULES_OUTPUT". I suspect this is because you tried to disable the NetCDF output. It may not be possible to disable the output completely. If it would help, you could perhaps disable only certain variables in the output. Alternatively, there may be another way to change the code so that the NetCDF output is disabled completely.
Patrick McGuire
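
As a minimal sketch of how to search the redirected error logs mentioned in this comment, the Python below scans every err_*.txt file in /work/scratch/judit/logs and reports the lines that mention an error. The directory and the err_*.txt naming come from the comment. The keyword pattern and the function names find_error_lines and scan_logs are assumptions made for illustration, not part of the suite.

```python
import re
from pathlib import Path

# Directory named in the comment above; change it for other users or suites.
LOG_DIR = Path("/work/scratch/judit/logs")

# Assumption: these case-insensitive keywords are enough to spot fatal messages.
PATTERN = re.compile(r"error|fatal", re.IGNORECASE)


def find_error_lines(text):
    """Return (line_number, line) pairs for lines that match PATTERN."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def scan_logs(log_dir=LOG_DIR, glob="err_*.txt"):
    """Map each matching log file to its error lines, skipping clean files."""
    hits = {}
    for path in sorted(Path(log_dir).glob(glob)):
        found = find_error_lines(path.read_text(errors="replace"))
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    for path, lines in scan_logs().items():
        for number, line in lines:
            print(f"{path}:{number}: {line}")
```

Run as a script, this prints one "file:line: message" entry per match, which should surface messages such as the namelist error quoted above without opening each log by hand.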

Last edited 11 days ago by pmcguire