#1988 closed help (answered)

archiving of log files in cylc-run directory

Reported by: marcus Owned by: ros
Priority: normal Component: UM Model
Keywords: cylc-run, log files, archiving Cc:
Platform: ARCHER UM Version: 10.4

Description

Hi, for each suite there are two versions of the cylc-run directory, one on Puma and one on ARCHER. I am trying to understand what happens to these files and where to find specific information.

  1. Puma seems to have a more complete set of log files, i.e. the directory structure in ~/cylc-run/{suite-name}/log/ contains more files, many of which are compressed into tar archives. ARCHER's version (linked to /work/n02/n02) has the model output under ~/cylc-run/{suite-name}/share/ but not all of the log files. If I want to back up the information within the cylc-run directory to the RDF at the end of the experiment, should I back up the Puma version, the ARCHER version, or both?
  2. Why does the job.status file get deleted when archiving with tar? The tar.gz files no longer seem to contain this file. This small file lets me conveniently verify the times of submission, initialisation and completion of the job, i.e. I can find out how long the job was held in the queue. Is there any other file that would let me track the submission or queue time of each cycle after the experiment has completed?
  3. What determines when the ~/cylc-run/{suite-name}/log/ directory gets archived into a tar file? Does this happen after a specific number of cycles, and can this number be set somewhere in Rose?

Many thanks,
Marcus
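
For reference, the queue-time check described in point 2 can be sketched in Python. This is only a minimal sketch: it assumes job.status holds simple KEY=VALUE lines, and the key names used here (CYLC_BATCH_SYS_JOB_SUBMIT_TIME, CYLC_JOB_INIT_TIME, CYLC_JOB_EXIT_TIME) and the timestamp format are assumptions that should be checked against a real job.status file from your suite. The sample values are made up for illustration.

```python
from datetime import datetime


def parse_job_status(text):
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def _to_datetime(stamp):
    # Assumed timestamp form, e.g. 2017-01-15T10:05:30Z
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def queue_and_run_seconds(fields):
    """Return (seconds spent queued, seconds spent running)."""
    # Key names below are assumptions; adjust them to match your files.
    submitted = _to_datetime(fields["CYLC_BATCH_SYS_JOB_SUBMIT_TIME"])
    started = _to_datetime(fields["CYLC_JOB_INIT_TIME"])
    finished = _to_datetime(fields["CYLC_JOB_EXIT_TIME"])
    queued = (started - submitted).total_seconds()
    ran = (finished - started).total_seconds()
    return queued, ran


# Made-up sample content, for illustration only.
example = """CYLC_BATCH_SYS_NAME=pbs
CYLC_BATCH_SYS_JOB_SUBMIT_TIME=2017-01-15T10:00:00Z
CYLC_JOB_INIT_TIME=2017-01-15T10:05:30Z
CYLC_JOB_EXIT=SUCCEEDED
CYLC_JOB_EXIT_TIME=2017-01-15T11:05:30Z
"""

queued, ran = queue_and_run_seconds(parse_job_status(example))
print(f"queued for {queued:.0f} s, ran for {ran:.0f} s")
```

In practice the text would be read from a job.status file on disk rather than from an inline string.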

Change History (7)

comment:1 Changed 22 months ago by ros

Hi Marcus,

I'll answer each point in turn.

  1. The log directories on PUMA contain all the log files from ARCHER (they are copied over to PUMA when a task finishes running) plus the log files from the tasks that run on PUMA itself, such as the fcm_make log files, and also the job-activity log file. The suites are controlled from PUMA, which is why there are some extra files on PUMA. (MONSooN has a shared filesystem between exvmsrose and the HPC, which is why all the files are in one place over there.) The cylc-run directory on ARCHER contains all the log files and output data from all the tasks that run on ARCHER. As for which one to archive, it depends on what you are trying to save.
  2. All my tarred-up log files still contain the job.status file. Nothing gets deleted from the log directories to my knowledge.
  3. The log files are tarred up when you run the suite again (rose suite-run). You can stop it tarring up the log directories by using the option --no-log-archive, but beware that the directories can sometimes be quite large, depending on your suite. You can find more information on the --no-log-archive option in the Rose documentation: http://metomi.github.io/rose/doc/rose-single-page.html

Hope that helps.
Regards,
Ros

comment:2 Changed 22 months ago by marcus

Hi Ros,
This is really helpful to know. The only remaining issue is why the job.status file gets deleted in my runs. See, for instance, on puma:

/home/marcus/cylc-run/u-ag352/log/job-19890901T0000Z.tar.gz

This file no longer contains job.status. I wonder why; is this a setting in my configuration files?
Many thanks,
Marcus
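
A check like the one described above can be done without unpacking the archive. The following is a minimal Python sketch using the standard tarfile module; the function name members_named is hypothetical, and the path in the usage comment is the one quoted in this comment.

```python
import tarfile


def members_named(archive_path, basename):
    """Return the member paths in a tar archive whose file name equals basename."""
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) by itself.
    with tarfile.open(archive_path, "r:*") as tar:
        return [
            name for name in tar.getnames()
            if name.rsplit("/", 1)[-1] == basename
        ]


# Example usage (path taken from the comment above):
# hits = members_named(
#     "/home/marcus/cylc-run/u-ag352/log/job-19890901T0000Z.tar.gz",
#     "job.status",
# )
# print(hits or "no job.status in this archive")
```

An empty list means the archive holds no file with that name at any depth.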

comment:3 Changed 22 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Marcus,

OK, sorry. We're talking about the archiving of different log directories. I thought you meant the archiving of the whole ~/cylc-run/log directory, not the ~/cylc-run/log/job sub-directories. These are controlled by different parts: the former by cylc, the latter by a standard Rose app called rose_prune.

The tarring up of the job directory is controlled by the housekeeping app. The job directory is tarred up after the number of cycles specified by the variable archive-logs-at in the window housekeeping → prune. There is no way to change which files it archives, as far as I know. I'll talk to the Rose guys and see what it's supposed to do with the job.status file - the documentation doesn't note any exclusions.

Regards,
Ros.


comment:4 Changed 22 months ago by ros

Hi Marcus,

I've just been looking at the source code and it specifically excludes the job.status file. Why this is done I don't know yet. You could also extract the information from the job-activity.log, although that is a little more complicated.

Cheers,
Ros
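
Since the archiving step leaves job.status out of the tar files, one possible workaround is to copy those files somewhere else before the logs are archived. The sketch below is only an illustration under assumptions: it assumes the job.status files are still present under the suite's log/job directory when it is run, the function name save_job_status_files is hypothetical, and the destination should be a directory outside log/job.

```python
import shutil
from pathlib import Path


def save_job_status_files(log_job_dir, dest_dir):
    """Copy every job.status under log_job_dir into dest_dir, keeping relative paths."""
    log_job_dir = Path(log_job_dir)
    dest_dir = Path(dest_dir)
    copied = []
    for src in sorted(log_job_dir.rglob("job.status")):
        if not src.is_file():
            continue
        # Preserve the sub-directory layout so each copy stays identifiable.
        target = dest_dir / src.relative_to(log_job_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also keeps modification times
        copied.append(target)
    return copied


# Example usage (both paths are placeholders to adapt):
# save_job_status_files(
#     Path.home() / "cylc-run" / "u-ag352" / "log" / "job",
#     Path.home() / "job-status-backup" / "u-ag352",
# )
```

Whether this is worth running regularly depends on when the housekeeping task archives the logs in your suite.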

comment:5 Changed 22 months ago by marcus

OK, thank you Ros. I will have a look at the job-activity.log file (which I find very complicated to understand) to see where I can retrieve the timing information. It seems odd that such a small file as job.status gets deliberately deleted. It can't be to save space, surely?

Many thanks,
Marcus

comment:6 Changed 22 months ago by ros

Hi Marcus,

I did talk to the Met Office, and they were going to check that functionality - I've yet to hear back. They did say that the file is used by cylc for restarts and other things, so it may be that it can't be archived because of this.

Regards,
Ros.

comment:7 Changed 19 months ago by ros

  • Resolution set to answered
  • Status changed from accepted to closed