Opened 6 months ago

Last modified 5 months ago

#3146 assigned help

Archiving data from Monsoon to JASMIN

Reported by: simon.tett Owned by: david
Component: UM Model Keywords:
Cc: Platform: JASMIN
UM Version: 10.0

Description

Hi,

is there a guide to what needs to be done to set things up to use the automatic archiving system that moves data from Monsoon to JASMIN?

ta

(model version is a guess as I can't log into monsoon right now…)

Simon

Change History (23)

comment:2 Changed 6 months ago by simon.tett

Hi Grenville,

thanks a lot. After some quick reading I think I need some ssh magic…

"Set up ssh-key to connect to JASMIN

Some setup is required to enable non-interactive authentication from NEXCS to JASMIN. Please contact cms_support@… for details. "

I inherited the job from Christoph who, I assume, had done the ssh magic needed… So what is the incantation needed…
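My guess at the incantation, for what it's worth (unverified — I'm still waiting on the official CMS steps, and the key presumably also has to be registered with JASMIN):

```shell
# Guess (unverified): generate a dedicated, passphrase-less key pair on
# Monsoon/NEXCS for non-interactive batch transfers
mkdir -p "$HOME/.ssh"
ssh-keygen -t rsa -b 4096 -f "$HOME/.ssh/id_rsa_jasmin" -N "" -q
# ...then register ~/.ssh/id_rsa_jasmin.pub through the JASMIN accounts portal
# and check the connection works without a prompt, e.g.:
#   ssh -i ~/.ssh/id_rsa_jasmin -o BatchMode=yes <user>@xfer2.jasmin.ac.uk true
```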

Simon

comment:3 Changed 6 months ago by simon.tett

Hi,

with help from Ros I am set up. To use xfer2, JASMIN need an IP address — what IP address should I give them?

Simon

comment:4 Changed 6 months ago by ros

Hi Simon,

IP sent by email

Cheers,
Ros.

comment:5 Changed 5 months ago by simon.tett

All set up and transfer now working — and once the ice-impact workspace is created I will move to using that rather than edin_cesd. But pp data is being transferred. Is there a way of making & transferring netcdf files rather than pp files?

ta
Simon

comment:6 Changed 5 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Simon,

It is possible to configure the suite to output streams in NetCDF directly from the UM rather than pp (except for the climate meaning).

I also have a branch of the post-processing which will archive the NetCDF files and transfer them; however, to use this in your suite you would need to upgrade the post-processing app.

If you want to try this out I can point you to instructions, otherwise you will need to manually convert the transferred pp files to netcdf.

Cheers,
Ros

comment:7 Changed 5 months ago by simon.tett

Hi Ros,

except the climate meaning… That is a pain, because the climate-meaned output is what I normally use. Though I don't understand UM10.X well enough to know if climate meaning is what I think it is. I'd normally want monthly, seasonal & annual means…

How come the climate meaning doesn't produce netcdf???

Simon

Last edited 5 months ago by simon.tett (previous) (diff)

comment:8 Changed 5 months ago by ros

Hi Simon,

Climate meaning at 10.x is exactly what it was at older UM versions.

Climate meaning as produced by the UM was never planned to be modified to output NetCDF, for multiple reasons; not least the Met Office's plan to remove climate meaning from the UM and put it into postproc, which only happened at the latest postproc release. We have modified postproc to allow archiving of NetCDF files, but due to higher-priority work, such as ARCHER2 preparation, we have not yet had the resources to implement climate meaning of NetCDF files within post-processing. This will be done as soon as is practicable.

Regards,
Ros.

comment:9 Changed 5 months ago by simon.tett

Hi Ros,

thanks — I guess I will have to use PP data then. Yes, I can understand the complexity of this and why you prioritise the ARCHER2 transition. Are there tools on JASMIN to convert from pp to netcdf? I guess I could just read it using iris and write it out again as NetCDF.

Simon

comment:10 Changed 5 months ago by simon.tett

I have it all working, with data going to my new group workspace on JASMIN! But I changed the run to 1-year chunks and the archiving ran out of time… The wallclock time looks to be set to 3600 seconds = 1 hour. How do I increase the wallclock time?

Though 1 hour to convert all the data to PP and transfer 6.1 Gbytes to JASMIN seems rather slow. Or should I reduce my cycle time to 3 months?

Job is bo595

Simon

comment:11 Changed 5 months ago by ros

Hi Simon,

In the site/MONSooN.rc file change the execution time limit in the [[PPTRANSFER_RESOURCE]] section.

Then reload the suite (rose suite-run --reload)
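For example, the change might look something like this (a sketch using cylc's standard time-limit syntax; the surrounding contents of your site/MONSooN.rc will differ):

```
[[PPTRANSFER_RESOURCE]]
    [[[job]]]
        execution time limit = PT2H    # e.g. raise from PT1H (3600 s) to 2 hours
```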

Cheers,
Ros.

comment:12 Changed 5 months ago by simon.tett

Hi Ros,

thanks — back to editing files I see… And once I have done the --reload, do I then do rosie go and resubmit? How do I put the suite back into revision control so that the change persists when I copy it — fcm commit?

I am rerunning the pp_transfer before doing anything else. Will report back if I have trouble.

Simon

comment:13 Changed 5 months ago by ros

Hi Simon,

rose suite-run --reload loads the changes into the already-running suite, so there is no need to do anything else apart from retriggering the failed pptransfer task.

Yes, you commit changes to a suite in exactly the same way as for UM branches, i.e. fcm commit.

Cheers,
Ros.

comment:14 Changed 5 months ago by grenville

Hi Simon (re comment 9),

cfa for pp→netcdf conversion is available on JASMIN. Just set up your environment as follows:

export PATH=/home/users/ajh/anaconda3/bin:$PATH
ln -s /home/users/ajh/cfplot_data ~

Note that this uses cf-python version 3.
Hope that helps.

comment:15 Changed 5 months ago by simon.tett

Hi Ros,

thanks — the transfers seem to be happening, though one seems to have stopped sending after pushing across 11 ppa files…

On to the next stage — converting to netcdf. So I grabbed my old ARCHER way of doing it:

cfa --reference_datetime='1750-1-1' --unsqueeze --single -f 'NETCDF4' --no_aggregation --outfile=bo595a.py19891201.nc bo595a.py19891201.pp

and got an assertion error (see below). Anything I should be doing first? Some conda magic?

[tetts@jasmin-sci4 19890901T0000Z]$ cfa --reference_datetime='1750-1-1' --unsqueeze --single -f 'NETCDF4' --no_aggregation --outfile=bo595a.py19891201.nc bo595a.py19891201.pp
Traceback (most recent call last):
  File "/home/users/ajh/anaconda3/bin/cfa", line 9, in <module>
    import cf
  File "/home/users/ajh/anaconda3/lib/python3.7/site-packages/cf/__init__.py", line 134, in <module>
    import cfunits
  File "/home/users/ajh/anaconda3/lib/python3.7/site-packages/cfunits/__init__.py", line 36, in <module>
    from .units import Units
  File "/home/users/ajh/anaconda3/lib/python3.7/site-packages/cfunits/units.py", line 212, in <module>
    assert(0 == _ut_unmap_symbol_to_unit(_ut_system, _c_char_p(b'Sv'), _UT_ASCII))
AssertionError
Simon

comment:16 Changed 5 months ago by ros

  • Owner changed from ros to david
  • Status changed from accepted to assigned

comment:17 Changed 5 months ago by simon.tett

Back to jasmin transfer…
Even after having increased the time for the transfer to 2 hours it is still failing. Looking at the log file I think it used 189 seconds, and from what I see on JASMIN I think the transfer is hanging.

I started a rerun of the transfer which failed with an error:
rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)
rsync: close failed on "/gws/nopw/j04/iceimpact/stett2/u-bo595/19900901T0000Z/.bo595a.pd1991jul.pp.ksK4A4": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(730) [receiver=3.0.6]
rsync: connection unexpectedly closed (392 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(641) [sender=3.0.4]

transfer is going to /gws/nopw/j04/iceimpact/stett2/u-bo595 and using xfer2.

Simon

comment:18 Changed 5 months ago by simon.tett

and running the rsync interactively to get files over.

Simon

comment:19 Changed 5 months ago by simon.tett

Which ran for a while — transferring 2 Gbytes — then hung. That suggests some problem with the Monsoon→JASMIN system rather than the UM…

Simon

comment:20 Changed 5 months ago by simon.tett

And I wonder if the solution is to add the following to the rsync command:
--timeout=10   # time out after 10 seconds of no I/O
and perhaps add an option to the job to resubmit automatically if there is a failure…
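A wrapper along those lines might look like this (a sketch only; the function name, retry count and flag values are mine, not from the suite):

```shell
# Retry a flaky rsync a few times; give up on a stalled connection rather
# than hanging indefinitely (illustrative values, not from the suite)
transfer_with_retry () {
    local src=$1 dest=$2 attempt
    for attempt in 1 2 3; do
        # --partial keeps partly-transferred files so a retry can resume them;
        # --timeout=10 aborts after 10 seconds with no I/O instead of hanging
        rsync -av --partial --timeout=10 "$src" "$dest" && return 0
        echo "rsync attempt $attempt failed; retrying" >&2
        sleep 1
    done
    return 1
}
```

Called as, e.g., `transfer_with_retry "$SRC_DIR/" user@xfer2:/gws/nopw/j04/iceimpact/stett2/u-bo595/`.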

S

comment:21 Changed 5 months ago by grenville

Simon

The problem is overloading of JASMIN - users are trying to wring out every last AU on ARCHER while streaming to JASMIN, those who switched to NEXCS to avoid the ARCHER hiatus are transferring data to JASMIN, and users are getting data off the RDF.

The pptransfer app will retry automatically - saving you the bother.

Grenville

comment:22 Changed 5 months ago by david

Hi Simon,

The cfa problem is sporadic and environment-based. I'm not sure how to eradicate it properly, but setting the UDUNITS2_XML_PATH environment variable ought to sort it. See
https://ncas-cms.github.io/cf-python/installation.html#unidata-udunits-2-library for details.
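For example, something like the following (the exact prefix depends on which Python installation cfa comes from; $CONDA_PREFIX is only set inside an active conda environment, otherwise substitute the install prefix by hand):

```shell
# Point udunits2 at its XML database; $CONDA_PREFIX points at the active
# conda environment (substitute the install prefix by hand if it is unset)
export UDUNITS2_XML_PATH="$CONDA_PREFIX/share/udunits/udunits2.xml"
```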

Thanks, David

comment:23 Changed 5 months ago by simon.tett

Hi David,

thanks for that — the document you point me to suggests setting UDUNITS2_XML_PATH to /home/user/anaconda3/share/udunits/udunits2.xml. That path is not correct, as /home/user/anaconda3/ does not exist. Can you advise me what I should set it to?
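My guess, based on the shared install Grenville pointed at in comment 14, would be something like this (unverified):

```shell
# Guess: reuse the shared anaconda3 install from comment 14 (path unverified)
export UDUNITS2_XML_PATH=/home/users/ajh/anaconda3/share/udunits/udunits2.xml
```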

Presumably if that works I should add this to my .bashrc file

Simon
