Opened 9 months ago

Closed 8 months ago

#2745 closed help (fixed)

Error archiving to MASS

Reported by: nx902220
Owned by: um_support
Component: Archiving
Keywords:
Cc:
Platform: Monsoon2
UM Version: 10.5

Description

Hi,

I am running my nesting suite u-bc220 on Monsoon. It runs successfully until the very last stage, which archives the 55 m nest to MASS.

The error message is:
[FAIL] ????????????????????????????????????????????????????????????????????????????????
[FAIL] ???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
[FAIL] ? Error code: 20
[FAIL] ? Error from routine: READFLDS
[FAIL] ? Error message: READFLDS: start address of field not given
[FAIL] ? Error from processor: 0
[FAIL] ? Error number: 0
[FAIL] ????????????????????????????????????????????????????????????????????????????????
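One plausible reading of "start address of field not given" is that a lookup entry in the UM output file has no valid data start address. The sketch below checks for that, assuming 55m.pp0 is a big-endian 64-bit UM fieldsfile and that the header word positions are as described in UMDP F3 (FIXHD(150) = lookup start, FIXHD(152) = number of lookup entries, lookup word 29 = LBEGIN). These positions and the "all -99 means unused" rule are assumptions to verify; the Met Office mule library is the more robust way to read these files.

```python
import struct

FLH_WORDS = 256   # fixed-length header size in 64-bit words (assumed, UMDP F3)
LOOKUP_LEN = 64   # words per lookup entry (assumed)
LBEGIN = 29       # 1-based lookup word holding the field's start address (assumed)


def fields_missing_start(path):
    """Return 0-based indices of lookup entries with no valid start address.

    Assumes a big-endian 64-bit UM fieldsfile. Entries whose 64 words are
    all -99 are treated as unused slots and skipped (assumption).
    """
    with open(path, "rb") as f:
        flh = struct.unpack(">256q", f.read(FLH_WORDS * 8))
        lookup_start = flh[149]  # FIXHD(150): 1-based word address of lookup table
        n_entries = flh[151]     # FIXHD(152): number of lookup entries
        f.seek((lookup_start - 1) * 8)
        bad = []
        for i in range(n_entries):
            entry = struct.unpack(">64q", f.read(LOOKUP_LEN * 8))
            if all(w == -99 for w in entry):
                continue  # unused lookup slot
            if entry[LBEGIN - 1] <= 0:
                bad.append(i)  # field present but start address not set
        return bad
```

For example, `print(fields_missing_start("55m.pp0"))` would list any suspect entries; an empty list means this particular cause is unlikely.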

I have tried running

moo put -f -c umpp /home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp0 moose:/crum/u-bc220/apa.pp/20160504T0300Z_55m.pp0

at the command line and get the same error.

Archiving the 55 m nest has worked in the past. The only change I can think of is that I have added extra STASH items, which makes the .pp0 file bigger (25 GB). Is there a file size limit?

Please can you help with this?

Thanks,

Lewis

Change History (7)

comment:1 Changed 9 months ago by grenville

Hi Lewis

Odd. It may be the file size. Try converting it to 32-bit PP like this:

/projects/um1/vn10.5/xc40/utilities/um-convpp /home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp0 ~/55m_um.pp0.pp

Then run moo put on ~/55m_um.pp0.pp and see what happens.

Grenville

comment:2 Changed 9 months ago by nx902220

Hi Grenville,

Thank you for getting back to me. The conversion to 32-bit works. However, when I run moo put on the 55m_um.pp0.pp file, it fails with the error message:

IO: Open: /home/d04/lblunn/55m_um.pp0.pp on unit 11
Request for 1464027307505440089 words, is not supported by a buffer of size 100
IO Error Report *
Unit Generating error= 11
----File States ---------------------------------------
Unit 11 open on filename /home/d04/lblunn/55m_um.pp0.pp
--> File Type: 0 , Read Only: T , Write Only: F
--> Local: T AllLocal: F Remote: F Broadcast: T
--> Local: T AllLocal: F Remote: F Broadcast: T
----End File States -----------------------------------

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 24
? Error from routine: io:buffin
? Error message: Supplied buffer too small
? Error from processor: 0
? Error number: 0
????????????????????????????????????????????????????????????????????????????????

Lewis

comment:3 Changed 9 months ago by nx902220

I have done some more tests:

1) I have tried archiving a smaller 55 m file (~0.5 GB), but this also fails.

lblunn@xcslc0:~/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um> moo put -f -c umpp /home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp1 moose:/crum/u-bc220/apa.pp/20160504T0300Z_55m.pp1
### put, command-id=668269981, estimated-cost=426696704byte(s), files=1
### /home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp1: converted file format.
/home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp1: file transfer failure.

  • task #0 (attempt 1 of 3): transfer failed (ERROR_TRANSFER).

/home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp1: file transfer failure.

  • task #0 (attempt 2 of 3): transfer failed (ERROR_TRANSFER).

/home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um/55m.pp1: file transfer failure.

  • task #0 (attempt 3 of 3): transfer failed (ERROR_TRANSFER).


uk.gov.meto.moose.business.command.exception.RetryableFileTransferException: uk.gov.meto.moose.business.ftpclient.service.RetryableFtpClientException: FTPService error: unable to login reply: 550 Requested action not taken.


put: failed (3)

2) I tried archiving a 100 m file, and this succeeded:

lblunn@xcslc0:~/cylc-run/u-bc220/share/cycle/20160504T0300Z/100m_um> moo put -f -c umpp /home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/100m_um/100m.pp1 moose:/crum/u-bc220/apa.pp/20160504T0300Z_100m.pp1
### put, command-id=668270940, estimated-cost=148312064byte(s), files=1
### /home/d04/lblunn/cylc-run/u-bc220/share/cycle/20160504T0300Z/100m_um/100m.pp1: converted file format.
lblunn@xcslc0:~/cylc-run/u-bc220/share/cycle/20160504T0300Z/100m_um> cd ../55m_um

The error seems to affect only my 55 m nest.

3) Archiving my 55 m nest has worked in the past, as can be seen from moo ls:

lblunn@xcslc0:~/cylc-run/u-bc220/share/cycle/20160504T0300Z/55m_um> moo ls -lt moose:/crum/u-bc220/apa.pp/
F adrian.hill 0.08 GBP 3020656144 2018-11-05 01:36:58 GMT moose:/crum/u-bc220/apa.pp/20160504T0300Z_55m.pp0
F adrian.hill 0.01 GBP 319708560 2019-01-24 18:34:10 GMT moose:/crum/u-bc220/apa.pp/20160504T0300Z_ukv.pp1
F adrian.hill 0.02 GBP 886195448 2019-01-24 18:34:27 GMT moose:/crum/u-bc220/apa.pp/20160504T0300Z_ukv_dymeaa_pd015.cutout
F adrian.hill 0.01 GBP 282094904 2019-01-24 18:35:09 GMT moose:/crum/u-bc220/apa.pp/20160504T0300Z_ukv_dymeaa_pe015.cutout
……
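To compare the earlier archive with the current 25 GB file, the listing above can be parsed. This is a minimal sketch; the column layout (type, owner, cost, currency, size in bytes, date, time, timezone, URI) is inferred from the lines shown and is an assumption, not a documented MOOSE format.

```python
def parse_moo_ls_line(line):
    """Split one line of `moo ls -lt` output into named fields.

    Column layout inferred from the listing in this ticket (assumption).
    """
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"unexpected moo ls line: {line!r}")
    kind, owner, cost, currency, size, date, time, tz, uri = parts
    return {
        "type": kind,
        "owner": owner,
        "cost": float(cost),
        "currency": currency,
        "size_bytes": int(size),  # assumed to be bytes
        "timestamp": f"{date} {time} {tz}",
        "uri": uri,
    }


listing = """\
F adrian.hill 0.08 GBP 3020656144 2018-11-05 01:36:58 GMT moose:/crum/u-bc220/apa.pp/20160504T0300Z_55m.pp0
F adrian.hill 0.01 GBP 319708560 2019-01-24 18:34:10 GMT moose:/crum/u-bc220/apa.pp/20160504T0300Z_ukv.pp1"""

for line in listing.splitlines():
    rec = parse_moo_ls_line(line)
    print(f"{rec['uri']}: {rec['size_bytes'] / 1e9:.2f} GB")
# moose:/crum/u-bc220/apa.pp/20160504T0300Z_55m.pp0: 3.02 GB
# moose:/crum/u-bc220/apa.pp/20160504T0300Z_ukv.pp1: 0.32 GB
```

On that reading, the previously archived 55m.pp0 was about 3 GB, well below the current 25 GB.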

I'm not sure if these tests help.

Thanks,

Lewis

comment:4 Changed 9 months ago by grenville

Lewis

Request for 1464027307505440089 words, is not supported by a buffer of size 100
This looks like an endianness problem, but that file seems to have disappeared?
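A minimal sketch of why a byte-order mismatch produces absurd word counts like the one in the log: a small 64-bit integer written big-endian and read back little-endian becomes enormous. The helper name is hypothetical, and this does not decode the specific number reported above.

```python
import struct


def reinterpret(value, fmt_from=">q", fmt_to="<q"):
    """Pack an integer with one byte order and unpack it with another."""
    return struct.unpack(fmt_to, struct.pack(fmt_from, value))[0]


# A count of 64 words stored big-endian, misread as little-endian:
print(reinterpret(64))  # 4611686018427387904 (2**62)

# Swapping again recovers the original value:
print(reinterpret(reinterpret(64)))  # 64
```

Trying `reinterpret(1464027307505440089)` on the logged value is a quick check of whether a byte swap gives a plausible size.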

transfer failed (ERROR_TRANSFER): this appears to be a new problem. Please email the Monsoon team; they may have more information.

Grenville

comment:5 Changed 9 months ago by nx902220

Hi Grenville,

Thanks, will do. I'll let you know if the issue is resolved.

Lewis

comment:6 Changed 8 months ago by willie

  • Component changed from Monsoon to Archiving
  • Platform set to Monsoon2
  • UM Version set to 10.5

comment:7 Changed 8 months ago by willie

  • Resolution set to fixed
  • Status changed from new to closed