Opened 3 years ago

Closed 3 years ago

#1957 closed help (fixed)

Extracting data from netCDF files

Reported by: simon.tett Owned by: um_support
Component: Other Keywords: netcdf, extraction
Cc: Platform: ARCHER
UM Version: 8.5

Description

Hi,

I am extracting data from lots of netCDF files on the ARCHER archive. These files are produced using David H's cfa library to convert field files to netCDF. Each netCDF file contains lots of diagnostics, and I am using cdo to extract a single variable from multiple files into one file. This process appears to be quite slow, with the timing information suggesting this is not due to CPU use:

extracting toa_outgoing_longwave_flux from all files to outlw.nc
cdo select: Processed 3317760 values from 28080 variables over 120 timesteps ( 11.76s )

real 5m40.302s
user 0m9.917s
sys 0m1.860s

(In this case I am extracting data from 120 files.) Are there any optimisations I can do to speed this up?

many thanks

Simon
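
The timing output in the description can be read as follows: "real" is wall-clock time, while "user" plus "sys" is CPU time, so a small ratio of the two suggests the job spends most of its time waiting (most likely on I/O). A minimal sketch of that arithmetic is given below; the helper names to_seconds and cpu_fraction are illustrative only and are not part of cdo or any other tool.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '5m40.302s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def cpu_fraction(real, user, system):
    """Fraction of wall-clock time that was spent on the CPU."""
    return (to_seconds(user) + to_seconds(system)) / to_seconds(real)


# Timings quoted in the description for the 120-file cdo run.
fraction = cpu_fraction("5m40.302s", "0m9.917s", "0m1.860s")
print(f"CPU share of wall-clock time: {fraction:.1%}")
```

For the figures quoted above this gives a CPU share of roughly 3.5%, which is consistent with the run not being limited by CPU use.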

Attachments (1)

test_for_stett.py (232 bytes) - added by charles 3 years ago.


Change History (14)

comment:1 Changed 3 years ago by grenville

Simon

Could we try doing this with cf-python as a comparison? Please point us to the files.

Grenville

comment:2 Changed 3 years ago by simon.tett

See /nerc/n02/n02/stett2/archive/xmvpa/apy which contains a bunch of netcdf data.
Simon

comment:3 Changed 3 years ago by grenville

Please change permissions so we can read these.

Grenville

comment:4 Changed 3 years ago by simon.tett

Done!
I think. I have changed both file and directory permissions, though I don't understand the s bit!
Simon

comment:5 Changed 3 years ago by grenville

Not quite there. Please run:

chmod -R g+rX /nerc/n02/n02/stett2/archive/xmvpa

comment:6 Changed 3 years ago by simon.tett

Done!
Simon

comment:7 Changed 3 years ago by grenville

Hi Simon

We are experimenting on files in /nerc/n02/n02/stett2/archive/xmvpa/apy - there were 16 yesterday, but a new one has appeared which we don't have permission to read - could you make it readable for us (we are using wild cards in our tests)

Thanks

Grenville

comment:8 Changed 3 years ago by simon.tett

Hi Grenville,

trouble with giving you a live simulation… I have copied 14 of the files to /nerc/n02/n02/stett2/archive/test

I think those should work for you.

Simon

comment:9 Changed 3 years ago by grenville

Thanks

Changed 3 years ago by charles

comment:10 Changed 3 years ago by charles

Hi Simon,

I have written a Python script that uses cf-python to read in the 14 test files, select toa_outgoing_longwave_flux and write out a single file. The timings for this are:

real 0m23.567s
user 0m13.249s
sys 0m10.029s

I have attached the script to this query. Please modify the paths and filenames and try the script on your files.

Thanks,

Charles
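
The attached script is not reproduced in this ticket. As a rough sketch only, and not the attachment itself, the read, select and write steps described above might look something like this with cf-python. It assumes that cf.read accepts a list of files and a select keyword and that cf.write takes the fields and an output filename; the function name extract_variable and its read/write hooks are hypothetical, added so the logic can be tried without cf-python installed.

```python
def extract_variable(files, name, out_path, read=None, write=None):
    """Read `files`, keep only the field called `name`, write one file.

    `read` and `write` default to cf.read and cf.write (assumed API);
    they are hypothetical hooks for trying the logic without cf-python.
    """
    if read is None or write is None:
        import cf  # cf-python, only needed when no hooks are passed

        read = read or cf.read
        write = write or cf.write

    files = sorted(files)
    if not files:
        raise ValueError("no input files given")

    # Assumption: `select` restricts the result to the named field, and
    # cf-python aggregates matching fields from the separate files.
    fields = read(files, select=name)
    write(fields, out_path)
    return len(files)


# Example call (paths are placeholders, adjust to your own files):
# import glob
# extract_variable(glob.glob("/path/to/test/*.nc"),
#                  "toa_outgoing_longwave_flux", "outlw.nc")
```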

comment:11 Changed 3 years ago by charles

The timings using cdo for the same operation on the 14 test files are:

real 0m31.618s
user 0m1.212s
sys 0m0.160s

Charles

comment:12 Changed 3 years ago by simon.tett

So not a lot in it, and cdo seems to use considerably less user/sys time.
One advantage of the cf solution is that it would be easier to loop over the various things one wants to extract from the netCDF files. (This can be thought of as a monster data transposition.)

Simon
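
The looping idea mentioned above could be sketched along the following lines: read the files once, then write one output file per wanted variable. This is an illustration under assumptions, not a tested recipe. It assumes cf.read returns an iterable of fields, that each field has an identity() method returning its standard name, and that cf.write accepts a list of fields; extract_many, out_template and the read/write hooks are hypothetical names. Whether reading every diagnostic up front is faster than one selective read per variable has not been measured here.

```python
def extract_many(files, names, out_template="{name}.nc", read=None, write=None):
    """Write one netCDF file per entry in `names`, reading `files` once.

    `read` and `write` default to cf.read and cf.write (assumed API);
    they are hypothetical hooks for trying the logic without cf-python.
    """
    if read is None or write is None:
        import cf  # cf-python, only needed when no hooks are passed

        read = read or cf.read
        write = write or cf.write

    fields = read(sorted(files))  # single pass over all input files
    written = {}
    for name in names:
        # Assumption: identity() returns the standard name of a field.
        chosen = [f for f in fields if f.identity() == name]
        if not chosen:
            continue  # variable not present in these files
        out_path = out_template.format(name=name)
        write(chosen, out_path)
        written[name] = out_path
    return written
```

The returned dictionary maps each variable that was found to the file it was written to, so missing variables are easy to spot afterwards.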

comment:13 Changed 3 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed