[ > ]

Guide to the Met Office IDL library

Jonathan Gregory

with grateful acknowledgements to Amoon Osman, Penny Boorman, Ben Booth, Jon Croome, Denise Cresswell, Roger Milton and AJ Watling for text and examples, and to Simon Tett, Mark Webb, Tremain Knight, Mark Hedley and the Met Office AVD (formerly graphics) team for collaboration on development of the library over many years

2nd April 2004, 4th October 2011


  1. Introduction
  2. Start and exit IDL
  3. Keeping a record of a session
  4. Online help
  5. Running an IDL program
  6. IDL programming tips
  7. Procedures and functions
  8. Printing and saving graphical output
  9. PP examples
  10. PP files
  11. PP fields
  12. Missing data
  13. PP input
  14. CF-NetCDF input and output
  15. 3D fields
  16. Time axes, timeseries and day numbers
  17. Making up your own PP fields
  18. PP output
  19. Graphics routines
  20. Colour
  21. Mathematical computations on fields
  22. Collapsing dimensions
  23. Statistics
  24. Some other PP processing operations
  25. Processing a lot of data
  26. Organisation of the Met Office library
  27. Met Office system variables

Appendices

  1. Alphabetical list of Met Office IDL routines
  2. Categorised list of core Met Office IDL routines
  3. Differences between PV-WAVE and IDL
  4. Definition of PP header record
  5. List of PP field codes
  6. Description of metasplit/stashsplit directories
  7. Using LISTs, ASARRs and DC_READ_ Routines in IDL7.x
  8. Support for very large PP files (over 2 Gbyte)

Download PV-Wave/IDL Library subject to licence agreement


[ < ] [ > ] Contents

Introduction


What are IDL and PV-WAVE?

IDL is a powerful programming language, supporting concepts familiar from other languages such as Fortran and C (variables of different data types, flow control with if-statements, do-loops, etc.). It was designed for data analysis and visualisation.

Unlike Fortran and C, IDL doesn't have to be compiled as a separate operation before being run. That has the advantage that you can use IDL interactively at the command line, as well as writing programs in the language. It's a bit like a Unix shell: the commands that you type interactively at the Unix command line can also be used in a Unix shell script (a program), and you don't have to compile the script before running it. This approach has the drawback that IDL is slower than properly compiled and optimised languages like Fortran and C.

PV-WAVE is a similar language to IDL. IDL is the original. The firm which originally produced PV-WAVE bought the rights to develop the language. Since then the two languages have diverged, but they are still very similar. Both languages are proprietary, meaning you have to buy a licence to use them. The Met Office used PV-WAVE throughout the 1990s, and migrated to IDL during the 2000s. In the academic community IDL is much more common than PV-WAVE.

What is the Met Office library?

The Met Office library is a collection of routines written in IDL at the Met Office, originally written in PV-WAVE in the Hadley Centre in the early 1990s, with development continuing to the present day. Its main purpose is to support analysis and visualisation of PP data produced by the Unified Model (UM). PP is a binary file format peculiar to the Met Office and the UM. It's fairly self-describing but not as easy to use or as flexible as netCDF. Support for climate data stored in netCDF following the CF convention is available in the Met Office library, although we don't expect that the UM itself will be altered soon to write out netCDF data. But being able to read netCDF obviously makes the library more generally useful, because of the large number of model intercomparison projects, such as CMIP, and observational datasets for which CF-netCDF is the data format.

The best way to find out what the Met Office library offers is to look through the Appendices containing alphabetical and categorised lists of its contents. It contains routines for the following major functional areas:

The library also includes a fairly large number of general-purpose (not PP-specific) routines which may be useful in a wider range of applications. The library is also called the "UKMO" library. This is a non-brand-compliant acronym for the organisation formerly known as the UK Meteorological Office.

The library is organised into a "core" (approximately 520 routines and 66k lines of code), maintained by the Met Office AVD team, and the Hadley Centre "applications" (another 230 routines and 35k lines), which are not maintained in principle. However the distinction is somewhat blurred.

The library works in PV-WAVE as well as IDL. To some extent it hides the differences between the two, and hence makes user programs more portable. In the Met Office, IDL is usually run with local modifications that add to and in some cases override the native language, in order that PV-WAVE programs can run without modification. The most important of these is the implementation of lists and associative arrays in IDL versions before 8.0; this extension to the IDL language is needed to support a couple of Met Office library routines and is therefore required outside the Met Office as well. Apart from this, native IDL is always used outside the Met Office. In IDL 8.0, lists and associative arrays (hashes) were introduced as part of the native language, so this Met Office language extension is no longer required.

The library can be made available to institutes having collaborations with the Met Office or analysing data from the UM. If you would like to obtain the library for use outside the Met Office, please follow the link to download it at the top of this guide. It is free of charge but there is a licence agreement.

What is the purpose of this document?

This document is designed to help a new user with little programming experience quickly learn how to use the Met Office library. It is not an introduction to IDL, although the first few sections provide some basic information. To learn the language, read the online manuals, which are comprehensive, consult the NCAS-CMS IDL guide, or seek help from colleagues! The Met Office AVD team provides an introductory course to IDL for Met Office staff.

Please send comments about this document to j.m.gregory at reading.ac.uk.

Conventions in this document

Throughout this guide, $ indicates the (Unix/Linux) operating system prompt and IDL> the IDL or PV-WAVE prompt. If the languages are different, we use WAVE> instead for PV-WAVE. We use typewriter type for program text to be typed exactly as it appears. When we refer to "IDL" it means both IDL and PV-WAVE unless otherwise stated.



[ < ] [ > ] Contents

Start and exit IDL


To start IDL or PV-WAVE, you may require special instructions for your place of work.

IDL in the Met Office IDL elsewhere
The environment for an IDL session is already established on Met Office Linux workstations. Use

$ tidl
to start IDL. The tidl command has various other options. Enter tidl -help to see them.
To start IDL with the setup required for the Met Office library, at the Unix prompt enter
$ midl
(for Met Office IDL) or some local equivalent. You might need to use an xterm window for the up- and down-arrow keys to work. Option -d will start IDL in 64-bit mode if your operating system supports it.

In Reading you need to include setup idl cgam midl in your .kshrc to make the Met Office software available. (Note that the licence is for NCAS-Climate at Reading and the Met. Department in general.)

If you want to access the English translations of stash codes (this is very useful for automatic labelling of plots, etc.) you should define the UM version you want in your Unix environment e.g.

$ export UMVERSION=4.5
before starting IDL.

At the IDL prompt, use the up-arrow key to recall the previous line entered, the down-arrow the next line in the history, -A to go the beginning of the current line, -E to the end of it, -K to erase the text from the cursor to the end of the line (these three are familiar to users of emacs), and ^string to recall the most recent line containing the string. In IDL, the system variable !edit_input defines how many previous command lines are remembered. Many other keys have special functions. Use WAVE> info,/keys or IDL> help,/keys to find out more.

In both IDL and PV-WAVE, when you start up the Met Office environment an extra startup file is executed before your own, if you have one. The UKMO startup file is $UKMO_IDL_STARTUP in the Met Office, and $UKMO_WAVE_STARTUP elsewhere (even for IDL). Your own startup file is named by the Unix environment variable IDL_STARTUP in all cases for IDL (or WAVE_STARTUP if you are using PV-WAVE).

The UKMO setup defines various other environment variables. IU_LIB is the location of the core UKMO library. In the Met Office, IDL_PP_PATH, IDL_DATA_PATH and IDL_SHARED_PATH list directories to be searched by various functions. These environment variables are called WAVE_PP_PATH, WAVE_DATA_PATH and WAVE_SHARED_PATH outside the Met Office (even for IDL). They are copied to corresponding Met Office library system variables. The IDL_PATH/WAVE_PATH in IDL and PV-WAVE respectively are used to locate IDL/PV-WAVE commands.

At the IDL> prompt you can enter IDL commands line by line at the prompt or you can use a program file. A program file is a file containing a set of IDL commands, which IDL executes line by line. It can be written either directly to a file whose name ends in .pro, or by using journal to record a session.

To end a session, type:

IDL> exit

or -D or (in PV-WAVE, but not in IDL)

WAVE> quit


[ < ] [ > ] Contents

Keeping a record of a session


To record all or part of an IDL session

Use

IDL> journal,'filename'

which saves all following commands to file filename or, if you omit the filename, it will save it to a file called wavesave.pro/idlsave.pro in the current directory. Enter

IDL> journal

to end the recording

The result is a file that contains a complete description of the IDL session, which can be rerun later as a @-command file, or as a compiled main program if you put an end statement at the end of it.

To save the definitions and values of variables

Use

IDL>SAVE, Filename = 'filename',var1,var2,/Verbose

This will save the variables into the file filename. If Filename = 'filename' is omitted, the default filename is wavesave.dat/idlsave.dat). To save all current variables, use the keyword /Variables and don't specify any by name. This is useful if you want to stop a session and resume it later.

To restore the saved variables

Enter

IDL>RESTORE, Filename = 'filename',/Verbose

The optional /Verbose keyword makes IDL give useful information about what it is saving or restoring.


[ < ] [ > ] Contents

Online help


To access IDL's online manuals enter:
IDL> ?

For help on Met Office (UKMO) routines e.g. for the routine pp_diff:

$ tidlman pp_diff # in the Met Office
$ midlman pp_diff # outside the Met Office
IDL>man,'pp_diff'
(note the quotation marks in IDL). If there is no match, you will get a list of all routines whose name or one-line description contains the specified string. This "keyword" matching can be deliberately selected with the /key keyword e.g.
$ midlman -k zonal
IDL>man,'zonal',/k

To get information about the IDL session, use the command INFO in PV-WAVE or HELP in IDL, which have many similar options. Given variable names, INFO/HELP reports their type.

IDL> a=2
IDL> help,a
A INT = 2
IDL> b=2.0
XXX> help,b
B FLOAT = 2.00000
One other nice use of INFO/HELP is that it will evaluate expressions:
IDL> help, 2 * 5 + (10 * COS(45))
FLOAT = 15.2532
PRINT would tell you the result, but INFO/HELP gives you the data type of the result as well.

If you are in the Met Office, for further help you can contact the AVD Team who are responsible for supporting IDL. They run an introductory course on IDL and maintain documentation on MetNet. There are Met Office netnews groups concerned with graphics.


[ < ] [ > ] Contents

Running an IDL program


Programs (also called routines) written in IDL and stored in files can be run by several different methods. This section applies to main programs i.e. those which don't start with a pro or function statement. (See Procedures and functions for routines other than main programs.)

Specifying which file you want to run

If you give a name with no / in it, IDL will look for a file of that name, optionally with the suffix .pro, in the colon-separated list of directories in your WAVE_PATH/IDL_PATH Unix environment variable. (The path is also stored in the IDL system variable !path.) This is the same idea as searching for Unix commands in the Unix PATH. The first file of the right name is used.

In IDL, unlike in Unix, the current directory is always assumed to be at the start of the path. In IDL (not in PV-WAVE), a directory in the path can be prefixed with +. This means all subdirectories of that directory will also be searched.

You can modify the path by changing the WAVE_PATH/IDL_PATH environment variable, for instance in your shell startup file (.dtprofile, .bashrc, .profile or whatever).

Compiled programs

To improve run-time, IDL does a kind of "compilation" of a program. PV-WAVE (but not IDL) also has an optional facility to store the compiled versions in separate files (suffix .cpr). This saves the (small) compilation time, but doesn't make the routines run any faster.

If you give a filename by itself on the command line:

IDL> filename

IDL will compile the file and run it. For this to work, the file must contain a program concluding with an end statement. Such files can also be invoked with

IDL> .run filename

In both cases, IDL compiles the file into memory, and then runs it provided it's a main program. You need the .run command to recompile a program if you alter the program file because otherwise invoking the program again just by name will execute the existing version from memory, not the new version in the file. You can use the .com command to compile a program, but not run it.

Scripts (not compiled)

If you give a filename like this:

IDL> @filename

IDL will not compile the file or store it in memory. This method is just as if you had typed in the lines of the file one by one at the command line. Files of this kind must not have an end statement. Moreover, they can't contain block constructions (if-blocks, etc.) spanning several lines unless all these lines are joined up with & $ to make them effectively one long line. This is because the same restriction applies to lines entered at the IDL prompt; all block constructions must be completed e.g.

IDL> fish=1 & if fish then begin & print,fish & print,'head' & endif
1
head

How do you decide which to use? The first (compiled) method is necessary if you want to write procedures (subroutines) or functions. It also has the advantage of supporting block constructions properly. The second (not compiled) is convenient for a script of a few lines. In either method, variables defined by the program will still be available at the command line when the program finishes.

Invoking a routine from the Unix command line

Yet another method is to specify a file to the IDL command in Unix:

$ midl filename

This will execute the file indicated (no paths are searched) like an @-file.


[ < ] [ > ] Contents

IDL programming tips


Some programming tips for PV-WAVE and IDL.


[ < ] [ > ] Contents

Procedures and functions


Procedures and functions are an integral part of the structure of IDL. Almost all the routines in the Met Office library are procedures and functions. They encourage a modularised and flexible approach to programming.

Here is an example of simple IDL procedures and functions:

pro doubleanumber,argument
argument=argument*2
end
IDL> number=46
IDL> doubleanumber,number
IDL> help,number ; info in PV-WAVE
NUMBER INT = 92
function halveanumber,argument
result=argument/2.0
return,result
end
IDL> number=92
IDL> help,number,halveanumber(number) ; info in PV-WAVE
NUMBER INT = 92
FLOAT = 46.0000

As the procedure example shows, variables which are passed in as arguments (number in the example) can have their values modified in the calling routine (it changes from 46 to 92). This is like Fortran. A difference from Fortran is that when elements of arrays (not entire arrays) are passed in as arguments, a procedure or function can't alter their values in the calling routine. For example,

  IDL> array=[3,4,5]
IDL> doubleanumber,array(1)
IDL> print,array(1) ; value has not been changed
4
IDL> doubleanumber,array
IDL> print,array ; values have been doubled
6 8 10

A single program file can contain more than one routine. If any of the routines in the file has the same name as the file, then invoking that routine will cause the all the routines in the file to be compiled (as for compiled main programs). The file can be (re)compiled using the .run command. Confusingly, .run does not execute any procedure or function; it just compiles them into memory.

If an error occurs in a routine, program control stops at the line in the source file where the error occurred by default. This means IDL is "inside" the routine still, and you can examine its local variables. Putting in a stop will also allow you do that. You can only compile a routine when IDL is at the "main level" i.e. not still "inside" any routine. To get back to the main level, enter retall in IDL. You don't have to do this in IDL because .run does retall automatically. The Met Office routines change the default to so that control goes automatically back to the main level in case of a failure. Error-trapping is controlled by the on_error statement, which is included in the comm_error @-file appearing at the start of all the UKMO routines.

Invoking a file with @ is like an "include" file in Fortran or C. This construction is used in program files to "include" blocks of lines which are common to many program files. Such files may contain whatever would compile correctly if it was explicitly typed into the program file in which it is included. Unlike when files are invoked from the command line with @, block constructions may span several lines, because the lines are going to be compiled rather than executed individually.


[ < ] [ > ] Contents

Printing and saving graphical output


IDL has a notion of a "current graphics device", which is where the pictures go. This might be your terminal (the "X" device) or some kind of file for printing (usually the PostScript device "PS"). The IDL command for changing device is set_plot.

However, the UKMO library contains more powerful routines pr for changing from X output to file, and prend for changing back again. In addition to changing the device, these also set up the hardcopy page size properly and swap foreground and background colours when necessary (white on black for screen, black on white for paper). If a printer is named, pr sets up the device in an appropriate way. (This depends on a control file having been created at your site naming the various printers.)

Examples of pr:

  IDL> pr,/ps,/land ; print to the default PostScript printer in landscape
IDL> pr,/trans,d='harold' ; print to harold on transparency (in portrait)
IDL> pr,/eps,/col ; print to a colour encapsulated PostScript file

The default monochrome PostScript printer can be defined through the Unix environment variable PSPRINTER, and the default colour PS printer as CPSPRINTER.

After changing to a printer device, you should reload the colour table if you are using one, as the printer may have a different number of colours from the X device.

Then execute the command to draw the picture e.g.

  IDL> plot,indgen(10),0.5*indgen(10)

The graphics will be written a file named wave.ps, wave.eps, etc. in the current directory.

Examples of prend:

  IDL> prend ; print
IDL> prend,/view ; preview the result before deciding whether to print it
IDL> prend,/keep ; don't delete it after printing, which is done by default
IDL> devend ; abandon the plot and go back to X without printing

Other routines for graphical output include image_grab to capture an X window as a GIF, and gif4web for creating an animated GIF from a set of single-frame GIFs or pp_contour image files.

See elsewhere for problems with printing pp_contour plots on colour PostScript printers.


[ < ] [ > ] Contents

PP examples


Here's a simple example of reading a PP field from a PP file and making a latitude-longitude plot of it. Following sections will explain the commands used in more detail.

IDL> eg=pp_assoc('$WU_DATA/example') ; eg is a "handle" to the file
IDL> temp=ppa(eg,ss(atmos=3236)) ; get a temperature file (stash code 3236)
% % PPALIST: File .../example.pp contains 2 fields
IDL> tek_color ; Load a colour table
IDL> ; Make a "block" plot, in which each gridbox is shown as a coloured
IDL> ; rectangle. If you omit /block, you'll get contour lines instead.
IDL> pp_contour,temp,base=-180,/block,title='Surface air temperature (K)'

The following version of the program makes a "prettier" plot as a colour PostScript file.

IDL> eg=pp_assoc('$WU_DATA/example')
IDL> temp=ppa(eg,ss(atmos=3236))
IDL> pr,/col,/ps,/land ; set output to landscape colour PostScript file
IDL> restore_colors,'shortbow' ; load a rainbow colour table
% RESTORE_COLOURS: Loading .../shortbow.clr
% RESTORE_COLOURS: Interpolating to 256 colours from 212 red 212 green 212 blue
IDL> ; Make a contour plot with colour-filled contours, with levels chosen
IDL> ; at 10 K intervals and colours "interpolated" across the whole
IDL> ; spectrum of the colour table
IDL> pp_contour,temp,base=-180,/fill,title='Surface air temperature (K)',$
IDL> bstyle=style(/interp),charsize=1.2,tick_space=45,interval=10
IDL> prend,/view ; close PostScript file and view result.
IDL> ; when you're ready, Quit ghostview/gv from its File menu
OK to print? (Y/N):

If you enter Y, the file is printed.

Here's another example, which makes a zonal-mean precipitation plot. Before starting IDL, enter export UMVERSION=4.5 in Unix, to specify which version of the UM the data comes from. (The example file contains HadCM3 data from UM version 4.5. UMVERSION could be set to a later version than 4.5.)

IDL> eg=pp_assoc('$WU_DATA/example')
IDL> stashlist,eg ; see what quantities are in the file
Submodel Stash Description
ATMOS 3236 temperature at 1.5m
temperature at 1.5m (gbm)
ATMOS 5216 total precipitation rate kg/m2/s
IDL> ppn=ppa(eg,ss(at=5216)) ; get precipitation field, in kg m-2 s-1
IDL> ppn=pp_ff('a*86400',ppn) ; convert from kg m-2 s-1 to mm day-1
IDL> pp_zonal_mean,ppn ; make zonal-mean plot


[ < ] [ > ] Contents

PP files


The Unified Model writes out its data in the PP file format. There are two kinds of PP file:

Conversion from the first to the second kind is done automatically by the UM running at the Met Office, before PP files are archived. Hence at the Met Office you are only likely to encounter the second kind of PP file.

The NCAS-CMS UM setup usually archives PP files of the first kind. Jeff Cole's (NCAS-CMS) program xconv can read both kinds. Jeff also has a program ff2pp to convert from the first to the second kind (in ~jeff/bin in Reading).

To read the first (or only) PP field out of a PP file, you can use the UKMO IDL command ppa:

      IDL> z=ppa('filename')

The variable z (you can call it what you like, of course) now contains the PP field, both its header and its data. There are more versatile methods for reading PP data; this is the simplest.

If the filename contains no / e.g. aaxzca.pmj1sep.pp, IDL looks for a file of that name in the directories named by the colon-separated list in the IDL system variable !pp_path. This variable is set from the Unix environment variable WAVE_PP_PATH on startup. Hence you can change it to suit yourself in your Unix setup.

At the Met Office, some PP files are archived with packing of the PP field data, to save space in MASS. By default, packed PP fields are automatically unpacked by ppa; this behaviour is controlled by the !ppa_acceptpacked system variable, as documented by man,'ppa'. Alternatively you may create unpacked versions of the files, using the Unix command unpack in the Met Office, installed as ppunpack at Reading (setup cgam to obtain access to it). Read its Unix man page for details.

Special facilities have to be enabled for PP input from files which exceed about 2 Gbyte. This sounds a lot for a single file, but it's not unlikely for output from high-resolution models. Support for such "large files" is controlled by the !pp_largefile_extend system variable, as described by an Appendix.


[ < ] [ > ] Contents

PP fields


In the UKMO software, a PP field is an IDL "structure" variable. This is like a Fortran 90 or C structure. The structure variable is composed of many elements, which are mostly (long) integers and floating-point numbers. Each element has a "tag name" to identify it within the structure.

  IDL> help,z ; info in PV-WAVE instead of help

will list all the elements in the structure (PP field) z. The first, for instance, has the tag name LBYR. This element is the year to which the field applies, or if it's a time mean, the year in which the mean starts.

  IDL> print, z.lbyr, z.lbyrd

prints the start and end years of a time-mean.

The data of the PP field is in the element called DATA. This is a two-dimensional floating-point array e.g.

  IDL> help, z.data ; info in PV-WAVE instead of help
FLOAT = Array(96, 73)

As with any array, you can subscript the data array e.g.

  IDL> print, z.data(*,2)

to print the 3rd row. In IDL, array subscripts start from 0 (like C, unlike Fortran). The dimensions of the array are in the elements lbnpt and lbrow of the structure.

  IDL> print,z.lbnpt,z.lbrow
96 73

PP fields are always two-dimensional in structure. The commonest kind are longitude-latitude fields, but there are also cross-sections e.g. latitude-height, latitude-time (often called Hovmüller), time-depth. When a field is really one-dimensional in intent, such as a single timeseries (a function of time only) or a zonal-mean field (a function of latitude only), it nonetheless has to be stored in a 2D PP field structure, with one of the dimensions of the data array having a size of one, usually the x-dimension with lbnpt=1.

The information in the data element comes from the data record of the PP field in the PP file. All the other information comes from the preceding header record in the file (except that if the field has irregularly spaced x- and y-coordinates these come from the "extra data" in the data record, rather than the header). It is a fundamental feature of the UKMO library that the header and data are "carried around" together in PP structures, so that the data is always identified by its header "metadata" (data about data). UKMO routines that produce new fields by combining fields also modify the metadata so that the new field describes itself properly. The UKMO routines often work out how to process fields by looking at their metadata (e.g. to produce the title of a contour plot). It is essential that the metadata data is correct.

A summary description of the field can be obtained with

  IDL> print,pp_guess_title(z)

(This is what pp_contour uses for its titles.)

You can see all of the PP field metadata conveniently with the command printfdr e.g.

  IDL> printfdr,z
LBYR: 1860 LBMON: 9 LBDAT: 1 LBHR: 0 LBMIN: 0 LBDAY: 241 LBYRD: 1990 LBMOND: 9
LBDATD: 1 LBHRD: 0 LBMIND: 0 LBDAYD: 241 LBTIM: 22 LBFT: 43200 LBLREC: 7008
LBCODE: 1 LBHEM: 0 LBROW: 73 LBNPT: 96 LBEXT: 0 LBPACK: 0 LBREL: 2 LBFC: 106
LBCFC: 0 LBPROC: 128 LBVC: 129 LBRVC: 0 LBEXP: 0 LBEGIN: 2000 LBNREC: 0
LBPROJ: 870 LBTYP: 120 LBLEV: 9999 LBRSVD: 0 0 0 0 LBSRCE: 1111
LBUSER: 1 823296 0 8208 0 0 1 BRSVD: 0 0 0 0 BDATUM: 0 BACC: 0 BLEV: 0 BRLEV: 0
BHLEV: 0 BHRLEV: 0 BPLAT: 90 BPLON: 0 BGOR: 0 BZY: 92.5 BDY: -2.5 BZX: -3.75
BDX: 3.75 BMDI: -1e+30 BMKS: 1

or you can use WAVE> info,z,/struct or IDL> help,z,/struct. The meanings of the header elements are described in detail in an Appendix. They start with LB (integers) and B (reals) because that is how they were named by the Fortran 77 PP package, which is no longer in regular use.

Of particular interest are the UM stash code field.lbuser(3) and submodel ID field.lbuser(6). These elements indicate what quantity the field contains (temperature, pressure, etc.). To look up translations for them, you can use

  IDL> stash,z
Submodel Stash FC Description
ATMOS 8208 106 soil moisture content

The "PP field code" (in the column "FC") is an older set of codes. Each stash code implies a particular field code. However, there are field codes for quantities which don't have stash codes assigned, because they aren't generated by the UM. There is a list in an Appendix.

Other important routines for getting information from PP fields are pp_coords for working out the x- and y-coordinates (often longitude and latitude) of the grid points, pp_level for the vertical coordinate (hybrid height, hybrid sigma-pressure, ocean depth, etc.), pp_day for the date and time encoded as a Julian day number, pp_period for the length of the meaning period (e.g. a month, a year, a decade), decode_expt(field.lbexp) for the UM runid.


[ < ] [ > ] Contents

Missing data


Missing data in PP fields

The data array of a PP field can indicate points where data is "missing". When the field is plotted with pp_contour, for instance, areas of missing data are left blank, and when statistics are calculated, missing data values are omitted. Missing data in a PP field is indicated by a data value equal to the bmdi element of the field structure. Hence

  IDL> missing=where(field.data eq field.bmdi,count) ; can use pp_mdi for this
IDL> if count ne 0 then field.data(missing)=0

gets a vector of indices to the missing points, and then replaces missing data with zero. The count test is to avoid an error if there are no missing data points. Likewise

  IDL> present=where(field.data ne field.bmdi,count) ; can use pp_mdi for this
IDL> if count ne 0 then data=field.data(present)

fetches the non-missing values into the 1D array data. The function fv will do this operation:

  IDL> data=fv(field)

is equivalent.

If you write programs that manipulate missing data values, it is essential to refer to bmdi to get the missing data indicator and not assume it has a particular value! This is because it does not always have the same value. The default for fields created by the UKMO library is different from the UM default, for instance.

If operating on a vector of fields, note that you can't subscript a 3D data array. If field is a vector, the above example won't work. You have to loop over the individual fields.

Missing data tolerance

Some UKMO PP statistics routines have a keyword mdtolerance which affects their treatment of missing data. For instance, if fields is a vector of PP fields, then pp_avg(fields) returns a PP field in which each data value is the average of the corresponding points in all the fields. If any of the values at a particular point is missing, missing data is returned in the result field by default. This is completely "intolerant" of missing data. But pp_avg(fields,/mdtolerance) (equivalent to mdtolerance=1) returns a result wherever at least one of the input fields is not missing i.e. as tolerant as possible. Intermediate values such as mdtolerance=0.5 specify the maximum proportion of fields which can have missing data at a point if a non-missing result is to be returned.

Non-PP routines supporting missing data

The UKMO library provides routines avgmdi, stdevmdi and plotmdi, corresponding to avg/sum, stdev and plot, which respectively calculate average/sum and standard deviation and plot ordinary IDL arrays of data in which there may be missing data. avgmdi and stdevmdi support the mdtolerance keyword and plotmdi leaves gaps in plots where there is missing data. They all permit the missing data indicator to be specified, with a default set by the !pp system variable.


[ < ] [ > ] Contents

PP input


If a PP file contains many PP fields (as it usually does) you need to specify which field you want. The best way to do this is by using ss to generate a "search expression" for ppa e.g.

  IDL> sat=ppa('aaxzca.pmj1sep',ss(atmos=3236))

extracts the PP field which has atmosphere stash code 3236 (1.5 m air temperature).

ss has many keywords to specify the fields in various ways. If more than one keyword is given, all the conditions apply together (logical AND). Some keywords can accept a vector (1D array) of values. These values are accepted as alternatives (logical OR). For instance

  ss(period=1,fromdt=1991,todt=1998) ; annual means starting in years 1991-1997
ss(ocean=[101,102],lblev=[1,2]) ; level 1 or 2 for ocean stashcode 101 or 102
ss(mon=[7,8],proc=128) ; time-mean (proc=128) from July or August

If you are accessing the same file for many different variables, it is usual first to obtain a "handle" to it with pp_assoc

  IDL> sep=pp_assoc('aaxzca.pmj1sep')

then get fields from it by using the handle to refer to the file

  IDL> sat=ppa(sep,ss(at=3236))

In fact the one-step command above is equivalent to sat=ppa(pp_assoc('aaxzca.pmj1sep'),ss(atmos=3236)). The name of the variable which stores the handle is up to you. Its actual contents are also not of interest. Its only function is to be used to refer to the file when you want to get PP fields out of it.

You can conveniently see what is in the file using the command stashlist, which will produce a list of all the different stashcode/submodel combinations in the file, with their English descriptions e.g.

  IDL> stashlist,sep
Submodel Stash Description
ATMOS 1 pstar after timestep
ATMOS 2 u compnt of wind after timestep
ATMOS 3 v compnt of wind after timestep
ATMOS 4 theta after timestep
...

In order for this to work, you must have defined the UM version in the Unix environment variable $UMVERSION before starting IDL. stashlist,/dt will tell you all the distinct dates of fields in the file e.g.

  IDL> da=pp_assoc('abbda.decades')
IDL> stashlist,da,/dt
y m d h m
1859 12 1 0 0
1869 12 1 0 0
1879 12 1 0 0

this file contains fields dated 1 Dec 1859, 1869 and 1879. stashlist,/period lists the meaning periods e.g.

  IDL> stashlist,da,/pe
y m d h m
10 0 0 0 0

shows all the fields are decadal means (no other meaning period is shown). stashlist,/level lists the vertical level coordinates used in the file. This is most useful when specifying a particular quantity e.g.

  IDL> stashlist,da,ss(ocean=101),/lev

to list the depth coordinates of the fields of ocean stashcode 101 (potential temperature). The output shows the vertical field code (see under LBVC in the Appendix) for the field as well as its coordinate value.

If you have several PP files and a handle for each of them, you can then search them simultaneously with ppa e.g.

  IDL> salinity=ppa([file1,file2,file3],ss(fc=602))

where file1 etc. are the handles returned by pp_assoc.

Data from long UM experiments is often stored (especially in the Hadley Centre) in "metasplit" or "stashsplit" directories, which contain fields from many UM PP files. The way in which these work is described in an Appendix. Data retrieved by query_masscam is delivered in this form. The UKMO routines regard a metasplit directory as a single "pool" of data, and you get fields from it just as though it were a single PP file. First, obtain a handle to the directory e.g.

  IDL> annual=ss_assoc('aaxzc.000100') ; not pp_assoc

Like ppa and pp_assoc, ss_assoc searches the !pp_path for the directory if there is no / in the name given. The handle annual can then be used as an input to ppa just as if it pointed to a single PP file. stashlist can be used just as for a single PP file to examine the contents of the directory.

In the Hadley Centre, another way to get access to data is by using the Camelot Dataline database (it is intended to replace this with a more reliable MASS cache arrangement) to find it for you e.g.

  IDL> annual=camassoc('aaxzc',period=1)

returns a handle which points to all the annual-mean data from aaxzc online which Dataline knows about. Again, it "looks" like a single PP file to ppa.

Apart from handles, another kind of pointer to fields is a PP header variable. This mechanism is used a lot internally in the UKMO routines. For example,

  IDL> annual=ss_assoc('aaxzc.000100')
IDL> headers=ppa(annual,/all,/header)
IDL> field=ppa(headers,ss(at=3236,yr=1991))

The headers variable is an array of PP headers, which are structures just like PP fields but with no data element. They know which PP file they refer to, so PP headers instead of a handle to the data source can be given as input to ppa.


[ < ] [ > ] Contents

CF-NetCDF input and output


NetCDF is becoming increasingly common as a file format for exchange of climate data. Its advantages are that it is portable (i.e. machine-independent) and supported by a freely available library of I/O routines with interfaces in many programming languages. NetCDF is a very adaptable format, and its authors intended that users would adopt conventions for particular applications. One such convention is COARDS, developed in 1995 for "the sharing and distribution of global atmospheric and oceanographic research data sets." A more recent one, which extends COARDS, is CF. CF has been adopted by several climate centres and projects, such as PCMDI and CMIP. "The purpose of the CF conventions is to require conforming datasets to contain sufficient metadata that they are self-describing in the sense that each variable in the file has an associated description of what it represents, including physical units if appropriate, and that each value can be located in space (relative to earth-based coordinates) and time."

The Met Office IDL library can read netCDF files conforming to CF. Since the library was designed for PP fields, the interface to netCDF files makes their contents look like PP fields, just as if they were PP files. To access a netCDF file, open it with ncassoc e.g.

  IDL> handle=ncassoc('$DATADIR/xsect.nc')

The suffix .nc is conventional for netCDF files, but not mandatory, and ncassoc does not assume it. Like pp_assoc and ss_assoc, ncassoc searches the !pp_path for the directory if there is no / in the name given. The handle it returns can be used as an input to ppa, stashlist, and any other routine which will accept a handle to PP fields. This handle is, in fact, a vector of PP headers.

The data in CF-netCDF files is stored in variables which are identified by names such as air_temperature and precipitation_flux, stored in standard_name or long_name attributes. In order to identify the data as PP fields, these names have to be translated to stash codes or field codes. Although CF does not assign any significance to the names of netCDF variables, as a last resort ncassoc will attempt to match the variable names to stash codes or field codes, if it can't find an equivalent for the standard name or long name. The translation is done using the routine codename. The translation tables are in files identified by the system variable !codenamefile, and the default setting offers translations for quite a wide range of stash codes. If no translation is available for a particular name in your file, ncassoc will give a warning, such as No field code or stash code translation possible for variable xyz with standard_name="northward_ocean_freshwater_transport" and long_name="MEAD DIAGNOSTICS: SALINITY KG/S". This does not mean that ncassoc has failed, but the fields it constructs will have a stash code and field code of zero. You can remedy this by supplying your own translation table to supplement the standard one. See the online help for codename for instructions on doing this.

If ncassoc fails to interpret a netCDF file or misinterprets it, this might be a bug, but it is more likely that the netCDF file is not CF-compliant or contains some error. Diagnosing or fixing these problems may take some familiarity with the conventions. You could first try verifying the file's correctness using the CF checker on the CF home page. If you have to alter the file, this can be done by converting it to plain text with the Unix command ncdump, editing the text, and regenerating the binary netCDF file with ncgen. Alternatively you can perform many useful operations on netCDF files using the NCO utilities.

A netCDF variable can have more than two dimensions; hence it is likely that it might have three (longitude, latitude, vertical or time) or four. The library will interpret such a variable as a 1D or 2D array of PP fields. In the situation where both longitude and latitude are single-valued, which would be the case for an area average, it would be more convenient to have a single timeseries PP field than a vector of PP fields with one data value each. The routine pp_serialise will convert a vector of 1x1 PP fields for a series of times into a single PP timeseries field.

You can use the Met Office IDL library to write CF-netCDF files containing a single data variable. That means the file can have only one quantity in it (characterised by stash code, field code and LBPROC), but this single data variable can contain many PP fields, for different vertical levels and times, if they logically constitute a 3D or 4D field (provided all the fields have the same grid). To write out data, use pp_cfwrite e.g.

IDL> help,z101 ; info in PV-WAVE instead of help
Z101            STRUCT    = -> PP2V5403N288R144XY Array[20]
IDL> stashlist,z101
Submodel Stash Description
   OCEAN   101 potential temperature (ocean) deg.c
IDL> pp_cfwrite,'$HOME/thetao.nc',z101
creates a file with a single data variable containing one 3D field, where z101 is an vector of ocean potential temperature PP fields. The stash code (101 in this case) is used to choose the CF standard name for the variable (sea_water_potential_temperature). The input to pp_cfwrite can be PP headers, rather than PP fields in memory, e.g.
IDL> pp_cfwrite,'$DATADIR/tas.nc',ppa(source,ss(at=3236),/all,/header)
will write out all surface air temperature fields from the PP input source to the CF-netCDF file. In order to do this, the PP fields are read into memory only one at a time. Hence, this operation can be performed for arbitrarily large data variables without exhausting IDL memory.


[ < ] [ > ] Contents

3D fields


A 3D field, such as temperature(longitude,latitude,height), cannot be contained in a single PP field. Instead, it is stored as a 1D vector of PP fields, each being 2D and having a vertical coordinate.

You can read in more than one field at a time with ppa. The commonest need for this is to fetch an entire 3D field e.g.

  IDL> q=ppa(handle,ss(atmos=10,yr=1992),/all,/sort)

fetches all the atmospheric specific humidity fields for 1992 into the variable q, which is a 1D array of PP fields. The /sort keyword is optional, but recommended. It makes sure the PP fields are put in the expected order of levels, which might not be the case if they come from several PP files or metasplit directories.

  IDL> help,q ; info in PV-WAVE
Q STRUCT = -> PP2V8N96R73XY Array[19]
IDL> print,pp_level(q) ; hybrid sigma-pressure coordinate (for old dynamics)
0.996999 0.974956 0.930417 0.869832 0.792228 0.699574
0.599503 0.504522 0.422103 0.354698 0.299752 0.249702
0.199627 0.149501 0.0992468 0.0568542 0.0295943 0.0147972
0.00460588
IDL> print,q.lblev ; model level number
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19

When a 3D field is in memory as a 1D array of (2D) PP fields, you can subscript the array to get a single PP field e.g.

  IDL> pp_contour,q(3) ; to plot model level 4, if they're in order as above

To read in a single one of the levels, you can specify the model level number

  IDL> q=ppa(handle,ss(atmos=10,yr=1992,lblev=4)) ; to get level 4

or you can specify the vertical coordinate. You don't have to be exactly accurate:

  IDL> q=ppa(handle,ss(atmos=10,yr=1992,lev=0.87)) ; to get sigma of about 0.87
IDL> print,pp_level(q)
0.869832
IDL> q=ppa(handle,ss(atmos=10,yr=1992,lev=0.9)) ; not accurate enough
% PPA: No such field(s)

You might also read in fields for many times at once e.g.

  IDL> t=ppa(handle,ss(atmos=3236),/all)
IDL> print,t.lbyr
2099 2109 2119 2129 2139 2149
2159 2169 2179 2199 2209 2219

to fetch all the surface air temperature fields. However, beware trying to read in too much data at once! If the ppa command tries to read hundreds of fields, at HadAM3 resolution it will take many Mbyte of memory. Too much memory usage generally has a bad effect on performance. Special routines are provided for processing a lot of data.

If you refer to the elements of a vector of PP fields, it has an extra dimension for the number of fields, thus

  IDL> help,q(0).lbyr ; info in PV-WAVE
LONG = 1859
IDL> help,q.lblev ; info in PV-WAVE
LONG = Array[19]
IDL> help,q.data ; info in PV-WAVE
FLOAT = Array[96, 73, 19]

Be careful to distinguish the number of elements in a vector of PP fields from the number of elements in the data array of a single PP field. In this example,

  IDL> print,n_elements(q) ; 19 fields
19
IDL> print,n_elements(q(0).data) ; each with 7008 data elements
7008
IDL> print,n_elements(q.data) ; total number of data elements
133152

Although q.data looks like a 3D array, you can't subscript it like one. For instance, index=where(q.data eq 0) will return a correct list of indices to zero values, but q.data(index)=1 will not succeed in setting these points to 1. This is dangerous because it will do something and not generate an error, but not what you expected. That's because (index) is subscripting just the data, which is a 2D array. To deal with these situations, you have to loop over fields and process each in turn.

You can only read many fields into a single variable if they're all on the same grid. This is fine for 3D fields, but (for example), you can't read a temperature field and a velocity field into one variable:

  IDL> stash,[2,4],sub=1
Submodel Stash FC Description
ATMOS 2 56 u compnt of wind after timestep
ATMOS 4 19 theta after timestep
IDL> z=ppa(x,ss(at=[2,4],lblev=1),/all)
% Conflicting data structures: FIELD,.

The CF-netCDF file format is more flexible than PP, and is capable of storing 3- and 4-dimensional data variables. However, when they are interpreted as PP fields by ncassoc, they are made to look like arrays of 2D PP fields, as if they resided in PP files.


[ < ] [ > ] Contents

Time axes, timeseries and day numbers


All PP fields have time information in their metadata (the PP header), usually the start and end of the meaning period e.g. 00:00 on 1 Jan 2004 to 00:00 on 1 Jan 2005 for a mean of the year 2004 (see Appendix on PP headers). It is also possible for time to be one of the two axes of a PP field. For instance, in a Hovmüller plot, time and latitude are the axes. Several UKMO IDL routines (see collapsing dimensions) produce PP fields with time as one of the axes e.g. pa_hovmuller produces a time-latitude PP field from a series of latitude-longitude fields applying to different times. 2D fields with a time axis can be plotted with pp_contour just like any other 2D field, and one of the axes is drawn and labelled as time.

A particularly important kind of field with a time axis is what we refer to as a "PP timeseries field". The field can contain timeseries, for the same set of times, from one or more locations or regions (e.g. daily precipitation amounts for a series of days from a number of observing stations). In a PP timeseries field, y is time and x indexes the individual timeseries in the field. The UM can produce timeseries fields and there are several UKMO IDL routines which make them. For instance,

  IDL> handle=ss_assoc('abbdc.001000') ; a source of decadal mean PP fields
IDL> ; Make a timeseries field of northern hemisphere average temperatures
IDL> ts=pats_area_avg(handle,ss(atmos=3236),region=[0,0,360,90],/box)
IDL> print,ts.lbnpt,ts.lbrow ; one column (timeseries) of 18 rows (times)
1 18
IDL> help,ts.data
FLOAT = Array[1, 18]
IDL> print,reform(ts.data)
292.528 292.851 293.120 293.037 293.245 293.391
293.332 293.322 293.565 293.585 293.569 293.694
293.656 293.889 293.842 293.968 293.876 293.890

For a timeseries field, the PP structure contains an element coords which specifies the region to which the timeseries applies. This is a 3D array subscripted as (timeseries number, 0=low coordinate 1=high coordinate, 0=longitude 1=latitude 2=vertical). Thus, in this example, ts.coords has dimensions (1,2,3); the longitude range of this timeseries is ts.coords(0,0,0) to ts.coords(0,1,0). Their values in this case are -1.875 (1.875 degW) and 358.125 (358.125 deg E), which are coincident, because the whole longitude range is covered by the field. These values are actually the longitude limits of the first and last grid box in each row of the latitude-longitude PP field.

Timeseries can be plotted with tsplot, which draws and labels the x-axis as time. A histogram of the values in a timeseries can be plotted by tshplot.

The routine tsx is used (a) for extracting particular timeseries from a PP field which contains several timeseries (e.g. the timeseries for Exeter alone from a PP field with lbnpt=3 containing timeseries for Oxford, Cambridge and Exeter) and/or (b) for extracting a particular range of dates from a timeseries field. It returns a new timeseries PP field with reduced dimensions.

Since PP fields with a time axis are ordinary PP fields in terms of their structure, most routines for processing PP fields are applicable to them e.g. pp_ff for mathematical operations, pp_regrid for changing the time coordinates by linear interpolation. There are also some routines specially for processing timeseries, including tsderive for differentiation in time, tsintegrate for integration in time, tsfit for fitting a polynomial to the time-dependence, tslump for averaging together into longer time-intervals (e.g. converting a daily timeseries to an annual timeseries), tsfold for calculating a multiannual mean time axis from a time axis spanning several years, tsrecycle for producing an annually repeating time axis spanning several years, tscat for concatenating in time, pp_timeshift for shifting the dates of a time axis to a different year. The last is useful if you want to overplot timeseries from different years on the same time axis.

The coordinates of a time axis are stored as "day numbers" (sometimes called "Julian day numbers"), which means the number of days (including fractions of a day) since a particular time. Double precision is used. The time used depends on the calendar applying to the PP field, as specified by its lbtim element. There are two options:

You can conveniently test the calendar of field with the function isppfield(field,/modeltime).

Day numbers can be converted to "components of time" (year, month, day, hour, minute, second) using jul2dt. This returns variables with a !dt structure, which is native in PV-WAVE and implemented as part of the UKMO library in IDL.

  IDL> x=jul2dt(1.5) ; i.e. 1.5 days since midnight on 13 Sep 1752
IDL> help,x ; info in PV-WAVE
X STRUCT = -> !DT Array[1]
IDL> print,x.year,x.month,x.day,x.hour
1752 9 14 12

The reverse transformation can be done with dt2jul, while var2jul converts components of time supplied individually (not as !dt structures) directly to day numbers.

  IDL> print,var2jul(1752,9,14,12)
1.5000000

All these routines support both real-world and model calendars.

To obtain the time coordinates (as Julian day numbers) of a PP field with a y-axis of time, use pp_coords(/ygen) as usual, or equivalently tsday. To obtain them as !dt structures, use tsdt.


[ < ] [ > ] Contents

Making up your own PP fields


Sometimes instead of reading PP fields from PP files you may wish to create PP fields from from non-PP data the program calculates or reads in from some other kind of file, such an ASCII file. The function makeppfield makes PP fields. The data is provided as a 2D array to make a single PP field, or a 3D array to make a vector of PP fields. For longitude-latitude fields, the data array should have longitude as the first dimension and latitude as the second. Here are two equivalent ways of making a PP field with 1-degree resolution from a data array dimensioned (360,180).

  IDL> help,data ; info instead of help in PV-WAVE
DATA FLOAT = Array[360, 180]
IDL> field=makeppfield(data,zy=-90.5,dy=1,zx=-0.5,dx=1)
IDL> field=makeppfield(data,xcoord=findgen(360)+0.5,ycoord=findgen(180)-89.5,/even)

In this field, the first gridline of longitude is at 0.5E, and the first gridline of latitude at 89.5S.

If the data array has missing data in it, the value of the missing data indicator should be specified with the mdi keyword.

If the data is a time-mean, it is strongly recommended to provide the beginning and end of the meaning period. For instance, for a mean from 1 Dec 2003 to 1 Mar 2004, using the model calendar, this would be done with makeppfield keywords dt=var2dt([2003,2004],[12,3],/modeltime).

If the field is a cross-section, rather than latitude-longitude, use the code keyword to specify its LBCODE. See the Appendix on PP headers for information about LBCODE.

If the field is a timeseries, specify keyword /timeseries and supply the time coordinates (as day numbers) through the ycoord keyword. If it's a single timeseries, the data array should still be 2D, with the first dimension having size unity.


[ < ] [ > ] Contents

PP output


To write one or more PP fields contained in an IDL variable to a single PP file in one operation, you can use ppw e.g.

  IDL> ppw,heat,'$HOME/heat.pp'

It is advisable to specify the directory, as done here. If it is not specified (heat.pp) the file is created in the first directory of the !pp_path.

To write a number of variables to a file in more than one operation, you must first open the file, then write the variables, then close the file.

  IDL> ppop,lun,'$HOME/various.pp',/write,/get_lun ; chooses a spare unit
IDL> writepp,lun,heat
IDL> writepp,lun,freshwater
IDL> free_lun,lun ; closes file and frees the unit

To write variables to a metasplit directory, use ssw. If it does not exist, it must first be created with ss_mkdir. To create the pph file in the directory (see Appendix), either use the /header option of ssw, or execute makepph afterwards.


[ < ] [ > ] Contents

Graphics routines, especially pp_contour


The IDL command for "1D" plots (graphs, timeseries, etc.) is plot. The UKMO library offers a command plotmdi, which has all the functionality of plot and in addition

The UKMO library offers a procedure linekey for drawing the legend (line style, colour, symbol, etc.) for line plots. This routine may be more convenient than the IDL standard commands such as legend.

The UKMO includes the following routines for drawing graphics from PP fields:

pp_contour is a particularly complex routine with many options. It can draw line contours, colour- or pattern-filled contours and "block" plots (drawing an individual rectangle for each gridbox). By default it constructs a title for the plot from the PP field metadata (the PP header) and draws a coastline on a latitude-longitude fields. For block and filled colour contours, a colour bar "legend" is drawn underneath the plot.

Frequently useful pp_contour keywords are /filled, /block, interval, base_longitude, levels, nlevels, /overplot, /noerase, /spline, title, fline, bstyle. The last two of these expect information from two ancillary routines. fline is for specifying pattern-fill for contours (with lines and symbols) and should be given the output of shadestyle e.g. fline=shadestyle(less=0,spacing=0.1,ori=45) for shading values less than 0 with lines at 45 degrees. (This doesn't seem to work in IDL at present.) bstyle is for specifying options to do with choice of colours and the colour bar legend via the function style e.g. bstyle=style(/interpolate) to have colours chosen automatically at even spacing through the colour table so, for example, if you have a rainbow colour table loaded, the first colour will be red, the last violet and the others spread through the intermediate yellow and green. See the section on colour for further discussion.

Some colour PostScript printers don't print exactly the right colours in the colour bar of a pp_contour block plot. In Reading, harold and james have this problem, but flash is OK. The problem arises because IDL uses different PostScript commands in the plot and in the colour bar, and the printer interprets them differently even though the colours are specified identically.

pp_contour can be used to make a movie from a series of PP fields. It has its own movie file format, which can be played with play_movie or converted into an animated GIF with gif4web.

To make a contour plot on some other map projection, use pp_transform e.g.

  IDL> pp_contour,pp_transform(field,7,pole_coords=[90,0],sf=0.65)

for a north-polar stereographic contour plot. The scale factor sf is used to contract the field of view.


[ < ] [ > ] Contents

Colour


Colour tables

The UKMO library deals with colour by means of "colour tables". A colour table defines a set of colours, each in terms of red, green, and blue. Routines producing graphics e.g. plotmdi, pp_contour refers to colours in the table by colour indices. IDL has a number of built-in colour tables. The UKMO library provides others, and users can design their own. Colour indices have values in the range 0-255 i.e. the colour table has 256 entries, and colour indices must be specified to UKMO routines as bytes or 2-byte integers. (See below concerning 4-byte RGB colours.) Note that "colour" is spelt color in the names of routines and keywords.

The command tek_color loads a built-in colour table whose first few colours are 0=black, 1=white, 2=red, 3=green, 4=blue, 5=cyan, 6=magenta, 7=yellow. For example, the following commands:

  IDL> tek_color
IDL> plotmdi,indgen(11) ; plotmdi is an UKMO routine similar to plot
IDL> plotmdi,/over,10-indgen(11),color=2
IDL> plotmdi,/over,[0,0],[0,10],color=3
IDL> plotmdi,/over,[10,10],[0,10],color=4

draw lines in various colours. If no colour keyword is given, the colour !p.color is used. This is called the "foreground" colour, and on the X device (output to the terminal) is usually set to the last colour in the table, which is white when IDL is started. The size of the currently loaded table is !d.table_size, so the index of the last colour is !d.table_size-1. The background of the plot is filled in with the background colour !p.background.

Each graphics device has its own colour table. Hence you may have to reload the colour table when you change graphics device, for instance after you use pr to begin PostScript output. On returning to X with prend, the colour table you had before pr will still be in effect.

Discrete and continuous colour tables

tek_color can be called a "discrete" colour table, because adjacent colours are very different. It doesn't make sense to interpolate between colours in such a table. By contrast, restore_colors,'shortbow' loads an UKMO colour table which runs through the rainbow from violet to red. This is a "continuous" colour table, because you could sensibly interpolate between its entries to get intermediate colours. By default, restore_colors interpolates the colour table when it reads it in to use all the available colours on the current graphics device e.g.

  IDL> restore_colors,'shortbow'
% RESTORE_COLOURS: Loading /opt/local/cgam/idl/ukmo_wave/data/shortbow.clr
% RESTORE_COLOURS: Interpolating to 100 colours from 212 red 212 green 212
blue

restore_colors assumes that the first colour in the table is black and the last white (for background and foreground) so it does not interpolate between those and the adjacent colours. When loading a discrete colour table with restore_colors, specify the /nointerp keyword.

Other continuous colour tables are available through the IDL command loadct. These are built in to IDL. For instance

  IDL> loadct,0 ; Loads a grey colour table which runs from black to white
% LOADCT: Loading table B-W LINEAR
IDL> loadct,10
% LOADCT: Loading table GREEN-PINK
IDL> loadct ; with no colour table, prints a list of the available choices

The UKMO routine style allows colours in continuous colour tables to be specified by "fractional" indices for pp_contour. For instance

  IDL> pp_contour,field,bstyle=style([0.5,1.0],/interp,/fract),/block

chooses colours from half-way along the colour table to its end, which is from green to red for the shortbow table. This behaviour is convenient for continuous colour tables because the actual indices to be used will depend on the size of the table as loaded into the current graphics device.

Used-defined colour tables

To modify the current colour table in memory, use get_colors and put_colors. To save a colour table to a file, use save_colors. To reload it, use restore_colors; the file must be in a directory listed in your !data_path.

Specifying true colours to IDL and PV-WAVE

By default, the UKMO IDL library is run in true colour. That means colours are specified to IDL graphics routines not as colour indices, but as long integers with values red*256^2 + green*256 + blue, where red, green and blue are numbers in the range 0-255. The same applies to the settings of the !p.color, !p.background and any other system variables that specify a colour. For IDL commands (not UKMO routines) such as plot which have keywords specifying colours, color indices cannot be used in true-colour mode; they have to be translated to RGB true-colour values using the function ukmo_color_convert. Hence, if plot and oplot were used instead of plotmdi in the above example, we would code color=ukmo_color_convert(2) instead of color=2. Note that !p.color, the foreground colour, is a 4-byte true-colour value. Hence a vector of colours such as [!p.color,2,3] would be a vector of 4-byte integers and would not be interpreted as colour indices. The latter must be coded instead as [!p.color,ukmo_color_convert([2,3])].

If IDL is run in 8-bit pseudocolour mode (not available for PV-WAVE on Linux), rather than true-colour mode, the IDL commands expect colour indices. Colour indices can be used even in true-colour modes in IDL (but not PV-WAVE); this behaviour is selected with device,decomposed=0. The Met Office library can be run in pseudocolour mode by starting IDL with midl -p.

Colours not looking as you expect them to

If you use pseudocolour rather then true colour, on some systems there can be conflicts between different applications using colours, with the result that IDL may not be able to get as large a colour table as it wants to use. This should not be a problem on Linux. If you have problems of this kind, whose symptom is that the colours don't appear as you expect them to in the IDL windows, first try closing down other colour-hungry applications such as netscape or ghostview and then restart IDL.

If that doesn't work, you may be able to use a private colour table, if your X server supports it. That means the IDL windows will appear in the right colours when the focus is on them (when you've clicked in the window, for instance), but while you focus on a window belonging to another application, such as netscape, the IDL windows will look very strange. In Unix do

  $ export WAVE_MAX_COLORS=256
$ export WAVE_ALLOW_PRIVATE_COLS=1

and then restart IDL.

Conflicts between colours can mean also that ghostview/gv can't get the colours it needs to preview a plot when you run prend,/view. Again, closing down netscape or other colourful windows may help.

If you can't see expected differences between colours on a PostScript plot it may be because the difference between the colors is too small for ghostview to distinguish. You can confirm this by using a different colour table such as tek_color, in which the colours are so stridently different that the truth must come to light.


[ < ] [ > ] Contents

Mathematical computations on fields


To add together PP fields f1 and f2, it would be convenient if we could do f3=f1+f2, for instance. Unfortunately this is not possible because IDL doesn't support mathematical operations on structures. Instead, we have to use a slightly less convenient but equally flexible method.

The function pp_ff is used to do mathematical computation with PP fields. It evaluates a mathematical expression in which the terms can be PP fields or ordinary numbers, and it returns PP fields as the result. The mathematical expression is given as a string. It can contain any IDL operators, functions and system variables. Variables a, b, c, etc. appearing in the expression stand for the first, second, third, etc. argument to pp_ff following the expression. For instance,

  IDL> f2=pp_ff('a*2',f1) ; doubles the values in field f1 to produce f2
IDL> f3=pp_ff('a+b',f1,f2) ; adds fields f1 and f2 together, point by point
IDL> number=7
IDL> f4=pp_ff('a*sin(b)',number,f1) ; f4 = 7*sin(f1), sin expects radians

If any of the arguments (after the expression) are vectors (of PP fields or numbers), the result is a vector of PP fields. In this case each PP field in the result vector is computed using the corresponding elements of the input vectors. For instance, in the f3 example, if f1 is a single PP field and f2 is a 10-element vector of PP fields, f3 will be a 10-element vector, in which f3(0)=f1+f2(0), f3(1)=f1+f2(1), etc. In the f4 example, if we had number=[7,8], the result f4 would be a 2-element vector of PP fields, with f4(0)=7*sin(f1) and f4(1)=8*sin(f1).

When there is more than one PP field input, missing data is returned in the output PP field at any point where any of the corresponding input PP fields has missing data. So, for example, if f1.data(5,6) is missing data, f3(0).data(5,6) will be missing data, regardless of whether f2(0).data(5,6) is missing data. This behaviour of pp_ff is widely used in the UKMO library for imposing masks, like this:

  IDL> f2=pp_ff('a',f1,mask,/quiet)

where f1 and mask are PP fields. Since the mathematical expression does not mention b, its non-missing data values are irrelevant, and the non-missing data values of f2 will be equal to those of f1. However, f2 will have missing data wherever mask has missing data, as well as where f1 has missing data. (The /quiet keyword suppresses a warning message that the input fields have missing data at different points.) You can try this out with f1 being a field of surface air temperature (stash code 3236) and mask one of soil moisture content (8208). The result f2 will be a field of surface air temperature over land, and missing data over sea, where the soil moisture field has missing data.

There are some rules about possible inputs to pp_ff:

The keyword /math_fix gets rid of illegal numbers that result from the calculation (e.g. when dividing by zero), replacing them with missing data. In the Met Office IDL version of pp_ff, the keyword /fortran causes the computation to be carried out by a Fortran subroutine which pp_ff constructs, compiles and calls. This is quicker for large computations. In that case, the mathematical expression must be valid in Fortran.

A few common mathematical operations have special functions for convenience:

  IDL> f5=pp_diff(f1,f2) ; is a shorthand for f5=pp_ff('a-b',f1,f2)
IDL> f6=pp_quotient0(f1,f2) ; pp_ff('a/b',f1,f2) with check for divide by 0
IDL> f7=pp_mag(f1,f2) ; calculates vector magnitude sqrt(f1^2+f2^2)


[ < ] [ > ] Contents

Collapsing dimensions


A large group of routines perform operations that can generally be described as "collapsing dimensions". For instance, computing a zonal cross-section of a longitude-latitude-height field (a vector of longitude-latitude fields, of course, since PP fields are 2D) "collapses" the longitude dimension by averaging over it, and produces a 2D field (latitude-height). Averaging over time similarly "collapses" a time dimension: from a longitude-latitude-time field it produces a longitude-latitude field. Computing the area-average of a longitude-latitude field produces a single number; in effect this has collapsed both the longitude and latitude dimensions.

The routines whose names begin with pa are efficient for processing large amounts of data (though they can be used for small amounts as well). Many pa routines have region and/or mask keywords. Note that all routines which average or integrate over longitude and latitude support the /box keyword, which implies weighting by grid-box areas. This option is strongly recommended, because it is not the default behaviour for fields with lbcode=1 in the PP header, as is always (although inappropriately) coded by the UM. (See help information on pp_area for details.)

Many routines in this table will also work for cross-sections (i.e. where x- and y-dimensions of the PP fields are not longitude and latitude). For example, given a depth-latitude cross-section of ocean temperatures, pp_area_avg can calculate the average on the section. The weights applied along the axes are just the cell-widths in the appropriate coordinate.

Dimensions affected Routine Effect
longitude-latitude -> a number pp_area_avg() Area-weighted average
pp_area_total() Area-weighted integral
longitude-latitude -> latitude pp_zonm([/total]) Returns a vector of zonal means [integrals], without latitude coordinates attached
pp_zonal_mean[,/total] Plots the zonal mean [integral] of the field.
pp_cross_sect([/total]) Returns zonal-mean [-integral] with one column (lbnpt=1) and latitude as the y-coordinate
longitude-latitude -> longitude pp_zonm(/ew[,/total]) Returns a vector of meridional means [integrals], without longitude coordinates attached
pp_zonal_mean,/meridional[,/total] Plots the meridional mean [integral] of the field.
pp_cross_sect(/meridional[,/total]) Returns meridional-mean [-integral] with one column (lbnpt=1) and longitude as the y-coordinate
longitude-latitude-vertical -> a number pp_vol_avg([/total]) Volume-weighted average [integral]
longitude-latitude-vertical -> latitude-vertical pp_cross_sect([/total]) Zonal-mean [-integral] cross-section
longitude-latitude-vertical -> longitude-vertical pp_cross_sect(/meridional[,/total]) Meridional-mean [-integral] cross-section
longitude-latitude-time -> latitude-time pp_cross_sect(/time[,/total])
or pa_hovmuller([/total])
Zonal mean [area-weighted integral] as a function of latitude and time
longitude-latitude-time -> longitude-time pp_cross_sect(/time,/meridional $
[,/total])
or pa_hovmuller(/meridional[,/total])
Meridional mean [area-weighted integral] as a function of longitude and time
longitude-latitude-vertical-time -> vertical-time pa_time_depth([/total]) Area-weighted mean [integral] as a function of vertical coordinate and time
longitude-latitude-time -> time tsarea_avg([/total])
or pats_area_avg([/total])
Timeseries of area-weighted average [integral]
tsextract_pt() or pats_extract_pt() Timeseries of particular longitude-latitude points
longitude-latitude-time -> longitude-latitude pp_avg(/infer) or pa_timemean() Time average (in fact the first two dimensions don't have to be longitude and latitude; the input could be a vector of cross-section fields)
longitude-latitude-vertical -> longitude-latitude pp_zavg Vertical average weighted by layer thickness
pp_ztotal Vertical integral weighted by layer thickness
longitude-latitude-vertical-time -> time pats_vol_avg([/total]) Timeseries of volume-weighted mean [integral]


[ < ] [ > ] Contents

Statistics


Some functions which calculate statistics from PP fields:

Routine Effect
pp_avg Calculate a mean PP field from a vector of PP fields i.e. each data element of the result is the mean of the corresponding elements from the vector of fields.
pp_total Calculate the sum PP field from a vector of PP fields.
pp_stdev Calculate a standard deviation field from a vector of PP fields.
pp_extrema Calculate a maximum and/or minimum PP field from a vector of PP fields.
pp_correlate Calculate the correlation coefficient (often temporal correlation) at each point between two vectors of PP fields, or the spatial correlation between two PP fields, or the correlation between one point and all other points in a vector of PP fields (for teleconnection).
pp_histogram Compute a histogram or PDF of values in a PP field.
pp_2d_histogram Compute a 2D histogram or PDF (to be plotted with pp_contour) of values in a a pair of PP fields.
pp_area_avg and pp_area_total Calculate the mean and sum of a PP field. This may also be regarded as a collapse over both axes of the field.
pp_stats Calculate several statistics of a PP field: mean (the same as pp_area_avg), standard deviation, maximum, minimum and proportion of field which is missing data.
pp_proportion Find the area-fraction of a field where a condition is satisfied.

This list is far from exhaustive. The library also contains routines for statistical distributions (e.g. t and F), EOFs, analysis of variance, power spectra, covariance, skill scores.


[ < ] [ > ] Contents

Some other PP processing operations


Extracting a region

To restrict the latitude--longitude range of a field (for instance, before you plot it with pp_contour), use pp_extract. The region is specified as a four-element vector [west,south,east,north]. The longitude limits west and east are positive for degrees east. Any values which are equal under modulo 360 refer to the same longitude. For instance, longitude ranges -360 to -180 and 0 to 180 are equivalent (both specify longitudes east of Greenwich and west of the date line). The latitude limits south and north are positive for degrees north.

For instance,

      IDL> smallfield=pp_extract(bigfield,[10,50,40,80])

If the input bigfield is global, the result smallfield is a field which covers the longitude range 10E to 40E, latitude 50N to 80N. pp_extract actually extracts the subset of the input field grid points which fall inside the specified region. Hence, if the input was not global in longitude but covered 30E to 90E, the output field would have longitude range 30E to 40E.

Many UKMO routines, including ppa and pa routines, have a region keyword, which is a four-element vector of the same form as the argument to pp_extract.

pp_extract also works for cross-section fields, if appropriate limits in the x- and y-coordinates are specified.

Slices

As well as zonal or meridional mean cross-sections, pp_cross_sect can also extract slices along latitude or longitude circles. For instance, if fields is a vector of latitude-longitude fields on model levels,

  IDL> slice=pp_cross_sect(fields,long=0)

slice is a latitude-level slice along the Greenwich meridian.

Regridding

Bilinear interpolation to change the 2D grid of PP fields can be done with pp_regrid. E.g.

  IDL> ; interpolate old to grid of like
IDL> new=pp_regrid(old,like)
IDL> ; change to latitude coords of like, leave longitude unchanged
IDL> new=pp_regrid(old,ycoord=pp_coords(like,/ygen))
IDL> ; change to longitude spacing of 1 deg, leave latitude unchanged
IDL> new=pp_regrid(old,xcoord=findgen(360))

It also works with cross-section fields.

In PV-WAVE, pp_regrid is able to transform between lat-long grids with true and rotated North Pole. The required Fortran program has not been rewritten for IDL.

In PV-WAVE, pp_regrid_avg is an alternative for changing to 2D grid, by area-averaging. It uses the area-averaging routines from the UM to do this, but the link to IDL has not yet been written.

pp_relayer can be used to change the vertical coordinate of a 3D field e.g. from depth coordinates to density coordinates in the ocean. pp_isolayer may be used first to find the new coordinate surfaces in terms of the old vertical coordinate.

Differential operators

Differentiation in one spatial dimension can be done with pp_derive, and in time for fields with a time-axis by tsderive. Vector differential operators are implemented as pp_grad, pp_div, pp_curl and pp_laplace

.

Polynomial fits

pp_fit will fit a polynomial function at each point to a vector of PP fields. tsfit will fit a polynomial in time to each point in a timeseries PP field.

PP field utilities

pp_comparable tests whether two PP fields are on the same grid. isppfield can carry out a large number of tests on the properties of a PP field.

Locating gridboxes

whichbox will tell you which gridbox of a PP field contains a point whose coordinates you specify, and whereis will tell you the coordinates of a point whose indices you specify.


[ < ] [ > ] Contents

Processing a lot of data


If you want to carry out an operation on a number of fields, for instance averaging them together (with pp_avg), the obvious thing to do is read them all into memory with ppa and then process them. The problem with this, however, is that if there are a lot of fields, this may be very slow or may even cause IDL or your workstation to crash. There is a set of routines intended for bulk processing of large amounts of data. Their names begin with pa. Their advantage and approach is that they read in the fields one at a time as they do the processing.

The pa routines will accept any of three kinds of alternative input:

Headers and handles "imply" PP fields contained in the files and/or metasplit directories they point to. In the reference information for pa routines, these alternatives are referred to together as a "source" of PP fields. Such routines also usually allow an optional second argument which is a "search expression" of the kind accepted by ppa and constructed by ss. The expression implies that the pa routine will process the subset of the PP fields from the source which satisfy the expression. For instance,

  IDL> zc=ss_assoc('aaxzc.001000')
IDL> meanppn=pa_timemean(zc,ss(at=5216))

calculates the time mean of precipitation fields (stash code 5216) from the directory aaxzc.001000. This will produce the same result as

  IDL> zc=ss_assoc('aaxzc.001000')
IDL> ppn=ppa(zc,ss(at=5216),/all)
IDL> meanppn=pp_avg(ppn,/infer) ; ... or ...
IDL> meanppn=pa_timemean(ppn) ; pa_timemean works with PP fields as well

If a handle and search expression are passed to pa_timemean, however, it reads in the fields one at a time as it does the processing.


[ < ] [ > ] Contents

Naming conventions of Met Office library routines


As it may help navigate the library, the following naming conventions are worth pointing out.

pp_ and ts routines

Almost all routines whose names begin pp_ require PP fields as input. Routines whose names begin ts require timeseries PP fields as input.

pa_ and pats_ routines

Routines whose names begin pa_ or pats_ (these are in the Hadley Centre applications library rather than the core) and a few of the pp_ routines (e.g. pp_contour) are intended for processing large amounts of data. The pats_ routines return timeseries PP fields; the pa_ are less specific.

ukmo_ routines

Most routines whose names start ukmo_ are Met Office routines which have the same general purpose as the IDL standard routines without the ukmo_. For instance, ukmo_histogram is a Met Office routine that has the same purpose as histogram but is more powerful.


[ < ] Contents

Met Office system variables


System variables have names beginning ! and are available to all routines. For instance, a particularly useful one is !p, which controls many aspects of plotting by IDL. The Met Office setup defines a number of IDL system variables.

Paths

!pp_path, !data_path and !shared_path are copied from the Unix environment variables $WAVE_PP_PATH, $WAVE_DATA_PATH and $WAVE_SHARED_PATH respectively. They are colon-separated lists of directories used by UKMO routines to search respectively for PP files, other (non-PP) data files, and shared object files for linknload/call_external.

UM version and stashcode translations

For translating stash codes to an English description, the UM version has to be selected. The system variable !um_version is copied from the Unix environment variable $UMVERSION. You should ensure this is set in your Unix environment before you start IDL e.g. export UMVERSION=4.5. It is used to deduce the system variable !stashcodesfile, which is a pattern that matches the filenames of the stashmaster files to be read.

If you have private stashmaster files from your own UM experiments and you would like these to be used to translate private stashcodes, you can do this by defining the system variable !userstashcodesfile in your IDL startup file. This should be a pattern which matches the files you want to be read e.g. /home/jonathan/etc/sm4.5/*.

Stash code translation is done by several routines, including

  IDL> stash,'omega' ; find all stashcodes whose descriptions mention "omega"
IDL> stash,12201 ; translate stashcode 12201

CF-netCDF standard_name and long_name translations

The routine codename converts between such names and PP stash codes and field codes e.g.

  IDL> codename,3236,1
m scode name
1 3236 air_temperature
IDL> codename,'precipitation_flux'
m scode name
1 5216 precipitation_flux

The translation tables are stored in the files named by the system variable !codenamefile, which is a blank-separated list of filenames, such as "stashstdname thcmiplongnames". Each name should specify only the basename of the file, not the directory. Each file should be in a directory listed in the !data_path.

The codename files are plain text. Each line has up to six fields, separated by !. The relevant fields for the translation are: first - submodel (0 for PP field code, 1 for atmosphere stash code, 2 for ocean stash code), second - PP field code or stash code, fifth - standard_name or long_name which the code corresponds to. E.g.

  1!3236!!!air_temperature
2!32209!U COMPONENT OF ICE VELOCITY!!eastward_sea_ice_velocity

PP graphics control

The system variable !ppp controls miscellaneous aspects of various UKMO PP graphics routines, especially pp_contour. See IDL> man,'defpp' for details and enter WAVE> info,!ppp,/structure or IDL> help,!ppp,/structure to see its current settings. Its default value is stored in !ppp_default.

For example, you can make pp_contour plots larger by reducing the number of lines allowed for a title (default is 4) and/or eliminating the space for the colour bar if you're only doing monochrome plots.

  IDL> !ppp.title_lines=1
IDL> !ppp.want_bar=0

PP field code translations and default axis directions

When no stash code is available, the PP field code LBFC from the field header is used instead. Translations for these are stored in the file $WU_DATA/pp_codes. You can add or alter PP field code translations using set_pp_code e.g.

  IDL> set_pp_code,'codes',['1602','steric height/m'] ; (re)defines LBFC=1602

Another possibility is to set the defaults for whether particular kinds of axis are plotted with decreasing or increasing coordinates e.g.

  set_pp_code,'cross',['Latitude','10','176','0','0']

redefines the entry for latitude so that it is plotted increasing to the right when it is on the x-axis. The default is increasing to the left i.e. north on the left. This is changed by the last entry in the [] vector, whose default value is '1'. See man,'set_pp_code' for details. If you want to make changes to PP code translations automatically whenever you start a session, write a IDL script of set_pp_code commands and define the name of the script in the Unix environment with export WAVE_CODES_SCRIPT=filename before you start IDL.

PP control

The system variable !pp controls miscellaneous aspects of various UKMO PP routines. See IDL> man,'defpp' for details and enter WAVE> info,!pp,/structure or IDL> help,!pp,/structure to see its current settings. A particularly useful element of the structure is !pp.last_search, which contains the last search expression evaluated by ppa and similar routines when looking for a PP field. It can be useful to look at it after you have had a failure of this type:

  % PPA: No such field(s)

as it will tell you what field could not be found. The element !pp.rmdi is the default missing data indicator

.

Physical constants

System variable !met_values contains some general physical constants and !um_values the values of some constants used in the UM. See IDL> man,'setup_consts' and IDL> man,'um_values' respectively for details, and enter WAVE> info,variable,/structure or IDL> help,variable,/structure to see their values.