
Version 1 (modified by ros, 3 years ago)

Mirroring Standard Ancillary Files

The UM requires ancillary files to run. These include the land-sea mask, the orography, vegetation and ozone ancillary files, among others.

The Met Office produces and maintains a standard set of ancillary files on their supercomputers in a comprehensive collection of directories known as the ancillary tree. There are sets of ancillary files for global and limited-area domains. Currently, the global domains are:

  • n2004
  • n216
  • n216e
  • n320
  • n48e
  • n512
  • n512e
  • n768e
  • n96
  • n96e

and the limited-area domains are:

  • e4_11001000_euro
  • m4_288360_uk
  • my_600360_nae
  • ukv

CMS mirrors these manually onto ARCHER every month. This note describes the method.

On ARCHER, the standard ancillary tree is stored in $UMDIR/ancil, a mirror of the corresponding directories at the Met Office.

As of December 2015, the ancillary tree holds 3.7 TB of data.
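After a mirror run it can be worth confirming that every standard domain actually arrived. The following is a minimal sketch, assuming bash and assuming the domain directories sit under $UMDIR/ancil/atmos, as they do with the mirroring script on this page. The function name missing_domains is hypothetical, and the domain list is copied from the lists above, so it needs updating whenever domains are added or retired.

```shell
#!/bin/bash
# Hypothetical helper: print each standard domain that has no directory
# under <root>/atmos, and return non-zero if any are missing.
# The list is copied from this page (Dec 2015); update it as domains change.
domains=(n2004 n216 n216e n320 n48e n512 n512e n768e n96 n96e
         e4_11001000_euro m4_288360_uk my_600360_nae ukv)

missing_domains() {
  local root=$1 d status=0
  for d in "${domains[@]}"; do
    if [[ ! -d $root/atmos/$d ]]; then
      echo "$d"      # report the missing domain
      status=1
    fi
  done
  return "$status"
}
```

Typical use on ARCHER would be `missing_domains "$UMDIR/ancil" || echo "some domains are missing"`.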

The Details

The mirroring script is launched from my Met Office PC. The required directories are automounted, so they are not visible until they are first accessed, for example with:

cd /cray_hpc/projects/um1

The script is executed as follows:

ssh els049 $HOME/bin/mirror_anc_archer >> archer_ancmir_$(date +%F_%T).log 2>&1

That is, the script runs on the subsidiary machine els049, which ensures that the Met Office Reading/Exeter link is not overloaded: the data go directly from Exeter to Edinburgh.

The script mirror_anc_archer:

# Source the ssh-agent settings saved for this host, so that the
# batch-mode ssh below can authenticate without a prompt
SSH_ENV=$HOME/.ssh/environment.$(hostname)
. "$SSH_ENV"

UMDIR=${UMDIR:-/projects/um1}

# need this because the dir is automounted on local machines
cd /cray_hpc/projects/um1 || exit 1

# Settings shared by every rsync call. Note that rsync -a copies symbolic
# links as links, so links to Met Office paths arrive unchanged on ARCHER.
excludes=$HOME/bin/rsync_excludes.txt
ssh_cmd="ssh -qi $HOME/.ssh/id_rsa -o BatchMode=yes"
dest=wmcginty@login.archer.ac.uk:/work/n02/n02/wmcginty/ancil

echo "ancil"
echo "====="
cd ancil || exit 1
time rsync -az --stats --exclude-from="$excludes" --partial \
           -e "$ssh_cmd" \
           data "$dest"

echo ""
echo "========="
echo ""

echo "atmos"
echo "====="
cd atmos || exit 1
anc_list="KGO e4_11001000_euro master n48e n96 n96e my_600360_nae \
          m4_288360_uk ukv n320 n216 n216e n512 n512e n768e n2004"
for i in $anc_list
do
  echo "$i:"
  time rsync -az --stats --exclude-from="$excludes" --partial \
             -e "$ssh_cmd" \
             "$i" "$dest/atmos"
  echo "========"
  echo ""
done

This sends the data to my ARCHER work directory, /work/n02/n02/wmcginty/ancil. On ARCHER, $UMDIR/ancil is a symbolic link to that directory.
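Because $UMDIR/ancil is only a link, it is easy to check that it still resolves to the directory rsync writes into. A minimal sketch, assuming bash and GNU readlink (whose -f option follows every link in a path); the function name link_points_to is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if $1 is a symbolic link that resolves
# to the same canonical directory as $2.
link_points_to() {
  local link=$1 expected=$2
  if [[ ! -L $link ]]; then
    echo "$link is not a symbolic link" >&2
    return 1
  fi
  # Compare both sides in canonical form, with all links resolved
  [[ $(readlink -f "$link") == "$(readlink -f "$expected")" ]]
}
```

For example, `link_points_to "$UMDIR/ancil" /work/n02/n02/wmcginty/ancil && echo OK`.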

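When the synchronization is complete, some files in the ancillary tree are symbolic links whose targets are absolute paths on the Met Office computers (rsync -a copies links unchanged). These are repaired using the clean_link script, which reads DATADIR from the environment, so DATADIR must be exported. The script only does a dry run unless -r is given, so run it first without -r to check what it would change, then with -r to correct the links:

find "$DATADIR/ancil" -type l -exec clean_link {} \;
find "$DATADIR/ancil" -type l -exec clean_link -r {} \;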
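Before and after running clean_link, it helps to list only the links that still point at Met Office paths. A minimal sketch, assuming bash and a find that supports -lname (GNU find does); -lname matches the text of the link target without following it, so dangling links are found too. The function name unrepaired_links is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list links under $1 whose target text still starts
# with the Met Office root /projects/um1/ (the links clean_link rewrites).
unrepaired_links() {
  find "$1" -type l -lname '/projects/um1/*'
}
```

`unrepaired_links "$DATADIR/ancil" | wc -l` should report 0 once clean_link -r has been run. The same test could also restrict the repair itself to these links, e.g. `find "$DATADIR/ancil" -type l -lname '/projects/um1/*' -exec clean_link -r {} \;`.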

The clean_link script:

#!/bin/ksh
#
# Author: W. McGinty, NCAS-CMS
# 11th Nov 2015
#
# Corrects links in the ancillary tree after a synch from the 
# Met Office.
#
# Takes a file name, determines if it is a link, and corrects it if
# required.
#
# The Met Office has link targets of the form
#
# /projects/um1/ancil/ancil_versions/mc170130_flk/ancils
#
# which rsync replicates identically on ARCHER.  These need to be
# replaced either with an ARCHER absolute or relative reference.  This
# code uses the absolute method, which is distasteful, but a lot easier
# than working out the relative reference, which may be in another
# branch of the tree.
#
#
# program name
prog=${0##*/}

if  [[ $# != 1 &&  $# != 2 ]]  ; then
  echo "Usage: $prog [-r] file_name" >&2
  echo "The program does a dry run unless the -r option"\
    "is specified, when it corrects the link." >&2
  exit 1
fi

mode="test"
while getopts "r" opt
do case $opt in
  r)  mode="run";;
  \?) printf "Usage: %s [-r] file_name\n" "$prog" >&2
      exit 1;;
  esac
done
shift $((OPTIND-1))

# Exactly one file name must remain once the options are removed
if (( $# != 1 )); then
  echo "Usage: $prog [-r] file_name" >&2
  exit 1
fi

if [[ -z $DATADIR ]]; then
  echo "DATADIR must be set in the environment" >&2
  exit 1
fi

if [[ ! -d $DATADIR ]]; then
  echo "$DATADIR must exist and be a directory." >&2
  exit 1
fi


fn=$1

if [[ -L $fn ]]; then

# fn is a link
  tgt=$(readlink "$fn")

# Find those whose target is an absolute path under /projects/um1
  if [[ $tgt == /projects/um1/* ]]; then

# Remove the absolute root /projects/um1/
    branch=${tgt#/projects/um1/}

    if [[ $mode == "test" ]]; then
# Test it: report what would be done
      echo "link: " "$fn"
      echo "    target is " "$tgt"
      echo "    branch is " "$branch"
      echo "    New tgt would be " "$DATADIR/$branch"
      echo "   " ln -sfnv "$DATADIR/$branch" "$fn"
    else
# Correct it. The new target is absolute, so the link can be replaced in
# place without changing directory. -n stops ln from following the old
# link if it happens to point at an existing directory.
      ln -sfnv "$DATADIR/$branch" "$fn"
    fi
  fi
fi
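The comments in clean_link note that a relative link would be preferable but harder to work out. A minimal sketch of that alternative, assuming bash and GNU realpath: -m accepts paths that do not exist, -s works lexically without resolving symbolic links, and --relative-to gives the path relative to the directory holding the link. The function name relative_target is hypothetical; this is not what clean_link currently does.

```shell
#!/bin/bash
# Hypothetical helper: print new_target ($2) as a path relative to the
# directory that contains the link ($1), computed purely lexically.
relative_target() {
  local link=$1 new_target=$2
  realpath -m -s --relative-to="$(dirname "$link")" "$new_target"
}
```

In clean_link's run branch the result would replace the absolute target, e.g. `ln -sfnv "$(relative_target "$fn" "$DATADIR/$branch")" "$fn"`. Because -s does not resolve links, the link path and the new target should both be given under the same prefix ($DATADIR), as find "$DATADIR/ancil" does.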