Opened 3 months ago

Closed 3 months ago

#3450 closed help (fixed)

Suite stalled unless inserting "sleep 1" in a script

Reported by: luciana Owned by: um_support
Component: Rose/Cylc Keywords: Suite stalled
Cc: Platform: JASMIN
UM Version:

Description

Hi guys.

I managed to solve the problem I'm having (inserting sleep 1!), but it's really bugging me. Why doesn't it work without it?

Full suite: /home/users/lucy/cylc-esdm/eios-create-io-cylc-file

Output: /home/users/lucy/cylc-run/read-file-test

File with "sleep 1" at the end: create-io-cylc.sh

My tests with simpler examples actually work without that hack: /home/users/lucy/test-read-file.

Kind regards.

Luciana.

Change History (5)

comment:1 Changed 3 months ago by ros

Hi Luciana,

Running sleep 1 (or any other successful command) at the end of the create-io-cylc.sh script changes the exit code of create-io-cylc.sh to 0. Without that extra command, you will see in the logs (job-activity.log) that the post task actually fails: its exit code is 1, which indicates an error.
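A minimal shell sketch of this behaviour, assuming only bash: a script's exit status is the status of the last command it ran, so appending a successful command such as sleep 1 hides an earlier failure. The two script bodies and the run_status helper are hypothetical stand-ins, not code from the suite.

```bash
#!/usr/bin/env bash
# Hypothetical script bodies, run in a child bash via `bash -c`.
# The last command fails, so the whole script exits 1 (what Cylc sees).
unmasked='echo "writing io.cylc"; false'
# Same body with a successful command appended, as with `sleep 1`.
masked='echo "writing io.cylc"; false; sleep 1'

# Print the exit status of a script body without letting a non-zero status
# abort the caller (the `||` keeps this safe under `set -e`).
run_status() {
  local status=0
  bash -c "$1" > /dev/null || status=$?
  echo "$status"
}

echo "without sleep: $(run_status "$unmasked")"   # without sleep: 1
echo "with sleep:    $(run_status "$masked")"     # with sleep:    0
```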

Adding the following to the bottom of create-io-cylc.sh shows the problem:

echo exit code: $?
sleep 1
echo exit code after sleep: $?

In 20210101T0000Z/post/01/job.out:

...
FILENAME = /home/users/rshatcher/cylc-run/create-io-no-sleep/post/io.cylc
exit code: 1
exit code after sleep: 0

So you need to look at what the loop is doing at the end of that script.
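Note that the echo lines above succeed themselves, so if they are left at the end of create-io-cylc.sh the script again exits 0. A minimal sketch of an alternative diagnostic, assuming bash: an EXIT trap logs the final status to stderr without changing it. The report_exit_status function and the demo body are hypothetical; in a real script the function and the trap line would go near the top.

```bash
#!/usr/bin/env bash
# Hypothetical demo body: an EXIT trap that logs the final status to stderr,
# followed by a failing last command (a stand-in for the end of the script).
demo='
report_exit_status() {
  # $? is still the status the shell is about to exit with.
  echo "exit code: $?" >&2
}
trap report_exit_status EXIT
false
'

status=0
bash -c "$demo" || status=$?   # stderr shows: exit code: 1
echo "status seen by the caller (e.g. Cylc): $status"
```

Because the trap does not call exit itself, the child shell still exits with status 1, so the failure stays visible to whatever launched the script.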

Regards,
Ros.

comment:2 Changed 3 months ago by luciana

Dear Ros.

After exhaustive tests, I still can't identify where the problem lies.

My tests are in this directory: /home/users/lucy/test-read-file.

  • eios-create-io-cylc-file: original directory
  • eios1: scripts moved to the main directory and files that are not being used were deleted
  • eios2: same scripts, but without Cylc environment variables
  • eios3: same scripts, but with Cylc environment variables defined in run-script-create-io-cylc.sh
  • run-dir: copy of the cylc-run directory

I'm trying to reproduce exactly the conditions I have when running under Cylc, to detect where the problem lies. I'm calling ./run-script-create-io-cylc.sh from the current directory. The code in eios2 and eios3 runs without sleep 1 and correctly produces the file I need (io.cylc), even with exit code 1 from create-io-cylc.sh.

I'm lost. I hope you can help me. It works, but clearly, something is not right.

Kind regards.

Luciana.

comment:3 Changed 3 months ago by luciana

Some extra thoughts:

  • Because create-io-cylc.sh is the last command in run-script-create-io-cylc.sh, that might be why exit code 1 was not a problem in my tests: nothing runs after it in the minimal example. But that doesn't explain why io.cylc is not created properly when running the full suite, while here the file is correct.
  • When running the original suite without sleep 1, the suite waits for the input file name to be typed at the keyboard. I have no clue why this happens either. If you type something, io.cylc is created with what you typed as the input, and the file files.txt is completely ignored.
  • The command to read the file line by line was taken from the article "Bash Read a File Line By Line". I've also tried other websites, but they all show more of the same.
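For reference, a sketch of the usual bash idiom for reading a file line by line, plus a variant that reads from a dedicated file descriptor (3) so that commands inside the loop cannot consume the file's lines through stdin. This is a general pitfall of the idiom, offered as an assumption; the ticket does not establish that it caused the keyboard prompt described here. The helper function and file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper standing in for a loop-body command that reads stdin.
helper() { read -r _discard || true; }

# Common idiom: the loop reads the file through stdin, so helper does too.
# IFS= keeps surrounding spaces; -r keeps backslashes literal;
# `|| [[ -n $line ]]` still handles a last line with no trailing newline.
read_lines_stdin() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    printf 'FILENAME = %s\n' "$line"
    helper   # consumes the NEXT line of the file
  done < "$1"
}

# Same loop reading from file descriptor 3; helper's stdin is left alone.
read_lines_fd3() {
  local line
  while IFS= read -r -u 3 line || [[ -n $line ]]; do
    printf 'FILENAME = %s\n' "$line"
    helper   # reads the caller's stdin, not the file
  done 3< "$1"
}

files=$(mktemp)
printf '%s\n' a.nc b.nc c.nc d.nc > "$files"
read_lines_stdin "$files"             # prints a.nc and c.nc only
read_lines_fd3 "$files" < /dev/null   # prints all four names
rm -f "$files"
```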

Kind regards.

Luciana.

comment:4 Changed 3 months ago by ros

Hi Luciana,

In the loop in create-io-cylc.sh, the last command to run is ./add-files.sh, and add-files.sh calls exit 1. Because this is the last command run before the post task finishes, the exit status of the post task is 1, which indicates task failure to Cylc.

Cheers,
Ros.
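A sketch of the mechanism described in this comment, assuming bash: the status of a for or while loop is the status of the last command run in its body, so a helper that ends with exit 1 makes the loop, and therefore the script, exit 1. The helper functions below are hypothetical stand-ins for ./add-files.sh, and the ticket does not say exactly how the code was fixed.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for ./add-files.sh, written as subshell functions
# so that `exit` behaves as it would in a separate script.
add_files_buggy() ( echo "added $1"; exit 1 )   # always reports failure
add_files_fixed() ( echo "added $1"; exit 0 )   # reports success

# Stand-in for the loop at the end of create-io-cylc.sh. A loop's status is
# the status of the last command run in its body, and the function returns it.
loop_over() {
  local add=$1 f
  for f in a.nc b.nc; do
    "$add" "$f"
  done
}

# Print the loop's exit status; `||` keeps a failure from aborting under `set -e`.
status_of() {
  local status=0
  loop_over "$1" > /dev/null || status=$?
  echo "$status"
}

echo "buggy helper: loop exits $(status_of add_files_buggy)"   # 1
echo "fixed helper: loop exits $(status_of add_files_fixed)"   # 0
```

Ending create-io-cylc.sh with an explicit exit 0 would also satisfy Cylc, but like sleep 1 it would hide genuine failures; having the helper return 0 only when it actually succeeds keeps real errors visible.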

comment:5 Changed 3 months ago by luciana

  • Resolution set to fixed
  • Status changed from new to closed

Hi Ros.

I had just reached the same conclusion and fixed the code. Really silly, but at least now it's explained.

Kind regards.

Luciana.
