HEASoft Good Scripting Practices For Batch Processing



This page provides some tips on how to optimize a HEASOFT session when writing scripts that invoke HEASOFT tasks, as in the processing of data in an unattended batch environment. Here, a batch environment means a "pipeline" session that is meant to run robustly without human interaction, which doesn't have access to a terminal console, and which often runs in parallel to other pipeline sessions. The following two subsections describe how to handle these cases.

Users who wish to manage parallel processing in Python should review the HEASoftPy documentation instead.

PREVENTING CONSOLE TERMINAL ACCESS

By default, HEASOFT is optimized for interactive processing, where the user is expected to be working at an active terminal. In a batch session, query prompts for missing parameter values will stall while trying to read from the console terminal, and in some cases a HEASOFT tool will try to initialize the console terminal even when no query is necessary. The HEADASNOQUERY and HEADASPROMPT environment variables may be used to disable this behavior:

    # C-shell (csh/tcsh):
    setenv HEADASNOQUERY
    setenv HEADASPROMPT /dev/null

    # Bourne shell (sh/bash):
    export HEADASNOQUERY=
    export HEADASPROMPT=/dev/null

    # Perl:
    $ENV{"HEADASNOQUERY"}=1;
    $ENV{"HEADASPROMPT"}="/dev/null";

These settings should be activated in a session before calling any tasks. The HEADASNOQUERY variable prevents tasks from querying any missing parameter values. Setting the HEADASPROMPT variable to "/dev/null" redirects prompts to the null stream and prevents tasks from trying to open the console terminal in any situation, thereby avoiding error messages like this:

    Unable to redirect prompts to /dev/tty (at headas_stdio.c:...)

Also, for batch processing, it is wise to invoke a task by explicitly redirecting /dev/null into standard input. This guards against the case where an errant task expects data on standard input, which could cause the task to hang.

   # EXAMPLE:
   /path/to/my/task < /dev/null
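
A complete batch invocation might therefore combine these environment settings with explicit redirection of both standard input and output. The Bourne-shell sketch below is only illustrative: the ftlist call, the file names, and the log name are placeholders for whatever task your script actually runs:

   # Illustrative batch call combining the settings above (Bourne shell):
   export HEADASNOQUERY=
   export HEADASPROMPT=/dev/null
   ftlist infile=input.fits option=H outfile=listing.txt clobber=yes \
       < /dev/null > ftlist.log 2>&1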

PARALLEL BATCH PROCESSING

Most HEASOFT tasks use parameter files whose location is controlled by the PFILES environment variable. PFILES consists of two semicolon-separated paths. The first, or 'local', path defaults to "$HOME/pfiles" (i.e. a directory named "pfiles" in the user's home directory); the parameter files stored there are rewritten after each task runs to record any 'learned' parameter values. The second, or 'system', path defaults to (and should always remain) "$HEADAS/syspfiles"; it holds the unmodified, as-distributed parameter files from which the local copies are derived.
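
As a quick check, echoing PFILES after a standard HEASOFT initialization shows this two-part structure (the paths below are illustrative; the exact values depend on your account and installation):

   # Bourne shell:
   echo $PFILES
   # typical output (paths will differ on your system):
   #   /home/username/pfiles;/path/to/headas/syspfiles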

While storing 'local' copies of parameter files in a user's home directory is useful for saving preferences for future runs of each task, this is not optimal for parallel processing because multiple instances of a task may attempt to read and write to the parameter file simultaneously, resulting in 'collisions' that usually lead to file corruption and errant behavior.

A simple workaround for this problem is to redefine the PFILES environment variable within your script so that every HEASOFT call it spawns uses its own dedicated 'local' pfiles directory, for example one named after the process ID number ("$$"):

   # C-shell:
   setenv PFILES "/tmp/$$.tmp/pfiles;$HEADAS/syspfiles"

   # Bourne shell:
   export PFILES="/tmp/$$.tmp/pfiles;$HEADAS/syspfiles"

   # Perl:
   $ENV{"PFILES"}="/tmp/$$.tmp/pfiles;$HEADAS/syspfiles";

Here "$$" expands to the process ID, so "/tmp/$$.tmp" serves as a unique identifier for each process. Note that your script will need to create the 'local' pfiles directory (e.g. "mkdir -p /tmp/$$.tmp/pfiles") before letting any tasks use it.
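
A slightly more defensive variation, sketched here in Bourne shell under the assumption that your system's mktemp supports the -d option, creates the unique directory up front and removes it automatically when the script exits:

   # Create a private pfiles directory for this process; remove it on exit:
   PFILES_DIR=$(mktemp -d /tmp/pfiles.XXXXXX)
   trap 'rm -rf "$PFILES_DIR"' EXIT
   export PFILES="$PFILES_DIR;$HEADAS/syspfiles"
   #
   # ... invoke HEASOFT tasks here ...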

Another example is the case of running two pipeline sessions in parallel, where each pipeline should have its own distinct working and "pfiles" directories:

   # Pipeline 1 session initialization
   # ... run standard HEASOFT initialization first ...
   #
   mkdir -p /data/pipeline1/pfiles
   setenv PFILES "/data/pipeline1/pfiles;$HEADAS/syspfiles"
   #
   # ... run first pipeline script ...
   #
   # It's good practice to remove the "pfiles" directory after each session:
   rm -rf /data/pipeline1/pfiles
   #
   # ... and similarly for pipeline 2 ...
   # ...
   # ...



If FTOOLS has been useful in your research, please reference this site (https://heasarc.gsfc.nasa.gov/ftools) and use the ASCL reference for HEASoft [ascl:1408.004] or the ASCL reference for the original FTOOLs paper [ascl:9912.002]:

Blackburn, J. K. 1995, in ASP Conf. Ser., Vol. 77, Astronomical Data Analysis Software and Systems IV, ed. R. A. Shaw, H. E. Payne, and J. J. E. Hayes (San Francisco: ASP), 367.
