Command-line access to HEASARC data holdings

The HEASARC archive is most easily browsed with web tools such as Xamin, but once you have identified the datasets of interest, there are other ways to retrieve them. In particular, command-line tools remain useful for automating the retrieval of large numbers of observations, directories, or files. We summarize the options here.

(Besides graphical interfaces like Xamin, you can also access the database through standard application programming interfaces (APIs), which accept SQL queries so that complicated query logic runs directly in the database. See also the tutorials on accessing these APIs from Python. The rest of this document, however, is about bulk downloads from the command line, not about querying the database from the command line.)

Note that the situation changed in 2019 when all unencrypted access to the archive was discontinued. This page describes the current situation, based in part on our transition guide.

The HEASARC supports encrypted access through HTTPS or FTPS. In most cases the two protocols behave the same, but in a few cases the server may respond slightly differently. Different tools also behave differently depending on how the address is specified, as described below, so there are a number of permutations.

To replace the old anonymous FTP, the HEASARC supports explicit, passive-mode FTPS connections; implicit FTPS is not supported, nor is active-mode FTP. To replace unencrypted HTTP, it suffices to change addresses to HTTPS (which most browsers, as well as wget, do automatically).

Several command-line tools, such as wget and curl, can use these protocols. Note, however, that they need the address specified differently to get the protocol right. Specifically, wget uses the ftps://heasarc.gsfc.nasa.gov address, while curl uses ftp://heasarc.gsfc.nasa.gov together with an additional flag requesting a secure connection. Alternatively, both tools can use HTTPS, which is what the Xamin download scripts do.
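The address rules in the previous paragraph can be collected into one small bash function. This is a minimal sketch, not a HEASARC-provided tool: the name heasarc_fetch_file and its method labels are hypothetical, and it relies on this page's statement that the same /FTP/... path works when only the scheme is changed. The wget/curl difference comes from how each tool reads the scheme: wget treats ftps:// as explicit FTPS (it has a separate --ftps-implicit option for the other kind), whereas curl treats ftps:// as implicit FTPS and instead upgrades an ftp:// connection when given --ftp-ssl.

```bash
# Hypothetical helper (not a HEASARC tool): fetch one archive file into the
# current directory with one of the tool/address combinations described above.
#   $1  path below the host, e.g. "FTP/asca/README"
#   $2  method: https (default), wget-ftps, curl-https, or curl-ftp
heasarc_fetch_file() {
  local path=${1#/} method=${2:-https}
  local host=heasarc.gsfc.nasa.gov
  case $method in
    # wget over HTTPS, the form used by the Xamin download scripts.
    https)      wget "https://$host/$path" ;;
    # wget reads ftps:// as explicit FTPS, which the HEASARC supports.
    wget-ftps)  wget "ftps://$host/$path" ;;
    # curl over HTTPS; -O saves under the remote file name.
    curl-https) curl -O "https://$host/$path" ;;
    # curl needs ftp:// plus --ftp-ssl to upgrade the session to TLS;
    # --ftp-pasv (already curl's default) asks for passive mode explicitly.
    curl-ftp)   curl -O --ftp-ssl --ftp-pasv "ftp://$host/$path" ;;
    *) echo "heasarc_fetch_file: unknown method '$method'" >&2; return 2 ;;
  esac
}
```

For example, after sourcing the function:
    heasarc_fetch_file FTP/asca/README curl-ftp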

Here is a summary of the available options:

  • Recommended: we provide a handy download script, written in Perl, that wraps the wget command and has options for getting a directory, a range of directories, a single file, files matching a wildcard, and so on. It has its own help page; alternatively, download it and run it without arguments to see its usage.
    For example, to download a directory, it is simply:
    download_wget.pl https://heasarc.gsfc.nasa.gov/FTP/nicer/data/obs/2018_01/1050020180/

    There are also options to specify a range of directories by number, or files by wildcard. If the transfer is interrupted and you run the same command again, it picks up where it left off.

  • Using wget
    • Get a single file with wget:
      wget https://heasarc.gsfc.nasa.gov/FTP/asca/README
      which works identically with "ftps" or "http" instead of "https" (but not "ftp").

    • Get an entire directory with wget:
      wget -q -nH --no-check-certificate --cut-dirs=4 -r -l0 -c -N -np -R 'index*' \
           -erobots=off --retr-symlinks \
           https://heasarc.gsfc.nasa.gov/FTP/asca/data/rev2//10021000/
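      See the wget man page for the full list of options; a few to note are:
      • -q suppresses progress output, and -nH stops wget from creating a top-level directory named after the host;
      • --cut-dirs tells it to remove that many parent directories, so with a value of 4 it writes only a directory called 10021000;
      • -r means recursive: it gets everything underneath, to a default depth of 5 levels, but ...
      • -l0 removes the maximum depth, so it grabs everything underneath;
      • -c continues an interrupted job, and -N (timestamping) re-downloads a file only if the server's copy is newer than the local one;
      • -np ("no parent") keeps it from ascending above the starting directory;
      • -R 'index*' discards the auto-generated directory-listing pages once wget has used them to find links, and -erobots=off makes it ignore robots.txt;
      • --retr-symlinks makes sure it follows symbolic links to retrieve the linked-to files;
      • --no-check-certificate skips verification of the server's TLS certificate; it is generally only needed if your system cannot verify the certificate (the Xamin script below omits it).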


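  • Using curl
    • Get a single file with curl:
      curl -O https://heasarc.gsfc.nasa.gov/FTP/README
      This also works with the address specified as "ftp" together with the --ftp-ssl flag. It does not work with "ftps", with or without the flag, because curl treats ftps:// as implicit FTPS, which the HEASARC does not support. Nor does it work with "http", presumably because curl does not follow the redirect to HTTPS unless given the -L option.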

  • (There is no straightforward way of getting an entire directory recursively with curl.)

  • There is also a tool called lftp, which you can install, that provides a number of useful features, including reconnecting and retrying after errors, job queueing, etc.

  • Note that Xamin gives users download scripts using the above wget command, e.g.,
      wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks \
      --cut-dirs=4 https://heasarc.gsfc.nasa.gov/FTP/asca/data/rev2//10021000/

  • Similarly, Browse gives users download scripts using the same wget command, e.g.,
      wget -q -nH --no-check-certificate --cut-dirs=5 -r -l0 -c -N -np -R 'index*' \
      -erobots=off --retr-symlinks https://heasarc.gsfc.nasa.gov/FTP/xte/data/archive/AO8//P80110/80110-01/.
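The recommended download_wget.pl script already handles ranges of numbered directories. For a scattered list of observations, for example NICER ObsIDs from different months, a plain loop over its documented single-directory form is enough. In this sketch the function name fetch_nicer_obsids and the DOWNLOAD_WGET variable are hypothetical conveniences; the script is assumed to be on your PATH unless DOWNLOAD_WGET names it.

```bash
# Hypothetical wrapper: run download_wget.pl once per NICER observation.
# Each argument is "MONTH/OBSID" as it appears in the archive path,
# e.g. "2018_01/1050020180". DOWNLOAD_WGET (our own convention, not an
# option of the script) may give the script's path if it is not on PATH.
fetch_nicer_obsids() {
  local dl=${DOWNLOAD_WGET:-download_wget.pl}
  local base=https://heasarc.gsfc.nasa.gov/FTP/nicer/data/obs
  local item
  for item in "$@"; do
    # Stop at the first failure; since the script resumes interrupted
    # transfers, simply rerun the same command afterwards.
    "$dl" "$base/${item%/}/" || return
  done
}
```

For example:
    fetch_nicer_obsids 2018_01/1050020180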
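The single-file wget command scales to many files without a loop: wget's -i option reads a list of URLs from a file, and -i - reads it from standard input. The sketch below, under the hypothetical name heasarc_fetch_list, turns archive paths into HTTPS URLs and hands them all to one wget process.

```bash
# Hypothetical helper: read archive paths (one per line, below the host,
# e.g. "FTP/asca/README") on stdin and download them with a single wget
# process. -i - makes wget read its URL list from standard input; -c resumes
# partially downloaded files if the command is rerun.
heasarc_fetch_list() {
  local host=https://heasarc.gsfc.nasa.gov
  # Drop blank lines, then replace any leading slashes with the host prefix.
  sed -e '/^[[:space:]]*$/d' -e "s#^/*#$host/#" | wget -c -i -
}
```

For example:
    printf '%s\n' FTP/asca/README FTP/README | heasarc_fetch_list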
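In the recursive wget command for an entire directory, the only value that changes from directory to directory is --cut-dirs. The hypothetical function heasarc_get_dir below computes it from the URL so that only the final directory is created locally, reproducing --cut-dirs=4 for the ASCA example. It is a sketch under one assumption: empty path components from a doubled slash are not counted, which is what that example's value implies. It also leaves out --no-check-certificate so that the server certificate is verified.

```bash
# Hypothetical helper: mirror one archive directory with the recursive wget
# options shown in this page, choosing --cut-dirs so that only the last
# directory of the URL is created locally (e.g. just 10021000/).
heasarc_get_dir() {
  local url=$1
  local path=${url#*://*/}            # strip the "https://host/" prefix
  local -a parts
  local part n=0
  IFS='/' read -r -a parts <<< "$path"
  # Count real directory names; skip empty ones (from "//") and ".".
  for part in "${parts[@]}"; do
    if [[ -n $part && $part != . ]]; then n=$((n + 1)); fi
  done
  if (( n < 1 )); then
    echo "heasarc_get_dir: no directory in '$url'" >&2
    return 2
  fi
  wget -q -nH --cut-dirs=$((n - 1)) -r -l0 -c -N -np -R 'index*' \
       -erobots=off --retr-symlinks "$url"
}
```

For example:
    heasarc_get_dir https://heasarc.gsfc.nasa.gov/FTP/asca/data/rev2//10021000/
By contrast, the Browse script cuts only 5 of its 7 directories and so keeps P80110/80110-01; this helper would keep just 80110-01.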
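Although curl has no recursive mode, a single directory level can still be fetched with it by asking the FTP server for a name-only listing (curl's --list-only) and then requesting each entry. This is a hedged sketch: heasarc_curl_dir is a hypothetical name, it assumes the directory is reachable over FTPS at the same /FTP/... path as on the HTTPS server (as the single-file curl example implies), and it does not descend into subdirectories.

```bash
# Hypothetical helper: download the files in ONE archive directory with curl
# over explicit FTPS. Not recursive: subdirectory entries in the listing make
# curl report an error, and the loop moves on to the next name.
#   $1  directory below the host, e.g. "FTP/asca/data/rev2/10021000"
heasarc_curl_dir() {
  local dir=${1#/}
  local url="ftp://heasarc.gsfc.nasa.gov/${dir%/}/"
  local name
  # --list-only requests bare names (NLST) instead of an "ls -l" listing.
  curl -sS --ftp-ssl --list-only "$url" | while IFS= read -r name; do
    name=${name%$'\r'}        # FTP listings may end lines with CRLF
    name=${name##*/}          # some servers return paths; keep the base name
    [[ -z $name ]] && continue
    # </dev/null keeps curl away from the listing being read by the loop.
    curl -sS --ftp-ssl -O "$url$name" < /dev/null
  done
}
```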
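As an illustration of the lftp features mentioned above, its mirror command copies a directory tree with resumption and parallel transfers. In lftp, as in curl, an ftps:// address denotes implicit FTPS, which the HEASARC does not support, so this sketch connects with ftp:// and enables TLS through settings. The function name heasarc_lftp_mirror is hypothetical, and the /FTP/... remote path is an assumption carried over from the curl single-file example.

```bash
# Hypothetical helper: mirror an archive directory tree with lftp over
# explicit, passive-mode FTPS.
#   $1  remote directory, e.g. "/FTP/asca/data/rev2/10021000"
#   $2  local directory (defaults to the last component of $1)
heasarc_lftp_mirror() {
  local remote=$1
  local dest=${2:-$(basename "$remote")}
  # ftp:ssl-allow: negotiate AUTH TLS (explicit FTPS) when the server offers it.
  # ftp:ssl-force: refuse to log in in the clear if TLS is unavailable.
  # ftp:ssl-protect-data: encrypt file transfers, not just commands.
  # ftp:passive-mode: the HEASARC does not support active mode.
  # mirror --continue resumes partial files; --parallel=4 runs 4 transfers.
  lftp -c "set ftp:ssl-allow true; set ftp:ssl-force true; \
set ftp:ssl-protect-data true; set ftp:passive-mode true; \
open ftp://heasarc.gsfc.nasa.gov; \
mirror --continue --parallel=4 \"$remote\" \"$dest\""
}
```

For example:
    heasarc_lftp_mirror /FTP/asca/data/rev2/10021000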

If you have questions or encounter problems, please use the HEASARC's Feedback form to let us know.


Last modified: Friday, 18-Aug-2023 11:31:15 EDT
