NOTICE:
This Legacy journal article was published in Volume 1, May 1992, and has not been
updated since publication. Please use the search facility above to find regularly-updated
information about this topic elsewhere on the HEASARC site.
An Introduction to the HEASARC
N. E. White
HEASARC
1. Overview
The High Energy Astrophysics Science Archive Research Center, HEASARC, was
created by NASA in 1990 as a site for X-ray and Gamma-ray archival research.
The motivation for the HEASARC is to provide a multi-mission archive for the
high energy data from the ROSAT, GRO, BBXRT, Astro-D, and XTE missions that
coexists with archival data from past missions such as Einstein, HEAO 1, HEAO
3, OSO 8, SAS 2 and 3, Uhuru, and Vela5B. Data from non-US missions, e.g.,
EXOSAT and Ginga, will also be made available as international agreements
allow. The total data volume will be of the order of 1,000 gigabytes by 1995
and the aim is to make these data available on-line for immediate access as
well as by bulk distribution.
The HEASARC is located at the Goddard Space Flight Center and is a
collaboration between Goddard's Laboratory for High Energy Astrophysics, LHEA,
and the NSSDC. The LHEA is responsible for the science content of the archive;
the NSSDC is responsible for the data archive management. The HEASARC data
holding will consist of data from past, concurrent and future missions. The
NSSDC contribution is outlined in the next article by Jim Green, the NSSDC
director. This article will concentrate on the LHEA HEASARC activities (which
currently constitute the bulk of the new funding).
2. Terms of Reference
The terms of reference of the HEASARC are to:
- maintain and disseminate data from previous and concurrent high energy
astrophysics missions,
- provide software and data analysis support for these datasets,
- maintain the necessary scientific and technical expertise for the processing
and interpretation of the data holding,
- develop and maintain tools for combining data from several missions and for
multi-dataset analysis,
- develop and maintain catalogs of observations and ancillary information for
data holdings relevant to the high energy wavelength band,
- coordinate data, software and media standards with other parts of NASA's
Astrophysics Data System, including other multi-mission centers.
3. Organization
The LHEA part of the HEASARC is under the Office for Guest Investigator
Programs, OGIP, within the LHEA. The OGIP also administers the Compton
Gamma-Ray Observatory Science Support Center (CGRO SSC), and the Guest Observer
Facilities for ROSAT, Astro-D and XTE (Figure 1). The objective of the OGIP is
to provide uniform guest observer support for these missions. The HEASARC forms
a central pillar within the organization, in that it provides the connecting
thread between the various science support facilities. At the end of each of
the various projects the HEASARC will be the final resting place for the
archive and the associated expertise in its analysis.
In setting up the HEASARC there was much concern that it does not take on
project responsibilities, and the respective roles of the projects and the
HEASARC have been clearly separated.
The project Data Centers are responsible for:
- archive creation
- delivering all non-proprietary data to HEASARC in FITS format
- providing science expertise to support archival research while project
funding is maintained
The HEASARC provides:
- multi-mission high energy astrophysics archival access
- FITS format standards
- FITS software: i/o and table manipulation tools
At the end of project funding, the HEASARC takes over the science expertise,
probably by transferring a few data center staff to the HEASARC.
For existing data sets (e.g. HEAO 1 and 2) the HEASARC will work to make the
data available in a multi-mission framework. In cases where the original
project is still well-funded, the above rules will apply. For those projects
where there is no longer any project support, the HEASARC will directly apply
resources to make the data available.
Figure 1: LHEA Organization
4. Requirements
Before discussing how the HEASARC will organize its data holding, it is
worthwhile to consider the motivations for archival research. There are four
distinct categories:
(i) historical studies,
(ii) theoretical follow-up,
(iii) surveys, and
(iv) assurance.
Historical studies are the most obvious archival activity. An observer
discovers a new phenomenon, or is studying one previously known, and needs to
check earlier data to, e.g., independently confirm its existence and/or track
the long-term variability. These studies can be perhaps the most difficult,
since they will involve combining and/or comparing data sets from different
telescopes. The major issues here are gaining easy access to the data, and
cross-instrument calibration. A related activity is to use archival data as
part of a justification to propose to use a new telescope.
Theoretical follow-up is the need to test new models against existing
data. In many cases, the interpretation of a phenomenon can take many years,
with theoreticians repeatedly building models and testing them against the
data. Theoreticians currently have to work closely with the original
investigator to test their models, or make eyeball fits to published data. Here
the major issue is that the theoretician does not have a detailed understanding
of the instrument characteristics or analysis techniques. He or she simply
wants a data product and the associated calibration to test against the model
in a clearly-described, easy-to-read data format.
Surveys provide the opportunity to combine many observations of a single
class of object (e.g., AGN) made by many different investigators using the same
telescope and instrument. The current principal investigator approach to
allocating observation time means that large uniform samples of particular
object types are rarely available to a single observer. Only after the data
enter the public domain can a survey of the properties of a particular class be
made. The main issue here is ensuring that a user can access a sample of all
objects of a particular class.
Assurance is the ability to guarantee both that an observation is
analyzed (and, if appropriate, published) and that unjustified repeat
observations are not made. Observation time on satellites is very limited (and
expensive). Making the data available after some fixed time ensures that all
interested parties in the field get access to that data and that it is
eventually looked at. The issue here is that in many cases an observation may
never be published because the result is not sufficiently noteworthy. It is
essential to provide a simple overview of the main results of the observation
to avoid unnecessary repeated analysis of the raw data.
The four motivations described above place the following requirements on the
HEASARC:
- multi-mission analysis
- hierarchical archive structure
- quick-look capability to assess the value of the data
- vendor-independent data formats
5. Data Analysis
(i) The HEASARC Dilemma
For every mission the data flow is essentially the same. First, the raw data undergoes
some form of data reduction to produce data products -- usually a
photon list, an image, a spectrum, and/or a lightcurve. These are then
analyzed to produce some results which are then, hopefully,
published. While the sequence of events is much the same for each
mission, the dilemma facing the HEASARC is that every mission to date has
produced a data set in a different format, with a different set of analysis
software. This makes the long-term support and distribution of a multi-mission
archive problematic, since every mission is a special case. In addition,
combining datasets from different missions is non-trivial.
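As a rough illustration of the reduction step described above, the following sketch bins a photon list into a lightcurve of time and count rate. The function name, its arguments, and the uniform binning scheme are assumptions made for this example; they do not correspond to any particular mission's software.

```python
def bin_lightcurve(event_times, t_start, t_stop, bin_width):
    """Bin photon arrival times (s) into (bin_centre, count_rate) pairs.

    Hypothetical helper: uniform bins over [t_start, t_stop), with no
    exposure or dead-time correction.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    n_bins = int((t_stop - t_start) // bin_width)
    counts = [0] * n_bins
    for t in event_times:
        index = int((t - t_start) // bin_width)
        if 0 <= index < n_bins:  # drop events outside the requested interval
            counts[index] += 1
    return [(t_start + (i + 0.5) * bin_width, c / bin_width)
            for i, c in enumerate(counts)]


# Made-up arrival times, purely for illustration.
photon_list = [0.1, 0.4, 1.2, 2.5, 2.6, 2.9]
for centre, rate in bin_lightcurve(photon_list, 0.0, 3.0, 1.0):
    print(f"{centre:6.2f} s  {rate:6.2f} counts/s")
```

The binning itself is mission-independent; only the step that produces the list of arrival times depends on the telemetry format.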
Mission-specific formats tend to be used throughout the data processing chain.
The raw telemetry data in many cases involves preprocessing and packing of the
data on the spacecraft so as to maximize the information transmitted to the
ground. There may be multiple telemetry and onboard computer modes which can
add to the complexity of reducing the data. The data products produced by the
data reduction software are more generic, e.g., a lightcurve is a time and a
count rate. However, even for data products, missions typically generate their
own formats. A notable exception is that images have recently begun to be
distributed in FITS format. The results of each mission are also sometimes kept
in mission-specific or vendor-dependent formats, e.g., an INGRES DBMS table.
The data access is limited to a data processing system produced by the project.
These tend to be monolithic systems that are not optimal for long-term
maintenance or general distribution and use by the community.
Specific problems with data processing systems are:
- they are custom built for each mission, even though the underlying functions
are the same
- there is a failure to modularize and isolate the mission-dependent functions
- calibrations and methodology are embedded in the code
- the code is vendor-dependent (e.g., operating system, compiler, DBMS)
The last point is particularly problematic. In the long term it makes
maintenance of the data processing system difficult. The code must repeatedly
be ported to new hardware and software platforms as technology evolves. With so
many different missions in the HEASARC archive this could be a never-ending and
expensive task.
In addition, the user community is becoming increasingly demanding. They
require access to the original raw data, and also want to reduce it from within
a familiar analysis environment, e.g., IRAF, IDL or XANADU.
(ii) The HEASARC solution
The root of the problem is that each mission produces data in different
formats. Many of the data reduction and analysis functions are basically the
same; the driving factor is decoding the different data telemetry and any
mission-specific data product formats. Up to now there has been little, if any,
re-use of software between missions. The HEASARC solution is to isolate this
function by reformatting the data to a single standard structure. This should
be self-describing so that the user need only look in the header to be able to
read the file. The FITS standard provides such a capability.
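To make the idea of a self-describing file concrete, here is a minimal sketch of reading a FITS header, which consists of 80-character ASCII "KEYWORD = value / comment" cards terminated by an END card and padded to 2880-byte blocks. The helper names (make_card, build_header, parse_value, parse_header) are hypothetical, and the parser is deliberately simplified; in practice a library such as FITSIO would be used instead.

```python
CARD_LEN = 80      # every header card is 80 ASCII characters
BLOCK_LEN = 2880   # FITS files are padded to 2880-byte blocks


def make_card(keyword, value, comment=""):
    """Format one 'KEYWORD = value / comment' card (simplified sketch)."""
    if isinstance(value, bool):
        field = ("T" if value else "F").rjust(20)
    elif isinstance(value, (int, float)):
        field = repr(value).rjust(20)
    else:
        field = ("'" + str(value).ljust(8) + "'").ljust(20)
    card = keyword.ljust(8) + "= " + field
    if comment:
        card += " / " + comment
    return card.ljust(CARD_LEN)[:CARD_LEN]


def build_header(cards):
    """Join cards, append END, and pad with spaces to a whole block."""
    text = "".join(cards) + "END".ljust(CARD_LEN)
    padding = -len(text) % BLOCK_LEN
    return (text + " " * padding).encode("ascii")


def parse_value(text):
    """Interpret the value field of a card: string, logical, int, or float."""
    text = text.strip()
    if text.startswith("'"):
        # Simplification: embedded '' escapes are not handled.
        return text[1:text.index("'", 1)].rstrip()
    text = text.split("/", 1)[0].strip()  # drop any trailing comment
    if text in ("T", "F"):
        return text == "T"
    try:
        return int(text)
    except ValueError:
        return float(text.replace("D", "E"))  # FITS allows a D exponent


def parse_header(raw):
    """Read keyword/value pairs from raw header bytes until the END card."""
    header = {}
    for offset in range(0, len(raw), CARD_LEN):
        card = raw[offset:offset + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":  # skip COMMENT, HISTORY, and blank cards
            header[keyword] = parse_value(card[10:])
    return header


raw = build_header([
    make_card("SIMPLE", True, "conforms to FITS standard"),
    make_card("BITPIX", 8),
    make_card("NAXIS", 0),
    make_card("TELESCOP", "EXOSAT", "mission name"),
])
print(parse_header(raw))
```

Because the keywords travel with the data, a reader written for one mission's files can open another's without mission-specific code.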
FITS is an IAU and NASA standard for distributing astronomical data, and
there are FITS readers within all the popular environments, e.g., IDL, IRAF,
and MIDAS. The recent adoption of the binary table FITS standard, which
allows the byte structure of each column to be defined in the header, has been
a real breakthrough. It allows compact table structures to be defined which can
mirror the underlying table structures in most data analysis systems such as
MIDAS or IRAF STSDAS tables.
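The sketch below illustrates how header keywords can define the byte structure of each column in a binary table row. It assumes a header already parsed into a dictionary and supports only a small subset of TFORMn codes (B, I, J, K, E, D, A); the function names column_layout and decode_row are hypothetical and this is not the FITSIO interface.

```python
import re
import struct

# Subset of binary table TFORM codes mapped to struct codes.
# FITS binary tables store values big-endian, hence the ">" prefix below.
_TFORM_CODES = {"B": "B", "I": "h", "J": "i", "K": "q",
                "E": "f", "D": "d", "A": "s"}


def column_layout(header):
    """Return [(column_name, struct_format)] from TFIELDS/TTYPEn/TFORMn."""
    layout = []
    for n in range(1, header["TFIELDS"] + 1):
        tform = header[f"TFORM{n}"].strip()
        match = re.fullmatch(r"(\d*)([A-Z])", tform)
        if match is None or match.group(2) not in _TFORM_CODES:
            raise ValueError(f"unsupported TFORM{n}: {tform!r}")
        repeat = int(match.group(1) or 1)
        code = _TFORM_CODES[match.group(2)]
        layout.append((header[f"TTYPE{n}"], f">{repeat}{code}"))
    return layout


def decode_row(row_bytes, layout):
    """Decode one table row into a {column_name: value} dictionary."""
    row = {}
    offset = 0
    for name, fmt in layout:
        values = struct.unpack_from(fmt, row_bytes, offset)
        offset += struct.calcsize(fmt)
        if fmt.endswith("s"):
            row[name] = values[0].decode("ascii").rstrip()
        elif len(values) == 1:
            row[name] = values[0]
        else:
            row[name] = list(values)
    return row


# Illustrative header and row; the column names are made up.
header = {"TFIELDS": 3,
          "TTYPE1": "TIME", "TFORM1": "1D",
          "TTYPE2": "PHA", "TFORM2": "1I",
          "TTYPE3": "DETNAM", "TFORM3": "8A"}
row_bytes = struct.pack(">dh8s", 12.5, 42, b"ME      ")
print(decode_row(row_bytes, column_layout(header)))
```

The decoder contains no knowledge of the instrument: everything it needs comes from the header keywords.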
The HEASARC will distribute all useful data as FITS binary tables,
including the telemetry. While at first sight it may seem a formidable problem
to reformat a complex telemetry stream containing science and housekeeping
data, it is actually simpler than having to build from scratch a data reduction
and analysis system. Reformatting the data forces an isolation of the
mission-specific function of decoding the telemetry. The subsequent data
reduction tasks will have both mission-specific and mission-independent
functions. By reformatting the telemetry, it is simple to recycle the
mission-independent functions.
To implement this plan, the HEASARC is taking the following steps: First, the
data reduction system for the next high energy astrophysics mission, Astro-D,
will be constructed so that it forms the basis for a multi-mission
infrastructure. The Astro-D telemetry will be reformatted to FITS and all of
the mission-dependent and independent bits will be isolated (see the article on
Astro-D by Day, Arnaud and White). Second, the HEASARC has begun to reformat
existing telemetry and data products from past missions such as Einstein, HEAO
1, and EXOSAT. The experience gained and the FITS file structures defined can be
fed into future missions such as XTE.
To enable both the HEASARC and future missions to reformat to FITS, the HEASARC
is providing a portable FORTRAN 77 subroutine library to write and read FITS
files. This package, called FITSIO, was released earlier this year and has
already proven extremely popular (see the following article by Bill Pence). The
HEASARC is also defining mission-independent FITS file structures for spectra,
lightcurves, and photon lists. These will allow data products to be distributed
transparently between different analysis packages. In particular, the HEASARC
is working with the ROSAT Data Center to define a set of "rationalized" FITS
files for the ROSAT archive. These rationalized files will differ from the
current files in that the structure and keywords will have a multi-mission
flavor.
The HEASARC will not force the community to use one data analysis environment.
Instead it will adopt a policy of ensuring that any HEASARC-produced data
reduction tasks are distributed in ANSI standard code, with the input and
output only operating on FITS files. In addition all parameter checking and
binding will be isolated, so that these packages can be interfaced to the
user's favorite analysis environment.
To facilitate this approach a Data Selector is being produced by the HEASARC,
in collaboration with the Astro-D project, to allow Boolean selections from
FITS tables. This data selector will form the basis of a multi-mission data
reduction system, and will be very similar in concept to the MIDAS and STSDAS
table systems. The major advantage of the HEASARC data selector is that it will
directly operate on FITS tables, making the system fully portable. It will be
written in strict FORTRAN 77. The software will isolate the parameter input and
validation from the kernel that actually does the task. This will allow the
selector to run under different analysis environments. The first version will
be built to run under both IRAF, using the FORTRAN interface, and
XANADU. It will be a trivial matter for other developers to integrate the
selector into their own analysis environments, so long as their environments
have an isolated parameter interface.
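The separation of parameter handling from the kernel can be sketched as follows. This is an illustration in Python rather than FORTRAN 77, and it is not the HEASARC Data Selector: the expression syntax (comparisons joined by AND), the function names, and the use of dictionaries for table rows are all assumptions made for the example.

```python
import operator
import re

_OPERATORS = {"==": operator.eq, "!=": operator.ne,
              "<=": operator.le, ">=": operator.ge,
              "<": operator.lt, ">": operator.gt}


def parse_selection(expression):
    """Parameter layer: validate 'COL OP VALUE AND ...' and return terms.

    Hypothetical syntax; only numeric comparisons joined by AND.
    """
    terms = []
    for clause in re.split(r"\s+AND\s+", expression.strip()):
        parts = clause.split()
        if len(parts) != 3 or parts[1] not in _OPERATORS:
            raise ValueError(f"cannot parse clause: {clause!r}")
        column, op, value = parts
        terms.append((column, _OPERATORS[op], float(value)))
    return terms


def select_rows(rows, terms):
    """Kernel: keep rows satisfying every term; knows nothing about parsing."""
    return [row for row in rows
            if all(compare(row[column], value)
                   for column, compare, value in terms)]


# Made-up event rows, standing in for rows read from a FITS table.
events = [{"TIME": 5.0, "PHA": 12},
          {"TIME": 150.0, "PHA": 40},
          {"TIME": 20.0, "PHA": 3}]
print(select_rows(events, parse_selection("PHA >= 10 AND TIME < 100")))
```

Because select_rows only sees validated terms, a different environment could supply them from its own parameter interface without touching the kernel.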
The remaining mission-dependent part of any data reduction system is the
calibration data. The HEASARC is defining standard formats for distributing
calibrations. Like the data itself, calibrations can be divided into raw data,
such as a detector energy resolution function or a telescope point spread
function, and calibration products, such as a detector response matrix or an
exposure map. The HEASARC will encourage future developers to externally define
all calibration information so that it can be accessed by any data reduction or
analysis system.
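One way to picture an externally defined calibration is a response matrix that is passed to the analysis code as data rather than embedded in it. The sketch below folds a model spectrum through such a matrix; the convention used (matrix elements in cm^2, model flux in photons per cm^2 per second per energy bin, exposure in seconds) and the function name fold_model are assumptions for illustration, not a HEASARC format definition.

```python
def fold_model(response, model_flux, exposure):
    """Predict counts per detector channel from a model spectrum.

    counts[i] = exposure * sum_j response[i][j] * model_flux[j]

    response[i][j]: assumed effective area (cm^2) times the probability
                    that a photon in energy bin j lands in channel i.
    model_flux[j]:  assumed photons / cm^2 / s in energy bin j.
    exposure:       seconds.
    """
    if any(len(row) != len(model_flux) for row in response):
        raise ValueError("response columns must match model energy bins")
    return [exposure * sum(r * s for r, s in zip(row, model_flux))
            for row in response]


# Made-up two-channel, two-bin calibration and model values.
response = [[10.0, 2.0],
            [0.0, 8.0]]
model_flux = [0.5, 0.25]
print(fold_model(response, model_flux, exposure=100.0))
```

The same function serves any instrument for which a matrix in this form is available, which is the point of keeping calibrations outside the code.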
6. Data Distribution
Data distribution can be done via on-line access, and by mass distribution,
e.g., via CD-ROM. The HEASARC will provide both methods of access to its data
holding. There will be regular distributions of data on CD-ROMs, primarily of
data products and catalogs from each mission. The first CD-ROM will contain
Einstein SSS spectra and lightcurves and is now close to completion. In
addition there will be remote on-line access to the data.
On-line services such as SIMBAD, NED, IUE, EXOSAT, and Einline are well known
and work well at delivering the data quickly to the user. The disadvantage to
these various services is that each one has a different user interface with
which the user must become familiar. NASA's Astrophysics Data System, ADS, uses
a client-server approach to allow remote queries of databases. The archive
sites retain control of the archive contents but will rely on a common user
interface provided by the central organization. The HEASARC is currently
testing its connection to ADS, and should be a fully-functional node by April
1992. Currently ADS only provides the capability to query single catalogs, and
can only be a supplement to the more traditional remote login services. Further
information about ADS can be obtained from IPAC by contacting Mary Wittman at
mew@ipac.caltech.edu.
In addition to ADS the HEASARC provides an on-line service to allow remote
login to the HEASARC data holding and to data analysis software. The
emphasis will be on browsing of the data, such that a user can make a
quick-look assessment of its worth before exporting it, or part of it, to his
or her home site. Rather than invent yet another on-line system, the HEASARC
has adopted an existing system, the one developed for the EXOSAT mission by the
European Space Agency, ESA. The advantage of this system is that it provides
the capability to not only access the data, but also to display and analyze it
remotely.
At the heart of the system is the BROWSE program, a command-driven environment
that allows a user to search one or more database tables by coordinates, name,
object class, or any other valid parameter combination. The user can then
display the selected data, or run analysis software on it. This service is
available now and is described in a following article by Kathy Rhode.
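A search of a catalog table by coordinates and object class can be sketched as a simple cone search. This is not the BROWSE implementation or its command syntax; the function names, the dictionary layout of catalog rows, and the sample entries are all hypothetical.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, radius, object_class=None):
    """Return catalog rows within radius (deg) of (ra, dec), optionally by class."""
    matches = []
    for row in catalog:
        if object_class is not None and row["class"] != object_class:
            continue
        if angular_separation(ra, dec, row["ra"], row["dec"]) <= radius:
            matches.append(row)
    return matches


# Made-up catalog entries, for illustration only.
catalog = [
    {"name": "SRC_A", "ra": 10.0, "dec": 20.0, "class": "AGN"},
    {"name": "SRC_B", "ra": 10.2, "dec": 20.1, "class": "XRB"},
    {"name": "SRC_C", "ra": 200.0, "dec": -45.0, "class": "AGN"},
]
for row in cone_search(catalog, ra=10.0, dec=20.0, radius=0.5):
    print(row["name"], row["class"])
```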
Last modified: Monday, 19-Jun-2006 11:40:53 EDT