HEASARC Catalog Organization and Metadata
The HEASARC uses a standard relational database for its catalog data and
metadata. A relational database is essentially just a set of tables. Each
table consists of a number of rows and columns. The columns of the table may
be of different types -- strings, integers, floating point numbers -- but
each row has the same structure. Some cells in the table may not have a value
defined; these have the special marker value 'null'.
In addition to its contents, each table has associated metadata that
describes it, e.g., a name, information about who is authorized to access
it, and indexes that allow more efficient searches on the table.
The HEASARC database recognizes three kinds of tables. Metadata tables,
which always begin with 'ZZ', describe tables or archive data. They give
information about the table: the names of the columns, any special meanings
that are associated with the table, and the archive data products associated
with the table. The underlying database system also has metadata tables, but to
ensure that the HEASARC software is portable from vendor to vendor, only the
HEASARC metadata tables are referred to in our software. All of the metadata
tables combined form the HEASARC metabase.
Local user catalogs are the tables with information that users may wish to
extract. Most of these can be categorized as object tables, which describe
specific objects in the sky, or observation tables, which describe
observations by a given satellite or instrument. There are some tables of
atomic data and tables of proposal abstracts as well. All of these are stored
within the same relational database system as the metadata tables. For
historical reasons, the names of most local catalogs begin with 'heasarc_'.
Remote user catalogs are used in the same way as local catalogs, but
reside in other database systems. These may include data in the VizieR system
at the CDS, or databases accessed through Virtual Observatory protocols.
Metadata tables may have information about remote catalogs, but it is usually
much less complete than for local catalogs. There may also be remote tables
that the HEASARC software discovers dynamically when making queries, so that
there is no evidence of their existence in the HEASARC metadata tables.
Missing metadata for remote catalogs is gathered dynamically during the query
process.
This section discusses the metadata that is used by the HEASARC software.
The metadata for remote catalogs is not necessarily stored in these tables,
but when gathered dynamically has much the same structure.
Currently, the HEASARC DBMS utilizes the following metadata tables:
- ZZGEN
- describes the overall characteristics of all of the tables that can be
directly accessed through our software. This includes all metadata and
local catalog tables and some remote catalogs. There are remote
catalogs which can be discovered dynamically during a query session
that are not included in ZZGEN. ZZGEN contains non-discipline specific
information about the table. As such it typically duplicates
information that is included in database system-specific tables, but
provides it to the HEASARC software systems in a system-independent
way. Different relational database systems store the information in
very different ways. Each table will have one entry in ZZGEN.
- ZZEXT
- describes domain specific extensions to ZZGEN. This is where metadata
describing elements of specific interest to astronomers would normally
be placed, e.g., which columns are RA and DEC, how large is the default
search radius for a cone search, what column contains the start time of
the observation. A single table may have many entries in ZZEXT, each
giving a single special characteristic for that table. The overall
characteristics of the table are the concatenation of its single ZZGEN
entry and all of its ZZEXT entries.
- ZZPAR
- describes the parameters of the table. This information is usually also
available in a system-specific table, but gathered here to provide a
standard way to access it. There will normally be one entry in ZZPAR
for each column of each table.
- ZZLINK
- describes links between tables. It shows how given an entry in one
table, one or more entries in another table may be linked to it. There
may be 0 or many ZZLINK entries for a given table.
- ZZWORDS
- lists the keywords pertinent to each publicly visible table. The
keywords are listed roughly in order of relevance, separated by spaces,
and strictly all lowercase.
- ZZDPSETS
- describes the data products associated with a given table. Each entry
describes a specific data product set for a given table. Tables have
entries in ZZDPSETS if and only if they have data products.
- ZZDP
- describes the data products available in the HEASARC archive. Each data
product is described as a URL so that it need not be physically present
at the HEASARC. Each URL has an associated data product tag. In
principle many tags can be associated with a given URL, but this is
currently discouraged. Note that a data product set described by
ZZDPSETS will often comprise multiple data products.
Note: The ZZDPTYPES and ZZREL tables were defined and used in
earlier incarnations of the HEASARC Database System, but they are no longer
used.
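The relationship between ZZGEN and ZZEXT described above (one ZZGEN entry per table, extended by any number of ZZEXT "virtual columns") can be sketched as follows. This is only an illustration using an in-memory SQLite database; the table contents, the reduced column set, and the function name table_characteristics are assumptions made for the example, not part of the HEASARC software.

```python
import sqlite3

def table_characteristics(conn, table_name):
    """Return the ZZGEN row for a table merged with all of its ZZEXT rows."""
    gen = conn.execute(
        "SELECT * FROM zzgen WHERE table_name = ?", (table_name,)
    ).fetchone()
    if gen is None:
        return None  # no ZZGEN entry, e.g. a dynamically discovered remote table
    merged = dict(gen)  # the single ZZGEN entry
    ext_rows = conn.execute(
        "SELECT parameter_name, parameter_value FROM zzext WHERE table_name = ?",
        (table_name,),
    )
    for row in ext_rows:  # each ZZEXT entry adds one 'virtual column'
        merged[row["parameter_name"]] = row["parameter_value"]
    return merged

# Illustrative in-memory stand-in for the metabase (contents are made up).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings keyed by column name
conn.executescript("""
    CREATE TABLE zzgen (table_name TEXT, table_description TEXT);
    CREATE TABLE zzext (table_name TEXT, parameter_name TEXT, parameter_value TEXT);
    INSERT INTO zzgen VALUES ('heasarc_rosmaster', 'illustrative description');
    INSERT INTO zzext VALUES ('heasarc_rosmaster', 'default_search_radius', '60');
    INSERT INTO zzext VALUES ('heasarc_rosmaster', 'table_priority', '2');
""")
info = table_characteristics(conn, "heasarc_rosmaster")
print(info)
```

Note that the ZZEXT values stay strings after merging, matching the way parameter_value is stored.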
Metabase Details
This section contains the detailed specification of the names, formats
and use of columns in each of the metadata tables.
ZZGEN: contains the generic information to describe tables available for access.
Column | Type | Description |
table_name | char20 | The short name of the table. |
table_location | char80 | An identifier of the database system where the table is stored. |
table_description | char80 | A short description of the contents of the table. |
create_date | char19 | The date the table was created. Unlike other dates in the HEASARC database, the creation and modification dates in ZZGEN are given as an ASCII string in the form YYYY-MM-DD HH:MM:SS. Elsewhere dates should be given as Modified Julian Day numbers. |
modify_date | char19 | The last date the table was modified. |
table_rows | int2 | The number of rows in the table. |
ZZPAR: contains a list of the parameters available for each table.
Column | Type | Description |
table_name | char20 | This field contains the table being described. |
parameter_name | char24 | This field contains the parameter being described. |
parameter_description | char80 | This field contains a short description of the parameter. |
parameter_comment | char80 | This field contains additional information pertaining to the parameter. |
parameter_format | char80 | The basic type and display format for the parameter given as a string of the form 'format:display' where format gives the type and length in bytes of the data, e.g., int1, int2, int4, float4, float8, char22, and the display is a printf display code without the initial '%'. E.g., float8:10.3f would indicate a double-precision floating-point value that should be displayed in a field 10 characters long to a precision of 3 decimal places. The display portion is used to ensure that excess precision is not given for a variable. The colon and display precision may be omitted. |
parameter_unit | char80 | The unit of the parameter. These should generally be given using the HEASARC standard unit strings, e.g., 'ct/cm^2/s'. Times are generally expected to be given in Modified Julian Days and should be given the unit 'mjd'. |
parameter_ucd | char120 | The Unified Content Descriptor (UCD) of the parameter. These should follow the latest IVOA recommendations for UCDs. |
parameter_is_index | char1 | A suggestion to the underlying database or ingest software that an index should be created on this field. |
parameter_minval | char80 | The minimum value of the column within the table. Nulls are not considered. This is a string even if the underlying column is not. |
parameter_maxval | char80 | The maximum value of the column within the table. Nulls are not considered. This is a string even if the underlying column is not. |
parameter_default | int4 | If non-zero, gives the ordering for the display of the parameters. Parameters with this set to zero are displayed in an undefined order after the parameters for which there are non-zero values. |
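As an illustration of the 'format:display' convention used by parameter_format, the following sketch splits such a string into its parts and applies the printf-style display code. The function names are hypothetical, and the assumption that the type portion is always lowercase letters followed by a number is taken only from the examples listed above (int1, float8, char22, and so on).

```python
import re

def parse_parameter_format(spec):
    """Split a 'format:display' string into (base_type, length, display)."""
    type_part, _, display = spec.partition(":")
    match = re.fullmatch(r"([a-z]+)(\d+)", type_part.strip())
    if match is None:
        raise ValueError("unrecognized parameter_format: %r" % spec)
    # The colon and display code may be omitted, in which case display is None.
    return match.group(1), int(match.group(2)), (display.strip() or None)

def render(value, spec):
    """Format a value using the display code, if one was given."""
    _base, _length, display = parse_parameter_format(spec)
    if display is None:
        return str(value)
    return ("%" + display) % value  # restore the initial '%' before formatting

print(parse_parameter_format("float8:10.3f"))
print(repr(render(3.14159265, "float8:10.3f")))
```

The second print shows the value right-justified in a 10-character field with 3 decimal places, which is how the display portion keeps excess precision out of the output.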
ZZEXT: contains the discipline-specific information.
Column | Type | Description |
table_name | char20 | The name of the table for which the extension information is being given. |
parameter_name | char24 | The name of the 'virtual column' which is being added to ZZGEN for this table. See the section below for details about the significance of certain values. |
parameter_value | char80 | The value to be given to the 'virtual column'. Note that this is a string even for numeric values. By convention, if the value begins with the character '@', it refers to a column in the table. E.g., if parameter_name='default_search_radius' and parameter_value='@error', this is interpreted as saying that the default search radius is whatever value is stored in the 'error' column of each row in the table. |
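The '@' convention for parameter_value can be illustrated with a small sketch. The function name and the sample row are hypothetical; only the rule itself (a leading '@' means "take the value from this column of the current row") comes from the description above.

```python
def resolve_virtual_parameter(parameter_value, row):
    """Return the effective value of a ZZEXT entry for one table row."""
    if parameter_value.startswith("@"):
        # '@error' means: use whatever is stored in this row's 'error' column.
        return row[parameter_value[1:]]
    # Otherwise the stored string is the value itself (still a string).
    return parameter_value

# Made-up row with an 'error' column, for illustration only.
row = {"ra": 225.341508, "dec": 66.405361, "error": 0.5}
print(resolve_virtual_parameter("@error", row))
print(resolve_virtual_parameter("60", row))
```

A literal value comes back as the stored string, so a caller that needs a number would still have to convert it.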
ZZLINK: contains links between tables.
Column | Type | Description |
table_name | char20 | The table being linked from. |
link_table_name | char20 | The table being linked to. |
link_priority | int2 | When multiple links are being displayed, this specifies the order in which they should be displayed. |
link_symbol | char255 | A suggested anchor string to use for displaying the link. Normally this should be either a short string, or an <IMG> link to a small icon. |
link_criterion | char255 | The criterion through which the link is defined, often the SQL in the where clause describing the link. E.g., in a link from the ROSMASTER table to the WGACAT table this might be the string "ror=heasarc_rosmaster.ror" which indicates that a given row in the ROSMASTER table should link to all of the rows in the WGACAT table which have the same ror field. A special syntax is also available for linking using a cone search. E.g., if we wish to link the ROSMASTER table to all the RASSFSC objects within 1 degree (60') of the center of the field of view the criterion may be written as: "-cone:heasarc_rosmaster.ra,heasarc_rosmaster.dec,60". In both these cases the table_name would be 'heasarc_rosmaster', but the link tables would be 'heasarc_wgacat' and 'heasarc_rassfsc' respectively. |
link_description | char255 | A short description of the link. |
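The two forms of link_criterion shown above could be told apart as in the sketch below. This is based solely on the single "-cone:" example given; it assumes the cone form always has exactly three comma-separated fields and that the third is a literal radius in arcminutes. The function name and the returned dictionary layout are invented for illustration.

```python
def parse_link_criterion(criterion):
    """Classify a link_criterion as a cone search or a plain SQL condition."""
    prefix = "-cone:"
    if not criterion.startswith(prefix):
        # Anything else is treated as SQL for the where clause.
        return {"kind": "sql", "where": criterion}
    fields = [part.strip() for part in criterion[len(prefix):].split(",")]
    if len(fields) != 3:
        raise ValueError("expected ra, dec, radius in: %r" % criterion)
    ra_column, dec_column, radius = fields
    return {
        "kind": "cone",
        "ra": ra_column,
        "dec": dec_column,
        "radius_arcmin": float(radius),  # assumed to be a literal number
    }

cone = parse_link_criterion("-cone:heasarc_rosmaster.ra,heasarc_rosmaster.dec,60")
sql = parse_link_criterion("ror=heasarc_rosmaster.ror")
print(cone)
print(sql)
```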
ZZWORDS: contains pertinent keywords describing tables.
Column | Type | Description |
table_name | char40 | The name of the table to which the keywords pertain. The table name is wider here to include VizieR tables. |
words | char3900 | Space-separated list of keywords for the specified table in rough order of relevance. |
ZZDPSETS: contains the following information:
(1) the existence of an entry for a given table_name indicates that
that table has data products that are available for access,
(2) definitions of generic sets (or categories) of data products, and
(3) how to construct the data product tags for a given set.
Column | Type | Description |
table_name | char20 | The table with which a data set is associated. The same data product may be associated with multiple tables but a separate entry is required for each table. |
set_name | char25 | A short name for this data product set. It will be used as a label when the user is selecting data products. |
tag_format | char500 | A comma-separated list describing how to construct the tags for the set. Anything inside "@{" and "}" is considered to be a column name in that table. For example, "rosat.hri.@{seq_id}.events" means each tag or set of tags is constructed starting with "rosat.hri.", then the value in the 'seq_id' column for that row, followed by ".events". More than one tag can be formed from the tag_format if the tags are separated by commas. For example, tag_format = rosat.@{instrument}.@{seq_id}.lc,rosat.ao*.cover |
set_description | char80 | A longer, user-understandable description for this data product set. |
ZZDP: contains the generic information to describe data products that are available for access.
Column | Type | Description |
dp_tag | char45 | The data product tag, typically an abstract identifier for the file, directory, or remote URL. The HEASARC convention for the dp_tag values associated with a given observation and data product is "mission.instrument.observation_id.type.unique_id". However, there are many ancillary data products like images, cover pages, abstracts, etc. which apply to an entire mission or an instrument of such a mission rather than a specific observation, and there cannot be a strict naming convention for those types of data products. However, the guideline is that those tags should start with "mission.instrument.*" or just "mission.*" as appropriate. No HEASARC naming convention for multi-mission data products has yet been defined. The dp_tag value should always be all lowercase letters. |
dp_type | char35 | A short description of the kind of data product. Typical data product types are Image, Plot, Lightcurve, Events, Spectrum, and Telemetry, but many others are in use. The dp_type value should be mixed case. |
dp_format | char25 | The uncompressed format of the data product. (In general, most data products are compressed to save storage space.) The dp_format value should be all uppercase. Typically, the HEASARC uses one of the following: FITS (a FITS file, usually compressed); GIF, JPEG, PNG, PS, HTML (quick-look data); HTTP (a link to a remote Web page); TAR (a tar, i.e. Unix archive, file); DIRECTORY (a directory in a hierarchy; this implicitly refers to the contents of this directory); ASCII (human-readable, often "quick look", text data); BINARY (program data not in FITS format). |
dp_level | int2 | Currently unused. This field is intended for the heretofore rare occasions when it is desirable to archive old versions of data products in addition to the current version. The dp_level is kind of a reverse version number in which 0 is the most recent version, 1 would be the previous (older) version, 2 would be older still, etc. While this may appear clumsy at first, it makes it vastly easier to query for a list of the latest data product versions. |
dp_url | char80 | The URL that points to the data. This typically uses "shortcuts" described below in the discussion of ZZEXT parameters. |
Special Values in ZZEXT
Entries in ZZEXT are often called "virtual parameters," since they are
used to extend the table information stored in ZZGEN with
table-specific metadata. The existence and values of these virtual parameters
in ZZEXT control how tables are treated by the HEASARC
Browse system.
Entries in the ZZEXT table with table_name='zzdp' are used as
shortcuts in building the URLs stored in the dp_url field.
Although the dp_url field in ZZDP is currently limited to 80 characters,
URLs can be much longer, with no fixed upper bound. In order to
support arbitrarily long URLs, to conserve storage space, and to increase the
speed of data products queries, it was decided to use variables which would
be looked up in ZZEXT and repeatedly expanded to construct the full URL. Such
shortcuts use a syntax like "${shortcut}". For a given "${shortcut}" the
ZZEXT table is queried for the parameter_value matching the table_name "zzdp"
and the parameter_name "shortcut". This parameter value then replaces
"${shortcut}" in the dp_url entry.
E.g., suppose ZZEXT has the following entry
(table_name, parameter_name, parameter_value) =
'zzdp', 'missionbase', 'ftp://heasarc.gsfc.nasa.gov/mission/data/'
In a row in ZZDP with
dp_url='${missionbase}/obs/k95432.fits.gz'
the ${missionbase} shortcut is recognized as a shortcut and
replaced by the value in the ZZEXT table. I.e., the full URL is:
ftp://heasarc.gsfc.nasa.gov/mission/data/obs/k95432.fits.gz
Since shortcuts may be defined in terms of other shortcuts, such
expansions are done until there are no more shortcuts left to expand in the
URL. Circular shortcut references are detected and disallowed.
For example, "${heasarc}" might be "http://heasarc.gsfc.nasa.gov",
"${rosat}" might be "${heasarc}/FTP/rosat/data", and "${rosat_pspc}" might be
"${rosat}/pspc/processed_data".
Shortcuts also have the added advantage that, if the relative location of
some number of data products had to be changed, it is a lot easier to change
a single entry in ZZEXT than to change thousands of rows in ZZDP.
Note that these shortcuts only apply to the ZZEXT entries for the ZZDP
table.
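The repeated expansion of "${shortcut}" references, including the detection of circular definitions, might look like the following sketch. A plain dictionary stands in for the ZZEXT rows with table_name 'zzdp'; the function name, the error types, and the file name in the example URL are assumptions for illustration and do not describe the actual HEASARC implementation.

```python
import re

SHORTCUT = re.compile(r"\$\{([^}]+)\}")

def expand_url(dp_url, shortcuts, _active=()):
    """Expand ${name} references until none remain.

    `shortcuts` maps parameter_name to parameter_value; `_active` tracks the
    chain of shortcuts currently being expanded so that cycles are rejected.
    """
    def substitute(match):
        name = match.group(1)
        if name in _active:
            chain = " -> ".join(_active + (name,))
            raise ValueError("circular shortcut reference: " + chain)
        if name not in shortcuts:
            raise KeyError("undefined shortcut: " + name)
        # A shortcut's value may itself contain shortcuts, so recurse.
        return expand_url(shortcuts[name], shortcuts, _active + (name,))
    return SHORTCUT.sub(substitute, dp_url)

shortcuts = {
    "heasarc": "http://heasarc.gsfc.nasa.gov",
    "rosat": "${heasarc}/FTP/rosat/data",
    "rosat_pspc": "${rosat}/pspc/processed_data",
}
full_url = expand_url("${rosat_pspc}/example.fits.gz", shortcuts)
print(full_url)
```

Because the substitution is purely textual, moving a whole directory tree only requires changing one dictionary entry (one ZZEXT row), which is the maintenance advantage described above.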
The HEASARC's Data Products Layer implements linking a given row in a
database table to the data products associated with that row. It utilizes
the ZZDP and ZZDPSETS metabase tables.
Example of How the Browse Web Interface Uses the Data Products Layer
Suppose the ROSAT catalog, HEASARC_ROSMASTER, contained the following five
rows:
seq_id | instrument | ra | dec | name |
RF150003 | PSPC | 225.341508 | 66.405361 | H1504+65 |
RF150007 | PSPC | 216.643574 | 1.512456 | PG1426+015 |
RF150015 | PSPC | 219.479943 | 64.504469 | HD129333 |
RH100192 | HRI | 325.652292 | 38.089434 | XRT/HRI THERM CYG |
RH100193 | HRI | 85.021314 | -69.764432 | DRACO CLOUD |
When a user goes into Browse and does a search which displays the above
rows in HEASARC_ROSMASTER, Browse displays the above with checkboxes to the
left of each of the sequence IDs. Suppose the user checks the second and
fourth sequence IDs in the list and then chooses to preview or download
the data products associated with those observations. For a given data
products set (or "category" in Browse Web Interface terminology), the
software looks up the tag_format in ZZDPSETS for that set and the table
HEASARC_ROSMASTER. Suppose the tag_format field for some set says:
rosat.@{instrument}.@{seq_id}.*
The software then fills in the information from the HEASARC_ROSMASTER table
into the tag format. So, for the second row, seq_id is RF150007 and instrument
is PSPC. The resulting data products tag that is constructed from this
information (and after converting to all lowercase) is:
rosat.pspc.rf150007.*
Similarly, for the fourth row, seq_id is RH100192 and instrument is HRI, so
the constructed data products tag becomes:
rosat.hri.rh100192.*
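The tag construction just described can be sketched in a few lines. The function name build_tags is hypothetical; the substitution of "@{column}" references, the comma-separated list of patterns, and the conversion to lowercase follow the description of tag_format given in this document.

```python
import re

COLUMN_REF = re.compile(r"@\{([^}]+)\}")

def build_tags(tag_format, row):
    """Fill @{column} references from a table row and lowercase each tag."""
    tags = []
    for pattern in tag_format.split(","):  # several tags may share one format
        filled = COLUMN_REF.sub(lambda m: str(row[m.group(1)]), pattern.strip())
        tags.append(filled.lower())
    return tags

# Fourth row of the example HEASARC_ROSMASTER table.
row = {"seq_id": "RH100192", "instrument": "HRI"}
print(build_tags("rosat.@{instrument}.@{seq_id}.*", row))
print(build_tags("rosat.@{instrument}.@{seq_id}.lc,rosat.ao*.cover", row))
```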
Browse then queries the HEASARC Data Products Layer (specifically the ZZDP
table) to get the data product information for that tag. The asterisk ("*") is
interpreted as a wildcard, so the result is a list of tags. The ZZDP table
returns the matching tag(s), types, formats, and URLs. For example, say the
table ZZDP looks like the following:
dp_tag | dp_type | dp_format | dp_url |
rosat.hri.rh100192.aspect.1 | ASPECT | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.aspect.2 | ASPECT | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.events | EVENTS | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.lc.1 | LIGHTCURVE | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.plot.1 | PLOT | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.plot.2 | PLOT | GIF | http://heasarc.gsfc... |
rosat.hri.rh100192.image.1 | IMAGE | JPEG | http://heasarc.gsfc... |
rosat.hri.rh100192.image.2 | IMAGE | GIF | http://heasarc.gsfc... |
Browse then formats the above URLs into HTML links to each data product.
It also uses the above information to package multiple data products into a
Unix tar file for the user to download together as a convenience.
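The wildcard lookup against ZZDP can be imitated with Python's standard fnmatch module, as in the sketch below. The in-memory list stands in for the ZZDP table and its rows are illustrative; dp_url is left out because the URLs are not given in full here. Note that fnmatch also gives special meaning to "?" and "[", which is assumed not to matter for tags of this form.

```python
from fnmatch import fnmatchcase

def match_products(tag_pattern, products):
    """Return the product rows whose dp_tag matches a '*' wildcard pattern."""
    return [p for p in products if fnmatchcase(p["dp_tag"], tag_pattern)]

# Illustrative stand-in for a few ZZDP rows.
products = [
    {"dp_tag": "rosat.hri.rh100192.events", "dp_type": "EVENTS", "dp_format": "FITS"},
    {"dp_tag": "rosat.hri.rh100192.image.1", "dp_type": "IMAGE", "dp_format": "JPEG"},
    {"dp_tag": "rosat.pspc.rf150007.events", "dp_type": "EVENTS", "dp_format": "FITS"},
]
matches = match_products("rosat.hri.rh100192.*", products)
for product in matches:
    print(product["dp_tag"], product["dp_type"], product["dp_format"])
```

In a relational database the same selection would more likely be expressed with a LIKE condition; the in-memory filter is used here only to keep the example self-contained.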
User Tables
Browse dynamically queries the metadata tables to determine available
tables and the columns they contain. The following guidelines are commonly
adopted in building tables. Tables are normally added to the HEASARC by
creating a TDAT file, an ASCII representation of the table, and then using
the HDBingest command to bulk copy the table into the database.
The primary right ascension and declination columns (as indicated by the
ZZEXT fields) should use J2000 coordinates. While other coordinate systems
may be used within Browse for cone-searches, positional cross-correlations
are not feasible when the base coordinate systems are different. If a table
is supplied with coordinates in a different system, then new columns should
be added to the table in J2000 coordinates and these new columns should be
made the primary positional fields. Conventionally, the primary position
fields have the names 'ra' and 'dec'.
Times and dates should be stored in Modified Julian Day (MJD) numbers.
Dates may be specified using integer values, while real numbers should be
used for finer grained times. Double precision values allow time resolution
of a few microseconds which is usually enough for catalog data.
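As a worked illustration of the MJD convention, the sketch below converts a date string in the 'YYYY-MM-DD HH:MM:SS' form used by ZZGEN into a Modified Julian Day number, and estimates the spacing between adjacent double-precision values near a present-day MJD. The function name is hypothetical, the conversion ignores leap seconds and time-scale differences, and math.ulp requires Python 3.9 or later.

```python
import math
from datetime import datetime

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0.0 is 1858-11-17 00:00

def to_mjd(timestamp):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string to a Modified Julian Day number."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return (moment - MJD_EPOCH).total_seconds() / 86400.0

mjd = to_mjd("2000-01-01 12:00:00")
print(mjd)

# Gap between neighbouring doubles near this MJD, expressed in seconds.
resolution_seconds = math.ulp(mjd) * 86400.0
print(resolution_seconds)
```

The computed spacing is below a microsecond, which is consistent with the statement that double precision values are usually sufficient for catalog data.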
A class field should normally be used only for information on object
classes using the HEASARC standard set of source classes.
The table priority should be used to highlight the key tables for a given
mission, which may be given priorities of 2 or 3. Tables made redundant by newer
versions should be given priorities of 8 or 9. Typical object tables usually
are given a priority of 5.
The default search radius should normally reflect either the size of the
observation or the positional uncertainty of the source position.
Documentation prepared by the
HEASARC Database Group
Last modified: Monday, 09-Oct-2006 20:01:16 EDT