The Xamin Guide v1.2

December 1, 2020

Introduction

This document describes the Xamin system used to access astronomical data at the High Energy Astrophysics Science Archive Research Center at NASA's Goddard Space Flight Center.

What is Xamin?

Xamin is the primary interface to the HEASARC's catalog and archive systems. Xamin supports simple and complex table queries, monitoring of tables, and downloads of HEASARC archival data. Xamin also links to tables and data at other sites where appropriate.

Xamin can be used through a web interface designed as a Rich Internet Applicaiton (RIA) and also through a command line interface. All facilities are available through both interfaces.

Documentation Overview

The are a number of documents describing aspects of the Xamin system. While there is some overlap between these documents they are intended for different purposes and audiences. Documents include:

The Xamin Guide: This document itself provides a general overview of the Xamin system with particular emphasis on how the system is built and how it can be used. Users interested in an overview of the general capabilities may find this document more accessible than the detailed interface guides. This is the primary document for administrators since it describes in detail the steps for installing and maintaining the Xamin system. Developers may find this document less useful since it discusses the code in only fairly general terms, but the code overview may be a helpful starting point.; This web guide -- designed to be used from the Xamin Web environment -- describes the Web interface in detail.
The Xamin CLI User's Guide: This document describes the use of the Xamin CLI interface. The Web interface is based upon the CLI and the CLI document spends a bit more time on the details and consequences of certain settings, so this document may be useful for both Web and batch users.
The Xamin Plotting User's Guide: This document describes the powerful plotting capabilies. Xamin's plotting interface is build over the plot.ly library.
The Xamin requirements document.

Current implementations

Xamin is currently implemented on the HEASARC web environment as the servlet

    https://heasarc.gsfc.nasa.gov/xamin

The Xamin web application contains a number of end-point URLs. This web application includes not only the run-time code, but full documentation and code for the installation and maintenance of Xamin.

Migrating from Browse to Xamin

The Xamin interface replaces the Browse web interface. Xamin supports all of the query functionality in Browse and provides many new capabilities, however this new system uses a single Web page where users can progressively expand capabilities rather than the many distinct Web interfaces that Browse had employed. This section describes how some of the tasks accomplished in the old system are done in Xamin.

Querying all HEASARC holdings: This works essentially the same as before. Just enter a position and/or time and and start a query. You get back a table with the number of hits for each HEASARC table. Click on the table icon to query a specific table with the specified constraints.
Querying a specific table: In Browse if you wanted to query a specific table you could go to the index page or first select a mission and then the specific table. In Xamin you enter the table in the Tables Explorer quick entry box at the top of the form, or you can browse the the table entries in the Tables Explorer hierarchies. You can also search for tables using a keyword search of their documentation. Any of these can be used to select a table.
Querying all tables from a given mission: This was the default mode for Browse when the user checked a mission box. In Xamin, to select all of the tables for that mission just open the Tables Explorer/mission hierarchy and click on the icon to add all of the tables for that mission. Then use Find matches in... Selected Tables to search all of the tables in the mission for the selected time or positions. You can add tables from other missions, or delete some of the tables before querying if you like. You can then query the selected tables for position or time constraints.
How do I do a query by parameters?: Enter the table name the use the Parameters Explorer to add parameter constraints. You can add a constraint for any parameter by clicking next to it. Note that you can use fairly general SQL expressions in the constraints and these can involve table parameters.
How do I do cross-correlations?: If you have 2-4 tables selected, you can do a correlation of them just by clicking on Query to...Correlate. Spatial cross-match constraints will be added automatically, but you can override or modify these to meet your needs.
Can I search the metadata for tables as I could in the keyword entry interface?: Yes. Use the Tables Explorer keyword search.
How do I monitor a mission for new observations of the target I'm interested in?: Xamin has a broad capability to check if the results for any query have changed. First get a HEASARC services account. Login during an Xamin session and run a query that finds observations you are interested in. Use Query to...Save to save this in your account. Bring up the summary of your saved tables with theSession/User Tables--Info menu. If you click on the Frequency column next to your saved table you should get a pane that allows you to ask to be informed whenever the result of this query change.

System Elements

Interfaces

Users may access all Xamin capabilities from either a Web-based GUI or through a CLI interface that can be embedded in scripts.

Web Interface

The Web interface is built on top of the ExtJS library to provide a very powerful but compact interface. Users can query, correlate, plot and download data all in the same page. A single box entry to the GUI is also available. It allows users to specify targets, formats, tables and such in a single entry box and starts up the Web interface after having sorted all of these out. This interface is available on the HEASARC home page.

CLI Interface

The CLI interface uses a simple keyword driven interface to build and run queries. The Web interface is built on top of the CLI interface. Since the underlying database is not visible to the public, even the CLI queries go over a simple Web interface. A small Jar file is used to relay CLI queries to the system. This users.jar file may be downloaded from as https://heasarc.gsfc.nasa.gov/xamin/distrib/users.jar.

Queries

Xamin supports a broad range of queries. Users can query single tables, they can ask what tables have data at a given location or time, or they can combine information from multiple tables in a variety of different ways.

Single Table Queries

Queries of single tables can involve standard positional or temporal constraints. Simple or complex constraints can be levied against any parameter in the table. Output fields can be selected from the columns of the table and synthetic columns created as expressions of those fields. If a user simply wishes to browse a table it can be queried without any constraints.

Discovery queries

Discovery queries query master tables for positions and/or times that the user has specified and find science tables that have information for those positions or intervals. Users can query the entire HEASARC holdings or they can restrict the query to a specific list of tables of interest. Only HEASARC tables are indexed in discovery queries.

Correlations

Correlations include any query that involves multiple tables. Most correlations involve checking for spatial adjacency, but correlations can be made on time or other criteria.

All tables can take part in correlations. However remote tables must be downloaded to the HEASARC before the correlation can take place. So in practice correlations of remote resources can only be done if the tables are of modest size or if constraints that can be passed to the remote site can restrict the size to a reasonable number. It may be impractical to query remote tables where the size of the downloaded data is more than ~10,000 rows.

In the web interface spatial correlation constraints are assumed and created for the user -- they can be deleted if not desired. In the command line version the OFFSET keyword is used to specify positional correlations, but the user must include this explicitly.

Temporal cross-correlations can be made as simple constraints on the time columns of the two tables. At the HEASARC times are stored in MJD's, so to ask for observations within a day of one another we can add a constraint of the format

     b.time-1 .. b.time+1

in the Parameters explorer for the field a.time. In the command line interface that might be specified as

     constraint='a.time between b.time-1 and b.time+1'

where the quotes are to ensure that the shell treats the value as a single string.

While any number of tables can be correlated in principle, queries tend to become very slow when large numbers of tables are involved. The web interface imposes a maximum of four tables in a correlation, no limit is specified for the command line. This limit can be gotten around if necessary by breaking up the correlation into a series of correlations and saving intermediate results as tables.

Anti-correlations

Anticorrelations are a special class of correlation where the user wishes to find rows in one table or tables that do not have corresponding data in another table, e.g., objects that have not been observed. Only a single anticorrelation table is permitted in a correlation. The results of an anticorrelation are defined as follows:

A query is performed on all tables of the query including all of the non-group constraints (e.g., all of the constraints that would go in the WHERE clause in the SQL). The fields returned in this query are row identifiers for the non-anticorrelation tables of the query. E.g., if we are querying tables a, b, c, and d with an anticorrelation on table d, then we apply all of the constraints on the query (including constraints on table d) and find out the row tuples for tables a,b and c. These tuples are stored in a temporary table. E.g., one of the results of this query might be that row 5 of table a, row 6 of table b, and row 7 of table c are found, so we save a tuple (5,6,7).
A second query is now performed where all query constraints that involved table d are removed and a new constraint is added that the any tuple of rows that was found in the first query is excluded. The output fields that the user asked for, and any group constraints (constraints that would be put in a HAVING clause) are added back into the query. The results of this query are returned to the user.

Manifestly a user may not request an output field in an anticorrelation that requires any column in the anticorrelating table. The same is true of any group constraint.

Note that Xamin does not do careful analysis of the input constraints to determine if pieces of an input constraint can be included in the second query. E.g., compare the constraint

     constraint='d.exposure>1000 and c.exposure>1000'

versus

     constraint=d.exposure>1000,c.exposure>1000

From Xamin's perspective in the first example we have a single constraint. If d is the anticorrelation table, then in the first case Xamin will not include any constraint on the exposure from table c when it runs the second part of the anticorrelation. In the second case, since the constraint on table c is specified independently this constraint will be included in the second query.

Autocorrelations

Autocorrelations involve a correlation of a table with itself. This can be useful in looking for neighbors or duplicates. The structure of the Xamin metadata table often suggests using autocorrelations to get results. To perform an autocorrelation simply specify the table multiple times. The table prefixes allow you to specify which instance of the table you are looking at.

When doing an autocorrelation it is usually necessary to make sure that you don't return a row matching itself. A simple way to ensure that is to add a constraint of the form

       constraint='a.id != b.id'

in the query. In the Xamin web interface just enter != b.id next to the entry for a.id. Of course the appropriate field may be named something other than id.

Data Products

Data products are any information not in the table being queried that is somehow linked to the table as a whole or to a specific row or rows in the table. There is extensive support for data products in the Xamin Web interface, but the location of data products can be retrieved in the CLI interfaces and these results can be using in scripting data access.

There are two fundamental types of products:

Mission archival products represent the data holdings in the HEASARC archive. These are the mission data from more than two dozen NASA and non-NASA missions. For many Xamin users querying observation catalogs is just a step in finding and locating the datasets of interest to their research. These products and be retrieved and downloaded using Xamin. These products are defined in mission data product files. All data products for a given mission are normally defined in a single XML file.
Link products are URLs that point to data that are related to a specific row, but are not part of the HEASARC archive proper. We distinguish several types of link products.
- Linked tables return rows in a linked table that are related to the row in the initial table. E.g., each row of the SwiftMaster table links to the associated rows in the Swift instrument tables (BATMaster, XRTMaster and UVOTMaster). In turn the instrument tables link to the mission master table. Table links can and often do span missions. Table links are provided between many of the major observation master tables, e.g., linking each Chandra observation to nearby XMM observations. Following a table link executes a new Xamin query. Such link products are automatically generated from the Browse zzlink.tdat file which is updated whenever new table links are included. A TableLinks.xml file may preserve the current table link definitions.
- Bibliographic links provide links to papers using HEASARC observations. A master bibliography table, ZZBIB, indicates which observations have been used in which table. The table has been collected manually and does not cover all of the literature for all missions but covers much of the HEASARC's holdings. Since these links are pointed to from the zzbib table, they are defined in the PointedProducts.xml file. This needs to be updated whenever entries from a new table is added to ZZBIB.
- Dynamic service links use positional information from each row to query other services. Currently the HEASARC provides service links to NED, SIMBAD and SkyView. Unlike other products these are available for any table without needing to be specifically associated with a table. The mechanism used for positional service links is not restricted to positions. If a remote service can be invoked by specifying a value and the metadata for a table indicate it has a field or fields that have appropriate data, a dynamic link can be made.

By default, the Web interface only shows archival products but the user can request the other types. In the CLI interface users must explicitly request data product records.

Downloading archive data

In the web environment the user selects data products in a grid query results page. Data products can be selected for all rows in the table, or for individual rows. Users can open the data products for a given row and look within the data product hierarchy selecting only the data products of interest.

As data products are collected they are added to a shopping cart. The shopping cart can contain products from multiple selections and from multiple queries. Users can also download individual data products directly (or in the case of directories open a web link to the directory). Link products cannot be added to the shopping cart -- they generally involve data that is intended to be viewed in browser or which are found at remote sites.

At any point in an Xamin session users can download the data in their shopping cart. They can download individual products one by one from the shopping cart, but if there are more than two or three products users may prefer to collect the data products together. A tar file of all of the products in the cart can be created. This tar file is created dynamically so the download should start almost immediately.

Whenever a significant amount of data is to be downloaded we recommend that a data download script be created rather than directly tarring the data. Scripts can be created which use either the WGET or CURL commands both of which are widely available. Scripts are much more easily restarted and allow the users complete control of the data download process. The script can be displayed and copied via striping or it can be downloaded explicitly.

In the command line environment a query can be used to build a download script to download archival products or to provide a list of links to link products.

Setting Up Data Products

Xamin data products use two kinds of metadata:

Product descriptions are table independent: they describe the content of a product and where it is located (or how to construct its URLs) in terms of a set of variables. A product may have other products as child products. A child product inherits the variables of its parent. Often products specify a file template which is used to match against the file system relative to a root location. A root variable will point to some directory associated with a specific observation.
The TableLink metadata sets up a link between a specific table and a product. Many tables can link to the same a product. The table links describe how the values of the variables in the product definitions should be created based upon the columns in the table the product is being linked to. They can also specify a boolean expression that must be satisfied if the product is to exist. This allows for conditional products. E.g., we might wish to link from a table of objects to different linking tables depending upon the classification of the object. [There is an unfortunate confusion in terminology between this use of table links, i.e., the links between a table and any products, and a particular kind of product, where we want to link one row of a table query to a query of some other table.]

Data product information is relatively stable and batch procedures have been developed for interpreting XML descriptions of the products and updating the database.

In the full Xamin distribution data product information is found in the ops/products/missions directory for each mission, and in ops/products for the link, bibliographic and positional products. The updateProducts command, a script which invokes appropriate Java classes, reads the XML files and updates the Xamin metadata.

User Sessions

Xamin uses HEASARC services login sessions. Users may request any number of sessionss, but must provide a valid E-mail. Users must respond to a message sent to them by E-mail before the session is activated.

Capabilities

A user can store any number of user tables in their session. These tables can be used exactly as any other table in the system users can correlate the user tables with other local and remote tables. User tables are only available when a user has connected into the corresponding session. For the command line the USER and PASSWORD keywords are used to ensure appropriate access privileges.

On-line users can also save the current query configuration at any time into a named configuration. This configuration can be restored at any later time. Note that the configuration only affects the Query window, not any of the other Xamin windows. If a configuration is saved with the name default, then auser can automatically restore this configuraition in subsequent logins. To change the default configuration the current default must first be deleted.

Output formats

A variety of output formats are supported in Xamin.

ASCII

Two ASCII formats are supported. The Aligned format pads the output fields to align columns and headers. The Streaming format does no padding. Messages are shown after query results in the Aligned format, but are shown as they are processed in the Streaming format.

VOTable

The VOTable format is an XML format supported by the Virtual Observatory. Product data is returned using service descriptors that point to the products using the Xamin DataLink service.

VO Protocols

ConeSearch and TAP formats provide data using the VO ConeSearch and TAP protocols. These are pure VOTable formats which satisfy the requirements for these protocols (e.g., the type of UCD used, and the syntax of the INFO elements included). SIA and SSA services are supported for appropriate image and spectra data products. A DataLink service is used to support the HEASARC's hierarchical data products.

Excel

Data may be downloaded as Excel spreadsheets in the .xls format.

FITS

Data may be downloaded as FITS binary tables.

Table Sources

Xamin allows users to specify many different kinds of tables to query.

Local Tables

These are stored in the Postgres database and include the familiar HEASARC tables, the metainfo metadata tables and a few master tables used for index queries.

User Tables

Each user session may have their own persistent tables.

Uploaded Tables

In the command line interface user may query temporarily uploaded tables without requiring any user account or they can upload tables into a user account. Only the second option is currently available from the Web.

External Tables

Users may query tables at Vizier and Virtual Observatory sites. The syntax for querying these tables is exactly the same (except for the table names used) and in principle users can do all of the same kinds of queries. However in practice downloads of large number of rows from a remote site may be impractical so that users should try to ensure that no more than ~10,000 rows are required to be downloaded from a remote site.

Sample Queries

This section briefly describes some of the kinds of queries that you can do with Xamin. The web and CLI user�s guides illustrate the specific steps needed to do a few of these queries. The discussion here focuses on what you can do rather than how to do it.

Discovery Queries.

You can use discovery queries to find out if the HEASARC has information at given positions and/or times. They can be made against all HEASARC tables, or against a specified list of two or more tables. The result of a discovery query is a list of the tables that have information at the requested positions/times including the number of rows in each table that match the criteria. The list includes links that actually do the query for each table. Only positional and temporal constraints may be specified in a discovery query. When you do a discovery query you get back a table which provides links to the queries of the individual tables.

Examples:

What HEASARC tables have information near 3c273?
Does the HEASARC know of any observations made on June 6, 2007?
What X-ray data does the HEASARC have on Cen-A in 2005-2006?
What ROSAT, Chandra, XMM and ASCA observations does the HEASARC have on 3c273, Abell 1656 or 10 15 06.2, +14 25 19.8?

Simple Single Table Queries

If you just want to query a given table for information on specific targets or times, then enter the position or time constraints just as with a discovery query, and also specify the table you are interested in. You can also view a table without specifying any constraints. A single table query of an observation is often the first step in downloading data associated with the selected observations.

Examples:

What does the XMMMaster table look like?
What observations has Chandra made of 3c273 or 3c279?
What did XTE observe on 2004-10-10?

Complex Single Table Queries

You can do a lot more with single table queries than positional or temporal constraints. You can put constraints on any parameter, decide exactly which columns to display -- including columns that you define�and set the order columns are displayed in. You can specify a given sort order for the output. You can even do simple statistics on a table.

Examples:

What observations has ROSAT made of 3c273 that had exposures > 10,000 seconds?
What are the longest XMM observations?
What is the average flux of RASSFSC sources as a function of galactic latitude?
What high-galactic-latitude sources in the XMM Serendipitous Source Catalog have fluxes more than 10 times the flux error?
For each source observed by ROSAT what were the total number of observations and the total exposure time of all observations of the source per instrument?

Cross-correlations

Cross-correlations allow you to combine information from multiple tables. This can include tables of your own that you have uploaded or generated in previous queries. Typically cross-correlations use a positional constraint where we match results from the two tables according to positional coincidence. However cross-correlations can also be done based on time or any other characteristics of the table. All of the options available for single table queries are available for cross-correlations too. One special kind of correlation is an anticorrelation, looking for data in one table or tables which does not have matching data in another. Data products can be linked to the results of the cross-correlation whenever any of the source have data products.

Examples:

What Chandra observations have been made of objects in my uploaded source list?
What targets have been observed by both ROSAT and Chandra?
What targets have been observed by ROSAT with an exposure of at least 40000 seconds and Chandra with an exposure of 10000 seconds?
What targets have been observed by Chandra with an exposure time greater than the ROSAT exposure time?
How many times has each Messier object been observed by Chandra?
What is the x-ray/radio flux ratio of objects seen in both the XMMSSC and FIRST catalogs?
Which of the objects in my source list have not yet been observed with Chandra?
What are the brightest objects classified as AGN in the WGACAT catalog that have not been observed by Chandra?

External Queries

External tables are queried exactly like native HEASARC tables. Access to these remote tables may be slower than for local tables, but users can query by position, add constraints and join tables. Simple positional and parameter constraints may be applied remotely. More complex constraints require downloading of data to the HEASARC database. Queries that would require downloading more than of order 10,000 rows may not be practical. External queries of VO image and spectral data services may have an associated data product.

Examples:

What are 100 nearest USNO-B sources near the north Galactic pole?
Are there Chandra observations of black holes with masses greater than 100 solar masses? (E.g., J/ApJ/667/131 is a table of black hole masses).

Monitors

Users can check for new observations and for other changes to the database. Any local query can be saved in the session and the user can ask that the query be repeated occasionally and that they be informed of the changes. This feature can also be used to keep a table up to date.

Examples:

Let me know about new XMM or Chandra observations of Abell 496 or Abell 1656.
Are there any new Chandra ACIS observations with exposures longer than 10,000 seconds?
Is there any new information for any tables with positional errors less than 1 degree on TY Pyx?
Maintain a list of distinct PIs and the number of observations for Suzaku.

Upload Queries

Users can query their own tables and use these tables in correlations with HEASARC and external tables. Tables can be uploaded in CSV, VOTable and TDAT formats. Any string may be used as the separator in the CSV format, not just commas. The data are uploaded into an image of the input table and a few system columns may be added if positional columns are recognized (by UCD or name). Users may also upload a list of targets where each line of the input is assumed to be a target. Some lines may be target names and others coordinates. Coordinates may in turn be in mixed sexagesimal and decimal formats. However all coordinates must be in the same coordinate system. A list of targets is uploaded into a table with three rows: the original input string and the J2000 RA and Dec corresponding to the position. The derived coordinates can be null if an unresolveable target name was found. While source lists can be queried as tables in their own right they are usually correlated with some other table.

Examples:

Which elements of my source list are at high Galactic latitude?
Given a CSV file with RA,Dec and Magnitude, find the difference between these magnitudes and magnitudes in the USNO B catalog?
Given a VOTable with a set of stellar positions, find the objects where the offset with Hipparcos positions is larger than 20" (assuming that any offset >20" is a misidentification).

Installation

This section describes the steps in creating the complete Xamin data access system. The discussion of each step includes both the general issues that need to be resolved by an implementor in their system, and the specific choices made at the HEASARC.

This section describes a complete installation of the Xamin server. Users who only wish to be able to use the Xamin from their CLI environment need only download the very small users.jar file from the Xamin distribution directory (e.g. http://heasarc.gsfc.nasa.gov/xamin/distrib/users.jar).

Installing and populating the database.

Xamin is designed to use a PostgreSQL database. Some PostgreSQL specific features are used in the software to populate the system and in the query system. The keyword search interface uses PostgreSQL features that paralleled in some other systems (e.g., SQLServer), but may not be generally available. Postgres specific features of the system are called out

Postgres is freely available at:

   http://postgresql.org

While one click install downloads are available, these generally require superuser privileges on the destination machines..

Installing the database

This section gives a brief overview for installing Postgres. Much more information is available at the PostgreSQL site.

The tar file should be unpacked with

      > tar zxf postgresql-X.Y.Z.tar.gz

this creates a subdirectory of the current directory of postgresql-8.4.2. An INSTALL file with detailed directions for installing Postgres is available in this directory. The path used for the HEASARC installation is noted below.

Within the postgres-ql-X.Y.Z directory

    > ./configure --prefix=desired_postgres_directory

e.g.,

    > ./configure --prefix=/local/db/postgres

Then execute:

     make
     make install

to compile and install Postgres in the desired location.

The current Postgres operational machine is installed on the DBMS1 machine.

The Postgres server is ideally run under a special user account but may be run under a normal account. The special account can be created using normal system procedures.

Initial configuration.

To initialize the database first create the directory in which the database data is to be stored. This directory should be owned by the same user as will run the server.

    > mkdir directory_for_data

then initialize the database using

    > initdb -D directory_for_data

where initdb is in the bin subdirectory of your Postgres installation.

To create an initial superuser account and password you may wish to include the -U and -pwprompt or -pwfile=file arguments to the initdb command. The -U argument specifies the name of a superuser account to be created for the database (e.g., postgres). The -pwprompt argument indicates that the command should prompt for the password for the account while the -pwfile argument gets that password from a file.

The output from the initdb command indicates how one can start the database server. This command should be put in the startup scripts for your database machine (but is not currently for the HEASARC).

Once the database has been initialized some small amount of customization of the configuration files is desirable. These are found in the data directory mentioned above.

The pg_hba.conf file should be updated to allow remote access. This can be configured in a number of ways.

We have also made the following changes to the postgresql.conf file:

   shared_buffers = 500MB

   temp_buffers   = 8MB
   max_prepared_transactions = 5

   work_mem = 16MB
   maintenance_work_mem = 64MB
   max_stack_depth = 8MB

All of these lines are updated from the values in the default configuration which is sized for a small machine.

Postgres accounts

At least two and preferably three Postgres accounts should be set up to manage the Xamin database. A superuser, postgres, account will be needed to set up the user accounts and to create the database in which the Xamin system will run.

A second account, e.g., xamin, may be defined as the owner of the xamin database with all associated privileges, or this can be done under the postgres account. Using a separate xamin account is preferable since this account need have no privileges outside of the Xamin database. The account will be used in many of the internal operations scripts, but is only invoked by external user actions when new 'user accounts' are created and destroyed and the associated schemas need to be created or destroyed in the Xamin database.

The webuser account is the account that external users normally invoke when accessing Xamin. The webuser account has the privileges to read all non-system tables and to create and update tables in user schema where the tables associated with user sessions will be supported. Note that all users share the same actual database accoun but cannot read other users' data since a password is required to access session data.

The names and passwords of these accounts are needed in the [heasarc|ops|dev].settings files found in the Xamin WEB-INF area. Generally the ops/dev settings files will use the xamin account which is also the privileged account in the heasarc.settings. The webuser account is used as the non-privileged account in the heasarc.settings.

To create a user in Postgres use the CREATE ROLE SQL command or the createuser script provided in the Xamin user area. E.g.,

                create role xamin with login encrypted password 'password';
                create role webuser with login encrypted password 'another';

The Q3C and PGSphere libraries

The Q3C and PGSphere libraries is used to index some tables in Xamin and to support some VO functionality. They need to be incorporated into the Postgres installation. Depending upon the version of Postgres downloaded the Q3C source may be available within the contrib directory of the Postgres installation. If not it may be downloaded from GITHub at https://github.com/segasai/q3c. The Q3C library should be placed in the contrib directory parallel to the other contribution. To build the Q3C library simply do

            make
            make install

in the Q3C directory making sure that the pg_config command points to the current Postgres installation.

After the code has been built log into Postgres as the Postgres superuser and enable the extension for the HEASARC database using the command

             create extension q3c;

The PGSphere library must be downloaded. The latest version is currently at https://github.com/akorotkov/pgsphere. After copying it into the contrib area and building it similarly to

Preparing for data ingest.

Before starting data ingest, an installer will need to download the the needed jar files. Jar files are modified zip files which are used to store Java classes. However they can also be used to store other kinds of files including data files used for running programs. The Xamin jar includes a number of utility programs (mostly written in Perl) that need to be unpacked to be used. The Xamin war file is created in building a full distribution of Xamin. This war file (a jar file containing the complete Web application) is currently created using NetBeans. To unpack the war, copy it to the directory in which you want to unpack it and

   >  jar xf xamin.war

Unfortunately JAR files do not seem to preserve execute permissions for scripts properly. One can simply set the execute flag for the scripts in the operations area, or use the Permissions.jar (which should be included in the Xamin distribution) with the executables.list file in the WEB-INF/classes directory. E.g., the command (see below for the definition of the logical names)

0
   > java -cp $XAMIN_JARDIR/Permissions.jar permissions.PermissionRestorer $XAMIN_ROOT/.. < $XAMIN_ROOT/../executables.list

will restore appropriate permissions.

Initializing the database.

The scripts for initializing and populating the HEASARC database are included with the Xamin Web service. The service should be installed as described below. The scripts described below can all be found in the operations area of the installed service (typically something like .../xamin/WEB-INF/classes/ops). .

The initialize command in the .../ops/ingest directory will perform many of the tasks noted below: defining the HEASARC database, populating it with HEASARC tables, creating the table documentation and master tables, setting up the areas for supporting VO TAP queries, and ingesting data products. More details on these steps are given below.

Many of the environment variables used in operations scripts are defined in the .../ops/daily/cronenv.sh file which is used to initialize them before they are run. The settings in this script should be carefully examined and may need to be updated. The environment variables include:

APP_ROOT: Defines the base directory of the overall Xamin web application.
XAMIN_ROOT: Defines the base directory of the Xamin operations scripts typically, $APP_ROOT/WEB-INF/classes/ops.
XAMIN_JARDIR: The location of the JARS used within Xamin, typically, $APP_ROOT/WEB-INF/lib.
LOG_ROOT: A directory containing subdirectories in which which system commands outputs are placed, notably the system operations logs. This does not log user requests to Xamin.
POSTGRES_ROOT: The root of a Postgres installation. This is mostly used to find Postgres executables, e.g., $POSTGRES_ROOT/bin/psql
XAMIN_LOGDIR: The directory in which system actions are logged, usualy $LOG_ROOT/logs
XAMIN_BACKUPS: The directory in which user backups are placed, usually $LOG_ROOT/bck
XAMIN_TMPDIR: A directory in which system commands can create temporary files, usually $LOG_ROOT/tmp
PGHOST: The computer where the Postgres database is to be found
PGUSER: The user account under which requests are to be made. Normally this will be the postgres or xamin user for system operations. This will be set in the settings file for external Xamin access.
PGDATABASE: The database to queried. Normally this will be the heasarc database.
TDAT_DIR: The directory (on the TDAT_HOST) where TDAT files are stored.
TDAT_HOST: The host machine on which TDAT files are stored.
HEASARC_SETTINGS: A pointer to the settings file to be user. The settings files are generally used to convey information to Java programs. Operations scripts normally use the dev.settings or ops.settings (depending upon whether this is a development or operational installation), while the web application generally uses heasarc.settings. All of these files are included in the $APP_ROOT/WEB-INF directory.

Creating the database

Postgres allows for multiple databases to be supported by a single Postgres installation. Each database can have multiple schemas and each schema multiple tables. Cross-database queries are not supported, but cross-schema queries are. All tables used within Xamin should be placed within a single database which may now be created either by using the psql command (the CLI interpreter for the database), or using the createdb command. E.g., using psql we might have:

    > psql
    prompt=# create database heasarc;
    CREATE DATABASE
    prompt=#

We place all tables into the heasarc database. Note that if we had run the previous command as the postgres user and have a separate xamin Postgres account we may want to

    prompt=# alter database heasarc owner to xamin;

to change the ownership of the HEASARC database to the xamin Postgres account.

Metadata and feedback table creation

The metadata table, metainfo, needs to be created before any other tables can be read into the system. The

    buildmeta.csh

command sets this up. If a feedback table is to be supported in the system this can be created using the

    buildfeedback.csh

command.

Copying tables

An initial ingest of HEASARC tables may be done using the [run]fullcopy command. The runfullcopy (in ops/ingest) sets up logs and such for the fullcopy command. Just

   > runfullcopy

will copy all HEASARC tables into the system using the current values of the TDAT files in the HEASARC's http://heasarc.gsfc.nasa.gov/FTP/heasarc/dbase/tdat_files directory. As discussed in operations below, this script is also used to update the database with new and updated tables.

The runfullcopy command uses other scripts in the directory. It calls fullcopy after setting up logs. This calls copyit for each matching TDAT file. The real workhouse inside copyit is pgcopy.pl which parses the TDAT files and creates the SQL to ingest the table. This is then sent to the Postgres psql to be executed.

The procedure populates both the tables themselves and the metadata regarding the tables. However it does not populate the data products related to a table.

In addition to populating the science tables, the runfullcopy command updates the documentation tables within Xamin. Xamin keeps a copy of all HEASARC and Vizier table documentation in the database. These can either be fully refreshed or updated depending upon whether an argument is specified in the command. The updateDocs.pl script is used by runfullcopy to find and ingest documentation.

Creating and updating the documentation table

The Tables Explorer keyword search capability uses the tabledocs table. This is also used when understanding the Vizier table structure. This table uses the Postgres TSVECTOR column to provide keyword searchable representation of HEASARC tables. The table_docs_create.sql command creates the table. The loadDocs.pl script extracts both HEASARC and VizieR documentation. HEASARC documentation is the HEASARC table HTML documents. VizieR documentation is the online documentation for each table. Before indexing them in the TSVECTOR column, loadDocs.pl strips these documents of their structure elements (i.e., HTML keywords).

This resulting SQL file should be run and then the fixdocs.sql script should be run. This last script adds in a bit of HEASARC metadata to the HEASARC descriptions, populates the TSQUERY column in the tabledocs table and gets rid of descriptions of HEASARC tables that are not present in the database.

Ingesting HEASARC product metadata.

The metadata for HEASARC products is ingested using the updateProducts command in ops/products, specifying a directory with the product definition files for each mission. E.g., as the code is distributed if the user makes the directory specified above the default directory then the command

   updateProducts missions

will ingest the data products for the missions (Found in ops/products).

The HEASARC has a products file for each mission and special products files for table links, external links, and bibliographic links that should be individually processed:
updateProducts PointedProducts.xml
updateProducts PosProducts.xml

A cron job which looks at the zzlink.tdat file updates the linked table dataproducts automatically.

The updateProducts command only updates a single Xamin database. If there are multiple databases that need to be updated (e.g., backup and test databases), then it
may need to be run multiple times to update the other tables. See the documentation for this command if needed.

Building master tables

Several master tables are used to support index queries where the user specifies a position and/or time and finds lists of tables that have data in the specified locations and/or intervals. These master tables are built using the tools in the ops/master directory. The createMasters.pl script does an initial creation of the master tables. Run the

    buildFullMasters.pl

script to populate the master tables: pos_big, pos_small, pos_time_big, pos_time_small and time. This process can take a couple of hours.

Master tables are views of two underlying tables a _prime table and a _bck. Creating the master tables is fairly expensive because we create a clustered index on fairly large tables. We do not wish to recreate them and regenerate the index each time we update a table. After buildFullMasters.pl completes, the _bck tables are empty. As tables in the database are updated, part of the update process is to add the position/time information from those tables into the corresponding _bck tables. Another table keeps track of the tables that have been updated since the last time we ran buildFullMasters.pl. When a user makes a subsequent query of the master tables, we look in the _bck table for information about science tables that have been updated, and in the _prime table for information about unchanged tables. Since only a small fraction of the tables get updates, the _bck table is much smaller than the _prime table. Thus even though it is not as efficiently indexed, master table queries can run quickly.

Periodically we run buildFullMasters.pl to bring the data from the _bck tables into the primary tables. Since this takes a couple of hours, the approach is to build up the new versions of the master tables using temporary table names. Only when the new master tables are complete do we drop the previous version and rename the new tables to the standard names. This takes seconds.

Preparing for VO Access

The Virtual Observatory Table Access Protocol (TAP) interface requires some support structures within the Postgres environment. The TAP_SCHEMA schema is used to provide standard metadata to TAP clients. A TAP_ASYNC schema is used ot support asynchronous requests from data from Xamin. Tables holding the results from asynchoronous requests are stored there along with tables that describe the status of requests. TAP requires support for Astronomical Data Query Language (ADQL) queries. ADQL defines extensions to SQL that make it easier to specify spatial constraints. A set of functions and types that enable the support for these constraints is defined. This ADQL specific functionality is defined in terms of another spatial library pgSphere which must first be added to Postgres. A pgSphere implementation is found in the ops/vo/pgsphere directory and includes a Makefile and instructions for installation. [Note: As of Postgres Version 9, the installation script for pgSphere fails to compile since the LANGUAGE attribute in the CREATE FUNCTION calls is case sensitive. References to LANGUAGE 'c' should be replaced with LANGUAGE 'C' and similarly with 'sql'.] After pgSphere has been installed the ops/vo/tap.csh file can be used to load the SQL statements that define the TAP-specific schemata and the ADQL functions.

The heasarc.settings file.

The heasarc.settings file is used to set up the defaults for the Xamin system. The file is normally found in the WEB-INF subdirectory of the Xamin web application. Below we provide an annotated version. Other installations will need to adjust the values to meet their requirements. End users should generally not need to worry about it. It is not used on the client side of the Xamin batch interface.

An example heasarc.settings file.

# The default database
# This will be used as the suffix in the jdbc_xxxx settings
database=postgres

# The JDBC parameters for Postgres

# This is probably the same for everyone.
jdbc_class_postgres=org.postgresql.Driver

# The host and database name will need to be updated.
# The port is the Postgres default port.
jdbc_url_postgres=jdbc:postgresql://db.host:5432/postgres
jdbc_user_postgres=dbuser
jdbc_pwd_postgres=password

# Nominal hosts versus actual host names. We want to
# be sure to translate any hostname we get into a valid
# host. So if we get heasarc we want the full path
# heasarc.gsfc.nasa.gov. If running on a load balancer
# then we want to translate the actual server name
# that we are running on, to the name that we want to
# publish.
heaweb1=heasarc.gsfc.nasa.gov
heaweb2=heasarc.gsfc.nasa.gov
heaweb3=heasarc.gsfc.nasa.gov
heaweb4=heasarc.gsfc.nasa.gov
# What to translate the host "localhost" to.
localhost=heasarc.gsfc.nasa.gov

# These are settings that the user is not allowed to change.
fixedSettings=database,jdbc_class_postgres,jdbc_url_postgres,jdbc_user_postgres

# The User account for non-privileged access
standard_user=dbuser

# Is there forwarding to a TOMCAT server. This happens only
# for the nominal released and operational versions, not development versions.
# When we are constructing a local URL from the elements available in the
# server environment we may find that the servlet is running on port 6080,
# but we want to publish the original port 80 for the Apache server.
ForwardedPort=80,6080

# The current version of Xamin.
# The first digit is incremented with major releases.
# The second digit is incremented when there is a significant change to user functionality.
# Letters are appended to show bug/documentation fixes that do not appreciably
# change end user functionality.
Version=1.3g

# The following two fields are place holders that we fill in during the installation
# procedure with appropriate versions for the HDB and Xamin projects that are used
# in building the Xamin web application.
HDBVersion=0
XaminVersion=0

# This is replaced by the current date in the installation procedure.
ReleaseDate=2011-01-01

# This is a relative or absolute URL to service that should be invoked
# to request a user login. Note that Xamin itself does not manage user logins.
# While a user is logged in, all Xamin requests use the login service
# to validate that the indicated user is logged in to the current session.
LoginServlet=/srvlogin

# This is a pointer to the location of the HEASARC hera service.
HeraServer=herasrv.gsfc.nasa.gov
HeraPort=60000
DftHeraName=AnonHera

# These are the E-mail addresses for administrators who will receive notifications
# when Xamin users submit feedback.
feedbackRecipients=user1@xxx.gov,user2@xxx.gov

# These provide a skeleton for the structure of download requests.
wgetcommand="wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks "
curlcommand="curl -s --remote-name "

# User download requests using the TAR download feature are constrained
# to the tree below.
DownloadPath=/FTP

Smaller versions of the settings files ops.settings and dev.settings are used by the operations scripts. These include the Postgres settings that should be used by the operations scripts (which will typically be different from that used in Xamin user queries).

Preparing the Web server.

The current Xamin framework has been tested within only the TOMCAT environment though there are no known reasons why other servlet containers could not be used. TOMCAT is available as a download from the Apache Web site for a variety of machines.

Customizing TOMCAT

Relatively little customization of the TOMCAT environment has been done, but other installations may wish to tweak things. To enable Web logs similar to those produced by the Apache Web server something like the following XML is required to be added to the Engine table of the server.xml file.

<Valve className="org.apache.catalina.valves.AccessLogValve"
      directory="logs" prefix="localhost_access_log." suffix=".log"
      pattern="common" resolveHosts="false"/>

The following entries in the VirtualHost information in the Apache server configuration files enables the TOMCAT server to run under the Apache server with any URL that begins with /xamin processed by TOMCAT and all other URLs handled directly by Apache.

   <VirtualHost>
        ....
        ProxyPass        /xamin http://localhost:nnnn/xamin
        ProxyPassReverse /xamin http://localhost:nnnn/xamin
        ....
   </VirtualHost>

The first line says that when the Apache server sees the request it should delegate to another web server running at port nnnn. The second line tells Apache that when it get back a response from the remote port it should scrub references in the headers to the actual server and replace them with references to the Apache server.

Setting up the Web applications

Xamin is intended to run in a servlet environment and at the HEASARC is developed using TOMCAT. To install Xamin in TOMCAT simply copy the xamin.war file to the TOMCAT webapps directory and restart TOMCAT. This should automatically install the system. Installers may need to update the heasarc.settings file in the webapps/xamin directory.

One disadvantage to the preceding approach is that the directory webapps/xamin directory that is created will be owned by whatever user is running the TOMCAT server. This may make it inconvenient to update the application. Installers can create the webapps/xamin directory manually and the install the application from the war file with

     mkdir webapps/xamin
     chdir webapps/xamin
     jar   xf some/path/to/xamin.war

where the user is assumed to start in directory in which TOMCAT has been installed. The web application may need to be restarted within TOMCAT.

The Xamin war file contains the full Xamin distribution including operational scripts. Scripts are generally found under the xamin/WEB-INF/classes/ops directory. A site may wish to move these operational scripts to other director
ies, however the WEB-INF area and its subdirectories are invisible to the outside world so this area can be used without revealing operational details to the public.

Installing new versions of Xamin.

In practice we use the xaminInstall.pl procedure to install new versions of Xamin. This includes multiple steps and provides detailed feedback on each of the steps in the process. It allows us to use a common installation procedure when there are multiple developers who may update Xamin. The steps in the install are:

Step 0: Get arguments and confirm we are on right machine.
Step 1: Create a temporary directory to place the data content in.
Step 2: Unjar the Xamin war into the temporary area.
Step 3: Copy the needed JavaScript, JSP and HTML files from the temporary area into the ExtJS compilation area.
Step 4: Use the ExtJS CMD commands to create the compressed JavaScript files
Step 5: Copy the compressed JavaScript files back to the temporary area. This will update the xamin.jsp and single box html files and add the compressed JavaScript files.
Step 6: Customize the settings file.
Step 7: Fix and update other files in the temporary area before release.
Step 8: Rename the current operational version to the backup name (prod and test)
Step 9: Rename the temporary area to the desired name.
Step 10: Restart the servlet.

There are a number of checks made to ensure that the size of the war file and the number of libraries and executable files is reasonable and that the servlet successfully restarts. Steps 3-5 relate to an optimization of the JavaScript used in the Xamin web application. The web client is divided into almost 100 JavaScript files. Downloading each of these separately can add a few seconds to the start of Xamin. So the default version of Xamin uses the features of ExtJS to compress the Java into a single file. So normally users can access Xamin through either an index.jsp (or no name after the final / in the URL), or xamin.jsp version. The Xamin.jsp version will download each JavaScript separately, while the default index.jsp version uses the compressed files.

The xaminInstall script requires two arguments: the name of the developer doing the installation (which is used to find a standard location for the input war file), and the type of installation being made. This determines where the installation is to be done. A prod or production release goes in the production area of primary HEASARC machines, test goes on HEASARCDEV in the standard location, and dev goes into special areas defined separately for each user. There are many other options which can be used to override the default behavior.

Setting up the CLI/Batch interface

Since direct access to the database is not available outside of GSFC the Xamin batch mode gets around this by providing a query proxy to the user. The user need download only a very small executable Jar file that contains two classes. The classes in this jar take the user's input arguments and convert them in to an HTTP POST request to the Xamin web server. If there are file arguments (i.e., UPLOAD) then these files are read and appended to the POST. If the user has specified an OUTPUT argument, then the output is written to the designated file. The users.jar file is normally accessible as .../xamin/distrib/users.jar.

The server side of the batch interface is a thin layer which initiates a servlet that runs the user specified query and returns the results. The CLI servlet is very similar to the servlet used in processing Web requests except for the handling of user authentication. If a user wishes to access an Xamin account through the batch interface, both the user name password must be supplied in the command arguments. These are authenticated using the HEASARC login service.

To execute the users JAR file the user should enter:

       java -jar user.jar arg1 arg2 ...

For example

      java -jar user.jar table=rosmaster,ascamaster offset=a:b:10

will run a positional cross-correlation of the rosmaster and ascamaster tables with an maximum allowed offset of 10 arc minutes.

Code Overview

This section describes the relationship of elements of the code used within Xamin.

The Query Library

Query processing is performed using the HEASARC Database (HDB) library written in Java. Users invoke this library through a thin servlet layer. The CLI interface calls this directly while the Web client does it internally in XMLHTTPRequests.

This sections describes the general flow of control in processing a query with a view to showing how the various classes work together.

Startup

The primary class for query processing is the Controller class. For CLI processing this is initiated using the main() method, while in the Web environment a Controller instance is instantiated and its exec method is called. The special SQLController class is used if the SQL used in the query is provided directly rather than having to be built up from other arguments. This bypasses the code that constructs the query, some parsing and modification of the query may be made.

The first order of business for the Controller is to set up three global structures. The Connecter is used to get connections to the underlying database or databases. The Settings are used to manage the system and user settings. The MessageHandler is used to convey information about processing to the user. Each of these is available in a global context. To allow for robust access even in an environment where there may be many simultaneous queries, these three globals use a static ThreadLocal variables to ensure that each instance of a query provides an independent instance to the query processing.

The Connecter allows for reuse of a single connection to the underlying database or when multiple connections are desired to get as many connection as needed. During shutdown it ensures that all connections are closed.

The Settings are essentially is just a Map from keywords to string values. Settings are set by system set up files, user arguments and also by program operations.

The Message Handler provides a standard way to send messages back to the user. Initially it is set to simply echo whatever is sent on the standard error output. Once the query processing has progressed to a stage where the desired output format has been established, a message handler compatible with that format will be used. E.g., for a VOTable the messages will be rendered as INFO elements at appropriate locations in the VOTable.

Help and User account processing

After establishing the Settings the Controller begins to analyze the request the user has made. There are a few non-query requests that are handled first.

If the user has requested help documentation the Help class is used to statisfy the request. If a USER setting has been specified, the the UserAccount class is used to ensure that the password specified is correct (or that the user has consistently forgone password protection on the account).

Next there is a check to see if this is an account processing directive. A user may create an account or modify it using the functions in the AccountManager or view or delete tables using the TableManager.

Setting up the query

After verifying that the request is not one of the non-query requests, the Controller.runQuery method starts setting up the query environment. The first step is determining the output. A StandardDisplay object is created which will handle the communications with the user. A FormatFactory which creates the WriterElements appropriate for the selected format is created and the WriterElement are plugged into the StandardDisplay. WriterElements separate out tasks for writing. Notably there are WriterElements for writing table information and distinct elements (but associated through the FormatFactory) for writing out the data products associated with the element. The global MessageHandler is updated so that messages will be incorporated appropriately into the output, e.g., if the output is FITS, the messages may be rendered as FITS comments.

If the user has requested information about the setup environment, (showArguments or ShowSettings), then these are handled by sending the appropriate messages.

Finding the tables to be queried

The first step in building the query is determining which tables are used in the query. Tables can be either either local (e.g., already present in the database), or uploaded. Uploaded tables will be added to the database temporarily for the duration of the query and then deleted afterwards. If there is a user account associated with the query, then the uploaded table can be persisted.

The Controller looks for a TableFinder class to use. Normally this will be the StandardTableFinder, but in principle this can be overriden by the user.

The TableFinder looks to the TABLE keyword to find any existing tables that are to be used in the query. It creates a QueryBuilder class that will be used to store information about the query and add Tables to it that represent each of the user selected tables. A MetadataInfo object is created that has all of the required metadata for the system by querying the system metadata table. The metadata for tables not participating in the query is not included in the MetadataInfo object. To maximize efficiency when many Xamin queries are being done, metadata is cached and updated periodically, typically every 5-10 minutes. Thus when the metadata tables are updated the effect may not be seen immediately.

Next come the uploaded tables. These are specified using the UPLOAD. Format specific classes (TDat, CSV, VOTable or SourceList) implement the Table interface for the input data. A generic TableCopier class then copies this table into underlying database as a temporary table. Metadata for the uploaded tables is added into the MetadataInfo object.

Building the Query

Once the tables that are to used in the query are found, work begins on setting up all of the SQL clauses in the query using a QueryConstructor. User specified constraints are added to the Where clause. There may be explicit constraints using the CONSTRAINT keyword, or are converted to explicit constraints using the user specified time or positional constraints. Constraints use ParsedExpressions which identify the underlying columns used in each constraint. This allows us to deal with the aliases that tables may have in the query. Generally the first table is aliased to 'a' the second to 'b' and so forth. A TableAliaser manages the mapping between tables and aliases.

Positional constraints involve substantial work. User specified positions may be in a variety of coordinate systems and may involve source names. the SkyView SourceCoordinates class is used to handle this. DistanceCalculator classes use different ways to specify a maximum radius constraint for a query. This is used both for point sources and cross-correlations.

The output fields for the query may be defaulted or may be explicitly specified.

As each of the user inputs is processed the QueryBuilder object is updated appropriately. E.g., position, time, and explicit constraints are added in separate steps. Any requested sort order is saved. If the user has requested a grouping clause that can be added.

Running the Query and Security Checking

After all of the user inputs are processed, the QueryBuilder generates the full SQL for the query. The query is validated using the SecurityChecker class. This uses an SQL parser that we updated from the JSQLParser project. While this parser does not support all queries available within Postgres it does a pretty good job of general SQL parsing including supporting what is needed for the VO ADQL parsing. The SecurityChecker class and the parser confirm that the SQL parses successfully, and that it meets our security constraints: it only accesses table that it should have access to (not
system tables or other users' tables), it only accesses functions on the while list of allowed functions, and that it does not create unexpected artifacts.

Every year or two there is a need to add new functions to the white list of accepted functions. This requires updating and recompiling the SecurityChecker class.

This SQL is then sent to Posgres with an 'explain' prefix. This confirms that Postgres can execute the query and returns an very poor estimate of how long the query will take.

If there are no problems, then the QueryBuilder returns a Query. This is executed to return a Table.

If the user has requested that the results of a query be saved as a user table, then two SQL statements are actually executed. The first statement uses a select into user.xxx ... to run the query and save the result as a user table. The second query is just select * from user.xxx and displays the results of the table that has just been created.

Displaying the results

Up to this point there have been no results shown to the user other than possible messages. Now the Display object's write method takes the Table returned by executing the query and calls methods for each of the WriterElements that have been plugged into it. Each writer element has methods to be invoked at the beginning of processing, for each row and after processing. E.g., if we are asking for data products, then there will be at least two WriterElements called for each row. The first will write whatever is desired regarding the actual table results, the second will look to see if there are any products associated with the row and if so will display them in some format appropriate fashion.

The Web Client

The Xamin Web client is primarily built in JavaScript and uses the Sencha ExtJS JavaScript library in building the GUI. The interfaces envisages a user treating the Web page as a virtual console, opening and closing panes as they process requests.

When the web client starts up, it first queries the Xamin server to get the current list of available tables. It processes any user arguments and then displays the primary Query pane to the user. Users can select tables and and add constraints to queries. Depending upon what the user has done various capabilities are enabled. Eventually the user submits a query. The users inputs are processed, and a query, in the same format a user may use in the CLI interface, is sent using an XMLHTTPRequest to the Xamin server. The results are processed in the web client and displayed to the user in a new pane.

The Xamin Web interface is described in detail in its own document.

Xamin Servlets and JSPs

Both the CLI and Web libraries use servlets to initiate requests. These are primarily written in Java with some small JSP files. A key function of the servlets is to transform the input arguments (which may include file uploads) into the key/value pairs understood by the clients. The servlet layer also addresses user security issues. A number of utilities are provided as servlets for use by the web client. The VO TAP interface requires substantial coding in the servlet layer for its implementation. Key elements of the servlet layer include:

Key Servlet Packages and Classes
heasarc.xamin	The package containing the primary servlets for both CLI and web requests
QueryServlet	The main class for processing web client query requests. It parses the inputs and returns the results as needed
CLIServlet	The main class for processing cli queries
TableParams	Get the parameters for a given table (or tables)
BatchEmul	Emulate Browse batch queries
heasarc.xamin.node	Classes for building nodes in the table selection trees including both HEASARC and Vizier tables
heasarc.vo	Support Cone, TAP, SIA, SSA and DataLink queries using VO protocols
TapAsync	Support TAP Asynchronous queries
Vosi*	Support TAP VOSI capabilities
ADQLServlet	Synchronous ADQL query
heasarc.stilts	Invoke STILTS library for plotting
heasarc.sources	Invoke coordinate converter to see if token is source name
heasarc.downloads	Support download requests
Key JSP files
xamin.jsp	The primary file for the Web interface
getDoc.jsp	Get the documentation for a table
delete/addUser.jsp	Delete or add a user as requested by the login service

The Metadata Table

A single metadata table, metainfo, is used to store all metadata for the Xamin system. When a user account is created an extension metainfo table is created in the schema for the user and the join of this user table -- which contains metadata specific to user tables -- and the system table used. The metainfo table uses a generic structure for representing metadata rather than providing a specific structured tables. This allows for great flexibility in the growth of the metadata at the cost of some complexity in the queries and size of the metadata.

To find information in a table we generally need to know the table, row and column we are interested in. Each row of the metadata table is essentially a table, row, and column identifier followed by the information that belongs there.

There are some additional sources of information that are used in Xamin.

Bibliographic information is stored in a the zzbib table but zzbib is not a special metadata table, it is queryable just as any other table. The only special feature of this table is that it is used in the When clauses in a number of conditional products -- but using the same syntax as other conditional links.

Structure of the Metainfo table.

The metainfo table comprises five columns:

type: The type column is used to specify the kind of data being described. It is analogous to the table name in a conventional representation. The query select distinct type from metainfo is the equivalent of determining what tables are present. Commonly used types include mission, table, product, linkfield, parameter.
name: The name column is used to specify which element of a given type is being looked at. It corresponds to a row identifier in a table view. E.g., if we are providing metadata on tables, then the type field might be 'table' and the name field would be the name of the table. Some care is needed to ensure uniqueness of the name within a type. E.g., it would not enough to simply use the name of the parameter, the name should include the table name and parameter name to ensure uniqueness across all tables.
relation: The relation can be thought of as the column identifier. It describes what the relationship the current piece of information has to the element being described. A table might have a 'description' or an 'author' or any number of characteristics. A parameter may have a 'unit' or an 'ucd'. Some relationships involve other kinds of data. E.g., a table might 'include' a parameter or 'link' to a product.
value: The actual content is presented in this row. E.g., for a table description it will be some text describing the table. For a product template it might be a string that can be used to find files associated with the given product. The value is a text string which in some cases may encode a numeric value.
vorder: Unlike a conventional table, the metainfo structure allows multiple entries with the same type, name and relation. This is often very desirable, e.g., for a table to link to many parameters. In cases where the order of these relationships is important the vorder keyword can be used to specify this ordering. For scalar values or where the order is not significant it can be left null. This integer field is the only non-text field in the metadata.

Current types and relations in the metadata

Type (table)	Relation (column)	Description

document	The document type stores information about document used somewhere in the system. Currently the only information for a document is its URL, but author, update time, format, ... may be added in the future
document	located	The URL of a document

linkfield	A linkfield describes the mapping between the values available in a given table, and the variables used in a product template.
linkfield	field	The variable name used in a product template description
linkfield	value	The SQL used to to extract the template from a row of the table

mission	The mission type is currently mostly a placeholder, but may include information about instruments, PI's, wavelength coverage and more in the future
mission	end_time	The start of mission science operations
mission	start_time	The end of mission science operations

parameter	The parameter type gives the characteristics for a particular parameter of a given table.
parameter	description	A short text description of the parameter
parameter	format	The format in which the parameter should be displayed
parameter	maximum	The maximum value for the parameter in the table
parameter	minimum	The minimum value for the parameter in the table
parameter	note	A comment on the parameter
parameter	order	The order in which the parameter is to be displayed. If positive then the parameter is a default/standard parameter. If negative, then when all parameters are displayed the order should be the absolute value of this parameter, except that values of 0 come last.
parameter	otype	The type of the parameter in the Browse system
parameter	ucd	The uniform content descriptor for the parameter
parameter	unit	The unit[s] of the parameter

product	The product type is used to describe a data product which may be an archival product, bibliographic reference, table link or generic external product.

product	description	A short text description of the product
product	includes	A link to a child product
product	links	User to link the generic products
product	match	A template string used to find files or URLs for a product given values in a row
product	type	The general type of the product (e.g., FITS, ASCII, HTML, ...)

table	This type describes the characteristics of a particular table in the database.
table	author	The author of the table
table	bibcode	The bibcode for a paper describing the table
table	containedBy	A mission which includes this table (may be repeated)
table	contains	A link to a parameter entry
table	defaultSearchRadius	The default radius for position queries of this table
table	describedBy	A document describing the table
table	description	A short text description of the table
table	end_time	The column name for the column containing the end time for a table
table	includesTypes	Object types described by this table
table	j2000coordinates	A pair of columns giving the J2000 coordinates for the row
table	lastUpdated	The date when this table was last updated
table	priority	A numeric priority associated with the table
table	regime	The regime of the table (may be repeated)
table	start_time	A column name for the start time associated with each row
table	tableType	The kind of table (observation, object, ...)

product	description	Text description of specific product
product	includes	Children of product
product	match	Template to build URL or file name (normally using some variables defined in tablelinks.).
product	type	Type/format of product (e.g., directory, GIF, FITS, ...)
product	siacolumn/ssacolumn	Information needed for SIA/SSA requests to this product.

tablelink	A tablelink maps a table to product. It indicates the linkfield mappings that are to be used in finding the products for a given row.
tablelink	linkfield	A linkfield used in this link from a table to a product
tablelink	product	The name of the product being linked to
tablelink	table	The name of the table being linked from
tablelink	when	A conditional expression indicating if this link should be made for the current row

linkfield	field	The name of the variable being defined for this link
linkfield	product	The SQL used to extract the variable for this linkfield

user	User entries are only found in the user metainfo tables, not in the primary metainfo table. Exactly one user name should be found in each user metainfo table.
user	email	The E-mail address of the user.
user	password	An encrypted version of the user password (if supplied).
user	status	The status of the account: active/inactive/validating'

s[si]acolumn	various	Metadata associated with SIA/SSA data requests
s[si]alink	various	Similar to tablelink, but for for SIA/SSA data requests
s[si]alnkfield	various	Similar to linkfield for for SIA/SSA data rquests

The Master Tables

A set of master tables is used to store time and position information so that users can quickly find all tables that match specified positional or temporal constraints. These tables are found in the master_table schema. A total of 11 tables and 5 views are used.

Name	Table/View	Description
pos_big_prime	Table	Weekly positions where the position error is > 1 degree.
pos_small_prime	Table	Weekly positions where the position error is < 1 degree.
pos_big_bck	Table	Incremental positions where the position error is > 1 degree.
pos_small_bck	Table	Incremental positions where the position error is < 1 degree.
pos_big	View	Current positions where the position error is > 1 degree.
pos_small	View	Current positions where the position error is < 1 degree.
pos_time_big_prime	Table	Weekly positions/times where the position error is > 1 degree.
pos_time_small_prime	Table	Weekly positions/times where the position error is < 1 degree.
pos_time_big_bck	Table	Incremental positions/times where the position error is > 1 degree.
pos_time_small_bck	Table	Incremental positions/times where the position error is < 1 degree.
pos_time_big	View	Current positions/times where the position error is > 1 degree.
pos_time_small	View	Current positions/times where the position error is < 1 degree.
time_prime	Table	Weekly times
time_bck	Table	Incremental times
time	View	Current times
update_list	Table	Tables that have been updated since weekly cleanup

Position and Time Tables

There are two kinds of information stored in the master tables: positional and temporal. The pos tables are used to support position-only discovery queries. The time tables are used to support temporal-only queries. The pos_time tables are used to support queries where the user specifies both time and position constraints.

Position tables

Positional information is split into two tables, big and small. This split is needed to enable efficient queries when the user wishes to use the table-specific (or occasionally row specific) default search radius in the query. If we query the table using the default search radius constraint we get a constraint that looks like dec between XX-dsr and XX+dsr where XX is the declination of the requested center. In our database, this constraint cannot take advantage of the clustered index on declination. However if we know that the default search radius is always less than say 1 degree we can put in the constraint dec between XX-dsr and XX+dsr and dec between XX-1 and XX+1. The additional constraint using a constant range allows us to make effective use of our index. However some HEASARC tables (e.g., INTSCW) have very large default search radii, greater than 20 degrees. To make effective use of the dec index, then we cannot search them with this technique. Fortunately there are relatively few such tables.

So to make positional queries with default search radii efficient we separate the tables with relatively small default search radii and index them in the pos_small master tables. Those with large search radii are placed in pos_big. When we want to find all of the tables we run two queries (as a union). The query of the pos_small tables includes a constraint using the maximum default search radius that can take advantage of the dec index. The query of the pos_big tables does not include this additional constraint, but this table is much smaller since only a very few tables have such large error bars.

Time Tables

Time information is kept in separate tables for similar reasons. A typical time query asks for information that spans a given instant or interval. It would be very efficient if we could add simply add a constraint that the row be within some constant time of the users requested time. E.g., we might use time between TT - 10 and TT+10 if we knew that no row spanned an interval more than 10 days. However the HEASARC database contains tables where the time interval for a given row is years (typically when the row describes a series of observations, e.g., in XTEINDEX). So correct time searches can't use an index on time as efficiently as we might wish. By extracting the time information into a separate table we are able to search a much smaller table and the query runs in an acceptable time even though it makes limited use of the indexes.

Position/Time tables

The pos_time tables accommodate queries that specify both temporal and positional constraints. These are the smallest tables in number of rows. These are kept distinct from the purely temporal tables to maximize the efficiency of the time-only queries.

Primary and backup tables

All of the underlying tables are paired as prime and bck Periodically the master tables are regenerated from the tables in the HEASARC database. All current information from all HEASARC tables is then available in the prime tables. The bck and update_list tables are emptied. During the week, some tables may be updated. Whenever a table is updated, any rows associated with that table in any of the bck tables is deleted and then appropriate position/time rows are placed in the appropriate bck tables. The deletion is done first to accommodate the common case that a table is updated multiple times during the week. The name of the table being updated is added to the update_list.

Current views

Five views are provided that combine the prime and bck tables such that the user sees only the most recent information for each table. E.g., the views look something like

          select AA from xx_prime
             where table_name not in (select table_name from update_list) and BB
            union
          select AA from xx_bck where BB

By referring to the views, the users of the master tables are isolated from the update process.

Indexing

To support efficient queries the master tables are clustered (have a primary index) on declination where available or time. Clustering substantially increases the responsiveness of the query. Indexes are also provided on the table_name field.

Operations

The following table describes some of the routine operations required for Xamin at the HEASRC. Many of these tasks are automatically initiated as cron jobs. Cron jobs use the /www/server/xamin/ area for links to non-Xamin executables (e.g., PostgreSQL commands), temporary storage, backups and logging. The outputs for many of the cron jobs are logged in /www/server/xamin/ops/logs. The environment for cron jobs is set in the xamin/WEB-INF/classes/ops/daily/cronenv.sh which is sourceed before the cron jobs are executed.

The scripts are given with respect to xamin/WEB-INF/classes/ops. Logs are generally kept in /www/server/xamin/ops/logs.

Process	Frequency	Initiation	Script	Description	Log
Update tables	Every 12 hours	Automatic	ingest/runfullcopy	Update HEASARC tables in the Postgres database. Also update HEASARC and Vizier documentation.	xamin.ingest.DATE
Update documentation table	Every 12 hours	Automatic	Included in runfullcopy	Update the table documents table.	See above
Update master tables	As needed	Auto	Included in runfullcopy	When tables are updated add position/time information to master `_bck` files.	See above
Update bibliography tables	As needed	Automatic	Included in runfullcopy	Add entries to bibliography tables. As far as Xamin is concerned this is just a table update, but the table is maintained by the HEASARC. This happens automatically whenever runfullcopy sees the zzbib.tdat file updated.	See above
Clean master tables	Weekly	Automatic	master/buildFullMasters.pl	Put all master table information into _prime tables.	fullMasters.DATE
TAP async user tables/jobs cleanup	Daily	Automatic	daily/cleanasync.pl	Clean up all information on jobs in the TAP_ASYNC schema that are past their destruction time.	cleanup.log.DATE
User usage review	Weekly	Automatic	ops/daily/sizeReport	Review disk usage of all users. This will also such TAP asynchronous and master table usage. The output of this cron is not logged so it is sent as a mail message to the cron initiator.	Cron user e-mail
Run table monitor	Daily	Automatic	daily/monitor.pl	See if the queries that have created user tables would now give different results and update/notify tables/users as requested.	monitor.log.DATE
HTTP logs	Daily?	Auto	TBD	Copy TOMCAT http logs -- may want to join with Apache logs	TBD
User backup	Daily	Automatic	daily/userbck.pl	Backup Xamin user account information. The HEASARC's TDAT files serve as a backup for the science tables so a backup of those is not required.	userbck.log.DATE
Testing	Hourly	Automatic	tests/batchTest.pl	Run tests that show that system is operational. At least one test (comprising multiple queries) is run each hour. Every table is queried at least once each day.	autochecks.log
Update tables test	Daily	Automatic	tests/makeTables.pl	Since the tables in the Xamin system can change we need to keep the tables test up to date. This updates tests/tables.tests.	Not logged
Statistics	Quarterly	Manual	TBD	Compute statistics including at least: Number of query requests Frequency given tables were requested Total number of rows downloaded. Table statistics aggregated by mission Download requests per mission Download volumes per mission	NA
Compare Xamin and Browse	Weekly	Automatic	ingest/compareXaminBrowse	Ensure that the tables seen in Xamin and Browse are consistent.	Cron user e-mail
Delete tables from Xamin	As needed	Manual	ingest/deleteTable	Delete a table that is no longer used in Xamin.	delete.TABLE
Add user schema	As requested by srvlogin	Automatic	xamin/jsp/createUser.jsp	Add a schema for a new user account. This script is invoked by the HEASARC login service and is no externally invokable	Not logged
Delete user schema	As requested by srvlogin	Automatic	xamin/jsp/deleteUser.jsp	Deleta a schema for user account. This script is invoked by the HEASARC login service and is not externally invokable.	Not logged
Add/update data products	As needed	Manual	products/updateProducts	When new products are defined, the mission products definition file in products/mission/MISSION.xml must be created or updated. This script reads this file and updates the metadata table to define the products.	Not logged

Schedule

This table summarizes the current schedule for Xamin cron events. Times are Eastern time.

Time	Event
00:50	Test: basic tests
01:50	Test: infrastructure tests
02:30	Clean asynchronous
02:50	Test: basic tests
03:00	User backups
03:10	Daily monitor
03:30 Tu	Build full masters
03:50	Test: infrastructure tests
04:50	Test: basic tests
05:00	Table ingest
05:50	Test: infrastructure tests
06:50	Test: basic tests
07:50	Test: infrastructure tests
08:50	Test: basic tests
09:50	Test: infrastructure tests
10:50	Test: basic tests
11:50	Test: infrastructure tests
12:50	Test: basic tests
13:50	Test: infrastructure tests
14:50	Test: basic tests
15:50	Test: infrastructure tests
16:50	Test: basic tests
17:00	Table ingest
17:50	Test: infrastructure tests
18:50	Test: basic tests
19:50	Test: infrastructure tests
20:50	Test: basic tests
21:00	Update tables test
21:00 Th	Generate user size report
21:50	Test: infrastructure tests
22:00	Compare Xamin and Browse
22:10	Test: table tests
22:50	Test: basic tests
23:50	Test: infrastructure tests

Acknowledgements and licenses

Included software

ExtJS

The ExtJS JavaScript library is used extensively in the Web interface. The HEASARC only uses versions of ExtJS released under the GPL.

pgSphere

The pgSphere provides support for spherical geometry within Postgres. This library is available at its home site. No license is specified in the distribution. .

nom.tam.fits

The nom.tam.fits library is used for writing FITS output.

com.ice.tar

This library provides several classes to support writing TAR files from Java. It includes the statement:

This work was authored by Timothy Gerard Endres, time@gjt.org.
This work has been placed into the public domain.
You are free to use this work in any way you wish.