Xamin can be used through a web interface designed as a Rich Internet Application (RIA) and also through a command line interface. All facilities are available through both interfaces.
The Xamin web application is at https://heasarc.gsfc.nasa.gov/xamin and contains a number of end-point URLs. This web application includes not only the run-time code, but full documentation and code for the installation and maintenance of Xamin.
Use Find matches in... Selected Tables to search all of the tables in the mission for the selected times or positions. You can add tables from other missions, or delete some of the tables, before querying if you like. You can then query the selected tables for position or time constraints with Query to... Correlate. Spatial cross-match constraints will be added automatically, but you can override or modify these to meet your needs. Use Query to... Save to save the result in your account. Bring up the summary of your saved tables with the Session/User Tables -- Info menu. If you click on the Frequency column next to your saved table you should get a pane that allows you to ask to be informed whenever the results of this query change.

All tables can take part in correlations. However, remote tables must be downloaded to the HEASARC before the correlation can take place. So in practice correlations of remote resources can only be done if the tables are of modest size, or if constraints that can be passed to the remote site restrict the size to a reasonable number. It may be impractical to query remote tables where the size of the downloaded data is more than ~10,000 rows.
In the web interface spatial correlation constraints are assumed and created for the user -- they can be deleted if not desired. In the command line version the OFFSET keyword is used to specify positional correlations, but the user must include this explicitly.
Temporal cross-correlations can be made as simple constraints on the time columns of the two tables. At the HEASARC times are stored as MJDs, so to ask for observations within a day of one another we can add a constraint of the form
b.time-1 .. b.time+1
in the Parameters explorer for the field a.time. In the command line interface that might be specified as
constraint='a.time between b.time-1 and b.time+1'
where the quotes ensure that the shell treats the value as a single string.
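For instance, a minimal sketch of a batch invocation combining a positional cross-match with this temporal constraint (the table names are illustrative; any two tables with time columns will do):

java -jar users.jar table=rosmaster,ascamaster offset=a:b:10 constraint='a.time between b.time-1 and b.time+1'

This matches rows of the two tables that lie within 10 arcminutes and within one day of each other.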
While any number of tables can be correlated in principle, queries tend to become very slow when large numbers of tables are involved. The web interface imposes a maximum of four tables in a correlation; no limit is specified for the command line. This limit can be worked around if necessary by breaking up the correlation into a series of correlations and saving intermediate results as tables.
Note that Xamin does not do careful analysis of the input constraints to determine if pieces of an input constraint can be included in the second query. E.g., compare the constraint
constraint='d.exposure>1000 and c.exposure>1000'
versus
constraint=d.exposure>1000,c.exposure>1000
From Xamin's perspective, in the first example we have a single constraint. If d is the anticorrelation table, then in the first case Xamin will not include any constraint on the exposure from table c when it runs the second part of the anticorrelation. In the second case, since the constraint on table c is specified independently, this constraint will be included in the second query.
When doing an autocorrelation it is usually necessary to make sure that you don't return a row matching itself. A simple way to ensure that is to add a constraint of the form
constraint='a.id != b.id'
in the query. In the Xamin web interface just enter
!= b.id
next to the entry for a.id. Of course the appropriate field may be named something other than id.
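As a sketch, a batch autocorrelation of a table with itself might then look like the following, where the 1 arcminute radius and the seq_id column are illustrative; use whatever unique identifier column the table actually provides:

java -jar users.jar table=rosmaster,rosmaster offset=a:b:1 constraint='a.seq_id != b.seq_id'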
There are two fundamental types of products: archival data products and link products. As data products are collected they are added to a shopping cart. The shopping cart can contain products from multiple selections and from multiple queries. Users can also download individual data products directly (or, in the case of directories, open a web link to the directory). Link products cannot be added to the shopping cart -- they generally involve data that is intended to be viewed in a browser or that is found at remote sites.
At any point in an Xamin session users can download the data in their shopping cart. They can download individual products one by one from the shopping cart, but if there are more than two or three products users may prefer to collect the data products together. A tar file of all of the products in the cart can be created. This tar file is created dynamically so the download should start almost immediately.
Whenever a significant amount of data is to be downloaded we recommend that a data download script be created rather than directly tarring the data. Scripts can be created which use either the wget or curl commands, both of which are widely available. Scripts are much more easily restarted and allow the users complete control of the data download process. The script can be displayed and copied by selecting the text, or it can be downloaded explicitly.
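Each line of a generated script is built from the skeleton commands configured in heasarc.settings (shown later in this document); a wget line would look roughly like the following, where the URL is illustrative:

wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks https://heasarc.gsfc.nasa.gov/FTP/some/product/directory/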
In the command line environment a query can be used to build a download script to download archival products or to provide a list of links to link products.
On-line users can also save the current query configuration at any time into a named configuration. This configuration can be restored at any later time. Note that the configuration only affects the Query window, not any of the other Xamin windows. If a configuration is saved with the name default, then a user can automatically restore this configuration in subsequent logins. To change the default configuration the current default must first be deleted.
This section describes the steps in creating the complete Xamin data access system. The discussion of each step includes both the general issues that need to be resolved by an implementor in their system, and the specific choices made at the HEASARC.
This section describes a complete installation of the Xamin server. Users who only wish to use Xamin from their CLI environment need only download the very small users.jar file from the Xamin distribution directory (e.g. http://heasarc.gsfc.nasa.gov/xamin/distrib/users.jar).
Xamin is designed to use a PostgreSQL database. Some PostgreSQL-specific features are used in the software to populate the system and in the query system. The keyword search interface uses PostgreSQL features that are paralleled in some other systems (e.g., SQL Server) but may not be generally available. Postgres-specific features of the system are called out below.
Postgres is freely available at http://postgresql.org. While one-click install downloads are available, these generally require superuser privileges on the destination machines.
The tar file should be unpacked with
> tar zxf postgresql-X.Y.Z.tar.gz
This creates a subdirectory postgresql-X.Y.Z (e.g., postgresql-8.4.2) in the current directory. An INSTALL file with detailed directions for installing Postgres is available in this directory. The path used for the HEASARC installation is noted below.
Within the postgresql-X.Y.Z directory run
> ./configure --prefix=desired_postgres_directory
e.g.,
> ./configure --prefix=/local/db/postgres
Then execute:
> make
> make install
to compile and install Postgres in the desired location.
The current operational Postgres server is installed on the DBMS1 machine.
The Postgres server is ideally run under a special user account but may be run under a normal account. The special account can be created using normal system procedures.
To initialize the database first create the directory in which the database data is to be stored. This directory should be owned by the same user as will run the server.
> mkdir directory_for_data
Then initialize the database using
> initdb -D directory_for_data
where initdb is in the bin subdirectory of your Postgres installation.
To create an initial superuser account and password you may wish to include the -U and --pwprompt or --pwfile=file arguments to the initdb command. The -U argument specifies the name of a superuser account to be created for the database (e.g., postgres). The --pwprompt argument indicates that the command should prompt for the password for the account, while the --pwfile argument reads that password from a file.
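For example, a sketch of an initialization that creates a postgres superuser and prompts for its password (the data directory path is illustrative):

> initdb -D /local/db/pgdata -U postgres --pwprompt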
The output from the initdb command indicates how one can start the database server. This command should be put in the startup scripts for your database machine (though it is not currently at the HEASARC).
Once the database has been initialized some small amount of customization of the configuration files is desirable. These are found in the data directory mentioned above.
shared_buffers = 500MB
temp_buffers = 8MB
max_prepared_transactions = 5
work_mem = 16MB
maintenance_work_mem = 64MB
max_stack_depth = 8MB
All of these lines update the values in the default configuration, which is sized for a small machine.
A second account, e.g., xamin, may be defined as the owner of the xamin database with all associated privileges, or this can be done under the postgres account. Using a separate xamin account is preferable since this account need have no privileges outside of the Xamin database. The account will be used in many of the internal operations scripts, but is only invoked by external user actions when new 'user accounts' are created and destroyed and the associated schemas need to be created or destroyed in the Xamin database.
The webuser account is the account that external users normally invoke when accessing Xamin. The webuser account has the privileges to read all non-system tables and to create and update tables in the user schemas where the tables associated with user sessions will be supported. Note that all users share the same actual database account but cannot read other users' data, since a password is required to access session data.
The names and passwords of these accounts are needed in the [heasarc|ops|dev].settings files found in the Xamin WEB-INF area. Generally the ops/dev settings files will use the xamin account which is also the privileged account in the heasarc.settings. The webuser account is used as the non-privileged account in the heasarc.settings.
To create a user in Postgres use the CREATE ROLE SQL command or the createuser script provided in the Xamin user area. E.g.,
create role xamin with login encrypted password 'password';
create role webuser with login encrypted password 'another';
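The read privileges for the webuser account described above might then be granted along the following lines (a sketch; the schema name and exact privileges will depend on the installation):

grant select on all tables in schema public to webuser;
alter default privileges in schema public grant select on tables to webuser;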
After downloading the Q3C spatial indexing extension, run
> make
> make install
in the Q3C directory, making sure that the pg_config command points to the current Postgres installation.
After the code has been built log into Postgres as the Postgres superuser and enable the extension for the HEASARC database using the command
create extension q3c;
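A quick way to confirm that the extension is active is to call one of its functions, e.g. (assuming the standard Q3C function names):

prompt=# select q3c_dist(0, 0, 0, 1);

which should return 1, the angular distance in degrees between the two positions.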
The pgSphere library must be downloaded. The latest version is currently at https://github.com/akorotkov/pgsphere. Copy it into the contrib area and build it similarly to Q3C.
Before starting data ingest, an installer will need to download the needed jar files. Jar files are zip-format archives used to store Java classes, but they can also be used to store other kinds of files, including data files used for running programs. The Xamin jar includes a number of utility programs (mostly written in Perl) that need to be unpacked to be used. The Xamin war file is created in building a full distribution of Xamin. This war file (a jar file containing the complete Web application) is currently created using NetBeans. To unpack the war, copy it to the directory in which you want to unpack it and
> jar xf xamin.war
Unfortunately JAR files do not seem to preserve execute permissions for scripts properly. One can simply set the execute flag for the scripts in the operations area, or use Permissions.jar (which should be included in the Xamin distribution) with the executables.list file in the WEB-INF/classes directory. E.g., the command (see below for the definition of the logical names)
> java -cp $XAMIN_JARDIR/Permissions.jar permissions.PermissionRestorer $XAMIN_ROOT/.. < $XAMIN_ROOT/../executables.list
will restore appropriate permissions.
The initialize command in the .../ops/ingest directory will perform many of the tasks noted below: defining the HEASARC database, populating it with HEASARC tables, creating the table documentation and master tables, setting up the areas for supporting VO TAP queries, and ingesting data products. More details on these steps are given below.
Many of the environment variables used in operations scripts are defined in the .../ops/daily/cronenv.sh file, which is used to initialize them before they are run. The settings in this script should be carefully examined and may need to be updated. The environment variables include XAMIN_ROOT and XAMIN_JARDIR, used in the command above.
Postgres allows for multiple databases to be supported by a single Postgres installation. Each database can have multiple schemas and each schema multiple tables. Cross-database queries are not supported, but cross-schema queries are. All tables used within Xamin should be placed within a single database which may now be created either by using the psql command (the CLI interpreter for the database), or using the createdb command. E.g., using psql we might have:
> psql
prompt=# create database heasarc;
CREATE DATABASE
prompt=#
We place all tables into the heasarc database. Note that if we had run the previous command as the postgres user and have a separate xamin Postgres account we may want to
prompt=# alter database heasarc owner to xamin;
to change the ownership of the HEASARC database to the xamin Postgres account.
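Equivalently, a sketch using the createdb command creates the database with the desired owner in one step (assuming the xamin role already exists):

> createdb -O xamin heasarc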
The metadata table, metainfo, needs to be created before any other tables can be read into the system. The buildmeta.csh command sets this up. If a feedback table is to be supported in the system it can be created using the buildfeedback.csh command.
> runfullcopy
will copy all HEASARC tables into the system using the current values of the TDAT files in the HEASARC's http://heasarc.gsfc.nasa.gov/FTP/heasarc/dbase/tdat_files directory. As discussed in operations below, this script is also used to update the database with new and updated tables.
The runfullcopy command uses other scripts in the directory. It calls fullcopy after setting up logs. This calls copyit for each matching TDAT file. The real workhorse inside copyit is pgcopy.pl, which parses the TDAT files and creates the SQL to ingest the table. This is then sent to the Postgres psql to be executed.
The procedure populates both the tables themselves and the metadata regarding the tables. However it does not populate the data products related to a table.
In addition to populating the science tables, the runfullcopy command updates the documentation tables within Xamin. Xamin keeps a copy of all HEASARC and Vizier table documentation in the database. These can either be fully refreshed or updated, depending upon whether an argument is specified in the command. The updateDocs.pl script is used by runfullcopy to find and ingest documentation.
The resulting SQL file should be run, and then the fixdocs.sql script should be run. This last script adds a bit of HEASARC metadata to the HEASARC descriptions, populates the TSQUERY column in the tabledocs table, and gets rid of descriptions of HEASARC tables that are not present in the database.
The metadata for HEASARC products is ingested using the updateProducts command in ops/products, specifying a directory with the product definition files for each mission. E.g., as the code is distributed, if the user makes the directory specified above the default directory then the command
updateProducts missions
will ingest the data products for the missions (found in ops/products).
The HEASARC has a products file for each mission and special
products files for table links, external links, and bibliographic
links that should be individually processed:
updateProducts PointedProducts.xml
updateProducts PosProducts.xml
A cron job which looks at the zzlink.tdat file updates the linked table data products automatically.
The updateProducts command only updates a single Xamin database. If there are multiple databases that need to be updated (e.g., backup and test databases), then it may need to be run multiple times to update the others. See the documentation for this command if needed.
The createMasters.pl script does an initial creation of the master tables. Run the
buildFullMasters.pl
script to populate the master tables: pos_big, pos_small, pos_time_big, pos_time_small and time. This process can take a couple of hours.
Master tables are views of two underlying tables, a _prime table and a _bck table. Creating the master tables is fairly expensive because we create a clustered index on fairly large tables. We do not wish to recreate them and regenerate the index each time we update a table. After buildFullMasters.pl completes, the _bck tables are empty. As tables in the database are updated, part of the update process is to add the position/time information from those tables into the corresponding _bck tables. Another table keeps track of the tables that have been updated since the last time we ran buildFullMasters.pl.
When a user makes a subsequent query of the master tables, we look in the _bck table for information about science tables that have been updated, and in the _prime table for information about unchanged tables. Since only a small fraction of the tables get updates, the _bck table is much smaller than the _prime table. Thus even though it is not as efficiently indexed, master table queries can run quickly.
Periodically we run buildFullMasters.pl to bring the data from the _bck tables into the primary tables. Since this takes a couple of hours, the approach is to build up the new versions of the master tables using temporary table names. Only when the new master tables are complete do we drop the previous version and rename the new tables to the standard names. This takes seconds.
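In SQL terms the final swap might look something like the following sketch (the _new suffix is illustrative):

begin;
drop table pos_small_prime;
alter table pos_small_prime_new rename to pos_small_prime;
commit;

Because Postgres DDL is transactional, queries see either the old or the new table, never an intermediate state.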
[Note: in the pgSphere SQL scripts, LANGUAGE 'c' should be replaced with LANGUAGE 'C', and similarly with 'sql'.]
After pgSphere has been installed the ops/vo/tap.csh file can be used to load the SQL statements that define the TAP-specific schemata and the ADQL functions.
The heasarc.settings file is used to set up the defaults for the Xamin system. The file is normally found in the WEB-INF subdirectory of the Xamin web application. Below we provide an annotated version. Other installations will need to adjust the values to meet their requirements. End users should generally not need to worry about it. It is not used on the client side of the Xamin batch interface.
# The default database
# This will be used as the suffix in the jdbc_xxxx settings
database=postgres
# The JDBC parameters for Postgres
# This is probably the same for everyone.
jdbc_class_postgres=org.postgresql.Driver
# The host and database name will need to be updated.
# The port is the Postgres default port.
jdbc_url_postgres=jdbc:postgresql://db.host:5432/postgres
jdbc_user_postgres=dbuser
jdbc_pwd_postgres=password
# Nominal hosts versus actual host names. We want to be sure to
# translate any hostname we get into a valid host. So if we get
# heasarc we want the full path heasarc.gsfc.nasa.gov. If running on
# a load balancer then we want to translate the actual server name
# that we are running on, to the name that we want to publish.
heaweb1=heasarc.gsfc.nasa.gov
heaweb2=heasarc.gsfc.nasa.gov
heaweb3=heasarc.gsfc.nasa.gov
heaweb4=heasarc.gsfc.nasa.gov
# What to translate the host "localhost" to.
localhost=heasarc.gsfc.nasa.gov
# These are settings that the user is not allowed to change.
fixedSettings=database,jdbc_class_postgres,jdbc_url_postgres,jdbc_user_postgres
# The user account for non-privileged access
standard_user=dbuser
# Is there forwarding to a TOMCAT server? This happens only for the
# nominal released and operational versions, not development versions.
# When we are constructing a local URL from the elements available in
# the server environment we may find that the servlet is running on
# port 6080, but we want to publish the original port 80 for the
# Apache server.
ForwardedPort=80,6080
# The current version of Xamin.
# The first digit is incremented with major releases.
# The second digit is incremented when there is a significant change
# to user functionality. Letters are appended to show
# bug/documentation fixes that do not appreciably change end user
# functionality.
Version=1.3g
# The following two fields are placeholders that we fill in during
# the installation procedure with appropriate versions for the HDB
# and Xamin projects that are used in building the Xamin web
# application.
HDBVersion=0
XaminVersion=0
# This is replaced by the current date in the installation procedure.
ReleaseDate=2011-01-01
# This is a relative or absolute URL to a service that should be
# invoked to request a user login. Note that Xamin itself does not
# manage user logins. While a user is logged in, all Xamin requests
# use the login service to validate that the indicated user is
# logged in to the current session.
LoginServlet=/srvlogin
# This is a pointer to the location of the HEASARC hera service.
HeraServer=herasrv.gsfc.nasa.gov
HeraPort=60000
DftHeraName=AnonHera
# These are the E-mail addresses for administrators who will receive
# notifications when Xamin users submit feedback.
feedbackRecipients=user1@xxx.gov,user2@xxx.gov
# These provide a skeleton for the structure of download requests.
wgetcommand="wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks "
curlcommand="curl -s --remote-name "
# User download requests using the TAR download feature are
# constrained to the tree below.
DownloadPath=/FTP
Smaller versions of the settings files, ops.settings and dev.settings, are used by the operations scripts. These include the Postgres settings that should be used by the operations scripts (which will typically be different from those used in Xamin user queries).
The following entry in the TOMCAT server configuration enables access logging:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".log" pattern="common" resolveHosts="false"/>
The following entries in the VirtualHost information in the Apache server configuration files enable the TOMCAT server to run under the Apache server, with any URL that begins with /xamin processed by TOMCAT and all other URLs handled directly by Apache.
<VirtualHost>
....
ProxyPass /xamin http://localhost:nnnn/xamin
ProxyPassReverse /xamin http://localhost:nnnn/xamin
....
</VirtualHost>
The first line says that when the Apache server sees the request it should delegate it to another web server running at port nnnn. The second line tells Apache that when it gets back a response from the remote port it should scrub references in the headers to the actual server and replace them with references to the Apache server.
Xamin is intended to run in a servlet environment and at the HEASARC is developed using TOMCAT. To install Xamin in TOMCAT simply copy the xamin.war file to the TOMCAT webapps directory and restart TOMCAT. This should automatically install the system. Installers may need to update the heasarc.settings file in the webapps/xamin directory.
Alternatively, the war file can be unpacked manually:
> mkdir webapps/xamin
> cd webapps/xamin
> jar xf some/path/to/xamin.war
where the user is assumed to start in the directory in which TOMCAT has been installed. The web application may need to be restarted within TOMCAT.
The Xamin war file contains the full Xamin distribution including operational scripts. Scripts are generally found under the xamin/WEB-INF/classes/ops directory. A site may wish to move these operational scripts to other directories; however, the WEB-INF area and its subdirectories are invisible to the outside world, so this area can be used without revealing operational details to the public.
Installing new versions of Xamin.
In practice we use the xaminInstall.pl procedure to install new
versions of Xamin. This includes multiple steps and provides
detailed feedback on each of the steps in the process. It
allows us to use a common installation procedure when there are
multiple developers who may update Xamin. The steps in the
install are:
Step 0: Get arguments and confirm we are on right machine.
Step 1: Create a temporary directory to place the data content in.
Step 2: Unjar the Xamin war into the temporary area.
Step 3: Copy the needed JavaScript, JSP and HTML files from
the temporary area into the ExtJS compilation area.
Step 4: Use the ExtJS CMD commands to create the compressed
JavaScript files
Step 5: Copy the compressed JavaScript files back to the temporary
area. This will update the xamin.jsp and single box html
files and add the compressed JavaScript files.
Step 6: Customize the settings file.
Step 7: Fix and update other files in the temporary area before
release.
Step 8: Rename the current operational version to the backup name
(prod and test)
Step 9: Rename the temporary area to the desired name.
Step 10: Restart the servlet.
There are a number of checks made to ensure that the size of the war file and the number of libraries and executable files are reasonable and that the servlet successfully restarts. Steps 3-5 relate to an optimization of the JavaScript used in the Xamin web application. The web client is divided into almost 100 JavaScript files. Downloading each of these separately can add a few seconds to the start of Xamin. So the default version of Xamin uses the features of ExtJS to compress the JavaScript into a single file. Normally users can access Xamin through either an index.jsp (or no name after the final / in the URL) or a xamin.jsp version. The xamin.jsp version will download each JavaScript file separately, while the default index.jsp version uses the compressed files.
The xaminInstall script requires two arguments: the name of the
developer doing the installation (which is used to find a standard
location for the input war file), and the type of installation
being made. This determines where the installation is to be
done. A prod or production release goes in the production
area of primary HEASARC machines, test goes on HEASARCDEV in the
standard location, and dev goes into special areas defined
separately for each user. There are many other options which
can be used to override the default behavior.
Since direct access to the database is not available outside of GSFC, the Xamin batch mode gets around this by providing a query proxy to the user. The user need download only a very small executable Jar file that contains two classes. The classes in this jar take the user's input arguments and convert them into an HTTP POST request to the Xamin web server. If there are file arguments (i.e., UPLOAD) then these files are read and appended to the POST. If the user has specified an OUTPUT argument, then the output is written to the designated file. The users.jar file is normally accessible as .../xamin/distrib/users.jar.
The server side of the batch interface is a thin layer which initiates a servlet that runs the user specified query and returns the results. The CLI servlet is very similar to the servlet used in processing Web requests except for the handling of user authentication. If a user wishes to access an Xamin account through the batch interface, both the user name and password must be supplied in the command arguments. These are authenticated using the HEASARC login service.
To execute the users JAR file the user should enter:
java -jar users.jar arg1 arg2 ...
For example
java -jar users.jar table=rosmaster,ascamaster offset=a:b:10
will run a positional cross-correlation of the rosmaster and ascamaster tables with a maximum allowed offset of 10 arcminutes.
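The results can be written to a file rather than standard output by adding the OUTPUT argument described above, e.g. (the file name is illustrative):

java -jar users.jar table=rosmaster,ascamaster offset=a:b:10 output=matches.txt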
This section describes the general flow of control in processing a query, with a view to showing how the various classes work together.
The first order of business for the Controller is to set up three global structures. The Connecter is used to get connections to the underlying database or databases. The Settings are used to manage the system and user settings. The MessageHandler is used to convey information about processing to the user. Each of these is available in a global context. To allow for robust access even in an environment where there may be many simultaneous queries, these three globals use static ThreadLocal variables to ensure that each query gets an independent instance during query processing.
The Connecter allows for reuse of a single connection to the underlying database or, when multiple connections are desired, gets as many connections as needed. During shutdown it ensures that all connections are closed.
The Settings object is essentially just a Map from keywords to string values. Settings are set by system setup files, user arguments and also by program operations.
The Message Handler provides a standard way to send messages
back to the user. Initially it is set to simply echo whatever is
sent on the standard error output. Once the query processing has
progressed to a stage where the desired output format has been
established, a message handler compatible with that format will be
used. E.g., for a VOTable the messages will be rendered as INFO
elements at appropriate locations in the VOTable.
After establishing the Settings the Controller begins to analyze the request the user has made. There are a few non-query requests that are handled first.
If the user has requested help documentation the Help class is used to satisfy the request. If a USER setting has been specified, then the UserAccount class is used to ensure that the password specified is correct (or that the user has consistently forgone password protection on the account).
Next there is a check to see if this is an account processing directive. A user may create an account or modify it using the functions in the AccountManager or view or delete tables using the TableManager.
After verifying that the request is not one of the non-query requests, the Controller.runQuery method starts setting up the query environment. The first step is determining the output. A StandardDisplay object is created which will handle the communications with the user. A FormatFactory which creates the WriterElements appropriate for the selected format is created, and the WriterElements are plugged into the StandardDisplay. WriterElements separate out tasks for writing. Notably there are WriterElements for writing table information and distinct elements (but associated through the FormatFactory) for writing out the data products associated with the element. The global MessageHandler is updated so that messages will be incorporated appropriately into the output, e.g., if the output is FITS, the messages may be rendered as FITS comments.
If the user has requested information about the setup environment, (showArguments or ShowSettings), then these are handled by sending the appropriate messages.
The first step in building the query is determining which tables are used in the query. Tables can be either local (i.e., already present in the database) or uploaded. Uploaded tables will be added to the database temporarily for the duration of the query and then deleted afterwards. If there is a user account associated with the query, then the uploaded table can be persisted.
The Controller looks for a TableFinder class to use. Normally this will be the StandardTableFinder, but in principle this can be overridden by the user.
The TableFinder looks to the TABLE keyword to find any existing
tables that are to be used in the query. It creates a QueryBuilder
class that will be used to store information about the query and
add Tables to it that represent each of the user selected tables.
A MetadataInfo object is created that has all of the required
metadata for the system by querying the system metadata table. The
metadata for tables not participating in the query is not included
in the MetadataInfo object. To maximize efficiency when many
Xamin queries are being done, metadata is cached and updated
periodically, typically every 5-10 minutes. Thus when the
metadata tables are updated the effect may not be seen
immediately.
Next come the uploaded tables. These are specified using the UPLOAD keyword. Format-specific classes (TDat, CSV, VOTable or SourceList) implement the Table interface for the input data. A generic TableCopier class then copies this table into the underlying database as a temporary table. Metadata for the uploaded tables is added into the MetadataInfo object.
Once the tables that are to be used in the query are found, work begins on setting up all of the SQL clauses in the query using a QueryConstructor. User specified constraints are added to the Where clause. There may be explicit constraints given with the CONSTRAINT keyword, and user-specified time or positional constraints are converted to explicit constraints. Constraints use ParsedExpressions which identify the underlying columns used in each constraint. This allows us to deal with the aliases that tables may have in the query. Generally the first table is aliased to 'a', the second to 'b' and so forth. A TableAliaser manages the mapping between tables and aliases.
Positional constraints involve substantial work. User specified positions may be in a variety of coordinate systems and may involve source names. The SkyView SourceCoordinates class is used to handle this. DistanceCalculator classes use different ways to specify a maximum radius constraint for a query. This is used both for point sources and cross-correlations.
The output fields for the query may be defaulted or may be explicitly specified.
As each of the user inputs is processed the QueryBuilder object is updated appropriately. E.g., position, time, and explicit constraints are added in separate steps. Any requested sort order is saved. If the user has requested a grouping clause that can be added.
After all of the user inputs are processed, the QueryBuilder generates the full SQL for the query. The query is validated using the SecurityChecker class. This uses an SQL parser that we updated from the JSQLParser project. While this parser does not support all queries available within Postgres, it does a pretty good job of general SQL parsing, including supporting what is needed for the VO ADQL parsing. The SecurityChecker class and the parser confirm that the SQL parses successfully, and that it meets our security constraints: it only accesses tables that it should have access to (not system tables or other users' tables), it only accesses functions on the white list of allowed functions, and it does not create unexpected artifacts.
Every year or two there is a need to add new functions to the
white list of accepted functions. This requires updating and
recompiling the SecurityChecker class.
This SQL is then sent to Postgres with an 'explain' prefix. This confirms that Postgres can execute the query and returns a very rough estimate of how long the query will take.
If there are no problems, then the QueryBuilder returns a
Query. This is executed to return a Table.
If the user has requested that the results of a query be saved as a user table, then two SQL statements are actually executed. The first statement uses a select into user.xxx ... to run the query and save the result as a user table. The second query is just select * from user.xxx and displays the results of the table that has just been created.
When the web client starts up, it first queries the Xamin server to get the current list of available tables. It processes any user arguments and then displays the primary Query pane to the user. Users can select tables and add constraints to queries. Depending upon what the user has done various capabilities are enabled. Eventually the user submits a query. The user's inputs are processed, and a query, in the same format a user may use in the CLI interface, is sent using an XMLHTTPRequest to the Xamin server. The results are processed in the web client and displayed to the user in a new pane.
The Xamin Web interface is described in detail in its own
document.
Key Servlet Packages and Classes | |
---|---|
heasarc.xamin | The package containing the primary servlets for both CLI and web requests |
QueryServlet | The main class for processing web client query requests. It parses the inputs and returns the results as needed |
CLIServlet | The main class for processing cli queries |
TableParams | Get the parameters for a given table (or tables) |
BatchEmul | Emulate Browse batch queries |
heasarc.xamin.node | Classes for building nodes in the table selection trees including both HEASARC and Vizier tables |
heasarc.vo | Support Cone, TAP, SIA, SSA and DataLink queries using VO protocols |
TapAsync | Support TAP Asynchronous queries |
Vosi* | Support TAP VOSI capabilities |
ADQLServlet | Synchronous ADQL query |
heasarc.stilts | Invoke STILTS library for plotting |
heasarc.sources | Invoke coordinate converter to see if token is source name |
heasarc.downloads | Support download requests |
Key JSP files | |
xamin.jsp | The primary file for the Web interface |
getDoc.jsp | Get the documentation for a table |
delete/addUser.jsp | Delete or add a user as requested by the login service |
A single metadata table, metainfo, is used to store all metadata for the Xamin system. When a user account is created an extension metainfo table is created in the schema for the user, and the join of this user table -- which contains metadata specific to user tables -- and the system table is used. The metainfo table uses a generic structure for representing metadata rather than providing specific structured tables. This allows for great flexibility in the growth of the metadata at the cost of some complexity in the queries and the size of the metadata.
To find information in a table we generally need to know the table, row and column we are interested in. Each row of the metadata table is essentially a table, row, and column identifier followed by the information that belongs there.
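As a sketch of what a query against this generic structure looks like (assuming the identifier columns are named type and relation, as in the table below), the default search radius recorded for a table might be fetched with:

> psql heasarc -c "select * from metainfo where type='table' and relation='defaultSearchRadius'"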
There are some additional sources of information that are used in Xamin.
Bibliographic information is stored in the zzbib table, but zzbib is not a special metadata table; it is queryable just as any other table. The only special feature of this table is that it is used in the when clauses of a number of conditional products -- but using the same syntax as other conditional links.
Running
select distinct type from metainfo
is the equivalent of determining what kinds of metadata are present. Commonly used types include mission, table, product, linkfield and parameter.

Type (table) | Relation (column) | Description |
---|---|---|
document | The document type stores information about document used somewhere in the system. Currently the only information for a document is its URL, but author, update time, format, ... may be added in the future | |
document | located | The URL of a document |
linkfield | A linkfield describes the mapping between the values available in a given table, and the variables used in a product template. | |
linkfield | field | The variable name used in a product template description |
linkfield | value | The SQL used to extract the template from a row of the table |
mission | The mission type is currently mostly a placeholder, but may include information about instruments, PI's, wavelength coverage and more in the future | |
mission | end_time | The end of mission science operations |
mission | start_time | The start of mission science operations |
parameter | The parameter type gives the characteristics for a particular parameter of a given table. | |
parameter | description | A short text description of the parameter |
parameter | format | The format in which the parameter should be displayed |
parameter | maximum | The maximum value for the parameter in the table |
parameter | minimum | The minimum value for the parameter in the table |
parameter | note | A comment on the parameter |
parameter | order | The order in which the parameter is to be displayed. If positive then the parameter is a default/standard parameter. If negative, then when all parameters are displayed the order should be the absolute value of this parameter, except that values of 0 come last. |
parameter | otype | The type of the parameter in the Browse system |
parameter | ucd | The uniform content descriptor for the parameter |
parameter | unit | The unit[s] of the parameter |
product | The product type is used to describe a data product which may be an archival product, bibliographic reference, table link or generic external product. | |
product | description | A short text description of the product |
product | includes | A link to a child product |
product | links | Used to link the generic products |
product | match | A template string used to find files or URLs for a product given values in a row |
product | type | The general type of the product (e.g., FITS, ASCII, HTML, ...) |
table | This type describes the characteristics of a particular table in the database. | |
table | author | The author of the table |
table | bibcode | The bibcode for a paper describing the table |
table | containedBy | A mission which includes this table (may be repeated) |
table | contains | A link to a parameter entry |
table | defaultSearchRadius | The default radius for position queries of this table |
table | describedBy | A document describing the table |
table | description | A short text description of the table |
table | end_time | The column name for the column containing the end time for a table |
table | includesTypes | Object types described by this table |
table | j2000coordinates | A pair of columns giving the J2000 coordinates for the row |
table | lastUpdated | The date when this table was last updated |
table | priority | A numeric priority associated with the table |
table | regime | The regime of the table (may be repeated) |
table | start_time | A column name for the start time associated with each row |
table | tableType | The kind of table (observation, object, ...) |
product | siacolumn/ssacolumn | Information needed for SIA/SSA requests to this product |
tablelink | A tablelink maps a table to product. It indicates the linkfield mappings that are to be used in finding the products for a given row. | |
tablelink | linkfield | A linkfield used in this link from a table to a product |
tablelink | product | The name of the product being linked to |
tablelink | table | The name of the table being linked from |
tablelink | when | A conditional expression indicating if this link should be made for the current row |
linkfield | field | The name of the variable being defined for this link |
linkfield | product | The SQL used to extract the variable for this linkfield |
user | User entries are only found in the user metainfo tables, not in the primary metainfo table. Exactly one user name should be found in each user metainfo table. | |
user | The E-mail address of the user. | |
user | password | An encrypted version of the user password (if supplied). |
user | status | The status of the account: active/inactive/validating |
s[si]acolumn | various | Metadata associated with SIA/SSA data requests |
s[si]alink | various | Similar to tablelink, but for SIA/SSA data requests |
s[si]alinkfield | various | Similar to linkfield, but for SIA/SSA data requests |
Name | Table/View | Description |
---|---|---|
pos_big_prime | Table | Weekly positions where the position error is > 1 degree. |
pos_small_prime | Table | Weekly positions where the position error is < 1 degree. |
pos_big_bck | Table | Incremental positions where the position error is > 1 degree. |
pos_small_bck | Table | Incremental positions where the position error is < 1 degree. |
pos_big | View | Current positions where the position error is > 1 degree. |
pos_small | View | Current positions where the position error is < 1 degree. |
pos_time_big_prime | Table | Weekly positions/times where the position error is > 1 degree. |
pos_time_small_prime | Table | Weekly positions/times where the position error is < 1 degree. |
pos_time_big_bck | Table | Incremental positions/times where the position error is > 1 degree. |
pos_time_small_bck | Table | Incremental positions/times where the position error is < 1 degree. |
pos_time_big | View | Current positions/times where the position error is > 1 degree. |
pos_time_small | View | Current positions/times where the position error is < 1 degree. |
time_prime | Table | Weekly times |
time_bck | Table | Incremental times |
time | View | Current times |
update_list | Table | Tables that have been updated since weekly cleanup |
So to make positional queries with default search radii efficient we separate the tables with relatively small default search radii and index them in the pos_small master tables. Those with large search radii are placed in pos_big. When we want to find all of the tables we run two queries (as a union). The query of the pos_small tables includes a constraint using the maximum default search radius that can take advantage of the dec index. The query of the pos_big tables does not include this additional constraint, but this table is much smaller since only a very few tables have such large error bars.
select AA from xx_prime
where table_name not in (select table_name from update_list)
and BB
union
select AA from xx_bck where BB
By referring to the views, the users of the master tables are isolated from the update process.
The cronenv.sh file described above is sourced before the cron jobs are executed.
The scripts are given with respect to xamin/WEB-INF/classes/ops. Logs are generally kept in /www/server/xamin/ops/logs.
Process | Frequency | Initiation | Script | Description | Log |
---|---|---|---|---|---|
Update tables | Every 12 hours | Automatic | ingest/runfullcopy | Update HEASARC tables in the Postgres database. Also update HEASARC and Vizier documentation. | xamin.ingest.DATE |
Update documentation table | Every 12 hours | Automatic | Included in runfullcopy | Update the table documents table. | See above |
Update master tables | As needed | Auto | Included in runfullcopy | When tables are updated, add position/time information to the master _bck files. | See above |
Update bibliography tables | As needed | Automatic | Included in runfullcopy | Add entries to bibliography tables. As far as Xamin is concerned this is just a table update, but the table is maintained by the HEASARC. This happens automatically whenever runfullcopy sees the zzbib.tdat file updated. | See above |
Clean master tables | Weekly | Automatic | master/buildFullMasters.pl | Put all master table information into _prime tables. | fullMasters.DATE |
TAP async user tables/jobs cleanup | Daily | Automatic | daily/cleanasync.pl | Clean up all information on jobs in the TAP_ASYNC schema that are past their destruction time. | cleanup.log.DATE |
User usage review | Weekly | Automatic | daily/sizeReport | Review disk usage of all users. This also covers TAP asynchronous and master table usage. The output of this cron is not logged, so it is sent as a mail message to the cron initiator. | Cron user e-mail |
Run table monitor | Daily | Automatic | daily/monitor.pl | See if the queries that have created user tables would now give different results and update/notify tables/users as requested. | monitor.log.DATE |
HTTP logs | Daily? | Auto | TBD | Copy TOMCAT http logs -- may want to join with Apache logs | TBD |
User backup | Daily | Automatic | daily/userbck.pl | Backup Xamin user account information. The HEASARC's TDAT files serve as a backup for the science tables so a backup of those is not required. | userbck.log.DATE |
Testing | Hourly | Automatic | tests/batchTest.pl | Run tests that show that system is operational. At least one test (comprising multiple queries) is run each hour. Every table is queried at least once each day. | autochecks.log |
Update tables test | Daily | Automatic | tests/makeTables.pl | Since the tables in the Xamin system can change we need to keep the tables test up to date. This updates tests/tables.tests. | Not logged |
Statistics | Quarterly | Manual | TBD | Compute statistics including at least: (TBD) | NA |
Compare Xamin and Browse | Weekly | Automatic | ingest/compareXaminBrowse | Ensure that the tables seen in Xamin and Browse are consistent. | Cron user e-mail |
Delete tables from Xamin | As needed | Manual | ingest/deleteTable | Delete a table that is no longer used in Xamin. | delete.TABLE |
Add user schema | As requested by srvlogin | Automatic | xamin/jsp/createUser.jsp | Add a schema for a new user account. This script is invoked by the HEASARC login service and is not externally invokable. | Not logged |
Delete user schema | As requested by srvlogin | Automatic | xamin/jsp/deleteUser.jsp | Delete a schema for a user account. This script is invoked by the HEASARC login service and is not externally invokable. | Not logged |
Add/update data products | As needed | Manual | products/updateProducts | When new products are defined, the mission products definition file in products/mission/MISSION.xml must be created or updated. This script reads this file and updates the metadata table to define the products. | Not logged |
Time | Event | |
---|---|---|
00:50 | Test: basic tests | |
01:50 | Test: infrastructure tests | |
02:30 | Clean asynchronous | |
02:50 | Test: basic tests | |
03:00 | User backups | |
03:10 | Daily monitor | |
03:30 Tu | Build full masters | |
03:50 | Test: infrastructure tests | |
04:50 | Test: basic tests | |
05:00 | Table ingest | |
05:50 | Test: infrastructure tests | |
06:50 | Test: basic tests | |
07:50 | Test: infrastructure tests | |
08:50 | Test: basic tests | |
09:50 | Test: infrastructure tests | |
10:50 | Test: basic tests | |
11:50 | Test: infrastructure tests | |
12:50 | Test: basic tests | |
13:50 | Test: infrastructure tests | |
14:50 | Test: basic tests | |
15:50 | Test: infrastructure tests | |
16:50 | Test: basic tests | |
17:00 | Table ingest | |
17:50 | Test: infrastructure tests | |
18:50 | Test: basic tests | |
19:50 | Test: infrastructure tests | |
20:50 | Test: basic tests | |
21:00 | Update tables test | |
21:00 Th | Generate user size report | |
21:50 | Test: infrastructure tests | |
22:00 | Compare Xamin and Browse | |
22:10 | Test: table tests | |
22:50 | Test: basic tests | |
23:50 | Test: infrastructure tests |