The ftjoin task joins the columns of two tables into one table, using an arbitrary matching expression. In the languages of databases, this is known as a "database join." The two tables to be joined are designated the "left" and "right" tables.
ftpaste and ftjoin can both be used to join the columns of two tables. When using ftpaste, the rows must match one-to-one. When using ftjoin, the rows are matched according to any desired criteria.
Each row in the left table is matched with each of the rows in the right table, using a user-supplied matching expression. For example, if table "L" is a list of detected stars, and table "R" is a catalog of known stars, then one could ask for the stars in table "L" that are within a predefined search radius of the stars in table "R". Any CFITSIO calculator expression which evaluates to true or false is valid (see help for 'colfilter').
The rows that match are saved to the output file. Multiple matches are allowed. For example, two stars of a binary in table "L" may match a single catalog entry in table "R", as long as they both satisfy the matching criterium. These matches would appear as two separate output rows. The output will contain a single table with all the columns of both input tables.
ftjoin has a direct analogy to classical database join operations. For example, in the SQL database language, one might issue a star catalog matching operation with the following query:
SELECT * FROM L,R WHERE ANGSEP(L.RA,L.DEC,R.RA,R.DEC) < 10.0/3600;
ftjoin l.fits r.fits out.fits "ANGSEP(L_RA,L_DEC,R_RA,R_DEC) < 10.0/3600" \ leftnameprefix="L_" rightnameprefix="R_"
It is also possible to save non-matching rows as well. This depends on the type of join (specified by the jointype parameter). The default type of join, "INNER", saves only rows that match both tables. As shown in the table below, a row can appear in the output depending on whether the row appears in the left table, right table or both.
Row appears in output if it is in... | |||
---|---|---|---|
jointype | Left only | Left and Right | Right only |
INNER / INTERSECTION | No | Yes | No |
LEFT OUTER | Yes | Yes | No |
RIGHT OUTER | No | Yes | Yes |
FULL OUTER / UNION | Yes | Yes | Yes |
An "INNER" join is equivalent to a standard set-wise intersection. A "FULL OUTER" join is equivalent to a standard set-wise union.
Continuing with the star catalog analogy, a "LEFT OUTER" join between tables "L" and "R" would produce all matches between the the detection table "L" and the catalog table "R", plus any remaining non-matching stars in table "L".
When using an "OUTER" join type, missing values are filled with null values. Note that the input tables must have null values defined via TNULLn keywords in order for this to work properly.
In order to avoid column name clashes, ftjoin can rename columns automatically. ftjoin can append a prefix and/or suffix to each column name. The pre/suffix should be different for the left and right tables. By default, the column names of the left table have the prefix "L_" added, and the prefix "R_" is applied to the right table. However, it is worth noting that simply adding a prefix does not always avoid column name clashes. The output table and the matching expression use the renamed column names, not the original names.
ftjoin can issue a warning or terminate if there are duplicate column names, depending on the dupcolname parameter.
The standard CFITSIO calculator expressions are available to the user, but the #ROW expression is treated specially. Because two different tables are being joined, the two new different variables are available: #L_ROW and #R_ROW. The variable #L_ROW is the row number in the left table, and #R_ROW is the row number in the right table. Note that the names #L_ROW and #R_ROW are hard-coded; they are independent of any leftnameprefix or rightnameprefix settings that you choose.
Upon finishing successfully, ftjoin updates the task parameters nbothmatch, nleftmatch and nrightmatch with the number of joint matches, left-only matches, and right-only matches, respectively. These can be used to determine the success of the operation.
Generally speaking ftjoin is only efficient for modest size tables. The full tables should fit into memory at once. No optimization or special indexing is done. For two tables with M and N rows, the number of cross-comparisons is M * N.
1. Match two tables by name. Assume two tables have a NAME column which refers to the names of various stars. This example forms a combined table with the columns of both tables where the NAMEs match.
ftjoin left.fits right.fits out.fits "L_NAME == R_NAME" \ leftnameprefix="L_" rightnameprefix="R_"
2. Match two catalogs by position. Consider an example where we are looking for pulsars inside of supernova remnants. We can download the Green supernova catalog as a FITS file (see HEASARC snrgreen catalog); and the ATNF pulsar catalog as a FITS file (see atnfpulsar catalog). Then, we can match the tables by asking the question, which pulsars are inside of the catalogged supernova remnants. In this case, the maximum of the MINOR_DIAMETER and MAJOR_DIAMETER columns is used to estimate the radius of the remnant.
ftjoin snrgreen.fits atnfpulsar.fits snr_psr.cat \ 'ANGSEP(L_RA,L_DEC,R_RA,R_DEC) < MAX(L_MINOR_DIAMETER,L_MAJOR_DIAMETER)/60.0/2.0' \ leftnameprefix="L_" rightnameprefix="R_"
The factor of 2.0 converts a diameter into a radius, and 60.0 converts arcminutes into degrees. Of course, this table will require follow-up screening to eliminate chance coincidences and double-matches (of which there are some).
3. Add new data to an existing base catalog. Consider a case where we have a base catalog of stars, and a catalog of new data from a second observatory for a few stars which we would like to join to this table. We would like new data to be associated with existing data, when the star already exists in the base catalog. This is a case where an "outer" join is appropriate: we want to keep all the main catalog entries regardless of whether they match the ancillary data. Here we will match by NAME columns.
ftjoin main.cat newobs.cat newmain.cat "NAME == NEWOBS_NAME" \ leftnameprefix=NONE rightnameprefix="NEWOBS_" jointype="LEFTOUTER"
Since we would like to keep the original table unperturbed, the left prefix is set to NONE so that the column names are not renamed. The new observations are prefixed with "NEWOBS_". Entries in the main catalog which don't have any corresponding ancillary data will have NULL ancillary values.
4. Selecting output columns. Currently ftjoin does not have an explicit mechanism to select output columns. However, this can be done with CFITSIO pipes. For example, if one wanted to keep only the columns NAME,FLUX,PERIOD, one could use the following:
ftjoin left.fits right.fits - "MATCHING EXPRESSION" | \ ftcopy '-[col L_NAME;R_FLUX;R_PERIOD]' out.fits
The "-" hyphen is used to pipe the output of ftjoin, and ftcopy is then used to select the requested columns.