The ftselfjoin task searches for matching entries within a single table, usually duplicates. ftselfjoin is a "wrapper" around the ftjoin task, but specialized for the operation of duplicate-searching within a single table. In a sense, this is a database "join" of a table with itself (in database terminology, a "join" is a matching operation).
Users may specify any matching criteria. For example, one could search a table of sources for all sources which have the same name. However, the power of ftselfjoin is in specifying more complicated matching criteria. A more complicated example might be to search a table of sources for all sources that are within 1 degree of any other source in the table. (see example below)
Matches occur in pairs, known as the "left" and "right" matches. The rows that match are saved to the output file. Multiple matches are allowed. These matches would appear as two separate output rows. The output will contain a single table with all the columns of both left and right matches.
The matching expression is commutative if it doesn't care about whether left is being compared to right, or right is compared to left. If the expression is commutative, then the default processing will produce two output entries for every match: one when left matches right, and another when right matches left. If this is undesireable, the duplicates can be avoided by setting commutative=YES.
ftselfjoin never reports a match between a table row and itself, regardless of the 'commutative' setting.
In order to avoid column name clashes, ftselfjoin should rename columns automatically. ftjoin can append a prefix and/or suffix to each column name. The output table and the matching expression use the renamed column names, not the original names.
At least one of the prefixes or suffixes must be used to remove the ambiguity between the column names of the left and right matches. By default, the column names of the left table have the prefix "L_" added, and the prefix "R_" is applied to the right table. ftjoin will issue a warning or terminate if there are duplicate column names.
Generally speaking ftjoin is only efficient for modest size tables. The full tables should fit into memory at once. No optimization or special indexing is done. For two tables with M and N rows, the number of cross-comparisons is M * N.
1. Search a table for entries with the same name. Assume the table has a NAME column which refers to the names of various stars. This example forms a table which shows where the NAMEs match.
ftselfjoin table.fits out.fits "L_NAME == R_NAME" \ leftnameprefix="L_" rightnameprefix="R_" commutative=YES
Note that the setting "commutative=YES" will avoid duplicate entries.
2. Match two catalogs by position. Consider an example of a table containing a star catalog, and it is desired to find all stars within a given radius of any other star, say 0.5 degrees. We use the ANGSEP() function to measure the distance between stars:
ftselfjoin table.fits out.fits \ 'ANGSEP(L_RA,L_DEC,R_RA,R_DEC) < 0.5' \ leftnameprefix="L_" rightnameprefix="R_" commutative=YES
Note that the setting "commutative=YES" will avoid duplicate entries.