mathpha -- performs mathematical operations on PHA files
mathpha expr units outfil exposure areascal ncomments
This task takes a user-defined input expression representing mathematical operations to be performed on (one or more) PHA datasets. The input expression is thus allowed to contain PHA filenames (and optionally, an extension number), a number of mathematical operators, integers, reals, and pairs of curved brackets ("(" & ")"). Assuming the expression is able to be parsed (and a number of other criteria are satisfied), the operations specified are performed on the PHA dataset, the errors propagated (see below), and a new file containing the resultant PHA dataset is written. The units in which the algebra is performed is controlled by a parameter ("units"; see below), the default of which is COUNTS (as opposed to COUNTS PER SECOND). This is true IRRESPECTIVE of whether the original PHA data themselves are stored as counts or counts per second. The exposure time written for the new PHA dataset is controlled by a user-defined parameter ("exposure"); see below. Users also have the option of specifying the values written to the o/p file of several other mandatory keywords (see below).
NOTE - THIS IS A MODERATELY POWERFUL TOOL, THUS OPEN TO CONFUSION/ABUSE. Users are strongly urged to read & attempt to fully understand this help file. In particular, users should understand the implied units for any integers/reals which form part of the input expression, the details of the error propagation, and options available to calculate the exposure time written to the o/p file. Both of these issues are discussed in detail below, and in further detail in the OGIP Memo OGIP/95-008 available online from the OGIP. Any remaining confusion/uncertainties/strange results should be reported to the author.
At the current time, only a modest number of mathematical operations are possible (as listed in below), though it is anticipated that they will serve the vast majority of users' needs. These will be added to as dictated by demand. In addition, due to the relative crudeness of the expression parser, there are number of rules which users must conform to when constructing their input expression (see below). Violation of these rules will result in the task (hopefully) stopping , (possibly) crashing, or (maybe even) producing incorrect results.
Only the the OGIP recognised PHA file formats described in Arnaud etal 1992 (Legacy, 2, 65) and its appendix provided in OGIP memo OGIP/92-007a are supported for both the input and output files.
The input expression is a character string (see below for limitations on expression length) which forms a mathematical expression consisting of:
Only the "obvious" restrictions are currently placed on filenames, namely that the path must be specified, and that the parser must not be able to interpret the path/filename as a mathematical operation, an integer/real, or some combination thereof (this currently poses a problem under unix & ultrix - see below). A specific extension of the input file containing the PHA dataset to be operated on can be specified by including the integer extension number in square ("["extn#"]") brackets after the filename. If such a specification is not used, then the entire PHA file will be searched for extensions likely to contain a PHA dataset. In such cases, if the input file is then found to consist of more than one PHA dataset, then the task will warn the user and stop.
Note that filenames containing the reserved arithmetical symbols for addition ("+"), subtraction ("-"), or multiplication ("*") - or containing parentheses ("(" or ")") - will likely be misinterpreted by mathpha as operators and cause the task to fail.
Currently, expressions given on the command line are limited to the maximum xpi parameter length of 1024 characters (which allows for as many input filenames or operations you can squeeze into 1024 chars). However, the limit on input expression length has been raised internally to to 16384 characters in order to accomodate much longer expressions (approaching 1000 filenames) if the expression is read from an input ASCII text file as in "mathpha expr=@expr.txt". Individual filenames must be smaller than 2048 characters. The true constraint is that the final expression it constructs internally (which is much longer than the original input expression) has a limit of 32768 chars. Users should therefore be warned against using long directory paths as part of filenames when the number of files is very large.
Users should adhere to standard mathematical conventions when constructing the input expression to ensure that operations are carried out in the desired order. It should be noted that in the case of a 'sub-expression' consisting of a series of multiplication and/or division operations on filenames and/or numerical constants (ie where the sub-expression does not contain either "+", "-", "(" or ")" characters), these operations will be carried out working from left-to-right. Thus the sub-expression "4/2*6" will result in a value of "12".
At the current time, the input expression can include any number of mathematical operations (within the constraint of the total number of characters specified above).
!!! NOTE !!! currently, a mathematical operator is NOT allowed to immediately follow a "(" character, or immediately preceed a ")" character in the input expression.
It is IMPERATIVE that users remember that the units of any numerical constants added to or subtracted from a PHA dataset assumed to be in the units specified by the parameter "units" (the default of which is COUNTS; see below), IRRESPECTIVE of whether the original PHA data themselves are stored as counts or counts per second.
At the current time, the input expression can include any number of numerical constants (within the constraint of the total number of characters specified above).
!!! NOTE !!! at the current time numerals of the form "1E04" or "1E-04" are NOT supported (and will be interpreted as input filenames). This shortfall will be addressed in the near future.
At the current time, up to 10 pairs of brackets may be included within the input expression.
Within the constraint of the total number of characters specified above, any number of spaces can be added to the input expression.
The following mathematical operations are supported
It should be noted that the parser will interpret all characters besides those listed above and the open/close curved brackets/parentheses characters (ie "(" and ")") as (parts of) input PHA filenames.
There several options open to users regarding the calculation or propagation of errors, controlled by the parameters errmeth and properr.
The first decision to make is whether errors should be propagated (properr='yes') or calculated. If errors are propagated then the errors on the output file are the quadrature sum of the errors on the input files. This option should only be used if the errors in the input files are Gaussian. If the errors are not Gaussian but Poisson then the correct procedure is to set properr='no' and errmeth='POISS-0' which will ensure that the output file has POISSERR=T. Note, however that this is only valid when summing spectra. If spectra are being subtracted then there is no general solution because the difference of two Poisson distributions is not itself Poisson (unlike the Gaussian case).
There are a number of other options for errmeth but they are not recommended and are listed below for historical interest. Assuming that N is the number of counts observed in a given PHA channel:
error = 1.0 + SQRT(N + 0.75)
The value is statistically that of the (larger) +ve error of a Poissonian distribution, but within this task this value is assigned to both the +ve and -ve error on the counts in that channel. For small N, the errors created using this prescription is significantly GREATER (and hence more conservative) than both the true -ve error, and that obtained by simply using SQRT(N). Both differences quickly reduce as one moves to larger N (and the Poissonian distribution becomes more symmetric/Gaussian).
error = SQRT(N - 0.25)
The value is statistically that of the (smaller) -ve error of a Poissonian distribution, but within this task this value is assigned to both the +ve and -ve error on the counts in that channel. This error prescription UNDERESTIMATES the error for small N.
Caution is urged, particularly when using ERRMETH = 'Gauss', as unexpected and/or misleading results can be produced. See OGIP/95-008 for further details.
The following tables enables direct comparisons to be made between these approximations:
N true 1-sigma errors calcd using errors calcd Poisson errors Gehrels approx using SQRT(N) 0 +1.84 -0.00 +1.87 -0.00 +0.00 -0.00 1 +2.30 -0.83 +2.32 -0.67 +1.00 -1.00 2 +2.63 -1.92 +2.66 -1.33 +1.41 -1.41 3 +2.92 -1.63 +2.94 -1.66 +1.73 -1.73 4 +3.16 -1.91 +3.18 -1.94 +2.00 -2.00 5 +3.38 -2.16 +3.40 -2.18 +2.24 -2.24 10 +4.27 -3.11 +4.28 -3.12 +3.16 -3.16 50 +8.12 -7.05 +8.12 -7.05 +7.07 -7.07 100 +11.00 -9.98 +11.00 -9.99 +10.00 -10.00
The (hidden) properr parameter controls whether the errors are to be
propagated during the algebra or (if properr='no') whether the errors are simply calculated from the resultant PHA dataset. In the former case, the errors are propagated in the normal manner (err3 = SQRT[err1**2 + err2**2]). This is an approximation when in the Poissonian regime (ie for low N). Whilst it is true that the variances of the Poissonian distribution are combined in this normal way, confidence limit error bars are not simply related to the variance like they are for Gaussian statistics.
HOWEVER, these approximations work well for all but the smallest N, and is certainly superior to either assuming SQRT(N) errors, or neglecting errors altogther: For example, adding two PHA datasets each with N=5 counts in a given channel will give a value of 10, and errors of:
+4.24 -4.24 (using errmeth='POISS-1') +3.08 -3.08 (using errmeth='POISS-2') +3.95 -3.95 (using errmeth='POISS-3')
compared to:
+4.27 -3.11 (statistically correct values) +3.16 -3.16 (propagating 'Gaussian' errors)
Note that severely misleading and/or incorrect results are possible if the
errors are not propagated (ie if properr=no). If data are not Poisson then error propagation should be turned on.
The value written as the exposure time in the o/p PHA dataset is indicated by a user-defined flag ("exposure"). This facility provides added flexibility to users, but with a risk of some confusion. It should be remembered however, that if desired, the exposure time given for a PHA dataset can always be reset using the "chkey" command within the ftools/heasarc task GRPPHA. The following options are currently available:
The area scaling factor is a numerical constant stored in the OGIP-mandatory FITS keyword AREASCAL in the SPECTRUM extension of a PHA file. Details of its use (and misuse) are too varied and complex to be fully described here, but most often it is used to renormalize the response matrix associated with that PHA file within XSPEC. As such the value of the AREASCAL keyword is often unity, but (particularly in the case of older response matrices) can have values of the order of the effective area (in cm^2) of the telescope/instrument. Note that MATHPHA will not work on files with vector AREASCAL (eg those for XMM-Newton RGS).
The value written as the AREASCAL keyword in the o/p PHA dataset is determined by a hidden parameter, "areascal". The following options are currently available:
if the values of the AREASCAL keywords in the all the i/p files:
In both cases, the user is informed of the value written to the o/p file. Essentially, if you dont see warning messages regarding AREASCAL during execution, don't worry about it. If you do, and the values on the various i/p files vary dramatically and you dont understand why, then you should check whether the input expression was really that which you intended.
There are a number of mandatory keywords required to be present in the header of extension containing the resultant PHA dataset in the o/p file. A number of these are crucial for the correct interpretation of the o/p dataset. The prime example is the value of the EXPOSURE keyword, and as described above, MATHPHA has a parameter 'exposure' which controls how the value to be written to the o/p file is determined. Similarly MATHPHA compares the values of the AREASCAL keywords found in the i/p files, and attempts to determine the most appropriate value to be written to the o/p file (above). MATHPHA also checks the values of the TELESCOP, INSTRUME, DETNAM, FILTER & CHANTYPE keywords. If there are discrepancies with any of these keywords, MATHPHA will warn the user and writes the appropriate null/default value to the o/p file.
However, there are a number of mandatory keywords associated with 'auxiliary' files, for which MATHPHA is in general unable to calculate 'appropriate' values (since it largely depends upon what the user is attempting to do via the input expression). These keywords are:
(see also OGIP/92-007).
However, for convenience MATHPHA does provide users some opportunity to specify the values to be written to the o/p file via the hidden parameters auxfiles, backfile, backscal, corrfile, corrscal, arfile and rmfile. The internal logic for MATHPHA regarding these parameters is as follows:
The parser is written in FORTRAN (!), and converts the user-supplied expression to "IMG" Reverse Polish (a little-known dialect thought to be understood by but one person on the planet). The parser attempts to identify the various elements of the input expression, and order the mathematical operations and the reading of PHA datasets appropriately. The parser attempts to work out from the last-entered (innermost) bracket-pair. The mathematical expression contained therewithin is evaluated and the answer stored. The parser then moves to the 2nd-to-last bracket-pair. The mathematical expression contained therewithin is evaluated (including the results already calculated, if appropriate), and the answer again stored. This process thus continues until the full expression has been evaluated, and this is the result which is finally written as the output PHA dataset.
Unix/ultrix users will encounter a problem if they attempt to specify the path to PHA datasets (not in the current directory): namely that the parser will interpret an input expression of the form
expr = "directory1/directory2/filename" to be "divide the PHA file called 'directory1' by PHA file 'directory2', then divide the result by PHA file 'filename'. The parsing of filenames can be bypassed by surrounding the filename with single quotes ('). This allows one to specify a filename containg ANY arithmetic character, including '/'.
Users who encounter/suspect problems with the task are urged to contact the author, but also to experiment with the input expression (by increasing/moving pairs of brackets, and/or performing the required calculation with two or more passes through MATHPHA).
This task is not designed to replace pocket/on-line calculators, and (currently) will stop if the parser is unable to find something resembling a potential PHA filename within the entire input expression.
The author, USRA, the OGIP & NASA assume no responsibility for errors resulting from either bugs, or the misuse of this task. However, please report any comments/problems or bugs to Ian M George (http://heasarc.gsfc.nasa.gov/cgi-bin/ftoolshelp).
Users should keep in mind the following constraints/behaviour:
This task will STOP (hopefully with an appropriate error message) if:
Caution is urged, particularly when using ERRMETH = 'Gauss', as unexpected and/or misleading results can be produced. See above and OGIP/95-008 for further details.
In the latter two cases, any problems will be reported by the task, and the default value written to the o/p file.
In the latter two cases, any problems will be reported by the task, and the default value written to the o/p file.
In the latter two cases, any problems will be reported by the task, and the default value written to the o/p file.
In the latter two cases, any problems will be reported by the task, and the default value written to the o/p file.
In the latter two cases, any problems will be reported by the task, and the default value written to the o/p file.
In the latter two cases, any problems will be reported by the task, and the default value written to the o/p file.
Due to the current lack of a unit-string parser, the units of the output file will always be either 'count' or 'count/s'. This is true even in cases such as "expr='file1/file2'", or "expr='file1*file2'". This can lead to confusion, and a parser is being worked on.
OGIP/95-008: The MATHPHA User's Guide CAL/GEN/92-002 (George etal 1992 Legacy, 2, 51), CAL/GEN/92-002a
Ian M George HEASARC NASA/GFSC http://heasarc.gsfc.nasa.gov/cgi-bin/ftoolshelp