FunColumnSelect(3) select Funtools columns

DESCRIPTION

The FunColumnSelect() routine is used to select the columns from a Funtools binary table extension or raw event file for processing. This routine allows you to specify how columns in a file are to be read into a user record structure or written from a user record structure to an output FITS file.

The first argument is the Fun handle associated with this set of columns. The second argument specifies the size of the user record structure into which columns will be read. Typically, the sizeof() macro is used to specify the size of a record structure. The third argument allows you to specify keyword directives for the selection and is described in more detail below.

Following the first three required arguments is a variable length list of column specifications. Each column specification will consist of four arguments:

  • name: the name of the column
  • type: the data type of the column as it will be stored in the user record struct (not the data type of the input file). The following basic data types are recognized:
  • A: ASCII characters
  • B: unsigned 8\-bit char
  • I: signed 16\-bit int
  • U: unsigned 16\-bit int (not standard FITS)
  • J: signed 32\-bit int
  • V: unsigned 32\-bit int (not standard FITS)
  • E: 32\-bit float
  • D: 64\-bit float
  • The syntax used is similar to that which defines the TFORM parameter in FITS binary tables. That is, a numeric repeat value can precede the type character, so that ``10I'' means a vector of 10 short ints, ``E'' means a single precision float, etc. Note that the column value from the input file will be converted to the specified data type as the data is read by FunTableRowGet().

    [ A short digression regarding bit\-fields: Special attention is required when reading or writing the FITS bit-field type (``X''). Bit-fields almost always have a numeric repeat character preceding the 'X' specification. Usually this value is a multiple of 8 so that bit-fields fit into an integral number of bytes. For all cases, the byte size of the bit-field B is (N+7)/8, where N is the numeric repeat character.

    A bit-field is most easily declared in the user struct as an array of type char of size B as defined above. In this case, bytes are simply moved from the file to the user space. If, instead, a short or int scalar or array is used, then the algorithm for reading the bit-field into the user space depends on the size of the data type used along with the value of the repeat character. That is, if the user data size is equal to the byte size of the bit\-field, then the data is simply moved (possibly with endian-based byte\-swapping) from one to the other. If, on the other hand, the data storage is larger than the bit-field size, then a data type cast conversion is performed to move parts of the bit-field into elements of the array. Examples will help make this clear:

  • If the file contains a 16X bit-field and user space specifies a 2B char array[2], then the bit-field is moved directly into the char array.
  • If the file contains a 16X bit-field and user space specifies a 1I scalar short int, then the bit-field is moved directly into the short int.
  • If the file contains a 16X bit-field and user space specifies a 1J scalar int, then the bit-field is type-cast to unsigned int before being moved (use of unsigned avoids possible sign extension).
  • If the file contains a 16X bit-field and user space specifies a 2J int array[2], then the bit-field is handled as 2 chars, each of which are type-cast to unsigned int before being moved (use of unsigned avoids possible sign extension).
  • If the file contains a 16X bit-field and user space specifies a 1B char, then the bit-field is treated as a char, i.e., truncation will occur.
  • If the file contains a 16X bit-field and user space specifies a 4J int array[4], then the results are undetermined.
  • For all user data types larger than char, the bit-field is byte-swapped as necessary to convert to native format, so that bits in the resulting data in user space can be tested, masked, etc. in the same way regardless of platform.]

    In addition to setting data type and size, the type specification allows a few ancillary parameters to be set, using the full syntax for type:

     [@][n]<type>[[['B']poff]][:[tlmin[:tlmax[:binsiz]]]]
    

    The special character ``@'' can be prepended to this specification to indicated that the data element is a pointer in the user record, rather than an array stored within the record.

    The [n] value is an integer that specifies the number of elements that are in this column (default is 1). TLMIN, TLMAX, and BINSIZ values also can be specified for this column after the type, separated by colons. If only one such number is specified, it is assumed to be TLMAX, and TLMIN and BINSIZ are set to 1.

    The [poff] value can be used to specify the offset into an array. By default, this offset value is set to zero and the data specified starts at the beginning of the array. The offset usually is specified in terms of the data type of the column. Thus an offset specification of [5] means a 20\-byte offset if the data type is a 32\-bit integer, and a 40\-byte offset for a double. If you want to specify a byte offset instead of an offset tied to the column data type, precede the offset value with 'B', e.g. [B6] means a 6\-bye offset, regardless of the column data type.

    The [poff] is especially useful in conjunction with the pointer @ specification, since it allows the data element to anywhere stored anywhere in the allocated array. For example, a specification such as ``@I[2]'' specifies the third (i.e., starting from 0) element in the array pointed to by the pointer value. A value of ``@2I[4]'' specifies the fifth and sixth values in the array. For example, consider the following specification:

      typedef struct EvStruct{
        short x[4], *atp;
      } *Event, EventRec;
      /* set up the (hardwired) columns */
      FunColumnSelect( fun, sizeof(EventRec), NULL,
                       "2i",    "2I  ",    "w", FUN_OFFSET(Event, x),
                       "2i2",   "2I[2]",   "w", FUN_OFFSET(Event, x),
                       "at2p",  "@2I",     "w", FUN_OFFSET(Event, atp),
                       "at2p4", "@2I[4]",  "w", FUN_OFFSET(Event, atp),
                       "atp9",  "@I[9]",   "w", FUN_OFFSET(Event, atp),
                       "atb20", "@I[B20]", "w", FUN_OFFSET(Event, atb),
                       NULL);
    

    Here we have specified the following columns:

  • 2i: two short ints in an array which is stored as part the record
  • 2i2: the 3rd and 4th elements of an array which is stored as part of the record
  • an array of at least 10 elements, not stored in the record but allocated elsewhere, and used by three different columns:
  • at2p: 2 short ints which are the first 2 elements of the allocated array
  • at2p4: 2 short ints which are the 5th and 6th elements of the allocated array
  • atp9: a short int which is the 10th element of the allocated array
  • atb20: a short int which is at byte offset 20 of another allocated array
  • In this way, several columns can be specified, all of which are in a single array. NB: it is the programmer's responsibility to ensure that specification of a positive value for poff does not point past the end of valid data.

  • read/write mode: ``r'' means that the column is read from an input file into user space by FunTableRowGet(), ``w'' means that the column is written to an output file. Both can specified at the same time.
  • offset: the offset into the user data to store this column. Typically, the macro FUN_OFFSET(recname, colname) is used to define the offset into a record structure.

When all column arguments have been specified, a final NULL argument must added to signal the column selection list.

As an alternative to the varargs FunColumnSelect() routine, a non-varargs routine called FunColumnSelectArr() also is available. The first three arguments (fun, size, plist) of this routine are the same as in FunColumnSelect(). Instead of a variable argument list, however, FunColumnSelectArr() takes 5 additional arguments. The first 4 arrays arguments contain the names, types, modes, and offsets, respectively, of the columns being selected. The final argument is the number of columns that are contained in these arrays. It is the user's responsibility to free string space allocated in these arrays.

Consider the following example:

  typedef struct evstruct{
    int status;
    float pi, pha, *phas;
    double energy;
  } *Ev, EvRec;

  FunColumnSelect(fun, sizeof(EvRec), NULL,
    "status",  "J",     "r",   FUN_OFFSET(Ev, status),
    "pi",      "E",     "r",  FUN_OFFSET(Ev, pi),
    "pha",     "E",     "r",  FUN_OFFSET(Ev, pha),
    "phas",    "@9E",   "r",  FUN_OFFSET(Ev, phas),
    NULL);

Each time a row is read into the Ev struct, the ``status'' column is converted to an int data type (regardless of its data type in the file) and stored in the status value of the struct. Similarly, ``pi'' and ``pha'', and the phas vector are all stored as floats. Note that the ``@'' sign indicates that the ``phas'' vector is a pointer to a 9 element array, rather than an array allocated in the struct itself. The row record can then be processed as required:

  /* get rows -- let routine allocate the row array */
  while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){
    /* process all rows */
    for(i=0; i<got; i++){
      /* point to the i'th row */
      ev = ebuf+i;
      ev->pi = (ev->pi+.5);
      ev->pha = (ev->pi-.5);
    }

FunColumnSelect() can also be called to define ``writable'' columns in order to generate a FITS Binary Table, without reference to any input columns. For example, the following will generate a 4\-column FITS binary table when FunTableRowPut() is used to write Ev records:

  typedef struct evstruct{
    int status;
    float pi, pha
    double energy;
  } *Ev, EvRec;

  FunColumnSelect(fun, sizeof(EvRec), NULL,
    "status",  "J",     "w",   FUN_OFFSET(Ev, status),
    "pi",      "E",     "w",   FUN_OFFSET(Ev, pi),
    "pha",     "E",     "w",   FUN_OFFSET(Ev, pha),
    "energy",  "D",       "w",   FUN_OFFSET(Ev, energy),
    NULL);

All columns are declared to be write\-only, so presumably the column data is being generated or read from some other source.

In addition, FunColumnSelect() can be called to define both ``readable'' and ``writable'' columns. In this case, the ``read'' columns are associated with an input file, while the ``write'' columns are associated with the output file. Of course, columns can be specified as both ``readable'' and ``writable'', in which case they are read from input and (possibly modified data values are) written to the output. The FunColumnSelect() call itself is made by passing the input Funtools handle, and it is assumed that the output file has been opened using this input handle as its Funtools reference handle.

Consider the following example:

  typedef struct evstruct{
    int status;
    float pi, pha, *phas;
    double energy;
  } *Ev, EvRec;

  FunColumnSelect(fun, sizeof(EvRec), NULL,
    "status",  "J",     "r",   FUN_OFFSET(Ev, status),
    "pi",      "E",     "rw",  FUN_OFFSET(Ev, pi),
    "pha",     "E",     "rw",  FUN_OFFSET(Ev, pha),
    "phas",    "@9E",   "rw",  FUN_OFFSET(Ev, phas),
    "energy",  "D",     "w",   FUN_OFFSET(Ev, energy),
    NULL);

As in the ``read'' example above, each time an row is read into the Ev struct, the ``status'' column is converted to an int data type (regardless of its data type in the file) and stored in the status value of the struct. Similarly, ``pi'' and ``pha'', and the phas vector are all stored as floats. Since the ``pi'', ``pha'', and ``phas'' variables are declared as ``writable'' as well as ``readable'', they also will be written to the output file. Note, however, that the ``status'' variable is declared as ``readable'' only, and hence it will not be written to an output file. Finally, the ``energy'' column is declared as ``writable'' only, meaning it will not be read from the input file. In this case, it can be assumed that ``energy'' will be calculated in the program before being output along with the other values.

In these simple cases, only the columns specified as ``writable'' will be output using FunTableRowPut(). However, it often is the case that you want to merge the user columns back in with the input columns, even in cases where not all of the input column names are explicitly read or even known. For this important case, the merge=[type] keyword is provided in the plist string.

The merge=[type] keyword tells Funtools to merge the columns from the input file with user columns on output. It is normally used when an input and output file are opened and the input file provides the Funtools reference handle for the output file. In this case, each time FunTableRowGet() is called, the raw input rows are saved in a special buffer. If FunTableRowPut() then is called (before another call to FunTableRowGet()), the contents of the raw input rows are merged with the user rows according to the value of type as follows:

  • update: add new user columns, and update value of existing ones (maintaining the input data type)
  • replace: add new user columns, and replace the data type and value of existing ones. (Note that if tlmin/tlmax values are not specified in the replacing column, but are specified in the original column being replaced, then the original tlmin/tlmax values are used in the replacing column.)
  • append: only add new columns, do not ``replace'' or ``update'' existing ones

Consider the example above. If merge=update is specified in the plist string, then ``energy'' will be added to the input columns, and the values of ``pi'', ``pha'', and ``phas'' will be taken from the user space (i.e., the values will be updated from the original values, if they were changed by the program). The data type for ``pi'', ``pha'', and ``phas'' will be the same as in the original file. If merge=replace is specified, both the data type and value of these three input columns will be changed to the data type and value in the user structure. If merge=append is specified, none of these three columns will be updated, and only the ``energy'' column will be added. Note that in all cases, ``status'' will be written from the input data, not from the user record, since it was specified as read\-only.

Standard applications will call FunColumnSelect() to define user columns. However, if this routine is not called, the default behavior is to transfer all input columns into user space. For this purpose a default record structure is defined such that each data element is properly aligned on a valid data type boundary. This mechanism is used by programs such as fundisp and funtable to process columns without needing to know the specific names of those columns. It is not anticipated that users will need such capabilities (contact us if you do!)

By default, FunColumnSelect() reads/writes rows to/from an ``array of structs'', where each struct contains the column values for a single row of the table. This means that the returned values for a given column are not contiguous. You can set up the IO to return a ``struct of arrays'' so that each of the returned columns are contiguous by specifying org=structofarrays (abbreviation: org=soa) in the plist. (The default case is org=arrayofstructs or org=aos.)

For example, the default setup to retrieve rows from a table would be to define a record structure for a single event and then call
 FunColumnSelect() as follows:

  typedef struct evstruct{
    short region;
    double x, y;
    int pi, pha;
    double time;
  } *Ev, EvRec;

  got = FunColumnSelect(fun, sizeof(EvRec), NULL,
                        "x",       "D:10:10", mode, FUN_OFFSET(Ev, x),
                        "y",       "D:10:10", mode, FUN_OFFSET(Ev, y),
                        "pi",      "J",       mode, FUN_OFFSET(Ev, pi),
                        "pha",     "J",       mode, FUN_OFFSET(Ev, pha),
                        "time",    "1D",      mode, FUN_OFFSET(Ev, time),
                        NULL);

Subsequently, each call to FunTableRowGet() will return an array of structs, one for each returned row. If instead you wanted to read columns into contiguous arrays, you specify org=soa:

  typedef struct aevstruct{
    short region[MAXROW];
    double x[MAXROW], y[MAXROW];
    int pi[MAXROW], pha[MAXROW];
    double time[MAXROW];
  } *AEv, AEvRec;

  got = FunColumnSelect(fun, sizeof(AEvRec), "org=soa",
                      "x",       "D:10:10", mode, FUN_OFFSET(AEv, x),
                      "y",       "D:10:10", mode, FUN_OFFSET(AEv, y),
                      "pi",      "J",       mode, FUN_OFFSET(AEv, pi),
                      "pha",     "J",       mode, FUN_OFFSET(AEv, pha),
                      "time",    "1D",      mode, FUN_OFFSET(AEv, time),
                      NULL);

Note that the only modification to the call is in the plist string.

Of course, instead of using statically allocated arrays, you also can specify dynamically allocated pointers:

  /* pointers to arrays of columns (used in struct of arrays) */
  typedef struct pevstruct{
    short *region;
    double *x, *y;
    int *pi, *pha;
    double *time;
  } *PEv, PEvRec;

  got = FunColumnSelect(fun, sizeof(PEvRec), "org=structofarrays",
                      "$region", "@I",       mode, FUN_OFFSET(PEv, region),
                      "x",       "@D:10:10", mode, FUN_OFFSET(PEv, x),
                      "y",       "@D:10:10", mode, FUN_OFFSET(PEv, y),
                      "pi",      "@J",       mode, FUN_OFFSET(PEv, pi),
                      "pha",     "@J",       mode, FUN_OFFSET(PEv, pha),
                      "time",    "@1D",      mode, FUN_OFFSET(PEv, time),
                      NULL);

Here, the actual storage space is either allocated by the user or by the FunColumnSelect() call).

In all of the above cases, the same call is made to retrieve rows, e.g.:

    buf = (void *)FunTableRowGet(fun, NULL, MAXROW, NULL, &got);

However, the individual data elements are accessed differently. For the default case of an ``array of structs'', the individual row records are accessed using:

  for(i=0; i<got; i++){
    ev = (Ev)buf+i;
    fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%.4f\t%.4f\t%21.8f\n",
            ev->x, ev->y, ev->pi, ev->pha, ev->dx, ev->dy, ev->time);
  }

For a struct of arrays or a struct of array pointers, we have a single struct through which we access individual columns and rows using:

  aev = (AEv)buf;
  for(i=0; i<got; i++){
    fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%.4f\t%.4f\t%21.8f\n",
            aev->x[i], aev->y[i], aev->pi[i], aev->pha[i], 
            aev->dx[i], aev->dy[i], aev->time[i]);
  }

Support for struct of arrays in the FunTableRowPut() call is handled analogously.

See the evread example code and evmerge example code for working examples of how FunColumnSelect() is used.