DiFX1.5 File Formats
There are five main ASCII files you need to run a correlation (six if you're doing pulsar binning), and only one of them is really complex. I'll start with the easiest and work my way up. Remember, check out the examples for more info, or these examples for some much more complicated setups including pulsar binning. Whenever a keyword/value pair is referred to, the value begins at the 21st character. Also, sorry the tabs don't come out properly in the example delay and uvw file snippets on this page.
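As a rough illustration of the "value begins at the 21st character" rule, here is a minimal Python sketch for writing and reading a keyword/value line. The helper names `format_pair` and `parse_pair` are made up for this page and are not part of DiFX; the only thing taken from the text above is the fixed value column.

```python
VALUE_COLUMN = 20  # zero-based index of the 21st character


def format_pair(keyword, value):
    """Return 'KEYWORD:' padded so that the value starts at the 21st character."""
    key = keyword + ":"
    if len(key) > VALUE_COLUMN:
        raise ValueError(f"keyword too long: {keyword!r}")
    return key.ljust(VALUE_COLUMN) + str(value)


def parse_pair(line):
    """Split one line into (keyword, value) using the fixed value column."""
    keyword = line[:VALUE_COLUMN].rstrip().rstrip(":")
    value = line[VALUE_COLUMN:].strip()
    return keyword, value


line = format_pair("START YEAR", 2007)
print(repr(line))
print(parse_pair(line))
```

Splitting on the column rather than on the colon keeps keywords such as `POL PRODUCTS 0/0` or values containing colons from confusing the parser.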
The machine file
Simple: one line per node of the correlation. If you request more nodes than this file has lines, MPI will wrap back to the start, which is not efficient. An example for a 10 node correlation on the Swinburne cluster might be:
tera01
tera02
tera03
tera04
tera05
tera06
tera07
tera08
tera09
tera10
The threads file
Just as simple as the machine file, the threads file details how many threads you want for each node that will be a Core. So if there are 10 nodes, and this is a 3-station experiment, there will be 10 - 3 (datastreams) - 1 (manager) = 6 Core nodes. That means your threads file should list at least 6 Cores. It starts with one line telling how many Cores there can be, and then has Ncores lines with just a number per line. That number is how many threads for that node. So it looks something like:
NUMBER OF CORES: 6
2
2
2
2
2
2
In this example, tera05, tera06,...tera10 have 2 threads each. This is sensible when you have a dual-core machine, or one with hyperthreading. Being able to specify the threads on a per-node basis lets you squeeze the best performance out of a heterogeneous cluster.
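The Core arithmetic and the file layout above can be sketched in a few lines of Python. This is only an illustration: `core_count` and `threads_file_text` are hypothetical helper names, and the padding of the header line assumes the 21st-character rule from the top of this page.

```python
def core_count(num_nodes, num_datastreams):
    """Nodes left over for Cores after the datastreams and the manager."""
    cores = num_nodes - num_datastreams - 1
    if cores < 1:
        raise ValueError("not enough nodes for even one Core")
    return cores


def threads_file_text(threads_per_core):
    """Build the threads file: one header line, then one number per Core."""
    header = "NUMBER OF CORES:".ljust(20) + str(len(threads_per_core))
    body = [str(n) for n in threads_per_core]
    return "\n".join([header] + body) + "\n"


cores = core_count(num_nodes=10, num_datastreams=3)
text = threads_file_text([2] * cores)
print(text, end="")
```

For a mixed cluster you would pass a list with different entries, for example `[2, 2, 4, 4, 8, 8]`, instead of `[2] * cores`.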
The uvw file
This has the precomputed uvw values (in metres) for each telescope in the correlation. It starts off with some keyword/value information on the telescopes and the timerange covered, and then lists the uvw values at a number of points in each scan. In more detail: 6 keyword value lines for the date and time eg:
START YEAR: 2007
START MONTH: 1
START DAY: 27
START HOUR: 0
START MINUTE: 59
START SECOND: 0.0
A keyword/value pair indicating the time in seconds between uvw
measurements in the file eg:
INCREMENT (SECS): 1
A table of the antennas in the file, with their
location, mount and axis offset, eg:
NUM TELESCOPES: 3
TELESCOPE 0 NAME: PKS
TELESCOPE 0 MOUNT: azel
TELESCOPE 0 X (m): -4554231.656
TELESCOPE 0 Y (m): 2816759.097
TELESCOPE 0 Z (m): -3454036.085
TELESCOPE 1 NAME: CATW172
TELESCOPE 1 MOUNT: azel
TELESCOPE 1 X (m): -4751111.121
TELESCOPE 1 Y (m): 2792597.147
TELESCOPE 1 Z (m): -3200491.7
TELESCOPE 2 NAME: MOPRA
TELESCOPE 2 MOUNT: azel
TELESCOPE 2 X (m): -4682768.63
TELESCOPE 2 Y (m): 2802619.06
TELESCOPE 2 Z (m): -3291759.9
A keyword/value pair for the number of scans in the file eg:
NUM SCANS: 442
Then, for each scan, we have a short header detailing
the start point (in units of the `increment' specified above), number
of points in this scan, and source information:
SCAN 0 POINTS: 3660
SCAN 0 START PT: 0
SCAN 0 SRC NAME: 2255-282
SCAN 0 SRC RA: 6.01309290672888
SCAN 0 SRC DEC: -0.488213469533334
This is followed by N lines of uvw information, where N is three more
than the number of points in the scan. The 3 extra points are 1 before
the start of the scan, the end of the scan, and one after the end of the
scan. They are necessary to allow quadratic interpolation of the uvws in
DiFX.
The lines themselves consist of 3 double precision values per telescope
(u,v,w) separated by tabs:
RELATIVE INC -1: -4422923.400427 -1635977.07993768 4285656.48881794 -4480653.93533355 -1323071.27664431 4334515.70411182 -4462108.79816306 -1434876.53542624 4318494.43016203 -3908986.436378 -2596355.30353404 4305222.82165622 -4262266.22917909 -1845427.2806343 4361869.43587185
In this example, PKS u=-4422923.400427m, v=-1635977.07993768m,
w=4285656.48881794m, CATW172 u=-4480653.93533355m ...
The next scan information follows immediately after the last line of the
previous scan:
RELATIVE INC 3661: -3469314.08270723 -1139659.67209665 5222122.07593112 -3474925.64487348 -822771.367994098 5278517.03745916 -3474594.52514974 -935763.342832722 5260248.94742375 -3087756.61648574 -2156383.35833915 5135362.56754033 -3322476.95372888 -1368461.96043286 5261832.2277425
SCAN 1 POINTS: 720
SCAN 1 START PT: 3660
...
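To make the line layout concrete, here is a small Python sketch that splits one `RELATIVE INC` line into per-telescope (u, v, w) triples and computes how many lines a scan block should contain. The function names are illustrative only, and the sketch assumes the values are separated by whitespace (tabs in the real file).

```python
def parse_uvw_line(line):
    """Return (increment, [(u, v, w), ...]) for one 'RELATIVE INC' line."""
    label, _, values = line.partition(":")
    increment = int(label.split()[-1])
    numbers = [float(x) for x in values.split()]
    if len(numbers) % 3 != 0:
        raise ValueError("expected three values per telescope")
    uvw = [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]
    return increment, uvw


def lines_in_scan(points):
    """A scan block holds three more lines than SCAN n POINTS."""
    return points + 3


# First telescope (PKS) of the example line above, tab separated.
sample = "RELATIVE INC -1:\t-4422923.400427\t-1635977.07993768\t4285656.48881794"
inc, uvw = parse_uvw_line(sample)
print(inc, uvw)
print(lines_in_scan(3660))
```

The same parser works for the delay file if you drop the grouping into triples, since that file has one value per telescope.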
The delay file
This has the precomputed geocentric delay values for each telescope, and is very similar in format to the uvw file. The first 7 lines are identical (start date/time and time increment in seconds), and are followed by a telescope table, as with the uvw file. However, in the delay file, only the telescope names are given (rather than name, xyz and mount) ie:
NUM TELESCOPES: 3
TELESCOPE 0 NAME: PKS
TELESCOPE 1 NAME: CATW172
TELESCOPE 2 NAME: MOPRA
The same NUM SCANS line follows, and then the actual delay
information. The scan header is identical to the uvw scan header except
with the RA and DEC information removed, ie:
SCAN 0 POINTS: 3660
SCAN 0 START PT: 0
SCAN 0 SRC NAME: 2255-282
As with the uvw file, there are 3 extra lines per scan of delay info.
Each line of the scan block consists of one double precision number per
telescope, tab separated, which is the geometric delay in microseconds
for that telescope:
RELATIVE INC -1: 14296.7612598859 14459.6988020343 14406.2729168216 14362.0826380644 14550.9768424732
In this example, the geometric delay
for PKS is 14296.7612598859, CATW172 is 14459.6988020343, ...
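The extra points around each scan exist so that the tabulated values can be interpolated quadratically. The exact scheme DiFX uses isn't spelled out on this page, so the following is just a generic three-point (Lagrange) quadratic in Python, with made-up sample values, to show why three neighbouring points are enough.

```python
def quadratic_interpolate(t, t0, dt, y0, y1, y2):
    """Quadratic through (t0, y0), (t0 + dt, y1) and (t0 + 2*dt, y2), evaluated at t."""
    x = (t - t0) / dt  # position in units of the increment: 0, 1, 2 at the samples
    return (y0 * (x - 1) * (x - 2) / 2
            - y1 * x * (x - 2)
            + y2 * x * (x - 1) / 2)


def example_delay(t):
    """Made-up delay curve in microseconds, exactly quadratic in t (seconds)."""
    return 3.0 + 2.0 * t + 0.5 * t * t


# Three tabulated points one increment (1 s) apart.
samples = [example_delay(t) for t in (0.0, 1.0, 2.0)]
value = quadratic_interpolate(0.5, 0.0, 1.0, *samples)
print(value)
```

Because the test curve is itself a quadratic, the interpolated value matches it exactly; for real delays the match is only approximate and depends on the increment.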
The correlator input file
This is of necessity a fairly complex file, and fairly long, although typically a lot of it is just repetition for different baselines and telescopes, and easy to generate automatically from the vex file. It is divided into a series of tables, which I will go through in turn.
The common settings table
This contains general information such as time range, and the paths to the other ascii files described above. The necessary keywords are shown below, with notes if the meaning is not obvious:
DELAY FILENAME: {the delay file path}
UVW FILENAME: {the uvw file path}
CORE CONF FILENAME: {the threads file path}
EXECUTE TIME (SEC):
START MJD:
START SECONDS: {offset from 00:00:00 on the start day}
ACTIVE DATASTREAMS:
ACTIVE BASELINES: {usually Ndatastreams*(Ndatastreams-1)/2, unless some aren't worth correlating}
DATA HEADER O/RIDE: {Always false. Provided for future compatibility when baseband files have more metadata}
OUTPUT FORMAT: {Currently, available options are RPFITS, ASCII and SWIN}
OUTPUT FILENAME: {the rpfits filename if FORMAT=RPFITS, or the directory for output files if FORMAT=SWIN}
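A few of these values are simple arithmetic, so here is a short Python sketch for deriving them from a calendar start time and a datastream count. The function names are hypothetical; the calendar date used is the start time from the uvw example earlier on this page.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 falls on this calendar date


def start_mjd(year, month, day):
    """Integer Modified Julian Date of the given calendar day."""
    return date(year, month, day).toordinal() - MJD_EPOCH.toordinal()


def start_seconds(hour, minute, second):
    """Offset in seconds from 00:00:00 on the start day."""
    return hour * 3600 + minute * 60 + second


def full_baseline_count(num_datastreams):
    """Number of baselines when every pair of datastreams is correlated."""
    return num_datastreams * (num_datastreams - 1) // 2


print(start_mjd(2007, 1, 27))
print(start_seconds(0, 59, 0.0))
print(full_baseline_count(5))
```

If you leave some baselines out because they aren't worth correlating, ACTIVE BASELINES is simply smaller than the number this returns.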
The config table
This contains info on correlator setup - number of channels per IF,
message sizes etc. This is placed in a separate table to the common
settings so that you can have different setups for different sources -
ie high frequency resolution for a target maser and low frequency
resolution for your continuum phase reference source. It also allows you
to turn pulsar binning on for specific sources. The first line just
informs us how many configs will follow:
NUM CONFIGURATIONS: 6
So then we get one of the following blocks of
information per config - I've shown some typical values:
CONFIG SOURCE: 1604-4441 {The source name}
INT TIME (SEC): 2.0 {Integration time in seconds}
NUM CHANNELS: 64 {Spectral points per subband}
BLOCKS PER SEND: 1000 {Number of FFT blocks to be processed at a time by a Core}
GUARD BLOCKS: 1 {Extra FFT blocks to tack onto the end of a message}
POST-F FRINGE ROT: FALSE {Whether to do fringe-rotation using a constant approximation across an FFT length}
QUAD DELAY INTERP: TRUE {Whether to do fringe-rotation using a quadratic approximation across an FFT length}
WRITE AUTOCORRS: TRUE {Whether to write autocorrelations to disk}
PULSAR BINNING: FALSE {Does this source need pulsar binning?}
DATASTREAM 0 INDEX: 0 {Refers to the Datastream table, explained below}
DATASTREAM 1 INDEX: 1 {There are 5 since we specified 5 Datastreams in the Common Settings}
DATASTREAM 2 INDEX: 2
DATASTREAM 3 INDEX: 3
DATASTREAM 4 INDEX: 4
BASELINE 0 INDEX: 0 {Refers to the Baseline table, explained below}
BASELINE 1 INDEX: 1 {There are 10 since we specified 10 Baselines in the Common Settings}
BASELINE 2 INDEX: 2
BASELINE 3 INDEX: 3
BASELINE 4 INDEX: 4
BASELINE 5 INDEX: 5
BASELINE 6 INDEX: 6
BASELINE 7 INDEX: 7
BASELINE 8 INDEX: 8
BASELINE 9 INDEX: 9
Usually, at least one of the config sources will be DEFAULT. This is the config used for any source without an individual specification. If no DEFAULT config is specified, then sources which do not appear in the CONFIG table are skipped over and not correlated.
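The fallback rule for DEFAULT can be written down as a tiny lookup. This Python sketch only mirrors the behaviour described in the paragraph above; `select_config` is an illustrative name, not a DiFX function, and the source names other than 1604-4441 are placeholders.

```python
def select_config(source, config_sources):
    """Return the config index for `source`, else the DEFAULT index, else None."""
    if source in config_sources:
        return config_sources.index(source)
    if "DEFAULT" in config_sources:
        return config_sources.index("DEFAULT")
    return None  # no matching config: the source is skipped, not correlated


with_default = ["1604-4441", "DEFAULT"]
without_default = ["1604-4441"]

print(select_config("1604-4441", with_default))
print(select_config("SOMEWHERE", with_default))
print(select_config("SOMEWHERE", without_default))
```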
If PULSAR BINNING is TRUE, an extra line is inserted immediately below
the PULSAR BINNING line as shown below:
PULSAR CONFIG FILE: /nfs/cluster/ska/adeller/v190/v190f/pulseprofiles/2144-3933/2144-3933.gate.binconfig
The format of the pulsar config file is described below.
The frequency table
Lists all the frequencies used in the experiment. Like most of these
tables, it starts with one line listing the number of entries, and then
has three lines per entry: band edge frequency, bandwidth, and upper or
lower sideband. The frequencies and bandwidths are specified in MHz, and
U or L is used to indicate upper/lower sideband respectively. A sample
freq table is shown below:
FREQ ENTRIES: 4
FREQ (MHZ) 0: 1634.0
BW (MHZ) 0: 16.0
SIDEBAND 0: L
FREQ (MHZ) 1: 1634.0
BW (MHZ) 1: 16.0
SIDEBAND 1: U
FREQ (MHZ) 2: 1666.0
BW (MHZ) 2: 16.0
SIDEBAND 2: L
FREQ (MHZ) 3: 1666.0
BW (MHZ) 3: 16.0
SIDEBAND 3: U
All future tables refer to the freq table when specifying frequency bands.
The telescope table
The telescope table contains a listing of the stations used in the
experiment. The names used must be a subset of those in the delay and
uvw files - the correlator will die gracefully if it cannot find one of
the stations in this table somewhere in the delay and uvw files. Each
station has a clock offset (microseconds) and a clock rate (microseconds
per second). These are in the same sense as the geometric delay ie a
positive clock offset is a *delay*. Thus, if you are looking at the
delay quantity of an SN table in AIPS, the corrections you make to these
numbers are in the same sense as those you see on the TV. An example
telescope table is shown below.
# TELESCOPE TABLE ##!
TELESCOPE ENTRIES: 5
TELESCOPE NAME 0: PKS
CLOCK DELAY (us) 0: 0.0
CLOCK RATE(us/s) 0: 0.0
TELESCOPE NAME 1: CATW172
CLOCK DELAY (us) 1: -49.14
CLOCK RATE(us/s) 1: 6.94E-8
TELESCOPE NAME 2: MOPRA
CLOCK DELAY (us) 2: -3.455
CLOCK RATE(us/s) 2: -1.2199E-6
...
Entries in the telescope table are referred to by the Datastream table entries. Thus, more than one Datastream can reference a single Telescope. This is arranged in this fashion so you don't need to specify the station clocks over and over again, when you have a few different band setups throughout the experiment (ie wideband phase reference, narrowband target etc). It is also useful if one station has recorded separate streams of data - this happens at the LBA in 1 Gbps mode, where the data is recorded in two separate 512 Mbps files. In this situation, you really have two "Datastreams" coming from one "Telescope".
The datastream table
The table starts with the usual number of entries, and then two lines
which affect all Datastreams. These are the factors affecting the size
and breakup of the memory buffer. The size of the buffer is given in
terms of a multiplier for the message size (which is itself a number of
FFT chunks - see the Config table). The memory buffer is then divided
into a number of segments - this must be even and must be at least 4.
# DATASTREAM TABLE #!
DATASTREAM ENTRIES: 5
DATA BUFFER FACTOR: 32
NUM DATA SEGMENTS: 8
The table entries are necessarily complex, as they completely describe the band setup for each datastream. This comprises the format and precision of the recording, the a priori system temperature, the data source (network or disk), whether to use a filterbank instead of an FFT, the number of frequencies, small delay offsets for each frequency, the number of polarisations recorded in each frequency and finally the order of each of the bands within the file.
The introductory stuff (format, tsys etc) goes at the top as shown:
TELESCOPE INDEX: 0
TSYS: 42.0
DATA FORMAT: LBAVSOP
QUANTISATION BITS: 2
FILTERBANK USED: FALSE
READ FROM FILE: TRUE
Choices for the DATA FORMAT include LBA (2 bit mag sign encoding), LBAVSOP (2 bit offset binary encoding), MKV, and NZ (8 bit linear). If MKV format is used, an additional line is inserted between the DATA FORMAT and the QUANTISATION BITS to tell the correlator the fanout of the data, eg:
FANOUT: 2
This is followed by a frequency section, which lists the number of
frequencies, indexes to the frequency table, small delay offsets for
each frequency (usually 0 - if used, applied in the same sense as AIPS
displays residual delays), and the number of polarisations for each
frequency (1 or 2 obviously):
NUM FREQS: 4
FREQ TABLE INDEX 0: 0
CLK OFFSET 0 (us): 0.01
NUM POLS 0: 2
FREQ TABLE INDEX 1: 1
CLK OFFSET 1 (us): 0.01
NUM POLS 1: 2
FREQ TABLE INDEX 2: 2
CLK OFFSET 2 (us): 0.0
NUM POLS 2: 2
FREQ TABLE INDEX 3: 3
CLK OFFSET 3 (us): 0.0
NUM POLS 3: 2
Now, if you add up the polarisations from each frequency, in the example
above it is clear there are 8 bands total for this datastream. So, the
final part of the entry for this datastream is 8 band entries, each with
a frequency (an index to the "local frequency table" - ie the section
from just above), and a polarisation:
INPUT BAND 0 POL: L
INPUT BAND 0 INDEX: 0
INPUT BAND 1 POL: L
INPUT BAND 1 INDEX: 1
INPUT BAND 2 POL: R
INPUT BAND 2 INDEX: 0
INPUT BAND 3 POL: R
INPUT BAND 3 INDEX: 1
INPUT BAND 4 POL: L
INPUT BAND 4 INDEX: 2
INPUT BAND 5 POL: L
INPUT BAND 5 INDEX: 3
INPUT BAND 6 POL: R
INPUT BAND 6 INDEX: 2
INPUT BAND 7 POL: R
INPUT BAND 7 INDEX: 3
So to work out the actual frequency for a given band takes two lookups:
say band 6 in the example above, that references "local frequency" 2. We
look at FREQ TABLE INDEX 2 from the local frequency table: that
references entry 2 of the actual frequency table. Looking at that, we
see that it is sky frequency 1666 MHz, lower side band, with a bandwidth
of 16 MHz.
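The two lookups described above can be followed in code. This Python sketch hard-codes the example frequency table and the PKS datastream entry from this page as plain lists; the data layout and the name `resolve_band` are just one way of holding the tables and are not how DiFX stores them internally.

```python
# Frequency table: (band edge MHz, bandwidth MHz, sideband) per entry.
FREQ_TABLE = [(1634.0, 16.0, "L"), (1634.0, 16.0, "U"),
              (1666.0, 16.0, "L"), (1666.0, 16.0, "U")]

# Datastream 0 (PKS): local frequency number -> FREQ TABLE INDEX.
PKS_FREQ_INDEX = [0, 1, 2, 3]

# Datastream 0 input bands: (INPUT BAND n POL, INPUT BAND n INDEX).
PKS_BANDS = [("L", 0), ("L", 1), ("R", 0), ("R", 1),
             ("L", 2), ("L", 3), ("R", 2), ("R", 3)]


def resolve_band(band, bands, local_freq_index, freq_table):
    """Follow band -> local frequency -> frequency table entry."""
    pol, local = bands[band]
    freq, bandwidth, sideband = freq_table[local_freq_index[local]]
    return {"freq_mhz": freq, "bw_mhz": bandwidth, "sideband": sideband, "pol": pol}


band6 = resolve_band(6, PKS_BANDS, PKS_FREQ_INDEX, FREQ_TABLE)
print(band6)
```

For band 6 this gives 1666 MHz, lower sideband, 16 MHz bandwidth, with polarisation R taken from the band entry itself.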
If all telescopes are configured identically then the "local frequency
table" is degenerate with the actual frequency table and hence pretty
boring - it's when you have multiple setups and telescopes with different
recording modes that it gets useful. It's really a convenience thing for
the way the correlator looks stuff up internally anyway.
For use in the baseline section, I'll also show the second Datastream (CATW172):
TELESCOPE INDEX: 1
TSYS: 68.0
DATA FORMAT: LBAVSOP
QUANTISATION BITS: 2
FILTERBANK USED: FALSE
READ FROM FILE: TRUE
NUM FREQS: 4
FREQ TABLE INDEX 0: 0
CLK OFFSET 0 (us): 0.0
NUM POLS 0: 2
FREQ TABLE INDEX 1: 1
CLK OFFSET 1 (us): 0.0
NUM POLS 1: 2
FREQ TABLE INDEX 2: 2
CLK OFFSET 2 (us): 0.0
NUM POLS 2: 2
FREQ TABLE INDEX 3: 3
CLK OFFSET 3 (us): 0.0
NUM POLS 3: 2
INPUT BAND 0 POL: R
INPUT BAND 0 INDEX: 0
INPUT BAND 1 POL: R
INPUT BAND 1 INDEX: 1
INPUT BAND 2 POL: L
INPUT BAND 2 INDEX: 0
INPUT BAND 3 POL: L
INPUT BAND 3 INDEX: 1
INPUT BAND 4 POL: R
INPUT BAND 4 INDEX: 2
INPUT BAND 5 POL: R
INPUT BAND 5 INDEX: 3
INPUT BAND 6 POL: L
INPUT BAND 6 INDEX: 2
INPUT BAND 7 POL: L
INPUT BAND 7 INDEX: 3
The baseline table
The baseline table starts with the usual "number of entries" line.
# BASELINE TABLE ###!
BASELINE ENTRIES: 10
Each entry then consists of two Datastreams (references to the
Datastream table), the number of frequencies, and the number of
polarisation products per frequency, as shown below:
D/STREAM A INDEX 0: 0
D/STREAM B INDEX 0: 1
NUM FREQS 0: 4
POL PRODUCTS 0/0: 2
D/STREAM A BAND 0: 2
D/STREAM B BAND 0: 0
D/STREAM A BAND 1: 0
D/STREAM B BAND 1: 2
POL PRODUCTS 0/1: 2
D/STREAM A BAND 0: 3
D/STREAM B BAND 0: 1
D/STREAM A BAND 1: 1
D/STREAM B BAND 1: 3
POL PRODUCTS 0/2: 2
D/STREAM A BAND 0: 6
D/STREAM B BAND 0: 4
D/STREAM A BAND 1: 4
D/STREAM B BAND 1: 6
POL PRODUCTS 0/3: 2
D/STREAM A BAND 0: 7
D/STREAM B BAND 0: 5
D/STREAM A BAND 1: 5
D/STREAM B BAND 1: 7
If we look up the Datastream table, we see that Datastream 0 and 1 reference telescope 0 and 1, which are PKS and CATW172 respectively. Each of these Datastreams has 4 frequencies, so it is unsurprising that we are choosing to correlate all 4. Each frequency here has two polarisation products, and if we again follow the references back through the Datastream table, we see that in each case the products correspond to RR and LL. Eg for the first frequency, band 2 of PKS is 1634 LSB, polarisation R, and band 0 of CATW172 is 1634 LSB, polarisation R, so this product is 1634 RR. Band 0 of PKS is 1634 LSB, polarisation L, and band 2 of CATW172 is 1634 LSB, polarisation L, so this product is 1634 LL. Naturally, most of the time you would use a script to set this table up, looking for all bands that overlap for the given baseline.
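Here is a rough Python sketch of the kind of script mentioned above, using the two example datastreams from this page. It is deliberately simplified: it only pairs bands that point at the same frequency table entry and have the same polarisation (RR and LL), and it doesn't try to detect partially overlapping bands or cross-hand products. The name `parallel_hand_products` and the list layout are assumptions for illustration.

```python
# Input bands as (polarisation, local frequency index), copied from the examples.
PKS_BANDS = [("L", 0), ("L", 1), ("R", 0), ("R", 1),
             ("L", 2), ("L", 3), ("R", 2), ("R", 3)]
CATW172_BANDS = [("R", 0), ("R", 1), ("L", 0), ("L", 1),
                 ("R", 2), ("R", 3), ("L", 2), ("L", 3)]
# Local frequency index -> FREQ TABLE INDEX (identical for both datastreams here).
PKS_FREQS = [0, 1, 2, 3]
CATW172_FREQS = [0, 1, 2, 3]


def parallel_hand_products(bands_a, freqs_a, bands_b, freqs_b):
    """Map each shared FREQ TABLE INDEX to a list of (band_a, band_b, label)."""
    products = {}
    for ia, (pol_a, local_a) in enumerate(bands_a):
        for ib, (pol_b, local_b) in enumerate(bands_b):
            same_freq = freqs_a[local_a] == freqs_b[local_b]
            if same_freq and pol_a == pol_b:
                entry = (ia, ib, pol_a + pol_b)
                products.setdefault(freqs_a[local_a], []).append(entry)
    return products


products = parallel_hand_products(PKS_BANDS, PKS_FREQS, CATW172_BANDS, CATW172_FREQS)
for freq_index in sorted(products):
    print(freq_index, products[freq_index])
```

The band pairs it finds are the same ones listed in the baseline entry above, although the order of the two products within each frequency may differ.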
The data table
This table must be included if one or more datastreams read from a file.
It is implicitly the same length as the datastream table (there is no
"number of entries" line). Each datastream has one line to say the
number of files N, and then N lines with filenames:
# DATA TABLE #######!
D/STREAM 0 FILES: 8639
FILE 0/0: /nfs/cluster/raid9/v190f/v190f-Pk_027_020000.lba
FILE 0/1: /nfs/cluster/raid9/v190f/v190f-Pk_027_020010.lba
...
The network table
This table must be included if one or more datastreams read from a network connection (READ FROM FILE: FALSE). It is implicitly the same length as the datastream table (there is no "number of entries" line). Each datastream has two lines - a port number and a TCP window size in kB.
# NETWORK TABLE ####!
PORT NUM 0: 10001
TCP WINDOW SIZE 0: 250
PORT NUM 1: 10002
TCP WINDOW SIZE 1: 250
...
Probably best to contact me if you have interest in trying out the network-fed correlator, as you'll need to set up the sending side of things as well which isn't covered here.
The pulsar configuration file
This is pretty simple - it gives links to the polyco file(s) containing
pulse prediction information (see the program TEMPO for a description of
the polyco file format), and specifies where the
bin end-points are set. It also gives the option to "scrunch" the binned
data. If SCRUNCH is true, each bin is scaled by its corresponding weight
and the bins are summed before writing to disk: thus only one "bin" is
recorded per time integration. This can be used to implement a matched
filter for each pulsar, recovering maximum S/N. If SCRUNCH is false,
each bin is written out separately and the weights are ignored. This
mode is not well tested, and may have bugs.
NUM POLYCO FILES: 3
POLYCO FILE 0: /nfs/cluster/ska/adeller/v190/v190f/pulseprofiles/0630-2834/0630-2834_54126_200000.polyco
POLYCO FILE 1: /nfs/cluster/ska/adeller/v190/v190f/pulseprofiles/0630-2834/0630-2834_54127_120000.polyco
POLYCO FILE 2: /nfs/cluster/ska/adeller/v190/v190f/pulseprofiles/0630-2834/0630-2834_54127_200000.polyco
NUM PULSAR BINS: 2
SCRUNCH OUTPUT: TRUE
BIN PHASE END 0: 0.58
BIN WEIGHT 0: 0.0
BIN PHASE END 1: 0.665
BIN WEIGHT 1: 1.0
This example shows a simple gate, where only data falling between pulse phase 0.58 and 0.665 is retained.
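To show how the bin end-points and weights fit together, here is a small Python sketch. It assumes that bin i covers the phases after the previous end-point up to and including BIN PHASE END i, and that phases beyond the last end-point wrap around into bin 0; the exact boundary handling in DiFX isn't stated on this page, so treat that as a guess. The function names are illustrative.

```python
from bisect import bisect_left


def bin_for_phase(phase, phase_ends):
    """Index of the bin that a pulse phase (0 <= phase < 1) falls into."""
    index = bisect_left(phase_ends, phase)
    return index % len(phase_ends)  # past the last end-point wraps into bin 0


def scrunch(bin_values, weights):
    """Weighted sum of the bins, giving one value per integration."""
    return sum(w * v for w, v in zip(weights, bin_values))


phase_ends = [0.58, 0.665]
weights = [0.0, 1.0]

print(bin_for_phase(0.60, phase_ends))  # inside the gate
print(bin_for_phase(0.30, phase_ends))  # before the gate
print(bin_for_phase(0.90, phase_ends))  # after the gate, wraps to bin 0
print(scrunch([5.0, 7.0], weights))
```

With the gate weights above, scrunching keeps only the contribution from bin 1, which is the on-pulse range.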
The SWIN output data format
Okay, so this isn't an ascii control file, but it is a file format so I'll describe it briefly here. The purpose of this file is to hold a bunch of visibilities in a relatively easy to understand format, which you can then translate into your favourite flavour of FITS or similar. At Swinburne we're working on AIPS++ measurement sets right now.
You create "SWIN" style output data by specifying OUTPUT FORMAT: SWIN in the common table of your correlator input file. When creating SWIN style data, the OUTPUT FILENAME keyword in the common table must refer to a non-existent directory that you want to create to store the visibility files in. The parent directory of the directory you specify must exist eg if you want to use /tmp/experiment/binary/ as your output directory, /tmp/experiment/ must exist but /tmp/experiment/binary/ must not.
In this directory, one or more SWIN style visibility files will be created. Each file will have a name of the form
DIFX_MJD_SECONDS.nnnnn
where MJD is the MJD of the first visibility point in the file, SECONDS
is the number of seconds since the start of MJD for the first visibility
point, and nnnnn is the number of visibility entries in the file.
Each visibility entry consists of a short ascii header (containing the
kind of info that is held in the random group headers of RPFITS),
followed by the visibility data in 32 bit complex floats, and optionally
weights as 32 bit complex floats. Each header has 13 keyword/value
pairs and looks like this:
BASELINE NUM: 258
MJD: 54044
SECONDS: 3600.5
CONFIG INDEX: 0
SOURCE INDEX: 1
FREQ INDEX: 0
POLARISATION PAIR: RR
PULSAR BIN: 0
FLAGGED: 0
DATA WEIGHT: 1.0
U (METRES): -4422923.400427
V (METRES): -1635977.07993768
W (METRES): 4285656.48881794
The header is immediately followed by the binary real and imag for each
point. The length will be 2*numchannels floats, packed as re im re im
re ...
The value numchannels can be found from the input file, looking at the
correct entry in the config table as specified by CONFIG INDEX. The end
of the visibilities is immediately followed by the next header, and so
on.
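As a starting point for a reader, here is a minimal Python sketch that pulls one visibility entry out of a byte stream. Several things are assumptions rather than facts from this page: that each header line ends with a newline, that the floats are little-endian (the `byte_order` argument), and that no weights follow the visibilities. The function name `read_visibility` is made up, and the record it reads is synthetic.

```python
import io
import struct

HEADER_LINES = 13   # keyword/value pairs per visibility header
VALUE_COLUMN = 20   # values start at the 21st character


def read_visibility(stream, num_channels, byte_order="<"):
    """Read one ascii header plus 2*num_channels 32 bit floats from `stream`."""
    header = {}
    for _ in range(HEADER_LINES):
        line = stream.readline()
        if not line:
            raise EOFError("stream ended inside a header")
        text = line.decode("ascii").rstrip("\n")
        header[text[:VALUE_COLUMN].rstrip().rstrip(":")] = text[VALUE_COLUMN:].strip()
    num_bytes = 4 * 2 * num_channels
    raw = stream.read(num_bytes)
    if len(raw) != num_bytes:
        raise EOFError("stream ended inside the visibility data")
    floats = struct.unpack(f"{byte_order}{2 * num_channels}f", raw)
    # Data are packed re im re im ..., so pair them up into complex numbers.
    visibilities = [complex(re, im) for re, im in zip(floats[0::2], floats[1::2])]
    return header, visibilities


# Build a synthetic two-channel record with the header keywords shown above.
pairs = [("BASELINE NUM", "258"), ("MJD", "54044"), ("SECONDS", "3600.5"),
         ("CONFIG INDEX", "0"), ("SOURCE INDEX", "1"), ("FREQ INDEX", "0"),
         ("POLARISATION PAIR", "RR"), ("PULSAR BIN", "0"), ("FLAGGED", "0"),
         ("DATA WEIGHT", "1.0"), ("U (METRES)", "-4422923.400427"),
         ("V (METRES)", "-1635977.07993768"), ("W (METRES)", "4285656.48881794")]
record = "".join((k + ":").ljust(VALUE_COLUMN) + v + "\n" for k, v in pairs).encode("ascii")
record += struct.pack("<4f", 1.0, -2.0, 0.5, 0.25)

header, vis = read_visibility(io.BytesIO(record), num_channels=2)
print(header["BASELINE NUM"], header["POLARISATION PAIR"], vis)
```

To read a whole file you would call this in a loop, looking up `num_channels` from the config table entry named by each header's CONFIG INDEX.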