Image Subtraction
Introduction
Image subtraction (differencing) is the method of choice for the detection of transients, variability, or motion in the presence of a complicated background (e.g., galaxies, blended stars). We will refer to the pair of images we would like to subtract as the New ($N$) and Reference ($R$) images. The image-formation model is given by:
$N = F_{n} T \otimes P_{n} + \epsilon_{n}$
$R = F_{r} T \otimes P_{r} + \epsilon_{r}$
Here $T$ is the true image of the sky, which is generally unknown, $\otimes$ is the convolution operator, $F$ is the flux normalization of the images, $P_{n}$ and $P_{r}$ are the PSFs of the New and Reference images, respectively, and $\epsilon$ corresponds to the noise. For simplicity, we will assume the noise is Gaussian and independent and identically distributed (i.i.d.). This assumption is usually correct in the background-noise-dominated regime; we will return to it later on.
For simplicity, we will also assume that the images are background subtracted, and that they were flux normalized such that $F_{n}=F_{r}=1$. Since in many cases the PSFs of the two images are not identical, a simple subtraction of the two images is not useful.
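To make the image-formation model concrete, here is a minimal Python/NumPy sketch (illustrative only, not AstroPack code; the image size, PSF widths, and source fluxes are arbitrary choices) that simulates a background-subtracted, flux-matched New/Reference pair from a common true image $T$:

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Centered, unit-sum 2D Gaussian PSF."""
    y, x = np.indices(shape)
    cy, cx = shape[0] // 2, shape[1] // 2
    p = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    return p / p.sum()

def fft_convolve(img, psf):
    """Circular convolution via FFT; the centered PSF is first shifted to the origin."""
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(psf))))

# A "true" sky T: zero background plus a few point sources
T = np.zeros((256, 256))
T[100, 80] = 5000.0
T[180, 200] = 3000.0

rng = np.random.default_rng(0)
sigma_n, sigma_r = 1.0, 1.0          # background noise std of N and R
Pn = gaussian_psf(T.shape, 2.0)      # the New image has the wider PSF here
Pr = gaussian_psf(T.shape, 1.5)      # the Reference is sharper
N = fft_convolve(T, Pn) + rng.normal(0.0, sigma_n, T.shape)
R = fft_convolve(T, Pr) + rng.normal(0.0, sigma_r, T.shape)
```

The simulated `N`, `R`, `Pn`, `Pr`, `sigma_n`, and `sigma_r` are reused in the code sketches below.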
One of the first solutions to this problem was suggested by Phillips et al.: assuming that the PSF of the Reference image is sharper than the PSF of the New image along all axes, we can look for a kernel $k$ that, convolved with $R$, will mimic the appearance of $N$, such that:
$N - R\otimes k \approx 0$.
Taking the Fourier transform (denoted by hat) of this expression, one can write a simple expression for $k$:
$\widehat{k} = \frac{\widehat{N}}{\widehat{R}} $
The main problem with the last expression is that it is numerically unstable (i.e., it is a de-convolution). In other words, it involves division by near-zero values at high spatial frequencies, where $\widehat{R}$ approaches zero. Alard & Lupton (1998) and Bramich (2008) solved this problem by regularization. For example, Alard & Lupton forced the kernel $k$ to be a linear combination of Gaussians and solved for it using linear least squares, while Bramich used linear least squares to find the pixelized $k$ in a finite stamp.
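As an illustration of the regularized least-squares approach, the following sketch fits a pixelized kernel in the spirit of Bramich (a simplified, single-stamp version with no background term; it reuses the simulated `N` and `R` from above):

```python
import numpy as np

def fit_pixelized_kernel(N, R, half=3):
    """Least-squares fit of a (2*half+1)^2 pixelized kernel k minimizing ||N - R (x) k||^2."""
    offsets = [(dy, dx) for dy in range(-half, half + 1)
                        for dx in range(-half, half + 1)]
    # Each design-matrix column is R shifted by one kernel offset
    A = np.stack([np.roll(R, off, axis=(0, 1)).ravel() for off in offsets], axis=1)
    coef, *_ = np.linalg.lstsq(A, N.ravel(), rcond=None)
    k = coef.reshape(2 * half + 1, 2 * half + 1)
    model = (A @ coef).reshape(N.shape)   # R (x) k
    D_cl = N - model                      # classical difference image
    return k, D_cl

k, D_cl = fit_pixelized_kernel(N, R)
```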
Gal-Yam (2002) proposed a different solution:
$N \otimes P_{r} - R \otimes P_{n} \approx 0$.
The nice feature of this solution is that it does not involve de-convolution, and it is anti-symmetric (i.e., the assumption that $P_{r}$ is sharper than $P_{n}$ along all axes can be relaxed). In fact, one can see that there is an infinite number of solutions to this problem, of the form $N \otimes k_{n} - R \otimes k_{r} \approx 0$, where for each $k_{n}$ one can find an appropriate $k_{r}$.
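In Fourier space, the cross-convolution subtraction is essentially a one-liner (again an illustrative sketch, reusing the simulated images and PSFs from above):

```python
import numpy as np

fft2, ifft2, ifftshift = np.fft.fft2, np.fft.ifft2, np.fft.ifftshift
Pn_hat, Pr_hat = fft2(ifftshift(Pn)), fft2(ifftshift(Pr))

# Gal-Yam cross-convolution: N (x) P_r - R (x) P_n vanishes (up to noise) without a transient
D_gy = np.real(ifft2(Pr_hat * fft2(N) - Pn_hat * fft2(R)))
```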
These methods suffer from several issues:
- They lack a proof of optimality, i.e., it is not clear which method is better.
- The classical methods are not anti-symmetric, i.e., one cannot exchange the roles of $R$ and $N$.
- The error propagation suggested by these methods was partial. This results in an underestimation of the noise in the subtraction image.
- Except for the Gal-Yam method, these methods are numerically unstable, which may cause subtraction artifacts.
- Except for the Gal-Yam method, these methods require solving a linear least-squares problem, which is slow.
ZOGY
In the Zackay et al. (2016; ZOGY) formalism, the problem of transient detection is approached via simple hypothesis testing, for which an optimal solution is known (the Neyman-Pearson lemma; Neyman & Pearson 1933): the likelihood ratio test.
We are interested in separating the following two hypotheses, where under both hypotheses $R = T\otimes P_{r} + \epsilon_{r}$:
$H_0 : N = T\otimes P_{n} + \epsilon_{n}$,
$H_1 : N = T\otimes P_{n} + \alpha \delta_{q} \otimes P_{n} + \epsilon_{n}$,
where $\delta_{q}$ is a delta function at position $q$ (the position of the new source we are interested in detecting), and $\alpha$ is the source flux. Although there are two free parameters ($q$ and $\alpha$), the hypothesis is simple (see ZOGY for details). The likelihood ratio is given by:
$L(q,\alpha) = \frac{ P(N, R\vert H_{1}) }{ P(N, R\vert H_{0}) }$.
After some algebra (see ZOGY, Appendix A), we can write the optimal statistic for source detection:
$\widehat{S} = \widehat{\frac{1}{\alpha} L} = \widehat{D} \bar{\widehat{P}}_{D}$
Here $S$ is given by matched filtering between two terms, where $D$ (the proper subtraction image) is given by:
$\widehat{D}=\frac{\widehat{P} _{r} \widehat{N} - \widehat{P} _{n} \widehat{R}}{\sqrt{\sigma _{n}^{2} \vert\widehat{P} _{r}\vert ^{2} + \sigma _{r}^{2} \vert\widehat{P} _{n}\vert ^{2}}}$
and the PSF of the difference image:
$\widehat{P} _D = \frac{\widehat{P} _r \widehat{P} _n}{\sqrt{\sigma _{n}^{2} \vert\widehat{P} _{r}\vert ^{2} + \sigma _{r}^{2} \vert\widehat{P} _{n}\vert ^{2}}}$
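Since $D$, $P_{D}$, and $S$ involve only simple operators and FFTs, an illustrative implementation is short. The following Python/NumPy sketch (not the AstroPack implementation) omits the overall flux normalization $F_{D}$ of ZOGY, assumes background-subtracted, flux-matched, registered images, and adds a small `eps` purely as a numerical guard; it continues the simulated example above:

```python
import numpy as np

def zogy(N, R, Pn, Pr, sigma_n, sigma_r, eps=1e-12):
    """Proper subtraction D, its PSF P_D, and the detection statistic S (equations above)."""
    fft2, ifft2, ifftshift = np.fft.fft2, np.fft.ifft2, np.fft.ifftshift
    N_hat, R_hat = fft2(N), fft2(R)
    Pn_hat, Pr_hat = fft2(ifftshift(Pn)), fft2(ifftshift(Pr))

    # Common denominator; eps is only a numerical guard (not part of ZOGY)
    den = np.sqrt(sigma_n**2 * np.abs(Pr_hat)**2 +
                  sigma_r**2 * np.abs(Pn_hat)**2 + eps)
    D_hat = (Pr_hat * N_hat - Pn_hat * R_hat) / den
    Pd_hat = (Pr_hat * Pn_hat) / den            # up to the flux normalization F_D

    D = np.real(ifft2(D_hat))
    Pd = np.real(np.fft.fftshift(ifft2(Pd_hat)))
    S = np.real(ifft2(D_hat * np.conj(Pd_hat)))  # matched filter: S_hat = D_hat * conj(P_D_hat)
    return D, Pd, S

D, Pd, S = zogy(N, R, Pn, Pr, sigma_n, sigma_r)
S_norm = S / np.std(S)   # normalize S to units of its std before thresholding
```

In practice, $S$ is normalized by its (robust) standard deviation, so that peaks can be thresholded in units of $\sigma$.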
Properties of $D$ and $S$
$D$ and $S$ have several important properties:
- They are numerically stable (no division by zero), because the numerator in their expressions converges to zero at least as fast as the denominator.
- They are anti-symmetric, i.e., one can exchange the roles of $R$ and $N$ and get the same value with the opposite sign.
- The pair $D$ and $P_D$ is a sufficient statistic, meaning that it can be used for any statistical question on the difference between $R$ and $N$ without loss of information.
- $S$ is the optimal statistic for source detection under the following assumptions: (i) the image is in the background-noise-dominated regime; (ii) the registration and flux matching are perfect; and (iii) the PSFs of $N$ and $R$ are known.
- Since the expressions for $D$ and $S$ involve only simple operators and the fast Fourier transform, their calculation is very fast.
Violation of the assumptions
ZOGY, and for that matter all previous image subtraction methods, rely on several assumptions:
- The images are Nyquist sampled. For most practical cases, the requirement is that the pixel scale is smaller than about FWHM/2.35.
- The noise in the $R$ and $N$ images is Gaussian and i.i.d. This assumption is valid only in the background-noise-dominated case. For most astronomical images, this assumption is correct almost everywhere in the image, with the exception of bright stars (i.e., stars whose flux is larger than the variance of the background).
- The flux matching is accurately known and the images are flux-matched.
- The New and Reference images have the same wavelength response. A breakdown of this assumption (e.g., due to the atmosphere) will lead to an effective violation of the flux-matching assumption.
- The images are perfectly registered.
- The PSFs of the New and Reference images are accurately known.
In the following sections, we discuss some solutions to some of the major problems listed here.
Translient and non-perfect registration
Image subtraction methods assume perfect registration. In practice, the breakdown of this assumption is unavoidable for several reasons: (i) for ground-based observations, atmospheric scintillation shifts the positions of stars by an amount typically larger than the measurement error due to Poisson noise; furthermore, these shifts are correlated within the ($<1$ arcmin) isoplanatic patch, meaning that in practice the stars' positions are shifted (almost) randomly with respect to their mean positions; (ii) color refraction in the atmosphere or in the telescope, combined with the variance of stellar colors, introduces variations in the relative positions of stars as a function of airmass or focal-plane position, respectively; (iii) Poisson errors limit the accuracy of image registration; and (iv) astrophysical objects have proper motion.
Translient (translational transient) is a hypothesis-testing method for detecting source motion between the Reference and New images. Detection and measurement of sub-pixel motion of sources in astronomical images is required for two major science cases: (i) identifying motion, due to any kind of astrometric noise or astrophysical motion, that leads to subtraction artifacts in image differencing and, in turn, to false alarms in transient detection; and (ii) measuring astrophysical motion.
Our model for the reference image ($R$) and the new image ($N$) is given by:
$R = (T + \alpha_r\delta_{\vec{q}})\otimes P_r + \epsilon_r$
$N = (T + \alpha_n\delta_{\vec{p}})\otimes P_n + \epsilon_n$
Here $P$ denotes the PSF, $\epsilon$ is i.i.d. noise, and $\delta$ is the Dirac delta function at position $\vec{q}$ or $\vec{p}$. We assume that the images are registered and flux-matched.
We would like to separate the following two hypotheses:
$H_0 : \vec{q}=\vec{p}$
$H_1 : \vec{q}\neq \vec{p}$
In Springer et al. (2024) we used the Neyman-Pearson lemma to derive the Translient statistic $Z^{2}$:
$Z^{2} = \vert\vert \Im ( F^{-1} [ \frac{4\pi\vec{k}}{m}\frac{\bar{\widehat{P}}_n \bar{\widehat{P}}_r(\widehat{P}_n\widehat{R} - \widehat{P}_r \widehat{N})}{|\widehat{P}_r|^2\sigma_n^2 + |\widehat{P}_n|^2\sigma_r^2} ] ) \vert\vert^{2}$.
Here $\Im$ is the imaginary-part operator, $F^{-1}$ is the inverse Fourier transform operator, and $\sigma^{2}$ is the image variance. $Z^{2}$ is distributed like a $\chi^{2}$ distribution with two degrees of freedom.
Therefore, Translient provides an optimal method to tell whether a subtraction artifact is due to flux variability or due to motion. This can be achieved by comparing $S^{2}+1$ to $Z^{2}$ ($Z^{2}$ is distributed like a $\chi^{2}$ distribution with two degrees of freedom, while $S^{2}$ is distributed like a $\chi^{2}$ distribution with one degree of freedom). If $Z^{2}>S^{2}+1$, then the motion model for the source is more likely, while if $S^{2}+1>Z^{2}$, then the flux-variability/transient model is more likely. In other words, the Translient algorithm provides a solution to the non-perfect registration of astronomical images.
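The following sketch computes a per-pixel $Z^{2}$ map and the comparison with $S^{2}+1$, continuing the simulated example above. Note that the frequency convention for $\vec{k}$ (here `np.fft.fftfreq`, in cycles per pixel) and taking $m$ as the number of image pixels are our assumptions for this illustration; consult Springer et al. (2024) for the exact conventions:

```python
import numpy as np

def translient_z2(N, R, Pn, Pr, sigma_n, sigma_r, eps=1e-12):
    """Per-pixel Translient statistic Z^2 (chi^2 with 2 d.o.f. under the no-motion hypothesis)."""
    fft2, ifft2, ifftshift = np.fft.fft2, np.fft.ifft2, np.fft.ifftshift
    Pn_hat, Pr_hat = fft2(ifftshift(Pn)), fft2(ifftshift(Pr))
    # eps is only a numerical guard against an underflowing denominator
    den = np.abs(Pr_hat)**2 * sigma_n**2 + np.abs(Pn_hat)**2 * sigma_r**2 + eps
    common = (np.conj(Pn_hat) * np.conj(Pr_hat)
              * (Pn_hat * fft2(R) - Pr_hat * fft2(N)) / den)
    # Assumed conventions: k in cycles/pixel, m = number of pixels
    ky = np.fft.fftfreq(N.shape[0])[:, None]
    kx = np.fft.fftfreq(N.shape[1])[None, :]
    m = N.size
    Zy = np.imag(ifft2(4.0 * np.pi * ky / m * common))
    Zx = np.imag(ifft2(4.0 * np.pi * kx / m * common))
    return Zx**2 + Zy**2

Z2 = translient_z2(N, R, Pn, Pr, sigma_n, sigma_r)
# For a candidate at pixel (i, j): motion is favored when Z2[i, j] > S_norm[i, j]**2 + 1
```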
Source noise
The assumption that the image is background-noise dominated is, in most cases, a good one, because in optical surveys the flux of most sources is smaller than the variance of the background. Even for bright transients this assumption is harmless, because the transient will still be detected at high significance (although the significance will be biased). However, if we would like to detect the variability of bright sources, then we must account for the source noise.
Here, instead of using the ZOGY propagation of source noise, we suggest another approach: for each transient found in $S$ (i.e., a local maximum or minimum in $S$), we can correct the value of $S$ for the source noise. This correction is roughly given by
$S_{corr} \approx \frac{S}{1 + 2f_{n}/(\sigma_{n}^{2} \eta)}$
where $f_{n}$ is the flux of the source in the $N$ image, $\sigma_{n}^{2}$ is the variance of the background in the $N$ image, and $\eta$ is given by
$\eta = \sqrt{ \sum{ P_{n}^{2} }}$
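As a sketch of applying this correction to a detected peak (continuing the example above; the flux `flux_n` would in practice come from, e.g., PSF photometry of the candidate in $N$):

```python
import numpy as np

def source_noise_correct(S_peak, flux_n, sigma_n, Pn):
    """Approximate source-noise correction of a peak value of S (equation above)."""
    eta = np.sqrt(np.sum(Pn**2))
    return S_peak / (1.0 + 2.0 * flux_n / (sigma_n**2 * eta))

# Example: correct the brightest peak in S_norm using its measured flux in N
i, j = np.unravel_index(np.argmax(S_norm), S_norm.shape)
S_corr = source_noise_correct(S_norm[i, j], flux_n=5000.0, sigma_n=sigma_n, Pn=Pn)
```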