meeting 2024 09 26 n57 - JacobPilawa/TriaxSchwarzschild_wiki_5 GitHub Wiki
Following up from yesterday's meeting, where we discussed the difference in results for the s = 1.00 models, likely originating from the use of a different triaxnnls binary: I've re-minimized the s = 1.00 models, and the results look much more reasonable and in line with expectations, but I still cannot figure out what's going on with GPR and dynesty.
Takeaways:
- There is a strong systematic difference between results from the new binaries and the old binaries: a chi2 offset of roughly ~6.
- Using only scalings produced with the same binary gives the results we would expect (very continuous chi2 shifts, "scallops" visible in the 1d panels). However, GPR and dynesty on these scalings are still all over the place and totally inconsistent with what the 1d panels are showing. M/L is still off the grid, and the black hole mass does not appear consistent with where I would put it.
- We could still try re-running the other scalings (s=0.97, s=0.99, s=1.01, s=1.02, s=1.03) if we'd like, but I'd like to figure out what's going on with the three scalings we have so far that seem reasonable. I'm sending Emily a copy of the points to see if she is finding the same results as I am.
- First, here's a direct comparison of our fiducial chi2 case -- NNLS chi2, no outer 4 bins, no dummy moments. The left panel is simply a scatter plot of the chi2s against each other, and the right is a histogram of the difference for all the models. The two values correlate EXTREMELY tightly, with a fixed offset of ~6 in chi2. There's a bit of variation around this 6, which can be seen in the histogram on the right.
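The offset diagnostic itself is simple; here's a minimal sketch with synthetic chi2 values standing in for the real old/new-binary grids (the ~6 offset, the chi2 range, and the scatter are all put in by hand):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the two chi2 sets: the new-binary values
# track the old ones with a near-constant offset plus small scatter.
chi2_old = rng.uniform(1200.0, 1400.0, size=500)   # hypothetical chi2 range
chi2_new = chi2_old - 6.0 + rng.normal(0.0, 0.2, size=500)

diff = chi2_old - chi2_new
print(f"median offset: {np.median(diff):.2f}")
print(f"scatter (std): {np.std(diff):.2f}")

# Pearson correlation confirms the two sets track each other tightly.
r = np.corrcoef(chi2_old, chi2_new)[0, 1]
print(f"correlation:   {r:.4f}")
```

A tight correlation with a stable median difference is exactly the "fixed offset of ~6" signature in the plots.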
Plot |
---|
![]() |
- I've also replotted the chi2 histogram we were looking at yesterday (again, for the fiducial case), and the results are much more reasonable. The s=1.00 case now sits nicely between the s=0.995 and s=1.005 cases.
- ONE HUGE NOTE HERE: The other scalings (s=0.97, s=0.99, s=1.01, s=1.02, s=1.03) were also run at the same time as the s=1.0 case, so it's a bit misleading to keep those on the plot. I haven't reminimized these yet (unclear if we want to re-minimize all of them or just a subset), but left them here so we have something to relate to yesterday's results.
Plot |
---|
![]() |
- As an additional sanity check: we were also looking at plots of this style yesterday, which tipped us off in the first place that the s=1.0 case was an outlier. The results now look much smoother, without the central bump we were seeing before:
- ANOTHER HUGE REMINDER FROM ABOVE: the other scalings (s=0.97, s=0.99, s=1.01, s=1.02, s=1.03) were run with the old binaries and have not been re-minimized; they are kept on the plot only so we have something to relate to yesterday's results. At least things are broadly smooth and consistent around s=1.00.
Plot |
---|
![]() |
I've also gone through and re-selected the best scalings in light of the fixed s=1.00 chi2 values, and the resulting distribution is looking much more reasonable when using only the "equivalent" scalings.
- There is still a slight tilt which seems to favor models near s=1.005, but there are s=1.00 and s=0.995 models which are included in the mix still.
- Including only the s=0.995, s=1.0, and s=1.005 cases gives essentially exactly what we would expect.
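The best-scale selection above amounts to an argmin over the scaling axis for each model; a minimal sketch with synthetic chi2 values (the real grids come from the triaxnnls output):

```python
import numpy as np

# chi2[i, j]: chi2 of model i evaluated at scale factor scales[j].
# Values here are synthetic placeholders for the re-run grids.
scales = np.array([0.995, 1.000, 1.005])
rng = np.random.default_rng(1)
n_models = 8
chi2 = rng.uniform(1300.0, 1320.0, size=(n_models, len(scales)))

best_j = np.argmin(chi2, axis=1)                  # best scaling per model
best_scale = scales[best_j]
best_chi2 = chi2[np.arange(n_models), best_j]

# Fraction of models selecting each scaling (the histogram diagnostic).
frac = np.bincount(best_j, minlength=len(scales)) / n_models
for s, f in zip(scales, frac):
    print(f"s={s:.3f}: {f:.2f}")
```

A tilt in `frac` toward one end of the scale grid is the same "overall tilt" seen in the selection histograms.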
Again, when including all the scalings, remember that the s=0.995, s=1.0, and s=1.005 runs are different from the others, which I suspect is part of the reason the s=1.03 models are still included. I think we'd need to re-minimize these higher scalings if we want to take those results seriously.
Case | Best Scales | s=0.97 | s=0.99 | s=0.995 | s=1.00 | s=1.005 | s=1.01 | s=1.02 | s=1.03 |
---|---|---|---|---|---|---|---|---|---|
Only Rerun Scalings (s=0.995, s=1.0, s=1.005) | ![]() | | | ![]() | ![]() | ![]() | | | |
All Scalings | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
I also extracted just the black "best scale" points for the "Only Rerun" case. It felt unnecessary to include this above, so the plot is here instead:
Plot |
---|
![]() |
- We were looking at histograms yesterday showing what fraction of models get selected for the two cases above. With the updated s=1.00 case, I think the distributions are looking much more reasonable, but again with an overall tilt toward s=1.005 models.
- Again, just noting here that the "All Scalings" case includes s=0.97, s=0.99, s=1.01, s=1.02, and s=1.03, which were run with the old triaxnnls binaries. These models are largely excluded by the selection and thus should not impact the results greatly. I suspect this is why, despite the overall improvement in the landscapes, I am still running into the boundary issues with M/L in particular.
Only Rerun Scalings | All Scalings |
---|---|
![]() | ![]() |
- And here's the result from GPR + dynesty on the new points plotted above. In the "Only Rerun" case, I include only the best scales selected from s=0.995, s=1.00, and s=1.005; in the "All Scalings" case, I select from everything. Note that both use the same settings for GPR and dynesty (K=60, nu=1.5).
Case | Plot |
---|---|
Only Rerun Scalings | ![]() |
All Scalings | ![]() |
- After sending around some emails, we wanted to minimize a set of models closer to s=1.005 with the same set of binaries. I've gone ahead and done this, bringing the final grid of scalings to s = 0.99, 0.995, 1.0, 1.005, 1.01, 1.015, and 1.02.
- The resulting chi2s are definitely different from the old ones, as we know by now, and the results are as expected: we see the scallop shapes in the 1d panels, and the chi2 as a function of scale factor is extremely smooth for these models. The chi2 distributions are also generally much closer, with a much smoother transition from badly-fit to well-fit.
- There still seems to be an overall tilt in the landscapes when selecting the best chi2 models, with the s=1.01 and s=1.02 getting a much larger fraction of "best scaled" models than the s=1.00 or s=0.99 models.
- I think this is a sign that the mass parameters our grid was initially centered on were definitely not optimal, and we're relying on the high scale factor models to narrow down our parameters. I think this asymmetry + lack of coverage in the well-fitting region is causing dynesty + GPR to behave weirdly, because I'm still seeing the issues in the posteriors (though things have improved quite a bit; at least the black hole is somewhat reliably constrained now!).
- First, here are a few more test cases comparing the old and new binary chi2 values. The comparisons are virtually identical across scales, each showing a shift of ~6 between the old and new chi2 values:
Scale | 0.99 | 1.00 | 1.01 | 1.02 |
---|---|---|---|---|
 | ![]() | ![]() | ![]() | ![]() |
- Here are two quick diagnostic plots we've been looking at, now updated to only the scalings run with the new binaries. The models I picked to show chi2 as a function of scale factor are random but representative of the smoothness between adjacent scales. We can also see that the chi2 distributions are much more similar. I've included the median chi2 of each distribution in the legend, since it's a bit tough to read the positions of the distributions off the plot.
Some Representative Models | Chi2 Distributions |
---|---|
![]() | ![]() |
- Here are the main results, showing the 1d chi2 vs. parameters for all the different scalings, a combined plot, and a plot showing ONLY the "best scaled" models which end up going into GPR + dynesty. Note that the characteristic "scallop" shapes are there, and the lowest chi2 model (shown in black) is correctly identified among these scallops:
Case | Best Scales | All Scales + Best Scales | s=0.99 | s=0.995 | s=1.00 | s=1.005 | s=1.01 | s=1.015 | s=1.02 |
---|---|---|---|---|---|---|---|---|---|
 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
- And here's the distribution of which models are "best"; again, there seems to be an overall preference for the high scale factor models, and I think this is what is driving the issues in the posteriors. Our grid points were chosen to be centered on a totally different set of parameters, so the current points preferentially require high scale factor models, which by their nature sit towards the edge of our grids. I suspect we just don't have great sampling of the 6D volume, and GPR and dynesty are extrapolating somewhere strange.
Plot |
---|
![]() |
- Lastly, here's the result of running GPR + dynesty on the "best scaled" models selected from the scalings above. We still see the lingering push to the upper M/L boundary (though it appears less serious), but the black hole recovery has substantially improved.
- Maybe the best option is to run a small set of models closer to where the current minimum is, but with a slightly higher mass-to-light range? This might fill in the region more sensibly than just adding more scalings, since that is limited by the initial set of points/the range chosen for the initial grids.
GPR + Dynesty |
---|
![]() |
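The GPR + dynesty wiring boils down to a prior transform over the grid box and a log-likelihood of -chi2/2 from the surrogate. A minimal sketch, with a hypothetical 2D parameter box and a quadratic surrogate standing in for the GPR mean (the real run uses the 6D grid bounds):

```python
import numpy as np

# Hypothetical parameter bounds for two of the sampled quantities,
# e.g. M/L and log10(M_BH); purely illustrative values.
lo = np.array([1.0, 8.0])
hi = np.array([3.0, 10.0])

def prior_transform(u):
    """Map the unit cube (dynesty's native space) to the parameter box."""
    return lo + u * (hi - lo)

def chi2_surrogate(theta):
    """Stand-in for the GPR mean prediction of chi2 at theta."""
    center = np.array([2.0, 9.0])
    return float(np.sum(((theta - center) / 0.3) ** 2))

def loglike(theta):
    """dynesty samples log-likelihood; chi2 maps to -chi2/2."""
    return -0.5 * chi2_surrogate(theta)

# dynesty would then be driven roughly as:
# sampler = dynesty.NestedSampler(loglike, prior_transform, ndim=2)
# sampler.run_nested()
print(prior_transform(np.array([0.5, 0.5])))   # center of the box
print(loglike(np.array([2.0, 9.0])))           # surrogate peak
```

If the surrogate's minimum sits near the edge of the box (as with the M/L push), the sampler piles probability against the boundary, which is the behavior seen in these posteriors.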
- Adding a few more results here following discussions over email. I've now tried GPR and dynesty for a few different K's and nu's, with essentially no change in the parameters as a function of K, but nu=0.5 seems to fix our GPR issue almost entirely? By K=60, essentially all ~2000 points are in the fit, so further increasing K wouldn't do much. Making nu greater than 1.5 (the current default) generally makes the fits "stiff", so I don't think that would help either. I've also uploaded the "representative models" plot, but for the low-chi2 models.
 | K=30 | K=40 | K=50 | K=60 |
---|---|---|---|---|
nu=0.5 | ![]() | ![]() | ![]() | ![]() |
nu=1.5 | ![]() | ![]() | ![]() | ![]() |
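For context on why nu matters: the Matern kernel has simple closed forms at half-integer nu, and smaller nu decays faster at short distances, giving a rougher, less "stiff" interpolant. A quick numerical check (length scale and distances are arbitrary):

```python
import numpy as np

# Closed forms of the Matern kernel at the half-integer nu values:
# nu=0.5 is the absolute-exponential kernel (rough, non-differentiable
# sample paths), nu=1.5 gives once-differentiable paths, nu=2.5 twice;
# larger nu approaches the RBF kernel, i.e. a stiffer interpolant.
def matern(d, nu, ell=1.0):
    r = d / ell
    if nu == 0.5:
        return np.exp(-r)
    if nu == 1.5:
        return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)
    if nu == 2.5:
        return (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)
    raise ValueError("only nu in {0.5, 1.5, 2.5} implemented here")

d = np.linspace(0.0, 2.0, 5)
for nu in (0.5, 1.5, 2.5):
    print(f"nu={nu}: {np.round(matern(d, nu), 3)}")
```

At a given separation, the nu=0.5 correlation is lower than nu=1.5, so nearby points constrain the fit less rigidly; that looser behavior is consistent with nu=0.5 handling the unevenly sampled chi2 surface better.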
- Representative Models (the more strict cutoff is simply a smaller number of better-performing models):
More Strict Cutoff | Less Strict Cutoff |
---|---|
![]() | ![]() |
- A zoomed version (I think) of the plot Chung-Pei was asking for:
Plot | Plot2 |
---|---|
![]() | ![]() |