ChatGPT guide - uchicago-bfi-gnlab/lab_manual GitHub Wiki

Works well

Debugging

Debugging code
- Simply copy and paste the code into the textbox and write after it: "debug this"
- GPT works better with more context, so pasting the theory section of the manuscript before asking coding questions often makes its answers more accurate. You can paste the LaTeX source of the theory section; you don’t have to copy it from the PDF.
- ChatGPT will sometimes run simple Python scripts to help debug. Be wary of this output, as it isn't always useful when debugging complex code.
Verifying code logic
- It does better at verification if you paste the LaTeX of the theory section along with the prompt you would otherwise have used, and then turn on a reasoning model. To do even better, I find it quite useful to add instructions to the prompt such as “refer to the math in the paper in your answer”.
- For scripts calling lots of libraries: the context window for most models we can access is very large, so feel free to liberally copy-paste large scripts (e.g. scripts with tons of functions/classes). This will greatly improve ChatGPT's understanding of the code's logic, but if the issue is too complex, performance can be mixed (see below).
Interpreting errors, specifically for coding inside restricted data environments
- Since we can't copy large chunks of code out of restricted data environments and into GPT, we can hand-type error messages instead. This has been helpful for PySpark errors in builds. Here is one example.
For all of the above, the "mini" reasoning models work particularly well. As of writing, the best model in terms of the speed-performance tradeoff is likely o4-mini-high.
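The prompt layout described in the debugging bullets (theory LaTeX first, then the code, then a short instruction) can be sketched as a small helper. This is only an illustration: the function name `build_debug_prompt`, its parameters, and the section labels are hypothetical and not part of any ChatGPT tooling; the result is plain text to paste into the textbox.

```python
def build_debug_prompt(code: str, theory_latex: str = "", instruction: str = "debug this") -> str:
    """Assemble a prompt: optional theory section, then the code, then the instruction."""
    parts = []
    if theory_latex.strip():
        # Context goes first, so the model has read the math before it sees the code.
        parts.append("Theory section of the manuscript (LaTeX):\n" + theory_latex.strip())
    parts.append("Code:\n" + code.strip())
    # The instruction comes last, after the pasted code.
    parts.append(instruction.strip())
    return "\n\n".join(parts)


# Hypothetical inputs, for illustration only.
prompt = build_debug_prompt(
    code="result = total / count",
    theory_latex=r"\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i",
    instruction="debug this; refer to the math in the paper in your answer",
)
print(prompt)
```
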

`ggplot`

Drafting plots
- Do this by (a) describing the data and (b) describing the plot. It is worth taking the extra time to be precise, as if you were asking another human being on, e.g., StackOverflow.
Explaining how to add features you are unfamiliar with
- For example, you can ask "how do I place an arrow inside of a ggplot bar plot?"
- Alternatively, you can paste code in and write "add an arrow to this plot"

Issues with Git

Explaining and understanding GitHub messaging
- Prompt: "how do I do xxx in git?" or "what does xxx mean in git?" or "how do I fix xxx error in git?"
  - Example here for error

LaTeX tables

Translating tables on paper to a passable LaTeX table
- Performs poorly at the final polishing step

Brainstorming

Proposing explanations for economic phenomena
- GPT performs better than expected: example here

Answering basic questions

Programming questions
- How to translate log changes into percent changes (here)
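For checking an answer to the log-change question above: a log change of d corresponds exactly to a percent change of 100 * (exp(d) - 1), and for small d this is close to 100 * d. A minimal sketch, with hypothetical function names:

```python
import math


def log_change_to_pct(dlog: float) -> float:
    """Exact percent change implied by a log change: 100 * (exp(dlog) - 1)."""
    # expm1(x) computes exp(x) - 1 accurately even when x is close to zero.
    return 100.0 * math.expm1(dlog)


def pct_to_log_change(pct: float) -> float:
    """Inverse mapping: log change implied by a percent change."""
    # log1p(x) computes log(1 + x); requires pct > -100.
    return math.log1p(pct / 100.0)


print(round(log_change_to_pct(0.10), 2))  # about 10.52, versus 10 from the 100 * d approximation
print(round(pct_to_log_change(10.0), 4))  # about 0.0953
```
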
Optimizing code for speed (fine)
- Pasting in code and prompting GPT to "make this run faster"
- Strong at data.table optimization, which will increase speed a lot for slow code
Functional questions
- Example here about understanding dentistry: this was a much more useful/organized response than Google gave
Economics questions
- This is relevant for a referee report or understanding something we are doing. For example, here is a prompt about understanding the difference between a notch and a kink in bunching estimators. Be careful here because it might hallucinate, but in this case it does a very nice job.
For all other and more complicated questions, you will have more success searching on Google and looking for StackOverflow results.

Cross Referencing Studies

see here

Works poorly

Explaining econometrics

GPT can explain a basic difference-in-differences model, but it struggles with anything more complex and often makes mistakes.

Coding in SQL Athena, Pyspark

GPT returns code that does not run
- PN addendum: I have found that it is quite helpful in explaining and remediating errors from PySpark code, though.

Understanding data structure

GPT cannot compute something with lags and first differences in a simple way
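As a reference point when checking GPT's output on this kind of task, here is a minimal sketch of within-group lags and first differences in pandas. The toy panel and the column names (`firm`, `year`, `sales`) are hypothetical, and the sketch assumes each firm has one row per consecutive year with no gaps; with gaps, `shift` and `diff` would compare non-adjacent years.

```python
import pandas as pd

# Hypothetical toy panel: two firms observed over three years.
df = pd.DataFrame({
    "firm": ["a", "a", "a", "b", "b", "b"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "sales": [10.0, 12.0, 15.0, 100.0, 90.0, 99.0],
})

# Sort so that shift/diff run in time order within each firm.
df = df.sort_values(["firm", "year"]).reset_index(drop=True)

# One-period lag of sales within firm (missing in each firm's first year).
df["sales_lag1"] = df.groupby("firm")["sales"].shift(1)

# First difference within firm: sales in year t minus sales in year t-1.
df["d_sales"] = df.groupby("firm")["sales"].diff()

print(df)
```

Grouping before `shift` and `diff` is what keeps firm b's first year from being compared with firm a's last year.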

Debugging inside Chase VDI

The environment is proprietary so GPT is unable to answer basic questions about where things are and how things work

Translating from one coding language to another

Often GPT will say the file is too long or that it needs more information because the code types do not easily map

Answering involved coding questions

GPT struggles to answer longer questions involving larger segments of code, and it's very hit-or-miss whether it produces correct output.
If it produces a long block of code on its own, you must check every single line. It has a high error rate on long blocks of code.

Data documentation

Can't answer questions about IPUMS or other public data sources

Plot digitization

ChatGPT is not able to look at an image of a plot and give you the numbers that correspond to the data points.