Questions ESM2
- Proteins are the building blocks of life. Each protein is a chain of amino acids, of which there are 20 standard ones. A central problem of biology is to take this amino-acid sequence and predict the 3D shape of the corresponding protein. This paper attacks the problem with NLP. Can you describe how they model it as an NLP task? (Which method, from a paper presented in class, do they use? What are the input and output? What metric is used?)
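For intuition, here is a minimal sketch (not from the paper) of one way a protein sequence can be framed as a language-modelling example, in a BERT-style masked-token setup. The 15% mask rate and the `<mask>` token are generic MLM conventions assumed here, not necessarily ESM2's exact recipe:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def make_mlm_example(sequence, mask_rate=0.15, mask_token="<mask>"):
    """Mask a fraction of residues; the model must recover the originals."""
    assert all(aa in AMINO_ACIDS for aa in sequence), "non-standard residue"
    tokens, targets = [], []
    for residue in sequence:
        if random.random() < mask_rate:
            tokens.append(mask_token)  # what the model sees at this position
            targets.append(residue)    # what it should predict (1 of 20 classes)
        else:
            tokens.append(residue)
            targets.append(None)       # no loss at unmasked positions
    return tokens, targets

tokens, targets = make_mlm_example("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(tokens)   # e.g. ['M', 'K', '<mask>', 'A', ...]
print(targets)  # e.g. [None, None, 'T', None, ...]
```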
- "Attention Is All You Need" gave rise to three commonly used Transformer architectures: encoder-only, decoder-only, and encoder-decoder. Protein language models, including the one in this paper, tend to use only one of those three. Which one? Any guess why? Any ideas when the different approaches might be useful (in protein language models)?
- Even though the domain is completely different from NLP, the problem shares an important assumption with NLP. What is it? (You can find multiple parallels :))
- In the context of protein structure prediction, the metric of choice tends to be RMSD (root mean square deviation). To compute it, you compare each predicted point in the structure to its true position (determined experimentally, through X-ray crystallography and the like) and report the root mean square of those distances (a toy computation is sketched below the hints). In Figure 1F, they report this metric, hoping to demonstrate a property of LLMs. Can you name the property? Did the authors convince you?
- HINT (the same thing, from a different context): [image]
- HINT2 (please don't look unless desperate): https://arxiv.org/pdf/2206.07682.pdf
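A minimal NumPy sketch of the RMSD computation described above, assuming the predicted and experimental coordinates are already superimposed (real evaluations first align the two structures, e.g. with the Kabsch algorithm); the coordinates below are made up just to show the call:

```python
import numpy as np

def rmsd(predicted, true):
    """Root mean square deviation between two (N, 3) coordinate arrays."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    squared_dists = np.sum((predicted - true) ** 2, axis=1)  # per-atom ||p - t||^2
    return np.sqrt(squared_dists.mean())  # root of the mean *squared* distance

# Hypothetical coordinates, two atoms each.
pred = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
true = [[0.1, 0.0, 0.0], [1.4, 0.2, 0.0]]
print(rmsd(pred, true))  # small value => prediction close to experiment
```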
BONUS: If you read into the details, you will note that the model doesn't predict the structure itself. Rather, a separate module takes the output of the language model and computes the 3D structure. Do you think an end-to-end approach, where a language model predicts the structure directly, will be feasible in the near future? Why, or why not?