Project ideas - camfort/camfort GitHub Wiki

This page lists project ideas for undergraduate and masters students. For University of Cambridge students please contact either Mistral Contrastin or Dr Andrew Rice; for University of Kent students please contact Dominic Orchard. If you are not a Cambridge or Kent student, but interested in extending CamFort with any of these ideas please open an issue and we can discuss further.

Undergraduate project ideas

These projects are interesting ideas that have some precedent one way or another. They are selected mainly for ease of evaluation.

Index data files

At the moment CamFort works on a whole code base at a time. This is plainly not scalable to larger code bases. What would be better would be to process one source file at a time. However, there are dependencies between files e.g. imports through modules. It would be better to parse one file at a time and output the results of the analysis to a file on disk. So you build one of these metadata files for each on your analyse and when the analysis process needs information from other source files it can pick it out of the metadata. the project would be to design the binary format for the metadata file so as to optimise performance, you'd need various indexes etc. and you'd need to think about representation formats. Then you could evaluate it by measuring performance of different layout tradeoffs that you made.

Code base exploration

Use the analyses in CamFort to build a code base exploration tool that lets developers see how variables are used and find dependencies etc. You would be thinking of ways to surface the results of the various analysis passes. I'm thinking something like the kind of information that Eclipse can give for java. This would probably be evaluated with a user study.

Preprocessor parsing

We have some code in our corpus which uses the C preprocessor. The project would be to extend CamFort to be able to to parsing and analysis in the presence of preprocessor statements. Lots of preprocessor use should be relatively easy to parse but sometimes people do odd things like

#define BEGIN if (...) {

then the preprocessor fragments are not welformed trees. You could start with just detecting and ignoring these cases and then progress on to dealing with them. There's other stuff here too like automatically determining valid configuration combinations for #defined constants etc. one way to evaluate this would be to use our corpus and do a study of how much of the preprocessor you need to implement to parse some percentage of the corpus.

Masters level (e.g., Part III/ACS or MSc)

These are the projects better suited for research, as they more open-ended.

Classification of annotation based verification techniques

Verification through annotations is a broad generalisation of the subject. Annotations widely differ from each other in their expressive power, terseness, and scope. CamFort for example aims to use lightweight specifications meaning one-liner specifications qualifying some limited property of the source code. However, this is a loose definition and there is not a metric for ranking various annotations. This project would aim to study various methods of verification, compare, and contrast them to understand tradeoffs involved in each method and investigate possible useful metrics for various annotation schemes. Units of measure verification [^0] in CamFort is a good example for lightweight verifications, while Sylvie Boldo's verification on floating point averages [^1] is an example to a comprehensive and aggressive one.

[^0] Dominic A. Orchard, Andrew C. Rice, Oleg Oshmyan. Evolving Fortran types with inferred units-of-measure. J. Comput. Science 9: 156-162 (2015)

[^1] Sylvie Boldo. Formal Verification of Programs Computing the Floating-Point Average. In Michael Butler, Sylvain Conchon, and Fatiha Zaïdi, editors, 17th International Conference on Formal Engineering Methods, volume 9407 of Lecture Notes in Computer Science, pages 17-32, Paris, France, November 2015. Springer International Publishing.