Future Work

Integrate Knowledge Into Squeak

The current workflow has the issue that the gained insights remain here in the wiki and do not benefit Squeak or its users directly. This problem can be solved in multiple ways. Possible integrations into existing systems are listed below; for new tooling ideas, have a look at the respective wiki page.

Class Comments

The first idea would be to utilize the documentation features that Squeak already has, namely class comments. We could trace a given test suite and automatically extract information like which classes are close to each other (#27) or which interface methods are used most. All these insights could then be molded into an auto-generated section of the respective class comments.

For the MorphExtension class it could look like this:

MorphExtension provides access to extra instance state that is not required in most simple morphs.
This allows simple morphs to remain relatively lightweight while still admitting more complex structures as necessary.
The otherProperties field takes this policy to the extreme of allowing any number of additional named attributes, albeit at a certain cost in speed and space.

--- start hidden-modularity section ---

According to these scenarios:
- HMScenarioManager>>windowLayouting
- HMScenarioManager>>objectExplorerOpening
- HMScenarioManager>>morphOpening
- HMScenarioManager>>codeBrowserOpening
- HMScenarioManager>>codeBrowserClosingWithClickEvent

The mainly used interface of the MorphExtension is (in decreasing order):
- accessing
- initialization

Classes that often use the MorphExtension are (in decreasing order):
- RectangleMorph
- ScrollBar
- PluggableButtonMorphPlus
- PluggableSystemWindow
- PluggableListMorphPlus
- ...

Classes that are often used by the MorphExtension are (in decreasing order):
- IdentityDictionary

--- end hidden-modularity section ---
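
A minimal sketch of how such a section could be generated is shown below. It assumes the aggregated trace data is already at hand; the scenario names and the interface categories are placeholders taken from the example above, and a real generator would also have to strip a previously generated section before appending a new one.

```smalltalk
"Append a placeholder hidden-modularity section to MorphExtension's class comment.
In the real workflow, the scenario names and lists would come from the trace data."
| class stream |
class := MorphExtension.
stream := WriteStream on: String new.
stream
	nextPutAll: class comment asString; cr; cr;
	nextPutAll: '--- start hidden-modularity section ---'; cr; cr;
	nextPutAll: 'According to these scenarios:'; cr.
#('HMScenarioManager>>windowLayouting' 'HMScenarioManager>>morphOpening')
	do: [:each | stream nextPutAll: '- ', each; cr].
stream
	cr;
	nextPutAll: 'The mainly used interface of the ', class name, ' is (in decreasing order):'; cr.
#('accessing' 'initialization')
	do: [:each | stream nextPutAll: '- ', each; cr].
stream cr; nextPutAll: '--- end hidden-modularity section ---'.
class comment: stream contents
```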

Improve the Auto-Completion

The knowledge of who often sent which message to whom can also be used to improve the auto-completion.

For this, we could trace the test suites of each package. If written well, these give a good indication of all the functionality a package offers. We would then construct a graph whose nodes are classes and whose edges are the messages sent between them. The edges would be weighted by how many times the message was sent or received. The graph could then be used to give better suggestions about what is most likely to be used next.

For example, if a morph is already typed into the workspace, we would look at all outgoing edges of the Morph class, rank them by edge weight, and display the first n in the auto-completion.
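
A minimal sketch of this ranking, assuming a hypothetical `trace` variable holding trace records that answer #receiverClass and #selector (the tracer and its record protocol are not part of Squeak; they stand in for the output of our tracing):

```smalltalk
"Build weighted edges as a Bag of selectors per receiver class,
then rank the selectors that were sent to Morph instances most often."
| edges ranked |
edges := Dictionary new.
trace do: [:record |
	(edges at: record receiverClass ifAbsentPut: [Bag new])
		add: record selector].
ranked := (edges at: Morph ifAbsent: [Bag new]) sortedCounts
	collect: [:countToSelector | countToSelector value].
ranked first: (5 min: ranked size)
```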

Improve the Dependency Browser

The knowledge of who often sent which message to whom can also be used to improve the dependency browser.

As with the auto-completion, we would trace each test suite. The graph, however, would only record which class sent any message to which other class; which message it was and how often it was sent is irrelevant here.

The outgoing edges of each node would then be added to the dependencies of the node's class.
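
A minimal sketch, again assuming a hypothetical `trace` of records, this time answering #senderClass and #receiverClass:

```smalltalk
"Collect, per class, the set of classes it sent messages to during the trace.
These sets could then be merged into what the dependency browser lists."
| dependencies |
dependencies := Dictionary new.
trace do: [:record |
	(dependencies at: record senderClass ifAbsentPut: [Set new])
		add: record receiverClass].
dependencies explore
```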

Pull the Visualization Into Squeak

As mentioned on the workflow page, there are some system borders that we have to cross during the workflow. Especially the border between Squeak, where we trace the dynamic behavior, and d3, where we display the data, has major drawbacks:

  1. It lengthens the feedback loop
  2. It disconnects the visualization from the data it is based on

The first one is mainly an inconvenience. Sure, it makes it harder to work on the visualization, but in the end we can deal with it. The second one is where we actually run into problems. When creating the graphs, we need to aggregate at some level, and the nature of aggregation is that we actively remove detailed data. However, this data is useful when we want to look at a sub-structure of the large graph, yet it is simply no longer present in the visualization.

Bringing the visualization back into Squeak would allow us to store, for each aggregation, which data it is based on and thus still have the objects at hand, even though they appear to be aggregated away. The objects could then be explored further by utilizing the full power of Squeak as a live programming system.
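
A minimal sketch of such an aggregation node; HMAggregateNode and its protocol are hypothetical and only illustrate the idea of keeping the underlying objects around:

```smalltalk
Object subclass: #HMAggregateNode
	instanceVariableNames: 'label underlyingObjects'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'HiddenModularity-Sketches'

HMAggregateNode >> initialize
	super initialize.
	underlyingObjects := OrderedCollection new

HMAggregateNode >> label: aString
	label := aString

HMAggregateNode >> addUnderlyingObject: anObject
	"Remember one of the traced objects this node aggregates over."
	underlyingObjects add: anObject

HMAggregateNode >> exploreUnderlyingObjects
	"Drill down from the aggregated view into the objects it was built from."
	^ underlyingObjects explore
```

The drawing code would only render the label, while a click on the node could send #exploreUnderlyingObjects and open a regular object explorer on the aggregated objects.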

Use Different Insight Gaining Methods on Dynamic Behavior

During this project, we focused on the communication structure within a trace. We could also use different approaches to get further insights out of the dynamic behavior. Ideas include:

Statistical Approaches

The idea here is to extract key indicators of the system from the traces.

For example, we could calculate for each class how large the proportion of each part of the interface (the method categories) is compared to the interface as a whole. This could be used to make statements such as "The main use case of this class is to compare strings".
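
One possible reading of this metric, weighting each method category by how often its selectors were called during a trace: the `tracedSelectors` Bag is a hypothetical input that would come from the tracer, and only selectors defined directly in the class are counted.

```smalltalk
"Compute, per method category of MorphExtension, its share of the traced calls."
| class total shares |
class := MorphExtension.
total := tracedSelectors size max: 1.
shares := Dictionary new.
tracedSelectors asSet do: [:selector |
	(class organization categoryOfElement: selector) ifNotNil: [:category |
		shares
			at: category
			put: (shares at: category ifAbsent: [0])
				+ ((tracedSelectors occurrencesOf: selector) / total)]].
shares explore
```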

Another metric could be the size of the resulting traces. This would say less about Squeak and more about the specific scenarios: particularly large traces can (but do not have to) indicate a more complex scenario.

Utilizing the Chronology

At the moment, we only look at the communication as a whole. It might also be interesting to look at a trace within certain time slices and compare these slices with each other (#39).
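
A minimal sketch, assuming a hypothetical `trace` that is an ordered collection of records answering #selector. It slices by event count; time-based slicing would instead compare a timestamp stored on each record:

```smalltalk
"Split the trace into ten slices and tally the sent selectors per slice,
so the slices can be compared with each other."
| sliceCount sliceSize slices |
sliceCount := 10.
sliceSize := (trace size / sliceCount) ceiling max: 1.
slices := OrderedCollection new.
1 to: trace size by: sliceSize do: [:start |
	| bag |
	bag := Bag new.
	(trace copyFrom: start to: (start + sliceSize - 1 min: trace size))
		do: [:record | bag add: record selector].
	slices add: bag].
slices explore
```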

Use Different (Semi-Structured or Structured) Data

Ultimately, the goal of this project was to discover structures that cross-cut the static structure (like the inheritance tree). Using dynamic behavior is only one way to do this. In the future, other data sources could be used as well.

Authorship

Squeak has an authorship system in which at least an abbreviation, if not the full name, of the author is stored with each code artifact. Thus, a metric could be "Which code artifacts were produced by the same author?".

The assumption here is that the same author has certain interests and would mainly work on these topics. Of course, it does not cover one author working on multiple projects, but it could be an easy-to-implement starting point to get at least a rough idea.
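
A minimal sketch that groups the methods of a single class by the author initials stored in their method time stamps (Morph is just an example):

```smalltalk
"Group Morph's methods by the author initials in their method time stamps."
| byAuthor |
byAuthor := Dictionary new.
Morph selectors do: [:selector |
	| stamp initials |
	stamp := (Morph compiledMethodAt: selector) timeStamp.
	initials := stamp isEmptyOrNil
		ifTrue: ['unknown']
		ifFalse: [stamp substrings first].
	(byAuthor at: initials ifAbsentPut: [OrderedCollection new]) add: selector].
byAuthor explore
```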

Git History

To get a finer granularity, we can enrich the authorship information with the Git history. The idea here is that code artifacts that were created or changed in the same commit are related to each other.

This metric could also be extended. For example, we could look at windows of n commits or at windows spanning a certain difference in timestamps.
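
A minimal sketch of the co-change idea. Parsing the Git history itself is out of scope here; the sketch assumes the history was already turned into a collection of commits, each being a collection of changed artifact names (the literal commits below are placeholders):

```smalltalk
"Tally how often each pair of artifacts was changed in the same commit."
| commits coChanges |
commits := #(
	#('Morph' 'MorphExtension')
	#('Morph' 'ScrollBar' 'MorphExtension')).
coChanges := Bag new.
commits do: [:changedArtifacts |
	| artifacts |
	artifacts := changedArtifacts asSet asSortedCollection asArray.
	1 to: artifacts size do: [:i |
		i + 1 to: artifacts size do: [:j |
			coChanges add: (artifacts at: i), ' & ', (artifacts at: j)]]].
coChanges sortedCounts explore
```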

Tokenized Comments

Most classes and some methods are enriched with comments. These can be used as a metric, too. Before bringing in heavy artillery like NLP, we could start with simple tokenization: split the comments at word boundaries and create a histogram of all the words in each comment. Histograms that look similar could indicate a similarity between the underlying classes or methods.
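
A minimal sketch of the tokenization, using two existing class comments and a very crude similarity measure (the number of shared words):

```smalltalk
"Tokenize two class comments into word histograms and count the shared words."
| tokenize morphWords extensionWords sharedWords |
tokenize := [:aClass |
	| bag |
	bag := Bag new.
	aClass comment asString asLowercase substrings do: [:word | bag add: word].
	bag].
morphWords := tokenize value: Morph.
extensionWords := tokenize value: MorphExtension.
sharedWords := morphWords asSet count: [:word | extensionWords includes: word].
Transcript show: 'shared words: ', sharedWords printString; cr
```

A real implementation would also strip punctuation and compare the full histograms, for example with a cosine similarity, instead of just counting shared words.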