Auto conversion investigation - soartech/jsoar GitHub Wiki

As part of the SoarTech jSoar IRAD effort, we're looking into possible automation support for parts of the process. Bob Bechtel suggested using a source-to-source transformation tool (like TXL) to implement translations that would take program structure into account, at least when compared to simpler text-pattern based find-and-replace (e.g., sed). The discussion was broader than this, considering possibilities like a library of conversions, user interface issues like highlighting candidate code snippets for conversion, learning possible conversions by examining already translated code sections, and introducing comments into the code to describe what had been done or to make suggestions about what manual editing might be possible or necessary.

Following the initial conversation, Bob Marinier had the following suggestions:

  • Gather a list of code changes that are typically required. Possible sources include looking at code that has already been ported, perform knowledge acquisition sessions with people who have experience porting code
    • Recent code that has been touched includes
      • epmem (not finished),
      • cdps stuff (see cdps port notes.txt),
      • wma stuff (see wma porting notes.txt) and
      • chunking cleanup (related to the cdps stuff, see chunking cleanup.txt)
    • Note that there are at least two classes of porting:
      1. porting entire files (e.g., the wma and epmem stuff) and
      2. porting updates to individual functions (e.g., the cdps / chunking stuff). Of course, even when porting entire files, there are still tendrils from those files that invade other existing code, so there is overlap between these.
  • When applying transformations to the code, keeping a list of annotations about why each change was made/suggested can help the user understand what the tool did
  • Ideally, any solution will be modular so that translation units can easily be added / removed
  • In most cases, actually translating the code will be too hard, but the tool can still insert comments about the kinds of code changes that might be required. For this to be useful, the suggestions have to be pretty targeted (if it always suggests doing lots of different things, the user will end up ignoring the suggestions)

Bob Marinier also provided a list of "porting patterns" based on his experience:

  • -> becomes . (find/replace works well for this)
  • agent param is passed into most csoar functions, but is usually unnecessary in jsoar
  • when agent is necessary, use Adaptable interface instead (see Adaptable notes, currently on basecamp site)
  • jsoar tries to group related csoar functions into classes and make as much stuff private or package private as possible (this is one of the reasons the agent param is usually not necessary -- the stuff from the agent struct gets moved into the class, and thus the members already have access to them)
  • make params final if possible
  • csoar tends to define all vars at top of func. Jsoar prefers to define them closer to where they are used
  • csoar tends to reuse the same var many times (e.g., in each loop, use the same counter). jsoar tends to define new vars with limited scope (e.g., a different counter for each loop that only exists for that loop)
  • csoar often uses for loops to iterate over all items in some collection. In jsoar, use a foreach-style loop, e.g., for(Wme w : myWmes)
  • csoar often uses char arrays for strings; in jsoar use String objects
  • c++ map is usually a java hashmap, but sometimes it's a treemap (if the order matters)
  • jsoar prefers to declare variables with abstract types (e.g., List) but sometimes members are needed from more concrete classes
    • e.g., when using a List, keep in mind that add is an O(n) operation (it adds it to the end) while push is an O(1) operation (it adds it to the front). But push is only defined for LinkedList, so you need to use the more concrete type to use push.
      • this specific case has actually been fixed in java 7, but for now we're still targeting java 6
  • in jsoar, reference counts have been eliminated from Symbol and Wme, but still exist for other classes, like Preference
  • most intrusive lists from csoar have been maintained in jsoar (it turns out they really are faster)
  • sometimes csoar passes objects by reference; in jsoar, use the ByRef<> wrapper class if really need pass by reference semantics
  • jsoar usually maintains csoar variable/class/structure names, and always annotates (in comments) where functions came from (e.g., wma.cpp:207:wma_init)
  • jsoar does not maintain csoar typedefs; basic types are not renamed/wrapped, and structs/classes use whichever name is best (or possibly a new, java-style name for the major components)
  • most stuff from a single file in csoar will go into the same class or family of classes, but sometimes there will be a function that is primarily operating on some other structure. This often becomes obvious when you try to port the function and find out that it wants access to a bunch of private stuff in some other class. In these cases, that function probably belongs in the other class. (Of course, this is a grey area; if it only needs access to one member, it might make sense to make that member public or add a getter. And if it needs access to a bunch of private stuff from more than one class, then more thought is required.)

Chris Kawatsu had some specific examples:

  • Translate C++ class field to java:

    (*w_p)->value->id.epmem_valid

    to

    wme.value.asIdentifier().epmem_valid

    There is no id field in wme.value in Jsoar. The value field has to be cast to Identifier type using asIdentifier(). A tool that remembers (wme type object)->value.id.epmem_valid has been mapped to (Java WME object).value.asIdentifier().epmem_valid would be very useful. (There may be multiple choices, the tool should provide the user with a list of substitutions they have used in the past.)

  • Another class field example:

    my_id_repo =& (*(*my_agent->epmem_id_repository)[ parent_id ])[ new_id_reservation->my_hash ];

    to

    my_id_repo = epmem_id_repository.get(parent_id).get(new_id_reservation.my_hash);

    In this case epmem_id_repository is a class member in Java so the my_agent is dropped. The epmem_id_repository is a HashMap in Java instead of a Vector in C++.

  • If statements example:

    if ( !(*w_p)->value->id.smem_lti )

    C++ allows casting long to boolean while Java does not, so in Java this becomes:

    if ( !(wmeValueId.smem_lti != 0) )

    or

    if (wmeValueId.smem_lti == 0)

    Its easy to make a mistake and have the wrong number of negations.

Additional notes gathered during the process of porting Semantic and Episodic Memory:

  • When making small inner classes to represent things like pairs, they should almost always be static inner classes. Non static classes keep a reference to an instance of the outer class, which can result in odd leaks or errors.
  • Keep an eye out for primitives passed as references. This looks like: void foobar(int& count)
    Jsoar includes a ByRef (org.jsoar.util.ByRef) class to help handle this pattern.
  • For handling C compile time flags that include or omit logging, check the Logger.isXEnabled() functions.
  • In some cases, it is less efficient to add a field to a data structure than to maintain a map from the struct to the value. This is particularly true if the field is often null, or it is only used in a one class.

The Adaptables Pattern:

Credit: Bob Marinier

In JSoar, the Agent class holds various objects that need to work together, e.g., RecognitionMemory, Decider, Trace, EpisodicMemory, etc. (This is analogous to CSoar's use of the agent structure.)

In CSoar, in order to give various functions access to these objects, a pointer to the agent structure is passed in. This makes it very difficult to test components in isolation, since you need an entire agent to call most interesting functions.

JSoar works around this limitation using the Adaptable pattern. Instead of passing around an Agent object, an Adaptable object is passed around (it is usually called context). Static methods in the Adaptables class can be called to extract whatever is wanted from the Adaptable object. For example, if you want a Trace object, you do this:

Trace trace = Adaptables.adapt(context, Trace.class);

The Agent class extends the AbstractAdaptable class, which implements the Adaptable interface. Thus, you can pass in a full Agent instance wherever an Adaptable is needed. However, for unit testing purposes, you can also construct a minimal Adaptable instance that only contains the parts needed by whatever you are testing. Thus, you avoid needing a full agent for testing. For example, the DefaultSemanticMemory unit tests construct an Adaptable that just contains a SymbolFactoryImpl and PropertyManager like this:

context = AdaptableContainer.from(new SymbolFactoryImpl(), new PropertyManager());

(where context is an AdaptableContainer, which extends AbstractAdaptable just like the Agent class).

Internally, when you call Adaptables.adapt, it is basically looping over all of the registered classes to find the one you want, and thus takes time linear in the number of registered classes. Thus, you will sometimes see classes in JSoar that cache the classes they care about in local members (e.g., they do the lookup in their constructors). However, note that the number of classes is small, and so even without caching, you are unlikely to see any noticeable performance hit (and thus you also see lots of places that don't bother caching).

Finally, in addition to Adaptables.adapt, there is also Adaptables.require. The difference is that adapt will return a null pointer if the class you ask for doesn't exist in the Adaptable, whereas require will throw an exception. It seems that if the class you want really is required, then you should call require instead of adapt. But there seems to have been some sloppiness here, so that most places call adapt instead (I'm guilty of this, too). So anyway, try to use the right one.

⚠️ **GitHub.com Fallback** ⚠️