Writing a CalcJobImporter plugin

The CalcJobImporter class is designed to allow a calculation job that was completed outside of AiiDA to be imported into an existing AiiDA database. The problem and the design of the solution are described in AEP 004; please refer to that AEP for more detailed information. This document gives concrete instructions on how to implement the CalcJobImporter for a particular CalcJob plugin.

Quick overview

The process of importing a completed calculation job works as follows. An implementation of the CalcJobImporter (which is specific to a corresponding CalcJob plugin) parses the files of the completed job into a dictionary of nodes that, when used as inputs to run an actual CalcJob, would result in more or less the same input files. These parsed inputs, together with a RemoteData node pointing to the folder of the completed job, are then passed to the engine just as they would be for a normal job. The presence of the folder with the completed results causes the engine to recognize that this is an import job, and it imports the job into the provenance graph instead of running it. In code (taking the ArithmeticAddCalculation plugin as an example), this looks like the following:

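from aiida.engine import run
from aiida.orm import RemoteData, load_computer
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')

computer = load_computer('localhost')
remote_data = RemoteData('/absolute/path/to/job', computer=computer)

# Parsing the inputs is the functionality that the specific calculation job plugin has to provide
inputs = ArithmeticAddCalculation.get_importer().parse_remote_data(remote_data)
inputs['remote_folder'] = remote_data
results, node = run.get_node(ArithmeticAddCalculation, **inputs)

assert node.get_attribute('imported')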

The ArithmeticAddCalculationImporter class takes the remote_data argument, which points to the input and output files of the completed job, and parses the input files to reconstruct the AiiDA input nodes that the ArithmeticAddCalculation plugin would have taken.

Implementing the CalcJobImporter

To provide an import mechanism for a CalcJob plugin, one should subclass the CalcJobImporter class, which can be imported from the aiida.engine module. The only method that has to be implemented is parse_remote_data, whose signature looks as follows:

from abc import ABC, abstractmethod
from typing import Dict, Union

from aiida.orm import Node, RemoteData


class CalcJobImporter(ABC):

    @staticmethod
    @abstractmethod
    def parse_remote_data(remote_data: RemoteData, **kwargs) -> Dict[str, Union[Node, Dict]]:
        """Parse the input nodes from the files in the provided ``RemoteData``.

        :param remote_data: the remote data node containing the raw input files.
        :param kwargs: additional keyword arguments to control the parsing process.
        :returns: a dictionary with the parsed input nodes that match the input spec of the associated ``CalcJob``.
        """

In essence, this method has to do the exact inverse of what CalcJob.prepare_for_submission does. The latter takes a set of AiiDA input nodes and transforms them into input files. The CalcJobImporter.parse_remote_data method has to parse these input files and reconstruct the corresponding input nodes. Note that the input files of a job that is to be imported were not actually written by the AiiDA plugin, so a perfect inversion may not always be possible, but the importer should reconstruct the input nodes from the input files as well as possible. For an example, have a look at the implementation of the ArithmeticAddCalculationImporter included within aiida-core.
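The inverse relationship can be illustrated with a minimal sketch that uses only the standard library. It assumes, purely for illustration, that the input file consists of a single line of the form echo $((x + y)). The function names write_input and parse_input are hypothetical stand-ins for prepare_for_submission and parse_remote_data, and plain integers stand in for the AiiDA nodes that a real importer would return.

```python
import re

# Assumed input-file format (illustrative): one line such as "echo $((3 + 4))".
INPUT_TEMPLATE = 'echo $(({x} + {y}))\n'
INPUT_PATTERN = re.compile(r'echo \$\(\((\d+) \+ (\d+)\)\)')


def write_input(x: int, y: int) -> str:
    """Hypothetical stand-in for ``prepare_for_submission``: inputs -> file content."""
    return INPUT_TEMPLATE.format(x=x, y=y)


def parse_input(content: str) -> dict:
    """Hypothetical stand-in for ``parse_remote_data``: file content -> inputs."""
    match = INPUT_PATTERN.match(content.strip())
    if match is None:
        # Files written by hand may deviate from the expected format.
        raise ValueError(f'unable to parse input file content: {content!r}')
    return {'x': int(match.group(1)), 'y': int(match.group(2))}


# Round trip: parsing what was written recovers the original inputs.
assert parse_input(write_input(3, 4)) == {'x': 3, 'y': 4}
```

Raising an explicit error when the content does not match keeps a failed import from silently producing incorrect input nodes, which matters because imported files were not necessarily written by the plugin.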

Registering the importer

As with other AiiDA plugins, an importer can be registered with an entry point. Specifically, importers should be registered in the aiida.calculations.importers entry point group, for example:

"aiida.calculations.importers": [
    "core.arithmetic.add = aiida.calculations.importers.arithmetic.add:ArithmeticAddCalculationImporter"
]

It is recommended to give the importer exactly the same entry point name as the corresponding CalcJob plugin. In the example above, the ArithmeticAddCalculation calculation job plugin has the entry point name core.arithmetic.add, so the same name was used for the importer plugin. This should be the choice for the "main" importer implementation; it is perfectly possible to implement multiple importers for a single calculation job plugin and register the others under different names. The advantage of using the same entry point name is that it makes it easy to retrieve the importer from the calculation job plugin, e.g.:

importer = ArithmeticAddCalculation.get_importer()

If an importer is registered with the same entry point name, the method will automatically find it and return it. Alternatively, the entry point name can be passed directly:

importer = ArithmeticAddCalculation.get_importer('core.arithmetic.add.custom')

Of course, get_importer is a simple wrapper around the CalcJobImporterFactory and one can always use that directly:

from aiida.plugins import CalcJobImporterFactory
importer = CalcJobImporterFactory('core.arithmetic.add')
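After installing the package that provides the importer, one way to check that the registration took effect is to list the entry point group with the standard library. The following is a sketch and not an AiiDA API: list_importers is a hypothetical helper, and it only reports the names found in the aiida.calculations.importers group of the current Python environment.

```python
from importlib.metadata import entry_points


def list_importers(group: str = 'aiida.calculations.importers') -> list:
    """Return the sorted entry point names registered in ``group`` (hypothetical helper)."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, 'select'):
        # Python 3.10 and later: selectable entry points interface.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9: a plain dictionary keyed by group name.
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


print(list_importers())
```

If the importer was registered under the same name as the calculation job plugin, that name should appear in the printed list; an empty list suggests that the package has not been (re)installed since the entry point was added.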