Roles of Maven, Eclipse and Git in PDQ - ProofDrivenQuerying/pdq GitHub Wiki

IDEs such as Eclipse or IntelliJ

Eclipse and IntelliJ are IDEs (integrated development environments): advanced text editors that give you many features that make developing software easier, such as:

  1. Syntax highlighting
  2. Static analysis (to catch bugs in your code as you are writing it)
  3. Tool integrations such as for Git and Maven (see below)
  4. Advanced refactoring
  5. Many other plugins

Note that nothing in PDQ depends on an IDE. You can code perfectly well in your favourite text editor.

Importing the projects to Eclipse

The following instructions explain how to import PDQ into Eclipse. The process is very similar for other IDEs such as IntelliJ.

From the Eclipse menu bar, open File > Import. Then, in the popup window, choose Git > Projects from Git.

From the following window, we can either import from a local repository or directly from GitHub. Here we describe the former.

Choose Existing local repository, then Add. Now you get to select where your repository is stored locally. Tick the box corresponding to /path/to/git/folder/pdq/.git, press Finish, then select pdq.

The next window lets you decide which level of the project hierarchy you want to import into Eclipse. Choose the top level (Working tree...), and Import as general project.

The next window will ask how you want to name the project in Eclipse. Leave it as pdq and press Finish.

Now go back to the Eclipse menu bar's File > Import, and choose Maven > Existing Maven Projects.

In the next window, set the Root directory to /path/to/git/folder/pdq, then check all PDQ sub-projects.

Voila. You are done.

Git

Git is a version control system that tracks changes in the source code of PDQ. It is perfectly possible to download PDQ as a zip file and use it without ever interacting with Git. However, Git is required for developers as it is the mechanism by which new code gets pushed to the GitHub repository.

Making changes on branches

If you are using Git, you never need to create additional "copies" of the project anywhere. You can create and switch branches either from Eclipse or from the command line. Even if you switch branches from the command line (i.e. outside Eclipse), Eclipse will recognise the change and update the branch display next to each project (although this often takes a few seconds).

To switch branches from Eclipse, right-click on any of the projects and choose the branch you want to switch to. This switches the branch for all the projects at the same time, even though you right-clicked on a single project.

Maven

Maven is a build automation tool for Java projects and one of its main features is dependency management. Again, it is entirely possible to use PDQ without Maven if you are willing to manage the dependencies yourself. If you use an IDE you will likely get all the benefits of using Maven without realising it: the IDE will interact with Maven on your behalf.

PDQ developers use Maven to manage all dependencies (the other Java libraries on which PDQ relies). This means that no compiled library is stored in Git or in the individual PDQ sub-projects. If you see one, open an issue about it: it should not be there.

How does Maven work (in PDQ)

  1. Each sub-project has a pom.xml file at its root, which contains various information about that sub-project, including the version number, plugins, dependencies (internal and external), and build instructions (what to build, how and where). If you are familiar with Ant, think of it as the build.xml.

  2. In addition, there is a pom.xml file at the root of the tree hierarchy, under pdq/, which declares that all these sub-projects form a single entity: PDQ. When you run mvn install on the top-level pom.xml, it simply runs mvn install on the pom.xml of each sub-project, in the correct order.

  3. When you run mvn install on a sub-project, it does the following:

    It looks at the dependencies listed in the pom. These are simply listed as IDs, e.g. the Guava library shows as:

    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </dependency>

    Maven checks whether the corresponding JAR is present in its local repository (a directory under ~/.m2/). If not, it downloads the JAR from a remote server and stores it there. On subsequent builds, it will not download Guava again unless, for instance, you decide to use a newer version.

    Once all the dependencies are available locally, the sources are compiled, the unit tests are run, and a JAR is put under the project's target/ directory with the version specified in the pom. When you want a newer version of PDQ, you update the version in the pom, and the JARs will be built under a new name in target/.
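The aggregation described in step 2 is usually expressed in Maven with a `<modules>` element in the top-level pom.xml. The fragment below is a minimal sketch of that idea, not a copy of PDQ's actual file: the version number and the `common` module name are placeholders chosen for illustration, while the group and artifact IDs are the ones this page gives for the top-level project.

```xml
<!-- Sketch of a top-level pdq/pom.xml; version and module names are illustrative assumptions -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>

    <groupId>uk.ac.ox.cs.pdq</groupId>
    <artifactId>pdq</artifactId>
    <version>1.0.0</version> <!-- placeholder version -->

    <!-- An aggregator POM uses "pom" packaging: it builds no JAR of its own -->
    <packaging>pom</packaging>

    <!-- Each module is a sub-directory that contains its own pom.xml -->
    <modules>
        <module>common</module>     <!-- hypothetical sub-project -->
        <module>regression</module> <!-- directory name taken from pdq/regression -->
    </modules>
</project>
```

With a layout like this, running mvn install from pdq/ builds every listed module. Maven derives the build order from the dependencies between modules, so the order of the `<module>` entries does not have to match it.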

Maven ID management

In addition to version, Maven needs certain identifiers to name projects and sub-projects:

  • Group Id: an ID that gathers components under a single umbrella. You can think of it as a namespace. The one we use in PDQ is uk.ac.ox.cs.pdq
  • Artifact Id: an ID for an individual component, typically a project or sub-project. In our case, the top-level pdq project has Artifact Id pdq, and each individual sub-project has pdq-<sub-project-name>

These never actually need to be changed, but they are important because Maven uses them for (internal) dependency management. They will also serve as global identifiers if we want to make PDQ available on a public repository one day.
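To illustrate how these identifiers could appear in practice, here is a hedged sketch of a sub-project pom.xml that names its own coordinates and depends on a sibling sub-project. It assumes that sub-projects inherit from the top-level pom through a `<parent>` element; the version number and the `pdq-common` artifact are hypothetical, and `pdq-regression` is only inferred from the pdq-<sub-project-name> pattern.

```xml
<!-- Sketch of pdq/regression/pom.xml; version and sibling artifact are illustrative assumptions -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>

    <!-- Coordinates of the top-level project this sub-project inherits from -->
    <parent>
        <groupId>uk.ac.ox.cs.pdq</groupId>
        <artifactId>pdq</artifactId>
        <version>1.0.0</version> <!-- placeholder version -->
    </parent>

    <!-- The groupId is inherited from the parent when it is omitted here -->
    <artifactId>pdq-regression</artifactId>
    <version>1.0.0</version> <!-- placeholder version -->

    <dependencies>
        <!-- Internal dependency: another sub-project, referenced by its Maven IDs -->
        <dependency>
            <groupId>uk.ac.ox.cs.pdq</groupId>
            <artifactId>pdq-common</artifactId> <!-- hypothetical sibling sub-project -->
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</project>
```

Here `${project.version}` is a standard Maven property that resolves to the version of the pom in which it appears, which keeps sibling sub-projects on the same version.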

Editing pom.xml

For routine development of PDQ it should be rare to make changes to the pom.xml, but the following would require changes to be made:

  • Adding a new dependency to a sub-project:

    If the dependency is already used in PDQ, but not in the sub-project, it should be specified in the <dependencies> block of the sub-project's pom.xml, e.g.

    <!-- In pdq/sub-project/pom.xml -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </dependency>

    If the dependency is new to PDQ, you must also specify it in the base pom.xml, in the <dependencyManagement> block, with a version number. The sub-project's pom.xml still lists it without a version number (as above). For example:

    <!-- In pdq/pom.xml -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>16.0.1</version>
    </dependency>

  • Changing a plugin or dependency version. All version information is kept in the top-level pom.xml and only needs changing there.

  • Changing the version number of PDQ. Every pom.xml, both at the top level and in the sub-projects, declares a version number. These should (probably) all be changed at once.

  • Refactoring the main class entry point for a sub-project. If, for instance, you renamed PdqRegression to RegressionPdq, you would need to edit the pom.xml in pdq/regression and update:

    <mainClass>uk.ac.ox.cs.pdq.regression.RegressionPdq</mainClass>
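For the new-dependency case in the list above, it may help to see the enclosing elements that the short fragments leave out. The sketch below shows two separate fragments from two different files, reusing the Guava example; the surrounding layout follows standard Maven conventions and is an assumption about how PDQ's files are organised, not a copy of them.

```xml
<!-- Fragment 1, in pdq/pom.xml: the version is declared once for all sub-projects -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>16.0.1</version>
        </dependency>
    </dependencies>
</dependencyManagement>

<!-- Fragment 2, in pdq/sub-project/pom.xml: no version, it is resolved from the entry above -->
<dependencies>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </dependency>
</dependencies>
```

An entry under `<dependencyManagement>` does not add the library to any build by itself. It only fixes the version for sub-projects that list the dependency in their own `<dependencies>` block, which is why both edits are needed.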

External resources

Importing a maven project into eclipse from git

If you get an error installing the m2e-git connector, a possible way to correct it is here
