Bisecting Jsoup - bootstraponline/meta GitHub Wiki

Using git bisect to identify which commit introduced a bug.

jsoup-1.5.2 is a known good tag. Check out jsoup-1.5.2 and verify that build.sh exits with code zero (good).

git checkout jsoup-1.5.2 ;\
./build.sh

Repeat with jsoup-1.7.1, the latest tag, to verify that it fails there (non-zero exit code).

git checkout jsoup-1.7.1 ;\
./build.sh
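git bisect run later reads these same exit codes: 0 marks a commit good, 125 tells it to skip a commit that cannot be tested, any other code from 1 to 127 marks it bad, and anything above 127 aborts the bisection. The sketch below encodes that mapping in a hypothetical shell function, bisect_verdict (not part of git), so a manual check can be read the way bisect will read it.

```shell
# Hypothetical helper, not part of git: prints the verdict git bisect run
# would record for a given exit code of the test script.
bisect_verdict() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo good
  elif [ "$code" -eq 125 ]; then
    echo skip
  elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
    echo bad
  else
    echo abort
  fi
}

# Manual check after a checkout (needs the jsoup working tree):
#   status=0; ./build.sh || status=$?
#   echo "exit code $status: $(bisect_verdict "$status")"
echo "exit code 0: $(bisect_verdict 0)"   # prints "exit code 0: good"
echo "exit code 1: $(bisect_verdict 1)"   # prints "exit code 1: bad"
```

The status=0; ./build.sh || status=$? form captures the code without tripping set -e if the calling shell happens to use it.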

With both a good and a bad commit found, we're ready to start bisecting.

git bisect start ;\
git bisect good jsoup-1.5.2 ;\
git bisect bad jsoup-1.7.1

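Pass ./build.sh, not build.sh, to git bisect run: a bare name is looked up on PATH, which normally does not include the current directory, so the script would not be found. The script must also be executable (chmod +x build.sh).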

git bisect run ./build.sh

After bisecting, remember to reset.

git bisect reset

Bisecting succeeded. The output of git bisect run:

8749726a79c22451b1f01b14fb2137f734e926b4 is the first bad commit
commit 8749726a79c22451b1f01b14fb2137f734e926b4
Author: Jonathan Hedley <[email protected]>
Date:   Tue May 10 22:13:23 2011 +1000

    Reimplementation of parser and tokeniser, to make jsoup a HTML5 conformat parser, against the
    http://whatwg.org/html spec.

:100644 100644 bffc3f44e5d6209de7b3776fbb92eeda79c9c1f4 d7804d0c0e6360521774e7c4688a767ee9b613a9 M	CHANGES
:040000 040000 d4a4bba4819036fb82482124a719f10165334a2c 677c41662eb3a6619fbfd6687a1fa0bd80e2209f M	src
bisect run success
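The good/bad/run/reset cycle can be rehearsed without building jsoup. A minimal sketch, assuming only that git is installed: it creates a throwaway repository of ten commits (tags c1 to c10) in which a hypothetical bug file appears in commit 6, then bisects it with check.sh, a hypothetical stand-in for build.sh. Reading refs/bisect/bad before git bisect reset captures the answer, since reset deletes the bisect refs.

```shell
#!/bin/sh
# Rehearse the bisect workflow on a throwaway repository (requires git).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Ten commits tagged c1..c10; the hypothetical "bug" file appears in commit 6.
i=1
while [ "$i" -le 10 ]; do
  echo "$i" > version
  if [ "$i" -eq 6 ]; then echo broken > bug; fi
  git add -A
  git commit -q -m "commit $i"
  git tag "c$i"
  i=$((i + 1))
done

# Stand-in for build.sh: exit 0 (good) without the bug, 1 (bad) with it.
# It lives outside the repository and is called by absolute path,
# so checkouts during the bisect never touch it.
cat > "$work/check.sh" <<'EOF'
#!/bin/sh
[ ! -f bug ]
EOF
chmod +x "$work/check.sh"

git bisect start
git bisect good c1
git bisect bad c10
git bisect run "$work/check.sh"

# refs/bisect/bad holds the first bad commit until the bisect is reset.
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format='first bad: %s' "$first_bad"   # prints "first bad: commit 6"
git bisect reset
```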

Let's find out which tags include this commit.

$ git tag --contains 8749726a79c22451b1f01b14fb2137f734e926b4
jsoup-1.6.2
jsoup-1.6.3
jsoup-1.7.1

The commit is on the master branch.

$ git branch --contains 8749726a79c22451b1f01b14fb2137f734e926b4
master

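git describe --contains names the commit relative to a tag that contains it: jsoup-1.6.2~76 means 76 first-parent commits behind jsoup-1.6.2, and that name works anywhere git expects a revision, as git show demonstrates.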

$ git describe --contains 8749726a79c22451b1f01b14fb2137f734e926b4
jsoup-1.6.2~76
$ git show jsoup-1.6.2~76
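The three --contains queries can be tried on a small throwaway repository, assuming git is installed. In this sketch the tags v1 and v2 and all commit messages are made up; the variable x plays the role of 8749726a. The describe name is the tag followed by ~N (N first-parent steps back), so git rev-parse turns it back into the original hash.

```shell
#!/bin/sh
# Throwaway repository showing the --contains queries (requires git).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Linear history: v1 is tagged before the commit of interest, v2 after it.
git commit -q --allow-empty -m "first"
git tag v1
git commit -q --allow-empty -m "commit of interest"
x=$(git rev-parse HEAD)
git commit -q --allow-empty -m "later 1"
git commit -q --allow-empty -m "later 2"
git tag v2

git tag --contains "$x"      # v2 only: v1 was tagged before the commit
git branch --contains "$x"   # the current branch
name=$(git describe --contains "$x")
echo "$name"                 # a name relative to v2
git rev-parse "$name"        # resolves back to the same hash as $x
```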

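Has the commit been cherry-picked? git cherry -v <upstream> lists the commits on the current HEAD that are not reachable from <upstream>, prefixing with - each one whose change is already present in <upstream> under a different hash, and with + each one that has no equivalent there.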

$ git cherry -v 8749726a79c22451b1f01b14fb2137f734e926b4
+ a28fb8eae225ac16279a6086420f310145bf6100 Added test to verify that solidus as end of unquoted attribute in tag is handled as part of attribute, and not a self-closing tag, which was the old behaviour of jsoup.
...
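To see what a - line looks like, the sketch below builds a throwaway repository (assuming git is installed) with made-up branches feature and release, cherry-picks one commit from feature onto release, and runs git cherry. The release branch gets a commit of its own first so the copy cannot end up with the same hash as the original.

```shell
#!/bin/sh
# Throwaway repository with one cherry-picked commit (requires git).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo base > file
git add file
git commit -q -m "base"

# The fix lands on a feature branch...
git checkout -q -b feature
echo fix >> file
git commit -q -am "fix parser"
fix=$(git rev-parse HEAD)

# ...and is cherry-picked onto a release branch that forked from base
# and already has a commit of its own, so the copy gets a new hash.
git checkout -q -b release HEAD~1
echo 1.0 > VERSION
git add VERSION
git commit -q -m "prepare release"
git cherry-pick "$fix" > /dev/null
copy=$(git rev-parse HEAD)

# Lists the commits on release that are not reachable from feature:
# "+" for "prepare release" (no equivalent on feature) and
# "-" for the copied "fix parser" (same patch as $fix, different hash).
git cherry -v feature release
```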

The change is in 1.6.0 even though git tag --contains doesn't list that tag. A cherry-picked copy gets a new hash but keeps the original author, author date, and message, so search all refs for the message to find the second commit.

$ git log --grep="Reimplementation of parser and tokeniser" --all
commit 8749726a79c22451b1f01b14fb2137f734e926b4
Author: Jonathan Hedley <[email protected]>
Date:   Tue May 10 22:13:23 2011 +1000

    Reimplementation of parser and tokeniser, to make jsoup a HTML5 conformat parser, against the
    http://whatwg.org/html spec.

commit 45a3cc68a2d44b9be2cfac65d075f399060cf65b
Author: Jonathan Hedley <[email protected]>
Date:   Tue May 10 22:13:23 2011 +1000

    Reimplementation of parser and tokeniser, to make jsoup a HTML5 conformat parser, against the
    http://whatwg.org/html spec.

Finally, git tag --contains on the second commit confirms that the change is indeed present in 1.6.0.

$ git tag --contains 45a3cc68a2d44b9be2cfac65d075f399060cf65b
jsoup-1.6.0
jsoup-1.6.1

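Nothing in the history links 8749726a79c22451b1f01b14fb2137f734e926b4 to 45a3cc68a2d44b9be2cfac65d075f399060cf65b: a plain cherry-pick records no connection between a commit and its copy, so the copy can only be found by searching commit messages with --grep. That is a good reason to avoid cherry-picking; read more on Stack Overflow.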
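When cherry-picking cannot be avoided, git cherry-pick -x appends a line of the form (cherry picked from commit <hash>) to the copy's message, so the copy can be found from the original hash with a targeted --grep instead of by guessing at its message. The sketch below demonstrates this on a throwaway repository, assuming git is installed; the branch names feature and release are made up.

```shell
#!/bin/sh
# Throwaway repository showing cherry-pick -x (requires git).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo base > file
git add file
git commit -q -m "base"

git checkout -q -b feature
echo fix >> file
git commit -q -am "fix parser"
fix=$(git rev-parse HEAD)

git checkout -q -b release HEAD~1
echo 1.0 > VERSION
git add VERSION
git commit -q -m "prepare release"

# -x appends "(cherry picked from commit <full hash>)" to the copy's message.
git cherry-pick -x "$fix" > /dev/null

# The original hash now leads straight to its copy.
git log --all --format='%H %s' --grep="cherry picked from commit $fix"
```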

JSoupTest.java

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JSoupTest {

  // HTML code from dma https://github.com/jhy/jsoup/issues/249
  public static String html() {
    StringBuilder _builder = new StringBuilder();
    _builder.append("<html>");
    _builder.append("\n");
    _builder.append("<body>  ");
    _builder.append("\n");
    _builder.append("  ");
    _builder.append("<table>");
    _builder.append("\n");
    _builder.append("      ");
    _builder.append("<form action=\"/hello.php\" method=\"post\">");
    _builder.append("\n");
    _builder.append("      ");
    _builder.append("<tr><td>User:</td><td> <input type=\"text\" name=\"user\" /></td></tr>");
    _builder.append("\n");
    _builder.append("      ");
    _builder.append("<tr><td>Password:</td><td> <input type=\"password\" name=\"pass\" /></td></tr>");
    _builder.append("\n");
    _builder.append("      ");
    _builder.append("<tr><td><input type=\"submit\" value=\"login\" /></td></tr>");
    _builder.append("\n");
    _builder.append("   ");
    _builder.append("</form>");
    _builder.append("\n");
    _builder.append("  ");
    _builder.append("</table>");
    _builder.append("\n");
    _builder.append("</body>");
    _builder.append("\n");
    _builder.append("</html>");
    _builder.append("\n");
    return _builder.toString();
  }

  // Modified version of dma's test code https://github.com/jhy/jsoup/issues/249
  public static void main(final String[] args) throws IOException {
    Document doc = Jsoup.parse(html());
    Elements forms = doc.select("form");
    System.out.println(forms.toString());
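    // Bad (exit 1): the form was serialized empty, i.e. it lost the rows
    // and inputs it wraps in the source HTML. Good (exit 0): anything else.
    int exitCode = forms.toString().contains("<form action=\"/hello.php\" method=\"post\"></form>") ? 1 : 0;
    System.out.println(exitCode + " exit code");
    // build.sh ends with this program, so git bisect run sees this status.
    System.exit(exitCode);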
  }
}

build.sh

#!/bin/sh

# clean
mvn clean > /dev/null 2>&1

# build jsoup and skip tests
mvn package --quiet -Dmaven.test.skip=true > /dev/null 2>&1

# Save mvn's exit code to separate build failures from code producing invalid results.
EXIT=$?

# If mvn package did not exit with 0
if [ $EXIT -ne 0 ];
then
  # log the build failure and exit with 125, which tells
  # git bisect run to skip this commit instead of marking
  # it good (0) or bad (1-127) based on an untestable build
  echo $EXIT " mvn build failure"
  exit 125
fi

# remove old jar if it exists
if [ -f ./target/jsoup.jar ];
then
    rm ./target/jsoup.jar
fi

# rename jar
# force move to overwrite
mv -f ./target/jsoup*.jar ./target/jsoup.jar

# run test
javac -classpath ./target/jsoup.jar JSoupTest.java
java -classpath .:./target/jsoup.jar JSoupTest

# The script exits with the status of its last command (java JSoupTest),
# which git bisect run reads: 0 = good, 1-127 except 125 = bad, 125 = skip.