Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayIndexOutOfBoundsException thrown for invalid ending XML string when using JDK default Stax XML parser #618

Closed
arthurscchan opened this issue Dec 4, 2023 · 2 comments
Labels
2.17 Issues planned at earliest for 2.17
Milestone

Comments

@arthurscchan
Copy link
Contributor

arthurscchan commented Dec 4, 2023

In XmlTokenStream::_collectUntilTag method, there is an infinite while loop to loop through the provided XML string (through _xmlReader) character by character. The loop only exits by return statements when a valid character (XMLStreamConstants.START_ELEMENT, XMLStreamConstants.END_ELEMENT or XMLStreamConstants.END_DOCUMENT) is found. If the provided XML string is invalid without those characters, it will continue to loop through the whole XML String and eventually throw ArrayIndexOutOfBoundsException when _xmlReader has no more characters that can be returned by the next() method. Besides, there are also some other methods depends on those END_ELEMENT to stop looping out-of-bound. The suggested fix could be simply wrapping the ArrayIndexOutOfBoundsException with the JsonParseException.

        CharSequence chars = null;
        while (true) {
            switch (_xmlReader.next()) {
            case XMLStreamConstants.START_ELEMENT:
                return (chars == null) ? "" : chars.toString();

            case XMLStreamConstants.END_ELEMENT:
            case XMLStreamConstants.END_DOCUMENT:
                return (chars == null) ? "" : chars.toString();

            // note: SPACE is ignorable (and seldom seen), not to be included
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
                // 17-Jul-2017, tatu: as per [dataformat-xml#236], need to try to...
                {
                    String str = _getText(_xmlReader);
                    if (chars == null) {
                        chars = str;
                    } else  {
                        if (chars instanceof String) {
                            chars = new StringBuilder(chars);
                        }
                        ((StringBuilder)chars).append(str);
                    }
                }
                break;
            default:
                // any other type (proc instr, comment etc) is just ignored
            }
        }
    }

We found this issue by OSS-Fuzz and it is reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=64655, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=64667 and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=64659.

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 4, 2023

Just to be pedantic: loop is not character-by-character but token-by-token.
Tokens refer to things like start element (like <tag>) and text segments (CDATA).

But I agree with the fix wrt checking that loop ends if there are no more tokens available.

@cowtowncoder
Copy link
Member

After cleaning things up, including removal of try-catch block, this is not reproducible by given test cases.

Looking at https://oss-fuzz.com/testcase-detail/5439718040666112, however, it looks like test is running using JDK Stax implementation (sjsxp) which is not what this module is configured to use -- Woodstox.
The problem with this stack trace, however, is that SJSXP code is one throwing exception, not Jackson.
As such I am tempted to consider this a problem with OSS-Fuzz set up: it probably should include Woodstox as dependency. No one should really default to SJSXP as it's known to have remaining bugs, not be nearly as XML compliant as Woodstox, and is slower as well.

Now: it'd be nice (but IMO not required) to also work on SJSXP to some degree. I'll see if I can tease out reproduction: possibly by temporarily removing Woodstox dependency. But I don't know yet how to be able to have alternate CI run with such set up (I know how to override dependency versions but not sure how to exclude).

@cowtowncoder cowtowncoder changed the title Possible ArrayIndexOutOfBoundsException thrown for invalid ending XML string ArrayIndexOutOfBoundsException thrown for invalid ending XML string when using JDK default Stax XML parser Dec 6, 2023
@cowtowncoder cowtowncoder added the 2.17 Issues planned at earliest for 2.17 label Dec 8, 2023
@cowtowncoder cowtowncoder added this to the 2.17.0 milestone Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.17 Issues planned at earliest for 2.17
Projects
None yet
Development

No branches or pull requests

2 participants