Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BytesToNameCanonicalizer can mishandle leading NUL #148

Closed
rjmac opened this issue Aug 10, 2014 · 10 comments
Closed

BytesToNameCanonicalizer can mishandle leading NUL #148

rjmac opened this issue Aug 10, 2014 · 10 comments
Milestone

Comments

@rjmac
Copy link

rjmac commented Aug 10, 2014

More from randomized testing: deserializing the document { "\u0000abc" : "a", "abc" : "b" } via the UTF8StreamJsonParser will produce the same text for the second field name as the first. This is because the BytesToNameCanonicalizer will left-pad "abc" with a 0 byte to fit it in an int, which is indistinguishable from a string which really has a leading zero char.

Here's a standalone test case for it:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

public class Tokens {
    public static void main(String[] args) throws Exception {
        String[] expectedTexts = { "{", "\u0000abc", "a", "abc", "b", "}" };
        JsonParser parser = new JsonFactory().createParser("{\"\\u0000abc\" : \"a\", \"abc\" : \"b\"}".getBytes());
        for(String expected : expectedTexts) {
            parser.nextToken();
            if(!parser.getText().equals(expected)) throw new Exception(parser.getText() + " != " + expected);
        }
    }
}
@cowtowncoder
Copy link
Member

Hmmh. I hate the fact that null Unicode character is technically valid. Almost enough to rather throw an exception if one is seen, and indicate Jackson will not allow it...

Now: I am guessing this is found via generated test combinations, and not with actual usage? Just want to figure out likelihood of this being something found in the wild.

@rjmac
Copy link
Author

rjmac commented Aug 11, 2014

That guess is correct. I'm not actually deliberately testing Jackson directly; I'm testing my own code and coming up with places where the test fails because the property being checked is forall x. decode(encode(x)) == x, and finding places where that's not true in Jackson.

Of course, it's possible come up with a situation where it's a potential security issue, due to allowing a user to change the interpretation of a chunk of JSON depending on what kind of parser is used. I admit it'd be hard to come up with a real case, but a super-contrived one's like this: say there's a service processor that operates in two phases. Phase one checks the credentials stored in a request, and if they're good, passes it unmodified to a second service that assumes the creds are good (because it knows the first layer has already handled it) and uses the username to decide whether to allow or deny a certain action. A valid but malicious user called "unprivileged" sends {"usr":"privileged","\u0000usr":"unprivileged","psw":"whatever",...other stuff...}. Auth layer is using byte-decoding, service layer uses text-decoding. Then the first layer will check the password against "unprivileged" and the second layer will see the "privileged" username. Now, that is obviously a stupidly designed system for a host of reasons, but it also only took about 10 seconds to come up with.

rjmac added a commit to rjmac/jackson-core that referenced this issue Aug 11, 2014
0xff can't occur in well-formed UTF-8, so it will not cause false-positives.
Might resolve FasterXML#148
@rjmac
Copy link
Author

rjmac commented Aug 11, 2014

I'm not sure that patch is correct, but it seems to solve the immediate issue and pass the tests. I don't much like it; it feels to me like the padding should be handled in the BytesToNameCanonicalizer itself rather than in the user of the BTNC.

rjmac added a commit to rjmac/jackson-core that referenced this issue Aug 11, 2014
rjmac added a commit to rjmac/jackson-core that referenced this issue Aug 11, 2014
@cowtowncoder
Copy link
Member

Right, this is problematic, and good catch no matter how things end up.

I started thinking that there may actually be relatively easy and inexpensive fix by using different padding value: since UTF-8 encoded content can not have couple of highest-valued bytes (for example, 0xFF is illegal), it might be possible to just initialize padding to -1 to be able to distinguish valid (if unusual) values of zero unit.

cowtowncoder added a commit that referenced this issue Aug 12, 2014
@cowtowncoder
Copy link
Member

Heh. I should have read your update before adding a comment. Great minds think alike etc. :-)

I see why you feel conflicted about patch: I agree that ideally it'd be hidden in symbol table. But then again, patch is delightfully small. One potential improvement could be to try to initialize quad value with padding, instead of applying patch in the end; but then again that'd actually do bit more work. And I suspect performance-wise difference probably couldn't even be measured.

I'll think about this for a bit; I checked in (failing) unit test. I am still wondering if there is any real risk in adding this in patch version, but your patch seems so elegantly small that it seems unlikely there could be any risk.

@cowtowncoder
Copy link
Member

Actually looking at patch again, I think it is pretty optimal (from efficiency POV). No point in tweaking, at most could inline if jmh showed that to matter (I am guessing it should not, easy enough to verify).

@cowtowncoder
Copy link
Member

@rjmac Do you think patch is complete, that is, can I just apply it as is?

@rjmac
Copy link
Author

rjmac commented Aug 13, 2014

I think so. It passes Jackson's own test suite, and the snapshot jar I built also looks like it passes my tests that showed this up in the first place.

@cowtowncoder
Copy link
Member

Ok thanks. My main concern really is with patch release; 2.4.2 is finally resolving couple of regressions 2.4 had (due to various low-level performance optimizations). One possibility would be to do this for 2.5.0, just to eliminate even the small risk.

@cowtowncoder
Copy link
Member

I switched master branch to be 2.5.0-SNAPSHOT, and will target fix for 2.5, just to be extra sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants