Array element and field token spans include previous comma #229

Closed
latkin opened this issue Nov 12, 2015 · 5 comments

@latkin

latkin commented Nov 12, 2015

Consider the JSON blob

[1,    2,    3]

The spans from getTokenLocation to getCurrentLocation when sitting on each VALUE_NUMBER_INT are currently

1
,    2
,    3

I would expect them to each be 1-char wide, i.e.

1
2
3

Is this expected/per-spec?

@cowtowncoder
Member

That does not look right.

getTokenLocation should point to the first character that is logically part of the token (for numbers, the first digit). Current location is less well defined: it can only be relied on to give useful information for error reporting, mostly to show where an unexpected input character was found. During normal operation, current location should point to the character just past the end of the logical token, as long as lazy parsing does not affect the result; for example, JSON Strings are not fully decoded until they are actually requested. In addition, current location is unlikely to give the end location of property names, because parsing may eagerly decode part of the value token that follows.

So, first things first: getTokenLocation() seems wrong if it points to the leading comma.
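
To make the expected start/end semantics concrete, here is a minimal sketch in Scala. It is not Jackson's tokenizer: `ExpectedSpans` and `intSpans` are hypothetical names, and the scanner only handles a flat array of non-negative integers. It records a token start at the first digit and a token end just past the last digit, so separators and whitespace fall outside every span.

```scala
// Hypothetical illustration only; this is not how Jackson is implemented.
object ExpectedSpans {
  /** Returns (start, end) char offsets of each integer token in a flat JSON array of non-negative ints. */
  def intSpans(json: String): List[(Int, Int)] = {
    val spans = List.newBuilder[(Int, Int)]
    var i = 0
    while (i < json.length) {
      if (json.charAt(i).isDigit) {
        val start = i // first character that is logically part of the token
        while (i < json.length && json.charAt(i).isDigit) i += 1
        spans += ((start, i)) // end is the offset just past the last digit
      } else {
        i += 1 // brackets, commas and whitespace belong to no number token
      }
    }
    spans.result()
  }
}
```

For the input `[1,     2,     3]` (five spaces after each comma), `ExpectedSpans.intSpans` yields `List((1,2), (8,9), (15,16))`, i.e. three one-character spans.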

@latkin
Author

latkin commented Nov 12, 2015

@cowtowncoder thanks for the confirmation. Here's a repro in Scala (using Jackson 2.6.3).

import com.fasterxml.jackson.core.JsonFactory

val json = "[1,     2,     3]"
val parser = new JsonFactory().createParser(json)
// Walk every token and print the span between its start and the current location
while(parser.nextToken() != null) {
  val tok = parser.getCurrentToken
  val tokStart = parser.getTokenLocation.getCharOffset.toInt
  val currLoc  = parser.getCurrentLocation.getCharOffset.toInt
  // Text of the input covered by [tokStart, currLoc)
  val slice = json.substring(tokStart, currLoc)

  println(f"$tok%-20s Start: $tokStart%2d End: $currLoc%2d Slice: {$slice}")
}

Output:

START_ARRAY          Start:  0 End:  1 Slice: {[}
VALUE_NUMBER_INT     Start:  1 End:  2 Slice: {1}
VALUE_NUMBER_INT     Start:  2 End:  9 Slice: {,     2}
VALUE_NUMBER_INT     Start:  9 End: 16 Slice: {,     3}
END_ARRAY            Start: 16 End: 17 Slice: {]}
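
Until the reported start is corrected, a caller that has the whole input as a String could adjust the offsets itself. The sketch below is a workaround under assumptions, not part of Jackson: `SpanWorkaround` and `adjustStart` are hypothetical names, and it assumes the char offsets index directly into the original String, as they do in the repro above.

```scala
// Hypothetical workaround helper; not a Jackson API.
object SpanWorkaround {
  /** Advances a reported token start past any leading commas and whitespace in the source text. */
  def adjustStart(json: String, reportedStart: Int, reportedEnd: Int): Int = {
    var i = reportedStart
    // Stop at the first character that is neither a comma nor whitespace,
    // or at the reported end if the span contains nothing else.
    while (i < reportedEnd && (json.charAt(i) == ',' || json.charAt(i).isWhitespace)) i += 1
    i
  }
}
```

With the offsets printed above, `adjustStart(json, 2, 9)` returns 8 and `adjustStart(json, 9, 16)` returns 15, while spans that are already correct, such as (1, 2), are left unchanged.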

@latkin
Author

latkin commented Nov 12, 2015

For reference, the corresponding behavior also occurs for object fields, and is mentioned here: #37 (comment)

@cowtowncoder
Member

@latkin Thank you! The field name case was known already, and can hopefully be fixed for 2.7.0. The array element case is new to me.

@cowtowncoder
Member

Yes, I can reproduce this, and can see where it goes wrong, as well as how it can be fixed.
Now the only (?) question is how to coordinate fixing this along with the fix for #37.
