Skip to content
This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

Impossible to turn off Alphanumeric Sort for bean properties #40

Closed
dmitry87 opened this issue May 6, 2014 · 11 comments
Closed

Impossible to turn off Alphanumeric Sort for bean properties #40

dmitry87 opened this issue May 6, 2014 · 11 comments

Comments

@dmitry87
Copy link

dmitry87 commented May 6, 2014

package com.fasterxml.jackson.dataformat.csv;
....
public class CsvFactory extends JsonFactory{
.....
 // Yes; CSV is positional
  @Override
    public boolean requiresPropertyOrdering() {
        return true;
    }
....
}

Because of this code AlphanumericSort is set twice. In ObjectMapper

public ObjectMapper(JsonFactory jf,
            DefaultSerializerProvider sp, DefaultDeserializationContext dc)
    {
....
        final boolean needOrder = _jsonFactory.requiresPropertyOrdering();
        if (needOrder ^ _serializationConfig.isEnabled(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY)) {
            configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, needOrder);
        }

....        
    }

and in CsvMapper

public CsvMapper(CsvFactory f)
    {
        super(f);
        // As per #11: default to alphabetic ordering
        enable(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY);
    }

so it's impossible to turn it off. Probably SORT must be applied once or made pluggable.

@cowtowncoder
Copy link
Member

Property ordering is one thing where I actually wish that I had started with default of alphabetic ordering; and only allowed other stable orderings. Current default is unpredictable because it is based on ordering of Fields and Methods that JDK uses; which as of JDK7 is arbitrary.

This is the background for change, to ensure that all data formats that use index-based encoding (as opposed to name-based, as JSON) -- for now, CSV and Avro -- will use a stable ordering; and alphabetic sorting is the only such default choice.

But note that this is just the default ordering: explicit ordering by @JsonPropertyOrder, or via Schema will have higher priority.

Having said that, I think it would make sense to allow other methods of defining default ordering; perhaps similar to how one can specify property name strategies for defining name transformations (between Java camel-case and other notations). I am open to suggestions in good, simple, extensive way of doing this.

But, before talking about improvements, what is the specific problem you are trying to solve here?

@dmitry87
Copy link
Author

dmitry87 commented May 7, 2014

Thank you for quick response!
I've seen #11 and I agree that it's a good fix if property ordering will change in JDK. And I think it's better not to turn on that feature for now or leave the possibility for a user to disable ordering.
Found out how to disable it - will try to resolve my other issues
Hope helps someone

(CsvMapper) new CsvMapper().disable(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY);

Well moving to jackson 2.3.3 caused Junit tests failures because of ordering. Yes I noticed that there is a @JsonPropertyOrder and I'm trying to generate CsvSchema with

mapper.typedSchemaFor(JaxBGeneratedFromXML.class) 

I didn't find a way to generate property ordered schema from POJO without annotations. If there was a way not to be dependent on annotations it would be excellent.

The main issue is #26 - some of headers are quoted and some are not (Probably depends on header's name length - moving to recent version could help here :) ). Also need to remove " - double quotes mark at all for CSV generated file and replace line separator with "\r\n" which is implemented in recent version ( #32 and #17 ) .

I can close the issue then, because there is a possibility to turn the sorting off

@dmitry87
Copy link
Author

dmitry87 commented May 7, 2014

On #26

package com.fasterxml.jackson.dataformat.csv.impl;
...
public final class CsvWriter
{
...
 /**
* Helper method that determines whether given String is likely
* to require quoting; check tries to optimize for speed.
*/
    protected final boolean _mayNeedQuotes(String value, int length)
    {
        // 21-Mar-2014, tatu: If quoting disabled, don't quote
        if (_cfgQuoteCharacter < 0) {
            return false;
        }
        // let's not bother checking long Strings, just quote already:
        if (length > MAX_QUOTE_CHECK) {
            return true;
        }
        for (int i = 0; i < length; ++i) {
            if (value.charAt(i) < _cfgMinSafeChar) {
                return true;
            }
        }
        return false;
    }
....
}

If quoting is enabled and we have long string it will be quoted - certainly a BUG, I think. Cannot do anything with that :(

@cowtowncoder
Copy link
Member

No, I don't view that as a bug but feature. Adding quoting is never wrong; choice is more that of aesthetics, or perhaps optimization (to avoid adding quotes). And checking as to whether quoting is needed adds to processing overhead, so you trade speed for more accurate assessment.

I am open to improvements here; perhaps a simple Feature for "always check the whole String to determine whether to use quoting" would make sense? It could be turned on to make sure that optimal (size-/style-wise) determination is made.
Another option would be to make max-check-length configurable; setting it to Integer.MAX_VALUE (or, negative?) would force checks; setting 0 would force escaping for everything, all the time.

@cowtowncoder
Copy link
Member

Back to ordering: it should be simple to add an optional argument for typedSchemaFor(), to define ordering. Actually, perhaps even better, there could be another variant that just takes Comparator<String> as argument; and convenience method that takes List<String> for ordering (which may be partial).

@dmitry87
Copy link
Author

dmitry87 commented May 7, 2014

  1. About quoting - it's doubtfull - if headers from one symbol to any length long - then why some are quoted and some are not.... If you want to import the result file into excel it will not understand your additional quotes (must be all quoted or not at all)... :(
  2. About max check length - it's not obvious that you have such property to improve speed. I was surprised actually.
  3. Ordering - I agree - out of the box it should use with no effort working model. And if you want it to make really fast you have to tune it a little bit - decrease length when quotes are added, add ordering and so on. If someone really needs speed - he will take a look at your wiki page and find all the solutions and improvements.
    B.t.w. You've updated Property filters - will there be any manual as it was on your previous wiki page? (Have to mention the manuals there were really great!)

@cowtowncoder
Copy link
Member

  1. Ok, I was not aware of Excel having problems with mixed entries; some quoted, others not, for header line. If this is the case, I would like to change this. Do you have links to this, or just sample document? I have Excel on my Mac so I should be able to verify it. I mean, CSV as format does not require such uniformity, but if Excel does I guess I'm ok with different working.
  2. Point is that having to traverse input twice -- once to see if quoting is needed, second time to quote -- takes about twice as long as simply quote. And if long(er) content is likely to need quoting anyway, why not just quote it?
  3. I think ordering is something that won't affect performance too much

One suggestion: since handling of quoting is bit different from that of ordering, could we focus on ordering in this issue, and create another issue (or use existing one, there may be one open) for quoting? Especially if it affects quoting of header line; I would like to understand Excel's expectations.

@dmitry87
Copy link
Author

dmitry87 commented May 8, 2014

yes, I think It's a good idea to focus on ordering of properties here and continue with quoting in #26 .

  1. Unfortunately I do not have excel file we generated with jackson at home - I can look for it on Monday.
  2. I understand for someone speed is the critical issue so I recommed to create another implementation or not to make method final.
  3. Agree here. When you are using CSV not in alphabetical order and you do not want or cannot use annotations for ordering - you can hope that JDK order will not change :) I mean if there is possibility to turn it of - it's ok and this issue can be closed. Do not know if it's a good idea to enable ordering by default, probably need to mention it in Javadocs then.

cowtowncoder added a commit that referenced this issue May 16, 2014
@cowtowncoder
Copy link
Member

I made CsvWriter non-final, as well as _mayNeedQuotes, as the simplest starting point.
Will probably add a feature to allow checking for optimal quoting, regardless of string value / header name length: related to this need to perhaps do full character check too -- current check takes conservative approach, since there are multiple characters that may indicate need for quoting.

@cowtowncoder
Copy link
Member

I created #42 for functionality to specify sorting via CsvSchema.

@cowtowncoder
Copy link
Member

#40 implemented, #26 separate, closing this entry.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants