This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

Inject "missing" trailing columns as nulls (CsvParser.Feature.INSERT_NULLS_FOR_MISSING_COLUMNS) #137

Closed
Schaka opened this issue Nov 11, 2016 · 4 comments

@Schaka

Schaka commented Nov 11, 2016

Imagine the following scenario:

  • you have a CSV with 8 columns
  • your POJO has 9 fields

All 8 columns are mapped to their fields, but the 9th field, which has no column, is silently ignored: its setter is never called.
I would expect the setter to be called so that I can run custom validation, but none of the DeserializationFeatures seem to allow for this.

Is there currently any way to throw an exception when the number of columns doesn't match, or to automatically treat missing fields as null values (or fail on them)?

@cowtowncoder
Member

What existing options from 2.8 have you tried? Specifically, which CsvParser.Features and which CsvSchema settings? Additional (extra) entries may be ignored, and a "missing" value in the sense of an empty String should be mappable to null (although the default depends on the type the column is declared to have). It is also possible to ignore values of columns for which there is no corresponding POJO property.
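To make the distinction concrete: an empty cell is still a cell and goes through conversion, while a trailing cell that is absent from the row never reaches conversion at all. The plain-Java sketch below illustrates this under stated assumptions; `CellConversion` and its methods are hypothetical helpers, not Jackson API, and the null-for-primitive check only loosely mirrors what `FAIL_ON_NULL_FOR_PRIMITIVES` is meant to catch. Note the `-1` limit on `split`, which keeps a trailing empty cell so that "a;b;" has three cells while "a;b" has two.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helpers (not Jackson API): an empty cell is present and gets
// converted, while a cell missing from a short row is never converted at all.
final class CellConversion {
    private CellConversion() {}

    /** Splits a line on a literal separator, keeping trailing empty cells (limit -1). */
    static String[] split(String line, char separator) {
        return line.split(Pattern.quote(String.valueOf(separator)), -1);
    }

    /** Treats an empty (or null) cell as null; otherwise returns it unchanged. */
    static String emptyAsNull(String cell) {
        return (cell == null || cell.isEmpty()) ? null : cell;
    }

    /** Converts a cell to a primitive double, rejecting null/empty input. */
    static double toPrimitiveDouble(String cell) {
        String value = emptyAsNull(cell);
        if (value == null) {
            throw new IllegalArgumentException("Cannot map null/empty value to primitive double");
        }
        return Double.parseDouble(value);
    }

    /** Returns the cell at a column index, or Optional.empty() if the row is too short. */
    static Optional<String> cellAt(String[] row, int column) {
        return column < row.length ? Optional.ofNullable(row[column]) : Optional.empty();
    }
}
```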

@Schaka
Author

Schaka commented Nov 14, 2016

I have tried several features, including IGNORE_TRAILING_UNMAPPABLE.
I created a test to illustrate more clearly what I expect: an exception similar to "given String cannot be converted to Double", or at least a call to my setter, so that I can produce such an exception myself (through a custom Deserializer, for example).

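import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Stream;

@RunWith(JUnit4.class)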
public class RawCsvParserTest {

    private String invalidText =
        "1;44;m;01.10.1994;5;2;Master / Diplom Uni;50.734,44\n" +
        "2;40;m;01.11.1997;5;2;Diplom Uni;49.714,44\n" +
        "4;53;m;01.04.1989;6;5;Diplom Uni;91.145,88\n" +
        "5;51;w;22.03.1977;2;1;keine;26.557,83\n" +
        "6;37;m;01.07.2003;3;1;Berufsausbildung;36.322,48\n" +
        "46;41;m;15.09.1994;5;2;Studium;48.850,44\n" +
        "47;42;m;01.09.1994;5;3;Studium;61.294,44\n" +
        "143;29;w;01.08.2001;2;1;keine;9.444,90\n" +
        "144;44;m;01.10.1997;6;4;Bachelor / Diplom FH;58.174,44\n" +
        "145;24;m;01.08.2005;2;1;Berufsausbildung;24.334,44\n" +
        "261;31;w;15.03.2004;5;3;Studium;53.878,44\n" +
        "262;45;m;01.09.2004;0;1;Diplom FH;40.102,44\n" +
        "263;38;w;01.06.1998;3;1;Meister;38.542,44\n" +
        "264;22;m;01.08.2003;2;1;Berufsausbildung;28.402,44\n" +
        "265;39;m;01.04.1998;4;1;Diplom FH;33.466,44\n" +
        "266;30;m;01.04.2004;5;3;Berufsakademie;49.797,24\n" +
        "267;42;m;01.12.1994;6;4;Bachelor / Diplom FH;60.334,44\n" +
        "268;39;m;01.06.1999;5;3;weiterführende Berufsausbildung (Meister, Fach-, Betriebswirt);58.822,44\n" +
        "269;47;m;01.04.1992;3;1;weiterführende Berufsausbildung (Meister, Fach-, Betriebswirt);29.422,44\n" +
        "270;42;w;01.12.1987;2;1;Berufsausbildung - abgeschlossene Lehre oder Abitur;9.727,15";

    @Test
    public void shouldParseCsvStringWithNull() throws IOException {
        CsvMapper csvMapper = new CsvMapper();
        csvMapper.enable(DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES);
        csvMapper.enable(DeserializationFeature.ACCEPT_EMPTY_STRING_AS_NULL_OBJECT);

        CsvSchema schema = csvMapper.schemaFor(ParseAbleJob.class).withColumnSeparator(';').withAllowComments(true);
        CsvSchema rawSchema = csvMapper.schemaFor(ParseAbleJobRaw.class).withColumnSeparator(';').withAllowComments(true);

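        // Use an explicit charset: the data contains non-ASCII characters (e.g. "weiterführende")
        InputStream inputStream = new ByteArrayInputStream(invalidText.getBytes(StandardCharsets.UTF_8));
        Stream<String> csvLines = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8)).lines();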

        List<ParseAbleJob> toParse = new LinkedList<>();
        List<ParseAbleJobRaw> rawParse = new ArrayList<>();

        csvLines.forEach(s -> {
            try{
                //expecting exception because primitive double cannot be null/empty, but setter is never even called
                toParse.add(csvMapper.readerFor(ParseAbleJob.class).with(schema).readValue(s));
            }catch(Exception e){
                try{
                    rawParse.add(csvMapper.readerFor(ParseAbleJobRaw.class).with(rawSchema).readValue(s));
                    //all "raw" data (as Strings) can then be validated and returned in a "result"
                    //object that includes one line of raw data and the corresponding "Error" object
                }catch (IOException ioEx){
                    ioEx.printStackTrace();
                }
            }
        });
        // JUnit's assertEquals takes (expected, actual)
        Assert.assertEquals(0, toParse.size());
        Assert.assertEquals(20, rawParse.size());
    }

    @JsonPropertyOrder({ "id", "alter", "geschlecht", "eintrittsDatum", "anforderung", "beruflicheStellung", "ausbildung", "gehalt", "teilzeitFaktor" })
    private static class ParseAbleJob {

        private String id;
        private String alter;
        private String geschlecht;
        private String eintrittsDatum;
        private String anforderung;
        private String beruflicheStellung;
        private String ausbildung;
        private String gehalt;
        private double teilzeitFaktor;

        public String getId() {
            return id;
        }

        public void setId(String id) {
            this.id = id;
        }

        public String getAlter() {
            return alter;
        }

        public void setAlter(String alter) {
            this.alter = alter;
        }

        public String getGeschlecht() {
            return geschlecht;
        }

        public void setGeschlecht(String geschlecht) {
            this.geschlecht = geschlecht;
        }

        public String getEintrittsDatum() {
            return eintrittsDatum;
        }

        public void setEintrittsDatum(String eintrittsDatum) {
            this.eintrittsDatum = eintrittsDatum;
        }

        public String getAnforderung() {
            return anforderung;
        }

        public void setAnforderung(String anforderung) {
            this.anforderung = anforderung;
        }

        public String getBeruflicheStellung() {
            return beruflicheStellung;
        }

        public void setBeruflicheStellung(String beruflicheStellung) {
            this.beruflicheStellung = beruflicheStellung;
        }

        public String getAusbildung() {
            return ausbildung;
        }

        public void setAusbildung(String ausbildung) {
            this.ausbildung = ausbildung;
        }

        public String getGehalt() {
            return gehalt;
        }

        public void setGehalt(String gehalt) {
            this.gehalt = gehalt;
        }

        public double getTeilzeitFaktor() {
            return teilzeitFaktor;
        }

        public void setTeilzeitFaktor(double teilzeitFaktor) {
            this.teilzeitFaktor = teilzeitFaktor;
        }
    }

    @JsonPropertyOrder({ "id", "alter", "geschlecht", "eintrittsDatum", "anforderung", "beruflicheStellung", "ausbildung", "gehalt", "teilzeitFaktor" })
    private static class ParseAbleJobRaw {

        private String id;
        private String alter;
        private String geschlecht;
        private String eintrittsDatum;
        private String anforderung;
        private String beruflicheStellung;
        private String ausbildung;
        private String gehalt;
        private String teilzeitFaktor;

        public String getId() {
            return id;
        }

        public void setId(String id) {
            this.id = id;
        }

        public String getAlter() {
            return alter;
        }

        public void setAlter(String alter) {
            this.alter = alter;
        }

        public String getGeschlecht() {
            return geschlecht;
        }

        public void setGeschlecht(String geschlecht) {
            this.geschlecht = geschlecht;
        }

        public String getEintrittsDatum() {
            return eintrittsDatum;
        }

        public void setEintrittsDatum(String eintrittsDatum) {
            this.eintrittsDatum = eintrittsDatum;
        }

        public String getAnforderung() {
            return anforderung;
        }

        public void setAnforderung(String anforderung) {
            this.anforderung = anforderung;
        }

        public String getBeruflicheStellung() {
            return beruflicheStellung;
        }

        public void setBeruflicheStellung(String beruflicheStellung) {
            this.beruflicheStellung = beruflicheStellung;
        }

        public String getAusbildung() {
            return ausbildung;
        }

        public void setAusbildung(String ausbildung) {
            this.ausbildung = ausbildung;
        }

        public String getGehalt() {
            return gehalt;
        }

        public void setGehalt(String gehalt) {
            this.gehalt = gehalt;
        }

        public String getTeilzeitFaktor() {
            return teilzeitFaktor;
        }

        public void setTeilzeitFaktor(String teilzeitFaktor) {
            this.teilzeitFaktor = teilzeitFaktor;
        }
    }
}

@cowtowncoder
Member

I think #140 is sort of similar (add a feature to throw an exception for missing (trailing) columns).

Injecting additional nulls would be another possible alternative. What I wonder is whether that should be done automatically (without configurability), or controlled by yet another CsvParser.Feature for such fillers.
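A minimal sketch of what such a configurable choice could look like, assuming rows have already been split into cells. `MissingColumnPolicy` and `RowNormalizer` are hypothetical names used only for illustration; they are not the feature that was eventually added to jackson-dataformat-csv, and extra trailing cells are deliberately left alone here.

```java
import java.util.Arrays;

// Hypothetical policy covering the options discussed in this thread:
// skip (current behaviour), fail (#140), or fill with nulls (this issue).
enum MissingColumnPolicy { IGNORE, FAIL, INSERT_NULLS }

final class RowNormalizer {
    private RowNormalizer() {}

    /** Applies the policy to a row that may have fewer cells than the schema expects. */
    static String[] normalize(String[] row, int expectedColumns, MissingColumnPolicy policy) {
        if (row.length >= expectedColumns) {
            return row; // nothing missing; extra cells are not handled here
        }
        switch (policy) {
            case FAIL:
                throw new IllegalStateException(
                    "Expected " + expectedColumns + " columns, found " + row.length);
            case INSERT_NULLS:
                // Arrays.copyOf pads the new trailing slots with null
                return Arrays.copyOf(row, expectedColumns);
            case IGNORE:
            default:
                return row;
        }
    }
}
```

With IGNORE as the default, existing callers of such a helper would see no change; the other two policies are opt-in.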

@cowtowncoder cowtowncoder changed the title Can't parse missing columns as null Inject "missing" trailing columns as nulls instead of skipping Feb 24, 2017
@cowtowncoder
Member

I think a feature is warranted; whether it should be enabled by default or not is another question.

As to "why is the setter not called": Jackson only maps data it sees, not data that is absent. This stems from JSON, where data can theoretically have any keys, and the challenge is to map those to POJOs.
But not all POJO properties necessarily have values in the JSON; null-valued properties are often just omitted for compactness.
Keeping track of the properties for which values have been seen would add processing overhead, without adding much value in the general case.
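The reporter's observation follows from this "map what you see" approach. The plain-Java sketch below binds cells positionally and shows why a column missing from the input never triggers its setter; `PositionalBinder` is a hypothetical class for illustration, not how Jackson is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical positional binder: it walks only the cells actually present
// in the row, so a column absent from the input never invokes its setter.
final class PositionalBinder<T> {
    private final List<BiConsumer<T, String>> setters = new ArrayList<>();

    /** Registers the setter for the next column, in schema order. */
    PositionalBinder<T> column(BiConsumer<T, String> setter) {
        setters.add(setter);
        return this;
    }

    void bind(T target, String[] cells) {
        int n = Math.min(cells.length, setters.size());
        for (int i = 0; i < n; i++) {
            setters.get(i).accept(target, cells[i]);
        }
        // Columns n..setters.size()-1 are simply skipped: no setter call at all.
    }
}
```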

Having said that, some users would prefer the ability to force existence, that is, to define "required" properties. Currently, as of Jackson 2.8, this is possible only for a special kind of property, one that:

  1. have annotation @JsonProperty(required=true) AND
  2. are defined as "Creator properties"; that is, passed via constructor or factory method annotated with @JsonCreator

If so, an exception is thrown if no value is encountered for such a property.
It is possible that this functionality will be expanded in the future, although it probably will not happen for 2.9.
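The bookkeeping this requires (remembering which properties received a value for the current record, then enforcing the required ones) can be sketched with a `BitSet` per record. `RequiredTracker` is a hypothetical illustration of the idea, not Jackson's internal mechanism for `@JsonProperty(required=true)` creator properties.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical per-record tracker: remembers which property indexes received
// a value and reports the required ones that never did.
final class RequiredTracker {
    private final List<String> propertyNames;
    private final BitSet required = new BitSet();
    private final BitSet seen = new BitSet();

    RequiredTracker(List<String> propertyNames) {
        this.propertyNames = propertyNames;
    }

    void markRequired(String name) {
        required.set(indexOf(name));
    }

    /** Called once per value actually found in the input. */
    void markSeen(String name) {
        seen.set(indexOf(name));
    }

    /** Throws if any required property was never seen for this record. */
    void verify() {
        BitSet missing = (BitSet) required.clone();
        missing.andNot(seen);
        if (!missing.isEmpty()) {
            List<String> names = new ArrayList<>();
            for (int i = missing.nextSetBit(0); i >= 0; i = missing.nextSetBit(i + 1)) {
                names.add(propertyNames.get(i));
            }
            throw new IllegalStateException("Missing required properties: " + names);
        }
    }

    /** Clears the "seen" state so the tracker can be reused for the next record. */
    void reset() {
        seen.clear();
    }

    private int indexOf(String name) {
        int i = propertyNames.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown property: " + name);
        }
        return i;
    }
}
```

The extra `BitSet` work on every record is exactly the kind of overhead that is hard to justify for the general case, which is why it makes more sense as an opt-in check.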

Now... since the CSV format is conceptually a little bit different, and its set of properties is more rigid, I think it makes sense to be able to add constraints differently. As such, both failure (see #140) and "auto-fill with null" (this issue) make sense. These features may be seen to overlap with higher-level databinding features, but I don't think that is a big problem: format-level checks have their benefits, including the ability to recover during incremental processing, something that may be more difficult to do at a higher level.
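Recovery during incremental processing might look like the following plain-Java sketch: the column count is validated line by line, rejected lines are recorded together with a reason (similar to the "result" object the reporter describes), and reading continues. `IncrementalRowReader` and `RowError` are hypothetical; the naive `split` ignores CSV quoting and escaping, so this is only an illustration of the control flow.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical row-by-row reader: a format-level column-count check lets a
// bad line be recorded and skipped while processing continues.
final class IncrementalRowReader {
    /** One rejected line plus the reason it was rejected. */
    static final class RowError {
        final int lineNumber;
        final String rawLine;
        final String message;

        RowError(int lineNumber, String rawLine, String message) {
            this.lineNumber = lineNumber;
            this.rawLine = rawLine;
            this.message = message;
        }
    }

    final List<String[]> accepted = new ArrayList<>();
    final List<RowError> rejected = new ArrayList<>();

    void read(String csv, char separator, int expectedColumns) throws IOException {
        String sep = Pattern.quote(String.valueOf(separator));
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                // Limit -1 keeps trailing empty cells, so "a;" has 2 cells, not 1
                String[] cells = line.split(sep, -1);
                if (cells.length != expectedColumns) {
                    rejected.add(new RowError(lineNumber, line,
                        "expected " + expectedColumns + " columns, found " + cells.length));
                    continue; // recover: move on to the next line
                }
                accepted.add(cells);
            }
        }
    }
}
```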

@cowtowncoder cowtowncoder changed the title Inject "missing" trailing columns as nulls instead of skipping Inject "missing" trailing columns as nulls (JsonParser.Feature.INSERT_NULLS_FOR_MISSING_COLUMNS) Mar 16, 2017
@cowtowncoder cowtowncoder changed the title Inject "missing" trailing columns as nulls (JsonParser.Feature.INSERT_NULLS_FOR_MISSING_COLUMNS) Inject "missing" trailing columns as nulls (CsvParser.Feature.INSERT_NULLS_FOR_MISSING_COLUMNS) Sep 14, 2017