Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing columns from header line (compare to CsvSchema) not detected when reordering columns (add CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS) #285

Closed
bjmi opened this issue Aug 23, 2021 · 12 comments
Labels
csv has-failing-test Indicates that there exists a test case (under `failing/`) to reproduce the issue
Milestone

Comments

@bjmi
Copy link

bjmi commented Aug 23, 2021

When reading given CSV with jackson-dataformat-csv 2.11.4

name
Roger
Chris

using following snippet

CsvMapper csvMapper = new CsvMapper();
csvMapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
        .addColumn("name").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person> readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int age;
}

doesn't throw a CsvMappingException although age column is missing in CSV.

@mamos-helloself
Copy link

The same thing is true of the strict headers feature. I really don't care about the order of the columns, or any additional ones, I just care that the ones I want are present.

This should be a common requirement it is a shame it cannot be enforced

@mamos-helloself
Copy link

There is also no way to have an error if a column name is repeated in the header when reordering is in use. It silently chooses one.

@cowtowncoder
Copy link
Member

cowtowncoder commented Nov 24, 2021

I have hoped to look into this for a while now but for past 3 months my paying job has taken over almost all of my coding time, since I do not get paid to work on Jackson (not part of my daytime job).

So, @mamos-helloself PRs would obviously be welcome.

I may be able to at least add the failing test soon now tho.

@cowtowncoder
Copy link
Member

Hmmh. Looking at this now I wonder if the reason is due to "use header" -- header really only has 1 column and if that is to be considered authoritative, there is nothing missing?
So perhaps it goes to precedence/definition of how explicit schema should interact with header line...

I'll have to look into this more and will first add a failing test just to reproduce what is seen here.

cowtowncoder added a commit that referenced this issue Nov 24, 2021
@cowtowncoder cowtowncoder added the has-failing-test Indicates that there exists a test case (under `failing/`) to reproduce the issue label Nov 24, 2021
@cowtowncoder cowtowncoder changed the title FAIL_ON_MISSING_COLUMNS doesn't work with column reordering. FAIL_ON_MISSING_COLUMNS doesn't work with column reordering, header line Nov 24, 2021
@bjmi
Copy link
Author

bjmi commented Nov 25, 2021

Hmmh. Looking at this now I wonder if the reason is due to "use header" -- header really only has 1 column and if that is to be considered authoritative, there is nothing missing?

The one column example was probably to simple.
What about this two column CSV:

name,age
Roger,27
Chris,53
CsvMapper csvMapper = new CsvMapper();
csvMapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
        .addColumn("name").addColumn("weight").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person> readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int weight;
    public int age;
}

doesn't throw a CsvMappingException although weight column is missing in CSV.

@bjmi
Copy link
Author

bjmi commented Nov 25, 2021

Sorry I misunderstood your last comment, but the example is valid.

@bjmi
Copy link
Author

bjmi commented Jan 7, 2022

My daily use case has 3 requirements

Is there a real chance that this issue will be fixed?

@cowtowncoder
Copy link
Member

cowtowncoder commented Jan 8, 2022

Yes, if and when someone has time to look into it. I have been hoping to address it but I am hopelessly swamped at work and helping others with PRs with Jackson, releases all that stuff.
But I do hope to help here in some form.

It is likely that for this to move in near future, a PR would be needed.

On plus side, as per my earlier comment, a failing test is now included in repo (has to be run via IDE, excluded from default test run).

@cowtowncoder
Copy link
Member

cowtowncoder commented Aug 8, 2022

Hmmm. I suspect that the issue is with the actual header line only having one column, and thereby "extra" columns of CsvSchema being discarded. I think it is reasonable to expect either that:

  1. If header line has fewer columns than schema, exception would be thrown OR
  2. If header line has fewer columns the "missing" columns from CsvSchema would simply be appended in the order they were given

I think I'll proceed with (2), although I wonder if some configurability might be needed anyway, to alter behavior.

And yes, I am finally circling back here, hoping to resolve this for 2.14(.0).

EDIT: re-reading full discussion, it does sound like (1) is what is wanted actually.
So I will try doing that instead; this would also make the test I added pass.

@cowtowncoder
Copy link
Member

Ok: changed the behavior to optionally throw an exception if column headers are missing (compared to registered CsvSchema), controller by new

CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS

which defaults to true (that is, by default this problem is reported).

Will be in 2.14.0 release

@cowtowncoder cowtowncoder added this to the 2.14.0 milestone Aug 21, 2022
@cowtowncoder cowtowncoder changed the title FAIL_ON_MISSING_COLUMNS doesn't work with column reordering, header line Missing columns from header line (compare to CsvSchema) not detected when reordering columns (add CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS) Aug 21, 2022
@norrisjeremy
Copy link

norrisjeremy commented Nov 10, 2022

Perhaps I'm misunderstanding something, but I'm not seeing anywhere in which the new FAIL_ON_MISSING_HEADER_COLUMNS feature is actually checked to decide whether this new behavior should occur?

@norrisjeremy
Copy link

I.e., should there be something like this?

diff --git a/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java b/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java
index b7b050e4..c3621ae9 100644
--- a/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java
+++ b/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java
@@ -862,14 +862,16 @@ public class CsvParser
             }
         }
         // [dataformats-text#285]: Are we missing something?
-        int diff = schemaColumnCount - newColumnCount;
-        if (diff > 0) {
-            Set<String> oldColumnNames = new LinkedHashSet<>();
-            _schema.getColumnNames(oldColumnNames);
-            oldColumnNames.removeAll(newSchema.getColumnNames());
-            _reportCsvMappingError(String.format("Missing %d header column%s: [\"%s\"]",
-                    diff, (diff == 1) ? "" : "s",
-                            String.join("\",\"", oldColumnNames)));
+        if (CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS.enabledIn(_formatFeatures)) {
+            int diff = schemaColumnCount - newColumnCount;
+            if (diff > 0) {
+                Set<String> oldColumnNames = new LinkedHashSet<>();
+                _schema.getColumnNames(oldColumnNames);
+                oldColumnNames.removeAll(newSchema.getColumnNames());
+                _reportCsvMappingError(String.format("Missing %d header column%s: [\"%s\"]",
+                        diff, (diff == 1) ? "" : "s",
+                                String.join("\",\"", oldColumnNames)));
+            }
         }
 
         // otherwise we will use what we got

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
csv has-failing-test Indicates that there exists a test case (under `failing/`) to reproduce the issue
Projects
None yet
Development

No branches or pull requests

4 participants