Some basic modifications done in the GetRecord logic - thiscode/CsvHelper GitHub Wiki
The modification here was done in the file "CsvReader.cs". The file includes beyond this, all the other modifications of this fork. If you want to merge only this modification you can use the file "_CsvReader.cs".
To map data of a CsvRow into a class by an expression is a wonderful idea. As reading and understanding how the method CreateReadRecordFunc works, i had to inspect frequently the body of the expression. In my opinion the using of lamda expressions for reference columns and using of parameter makes the whole thing difficult to analyse, debug and to extend. The complexity raises the more reference columns we use and in special with the new features like collection references.
The only reason using lamdba expressions is to pass the ExpressionParameter "reader" or "reader2". And the value of this parameter is always a reference to the own CsvReader-Object. Replacing the parameter with a ConstantExpression make the using of lambda expressions within the "record creating expression" superfluous. This makes debugging easier and
Instead of generating an expression like this (which have to be enclosed in a lambda expression with parameter $reader):
.New CsvHelper.SpeedTests.Program+TestClassWithReference(){
IntColumn = (System.Int32).Call .Constant<CsvHelper.TypeConversion.Int32Converter>(CsvHelper.TypeConversion.Int32Converter).ConvertFromString(
.Constant<System.Globalization.CultureInfo>(de-DE),
.Call $reader.get_Item(0)),
ReferenceColumn = .Call .Constant<System.Func`2[CsvHelper.ICsvReader,CsvHelper.SpeedTests.Program+TestClassReference]>(System.Func`2[CsvHelper.ICsvReader,CsvHelper.SpeedTests.Program+TestClassReference]).Invoke(.Constant<CsvHelper.CsvReaderOrigin>(CsvHelper.CsvReaderOrigin))
}
Inner Lambda for Reference Column (which have to be enclosed in a lambda expression with parameter $reader2):
.New CsvHelper.SpeedTests.Program+TestClassReference(){
RefIntColumn = (System.Int32).Call .Constant<CsvHelper.TypeConversion.Int32Converter>(CsvHelper.TypeConversion.Int32Converter).ConvertFromString(
.Constant<System.Globalization.CultureInfo>(de-DE),
.Call $reader2.get_Item(5))
}
We have now this clean one (which can be enclosed in a lamda expression WITHOUT parameter):
.New CsvHelper.SpeedTests.Program+TestClassWithReference(){
IntColumn = (System.Int32).Call .Constant<CsvHelper.TypeConversion.Int32Converter>(CsvHelper.TypeConversion.Int32Converter).ConvertFromString(
.Constant<System.Globalization.CultureInfo>(de-DE),
.Call .Constant<CsvHelper.CsvReader>(CsvHelper.CsvReader).get_Item(0)),
ReferenceColumn = .New CsvHelper.SpeedTests.Program+TestClassReference(){
RefIntColumn = (System.Int32).Call .Constant<CsvHelper.TypeConversion.Int32Converter>(CsvHelper.TypeConversion.Int32Converter).ConvertFromString(
.Constant<System.Globalization.CultureInfo>(de-DE),
.Call .Constant<CsvHelper.CsvReader>(CsvHelper.CsvReader).get_Item(5))
}
}
I leaved the origin CsvReader within the project under the name "CsvReaderOrigin". I have added a speed test to compare the two methods. The new parameterless expression was 3 seconds faster if reading 10.000.000 records. I do not think it is a relevant speed improvement, but also the new method it is not slower.
How the parameter "reader" was eliminated:
In the method "AddPropertyBindings" you use a ParameterExpression to call the Item-Get-Method to retrieve the value for a property. Instead of using a parameter, we use a constant expression representing the current CsvReader directly:
// Get the field using the field index.
var method = typeof( ICsvReaderRow ).GetProperty( "Item", typeof( string ), new[] { typeof( int ) } ).GetGetMethod();
Expression fieldExpression = Expression.Call( readerParameter, method, Expression.Constant( index, typeof( int ) ) );
Changed to:
// Get the field using the field index.
var method = typeof( ICsvReaderRow ).GetProperty( "Item", typeof( string ), new[] { typeof( int ) } ).GetGetMethod();
Expression fieldExpression = Expression.Call( Expression.Constant(this), method, Expression.Constant( index, typeof( int ) ) );
Now the Method do not use the argument "readerParameter" anymore:
protected virtual void AddPropertyBindings( ParameterExpression readerParameter, CsvPropertyMapCollection properties, List<MemberBinding> bindings )
Changed to:
protected virtual void AddPropertyBindings( CsvPropertyMapCollection properties, List<MemberBinding> bindings )
And the both call-statements was changed of course, too. Now we have a look on the building loop for reference bindings:
var referenceReaderParameter = Expression.Parameter( typeof( ICsvReader ), "reader2" );
var referenceBindings = new List<MemberBinding>();
AddPropertyBindings( referenceMap.ReferenceProperties, referenceBindings );
var referenceBody = Expression.MemberInit( Expression.New( referenceMap.Property.PropertyType ), referenceBindings );
var referenceFunc = Expression.Lambda( referenceBody, referenceReaderParameter );
var referenceCompiled = referenceFunc.Compile();
var referenceCompiledMethod = referenceCompiled.GetType().GetMethod( "Invoke" );
Expression referenceObjectExpression = Expression.Call( Expression.Constant( referenceCompiled ), referenceCompiledMethod, Expression.Constant( this ) );
bindings.Add( Expression.Bind( referenceMap.Property, referenceObjectExpression ) );
Here you use a MemberInitExpression to instantiate the reference, which you enclose within a LambdaExpression. This expression is called using the ParameterExpression that is not needed anymore. Even if it was necessary, i recommend not to enclose every single reference-binding into a LambdaExpression. I would enclose the whole expression in one big LambdaExpression at the end.
Changed to:
var referenceBindings = new List<MemberBinding>();
AddPropertyBindings( referenceMap.ReferenceProperties, referenceBindings );
var referenceBody = Expression.MemberInit( Expression.New( referenceMap.Property.PropertyType ), referenceBindings );
bindings.Add(Expression.Bind(referenceMap.Property, referenceBody));
Now we do not use the ParameterExpression "reader" nowhere. So we can change the argument "expressionCompiler" of the method "CreateReadRecordFunc":
protected virtual void CreateReadRecordFunc( Type recordType, Func<Expression, ParameterExpression, Delegate> expressionCompiler )
Changed to:
protected virtual void CreateReadRecordFunc( Type recordType, Func<Expression, Delegate> expressionCompiler )
And all the references to this, of course. These are:
- Assignment in method "CreateReadRecordFunc" (var func)
- Method calls and return values in both methods "GetReadRecordFunc"
- Method calls of "GetReadRecordFunc" within all methods "GetRecords"
Last but not least we eliminate the "var readerParameter" within the method "CreateReadRecordFunc".