MergeJoin - lobodava/artisan-orm GitHub Wiki
MergeJoin is an in-memory helper for stitching together flat result sets into a parent / child object graph. It walks the master and detail enumerables in lock-step a single time — O(N + M) total work, zero extra allocations — and calls your delegate for each linked pair.
Pairs naturally with ReadToLists for assembling graphs from a single round-trip.
You query parent + child tables in one round trip:
select * from dbo.GrandRecords order by Id;
select * from dbo.Records order by GrandRecordId;You get back two flat lists. You want them wired up so each parent has its Records collection populated and each child has its GrandRecord back-reference set. The natural LINQ approach allocates a Dictionary keyed by the parent id and probes it once per child:
var byParent = grandRecords.ToDictionary(gr => gr.Id);
foreach (var r in records)
{
var gr = byParent[r.GrandRecordId];
r.GrandRecord = gr;
gr.Records ??= new List<Record>();
gr.Records.Add(r);
}MergeJoin does the same wiring without the Dictionary — it walks both lists in parallel and matches consecutive runs:
grandRecords.MergeJoin(
gr => gr.Records ??= new List<Record>(),
records,
isMasterDetailLink: (gr, r) => gr.Id == r.GrandRecordId,
joinedMasterDetailAction: (gr, r) => { r.GrandRecord = gr; gr.Records.Add(r); });Both lists must be sorted by the same join key, in the same direction.
This is typically guaranteed by adding the matching order by clause to the SQL that produced the detail list. If the detail list is not sorted, MergeJoin will silently miss matches — any detail whose master appears later than the cursor's current position is skipped forever.
This is a real bug-foot-gun. Treat the order by in your stored procedure as part of the API contract.
public static void MergeJoin<TMaster, TDetail>(
this IEnumerable<TMaster> masterList,
Action<TMaster>? eachMasterAction, // optional per-master init
IEnumerable<TDetail> detailList,
Func<TMaster, TDetail, bool> isMasterDetailLink,
Action<TMaster, TDetail> joinedMasterDetailAction);The eachMasterAction is called before any details are joined to that master — useful for allocating the children collection (gr.Records ??= new List<Record>()) so masters with zero details still have an empty (not null) list.
When a master has two unrelated child collections — say, an order has both line items and payments:
grandRecords.MergeJoin(
records,
(gr, r) => gr.Id == r.GrandRecordId,
(gr, r) => { r.GrandRecord = gr; gr.Records .Add(r); },
payments,
(gr, p) => gr.Id == p.GrandRecordId,
(gr, p) => { p.GrandRecord = gr; gr.Payments.Add(p); });Single pass over the masters; each master consumes its details from list 1, then from list 2. Both detail lists must be sorted by the master FK.
For hierarchies like GrandRecord → Record → ChildRecord:
var (grandRecords, records, childRecords) =
repo.ReadToLists<GrandRecord, Record, ChildRecord>("dbo.GetThreeLevelGraph");
grandRecords.MergeJoin(
records,
(gr, r) => gr.Id == r.GrandRecordId,
(gr, r) => { r.GrandRecord = gr; gr.Records.Add(r); },
childRecords,
(r, cr) => r.Id == cr.RecordId, // note: keyed on Record, not GrandRecord
(r, cr) => { cr.Record = r; r.ChildRecords.Add(cr); });Required ordering:
-
recordssorted byGrandRecordId(so they walk in lock-step with the masters) -
childRecordssorted byRecordId(so they walk in lock-step with the records)
| Aspect | MergeJoin |
LINQ GroupJoin
|
|---|---|---|
| Time complexity | O(N + M) | O(N + M) average, hash collisions worst |
| Memory overhead | None | Hash dictionary, one entry per master |
| Precondition | Both lists sorted by the join key | None |
| API ergonomics | Imperative — wires graphs in one expression | Declarative — returns IEnumerable to materialise |
If your details are already coming back from SQL with an explicit order by, MergeJoin is strictly cheaper. If they aren't and you don't want to control sort order, GroupJoin is fine and the dictionary cost is rarely the bottleneck.
-
Forgot
order byin the SQL. Symptoms: details silently dropped, parent collections shorter than expected, no exception. Fix: always sort the detail SELECT by the foreign key it joins on. -
Sorted by the wrong column. If
Recordsare sorted byIdinstead ofGrandRecordId, the algorithm scans the detail list from the start for each master and never matches anything past the first run. Same symptom as above. -
Different direction. Both lists must be sorted in the same direction (both ascending, typically). Mixing ASC and DESC will silently miss matches.
-
Forgot
eachMasterActionon the empty-children case. A master with zero matching details will keep itsChildrenproperty atnull(or whatever initial value the mapper assigned). UseeachMasterActionto initialise the collection unconditionally.
public IList<GrandRecord> GetAll()
{
return GetByCommand(cmd =>
{
cmd.UseProcedure("dbo.GetGrandRecordsWithEverything");
// Procedure returns:
// select * from GrandRecords order by Id;
// select * from Records order by GrandRecordId;
// select * from ChildRecords order by RecordId;
var (grandRecords, records, childRecords) =
cmd.ReadToLists<GrandRecord, Record, ChildRecord>();
grandRecords.MergeJoin(
gr => gr.Records ??= new List<Record>(),
records,
(gr, r) => gr.Id == r.GrandRecordId,
(gr, r) => { r.GrandRecord = gr; gr.Records.Add(r); },
childRecords,
(r, cr) => r.Id == cr.RecordId,
(r, cr) => {
r.ChildRecords ??= new List<ChildRecord>();
cr.Record = r;
r.ChildRecords.Add(cr);
});
return grandRecords;
});
}One round trip, three result sets, three lists, fully linked object graph.
See also:
- ReadToLists — multi-result-set tuples
- Negative identities and object graph saving — the inverse direction (graph -> tables)
- What's New in v4