MergeJoin - lobodava/artisan-orm GitHub Wiki

MergeJoin is an in-memory helper for stitching together flat result sets into a parent / child object graph. It walks the master and detail enumerables in lock-step a single time — O(N + M) total work, zero extra allocations — and calls your delegate for each linked pair.

Pairs naturally with ReadToLists for assembling graphs from a single round-trip.

What problem it solves

You query parent + child tables in one round trip:

select * from dbo.GrandRecords order by Id;
select * from dbo.Records      order by GrandRecordId;

You get back two flat lists. You want them wired up so each parent has its Records collection populated and each child has its GrandRecord back-reference set. The natural LINQ approach allocates a Dictionary keyed by the parent id and probes it once per child:

var byParent = grandRecords.ToDictionary(gr => gr.Id);
foreach (var r in records)
{
    var gr = byParent[r.GrandRecordId];
    r.GrandRecord = gr;
    gr.Records ??= new List<Record>();
    gr.Records.Add(r);
}

MergeJoin does the same wiring without the Dictionary — it walks both lists in parallel and matches consecutive runs:

grandRecords.MergeJoin(
    gr => gr.Records ??= new List<Record>(),
    records,
    isMasterDetailLink:        (gr, r) => gr.Id == r.GrandRecordId,
    joinedMasterDetailAction:  (gr, r) => { r.GrandRecord = gr; gr.Records.Add(r); });

Critical precondition: sorted inputs

Both lists must be sorted by the same join key, in the same direction.

This is typically guaranteed by adding the matching order by clause to the SQL that produced the detail list. If the detail list is not sorted, MergeJoin will silently miss matches — any detail whose master appears later than the cursor's current position is skipped forever.

This is a real bug-foot-gun. Treat the order by in your stored procedure as part of the API contract.

Three overloads

1. Master + Detail

public static void MergeJoin<TMaster, TDetail>(
    this IEnumerable<TMaster> masterList,
    Action<TMaster>?          eachMasterAction,    // optional per-master init
    IEnumerable<TDetail>      detailList,
    Func<TMaster, TDetail, bool> isMasterDetailLink,
    Action<TMaster, TDetail>     joinedMasterDetailAction);

The eachMasterAction is called before any details are joined to that master — useful for allocating the children collection (gr.Records ??= new List<Record>()) so masters with zero details still have an empty (not null) list.

2. Master + two parallel Details

When a master has two unrelated child collections — say, an order has both line items and payments:

grandRecords.MergeJoin(
    records,
    (gr, r) => gr.Id == r.GrandRecordId,
    (gr, r) => { r.GrandRecord = gr; gr.Records .Add(r); },

    payments,
    (gr, p) => gr.Id == p.GrandRecordId,
    (gr, p) => { p.GrandRecord = gr; gr.Payments.Add(p); });

Single pass over the masters; each master consumes its details from list 1, then from list 2. Both detail lists must be sorted by the master FK.

3. Master + Detail + SubDetail (three levels nested)

For hierarchies like GrandRecord → Record → ChildRecord:

var (grandRecords, records, childRecords) =
    repo.ReadToLists<GrandRecord, Record, ChildRecord>("dbo.GetThreeLevelGraph");

grandRecords.MergeJoin(
    records,
    (gr, r) => gr.Id == r.GrandRecordId,
    (gr, r) => { r.GrandRecord = gr; gr.Records.Add(r); },

    childRecords,
    (r, cr) => r.Id == cr.RecordId,         // note: keyed on Record, not GrandRecord
    (r, cr) => { cr.Record = r; r.ChildRecords.Add(cr); });

Required ordering:

  • records sorted by GrandRecordId (so they walk in lock-step with the masters)
  • childRecords sorted by RecordId (so they walk in lock-step with the records)

When to choose this vs LINQ GroupJoin

Aspect MergeJoin LINQ GroupJoin
Time complexity O(N + M) O(N + M) average, hash collisions worst
Memory overhead None Hash dictionary, one entry per master
Precondition Both lists sorted by the join key None
API ergonomics Imperative — wires graphs in one expression Declarative — returns IEnumerable to materialise

If your details are already coming back from SQL with an explicit order by, MergeJoin is strictly cheaper. If they aren't and you don't want to control sort order, GroupJoin is fine and the dictionary cost is rarely the bottleneck.

Common pitfalls

  • Forgot order by in the SQL. Symptoms: details silently dropped, parent collections shorter than expected, no exception. Fix: always sort the detail SELECT by the foreign key it joins on.

  • Sorted by the wrong column. If Records are sorted by Id instead of GrandRecordId, the algorithm scans the detail list from the start for each master and never matches anything past the first run. Same symptom as above.

  • Different direction. Both lists must be sorted in the same direction (both ascending, typically). Mixing ASC and DESC will silently miss matches.

  • Forgot eachMasterAction on the empty-children case. A master with zero matching details will keep its Children property at null (or whatever initial value the mapper assigned). Use eachMasterAction to initialise the collection unconditionally.

Full end-to-end example

public IList<GrandRecord> GetAll()
{
    return GetByCommand(cmd =>
    {
        cmd.UseProcedure("dbo.GetGrandRecordsWithEverything");
        // Procedure returns:
        //   select * from GrandRecords order by Id;
        //   select * from Records      order by GrandRecordId;
        //   select * from ChildRecords order by RecordId;

        var (grandRecords, records, childRecords) =
            cmd.ReadToLists<GrandRecord, Record, ChildRecord>();

        grandRecords.MergeJoin(
            gr => gr.Records ??= new List<Record>(),

            records,
            (gr, r) => gr.Id == r.GrandRecordId,
            (gr, r) => { r.GrandRecord = gr; gr.Records.Add(r); },

            childRecords,
            (r, cr) => r.Id == cr.RecordId,
            (r, cr) => {
                r.ChildRecords ??= new List<ChildRecord>();
                cr.Record = r;
                r.ChildRecords.Add(cr);
            });

        return grandRecords;
    });
}

One round trip, three result sets, three lists, fully linked object graph.


See also:

⚠️ **GitHub.com Fallback** ⚠️