Linq joining to in-memory collections is bad

I found an interesting bug this morning, and figured I’d better write about it so that maybe I’ll remember to avoid it in the future.  We have a Linq query that involves a couple Joins.  The main table we’re interested in is already provided for us by a "FilteredNotes" property like this:

IQueryable<Note> notes = DataContext.Notes;
if (!canViewAllNoteTypes)
{
    if (canViewLegalNoteTypes)
        return DataContext.Notes.Where(n => legalCategoryIds.Contains(n.CategoryId));
    else
        return new List<Note>().AsQueryable();
}
...

The results of this pre-filtered set of data are then joined to two other tables to provide some additional filtering like so:

IQueryable<NoteSummary> results =
    from d in this.FilteredNotes
    join c in this.DataContext.Claims on d.ParentId equals c.ClaimId
    join e in this.DataContext.Enrollees on c.EnrolleeId equals e.EnrolleeId
    where d.ResponsiblePersonId == userId &&
          d.CompletedDate == null &&
          d.ScheduledDate >= startDate &&
          d.ScheduledDate <= endDate
    orderby d.ScheduledDate ascending
    select new NoteSummary
               {
                   ...
}; return results.ToList();

This code has been working great, until I started testing a new security role type.  When I am under the guise of this new type, the query would run for a long time, and then explode with an OutOfMemoryException.  The problem is actually really simple once you see it.  In that first section of code, the author wanted to return an empty list to join against, since the user wasn’t authorized to view any of the categories of notes that we’re interested in here.  The trouble is the way the empty list was created.  Creating a new list with nothing in it, and then calling AsQueryable() generated an empty IQueryable, but it did it in memory.  Since this is the main list to which we’re joining the others, Linq wants to pull them ALL into memory and do the join there.  Normally you might not notice, but since this database has millions of rows in each of the tables we’re joining to, it tends to make things bog down a bit.

The solution is to create our empty list, but to do it in a different way.  We want an IQueryable of Notes that will become part of a single Linq query sent to the database.  We can solve this by changing the FilteredNotes property to retrieve the IQueryable<Note> in a way that defers its execution, but still guarantees it will be blank:

IQueryable<Note> notes = DataContext.Notes;
if (!canViewAllNoteTypes)
{
    if (canViewLegalNoteTypes)
        return DataContext.Notes.Where(n => legalCategoryIds.Contains(n.CategoryId));
    else
        return DataContext.Notes.Where(n => false);
}
...

There, by changing the condition to be simple "false", the retrieval of the Notes can now be part of one big query, but we know that it’ll be empty when that happens.  Problem solved.

Technorati Tags: ,
Advertisement
This entry was posted in Computers and Internet. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s