Entity Framework 4 – Using Eager Loading

May 24, 2009

When LINQ to SQL was released, we were told that it supported eager loading.
That was a bit misleading: it did let us fetch the data we wanted up front, but it did so by issuing one database query per object in the result set.
That is, one query per collection per object, which is a complete performance nightmare (ripple loading).

Now in Entity Framework 4, we can actually do true eager loading.
EF4 will issue a single query that fetches the data for all the objects in a graph.
This has been possible in other mappers for a long time, but I still think it is awesome that Microsoft has finally listened to the community and created a framework that, from what I’ve seen so far, does exactly what we want.
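To make the difference in round trips concrete, here is a minimal sketch that does not touch EF or LINQ to SQL at all. FakeDatabase, LoadingDemo and all of their members are hypothetical stand-ins that only count how many queries each loading strategy would issue; the real mappers of course generate SQL instead.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database: each public method call counts as one query.
public class FakeDatabase
{
    public int QueryCount { get; private set; }

    private readonly Dictionary<int, List<string>> detailsByOrderId =
        new Dictionary<int, List<string>>
        {
            { 1, new List<string> { "Keyboard", "Mouse" } },
            { 2, new List<string> { "Monitor" } },
            { 3, new List<string> { "Cable", "Adapter" } }
        };

    // One query that returns only the orders, without their details.
    public List<int> LoadOrderIds()
    {
        QueryCount++;
        return detailsByOrderId.Keys.ToList();
    }

    // One query per order to fetch its details collection.
    public List<string> LoadDetails(int orderId)
    {
        QueryCount++;
        return detailsByOrderId[orderId];
    }

    // One query that returns every order together with its details.
    public Dictionary<int, List<string>> LoadOrdersWithDetails()
    {
        QueryCount++;
        return new Dictionary<int, List<string>>(detailsByOrderId);
    }
}

public static class LoadingDemo
{
    // Ripple loading: 1 query for the orders + 1 query per order for its details.
    public static int RippleLoad(FakeDatabase db)
    {
        foreach (int orderId in db.LoadOrderIds())
        {
            db.LoadDetails(orderId);
        }
        return db.QueryCount;
    }

    // Eager loading: the whole graph is fetched in a single query.
    public static int EagerLoad(FakeDatabase db)
    {
        db.LoadOrdersWithDetails();
        return db.QueryCount;
    }
}
```

With the three orders in this sketch, LoadingDemo.RippleLoad(new FakeDatabase()) returns 4 and LoadingDemo.EagerLoad(new FakeDatabase()) returns 1. The ripple count grows with every extra order and every extra collection, while the eager count stays at one.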

So how do you use eager loading in EF4?

Eager loading is activated by calling Include on an ObjectSet<T>, e.g. ObjectSet<T>.Include("Details.Product"), where the argument is a dot-separated property path.
You can also call Include multiple times if you want to load different paths in the same query.

There are also a few attempts out in the blog world to try to make it easier to deal with eager loading, e.g. by trying to remove the untyped string and use lambda expressions instead.

I personally don’t like the lambda approach, since you can’t traverse a collection property that way. Take “Orders.Details.Product”: there is no way to write that as a short and simple lambda.
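To show what the lambda route costs in practice, here is a minimal sketch of the usual workaround: the lambda steps into each collection with First(), and a helper flattens it back into the dotted string that Include expects. IncludePath, For and the four model classes are hypothetical names made up for this illustration; only System.Linq.Expressions is used, nothing from EF itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical model classes, for illustration only.
public class Product { }
public class Detail { public Product Product { get; set; } }
public class Order { public List<Detail> Details { get; set; } }
public class Customer { public List<Order> Orders { get; set; } }

public static class IncludePath
{
    // Flattens a member-access lambda into a dotted property path.
    // First() calls are skipped: they only exist to step into a collection element.
    public static string For<T>(Expression<Func<T, object>> selector)
    {
        var parts = new Stack<string>();
        Expression node = selector.Body;

        // Walk from the outermost member back towards the lambda parameter.
        while (!(node is ParameterExpression))
        {
            var member = node as MemberExpression;
            if (member != null)
            {
                parts.Push(member.Member.Name);
                node = member.Expression;
                continue;
            }

            var call = node as MethodCallExpression;
            if (call != null && call.Method.Name == "First" && call.Arguments.Count == 1)
            {
                // Extension method call: the source collection is argument 0.
                node = call.Arguments[0];
                continue;
            }

            var unary = node as UnaryExpression;
            if (unary != null && unary.NodeType == ExpressionType.Convert)
            {
                // Conversion to object inserted by the compiler; look through it.
                node = unary.Operand;
                continue;
            }

            // Anything else (constants, Sum(), ToString(), ...) is rejected at run time.
            throw new ArgumentException("Unsupported expression: " + node);
        }

        // The stack enumerates in reverse push order, i.e. root property first.
        return string.Join(".", parts.ToArray());
    }
}
```

Under these assumptions, IncludePath.For<Customer>(c => c.Orders.First().Details.First().Product) yields "Orders.Details.Product". The property names are now checked by the compiler, but the lambda is longer than the string it replaces, and an unsupported expression is still only caught at run time.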

My own take on this is to use extension methods instead.
I always use eager loading on my aggregates, so I want a simple way to tell my EF context to add the load spans for my aggregates when I issue a query.
(Aggregates are about consistency, and lazy loading causes consistency issues within the aggregate, so I try to avoid it.)

Here is how I create my extension methods for loading complete aggregates:

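using System.Data.Objects; // ObjectSet<T> and ObjectQuery<T> (EF4)

public static class ContextExtensions
{
  // Adds the load spans that make up the complete Order aggregate.
  public static ObjectQuery<Order>
           AsOrderAggregate(this ObjectSet<Order> self)
  {
    return self
        .Include("Details.ProductSnapshot") // each detail and its product snapshot
        .Include("CustomerSnapshot");       // the customer snapshot on the order
  }
}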

This makes it possible to use the load spans directly on my context without adding anything special to the context itself.
(You can of course add this very same method inside your context if you want; I simply like small interfaces that I can extend from the outside.)

This way, you can now issue a query using load spans like this:

var orders = from order in context.OrderSet.AsOrderAggregate()
             select order;

And if you want to make a projection query you can simply drop the “AsOrderAggregate” and fetch what you want.

HTH.

//Roger

Published by Roger Johansson


15 Comments

wim says:

May 24, 2009 at 5:54 pm

hey,

First of all, I haven’t tried using this, so I don’t know how ‘hard’ it is to replace the string with a lambda. But I think lambdas are the better approach for one simple reason: compile-time evaluation of the expression. Spelling mistakes or refactoring of properties can easily lead to run-time errors with strings.
I like the idea of using extension methods to attach more includes, though. That makes it a lot easier to “bulk” load objects!

Fred Morrison says:

May 26, 2009 at 12:53 pm

Your example uses too many “magic strings” (no compile-time checking) in the form of “Details.ProductSnapshot” and “CustomerSnapshot”. Until EF4 supports strong typing for eager loading, I will ignore it. Please tell me I’m wrong with another example.

Roger Alsing says:

May 26, 2009 at 1:43 pm

Compile time checks are always nicer than magic strings.
But due to how lambdas work, you CAN’T write a short expression which means the same as “Details.ProductSnapshot”.

Simply because ProductSnapshot is not a property of the Details collection; it is a property of the _content_ of the Details collection.

So that would be an invalid C# expression.
And you would have to resort to something like:

.Include(o => o.Details).And(d => d.ProductSnapshot);

That would work, but IMO, it’s ugly like crap and a lot worse to write than the short untyped string.

Also, magic strings that cause problems should be found in unit tests.
So as long as you actually have tests, strings are all OK.

Also, you should really think about the options here.

1) Use eager loading and get consistent data

2) Use lazy loading and get data inconsistency due to the fact that data is loaded at different points of time.

Lazy Loading is evil and should only be used if you really have to or don’t care about data correctness.

Fred Morrison says:

May 26, 2009 at 2:15 pm

Lazy loading may be appropriate for systems where the security level of the user is such that all data is going to be updatable *AND* the data is deeply structured in the form of a tree. Betting that not all leaves of every node in a tree structure that appears on a web page will be visited by the user is a performance trade-off bet I’d be willing to make in exchange for a faster/better user experience. Throw in some intelligent (local as well as server) data (not just query) caching and I might be persuaded to avoid lazy loading.

Roger Alsing says:

May 26, 2009 at 3:51 pm

>>Betting that not all leaves of every node in a tree
>>structure that appears on a web page will be visited by the user

You can easily deal with this by only eager loading the nodes that you _know_ that you want to touch and leave all other paths unloaded.

I also think that “explicit” lazy load is OK.
At least the developer is aware that something is going on there.

“Implicit” Lazy Load is dangerous because it hides explicit behavior which can lead to data consistency problems and severe performance issues.

I say “can” because if you know enough about LL, then it is not much of a problem.
But many, many developers are far too unaware of the inner workings of O/R mappers, and thus don’t know how to deal with LL.

I’ve seen systems where a single web page issues approx. 1000 DB requests just because the developers didn’t know how to reason about LL.

So what I’m saying is that it is not so much LL itself that is the problem, but the lack of knowledge around the topic in general.

Fred Morrison says:

May 26, 2009 at 4:18 pm

100% agree with the danger of implicit lazy load.

Stijn says:

July 14, 2009 at 6:10 pm

There is already a good way of using lambdas for eager loading on collections without losing readability:

var query = from p
            in context.ProductSet
                .Include(p => p.PriceHistory)
                .Include(p => p.Suppliers.First().Address)
            where p.SalePrice > 1000
            select p;

http://www.codetuning.net/blog/post/Entity-Framework-compile-safe-Includes.aspx

The only difference is that you have to use First() on collections. If your coverage is not 100%, which is the case in most projects, I would definitely recommend lambdas over strings. The downside is that it uses reflection, so if you care about “micromilliseconds” you should not use this.

But nonetheless, if people prefer strings this is a very maintainable method.

Roger Alsing says:

July 15, 2009 at 9:57 pm

The problem with the above solution is that it is only type safe, but not “intention” safe.

There are no compile-time checks for what you pass into a lambda expression.

eg.
I could just as well do:

.Include(p => "yeehaa")
.Include(p => p.Suppliers.Sum().ToString())

etc.

There is _nothing_ that prevents the developer from putting crap inside those statements.
The only thing it guarantees at compile time is that it is a valid expression.

Stijn says:

July 15, 2009 at 10:34 pm

Definitely true, but there is a guarantee, if used correctly, that future changes to your EDM will always keep your queries working. String-based includes will never give you that guarantee.

Marco says:

August 15, 2009 at 9:36 pm

For what it’s worth, I have been using a hybrid: static readonly strings that get their value from a lambda turned into a string. Good performance (only have the reflection overhead once), I keep programmers from ‘putting crap in there left and right’ and the actual path (lambda or string) only has to be maintained in one spot: if it changes, only one spot to fix it. Kind of comes down to “do whatever works for you”.

bdgbd says:

August 17, 2009 at 2:39 pm

Why is this in the context of EF4? Current version of EF already has .Include()

Roger Alsing says:

August 21, 2009 at 8:51 am

@Marco, sounds interesting, got any public samples of that?

Do you simply do something like:

public static readonly string SomePath = GetPath<Order>(o => o.Details);

or how is it done?

M.W. says:

January 8, 2010 at 2:16 pm

I created an extension method to traverse multiple paths – going from a multiple – to – one set does need something else but this method allows you to pass in multiple properties in a lambda expression:

http://howdoinetmw.blogspot.com/2009/12/how-do-i-create-type-safe-includes-with.html

Sagan Internet Marketing says:

June 13, 2013 at 5:50 am

Just an FYI, to load collections you can do:

var query = from p
            in context.ProductSet
                .Include(p => p.PriceHistory)
                .Include(p => p.Suppliers.Select(x => x.Phones.Select(y => y.Mobiles.Select(z => z.Cell))))
            where p.SalePrice > 1000
            select p;

In any case, I’ve worked extensively with EntityFramework and NHibernate and, God willing, will never again do another one in Entity Framework. It is horrific compared to NHibernate.
