Wednesday, February 22, 2012

Entity Framework Code First - Relationship between Entities III [Loading Related Entities]

This post is part of our series of discussions about Entity Framework Code First. As an example, we have been building on top of Institute Entities. There are two things which you might have noticed in the example that we have been following. Let me bring them up:
  1. Using virtual with associated entities in an entity's definition, e.g. for Student entity:
    public virtual ICollection<Course> StudentCourses { get; set; }
    public virtual Vehicle VehicleDetails { get; set; }
    public virtual Department StudentDepartment { get; set; }

  2. Using Include in the iterator for loading courses as follows:
    foreach (var course in instituteEntites.Courses
                                .Include("CourseStudents")
                                .Include("CourseStudents.VehicleDetails")
                                .Include("CourseStudents.StudentDepartment")
                                .Include("CourseOfferedAt"))
Both of these are related to this discussion.

Why different data loading mechanisms for related entities?
First of all, we need to understand why we need these different mechanisms for loading data. The reason is that we want optimized database access, avoiding unnecessary round trips and keeping down the amount of data loaded. We must remember that the database resides in an outside system, with a network in between. Accessing it is an I/O operation, which is costly. Entity Framework must translate the relevant operations on entities to SQL statements, and these loading options guide the framework to generate more efficient SQL.

The loading options tell Entity Framework what to do about related entities whenever an entity is loaded. Suppose A is an entity which has an EntityReference or an EntityCollection of another entity B. The options direct the EF API on whether the related B's data should be loaded when A is loaded. If we need to load just A's data, and load B's data only when it is first accessed, then we need Lazy loading. If we know that we would always need the related B's data when A is loaded, then we need Eager loading. On the other hand, if we don't want the related B's data loaded with A, and instead request it separately when required, then it comes under Explicit loading. Since EF Code First is mostly convention based, our POCO entities identify these options using some conventions.

Types of Loading
Based on the above discussion, it is apparent that there are three ways in which related entities can be loaded when a source entity is loaded. They are as follows:
  1. Explicit
  2. Lazy loading
  3. Eager loading

No Implicit Eager Loading
It must be remembered that turning off lazy loading does not mean implicit eager loading. There is nothing like implicit eager loading of entities in Entity Framework, except for complex types. This is because all entities might be related to each other in some way or the other [Foreign key relationships in Databases]. If EF allowed implicit eager loading, then loading one entity could mean loading the whole database, which might be very costly. And since it would be implicit, the developers would have no idea what is going on behind the scenes unless SQL profiler is hooked up.

Lazy loading & Better Performance
The converse is also not true, i.e. lazy loading might or might not result in better performance. If we don't need the data of related entities, then lazy loading saves us from querying additional data from the database. But if we do need the navigational data, then there would be an additional query to the database each time a navigational property is first accessed. These extra round trips might be unnecessary.
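
The trade-off can be made concrete by counting round trips. The following is only a simulation, not EF code: FakeDatabase, LoadingCost and their members are hypothetical names, and the in-memory dictionary stands in for the Departments and Students tables. It assumes that a lazy navigation property costs one query per parent entity, and that an eager load brings everything in one query.

```csharp
using System.Collections.Generic;
using System.Linq;

// A stand-in for the database that only counts round trips.
// FakeDatabase and its members are hypothetical; they are not EF APIs.
public class FakeDatabase
{
    private readonly Dictionary<string, List<string>> studentsByDepartment =
        new Dictionary<string, List<string>>
        {
            { "Physics", new List<string> { "Ann", "Bob" } },
            { "Chemistry", new List<string> { "Cid" } },
            { "Biology", new List<string>() }
        };

    public int RoundTrips { get; private set; }

    // One query returning only the department names.
    public List<string> LoadDepartments()
    {
        RoundTrips++;
        return studentsByDepartment.Keys.ToList();
    }

    // One query per department, as a lazy navigation property would issue.
    public List<string> LoadStudents(string department)
    {
        RoundTrips++;
        return studentsByDepartment[department].ToList();
    }

    // One combined query, standing in for an eager load.
    public Dictionary<string, List<string>> LoadDepartmentsWithStudents()
    {
        RoundTrips++;
        return studentsByDepartment.ToDictionary(p => p.Key, p => p.Value.ToList());
    }
}

public static class LoadingCost
{
    // Lazy style: 1 query for the departments + 1 per department touched.
    public static int LazyRoundTrips()
    {
        var db = new FakeDatabase();
        foreach (var department in db.LoadDepartments())
        {
            db.LoadStudents(department);
        }
        return db.RoundTrips;
    }

    // Eager style: everything arrives in a single round trip.
    public static int EagerRoundTrips()
    {
        var db = new FakeDatabase();
        db.LoadDepartmentsWithStudents();
        return db.RoundTrips;
    }
}
```

With three departments, LoadingCost.LazyRoundTrips() returns 4 and LoadingCost.EagerRoundTrips() returns 1. If the students were never touched, the lazy style would have cost a single round trip and moved less data.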

Proxy Is Entity Decorator
These types of loading are enabled in Entity Framework by introducing proxies. A proxy is an implementation of the Decorator design pattern that provides additional capabilities to user defined entities. The proxy and the entity do not share a common abstraction; instead, the proxy extends the entity. It inherits from the same POCO entity and creates a field for the POCO type it is decorating. Then it decorates it by adding some features. One such feature is the loading behavior of related entities. Entity Framework is designed with the idea of convention over configuration, so defining our POCO entities in a certain way is basically communicating to Entity Framework how we expect the properties to behave in the proxy.

A public non-sealed navigation property defined as virtual in POCO entity is telling the Entity framework to enable lazy loading for the specified relationship entity. The relationship entity is not loaded when the main entity is loaded.
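
To see why the property has to be virtual, here is a hand-written sketch of the idea. It is not the class EF generates: StudentProxy, the loader delegate and LoadCount are hypothetical names used only for illustration, and Student and Vehicle are trimmed down to what the sketch needs.

```csharp
using System;

public class Vehicle
{
    public string Model { get; set; }
}

// The POCO entity: no EF base class, no EF attributes.
public class Student
{
    public string FirstName { get; set; }

    // virtual gives a derived proxy the chance to intercept the getter.
    public virtual Vehicle VehicleDetails { get; set; }
}

// Hand-written stand-in for the class a framework could generate at run time.
public class StudentProxy : Student
{
    private readonly Func<Vehicle> loader;
    private bool isLoaded;

    // Counts how often the "database" was asked for the vehicle.
    public int LoadCount { get; private set; }

    public StudentProxy(Func<Vehicle> loader)
    {
        this.loader = loader;
    }

    public override Vehicle VehicleDetails
    {
        get
        {
            if (!isLoaded)
            {
                // First access: fetch the related entity, then cache it.
                base.VehicleDetails = loader();
                isLoaded = true;
                LoadCount++;
            }
            return base.VehicleDetails;
        }
        set
        {
            // An explicit assignment counts as loaded.
            base.VehicleDetails = value;
            isLoaded = true;
        }
    }
}
```

Code that works with a Student variable never notices the difference: reading VehicleDetails on a StudentProxy triggers the loader once, and later reads return the cached value. A non-virtual property could not be overridden this way, which is why it cannot be lazily loaded.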

In the case of Student, we have defined the VehicleDetails and StudentDepartment navigation properties as virtual. As a result, these entities are not loaded when the related Student entity is loaded. If we just load a Student entity, then the related Vehicle and Department would not be part of the query. EF loads them lazily when required.

As consumers of an Object Relational Mapping [ORM] tool, we don't want our entities to be adulterated by inheriting from the types provided by Entity Framework. This used to be EntityObject in earlier versions of Entity Framework. We also don't want to decorate them with attributes which bind us to a particular ORM tool. Entity Framework still supports this, by reusing some attributes provided by the Data Annotations library and some custom attributes developed by the Entity Framework team at Microsoft. In order to understand the various loading options of related entities, we need to understand the concept of proxies in Entity Framework, and how they add certain features to our entities without modifying them. The features of proxies include lazy loading and change tracking of an entity's data. This makes it very clear that lazy loading is just a decoration of our entities provided by adding a proxy, so if we disable creation of proxies, no lazy loading is available. Lazy loading can also be turned off on its own through the context configuration:
this.Configuration.LazyLoadingEnabled = false;
Additionally, the POCO classes must be designed such that they support proxy creation and lazy loading. The requirements include that the POCO entity must be declared public and that navigation properties must support late binding by being virtual. The details can be found here:
http://msdn.microsoft.com/en-us/library/gg715126%28v=vs.103%29.aspx

The proxies created by Entity Framework for our entities are certainly a decoration of these entities. They add the features of lazy loading and change tracking to the entity. These are added without modifying the entity, in line with the Open/Closed principle. The proxy can be used wherever the original entity is expected, in line with the Liskov Substitution principle. Remember that Entity Framework is smart enough to recognize the need for proxy creation. If the entity is defined in such a way that a proxy would not add any value, i.e. neither change tracking nor lazy loading, then EF does not create a proxy and works with the entity type itself. Proxy creation can also be disabled on the DbContext, which also results in entity types being used directly.

Choice of Loading option and its Consequences
Generally, the setting for lazy loading is specified in the constructor of the sub-type of DbContext. In the following example, we are turning on lazy loading for all entities. This is the default behavior, which individual entities can override.

class InstituteEntities : DbContext
{
     //...
     public InstituteEntities()
     {
        this.Configuration.LazyLoadingEnabled = true;          
     }
     //...
}

Now individual entities can override this behavior for their navigational properties as follows:
  1. Defining a navigational property as virtual would cause lazy loading of the navigational entity.
  2. Defining a navigational property as non-virtual would not cause lazy loading. The related data may then be loaded explicitly by using the Load method on the EntityCollection / EntityReference, or eagerly by using the Include method and specifying a query path.
On the other hand, if we configure the DbContext not to support lazy loading, then individual entities cannot override this behavior when they need lazy loading of related entities. So the best option is to support lazy loading, and to load eagerly / explicitly whenever required, so that you save the cost of additional database queries. The cost of supporting lazy loading is the creation of proxies by the framework.
Disabling lazy loading means that related entities are no longer loaded on demand when they are accessed through navigation properties. The only remaining options are then explicit and eager loading. Both of them are specified through code.

How to make a decision?
The choice of pattern for loading related entities should be a deliberate one, because it impacts the overall performance of the system. Consider the difference in terms of the two concepts of locality of reference:
  1. Temporal Locality
  2. Spatial Locality
The whole idea of caching is based on these two concepts. According to the first, data that has been accessed once is likely to be needed again, so instead of fetching it again, it should be kept in readily available storage. The second suggests that when some data is accessed, data in its neighborhood is likely to be required next, so it should be brought into readily available storage too. The usage of either or both depends on your implementation of caching. Eager loading is based on spatial locality: if we need the data of an entity, then there is a higher probability that the data of related entities would also be needed. Database access is basically requesting data from an external system, the DBMS [DataBase Management System], which is obviously a costly operation. If your data access requirements suggest that the data of related entities is likely to be needed, then use eager loading; otherwise, use explicit or lazy loading.

Entity Definition
Let's see what changes in the Student entity are needed to support lazy loading. First, we have changed the access modifier to public. Additionally, Student has three navigation properties: StudentCourses, VehicleDetails and StudentDepartment, of type ICollection<Course>, Vehicle and Department respectively. Here VehicleDetails and StudentDepartment are defined as virtual to support lazy loading.
namespace EFCodeFirstDatabaseCreation.Entities
{
    using System.Collections.Generic;

    public class Student
    {
        public int StudentId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int GradePointAverage { get; set; }
        public bool IsOutStanding { get; set; }
        
        //no lazy loading
        public ICollection<Course> StudentCourses { get; set; }
        
        //support lazy loading
        public virtual Vehicle VehicleDetails { get; set; }
        public virtual Department StudentDepartment { get; set; }
    }
}
Similarly, Course entity is also updated as follows:
namespace EFCodeFirstDatabaseCreation.Entities
{
    using System.Collections.Generic;

    public class Course
    {
        public int CourseId { get; set; }
        public string CourseName { get; set; }

        //no lazy loading
        public ICollection<Student> CourseStudents { get; set; }
        
        //support lazy loading
        public virtual ICollection<CourseLocation> CourseOfferedAt { get; set; }
    }
}

Let's see how LocationAddress is defined. Since this is a ComplexType, it does not support lazy loading.
namespace EFCodeFirstDatabaseCreation.Entities
{
    using System.Collections.Generic;
    using System.ComponentModel.DataAnnotations;

    public class CourseLocation
    {
        public int CourseLocationId { get; set; }
        public string LocationName { get; set; }
        public LocationAddress Address { get; set; }
        public virtual ICollection<Course> CoursesOffered { get; set; }
    }

    public class LocationAddress
    {
        public string StreetAddress { get; set; }
        public string Apartment { get; set; }
        public string City { get; set; }
        public string StateProvince { get; set; }
        public string ZipCode { get; set; }
    }
}
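
LocationAddress above becomes a complex type by convention. If you prefer to state this explicitly, the class can be annotated instead. This is a minimal sketch which assumes the attribute lives in System.ComponentModel.DataAnnotations.Schema, as it does from .NET 4.5 onwards; with older EF 4.x packages the same attribute sits in System.ComponentModel.DataAnnotations.

```csharp
// Assumption: .NET 4.5 or later, where ComplexTypeAttribute is in the Schema namespace.
using System.ComponentModel.DataAnnotations.Schema;

// Explicitly marks the class as a complex type instead of relying on convention.
// A complex type has no key of its own; its columns are stored with the owning entity.
[ComplexType]
public class LocationAddress
{
    public string StreetAddress { get; set; }
    public string Apartment { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public string ZipCode { get; set; }
}
```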

Explicit Loading
Explicit loading fetches the required entity's data from the database only when we ask for it. We can use the Load method on an EntityReference or EntityCollection. ObjectContext's LoadProperty method can also be used for the same purpose.
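private static void PrintStudentNames()
{
    using (var context = new InstituteEntities())
    {
        //explicit loading of all students
        //(Load is an extension method; it needs "using System.Data.Entity;")
        context.Students.Load();

        //Local exposes the entities already loaded, without querying again
        foreach (var student in context.Students.Local)
        {
            Console.WriteLine("Student Name: {0}", student.FirstName);
        }
    }
}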
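Let's see another example of explicit loading. Here we load the Students collection of each department through a query, which allows a condition to be applied to what gets loaded; in this case, only the students belonging to that particular department.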
private static void PrintDepartmentAndStudentsDetails()
{
    using (var context = new InstituteEntities())
    {
        //Explicit loading with query 
        foreach (var department in context.Departments)
        {
            Console.WriteLine(string.Format("Department Name: {0}", department.DepartmentName));
            context.Entry<Department>(department)
                .Collection(d => d.Students)
                .Query()
                .Where(s => s.StudentDepartment.DepartmentId == department.DepartmentId)
                .Load();

            foreach (var student in department.Students)
            {
                Console.WriteLine(string.Format("         Student: {0}", student.FirstName));
            }
        }

    }
}
In the following example, we are using LINQ to Entities to query Departments and their related Student entities. Here stds is of type DbQuery.
private static void PrintStudentAndTheirDepartments()
{
    using (var context = new InstituteEntities())
    {
        //Loading using Linq to Entities
        var stds = from d in context.Departments
                   from st in d.Students
                   select new { d.DepartmentName, st.FirstName };

        foreach (var s in stds)
        {
            Console.WriteLine(
                string.Format("Department: {0}, Student: {1}",
                    s.DepartmentName, s.FirstName));
        }
    }
}
Here DbQuery is the same class as the parent type of DbSet used with DbContext. It implements IEnumerable, so we can go through the items with an iterator, as the foreach loop above does.


Lazy Loading Students
In the following example, we fetch an arbitrary department (the first one returned). We then go through all the students in that department and print their details. The students are loaded lazily, when the Students property is first accessed.
private static void LazyPrintStudentNames()
{
    using (var context = new InstituteEntities())
    {
        var firstDepartment = context.Departments.First<Department>();

        //lazy loading of all students
        foreach (var student in firstDepartment.Students)
        {
            Console.WriteLine("Student Name: {0}", student.FirstName);
        }
    }
}
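
The post never shows the Department entity. For the lazy-loading example above to work, its Students collection has to be virtual. A minimal sketch, assuming only the property names used in the code above (DepartmentId, DepartmentName, Students) and a trimmed-down Student:

```csharp
using System.Collections.Generic;

// Trimmed-down Student, just enough for the sketch to compile on its own.
public class Student
{
    public int StudentId { get; set; }
    public string FirstName { get; set; }

    public virtual Department StudentDepartment { get; set; }
}

// Assumed shape of Department; the original definition is not shown in the post.
public class Department
{
    public int DepartmentId { get; set; }
    public string DepartmentName { get; set; }

    // virtual, so that a proxy can load the students on first access
    public virtual ICollection<Student> Students { get; set; }
}
```

If Students were declared without virtual, firstDepartment.Students would not be loaded on access, and the collection would have to be loaded explicitly or through Include.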
Eager Loading
In Entity Framework Code First, eager loading is supported through Include. It not only supports loading directly related entities, but also lets us traverse the hierarchy of relationships. See how we are loading the vehicle details of all the students for all departments. We can also ensure type safety by using expressions instead of strings; here we are loading the Students data that way.
private static void PrintInstituteDetails()
{
    using (var context = new InstituteEntities())
    {
        //Eager loading of Vehicle and Students information
        foreach (var department in context.Departments
                                           .Include(d => d.Students)
                                           .Include("Students.VehicleDetails"))
        {
            Console.WriteLine(
                string.Format("Department Name: {0}", 
                        department.DepartmentName));

            foreach (var student in department.Students)
            {
                Console.WriteLine(
                    string.Format("***Student: {0}", student.FirstName));
                if (student.VehicleDetails != null)
                {
                    //Explicit loading using Load
                    context.Entry<Student>(student)
                        .Collection<Course>(s => s.StudentCourses)
                        .Load();

                    foreach (var c in student.StudentCourses)
                    {
                        Console.WriteLine("*********Course Name: {0}", c.CourseName);

                        //loading navigation property to a complex type
                        Console.WriteLine(
                            string.Format("Also offered at city: {0}", 
                            c.CourseOfferedAt.First<CourseLocation>().Address.City));
                    }
                }
            }
        }
    }
}
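
The string path "Students.VehicleDetails" can also be written as an expression, so that a renamed property shows up as a compile error. The sketch below assumes the lambda overload of Include from the System.Data.Entity namespace, with Select used to step through the collection. The entities and the context are trimmed-down stand-ins for the ones in this series, and TypedIncludeExample is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Data.Entity;   // DbContext, DbSet and the lambda-based Include extension
using System.Linq;

// Trimmed-down stand-ins for the entities used in this series.
public class Vehicle
{
    public int VehicleId { get; set; }
}

public class Student
{
    public int StudentId { get; set; }
    public string FirstName { get; set; }
    public virtual Vehicle VehicleDetails { get; set; }
}

public class Department
{
    public int DepartmentId { get; set; }
    public string DepartmentName { get; set; }
    public virtual ICollection<Student> Students { get; set; }
}

public class InstituteEntities : DbContext
{
    public DbSet<Department> Departments { get; set; }
}

public static class TypedIncludeExample
{
    // Loads departments with their students and each student's vehicle.
    public static List<Department> LoadDepartmentsWithVehicles(InstituteEntities context)
    {
        return context.Departments
            // Equivalent to Include("Students.VehicleDetails"),
            // but checked by the compiler.
            .Include(d => d.Students.Select(s => s.VehicleDetails))
            .ToList();
    }
}
```

Including the nested path brings in the Students collection as well, so a separate Include for Students is not needed in this form.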

It must be remembered that complex types are always loaded in eager fashion. We don't need to use an Include for them as this is the default behavior of EF Code First. In the PrintInstituteDetails example above, LocationAddress is a complex type. It is defined as a complex type based on convention. We don't need to use the ComplexType attribute if we are following the conventions.

All of these methods can be called in the Main method as follows:
public static void Main(string[] args)
{
    Database.SetInitializer<InstituteEntities>(new InstituteDatabaseInitializer());

    //Lazy loading
    LazyPrintStudentNames();

    //Linq to Entities
    PrintStudentAndTheirDepartments();

    //Explicit Loading
    PrintStudentNames();

    //Explicit loading with query
    PrintDepartmentAndStudentsDetails();

    //Explicit, Eager and Lazy Loading
    PrintInstituteDetails();

    Console.ReadLine();
}

The definition of InstituteDatabaseInitializer can be found in the attached code; it is left out here as it would not add any value to this discussion. When we run this we get the following output:


Download Code
