Tuesday, May 18, 2010

Methods for Comparing Lists of Objects Based on a Single Property

Recently I needed to build a new list containing all of the items in 'MyList1' that don't already exist in 'MyList2'. At first glance I thought I could use the Enumerable.Except function to accomplish this. However, I soon came to realize that Except compares whole objects using the type's default equality comparer: reference equality for a class, unless Equals and GetHashCode are overridden, in which case every property those overrides look at takes part. So if value-based equality covers all 10 properties of a class, all 10 are checked to find the difference. This makes sense and works as intended, but in my case the results were not what I wanted because one of the properties was a timestamp: even though an object in 'MyList1' had the same 'ID' as an object in 'MyList2', its timestamp was different, so that object was returned in the new list as well. What I needed was to base the returned list on a single property: 'ID'. This is all that I cared about, so the default overload of the .Except method on an IEnumerable type was not going to work.

A few methods presented themselves for solving this issue. If you are reading this and you do not need to compare objects on only one or a few properties, and a strict 1:1 comparison is what you want, you are done! Just use the .Except method as shown below:

Dim MyList1 As New List(Of Customer)
Dim MyList2 As New List(Of Customer)
'...populate the above lists with data
Dim ItemsInList1NotInList2 As New List(Of Customer)
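'Get all of the items in MyList1 that do not already exist in MyList2 using the .Except() method
'(.Except() returns an IEnumerable(Of Customer), so .ToList() is needed to assign it to a List)
ItemsInList1NotInList2 = MyList1.Except(MyList2).ToList()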

The first method for solving this issue is to use the second overload of the .Except method, which takes an IEqualityComparer(Of T) to compare values. This involves writing a small comparer class that implements IEqualityComparer(Of T) and defines the two methods the interface requires, Equals and GetHashCode. The comparer lives apart from the class being compared; changing how the class itself compares would instead mean overriding Equals or implementing IEquatable(Of T), which I wouldn't want to do unless it was definite that this was always how the class was to be compared. In my case the comparison was for one specific situation and I wanted a solution that was more inline, so I did not implement the interface. Check out the link below for an explanation and code example on implementing the IEqualityComparer interface:

Enumerable.Except(Of TSource) Method
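
For readers who do want the comparer route, here is a minimal sketch of what it might look like. The Customer class, its LastUpdated property, the CustomerIdComparer name, and the ItemsNotIn helper are all assumptions made up for illustration; only Enumerable.Except and IEqualityComparer(Of T) come from the framework.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the Customer class used in this post.
Public Class Customer
    Public Property ID As Integer
    Public Property LastUpdated As DateTime
End Class

' Hypothetical comparer: two customers are equal when their IDs match.
Public Class CustomerIdComparer
    Implements IEqualityComparer(Of Customer)

    Public Overloads Function Equals(x As Customer, y As Customer) As Boolean _
        Implements IEqualityComparer(Of Customer).Equals
        ' If either side is Nothing, they are equal only when both are Nothing.
        If x Is Nothing OrElse y Is Nothing Then Return x Is y
        Return x.ID = y.ID
    End Function

    Public Overloads Function GetHashCode(obj As Customer) As Integer _
        Implements IEqualityComparer(Of Customer).GetHashCode
        ' Must agree with Equals: equal IDs give equal hash codes.
        If obj Is Nothing Then Return 0
        Return obj.ID.GetHashCode()
    End Function
End Class

Public Module ExceptWithComparerDemo
    ' Returns the items of first whose ID does not appear in second.
    Public Function ItemsNotIn(first As List(Of Customer), second As List(Of Customer)) As List(Of Customer)
        Return first.Except(second, New CustomerIdComparer()).ToList()
    End Function
End Module
```

One thing to keep in mind with this approach: Except produces a set difference, so if MyList1 itself contains two items with the same ID, only one of them comes back.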

The first of the two methods that worked as a more localized, inline solution was to use the extension methods available on any IEnumerable(Of T) source, such as .Where() and .Select(), which the Enumerable class in the System.Linq namespace provides. By calling .Where as an instance method on MyList1 we can define a predicate that tests each element for a condition and returns the elements that pass. In our case we pass the Where() method a lambda expression (an anonymous function) as that predicate. Let's look at the code:

Dim MyList1 As New List(Of Customer)
Dim MyList2 As New List(Of Customer)
'...populate the above lists with data
Dim ItemsInList1NotInList2 As New List(Of Customer)
'Get all of the items in MyList1 that do not already exist in MyList2 using the .Where() method
ItemsInList1NotInList2 = MyList1.Where(Function(i) (MyList2.Select(Function(i2) i2.ID).Contains(i.ID) = False)).ToList()

The above code can loosely be read from the inside out as "select all of the IDs from MyList2, then keep each item in MyList1 whose ID is not among them, and return the result as a list of Customer". The .Where() method returns the elements from the source that satisfy the condition specified by the predicate. A predicate is a function that tests each element for a condition and returns a Boolean value. In our case the predicate returns True when the .ID of the item from MyList1 does not exist in MyList2, and False when it does. Also notice the .Select() method called on MyList2: it projects each Customer into just its .ID value, so that .Contains() has a sequence of IDs to search.

Both Enumerable methods are given an anonymous function, written in VB.NET with the 'Function()' keyword and a parameter for the element being processed. In our case the lambda passed to .Where() takes the parameter 'i', which is a Customer from MyList1, and the lambda passed to .Select() takes the parameter 'i2', which is a Customer from MyList2. If we were to decompile the above code we should see each item in MyList1 being iterated over, MyList2 being searched to determine whether that item's .ID already exists there, and the item being added to the resulting list of Customer if it does not.

Just remember that lambda expressions are syntactic sugar. You could write the same code longhand by defining a named function, passing it as a delegate or calling it directly, and having it iterate through the lists to get the same result. Lambda expressions make our lives as developers much easier by not having to write out so much code.
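
To make the longhand comparison concrete, here is a rough sketch of what the same filter could look like written as plain loops. The Customer class, the ItemsNotInById name, and the LonghandDemo module are hypothetical; this illustrates the idea and is not what the compiler literally generates.

```vbnet
Imports System.Collections.Generic

' Hypothetical stand-in for the Customer class used in this post.
Public Class Customer
    Public Property ID As Integer
End Class

Public Module LonghandDemo
    ' Returns the items of first whose ID does not appear in second.
    Public Function ItemsNotInById(first As List(Of Customer), second As List(Of Customer)) As List(Of Customer)
        Dim result As New List(Of Customer)

        For Each item As Customer In first
            ' Scan the second list for a matching ID.
            Dim found As Boolean = False
            For Each other As Customer In second
                If other.ID = item.ID Then
                    found = True
                    Exit For
                End If
            Next

            ' Keep the item only when no match was found.
            If Not found Then result.Add(item)
        Next

        Return result
    End Function
End Module
```

Like the lambda version, this rescans the second list for every item in the first. For large lists it may be worth collecting the IDs of the second list into a HashSet first and testing against that instead.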

I probably prefer the method above because it is the most concise. However, those not familiar with lambda expressions may not find the code above easy to read and may want something a bit more explicit. The second method is to define a simple LINQ query, let the compiler infer the type of the result, and then convert it into a list. I think the advantage of this second method is that it is much more readable. Let's take a look at the code:

Dim MyList1 As New List(Of Customer)
Dim MyList2 As New List(Of Customer)
'...populate the above lists with data
Dim ItemsInList1NotInList2 As New List(Of Customer)
'Get all of the items in MyList1 that do not already exist in MyList2 using a LINQ query
Dim query = From ItemsIn1 In MyList1 _
Where Not (From ItemsIn2 In MyList2 _
Select ItemsIn2.ID).Contains(ItemsIn1.ID) _
Select ItemsIn1
'Convert the LINQ query results into a list of objects
ItemsInList1NotInList2 = query.ToList()

Even though LINQ should not be confused with T-SQL, the above query does have attributes of a typical SQL query. It is essentially a 'NOT IN' subquery with LINQ sprinkled in. It reads loosely as follows: "Select the IDs of all of the items in MyList2, exclude (using Not) every item in MyList1 whose ID is among them, and finally select what remains." I think another advantage of this form is that it is faster to modify if you need to make the comparison based on two properties as opposed to one as in my examples. Both solutions work identically and will produce the same results.
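
As an illustration of the two-property case, one possible way to adapt the query is sketched below. The Region property is a made-up second key, and the ItemsNotInByIdAndRegion helper and TwoPropertyDemo module are hypothetical names; the sketch swaps .Contains() for a nested query with .Any() so that both properties can be tested together.

```vbnet
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the Customer class; Region is an assumed second key.
Public Class Customer
    Public Property ID As Integer
    Public Property Region As String
End Class

Public Module TwoPropertyDemo
    ' Returns the items of first that have no match in second on both ID and Region.
    Public Function ItemsNotInByIdAndRegion(first As List(Of Customer), second As List(Of Customer)) As List(Of Customer)
        Dim query As IEnumerable(Of Customer) = _
            From ItemsIn1 In first _
            Where Not (From ItemsIn2 In second _
                       Where ItemsIn2.ID = ItemsIn1.ID AndAlso ItemsIn2.Region = ItemsIn1.Region _
                       Select ItemsIn2).Any() _
            Select ItemsIn1

        Return query.ToList()
    End Function
End Module
```

An item is excluded only when a single item in the second list matches it on both properties, which is why the two tests sit in the same inner Where clause.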

So to review, we discussed three methods for comparing two lists to get only the items in the first list that don't exist in the second list: the Enumerable.Except() method, the Enumerable.Where() method, and a LINQ query. Each has its place, but all will help to quickly make a comparison that otherwise may have required a longhand For Each loop with a flag set for comparison differences in order to get the same result.

1 comment:

  1. Thank you for this.... this has helped me immensely after a 12 hour search... also helped me to understand => statements... cheers