Monday, January 2, 2012

Leveraging Parallelism in the .NET Framework 4.0 for Asynchronous Programming

As I have mentioned in some of my previous posts, it's amazing how easy it is to perform asynchronous processing in the later versions of the .NET Framework, especially in .NET 4.0. The abstractions provided within the framework expose methods and libraries that let you easily parallelize your queries or run tasks in parallel without all the hassle of managing your own threads. The idea is to harness the power of the threads available on today's multi-core platforms without having to manage thread allocation and use yourself. This functionality is exposed in PLINQ (Parallel Language Integrated Query) and the TPL (Task Parallel Library), and these are what I will be highlighting today.

First, let's take a look at parallelizing your existing LINQ queries with PLINQ. We are going to work with a simple 'Employee' class that has a few properties and dummy methods (the implementation code for the dummy methods is not the focus here) and another class named 'Payroll' with a method named 'CalculatePayRate', as shown below:
Public Class Employee
    Public Function FindAll() As List(Of Employee)
        'Notice the use of collection initializers that were added in VS.NET 2010 for VB.NET
        Dim Employees = New List(Of Employee) From {
            New Employee() With {.FirstName = "John", .LastName = "Smith", .JobCode = 2, .ServiceYears = 1, .Age = 25, .IsActive = True},
            New Employee() With {.FirstName = "Allen", .LastName = "Conway", .JobCode = 4, .ServiceYears = 8, .Age = 32, .IsActive = True},
            New Employee() With {.FirstName = "Jane", .LastName = "Hulu", .JobCode = 6, .ServiceYears = 12, .Age = 40, .IsActive = True},
            New Employee() With {.FirstName = "Clark", .LastName = "Griswold", .JobCode = 1, .ServiceYears = 4, .Age = 52, .IsActive = False},
            New Employee() With {.FirstName = "Paul", .LastName = "Jacks", .JobCode = 15, .ServiceYears = 70, .Age = 93, .IsActive = True}
        }
        Return Employees
    End Function

    Public Property Age As Integer
    Public Property FirstName As String
    Public Property IsActive As Boolean
    Public Property LastName As String
    Public Property JobCode As Integer
    Public Property PayRate As Decimal
    Public Property ServiceYears As Integer

    Public Sub DoSomething1()
    End Sub

    Public Sub DoSomething2()
    End Sub

    Public Sub DoSomething3()
    End Sub

    Public Function GetEmployeeList1() As List(Of Employee)
        'Code here
    End Function

    Public Function GetEmployeeList2() As List(Of Employee)
        'Code here
    End Function

    Public Function GetEmployeeList3() As List(Of Employee)
        'Code here
    End Function
End Class

Public Class Payroll
    Public Function CalculatePayRate(ByVal JobCode As Integer, ByVal YearsOfService As Integer, ByVal Age As Integer) As Decimal
        If (JobCode >= 1 AndAlso JobCode <= 5) AndAlso (YearsOfService <= 5) Then
            Return 10.5D
        ElseIf (JobCode >= 1 AndAlso JobCode <= 5) AndAlso (YearsOfService > 5) Then
            Return 12D
        ElseIf (JobCode > 5 AndAlso JobCode <= 10) AndAlso (YearsOfService <= 5) Then
            Return 22.5D
        ElseIf (JobCode > 5 AndAlso JobCode <= 10) AndAlso (YearsOfService > 5) Then
            Return 25D
        ElseIf (JobCode > 10) AndAlso (Age > 90) Then
            'Reward hard work!
            Return 150D
        Else
            'Default rate for every other combination
            Return 8D
        End If
    End Function
End Class
I have set up a simple method below named 'EmployeeCalcPayRateUsingPLINQ', which demonstrates getting a list of active Employees using LINQ and then calling 'CalculatePayRate' two ways: synchronously, the traditional way, using a For-Each loop (commented out in the code), and then using PLINQ to make the same call in parallel.

All you need to do to parallelize your LINQ queries is to call the 'AsParallel' extension method on your query. This exposes several of the PLINQ extension methods, like the one we will be using today: 'ForAll'. The ForAll(Of TSource) parallel extension method takes a lambda expression that defines the code, or the delegate of the method, to be called in parallel. We can use .ForAll() when we need to perform some action on every item in the source. In our case we need to set each Employee's .PayRate value. This could be done synchronously, but it is a prime candidate to be done in parallel because one employee's pay rate has no bearing on another's, so they can be calculated and processed in parallel. Let's take a look at the code:
Public Sub EmployeeCalcPayRateUsingPLINQ()
    Dim Employees As List(Of Employee) = Nothing
    Dim myPayroll As New Payroll
    Dim myEmployee As New Employee

    'Get a list of all employees
    Employees = myEmployee.FindAll()

    'Using LINQ to Objects, create a query that contains all 'Active' employees
    Dim ActiveEmpQuery = From Emp In Employees
                         Where Emp.IsActive
                         Select Emp

    'Using brute-force synchronous loop processing to calculate and set the payrate for each Employee object:
    'Uncomment to try synchronous loop version
    'For Each Emp As Employee In ActiveEmpQuery
    '    Emp.PayRate = myPayroll.CalculatePayRate(Emp.JobCode, Emp.ServiceYears, Emp.Age)
    'Next

    'Parallelize the LINQ query by calling the .ForAll extension method.
    'Pass in a Lambda Expression that defines the delegate to be called in parallel
    'that will calculate the payrate on each Employee instance.
    ActiveEmpQuery.AsParallel().ForAll(Sub(Emp)
                                           Emp.PayRate = myPayroll.CalculatePayRate(Emp.JobCode, Emp.ServiceYears, Emp.Age)
                                       End Sub)
End Sub
As you can see in the example above, the PLINQ version and the traditional For-Each loop execute the same line of code. The real gain is in processing time. If the list contains 5000 active employees, the PLINQ .ForAll method can spread the work across the available threads, whereas the traditional loop processes all 5000 records one at a time. On a multi-core machine this can reduce the total processing time, although for very cheap per-item work the overhead of parallelizing can offset the gain.
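If you want to see the difference in processing time for yourself, a rough way is to time both versions with a Stopwatch. The sketch below is only an illustration and is not part of the Employee sample: 'SimulatedWork', 'RunSequential' and 'RunParallel' are hypothetical names, and the Thread.Sleep call is just an assumed stand-in for a calculation that takes real time. The actual numbers will depend on your hardware and on how expensive the per-item work is.

```vb
Imports System
Imports System.Diagnostics
Imports System.Linq
Imports System.Threading

Module PlinqTimingSketch
    'Hypothetical stand-in for a per-item calculation that takes real time
    Function SimulatedWork(ByVal value As Integer) As Integer
        Thread.Sleep(2)
        Return value * 2
    End Function

    'Runs the work one item at a time and returns the results
    Function RunSequential(ByVal itemCount As Integer) As Integer()
        Dim results(itemCount - 1) As Integer
        For i As Integer = 0 To itemCount - 1
            results(i) = SimulatedWork(i)
        Next
        Return results
    End Function

    'Runs the same work through PLINQ. Each index is written by exactly one call,
    'so the shared array needs no locking.
    Function RunParallel(ByVal itemCount As Integer) As Integer()
        Dim results(itemCount - 1) As Integer
        Enumerable.Range(0, itemCount).AsParallel().ForAll(Sub(i As Integer) results(i) = SimulatedWork(i))
        Return results
    End Function

    Sub Main()
        Const ItemCount As Integer = 500

        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim sequentialResults As Integer() = RunSequential(ItemCount)
        Console.WriteLine("Sequential: {0} ms", watch.ElapsedMilliseconds)

        watch = Stopwatch.StartNew()
        Dim parallelResults As Integer() = RunParallel(ItemCount)
        Console.WriteLine("PLINQ:      {0} ms", watch.ElapsedMilliseconds)

        'Both approaches should produce identical values
        Console.WriteLine("Same results: {0}", sequentialResults.SequenceEqual(parallelResults))
    End Sub
End Module
```

Try removing the Thread.Sleep call and running it again: with almost no work per item, the parallel version may no longer come out ahead.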

The Task Parallel Library (TPL) is another technology available in the .NET Framework 4.0, exposed as a set of APIs in the System.Threading and System.Threading.Tasks namespaces. The TPL allows us to parallelize 'Tasks'. We can use the TaskFactory.StartNew method to start a task in parallel. This allows us to call several methods in parallel that would otherwise have been called synchronously. For example, if we have three methods to call that are unrelated to and independent of each other, but that must all be called to complete processing, the TPL can assist greatly, especially if the call to the first method is long running and would block subsequent calls from being made. Calling StartNew with a Function through Task(Of TResult).Factory is good for calling methods that return a result, and I will also highlight the Parallel.Invoke() method for calling methods with no return value in parallel. So let's first look at the code to create Tasks to be run in parallel and then extract their results:
'Requires "Imports System.Threading.Tasks" at the top of the file
Public Sub EmployeeProcessingUsingTPLTasks()
    Dim AllEmployeeData As New List(Of List(Of Employee))
    Dim Emp As New Employee()

    'Create a Task that will call the Employee.GetEmployeeList1() method having its results dumped into 'Task1'
    Dim Task1 = Task(Of List(Of Employee)).Factory.StartNew(Function() Emp.GetEmployeeList1())
    'Create a Task that will call the Employee.GetEmployeeList2() method having its results dumped into 'Task2'
    Dim Task2 = Task(Of List(Of Employee)).Factory.StartNew(Function() Emp.GetEmployeeList2())
    'Create a Task that will call the Employee.GetEmployeeList3() method having its results dumped into 'Task3'
    Dim Task3 = Task(Of List(Of Employee)).Factory.StartNew(Function() Emp.GetEmployeeList3())

    'Note: each call to [Task].Result ensures that the asynchronous operation is complete before returning (built in functionality)
    'Add the List from the Task1.Result to the collection in location '0'
    AllEmployeeData.Insert(0, Task1.Result)
    'Add the List from the Task2.Result to the collection in location '1'
    AllEmployeeData.Insert(1, Task2.Result)
    'Add the List from the Task3.Result to the collection in location '2'
    AllEmployeeData.Insert(2, Task3.Result)
End Sub
In the example code above, the Tasks are started and run, and their results are available from the Task.Result property. Note that a call to [Task].Result ensures that the asynchronous operation is complete before returning (built-in functionality). This is where synchronous behavior can appear: if you access the Result of a function that has not yet completed, the calling thread blocks until the result is available. However, the Tasks themselves all run in parallel. You could also use this approach to call methods that have no return value, in which case the .Result property never needs to be accessed.
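If you would rather block once, at a single well-defined point, instead of implicitly on each .Result call, one option is to call Task.WaitAll before reading any results. The sketch below is a minimal illustration under assumptions of my own: 'LoadNames' and 'LoadAll' are hypothetical stand-ins for the GetEmployeeList methods, and the Thread.Sleep call only simulates a slow data call.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Threading
Imports System.Threading.Tasks

Module TaskWaitAllSketch
    'Hypothetical stand-in for one of the GetEmployeeList methods
    Function LoadNames(ByVal prefix As String) As List(Of String)
        Thread.Sleep(100) 'simulate a slow data call
        Dim names As New List(Of String)
        names.Add(prefix & "-1")
        names.Add(prefix & "-2")
        Return names
    End Function

    'Starts three loads in parallel and returns their results in a fixed order
    Function LoadAll() As List(Of List(Of String))
        Dim task1 As Task(Of List(Of String)) = Task(Of List(Of String)).Factory.StartNew(Function() LoadNames("A"))
        Dim task2 As Task(Of List(Of String)) = Task(Of List(Of String)).Factory.StartNew(Function() LoadNames("B"))
        Dim task3 As Task(Of List(Of String)) = Task(Of List(Of String)).Factory.StartNew(Function() LoadNames("C"))

        'Block once, here, until every task has finished
        Task.WaitAll(task1, task2, task3)

        'The tasks are complete, so reading .Result no longer waits
        Dim combined As New List(Of List(Of String))
        combined.Add(task1.Result)
        combined.Add(task2.Result)
        combined.Add(task3.Result)
        Return combined
    End Function

    Sub Main()
        Dim combined As List(Of List(Of String)) = LoadAll()
        Console.WriteLine("Lists loaded: {0}", combined.Count)
    End Sub
End Module
```

Keep in mind that if any of the tasks throws, Task.WaitAll surfaces the failure as an AggregateException, so that is the exception type to catch around the wait.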

The last example I will show uses the Parallel.Invoke method to run any number of methods in parallel. The code is pretty straightforward, so let's look at calling a few methods on the Employee class in parallel:
Public Sub EmployeeProcessingUsingTPLParallelInvoke()
    Dim Emp As New Employee()

    'Make parallel individual calls via the TPL's Parallel.Invoke
    Parallel.Invoke(Sub() Emp.DoSomething1(),
                    Sub() Emp.DoSomething2(),
                    Sub() Emp.DoSomething3())
End Sub
As you can see, the individual 'DoSomething' methods were called in parallel and handled by the TPL. Parallel.Invoke does not return until all of the methods have completed, so I think this can be helpful when you need to call several individual, non-dependent methods that, to the calling code, should behave as an aggregate or composite of calls needed to complete a single operation.
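To illustrate that composite behavior, here is a small sketch that counts how many steps have finished by the time Parallel.Invoke returns. It is only an illustration: 'DoStep', 'RunSteps' and the 'completedSteps' counter are hypothetical names, and Thread.Sleep stands in for real work.

```vb
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module ParallelInvokeSketch
    'Shared counter, updated with Interlocked so parallel increments are not lost
    Private completedSteps As Integer = 0

    'Hypothetical stand-in for one of the DoSomething methods
    Sub DoStep()
        Thread.Sleep(50) 'simulate some work
        Interlocked.Increment(completedSteps)
    End Sub

    'Runs three steps in parallel and returns how many had completed
    'when Parallel.Invoke returned
    Function RunSteps() As Integer
        completedSteps = 0

        Parallel.Invoke(Sub() DoStep(),
                        Sub() DoStep(),
                        Sub() DoStep())

        'Parallel.Invoke has returned, so every step above has finished
        Return completedSteps
    End Function

    Sub Main()
        Console.WriteLine("Steps completed: {0}", RunSteps())
    End Sub
End Module
```

Because the call only returns after all three steps have run, the caller can treat the whole group as one operation.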

I have just discussed and provided a few examples of using PLINQ and the TPL in the .NET Framework 4.0. These examples barely scratch the surface of these topics and of the lower-level performance relationships between this code and the hardware it runs on. I recommend trying out these examples and then building upon them. At any rate, you can see how powerful just a few lines of code can be in any of your .NET applications.
