"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

April 21, 2012

C# LINQ Basics - Tool Developer Notes - Part IX

[Previous Post in Series - Tool Developer Notes Part VIII]

This post is learning's based on using LINQ to find aggregate sum from input text files. The Scenario is
  • Parse tab delimited files
  • Aggregate data based in few columns using LINQ queries
I am pretty much comfortable loading the data into SQL server and run TSQL queries to aggregate data in Database tables. LINQ was pretty good learning.

What is LINQ ?
Language Integrated Query. In Simple terms it provide querying capabilities in .NET languages ex-C#, TSQL queries operations - SUM, Group BY capabilites can be done in C# against a dataset
How LINQ works ?
Linq query consists of 3 parts
  •  Data source (In our example it is a DataTable)
  •  Query 
  •  Execute Query
Command Tree is prepared and executed against data source during query execution. The Query is executed when the query variable is iterated. In below example the for loop iteration is the time when the LINQ query is executed, This is called deferred execution.
More reads Link - Link1, Link2

Coming to Exercise for the example
  • Consider Input file is as per below format

Expected output contains
  • Find MinDate and MaxDate based on Column2
  • Aggregate SUM of Column4 based on Column2
Below Stackoverflow post provided useful directions for arriving at solution. Link1, Link2  
The steps for solution include
  • Read the text files
  • Load Data into a Datatable
  • Run LINQ queries to aggregate data
Earlier we have tried similar approach using hash tables in previous post. Below is the example code in a C# console Application

Below is the solution code for a C# / VS2010 Console Application
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
namespace LinqExample1
{
    class Program
    {
        static void Main(string[] args)
        {
            FindAggregate();
        }
        public static IEnumerable<string> ReadAsLines(string filename)
        {
            using (var reader = new StreamReader(filename))
                while (!reader.EndOfStream)
                    yield return reader.ReadLine();
        }

        public static string FindAggregate()
        {
            try
            {
                var filename = "E:\\sample.txt";
                var reader = ReadAsLines(filename);
               
                //Define Data Table
                var data = new DataTable();

                //Add Columns to Data Table
                data.Columns.Add("Column1", typeof(string));
                data.Columns.Add("Column2", typeof(string));
                data.Columns.Add("Column3", typeof(string));
                data.Columns.Add("Column4", typeof(int));
                data.Columns.Add("Column5", typeof(string));
        
                //Add Rows to Data Table
                foreach (var record in reader)
                    data.Rows.Add(record.Split('\t'));

                //Run Aggregate Query to find SUM of Column4 group by Column2
                var result_sum = from rowdata in data.AsEnumerable()
                                 group rowdata by rowdata["Column2"]
                                 into groupData
                                 select new
                                 {
                                     Group = groupData.Key,
                                     Sum = groupData.Sum((r) => decimal.Parse(r["Column4"].ToString()))
                                 };
                //List Sum Group By Column2
                foreach (var val in result_sum)
                {
                    Console.WriteLine("Column2 {0}, Total Value {1} \n", val.Group, val.Sum);
                }
                //Find Min and Max Date based on Column2
                var result_date = from rowdata in data.AsEnumerable()
                                  group rowdata by rowdata["Column2"]
                                      into groupData
                                  select new
                                  {
                                      type = groupData.Key,
                                      MinDate = groupData.Min(record => record["Column5"]),
                                      MaxDate = groupData.Max(record => record["Column5"])
                                  };
                //List MinDate, MaxDate Group By Column2
                foreach (var val in result_date)
                {
                        Console.WriteLine("Column2 Value {0}, Max Date {1} , Min Date {2} \n", val.type ,val.MaxDate, val.MinDate);
                }
                Console.ReadKey();
                return "0";
            }
            catch (Exception EX)
            {
                return "-1";
            }

        }
    }
}

Below is the actual output
More Reads
Happy Learning!!!

No comments: