This post is learning's based on using LINQ to find aggregate sum from input text files. The Scenario is
- Parse tab delimited files
- Aggregate data based in few columns using LINQ queries
What is LINQ ?
Language Integrated Query. In Simple terms it provide querying capabilities in .NET languages ex-C#, TSQL queries operations - SUM, Group BY capabilites can be done in C# against a dataset
How LINQ works ?
Linq query consists of 3 parts
- Data source (In our example it is a DataTable)
- Query
- Execute Query
Command Tree is prepared and executed against data source during query execution. The Query is executed when the query variable is iterated. In below example the for loop iteration is the time when the LINQ query is executed, This is called deferred execution.
Coming to Exercise for the example
- Consider Input file is as per below format
Expected output contains
- Find MinDate and MaxDate based on Column2
- Aggregate SUM of Column4 based on Column2
The steps for solution include
Below is the solution code for a C# / VS2010 Console Application
- Read the text files
- Load Data into a Datatable
- Run LINQ queries to aggregate data
Earlier we have tried similar approach using hash tables in previous post. Below is the example code in a C# console Application
Below is the solution code for a C# / VS2010 Console Application
using System;
using
System.Collections.Generic;
using
System.Linq;
using
System.Text;
using System.IO;
using
System.Data;
namespace
LinqExample1
{
class Program
{
static void Main(string[]
args)
{
FindAggregate();
}
public static IEnumerable<string> ReadAsLines(string
filename)
{
using
(var reader = new
StreamReader(filename))
while
(!reader.EndOfStream)
yield
return reader.ReadLine();
}
public static string
FindAggregate()
{
try
{
var
filename = "E:\\sample.txt";
var
reader = ReadAsLines(filename);
//Define
Data Table
var
data = new DataTable();
//Add
Columns to Data Table
data.Columns.Add("Column1", typeof(string));
data.Columns.Add("Column2", typeof(string));
data.Columns.Add("Column3", typeof(string));
data.Columns.Add("Column4", typeof(int));
data.Columns.Add("Column5", typeof(string));
//Add
Rows to Data Table
foreach
(var record in
reader)
data.Rows.Add(record.Split('\t'));
//Run
Aggregate Query to find SUM of Column4 group by Column2
var
result_sum = from rowdata in data.AsEnumerable()
group rowdata by
rowdata["Column2"]
into groupData
select new
{
Group =
groupData.Key,
Sum =
groupData.Sum((r) => decimal.Parse(r["Column4"].ToString()))
};
//List
Sum Group By Column2
foreach
(var val in
result_sum)
{
Console.WriteLine("Column2 {0}, Total Value {1} \n", val.Group,
val.Sum);
}
//Find
Min and Max Date based on Column2
var
result_date = from rowdata in data.AsEnumerable()
group rowdata by
rowdata["Column2"]
into groupData
select new
{
type =
groupData.Key,
MinDate =
groupData.Min(record => record["Column5"]),
MaxDate =
groupData.Max(record => record["Column5"])
};
//List
MinDate, MaxDate Group By Column2
foreach
(var val in
result_date)
{
Console.WriteLine("Column2
Value {0}, Max Date {1} , Min Date {2} \n", val.type ,val.MaxDate,
val.MinDate);
}
Console.ReadKey();
return
"0";
}
catch
(Exception EX)
{
return
"-1";
}
}
}
}
Below is the actual output
More Reads
Happy Learning!!!
No comments:
Post a Comment