"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 31, 2012

Hadapt (HadoopDB), H-Store (OLTP), F1 (Google Distributed RDBMS)

Hadapt, H-Store and F1: database products that will make an impact in the coming months
The DBMS Musings blog is very, very good. It is a must-follow blog for every database developer. Daniel Abadi's research on columnar databases was commercialized as Vertica. Two more interesting database projects he is working on are Hadapt and H-Store.

Hadapt 
  • Hadapt is targeted at OLAP and analytics
  • Hadapt is a commercialization of HadoopDB
  • Hadapt builds on the power of Hadoop and MapReduce
  • More Reads - Link1, Link2
H-Store - Next Generation OLTP Database Research
  • Targeted at OLTP systems
  • Focused on high-throughput, real-time transaction processing
  • Architecture and Documentation Link
Another interesting presentation from Google - F1 (Fault Tolerant Distributed DBMS)
  • Based on Hierarchical Schema
  • Supports RDBMS properties
  • Used for Real-Time Google Ads
  • Impressive Architecture
  • Presentation Link
The complete architecture and source code are available for HadoopDB and H-Store. Studying the documentation and implementation of these products will be a very good learning exercise.

I hope to explore them in upcoming posts.

Happy Continuous Learning!!!!

May 29, 2012

Recommendation Engine for Ecommerce Sites


An excellent post by Ricky Ho on designing a recommendation engine. A very good approach.
 
More Reads
 
 Happy Reading!!!

May 26, 2012

NOSQL Basics

[You may also like - NOSQL - can it replace RDBMS Databases]
A deep dive is very important to understand the fundamentals of product design. I have explored a couple of NoSQL database products and, based on readings from blogs and papers, I have tried to document the underlying fundamentals behind NoSQL databases.

Tip #1 - NOSQL Stands for “Not Only SQL”
Tip #2 - ACID properties - What is it all about?
A short recap from an earlier post on ACID properties:
  • Atomicity - Transaction is one unit of work. All or None. Either all of its data modifications are performed or none of them are performed
  • Consistency - A transaction must leave the database in a consistent state and maintain data integrity. Internal structures such as indexes and storage must be in a correct state. If a transaction violates a constraint, it must fail.
  • Isolation - Transactions are kept separate from one another. Concurrency is governed by isolation levels.
  • Durability - Once a transaction commits, its changes persist and can be recovered after a system failure or abnormal termination.
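To make "all or none" concrete, here is a toy, single-threaded C# sketch (the class, method and account names are hypothetical, not from any database product). It validates every condition before touching any data, so a transfer either updates both balances or neither. A real database achieves atomicity with logging and rollback, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;

public static class AtomicTransfer
{
    // Moves 'amount' between two accounts held in an in-memory dictionary.
    // Either both balances change or, if any check fails, neither does.
    public static bool TryTransfer(Dictionary<string, decimal> balances,
                                   string from, string to, decimal amount)
    {
        // Validate everything first so no partial update can happen
        if (amount <= 0 || !balances.ContainsKey(from) || !balances.ContainsKey(to))
            return false;
        if (balances[from] < amount)
            return false; // would violate a "no negative balance" constraint

        // All checks passed: apply both modifications as one unit of work
        balances[from] -= amount;
        balances[to] += amount;
        return true;
    }

    static void Main()
    {
        var balances = new Dictionary<string, decimal> { { "A", 100m }, { "B", 50m } };
        Console.WriteLine(TryTransfer(balances, "A", "B", 30m));  // True
        Console.WriteLine(TryTransfer(balances, "A", "B", 500m)); // False, nothing changed
        Console.WriteLine(balances["A"] + " " + balances["B"]);   // 70 80
    }
}
```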
Tip #3 - What is the CAP theorem? Almost every NoSQL white paper refers to it.

CAP stands for consistency, availability and partition tolerance 

A short, easy summary I found (link):
  • Consistency - Every server in the distributed environment returns the same, latest data
  • Availability - Every request gets a response, even if the data returned is not the latest update
  • Partition Tolerance - The system keeps working even when network failures split the nodes into groups that cannot communicate with each other
As per the CAP theorem, a distributed system can fully support at most two of these three properties at the same time. To summarize:
  • In the commonly cited classification, a traditional RDBMS favors Consistency and Availability (CA)
  • Many NoSQL databases favor Availability and Partition Tolerance (AP)
Tip #4 - What is MVCC? While working with NoSQL databases, I noticed MVCC being used for versioning and managing locks.

MVCC stands for Multiversion Concurrency Control. By keeping versions of the data, MVCC gives read transactions the latest committed updates, so reads do not block writes. SQL Server 2005 onwards has the snapshot isolation feature, which is also based on the versioning concept. Reposting my notes on how snapshot isolation is achieved in SQL Server:

READ COMMITTED SNAPSHOT using row versioning in Microsoft SQL Server 2005 onwards (also applicable to 2008 and 2012)
a. How it works - Each statement reads a snapshot of the data taken when the statement starts, and that snapshot stays consistent until the statement finishes executing. SQL Server keeps row versions in a version store and reads the data from the version store.

b. How it solves the concurrency issues  
  • SELECT statements do not lock data during a read operation  (readers do not block writers, and vice versa).  
c. Performance Advantages 
  • SELECT statements can access the last committed value of the row, while other transactions are updating the row without getting blocked
  • Reduces contention and locking resources: readers do not block writers, so there are no more deadlocks involving readers and writers.
d. Resource usage and overhead
  • Row versioning increases resource usage during data modification because row versions are maintained in tempdb. Expect tempdb growth and contention, plus additional memory usage.
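The versioning idea behind MVCC can be sketched in a few lines of C#. This is a hypothetical, single-threaded toy (class and method names are made up): every committed write appends a new version tagged with a logical timestamp, and a reader holding an older snapshot keeps seeing the version that was committed at or before its snapshot. It does not model locking, cleanup of old versions, or how SQL Server's tempdb version store actually works.

```csharp
using System;
using System.Collections.Generic;

// Toy multiversion store: every committed write adds a new version instead of
// overwriting, so a reader holding an older snapshot keeps seeing old values.
public class VersionedStore
{
    private readonly Dictionary<string, List<KeyValuePair<long, string>>> versions =
        new Dictionary<string, List<KeyValuePair<long, string>>>();
    private long clock = 0; // logical commit timestamp

    // Commits a new version of 'key' and returns its commit timestamp
    public long Write(string key, string value)
    {
        clock++;
        List<KeyValuePair<long, string>> list;
        if (!versions.TryGetValue(key, out list))
        {
            list = new List<KeyValuePair<long, string>>();
            versions[key] = list;
        }
        list.Add(new KeyValuePair<long, string>(clock, value));
        return clock;
    }

    // A snapshot is simply the latest commit timestamp at the time it is taken
    public long BeginSnapshot() { return clock; }

    // Returns the newest version committed at or before 'snapshot', or null
    public string Read(string key, long snapshot)
    {
        List<KeyValuePair<long, string>> list;
        if (!versions.TryGetValue(key, out list)) return null;
        for (int i = list.Count - 1; i >= 0; i--)
            if (list[i].Key <= snapshot) return list[i].Value;
        return null;
    }

    static void Main()
    {
        var store = new VersionedStore();
        store.Write("row1", "v1");
        long readerSnapshot = store.BeginSnapshot(); // reader starts here
        store.Write("row1", "v2");                   // writer is not blocked by the reader
        Console.WriteLine(store.Read("row1", readerSnapshot));        // v1
        Console.WriteLine(store.Read("row1", store.BeginSnapshot())); // v2
    }
}
```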
Tip #5 - Below is a list of common features implemented by NoSQL databases, and their advantages
  • No schema required - data types need not be defined up front
  • CouchDB also uses MVCC for managing versions of data (Good Read Link)
  • Auto Sharding - Spread data across servers to scale out
  • Support for Replication
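Auto-sharding schemes differ from product to product. One simple, common approach is hash-based placement: hash the key and take the result modulo the number of servers. The C# sketch below assumes that scheme (the class and method names are hypothetical). It uses FNV-1a rather than string.GetHashCode(), because GetHashCode is not guaranteed to return the same value across processes or .NET versions, and every client must agree on where a key lives.

```csharp
using System;
using System.Text;

public static class ShardRouter
{
    // FNV-1a 32-bit hash: deterministic across processes and machines
    public static uint Fnv1a(string key)
    {
        uint hash = 2166136261; // FNV offset basis
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash = unchecked(hash * 16777619); // FNV prime, wraps modulo 2^32
        }
        return hash;
    }

    // Picks the server (0 .. shardCount-1) that should hold this key
    public static int ShardFor(string key, int shardCount)
    {
        return (int)(Fnv1a(key) % (uint)shardCount);
    }

    static void Main()
    {
        string[] keys = { "user:1", "user:2", "user:3", "user:4" };
        foreach (string key in keys)
            Console.WriteLine(key + " -> shard " + ShardFor(key, 3));
    }
}
```

One design caveat: with plain modulo placement, changing the number of servers moves most keys to a different shard. Consistent hashing is the usual technique for limiting that movement.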
I still have a long way to go to understand NoSQL. I plan to explore NoSQL database architecture in detail in coming posts.
Happy Learning!!!!

May 20, 2012

C# - Excel Reading Data, Optional Method Parameters - Part 12

[Previous Post in Series - C# Basics - Tool Developer Notes Part XI]

 

Tip #1 - Reading from Excel
The template Excel workbook (Book1.xlsx) contains columns and data. The objective is to read all the worksheets and display the data available in each sheet.

Add a reference to Microsoft.Office.Interop.Excel and use that namespace (Excel must be installed on the machine).


Console application in C# (.NET 4.0):

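using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

namespace ExcelPOCDemo
{
    public class Program
    {
        static void Main(string[] args)
        {
            Program p = new Program();
            string result = p.LoadDataFromExcel();
            Console.WriteLine("Result is " + result);
            Console.ReadLine();
        }

        // Reads every worksheet and prints each cell. Returns "0" on success, "-1" on failure.
        public string LoadDataFromExcel()
        {
            Excel.Application excelApp = null;
            Excel.Workbook xlWorkbook = null;
            try
            {
                excelApp = new Excel.Application();
                xlWorkbook = excelApp.Workbooks.Open("E:\\Book1.xlsx");
                foreach (Excel.Worksheet sheet in xlWorkbook.Worksheets)
                {
                    Console.WriteLine("Worksheet name is " + sheet.Name);
                    // UsedRange covers only the cells that contain data
                    Excel.Range dataRange = sheet.UsedRange;
                    int rowCount = dataRange.Rows.Count;
                    int colCount = dataRange.Columns.Count;
                    for (int i = 1; i <= rowCount; i++)      // Excel indexes are 1-based
                    {
                        for (int j = 1; j <= colCount; j++)
                        {
                            // Value2 is null for empty cells, so guard before calling ToString()
                            object cellValue = ((Excel.Range)dataRange.Cells[i, j]).Value2;
                            string text = cellValue == null ? "(empty)" : cellValue.ToString();
                            Console.WriteLine("Row " + i + ", Column " + j + ", Data is " + text);
                        }
                    }
                }
                return "0";
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error reading Excel file: " + ex.Message);
                return "-1";
            }
            finally
            {
                // Close the workbook and quit Excel even if an error occurred
                if (xlWorkbook != null) { xlWorkbook.Close(false); Marshal.ReleaseComObject(xlWorkbook); }
                if (excelApp != null) { excelApp.Quit(); Marshal.ReleaseComObject(excelApp); }
            }
        }
    }
}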

Tip #2 - Implementing optional parameters for method calls. The current code calls a method without any parameter, but a new change needs to pass a parameter while the existing calls keep working. The example below (C# 4.0, simple console application) uses the Optional and DefaultParameterValue attributes from the System.Runtime.InteropServices namespace.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
namespace OptionalParameters
{
    public class Program
    {
        static void Main(string[] args)
        {
            Program P = new Program();
            P.method();
            P.method("A");
            Console.ReadLine();
        }
        public void method([Optional, DefaultParameterValue(null)] string ParameterA)
        {
            if (ParameterA != null)
            {
                Console.WriteLine("Parameter sent is " + ParameterA.ToString());
            }
            else
            {
                Console.WriteLine("Parameter sent is null");
            }
        }
    }
}
Output:

Parameter sent is null
Parameter sent is A

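C# 4.0 also supports optional parameters directly in the method signature, without the InteropServices attributes, plus named arguments for skipping earlier optional parameters. A minimal sketch (the class and method names are hypothetical):

```csharp
using System;

public class OptionalParametersSketch
{
    // C# 4.0 optional parameter: callers may omit parameterA, in which case null is used
    public static string Describe(string parameterA = null)
    {
        return parameterA != null ? "Parameter sent is " + parameterA : "Parameter sent is null";
    }

    // Named arguments (also C# 4.0) let a caller set 'greeting' while keeping the default 'name'
    public static string Greet(string name = "World", string greeting = "Hello")
    {
        return greeting + ", " + name;
    }

    static void Main()
    {
        Console.WriteLine(Describe());            // Parameter sent is null
        Console.WriteLine(Describe("A"));         // Parameter sent is A
        Console.WriteLine(Greet(greeting: "Hi")); // Hi, World
    }
}
```

Note that default values are compiled into the calling code, so changing a default later requires recompiling the callers.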
Tip #3 - Count number of files available in a folder - Answer
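A minimal way to count files with System.IO, assuming you want either the top-level files only or all files including subfolders (the FileCounter class and the demo folder name are hypothetical):

```csharp
using System;
using System.IO;

public static class FileCounter
{
    // Counts files in 'folder'; recursive = true also counts files in subfolders
    public static int CountFiles(string folder, bool recursive)
    {
        SearchOption option = recursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
        return Directory.GetFiles(folder, "*", option).Length;
    }

    static void Main()
    {
        // Hypothetical demo folder under the temp directory
        string folder = Path.Combine(Path.GetTempPath(), "FileCounterDemo");
        Directory.CreateDirectory(Path.Combine(folder, "sub"));
        File.WriteAllText(Path.Combine(folder, "a.txt"), "a");
        File.WriteAllText(Path.Combine(folder, "sub", "b.txt"), "b");
        Console.WriteLine(CountFiles(folder, false)); // 1
        Console.WriteLine(CountFiles(folder, true));  // 2
    }
}
```

For very large folders, Directory.EnumerateFiles (also .NET 4.0) avoids building the whole array before counting.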
Tip #4 - Converting a List to an array

Sample List demo and Array Example
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ListArrayExample
{
    public class Program
    {
        static void Main(string[] args)
        {
            List<string> ElementList = new List<string>();
            ElementList.Add("ElementA");
            ElementList.Add("ElementB");
            ElementList.Add("ElementC");
            ElementList.Add("ElementD");
            string[] Elements;
            Elements = ElementList.ToArray();
            foreach (String Data in Elements)
            {
                Console.WriteLine("Value is " + Data);
            }
            Console.ReadLine();
        }
    }
}
Output:

Value is ElementA
Value is ElementB
Value is ElementC
Value is ElementD

Tip #5 - Clearing elements from an array. Thanks to this post for pointing me to the Array.Clear method, which resets the elements of an array (or a range of it) to their default values.
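A short sketch of Array.Clear (the class name is hypothetical). Array.Clear(array, index, length) resets the given range to the element type's default value (0 for int, null for reference types); it does not shrink the array.

```csharp
using System;

public static class ArrayClearDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        // Reset 2 elements starting at index 1 to default(int), which is 0
        Array.Clear(numbers, 1, 2);
        Console.WriteLine(string.Join(",", numbers)); // 1,0,0,4,5

        string[] names = { "ElementA", "ElementB" };
        // Clear the whole array: reference-type elements become null
        Array.Clear(names, 0, names.Length);
        Console.WriteLine(names[0] == null); // True

        // The length is unchanged; the array still has 5 slots
        Console.WriteLine(numbers.Length); // 5
    }
}
```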

Happy Learning!!!

May 18, 2012

Cloud Computing for Beginners

[You may also like - Good Reads - Basics Cloud Computing]

This post is based on what I learned about cloud computing after a discussion on the topic. I have tried to keep things simple to make it easier to get started with cloud computing.

To define cloud computing in simple terms, I recommend the tweet below:

Three Cloud delivery models: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS)
One more tweet to explain it
Summarizing SaaS, PaaS, IaaS
  • SaaS - Consume it
  • PaaS - Build on it
  • IaaS - Migrate to it
(Source - From Presentation @link)
I would recommend another good read Demystifying The Cloud. Important points summarised from the document and other readings
What is IaaS ? 
  • Replaces in-house data centres / systems. IaaS provides machines on demand, in practically unlimited numbers
  • You have to manage patching, security in IaaS Environment
  • Targeted audience - IT pros and system administrators
  • Examples - Rackspace, Amazon EC2
Good Read - IaaS ≠ Virtualization

What is PaaS ? 
  • Developers build and deploy on a scalable platform
  • Targets developers & architects
  • Examples - Windows Azure, AWS, Google App Engine
What is SaaS ? 
  • Replaces in-house applications
  • Targets end users
  • Typically available (accessible) via a browser interface
  • Examples - Google Docs, Salesforce.com, Zoho.com Apps, Office 365, Gmail, Hotmail, Google Calendar

Tenets of Cloud
  • Elasticity, Pay per use, High Availability, Programmability and Self Service
  • Unlimited Storage
  • Built in Redundancy
  • Scale as per need (Scale up / Scale down)
What Cloud Computing is not? 
  • VPN Access
  • Data not stored in Company premises
  • 3rd party hosting solution
Cloud Deployment models: public, private, hybrid and community
More Reads

Please feel free to add your comments / corrections.

Happy Learning!!!

May 17, 2012

Big Data Products, Big Data Updates

[Previous Post in Series - Databases Products]

Splunk - A company focused on integrating big data. It manages massive data collected from servers, logs, tweets, mobile devices etc., and its product can index a variety of data from different platforms and devices. The customers and domains it addresses are impressive.

Microsoft has tied up with Hortonworks to support Hadoop on Windows platforms, and SAP has tied up with Cloudera as its partner for big data processing. Major players are adopting and supporting Hadoop for data processing at large scale.

Streambase - Developed a Platform for Complex Event Processing, Real Time Analytics, Data Processing

Trendwise Analytics - A Bangalore-based startup working on real-time analytics using Hadoop

BigQuery - Google's answer in the big data product space. Google BigQuery - getting started

GoodData - Cloud based BI as a Service Provider. They have developed a platform for managing BI data on cloud. They offer BI as a Service on Cloud

Happy Learning!!!!

May 06, 2012

C# Basics - Tool Developer Notes Part XI

[Previous Post in Series - C# - Notes - Working in XML in C# - Part X Tool Developer Notes]

Often you learn new things when you re-learn the basics. This post covers C# fundamentals: rather than statements and for/while loops, let's look at the basics of program execution and memory management. Below are my notes from the book Illustrated C# 2010.
Tip #1 - Why is C# a strongly typed language?

Type assignments are strict: you cannot assign a value of one type to a variable of an incompatible type without an explicit conversion. Example - you cannot assign an int to a string in C#.
string a;
int b;
b = 10;
a = b;   // Compile-time error CS0029: Cannot implicitly convert type 'int' to 'string'

This fails at compile time. Catching type conversion errors at compile time, before the program runs, is an advantage of a strongly typed language.
Tip #2 - What is Unified Type System ?
  • All types are derived from System.Object
  • Reference types such as classes, delegates, strings and arrays derive from System.Object, and interface-typed values convert implicitly to System.Object
  • Value Types are derived from System.ValueType
  • System.ValueType inherits from System.Object
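The type hierarchy above can be checked with reflection. A small sketch (the class name is hypothetical) that prints the base type of a few built-in types:

```csharp
using System;

public static class UnifiedTypeDemo
{
    static void Main()
    {
        // Value types: int -> System.ValueType -> System.Object
        Console.WriteLine(typeof(int).BaseType);        // System.ValueType
        Console.WriteLine(typeof(ValueType).BaseType);  // System.Object

        // Enums are value types too: DayOfWeek -> System.Enum -> System.ValueType
        Console.WriteLine(typeof(DayOfWeek).BaseType);  // System.Enum
        Console.WriteLine(typeof(Enum).BaseType);       // System.ValueType

        // Reference types derive from System.Object (arrays via System.Array)
        Console.WriteLine(typeof(string).BaseType);     // System.Object
        Console.WriteLine(typeof(int[]).BaseType);      // System.Array
    }
}
```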
Tip #3 - What is Stored in Stack ? How it works?
  • The stack is used for managing program execution: it stores local variables and the parameters passed to methods
  • LIFO fashion (Data Deleted and Added from Top of Stack - Last In First Out)
Tip #4 - What is a Value Type ?
  • Value types declared as local variables are stored on the stack and use fewer resources (their memory is reclaimed when the method returns)
  • Value types stored on the stack do not need garbage collection
  • Example of Value Types - int, float, long, double, decimal, struct, enum
More Reads Link

Tip #5 - What is Stored in Heap ? How it works ?
  • Reference Types are stored in Heap. Garbage Collector manages heap memory
  • Memory can be allocated and released in any order (the GC has its own algorithm to clear objects from the heap)
  • The program allocates data on the heap; the GC removes objects that are no longer referenced
  • All Reference Type objects are stored in Heap (All the data members of the Object regardless of value type or reference type) they will be stored in Heap
  • Ex- If you declare a class with data members, functions. The data members might belong to int, float data type. All the memory for them would be allocated in heap
  • Example of Reference Types - Object, String, Classes, interface, delegate, array
  • More Reads Link1, Link2
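Where the memory lives is largely an implementation detail; the difference you can observe in code is copy semantics. Assigning a value type copies the whole value, while assigning a reference type copies only the reference. A small sketch (type and method names are hypothetical):

```csharp
using System;

public static class CopySemanticsDemo
{
    public struct PointStruct { public int X; } // value type
    public class PointClass { public int X; }   // reference type

    public static int CopyStruct()
    {
        PointStruct a = new PointStruct { X = 1 };
        PointStruct b = a; // copies the whole value
        b.X = 99;          // changes only the copy
        return a.X;        // still 1
    }

    public static int CopyClass()
    {
        PointClass a = new PointClass { X = 1 };
        PointClass b = a;  // copies the reference; both point to the same heap object
        b.X = 99;          // the change is visible through a as well
        return a.X;        // 99
    }

    static void Main()
    {
        Console.WriteLine("Struct after copy: " + CopyStruct()); // 1
        Console.WriteLine("Class after copy: " + CopyClass());   // 99
    }
}
```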

Tip #6 - What is the value of i after the program below runs?
int i = 20;
object j = i; // boxing: j refers to a heap copy of i's value
j = 50;       // boxes a new int 50 and points j at it (this is not unboxing)

The value of i is still 20. i is a value type stored on the stack with 20 as its value. Boxing copied that value into a new object on the heap, and assigning 50 to j boxes another new object, so i is never touched. In C, with pointers, we can change the value of a variable through its address:

int a = 10;
int *b;
b = &a;
*b = 20;

Now the value of a is 20.
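Safe C# has no pointers, but a ref parameter gives a similar effect: the method receives the caller's variable itself rather than a copy of its value. (C# does support pointer syntax such as int* inside an unsafe block, compiled with the /unsafe option.) A minimal sketch with a hypothetical method name:

```csharp
using System;

public static class RefDemo
{
    // 'ref' passes the caller's variable itself, not a copy of its value,
    // so assigning to 'target' changes the caller's variable (like *b = 20 in C)
    public static void SetTo20(ref int target)
    {
        target = 20;
    }

    static void Main()
    {
        int a = 10;
        SetTo20(ref a);
        Console.WriteLine(a); // 20
    }
}
```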

Tip #7 - What is Boxing and Unboxing?
Converting a value type to a reference type is boxing. Boxing allocates a new object on the heap and copies the value into it; like any heap object, it is later reclaimed by the garbage collector.
int i = 20;
object j = i; //(Boxing)
Converting a reference type to value type is unboxing
int i = 20;
int k;
object j = i;
k = (int)j; //unboxing
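One detail worth knowing: unboxing must use the exact type that was boxed. Unboxing a boxed int directly as long throws InvalidCastException, even though int converts to long implicitly; you unbox first and then convert. A small sketch (the class and method names are hypothetical):

```csharp
using System;

public static class UnboxingDemo
{
    // Returns true if unboxing 'boxed' as long throws InvalidCastException
    public static bool UnboxAsLongThrows(object boxed)
    {
        try
        {
            long value = (long)boxed; // only works if a long was boxed
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        int i = 20;
        object j = i;                // boxing: a copy of i goes to the heap
        int k = (int)j;              // unboxing with the matching type works
        long widened = (long)(int)j; // unbox as int first, then convert int -> long
        Console.WriteLine(k);                    // 20
        Console.WriteLine(widened);              // 20
        Console.WriteLine(UnboxAsLongThrows(j)); // True
    }
}
```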

Happy Learning!!!

.NET 4.0 Working with Tasks

This post is about the Task Parallel Library (System.Threading.Tasks) in .NET 4.0. It is very useful for building a load simulator. Sample examples are posted below.

using System;
using System.Threading.Tasks;

namespace SampleExercises
{
    static class LoadSimulator
    {
        public static int i = 10;

        static void Main(string[] args)
        {
            // Task with a return value: runs methodA on a thread-pool thread
            Task<string> threadsCreation = Task<string>.Factory.StartNew(() => methodA());

            // Result waits for the task to complete, then returns its value
            string status = threadsCreation.Result;
            Console.WriteLine("Status value is " + status);

            // Task with no return value
            var noReturnTask = new Task(() => methodB());

            // Start the task and wait so its output appears before the program continues
            noReturnTask.Start();
            noReturnTask.Wait();
            Console.ReadLine();
        }

        public static string methodA()
        {
            Console.WriteLine("i count is " + i++);
            return i.ToString();
        }

        public static void methodB()
        {
            Console.WriteLine("This is an empty task");
        }
    }
}


Reference - Link1
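For a load simulator you usually want many tasks at once. A minimal sketch, assuming each "virtual user" is one task that issues a fixed number of simulated requests (SimulateRequest is a hypothetical placeholder for a real call to the system under test). Task.WaitAll blocks until every task finishes, and Interlocked.Increment keeps the shared counter correct when tasks update it concurrently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LoadSimulatorSketch
{
    // Number of simulated requests completed across all tasks
    static int completedRequests = 0;

    // Hypothetical unit of work; replace with a real call to the system under load
    static void SimulateRequest(int userId)
    {
        Thread.Sleep(10); // stand-in for network / I/O latency
        Interlocked.Increment(ref completedRequests);
    }

    public static int Run(int virtualUsers, int requestsPerUser)
    {
        completedRequests = 0;
        Task[] users = new Task[virtualUsers];
        for (int u = 0; u < virtualUsers; u++)
        {
            int userId = u; // copy the loop variable so each lambda gets its own value
            users[u] = Task.Factory.StartNew(() =>
            {
                for (int r = 0; r < requestsPerUser; r++)
                    SimulateRequest(userId);
            });
        }
        Task.WaitAll(users); // block until every virtual user has finished
        return completedRequests;
    }

    static void Main()
    {
        int total = Run(5, 20);
        Console.WriteLine("Completed requests: " + total); // Completed requests: 100
    }
}
```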


Happy Learning!!!

File Backup Copy Script

A quick task landed on my plate. The script has to perform:

  • File Copy across Network Drive
  • Delete Old Archived Files

The following posts gave good direction to get started:

Access network share from within VBScript eg FileSystemObject
VBScript: Delete Old Files and Folders

The final script combines logic from both posts:


' Replace these placeholders with real values
ServerShare = "\\IPAddress\c$\Files"
UserName = "SystemName\UserName"
Password = "Password"
NumberOfDays = 3

' Connect to the share with credentials (empty local name = no drive letter)
Set NetworkObject = CreateObject("WScript.Network")
NetworkObject.MapNetworkDrive "", ServerShare, False, UserName, Password

Set objFS = CreateObject("Scripting.FileSystemObject")
strSourceFolder = "Drive\SourceFolder"
Set objDestination = objFS.GetFolder(ServerShare)
Set objFolder = objFS.GetFolder(strSourceFolder)

' Delete files older than NumberOfDays from the source folder
For Each objFile In objFolder.Files
    If DateDiff("d", objFile.DateCreated, Now) > NumberOfDays Then
        WScript.Echo objFile
        objFile.Delete True
    End If
Next

' Delete files older than NumberOfDays from the network share
For Each objFile In objDestination.Files
    If DateDiff("d", objFile.DateCreated, Now) > NumberOfDays Then
        WScript.Echo objFile
        objFile.Delete True
    End If
Next

' Copy the remaining files (including those in subfolders) to the share
Go objFolder

Sub Go(objDIR)
  ' Skip the protected system folder; compare the folder name, not the full path
  If objDIR.Name <> "System Volume Information" Then
    For Each eFolder In objDIR.SubFolders
        Go eFolder
    Next
    For Each objFileToCopy In objDIR.Files
        WScript.Echo objFileToCopy.Name
        ' True = overwrite an existing file with the same name
        objFS.CopyFile objFileToCopy.Path, objDestination.Path & "\" & objFileToCopy.Name, True
    Next
  End If
End Sub

' Release objects and disconnect the share
Set objFolder = Nothing
Set objDestination = Nothing
Set objFS = Nothing

NetworkObject.RemoveNetworkDrive ServerShare, True, False
Set NetworkObject = Nothing
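The same "delete files older than N days" step can be written in C# with System.IO, which fits the tool-developer posts in this series. This is a hypothetical sketch, not the script above translated line by line: it uses the file's last-write time (the VBScript uses DateCreated; swap in File.GetCreationTime to mirror it) and compares elapsed days, whereas VBScript's DateDiff("d", ...) counts calendar-day boundaries. The demo folder is an assumption.

```csharp
using System;
using System.IO;

public static class OldFileCleanup
{
    // Deletes files directly in 'folder' whose last-write time is more than
    // 'numberOfDays' days before 'now'. Returns how many files were deleted.
    public static int DeleteFilesOlderThan(string folder, int numberOfDays, DateTime now)
    {
        int deleted = 0;
        // GetFiles returns an array snapshot, so deleting inside the loop is safe
        foreach (string path in Directory.GetFiles(folder))
        {
            double ageInDays = (now - File.GetLastWriteTime(path)).TotalDays;
            if (ageInDays > numberOfDays)
            {
                Console.WriteLine("Deleting " + path);
                File.Delete(path);
                deleted++;
            }
        }
        return deleted;
    }

    static void Main()
    {
        // Hypothetical demo folder under the temp directory
        string folder = Path.Combine(Path.GetTempPath(), "BackupCleanupDemo");
        Directory.CreateDirectory(folder);
        string oldFile = Path.Combine(folder, "old.txt");
        string newFile = Path.Combine(folder, "new.txt");
        File.WriteAllText(oldFile, "old");
        File.WriteAllText(newFile, "new");
        File.SetLastWriteTime(oldFile, DateTime.Now.AddDays(-10)); // pretend it is 10 days old

        int count = DeleteFilesOlderThan(folder, 3, DateTime.Now);
        Console.WriteLine("Deleted " + count + " file(s)");
    }
}
```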


Happy Learning!!!

.NET Tool Developer Notes - LINQ, DateTime Parsing

Tip #1 – Input data arrives in MM/dd/yyyy HH:mm format and has to be converted to MM/dd/yyyy HH:mm:ss

Code for parsing Date / Time formats
using System;
using System.Globalization;
namespace ExampleCode
{
    public class ExampleCode
    {
        static void Main()
        {
            try
            {
                CultureInfo provider = CultureInfo.InvariantCulture;
                DateTime dateTime;
                string dateValue = null;
                DateTimeStyles styles;
                styles = DateTimeStyles.None;

                if (DateTime.TryParse("01/01/2001 05:00", provider, styles, out dateTime))
                {
                    Console.WriteLine(dateTime);
                    dateValue = dateTime.ToString();
                    Console.WriteLine(dateValue);
                    Console.ReadLine();
                }
                string dateSet = "01/01/2001 05:00";
                Console.WriteLine(dateSet.Length);
                if (dateSet.Length == 16)
                {
                    dateSet = dateSet + ":00";
                }
                Console.ReadLine();
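                // Format explicitly; printing a DateTime directly uses the current culture's format
                System.Console.WriteLine(DateTime.ParseExact(dateSet, "MM/dd/yyyy HH:mm:ss", provider).ToString("MM/dd/yyyy HH:mm:ss", provider));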
                Console.ReadLine();
            }
            catch (Exception Ex)
            {
                Console.WriteLine(Ex.Message.ToString());
                Console.ReadLine();
            }
        }
    }
}
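Instead of checking the string length and appending ":00", DateTime.TryParseExact accepts an array of formats, so both input layouts can be parsed in one call and always written back out with seconds. A sketch under that assumption (the DateNormalizer class and Normalize method are hypothetical names):

```csharp
using System;
using System.Globalization;

public static class DateNormalizer
{
    // Accepted input layouts; seconds are optional in the incoming data
    static readonly string[] InputFormats = { "MM/dd/yyyy HH:mm", "MM/dd/yyyy HH:mm:ss" };

    // Parses either layout and returns the value as MM/dd/yyyy HH:mm:ss,
    // or null when the input matches neither format
    public static string Normalize(string input)
    {
        DateTime parsed;
        if (DateTime.TryParseExact(input, InputFormats, CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out parsed))
        {
            return parsed.ToString("MM/dd/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
        }
        return null;
    }

    static void Main()
    {
        Console.WriteLine(Normalize("01/01/2001 05:00"));        // 01/01/2001 05:00:00
        Console.WriteLine(Normalize("01/01/2001 05:00:30"));     // 01/01/2001 05:00:30
        Console.WriteLine(Normalize("2001-01-01") ?? "invalid"); // invalid
    }
}
```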



Tip #2 – DataTable and LINQ Query Example
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;   // required for the LINQ query syntax (where / select)
namespace ExampleCode
{
    public class ExampleCode
    {
        static void Main()
        {
            try
            {
                // DataTable with Name, Age and Place columns
                DataTable dataTable1 = new DataTable();
                dataTable1.Columns.Add("Name", typeof(string));
                dataTable1.Columns.Add("Age", typeof(int));
                dataTable1.Columns.Add("Place", typeof(string));
                // Pass ages as int to match the column type
                dataTable1.Rows.Add("Raj", 21, "Chennai");
                dataTable1.Rows.Add("Ram", 22, "Chennai");
                dataTable1.Rows.Add("Rick", 24, "Mumbai");
                dataTable1.Rows.Add("James", 15, "Delhi");
                dataTable1.Rows.Add("Andy", 24, "Delhi");

                // Names containing "Ra" who live in Chennai
                var queryByCity = from myRow in dataTable1.AsEnumerable()
                                  where myRow.Field<string>("Name").Contains("Ra") &&
                                        myRow.Field<string>("Place") == "Chennai"
                                  select myRow;
                PrintRows(queryByCity);
                Console.ReadLine();

                // People in Delhi older than 22
                var queryByAge = from myRow in dataTable1.AsEnumerable()
                                 where myRow.Field<string>("Place") == "Delhi" &&
                                       myRow.Field<int>("Age") > 22
                                 select myRow;
                PrintRows(queryByAge);
                Console.ReadLine();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                Console.ReadLine();
            }
        }

        // Prints every column value of every row, labelled with its type
        static void PrintRows(IEnumerable<DataRow> rows)
        {
            foreach (DataRow dataValues in rows)
            {
                foreach (object dataValue in dataValues.ItemArray)
                {
                    if (dataValue is int)
                    {
                        Console.WriteLine("Int: {0}", dataValue);
                    }
                    else if (dataValue is string)
                    {
                        Console.WriteLine("String: {0}", dataValue);
                    }
                    else if (dataValue is DateTime)
                    {
                        Console.WriteLine("DateTime: {0}", dataValue);
                    }
                }
            }
        }
    }
}
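Instead of walking ItemArray and checking each value's type, a query can project typed fields directly, and LINQ's GroupBy can summarize rows. A sketch using the same sample data (class and method names are hypothetical; on .NET Framework, AsEnumerable and Field<T> need a reference to the System.Data.DataSetExtensions assembly):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableProjection
{
    // Same sample data as above, with ages passed as int
    public static DataTable BuildPeople()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Age", typeof(int));
        table.Columns.Add("Place", typeof(string));
        table.Rows.Add("Raj", 21, "Chennai");
        table.Rows.Add("Ram", 22, "Chennai");
        table.Rows.Add("Rick", 24, "Mumbai");
        table.Rows.Add("James", 15, "Delhi");
        table.Rows.Add("Andy", 24, "Delhi");
        return table;
    }

    // Typed projection: read the needed fields directly instead of walking ItemArray
    public static List<string> OlderThanIn(DataTable people, string place, int age)
    {
        return (from row in people.AsEnumerable()
                where row.Field<string>("Place") == place && row.Field<int>("Age") > age
                select row.Field<string>("Name") + " (" + row.Field<int>("Age") + ")").ToList();
    }

    // Grouping: number of people per city, in order of first appearance
    public static List<string> CountPerCity(DataTable people)
    {
        return people.AsEnumerable()
                     .GroupBy(row => row.Field<string>("Place"))
                     .Select(g => g.Key + ": " + g.Count())
                     .ToList();
    }

    static void Main()
    {
        DataTable people = BuildPeople();
        foreach (string line in OlderThanIn(people, "Delhi", 22))
            Console.WriteLine(line); // Andy (24)
        foreach (string line in CountPerCity(people))
            Console.WriteLine(line); // Chennai: 2, then Mumbai: 1, then Delhi: 2
    }
}
```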



Tip #3 - Check for Entry in Dictionary

Unexceptional Dictionary Accesses in C#
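One standard way to check for an entry without risking a KeyNotFoundException is Dictionary.TryGetValue, which also avoids a second lookup compared with ContainsKey followed by the indexer. A minimal sketch (the class, method and sample data are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryLookup
{
    // Returns the city for 'name', or 'fallback' when the key is missing,
    // using a single lookup and no exception
    public static string GetCity(Dictionary<string, string> cities, string name, string fallback)
    {
        string city;
        return cities.TryGetValue(name, out city) ? city : fallback;
    }

    static void Main()
    {
        var cities = new Dictionary<string, string> { { "Raj", "Chennai" }, { "Rick", "Mumbai" } };
        Console.WriteLine(GetCity(cities, "Raj", "unknown"));  // Chennai
        Console.WriteLine(GetCity(cities, "Andy", "unknown")); // unknown
        Console.WriteLine(cities.ContainsKey("Rick"));         // True
    }
}
```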

Tip #4 - Log4J email on Error Sample Code link


Tip #5 - C# Working with Excel. Link1, Link2


Happy Learning!!!