"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 21, 2009

DFS Vs BFS

Depth First Search Vs Breadth First Search
DFS
  • A depth-first search (DFS) explores a path all the way to a leaf before backtracking and exploring another path
  • For example, after searching A, then B, then D, the search backtracks and tries another path from B
  • Node are explored in the order A B D E H L M N I O P C F G J K Q
  • N will be found before J  

DFS -
  • Takes less memory
  • The disadvantages are that it takes longer, and will not always find the shortest path
  • Can get struck if Tree has loops
 BFS
  • A breadth-first search (BFS) explores nodes nearest the root before exploring nodes further away
  • For example, after searching A, then B, then C, the search proceeds with D, E, F, G
  • Node are explored in the order A B C D E F G H I J K L M N O P Q
  • J will be found before N 

BFS
  • Will Always find shortest path
  • Can Deal with Looping Structures
  • Takes lot of memory
Algorithm - Pseudo Code
Depth-first searching:

Put the root node on a stack;
while (stack is not empty)
                       {
                              remove a node from the stack;
                              if (node is a goal node) return success;
                              put all children of node onto the stack;
                       }
return failure;

Breadth-first searching:
Put the root node on a queue;
while (queue is not empty)
                        {
                            remove a node from the queue;
                            if (node is a goal node) return success;
                            put all children of node onto the queue;
                        }
return failure;

Note: Queue (FIFO), Stack(LIFO) 
8 Queen problem using DFS Approach link
Reference - Study Material from Site

Algorithm Types

Many Algorithm types are to be considered:

Simple recursive algorithms
Example – Factorial Algorithm
unsigned int factorial(unsigned int n)
{
     if (n <= 1)
    {
        return 1;
    }
    else
   {
          return n * factorial(n-1);
   }
}

Backtracking algorithms
• Backtracking algorithms are based on a depth-first recursive search
• A backtracking algorithm:

  • Tests to see if a solution has been found, and if so, returns it; otherwise
  • For each choice that can be made at this point,
  • Make that choice
  • If the recursion returns a solution, return it
  • If no choices remain, return failure
  • Example – 8 Queen Problem

Divide and conquer algorithms
o Divide-and-conquer algorithms partition the problem into independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem
o Combine the solutions to the subproblems into a solution to the original problem
o Example – QuickSort, BinarySearch

Dynamic programming algorithms
o Like divide and conquer, DP solves problems by combining solutions to subproblems.
o Unlike divide and conquer, subproblems are not independent.

  • Subproblems may share subsubproblems,
  • However, solution to one subproblem may not affect the solutions to other subproblems of the same problem.

o DP reduces computation by
o Solving subproblems in a bottom-up fashion.
o Storing solution to a subproblem the first time it is solved.
o Looking up the solution when subproblem is encountered again.
o At this moment two dynamic programming hallmarks are stated:

  • Optimal substructure: an optimal solution to a problem contains optimal solutions to subproblems.
  • Overlapping subproblems: a recursive solution contains a “small” number of distinct subproblems repeated many times.

o Examples -Dynamic Programming Practice Problems - Video Tutorial

Greedy algorithms
o A greedy algorithm works in phases. At each phase:
o You take the best you can get right now, without regard for future consequences
o You hope that by choosing a local optimum at each step, you will end up at a global optimum
o Examples
o Dijkstra’s algorithm for finding the shortest path in a graph - Always takes the shortest edge connecting a known node to an unknown node
o Kruskal’s algorithm for finding a minimum-cost spanning tree - Always tries the lowest-cost remaining edge
o Prim’s algorithm for finding a minimum-cost spanning tree - Always takes the lowest-cost edge between nodes in the spanning tree and nodes not yet in the spanning tree

• Branch and bound algorithms
• Brute force algorithms
• Randomized algorithms

Reference – I relied purely on web search to refer to multiple sites and learn above details. My next plan is to take up one example at a time and learn and analyze the algorithm.

Practicing Programming
You Should Write Blogs

More Reads

December 19, 2009

Drill Down Report in SSRS

This is in continuation with my first blog entry link

1. Create a new report based on Wizard

clip_image002

2. Provide the stored proc in the query builder window

clip_image004

In the next window select Tabular form.
3. Choose options for group by based on city as described below

clip_image006

4. Select option for Drilldown

clip_image008

5. Complete the next steps to complete the report
6. Goto Dataset Properties

clip_image010

7. Change it into a stored proc.
clip_image012

8. You will see the report output like the drill down report option. The input parameter and the proc need to be modified to show it as a drop down list.

clip_image014

Algorithms - Time Complexity

Time Complexity of For Loops
=========================
1. Sum = 0;
    for(i=0; i>n; i++)
        Sum++;
  Complexity is O(N)  - Since the loop is executed only once

2. Sum = 0;
    for(i=0; i<n; i++)
         for(j=0; j<n; j++)
               Sum++;
 Complexity is O(N Square) - Two For loops

3. Sum = 0;
    for(i=0; i<n; i++)
        for(j=0; j<n*n; j++)
 Complexity is O(N3) - N Power3

'O' Notation
  • O(g(n)) is a set of functions, f, such that
  • f(n) < cg(n) - g is an upper bound of f
  • f is O(g) is transitive. If f is O(g) and g is O(h) then f is O(h)
  • Exponential functions grow faster
  • Lograthmic functions grow slower
Omega notation
  • set of functions such that f(n) > cg(n) - g is lower bound of f
Theta notation
theta(g) = O(g) and Omega (g). g is both lower and upper bound of f

Good Tutorial Link

Class P - Set of Problems which will have solutions with polynomial time complexity. E.g - Euler's Problem

Class NP - (Non Deterministic Polynomial) - Problem which can be solved by a series of guessing (non-deterministic) steps but whose solution can be verified as correct in polynomial time is said to lie in class NP. E.g Hamilitonian Problem

Sort Algorithms Time Complexity
  • Insertion Sort - O(N Square)
  • Bubble Sort - O(N Square)
  • Heap Sort - O(NlogN)
  • Quick Sort - O(NlogN)
  • Radix Sort - O(N)

Algorithms - Maximum of sub sequence numbers

Question - Find max subsequence of 3 numbers in an array of 10 numbers

A little console app
==============
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication2
{
class Program
{
     static void Main(string[] args)
    {
          Console.WriteLine("Test Output");
          int[] a = { 21, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
          int maxsum=0,currentsum = 0;
          //max subsequence of 3 numbers
         for (int i = 0; i < 10; i++) //N times
        {
            currentsum = 0;
            for (int j = i; j < i+3 && j <10 ; j++) //3N
             {
                    currentsum = currentsum + a[j];
             }
             if (currentsum > maxsum)
             maxsum = currentsum;
        }
       Console.WriteLine("Max Sum Value is ", maxsum);
    }
}
}

It is O(N Square)

Next problem Set
Write a program which prints array of numbers 1 - 100 as multiples of 3 with A's i.e.(3A,6A,9A.....) , multiples of 5 with B(5B, 10B, 20B....) and multiples of 3 and 5 with AB(15AB, 150AB.....)

TSQL Code Here - SQL 2008 R2
use TEMPDB

DECLARE @I INT
DECLARE @RESULT VARCHAR(5)
SET @I = 1
SET @RESULT = NULL
WHILE 1 = 1
BEGIN
              SET @RESULT =
                                   CASE
                                   WHEN (@I%5 = 0 AND @I%3 = 0) THEN (CONVERT(VARCHAR(10),@I)+'AB')
                                    WHEN @I%3 = 0 THEN (CONVERT(VARCHAR(5),@I)+'A')
                                    WHEN @I%5 = 0 THEN (CONVERT(VARCHAR(5),@I)+'B')
                                    END
               IF @RESULT IS NOT NULL
                     SELECT @RESULT
               SET @I = @I+1
               SET @RESULT = NULL
               IF @I > 100
               BREAK;
END

Write a program to shuffle pack of cards without using math.random....

Given an array containing zeros,+ve and -ve numbers, write a function which pushes all the zeros to the end of the array.

Write a function which takes a string and a substring as input and which deletes the substring in the main string.eg: mainstr=abcxyghixy sub=xy result should be mainstr=abcghi

Testing atoi function

Prime number in TSQL from 1 to 100

The prime number challenge – great waste of time!

Many different solutions here in link

December 17, 2009

Each GROUP BY expression must contain at least one column that is not an outer reference.

Today I ended up with this error. I got the solution from blog link. It solved my problem. Need to learn More :)

Example below
CREATE TABLE TESTError

(
  A INT,
  B INT,
  C INT
)

INSERT INTO TESTError (A,B,C)
VALUES(1,2,3)
INSERT INTO TESTError (A,B,C)
VALUES(2,3,4)
INSERT INTO TESTError (A,B,C)
VALUES(2,4,5)
INSERT INTO TESTError (A,B,C)
VALUES(2,2,2)


--Error Case
SELECT COUNT(A), COUNT(B), COUNT(C), 'Action' = CASE WHEN A < 1 THEN 'ONE'
WHEN A > 1 THEN 'TWO'
END
FROM TESTError
GROUP BY A, B, C,'Action'

Error - Each GROUP BY expression must contain at least one column that is not an outer reference.


--Working Case
SELECT COUNT(A), COUNT(B), COUNT(C), 'Action' = CASE WHEN A < 1 THEN 'ONE'
WHEN A > 1 THEN 'TWO'
END
FROM TESTError
GROUP BY A, B, C

Reason - Column I was selecting is Group BY is a Constant Value which is already derived in Expresession

December 14, 2009

CDC Explored

CDC Feature I did some handson to pull data. I am writing my learnings here. Thanks to balaji for helping me debug. (SQL Server 2008 R2)

CDC Basics here

--STEP 1
CREATE DATABASE TESTCDC

--STEP2
USE TESTCDC
GO
--DROP Table TestTable
CREATE TABLE TESTTable
(ID INT Primary key,
NAME VARCHAR(50))

--STEP3
INSERT INTO TESTTable(ID, NAME)
VALUES (1,'Siva')
INSERT INTO TESTTable(ID, NAME)
VALUES (2,'Raj')
INSERT INTO TESTTable(ID, NAME)
VALUES (3,'Mike')
INSERT INTO TESTTable(ID, NAME)
VALUES (4,'Lloyd')


--STEP 4
--Enable CDC at DB Level
declare @rc int
exec @rc = sys.sp_cdc_enable_db
select @rc

--Query to Verify it
select name, is_cdc_enabled from sys.databases


--STEP 5
--Enable CDC for Table
exec sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'TESTTable' ,
@role_name = 'TestRole',
@supports_net_changes = 1

--Query to Verify DB
select name, type, type_desc, is_tracked_by_cdc from sys.tables


--STEP 6
--Insert Few Records
INSERT INTO TESTTable(ID, NAME)
VALUES (5,'Tauty')
INSERT INTO TESTTable(ID, NAME)
VALUES (6,'Andy')
Update TESTTable SET ID = 10 WHERE ID = 1

--STEP7
--Populated CDC Records
Select * FROM cdc.dbo_TESTTable_CT

--STEP 8
Check Data Populated
DECLARE @Yesterday_Time smalldatetime, @to_lsn binary(10), @from_lsn binary(10);
SELECT @Yesterday_Time = DATEADD(DAY, -1, GETDATE())
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', GETDATE()-2);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE());
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_TESTTable (@from_lsn, @to_lsn ,'all')

--STEP 9
Putting DataPull Part in a SSIS package. For Full pull it is direct dataload (DataFlowTask of all Data). Delta Pull is Tricky and we will be looking into it. We are going to pull data from TESTTable to TestTable1. Changes in TestTable need to be reflected in TestTable1



Once CDC is enabled. You will see like CDC Capturing system tables.

--STEP 10
I am going to create a simple package as below to pull data

--CREATE a Table to Store Last LSN Value (Log Sequence Number)

USE TESTCDC
GO
CREATE TABLE CDCTracking
(LSN VARCHAR(100))

--GET Current Max LSN
SELECT sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE())

--Insert Data
INSERT INTO CDCTracking
VALUES('0x0000001D0000011A0001')

 --STEP 11
Package looks simple as below

--STEP 12
--GET FROM and TO LSN (Excecute SQLTask)

SELECT LSN as 'FROMLSN' ,(select CONVERT(varchar(100), sys.fn_cdc_get_max_lsn(), 1))
AS 'TOLSN'
FROM CDCTracking


Assign output variables for the Execute SQL Task

--STEP13
Below Variables are defined for the package


Assign the value for variable QueryCDC Variable.


SELECT *,__$operation as Operation FROM cdc.fn_cdc_get_net_changes_dbo_TESTTable (CONVERT(binary(10),'0x0000001D0000011A0001',1),CONVERT(binary(10),'0x0000001E000000B00006',1),'all')


--STEP 14
The following is the variable details set for package


--STEP 15
Add below code in C#

// TODO: Add your code here

Dts.Variables["QueryCDC"].Value = "SELECT *,__$operation as Operation" +
" FROM cdc.fn_cdc_get_net_changes_dbo_TESTTable " +
"(CONVERT(binary(10),'" + Dts.Variables["FromLsn"].Value +
"',1),CONVERT(binary(10),'" + Dts.Variables["ToLsn"].Value+
"',1),'all')";

MessageBox.Show(Dts.Variables["QueryCDC"].Value.ToString());
MessageBox.Show(Dts.Variables["FromLsn"].Value.ToString());
MessageBox.Show(Dts.Variables["ToLsn"].Value.ToString());

--STEP 16
Data Flow Task I am doing as below


--STEP 17
First assign OLEDB source editor as provided below





--STEP 18
Add a conditional splitter operator

--STEP 19
Create a OLEDB Command to update as shown below










--STEP 20
DELETE Operation as below







--STEP 21
Insert is straight forward Data Insert, OLEDB Destination





--STEP 22
End to End would Look Like below

INSERT INTO TESTTable(ID, NAME)
VALUES (91,'Tamy')
INSERT INTO TESTTable(ID, NAME)
VALUES (92,'Mosses')
Update TESTTable SET ID = 110 WHERE ID = 5





Handling Errors in SSIS

I basically wrote a simple script for for SSIS Error Handling for one of packages developed. This is pretty easy one. Getting Started on this

Table to log errors

Step 1

CREATE TABLE ErrorLog (
ErrorDesc varchar(MAX),
Taskname varchar(100))

Step 2

Stored Proc to Log Errors

CREATE PROCEDURE dbo.LogError (@ErrorDesc varchar(2000), @Taskname varchar(100))
AS
BEGIN
INSERT INTO [dbo].[ErrorLog] ([ErrorDesc], [Taskname])
VALUES (@ErrorDesc, @Taskname)
END

Step 3

Create SQL Task on Error in the package


Step 4

Parameter mapping

This is simple example.

Further improvements
  • Error Handling in proc
  • Include Errorlogid and other parameters in errorlog table

References - MSSQLTips
SQL Server Integration Services (SSIS) - Design Best Practices
Top 10 SQL Server Integration Services Best Practices
Handling Slowly Changing Dimensions in SSIS
About the BI Monkey SSIS ETL Framework
ETL Framework for SSIS 2008 - Part 1
ETL Framework for SSIS 2008 - Part 2
Stonemeadow Solutions ETL Framework
SSIS Community Tasks and Components
SQL Bits Session - Change Data Capture for real-time BI
Change Data Capture Best Practices

Happy Reading!!

December 12, 2009

Basics - Employee manager Legacy Interview Question

Question - Query to Display Employee and their Manager names. Table details provided below.

use tempdb
--STEP1
CREATE TABLE Employee
(ID INT,
Name VARCHAR(50),
ManagerId INT)

--STEP2
INSERT INTO Employee(ID,Name,ManagerId)
VALUES
(1,'Siva',NULL),
(2,'Raj',1),
(3,'Raja',2),
(4,'Santosh',3),
(5,'Santosh Raj',4)

--STEP 3
SELECT Distinct E.ID as 'EmployeeID', E.Name as 'EmployeeName', M.ID as 'ManagerId', M.Name as 'Managername' FROM Employee E LEFT JOIN Employee M
ON E.ManagerId = M.Id

I am trying on SQL 2008 R2..

I wanted to learn data structures and post some interesting interview questions. I did attend course few months back. Whatever I could grab and I learnt I will post it soon....All only 'C' programs :).....

Question - Write a Query to return Employeee and Last Maximum Salary

CREATE TABLE EmpSal
(
Empid INT,
MonthofPay VARCHAR(20),
Salary Money
)

INSERT INTO EmpSal(Empid,MonthofPay,Salary)
VALUES(10,'June',15000),
(10,'July',18000),
(10,'April',21000),
(10,'August',24000),
(11,'June',25000),
(11,'July',28000),
(11,'April',31000),
(12,'August',44000),
(12,'June',55000),
(12,'July',18000),
(12,'April',11000),
(12,'August',14000)

SELECT E1.* FROM EmpSal E1 JOIN

(
         SELECT Empid, MAX(Salary) as Salary
         FROM EmpSal
         GROUP BY Empid
) E2
ON E1.Empid = E2.Empid
AND E1.Salary = E2.Salary

2. Write a query to return third max salary

SELECT Top 1 Salary, * From EmpSal E1
WHERE E1.Salary IN
       (
            SELECT Top 3 Salary
            FROM EmpSal
            Order by Salary Desc
       )
Order by E1.Salary Asc

Primenumber generation in TSQL

December 10, 2009

Basics - Deleting Duplicate Entries using ROW_NUMBER() Function

ROW_NUMBER() - Divides the result set produced by the FROM clause into partitions to which the ROW_NUMBER function is applied (As mentioned in MSDN)

IF EXISTS (SELECT 1 FROM SYS.TABLES WHERE NAME ='TEST')

DROP TABLE TEST
GO
--STEP1
CREATE TABLE TEST
(ID INT,
NAME VARCHAR(20))
GO

--STEP 2
INSERT INTO TEST (ID,NAME)
VALUES (1,'Raj')
GO
INSERT INTO TEST (ID,NAME)
VALUES (1,'Raj')
GO
INSERT INTO TEST (ID,NAME)
VALUES (2,'Ram')
GO
INSERT INTO TEST (ID,NAME)
VALUES (2,'Ram')
GO
SELECT * FROM TEST
GO

--STEP 3
WITH [CTE DUPLICATE] AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID DESC),
Id,NAME
FROM TEST
)
DELETE FROM [CTE DUPLICATE]
OUTPUT DELETED.*
WHERE RN > 1;
GO

--STEP 4
SELECT * FROM TEST

December 03, 2009

My First SSRS Report

It was fun learning SSRS and Report Creation. I learnt to create Reports. I am going to post an example of creating a standard report using parameters. Report is going to pass parameters to stored procedures and fetch data.

I am doing it on SQL Server 2008 R2

Example Table and Stored Procs
use tempdb

create table dbo.Earnings
(Name VARCHAR(100) NOT NULL,
SaleDate DATETIME NOT NULL,
SaleValue INT NOT NULL,
City VARCHAR(100) NOT NULL
)

INSERT INTO Earnings (Name,SaleDate,SaleValue,City)
VALUES ('Raja',GETDATE()-1,2500,'Chennai'),
('Raja',GETDATE()-2,2000,'Chennai'),
 ('Raja',GETDATE()-3,4000,'Mumbai'),
('Raja',GETDATE()-30,9000,'Hyderabad'),
('Ram',GETDATE()-1,9000,'Hyderabad'),
('Ram',GETDATE()-2,5000,'Chennai'),
('Ram',GETDATE()-30,8500,'Delhi'),
('Robert',GETDATE()-2,9000,'Hyderabad'),
('Robert',GETDATE()-30,9000,'Hyderabad'),
('Siva',GETDATE(),90,'Bangalore')

CREATE PROC SaleValue
(@FromDate DATETIME,
@ToDate DATETIME
)
AS
BEGIN
SELECT Name, SaleDate, SaleValue, City
FROM dbo.Earnings
WHERE SaleDate >= @FromDate
AND SaleDate <= @ToDate
END

DECLARE @FromDate DATETIME, @ToDate DATETIME
SET @FromDate = GETDATE()-100
SET @ToDate = GETDATE()
EXEC SaleValue @FromDate,@ToDate

CREATE PROC SaleCity
(@City VARCHAR(100)
)
AS
BEGIN
SELECT Name, SaleDate, SaleValue, City
FROM dbo.Earnings
WHERE City like @City
END

DECLARE @City VARCHAR(100)
SET @City = 'Chennai'
EXEC SaleCity @City
 
Creating Report

Step 1 – Create a Data Source Connection



Step 2 – Adding Report. Under Reports ->Add New Item->Select Report. It will create a report.rdl file.

Step 3 – Under Report Data. New->Data Source. Select Shared DataSource and Select it as shown in below pic.






Step 4 – New DataSet. Name, Data Source and Select Stored Procedure as shown below.

Refresh and see parameters as shown below. It is under Parameters tab.



Step 5 – Now If you refresh Parameters would be automatically listed



Step 6 – Under Toolbox drag a table, as shown in screen.



Step 7 – Drag and Drop DataSet Columns as shown below in screen


Step 8 – Click on preview, You will see FromData, ToDate and When you click on view report you will see output.
Note: I added footer, Set Font color to make it little better :-)




Now In the Similar Way I need to create Report for Second Proc that we created. SaleCity Procedure.

I am going to link this as a link to the mail report. When you click on City. It would provide you city level details.



Linking a Report

In the First Report. Under City. Click on Text properties and goto actions tab and fill the sub report details as shown in below screen shot.



Now You can try from Salevalue Report. When you Click on Report it would show City details.

Now, Adding a Little Graph Display. Drag a Chart from Toolbox, Data - SaleValue, Category-City



Final Report listed below...




I am happy I am sharing what I learnt in last few days :-)...

Happy Reading....Will come back with some interesting SSIS stuff....

I below link useful for creating report. I got the link after posting the blog :).

Expression Examples (Reporting Services)