"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 14, 2009

CDC Explored

CDC Feature I did some handson to pull data. I am writing my learnings here. Thanks to balaji for helping me debug. (SQL Server 2008 R2)

CDC Basics here

--STEP 1
CREATE DATABASE TESTCDC

--STEP2
USE TESTCDC
GO
--DROP Table TestTable
CREATE TABLE TESTTable
(ID INT Primary key,
NAME VARCHAR(50))

--STEP3
INSERT INTO TESTTable(ID, NAME)
VALUES (1,'Siva')
INSERT INTO TESTTable(ID, NAME)
VALUES (2,'Raj')
INSERT INTO TESTTable(ID, NAME)
VALUES (3,'Mike')
INSERT INTO TESTTable(ID, NAME)
VALUES (4,'Lloyd')


--STEP 4
--Enable CDC at DB Level
declare @rc int
exec @rc = sys.sp_cdc_enable_db
select @rc

--Query to Verify it
select name, is_cdc_enabled from sys.databases


--STEP 5
--Enable CDC for Table
exec sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'TESTTable' ,
@role_name = 'TestRole',
@supports_net_changes = 1

--Query to Verify DB
select name, type, type_desc, is_tracked_by_cdc from sys.tables


--STEP 6
--Insert Few Records
INSERT INTO TESTTable(ID, NAME)
VALUES (5,'Tauty')
INSERT INTO TESTTable(ID, NAME)
VALUES (6,'Andy')
Update TESTTable SET ID = 10 WHERE ID = 1

--STEP7
--Populated CDC Records
Select * FROM cdc.dbo_TESTTable_CT

--STEP 8
Check Data Populated
DECLARE @Yesterday_Time smalldatetime, @to_lsn binary(10), @from_lsn binary(10);
SELECT @Yesterday_Time = DATEADD(DAY, -1, GETDATE())
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', GETDATE()-2);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE());
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_TESTTable (@from_lsn, @to_lsn ,'all')

--STEP 9
Putting DataPull Part in a SSIS package. For Full pull it is direct dataload (DataFlowTask of all Data). Delta Pull is Tricky and we will be looking into it. We are going to pull data from TESTTable to TestTable1. Changes in TestTable need to be reflected in TestTable1



Once CDC is enabled. You will see like CDC Capturing system tables.

--STEP 10
I am going to create a simple package as below to pull data

--CREATE a Table to Store Last LSN Value (Log Sequence Number)

USE TESTCDC
GO
CREATE TABLE CDCTracking
(LSN VARCHAR(100))

--GET Current Max LSN
SELECT sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE())

--Insert Data
INSERT INTO CDCTracking
VALUES('0x0000001D0000011A0001')

 --STEP 11
Package looks simple as below

--STEP 12
--GET FROM and TO LSN (Excecute SQLTask)

SELECT LSN as 'FROMLSN' ,(select CONVERT(varchar(100), sys.fn_cdc_get_max_lsn(), 1))
AS 'TOLSN'
FROM CDCTracking


Assign output variables for the Execute SQL Task

--STEP13
Below Variables are defined for the package


Assign the value for variable QueryCDC Variable.


SELECT *,__$operation as Operation FROM cdc.fn_cdc_get_net_changes_dbo_TESTTable (CONVERT(binary(10),'0x0000001D0000011A0001',1),CONVERT(binary(10),'0x0000001E000000B00006',1),'all')


--STEP 14
The following is the variable details set for package


--STEP 15
Add below code in C#

// TODO: Add your code here

Dts.Variables["QueryCDC"].Value = "SELECT *,__$operation as Operation" +
" FROM cdc.fn_cdc_get_net_changes_dbo_TESTTable " +
"(CONVERT(binary(10),'" + Dts.Variables["FromLsn"].Value +
"',1),CONVERT(binary(10),'" + Dts.Variables["ToLsn"].Value+
"',1),'all')";

MessageBox.Show(Dts.Variables["QueryCDC"].Value.ToString());
MessageBox.Show(Dts.Variables["FromLsn"].Value.ToString());
MessageBox.Show(Dts.Variables["ToLsn"].Value.ToString());

--STEP 16
Data Flow Task I am doing as below


--STEP 17
First assign OLEDB source editor as provided below





--STEP 18
Add a conditional splitter operator

--STEP 19
Create a OLEDB Command to update as shown below










--STEP 20
DELETE Operation as below







--STEP 21
Insert is straight forward Data Insert, OLEDB Destination





--STEP 22
End to End would Look Like below

INSERT INTO TESTTable(ID, NAME)
VALUES (91,'Tamy')
INSERT INTO TESTTable(ID, NAME)
VALUES (92,'Mosses')
Update TESTTable SET ID = 110 WHERE ID = 5





Handling Errors in SSIS

I basically wrote a simple script for for SSIS Error Handling for one of packages developed. This is pretty easy one. Getting Started on this

Table to log errors

Step 1

CREATE TABLE ErrorLog (
ErrorDesc varchar(MAX),
Taskname varchar(100))

Step 2

Stored Proc to Log Errors

CREATE PROCEDURE dbo.LogError (@ErrorDesc varchar(2000), @Taskname varchar(100))
AS
BEGIN
INSERT INTO [dbo].[ErrorLog] ([ErrorDesc], [Taskname])
VALUES (@ErrorDesc, @Taskname)
END

Step 3

Create SQL Task on Error in the package


Step 4

Parameter mapping

This is simple example.

Further improvements
  • Error Handling in proc
  • Include Errorlogid and other parameters in errorlog table

References - MSSQLTips
SQL Server Integration Services (SSIS) - Design Best Practices
Top 10 SQL Server Integration Services Best Practices
Handling Slowly Changing Dimensions in SSIS
About the BI Monkey SSIS ETL Framework
ETL Framework for SSIS 2008 - Part 1
ETL Framework for SSIS 2008 - Part 2
Stonemeadow Solutions ETL Framework
SSIS Community Tasks and Components
SQL Bits Session - Change Data Capture for real-time BI
Change Data Capture Best Practices

Happy Reading!!

No comments: