
June 18, 2011

TSQL Tip of the Day

Tip #1

This post is based on a Stack Overflow question: I'm introducing a new table in the database; how much space should I project for it?

Let's find answers to the questions below:
  • What is the schema for the table, and what are the datatypes of its columns?
  • How much space does one row occupy?
  • How many indexes are defined on the table?
  • How many transactions per day?
  • Is there any archival strategy on the table?
  • Is table partitioning used for faster access?
For example, create an Employee table with the columns below:

EmployeeId  - bigint
EmployeeName - char(100)
DeptId - smallint
CreatedDate - DateTime
ModifiedDate - DateTime

The ranges of the datatypes used are:
  • smallint ranges from -32,768 to 32,767
  • bigint ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
  • EmployeeName - char(100), non-Unicode, always 100 bytes per row
Space per row

Space per row = 8 bytes (bigint) + 100 bytes (char(100)) + 2 bytes (smallint) + 8 bytes (datetime) + 8 bytes (datetime)
Space per row = 126 bytes
Rows per page = 8096 / 126 = 64 rows (rounded down; 8096 bytes of each 8 KB page are available for row data)
Pages for 10K rows = 10000 / 64 = 157 pages (rounded up)
For 1 million rows = 1000000 / 64 = 15625 pages x 8192 bytes ≈ 0.12 GB
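The arithmetic above can be sketched as a quick back-of-the-envelope calculation (Python here is used purely as a calculator; the column sizes are the ones listed for the Employee table):

```python
import math

# Raw data-size estimate for the Employee table (no indexes, pages 100% full).
# Column sizes: bigint 8, char(100) 100, smallint 2, datetime 8 + 8.
ROW_BYTES = 8 + 100 + 2 + 8 + 8          # 126 bytes per row
PAGE_BYTES = 8192                        # SQL Server page size
USABLE_BYTES = 8096                      # bytes per page available for rows

rows_per_page = USABLE_BYTES // ROW_BYTES            # rounded down -> 64
pages_10k = math.ceil(10_000 / rows_per_page)        # 157 pages
pages_1m = math.ceil(1_000_000 / rows_per_page)      # 15625 pages
size_1m_gb = pages_1m * PAGE_BYTES / 1024**3

print(rows_per_page, pages_10k, pages_1m, round(size_1m_gb, 2))
```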
Now let's look at the space for the clustered index on EmployeeId.
Calculate the space used to store data in the leaf level:

Num_Rows = 1000000
Num_Cols = 5
Fixed_Data_Size = 126 bytes
Num_Variable_Cols = 0
Max_Var_Size = NA
Null_Bitmap = 2 + ((Num_Cols + 7) / 8), rounded down
Null_Bitmap = 2 + ((5 + 7) / 8)
Null_Bitmap = 3
Variable_Data_Size = 2 + (Num_Variable_Cols x 2) + Max_Var_Size
Variable_Data_Size = 0 (no variable-length columns)
Row_Size = Fixed_Data_Size + Variable_Data_Size + Null_Bitmap + 4
Row_Size = 126 + 0 + 3 + 4
Row_Size = 133
Rows_Per_Page = 8096 / (Row_Size + 2), rounded down
Rows_Per_Page = 8096 / 135 = 59
Free_Rows_Per_Page = 8096 x ((100 - Fill_Factor) / 100) / (Row_Size + 2), rounded down
Assuming the fill factor is 70%
Free_Rows_Per_Page = 8096 x ((100 - 70) / 100) / 135
Free_Rows_Per_Page = 17
Num_Leaf_Pages = Num_Rows / (Rows_Per_Page - Free_Rows_Per_Page), rounded up
Num_Leaf_Pages = 1000000 / (59 - 17)
Num_Leaf_Pages = 23810
Leaf_Space_Used = 8192 x Num_Leaf_Pages
Leaf_Space_Used = 195051520 bytes

Space used to store data in the leaf level = Leaf_Space_Used ≈ 0.18 GB

Clustered index size (bytes) = Leaf_Space_Used + Index_Space_Used, where Index_Space_Used covers the non-leaf levels of the B-tree.
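The leaf-level steps above follow Microsoft's published clustered-index estimation method and can be wrapped in a small helper (again Python as a calculator; the non-leaf Index_Space_Used part is omitted here):

```python
import math

def leaf_level_bytes(num_rows, fixed_data_size, num_cols, fill_factor,
                     num_variable_cols=0, max_var_size=0):
    """Leaf-level size estimate per the clustered-index formula."""
    null_bitmap = 2 + (num_cols + 7) // 8                 # rounded down
    # Variable-length overhead applies only if variable-length columns exist.
    var_data = 2 + num_variable_cols * 2 + max_var_size if num_variable_cols else 0
    row_size = fixed_data_size + var_data + null_bitmap + 4
    rows_per_page = 8096 // (row_size + 2)                # rounded down
    free_rows = int(8096 * (100 - fill_factor) / 100 / (row_size + 2))
    num_leaf_pages = math.ceil(num_rows / (rows_per_page - free_rows))
    return num_leaf_pages, num_leaf_pages * 8192

pages, size = leaf_level_bytes(1_000_000, 126, 5, 70)
print(pages, size)   # 23810 pages, 195051520 bytes (~0.18 GB)
```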


Similarly, you can estimate the size of a nonclustered index.
Disclaimer: this is based entirely on my reading and I can't guarantee it is 100% correct. The exercise gives us a rough estimate of the space required for a table.

Next step - create the table and compare the estimated space with the actual space.

use master
GO
sp_configure 'show advanced options', 1;
GO
RECONFIGURE
GO
sp_configure 'fill factor', 70;
GO
RECONFIGURE
GO
--Note: the server-wide fill factor option takes effect only after a restart;
--the FILLFACTOR specified on the index below applies regardless.
--Based on http://technet.microsoft.com/en-us/magazine/2007.12.sqlqa.aspx

--STEP 1
use AppDB
GO
IF OBJECT_ID('Employee') IS NOT NULL DROP TABLE Employee
GO
CREATE TABLE Employee
(
    EmployeeId   BIGINT IDENTITY(1,1) PRIMARY KEY WITH (FILLFACTOR = 70),
    EmployeeName CHAR(100) NOT NULL,
    DeptId       SMALLINT NOT NULL,
    CreatedDate  DATETIME NOT NULL,
    ModifiedDate DATETIME NOT NULL
)


 --STEP 2
--Insert records into the table: 1 million employees, moving to a new DeptId every 1,000 rows
DECLARE @I INT = 1, @J INT = 1, @K INT = 1
WHILE 1 = 1
BEGIN
    SET @I = @I + 1
    INSERT INTO Employee (EmployeeName, DeptId, CreatedDate, ModifiedDate)
    VALUES (CONVERT(CHAR(5), @I) + 'A', @J, GETDATE(), GETDATE())
    SET @K = @K + 1
    IF (@K > 1000)
    BEGIN
        SET @K = 0
        SET @J = @J + 1
    END
    IF @I >= 1000000
        BREAK
END
--This step took 10 minutes to insert 1 million records
 

--STEP 3
SELECT COUNT(1) FROM Employee WITH (NOLOCK)
SELECT fill_factor FROM sys.indexes WHERE object_id = OBJECT_ID('Employee') AND name IS NOT NULL;
--70
  
--STEP 4
EXEC sp_spaceused Employee
--135600 KB ≈ 0.129 GB
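One detail worth noting (my own observation, not part of the original estimate): the measured 135,600 KB is exactly what you get if every leaf page is packed full at 59 rows per page, which is consistent with FILLFACTOR applying only when an index is created or rebuilt, not to pages filled by ordinary INSERTs:

```python
import math

# Measured sp_spaceused value vs. fully-packed pages.
# 59 rows/page comes from the index formula above (Row_Size 133 + 2 bytes overhead).
rows = 1_000_000
full_pages = math.ceil(rows / 59)     # pages if no free space is reserved
actual_kb = 135_600                   # reported by sp_spaceused

print(full_pages, full_pages * 8)     # 16950 pages -> 135600 KB
# Matches the measured value exactly: the 70% fill factor was not applied
# during the sequential inserts, only a rebuild would enforce it.
```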


Using the ready-to-use Excel sheet from "Estimating the Size of an SQL Server Database", the estimate works out to 213,606,400 bytes ≈ 0.199 gigabytes.

I found an article that covers exactly what I was looking for: Performance Considerations of Data Types.

A very important thing to note here on when to choose char vs. varchar: when you are not clear on the average length of the column, choose the char datatype.
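To see why this matters for sizing (a rough sketch using the row-format overheads from the estimate above: a varchar column adds the 2-byte variable-column count plus a 2-byte offset per column, but stores only the actual length):

```python
# Per-row storage for a name column: char(100) vs varchar(100).
def char_bytes(declared=100):
    return declared                   # always padded to the declared length

def varchar_bytes(actual_len):
    return 2 + 2 + actual_len         # var-column count + offset + data

for avg in (20, 50, 96):
    print(avg, char_bytes(), varchar_bytes(avg))
# With short names varchar wins; at an average of ~96 characters and above,
# char(100) costs the same or less, and its fixed size is easier to estimate.
```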
More Reads
SQL Server–HOW-TO: quickly retrieve accurate row count for table
Get the NULL percentage of values in a specific table dynamically

Happy Reading!!
