Often we require saving all the history of the changes of our databases. Most of the time is related to data security purposes. There are several ways to accomplish that, from triggers, timestamps to stored procedures.
But since SQL Server 2008 there is a new data tracking and capturing feature in user-created tables, it is called CDC. In this article, we will see how to use CDC and how efficiently it could serve to our purpose.
Fig 1
Enabling Change Data Capture on a Database
By default CDC comes disabled, so we need to enable it first on the database. Because CDC is a table-level feature, it then has to be enabled for each table to be tracked.
You can run the following queries, first to create databases and tables, and then to enable CDC and check if it’s enabled.
So, first we need to create the database:
create database Contoso
Creating Tables….
Articles
USE [Contoso]
GO
/****** Object: Table [dbo].[Articles] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Articles](
[Article_ID] [int] NOT NULL,
[Articles] [varchar](50) NOT NULL,
[Cost] [float] NULL,
CONSTRAINT [PK_Articles] PRIMARY KEY CLUSTERED
(
[Article_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Customers
USE [Contoso]
GO
/****** Object: Table [dbo].[Customers] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Customers](
[Customer_ID] [int] NOT NULL,
[Name] [varchar](50) NULL,
[Last_Name] [varchar](50) NULL,
[Phone] [int] NULL,
CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED
(
[Customer_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Orders
USE [Contoso]
GO
/****** Object: Table [dbo].[Orders] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Orders](
[Orders_ID] [int] NOT NULL,
[Customer_ID] [int] NOT NULL,
[Article_ID] [int] NOT NULL,
[Quantity] [int] NULL,
[Total] [float] NULL,
CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED
(
[Orders_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Orders] WITH CHECK ADD CONSTRAINT [FK_Orders_Articles] FOREIGN KEY([Article_ID])
REFERENCES [dbo].[Articles] ([Article_ID])
GO
ALTER TABLE [dbo].[Orders] CHECK CONSTRAINT [FK_Orders_Articles]
GO
ALTER TABLE [dbo].[Orders] WITH CHECK ADD CONSTRAINT [FK_Orders_Customers] FOREIGN KEY([Customer_ID])
REFERENCES [dbo].[Customers] ([Customer_ID])
GO
ALTER TABLE [dbo].[Orders] CHECK CONSTRAINT [FK_Orders_Customers]
GO
After creating tables you should have something similar to this:
Fig 2
Inserting data to tables
Articles
USE [Contoso]
GO
INSERT INTO [dbo].[Articles]
([Article_ID]
,[Articles]
,[Cost])
VALUES
(1000,'Milk',4.5),
(1001,'Coke 2L',6),
(1002,'Cookie small',0.5),
(1003,'Cookie medium',1),
(1004,'Cookie Large',4)
GO
Customers
USE [Contoso]
GO
INSERT INTO [dbo].[Customers]
([Customer_ID]
,[Name]
,[Last_Name]
,[Phone])
VALUES
(2000,'George','Lucas',555123456),
(2001,'Carlos','Blanca',555456789),
(2002,'Bill','Hampton',555415639)
GO
Orders
USE [Contoso]
GO
INSERT INTO [dbo].[Orders]
([Orders_ID]
,[Customer_ID]
,[Article_ID]
,[Quantity]
,[Total])
VALUES
(3000,2002,1000,2,9),
(3001,2001,1001,3,13.5),
(3002,2000,1002,5,18)
GO
Now let's review how to enable CDC on a database.
EXECUTE sys.sp_cdc_enable_db;
Then, how to check if CDC is enabled correctly
SELECT name,is_cdc_enabled FROM sys.databases WHERE name='Contoso'
You should see something like this:
Fig 3
Now that CDC is enabled on the Contoso database, let’s enable CDC on specific tables (e.g. Customer’s table), just make sure the SQL Server Agent service is up and running, also that you have roles created, if you don’t have roles you can pass NULL as parameter in @role_name.
EXEC sys.sp_cdc_enable_table
@source_schema ='dbo',
@source_name ='Customers',
@role_name ='CustomersRole',
@supports_net_changes = 1
In order to check that CDC is enabled correctly for Customer’s table, you can execute this SQL statement.
SELECT name,is_tracked_by_cdc FROM sys.tables WHERE name='Customers'
You should see something similar to this:
Fig 4
For disabling CDC on the table, sys.sp_cdc_disable_table needs to be executed.
EXEC sys.sp_cdc_disable_table
@source_schema ='dbo',
@source_name ='Customers',
@capture_instance ='dbo_Customers'
Similarly, for disabling CDC at the database level, we need EXEC sys.sp_cdc_disable_db.
Tip. In the SQL Server Agent Service you can check that CDC is enabled if you see something similar to this:
Fig 5
How to see data generated by CDC
First you need to know what LSN is. LSN (Log Sequence Number) is a number that identifies each register that is recorded in the transaction log of the database with a LSN provided by SQL Server.
Also you need to know that CDC displays operations done in the tables with the help of Column_$operation ID’s. A Column _$operation contains value which corresponds to DML Operations. Following is quick list of value and its corresponding meaning.
• Delete Statement = 1
• Insert Statement = 2
• Value before Update Statement = 3
• Value after Update Statement = 4
Now execute some DML in the Customers table for the CDC to be able to track changes.
DELETE FROM Customers WHERE Customer_ID=2003
INSERT INTO Customers VALUES(2005,'Mary','McGregor',555789412)
UPDATE Customers SET Name='Jorge' WHERE Customer_ID=2000
UPDATE Customers SET Name='Charles' WHERE Customer_ID=2001
A delete, an insert, and two updates have been fired. To see the captured changes, SQL provides two functions:
• cdc.fn_cdc_get_net_changes_dbo_Customers
• cdc.fn_cdc_get_all_changes_dbo_Customers
As the name suggests, the 'net changes' function gives the net changes on the records and the 'all changes' function provides all the changes before and after the DML is executed.
DECLARE @Begin_LSN binary(10), @End_LSN binary(10)
-- get the first LSN
SELECT @Begin_LSN =sys.fn_cdc_get_min_lsn('dbo_Customers')
-- get the last LSN
SELECT @End_LSN =sys.fn_cdc_get_max_lsn()
-- returns net changes
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customers(
@Begin_LSN, @End_LSN,'ALL');
/*__$start_lsn __$operation __$update_mask Customer_ID Name Last_Name Phone
---------------------- ------------ ---------------- ----------- ----- -----
0x0000004D000003600003 4 NULL 2000 Jorge Lucas 555123456
0x0000004D000003680003 4 NULL 2001 Charles Serrano 555456789
0x00000050000000700005 1 NULL 2003 Jorge Montano 555442641
0x0000004D000003580003 2 NULL 2005 Mary McGregor 555789412
(4 row(s) affected)*/
cdc.fn_cdc_get_net_changes_dbo_Customers gives us all the records net changed falling between the LSN we provide in the function. We have three records returned by the net_change function; there was a delete, an insert, and two updates, but on the same record. In case of the updated record, it simply shows the net changed value after both the updates are complete.
For getting all the changes, execute cdc.fn_cdc_get_all_changes_dbo_Customers; there are options either to pass 'ALL' or 'ALL UPDATE OLD'. The 'ALL' option provides all the changes, but for updates, it provides the after updated values.
DECLARE @Begin_LSN binary(10), @End_LSN binary(10)
-- get the first LSN
SELECT @Begin_LSN =sys.fn_cdc_get_min_lsn('dbo_Customers')
-- get the last LSN
SELECT @End_LSN =sys.fn_cdc_get_max_lsn()
-- returns all changes both before & after update values for UPDATE
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customers(
@Begin_LSN, @End_LSN,'ALL');
/*__$start_lsn __$operation __$update_mask Customer_ID Name Last_Name Phone
---------------------- ------------ ---------------- ----------- ----- -----
0x00000031000000200003 0x00000031000000200002 4 0x04 2001 Carlos Serrano 555456789
0x0000004D000003580003 0x0000004D000003580002 2 0x0F 2005 Mary McGregor 555789412
0x0000004D000003600003 0x0000004D000003600002 4 0x02 2000 Jorge Lucas 555123456
0x0000004D000003680003 0x0000004D000003680002 4 0x02 2001 Charles Serrano 555456789
0x00000050000000700005 0x00000050000000700002 1 0x0F 2003 Jorge Montano 555442641
(5 row(s) affected)*/
What is the utility of using CDC?
The essence of CDC can be effectively put to use. Let us see a very simple demonstration of its utility using a PROCESS table and a stored procedure that saves all track changes from Customers.
CREATE TABLE PROCESS
(Process_ID INT IDENTITY (1,1),
Process_Date DATETIME,
BEGIN_LSN BINARY(10),
END_LSN BINARY(10))
So the table has been created to log the minimum and maximum LSN during each job run. When the stored procedure runs, it will make an entry with the Process_ID, the date-time stamp of its run, and the minimum and maximum of the LSN for the records modified. For logging the details in the created table, the Stored Procedure below has been created. A job can call and execute the procedure at the very last step before it exits.
IF EXISTS(SELECT 1 FROM Contoso.sys.objects
WHERE name='ProcessLogEntry'AND [type]='P')
DROP PROCEDURE ProcessLogEntry
GO
CREATE PROCEDURE dbo.ProcessLogEntry
AS
SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
BEGIN
DECLARE @BEGIN_LSN BINARY(10)
IF ((SELECT COUNT(1)FROM PROCESS)>0)
BEGIN
SELECT @BEGIN_LSN=MAX(END_LSN)FROM PROCESS
INSERT INTO PROCESS SELECT GETDATE(),
@BEGIN_LSN,sys.fn_cdc_get_max_lsn();
END
ELSE
INSERT INTO PROCESS SELECT GETDATE(),
sys.fn_cdc_get_min_lsn('dbo_Customers'),sys.fn_cdc_get_max_lsn();
END;
Well, we execute the procedure manually; it makes the entry for the LSN for the DML we executed previously, i.e.: an insert, a delete, and two updates.
EXEC ProcessLogEntry
SELECT * FROM PROCESS
/*Process_ID Process_Date BEGIN_LSN END_LSN
1 2012-05-21 01:31:28.277 0x00000000000000000000 0x00000054000001700001
(1 row(s) affected)*/
As it is logged, let’s execute some more DML:
INSERT INTO Customers VALUES(2008,'Juan','Perez',555123678)
UPDATE Customers SET Last_Name='Lopez' WHERE Customer_ID=2008
We have inserted a record and updated the department for Customer_ID 2. To log the changes, we execute the procedure ProcessLogEntry again.
EXEC ProcessLogEntry
SELECT * FROM PROCESS
/*Process_ID Process_Date BEGIN_LSN END_LSN
1 2012-05-21 01:31:28.277 0x00000000000000000000 0x00000054000001700001
2 2012-05-21 01:34:12.227 0x00000054000001700001 0x00000055000000D00003
(2 row(s) affected)*/
The second entry has been made in our Process table. Now, this enables us to get the changes done by the job on any specific day. The code below gets the net changes done by process_id =2.
DECLARE @Begin_LSN BINARY(10), @End_LSN BINARY(10)
SELECT @Begin_LSN = BEGIN_LSN FROM PROCESS WHERE Process_ID=2
SELECT @End_LSN = END_LSN FROM PROCESS WHERE Process_ID=2
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customers(
@Begin_LSN, @End_LSN,'ALL');
/*__$start_lsn __$seqval __$operation __$update_mask Customer_ID Name Last_Name Phone
0x00000055000000C80003 0x00000055000000C80002 2 0x0F 2008 Juan Perez 555123678
0x00000055000000D00003 0x00000055000000D00002 4 0x04 2008 Juan Lopez 555123678
(2 row(s) affected)*/
As you can see there were two records that are related to process 2, and we get detailed information about the changes.
Summary
Data Base Administrators sometimes has tried to create systems that record all the changes made to the data in a database. Since SQL Server 2008 and currently with SQL Server 2012, we have this great tool called CDC that comes with SQL Server that allows us to track changes in our data. It could be used to track guys that are trying to gain unauthorized access or executing DML (update/delete). All the details as to who fired the query, timestamp, and the actual query can be recorded for helping DBA personnel report any unethical activities on the server. This will go a long way to boost the security of the servers and help organizations preserve the most important aspect of their business: confidential information.