CDC

Change Data Capture, also known as CDC, introduced the first time in SQL Server 2008 version, as a helpful feature to track and capture the changes that are performed on the SQL Server database tables, with no additional programming efforts. Before SQL Server 2016, Change Data Capture could be enabled on a SQL Server database only under the SQL Server Enterprise edition, which is not required starting from SQL Server 2016.

Change Data Capture tracks the INSERT, UPDATE and DELETE operations on the database table, and records detailed information about these changes in a mirrored table, with the same columns structure of the source tables, and additional columns to record the description of these changes. SQL Server writes one record for each INSERT statement showing the inserted values, on record for each DELETE statement showing the deleted data and two records for each UPDATE statement, the first one showing the data before the change and the second one showing the data after performing the change.

For example, I have a simple dbo_Persons table here and have two records, Sathish & Abhinav.  Now let's say that somebody goes and changes, you know, Sathish to Sathish Chandra, then I would have to go and maintain some kind of audit trail table, do you know, where I can get the old values and the new values.       

Yes, CDC helps us to capture, insert, update and delete activities on table data. Now enabling CDC is a two-step process.          

The first thing is we need to go and enable CDC on the database level and then we need to go and define specific tables on which we need to go and enable CDC.         

To enable CDC, we need to do go and fire this stored procedure here,

--Enable on database level

use sqldbahub

exec sys.sp_cdc_enable_db     

Msg 22830, Level 16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line 274 [Batch Start Line 21]

Could not update the metadata that indicates database sqldbahub is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15404: 'Could not obtain information about Windows NT group/user 'AzureAD\SathishChandraBhanum', error code 0x54b.'. Use the action and error to determine the cause of the failure and resubmit the request.

Completion time: 2024-05-08T14:26:13.1966680+05:30

If you face above error:

--If you face error on enabled CDC

EXEC sp_changedbowner 'sa'

So once you fire this stored procedure, It enables CDC on the database level.

Once you have enabled CDC, it creates a couple of system tables like in the system tables folder as below.

[cdc].[captured_columns]

[cdc].[change_tables]

[cdc].[ddl_history]

[cdc].[index_columns]

[cdc].[lsn_time_mapping]        

Also it created a many stored procedures.  if you go down below, you can see this underscore CDC stored procedure, all of these are created for CDC.  

Now the next step is now that we need to enable CDC on the table level.            

We have this simple table here called a dbo.Persons.      

We'll let's go ahead and enable CDC on the dbo.Persons.

EXEC sys.sp_cdc_enable_table

    @source_schema = N'dbo',

    @source_name   = N'Persons',

    @role_name     = N'null',

    @filegroup_name = N'PRIMARY'

    @supports_net_changes = 1 –Use this, If the table had primary key

--if you face error

The error returned was 22836: ‘Could not update the metadata for database [database name] to indicate that a Change Data Capture job has been added. The failure occurred when executing the command ‘sp_add_jobstep_internal’.

SELECT srvname AS OldName FROM master.dbo.sysservers

SELECT SERVERPROPERTY('ServerName') AS NewName

exec sp_dropserver 'SQLDBAHUB'; 

GO 

sp_addserver 'ST21PUNELT1685', local; 

GO 

Once enabled CDC on the table level, it will create one more table in the system tables folder like [cdc].[dbo_Persons_CT] (always table name underscore with CT.         )

Underscore CT table, which actually captures insert, update and delete activities for a particular table which we enabled CDC.        

For this example, [cdc].[dbo_Persons_CT] table will capture all the activities of insert, update and delete for the sales table.

Now CDC has been enabled on the [dbo_Persons] table,  table data look like below.

PersonID

LastName

FirstName

Address

City

1

M

Sathish

Ameerpet

HYD

2

A

Abhinav

LBNagar

Hyderabad

Now if I will do some modification on the [dbo_Persons] table data, like change FirstName Sathish  to Sathish Chandra.               

PersonID

LastName

FirstName

Address

City

1

M

Sathish Chandra

Ameerpet

HYD

2

A

Abhinav

LBNagar

Hyderabad

Now if I go and open [cdc].[dbo_Persons_CT] table data looks like below:

__$start_lsn

__$end_lsn

__$seqval

__$operation

__$update

_mask

PersonID

LastName

FirstName

Address

City

__$command_id

0x000000310000BC580003

NULL

0x000000310000BC580002

3

0x04

1

M

Sathish

Ameerpet

HYD

1

0x000000310000BC580003

NULL

0x000000310000BC580002

4

0x04

1

M

Sathish Chandra

Ameerpet

HYD

1

 

You can see that there are lots of fields here,  just look at the last two fields. So here you can see that, it's showing that, the old value was sathish and then later it was changed to Sathish Chandra and the remaining fields are unchanged.

in the same way, if I go and delete Abhinav, then will get one entry here saying that Abhinav was deleted, this one stands for delete.

__$start_lsn

__$end_lsn

__$seqval

__$operation

__$update_mask

PersonID

LastName

FirstName

Address

City

__$command_id

0x000000310000BC580003

NULL

0x000000310000BC580002

3

0x04

1

M

Sathish

Ameerpet

HYD

1

0x000000310000BC580003

NULL

0x000000310000BC580002

4

0x04

1

M

Sathish Chandra

Ameerpet

HYD

1

0x000000310000C1A00003

NULL

0x000000310000C1A00002

1

0x1F

2

A

Abhinav

LBNagar

Hyderabad

1




in the same way, if I go and insert new record Bhanu,

Now we can monitor insert, update and delete activities on a table dbo_Persons_CT.          

Now there is one more important field in this table here called as the operation field.      

The additional columns include

  • __$start_lsn and __$end_lsn that show the commit log sequence number (LSN) assigned by the SQL Server Engine to the recorded change
  • __$seqval that shows the order of that change related to other changes in the same transaction, 
  • __$operation that shows the operation type of the change, where 1 = delete, 2 = insert, 3 = update (before change), and 4 = update (after change)
  • __$update_mask that is a bit mask defined for each captured column, identifying the updating columns

This operation field actually helps you to track that the kind of activity has happened and when this transaction occurred. 

In other words, you can track which kind of the operation like insert, delete or update performed.          

Now there are a couple of more tables here and probably these tables can be interesting.

[cdc].[captured_columns]

[cdc].[change_tables]

[cdc].[ddl_history]

[cdc].[index_columns]

[cdc].[lsn_time_mapping]

The first table, you know, which is very interesting is the captured_columns table.   In this table, basically it just keeps the record of what kind of columns it is tracking.  you can see that currently we have enabled CDC on the dbo_Persons table. So we are tracking the dbo_Persons.

Second, it has one more table here called as the change_tables table, here it tracks on basically which tables has been enabled for CDC.    

The most important table is the dbo_Persons_CT table where you will get the old values and new values with the operation field saying that what kind of activities was conducted on the record.

Limitations

Change Data Capture can be easily used to audit only the database DML changes, albeit it with no option to monitor SELECT statement, with the negligible configuration effort. On the other hand, to consider CDC as a SQL Server Audit solution, it requires significant maintenance and administration effort. This includes automating an archiving mechanism, as the tracking data will be kept in the change table for a configurable number of days and will be stored in the same or different data file, that should be also monitored and maintained.

In addition, the change tables will be stored under each database, and a function will be created for each tracked table. This makes it cumbersome and requires significant programming effort to create a consolidated auditing report that reads the DML changes information from all tables under the same database, from all databases under the same instance, or cross multiple instance.

Another limitation for CDC feature as a SQL Server Audit solution is difficult process that is required to handle the DDL changes on CDC enabled table, as having the Change Data Capture enabled on the source table will not prevent performing DDL changes on that table.

Also, if the SQL Server Agent service is not running, CDC capture job will not work, and the database log file will grow rapidly, even if the database recovery model is Simple, as the log truncation will not advance, even if a CHECKPOINT is performed, till all the changes that are waiting for capture will be gathered by CDC capture process.

No comments:

Post a Comment

Popular Posts