

Enable a Database for Replication (SQL Server Management Studio)
SQL Server 2012
A database is implicitly enabled for replication when a member of the sysadmin fixed
server role creates a publication with the New Publication Wizard. A member of
the sysadmin fixed server role can also enable a database for replication explicitly, so that
a member of the db_owner fixed database role can create one or more publications in that
database. To enable a database explicitly, use the Publication Databases page of
the Publisher Properties - <Publisher> dialog box. For more information about accessing
this dialog box, see Create a Publication.

To enable a database for replication


1. On the Publication Databases page of the Publisher Properties - <Publisher>
dialog box, select the Transactional and/or Merge check box for each
database you want to replicate. Select Transactional to enable the database for
snapshot replication.
2. Click OK.
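
The same setting can also be scripted. The following is a minimal sketch, assuming the sp_replicationdboption system stored procedure is run at the Publisher by a member of the sysadmin fixed server role; MyDB is a placeholder database name.

```sql
-- Sketch only: MyDB is a placeholder database name.
USE master
GO
-- Enable the database for transactional (and snapshot) publishing.
EXEC sp_replicationdboption
    @dbname  = N'MyDB',
    @optname = N'publish',
    @value   = N'true'
GO
-- Enable the same database for merge publishing.
EXEC sp_replicationdboption
    @dbname  = N'MyDB',
    @optname = N'merge publish',
    @value   = N'true'
GO
```

Run only the call that matches the check box you would otherwise select in the dialog box.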

Enable a Remote Publisher at a Distributor (SQL Server Management Studio)
SQL Server 2012
Enable a Publisher to use a remote Distributor on the Publishers page. This page is
available in the Configure Distribution Wizard and the Distributor Properties -
<Distributor> dialog box. For more information about using the wizard and accessing the
dialog box, see Configure Publishing and Distribution and View and Modify Distributor and
Publisher Properties.

To enable a Publisher in the Configure Distribution Wizard


1. On the Publishers page of the Configure Distribution Wizard, click Add.

2. Click Add SQL Server Publisher. For information about enabling an Oracle
Publisher to use a Distributor, see Create a Publication from an Oracle Database.
3. In the Connect to Server dialog box, specify connection information for the
Publisher that will use the remote Distributor, and then click Connect.
4. On the Distributor Password page, in the Password and Confirm password text
boxes, specify a strong password for the distributor_admin account, which
replication uses to connect from the Publisher to the Distributor to perform
administrative tasks.
5. To view and modify settings for a Publisher, click the properties button (...).
6. Click OK.

To enable a Publisher in the Distributor Properties dialog box


1. On the Publishers page of the Distributor Properties - <Distributor> dialog box,
click Add.
2. Click Add SQL Server Publisher. For information about enabling an Oracle
Publisher to use a Distributor, see Create a Publication from an Oracle Database.
3. In the Connect to Server dialog box, specify connection information for the
Publisher that will use the remote Distributor, and then click Connect.
4. On the Publishers page, in the Password and Confirm password text boxes,
specify a strong password for the distributor_admin account, which replication uses
to connect from the Publisher to the Distributor to perform administrative tasks.
5. To view and modify settings for a Publisher, click the properties button (...).
6. Click OK.
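
For reference, the dialog box steps above roughly correspond to the following script. This is a sketch under stated assumptions: the server names, the working directory, and the password are placeholders, and the distribution database is assumed to use the default name distribution.

```sql
-- Run at the Distributor. All names and paths are placeholders.
USE distribution   -- assumes the default distribution database name
GO
EXEC sp_adddistpublisher
    @publisher         = N'MyPublisher',
    @distribution_db   = N'distribution',
    @security_mode     = 1,                          -- Windows Authentication
    @working_directory = N'\\MyDistributor\repldata' -- hypothetical snapshot share
GO

-- Run at the Publisher so that it uses the remote Distributor.
-- The password is the one specified for the distributor_admin account.
USE master
GO
EXEC sp_adddistributor
    @distributor = N'MyDistributor',
    @password    = N'<strong password>'
GO
```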

Set the Distribution Retention Period for Transactional Publications (SQL Server Management Studio)
SQL Server 2012
Specify the minimum distribution retention period and maximum distribution retention
period on the Distribution Database Properties - <DistributionDatabase> dialog box.
This is available from the General page of the Distributor Properties - <Distributor>
dialog box. For more information about accessing this dialog box, see View
and Modify Distributor and Publisher Properties.

To specify the distribution retention period


1. On the General page of the Distributor Properties - <Distributor> dialog box,
click the properties button (...) for the distribution database.
2. To specify the minimum distribution retention period, enter a value in the At
least box; to specify the maximum distribution retention period, enter a value in
the But not more than box.
3. Click OK.
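
The retention periods can also be set with Transact-SQL. A minimal sketch, assuming the distribution database has the default name distribution and that the values (expressed in hours) are examples only:

```sql
-- Run at the Distributor. Values are in hours and are examples only.
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'min_distretention',
    @value    = N'0'
GO
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'max_distretention',
    @value    = N'72'
GO
-- Review the resulting settings.
EXEC sp_helpdistributiondb @database = N'distribution'
GO
```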

Set the History Retention Period (SQL Server Management Studio)
SQL Server 2012
Specify the history retention period on the General page of the Distribution Database
Properties - <DistributionDatabase> dialog box. This setting controls how long
replication agent history is stored. This page is available from the General page of
the Distributor Properties - <Distributor> dialog box. For more information about
accessing this dialog box, see View and Modify Distributor and Publisher Properties.

To specify the history retention period


1. On the General page of the Distributor Properties - <Distributor> dialog box,
click the properties button (...) for the distribution database.
2. Enter a value in the Store replication performance history at least box.
3. Click OK.
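
As a scripted alternative, the history retention period can be changed with sp_changedistributiondb. This sketch assumes the default distribution database name; the value is in hours and is an example only.

```sql
-- Run at the Distributor. The value is in hours and is an example only.
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'history_retention',
    @value    = N'48'
GO
```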

Change Data Capture


Change data capture provides historical change information for a user table by capturing
both the fact that DML changes were made and the actual data that was changed. Changes
are captured by using an asynchronous process that reads the transaction log and has a low
impact on the system.
The changes that were made to user tables are captured in corresponding change
tables. These change tables provide a historical view of the changes over time.
The change data capture functions that SQL Server provides enable the change data
to be consumed easily and systematically.

Security Model
This section describes the change data capture security model.

Configuration and Administration


To either enable or disable change data capture for a database, the caller
of sys.sp_cdc_enable_db (Transact-SQL) or sys.sp_cdc_disable_db (Transact-SQL)
must be a member of the sysadmin fixed server role. Enabling and disabling
change data capture at the table level requires the caller of sys.sp_cdc_enable_table
(Transact-SQL) and sys.sp_cdc_disable_table (Transact-SQL) to be a member of either
the sysadmin role or the database db_owner role.
Use of the stored procedures to support the administration of change data capture
jobs is restricted to members of the server sysadmin role and members of
the database db_owner role.

Change Enumeration and Metadata Queries


To gain access to the change data that is associated with a capture instance, the user
must be granted select access to all the captured columns of the associated source
table. In addition, if a gating role is specified when the capture instance is created,
the caller must also be a member of the specified gating role. Other general change
data capture functions for accessing metadata will be accessible to all database
users through the public role, although access to the returned metadata will also
typically be gated by using select access to the underlying source tables, and by
membership in any defined gating roles.

DDL Operations to Change Data Capture Enabled Source Tables


When a table is enabled for change data capture, DDL operations can only be applied
to the table by a member of the fixed server role sysadmin, a member of
the database role db_owner, or a member of the database role db_ddladmin.
Users who have explicit grants to perform DDL operations on the table will receive
error 22914 if they try these operations.

Data Type Considerations for Change Data Capture


All base column types are supported by change data capture. The following table lists the
behavior and limitations for several column types.

| Type of Column | Changes Captured in Change Tables | Limitations |
|---|---|---|
| Sparse Columns | Yes | Does not support capturing changes when using a columnset. |
| Computed Columns | No | Changes to computed columns are not tracked. The column will appear in the change table with the appropriate type, but will have a value of NULL. |
| XML | Yes | Changes to individual XML elements are not tracked. |
| Timestamp | Yes | The data type in the change table is converted to binary. |
| BLOB data types | Yes | The previous image of the BLOB column is stored only if the column itself is changed. |

Change Data Capture and Other SQL Server Features


This section describes how the following features interact with change data capture:

Database mirroring

Transactional replication

Database restore or attach

Database Mirroring
A database that is enabled for change data capture can be mirrored. To ensure that capture
and cleanup happen automatically on the mirror, follow these steps:
1. Ensure that SQL Server Agent is running on the mirror.
2. Create the capture job and cleanup job on the mirror after the principal has failed
over to the mirror. To create the jobs, use the stored procedure sys.sp_cdc_add_job
(Transact-SQL).
For more information about database mirroring, see Database Mirroring (SQL Server).
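
Step 2 can be sketched as follows. MyDB is a placeholder database name, and the script assumes it is run on the new principal (the former mirror) after failover, with the job parameters left at their defaults.

```sql
-- Run on the new principal after failover. MyDB is a placeholder.
USE MyDB
GO
-- Re-create the capture job with default parameters.
EXEC sys.sp_cdc_add_job @job_type = N'capture'
GO
-- Re-create the cleanup job with default parameters.
EXEC sys.sp_cdc_add_job @job_type = N'cleanup'
GO
```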

Transactional Replication
Change data capture and transactional replication can coexist in the same database, but
population of the change tables is handled differently when both features are enabled.
Change data capture and transactional replication always use the same
procedure, sp_replcmds, to read changes from the transaction log. When change data
capture is enabled on its own, a SQL Server Agent job calls sp_replcmds. When both
features are enabled on the same database, the Log Reader Agent calls sp_replcmds. This
agent populates both the change tables and the distribution database tables. For more
information, see Replication Log Reader Agent.
Consider a scenario in which change data capture is enabled on the AdventureWorks2012
database, and two tables are enabled for capture. To populate the change tables, the
capture job calls sp_replcmds. The database is enabled for transactional replication, and a
publication is created. Now, the Log Reader Agent is created for the database and the
capture job is deleted. The Log Reader Agent continues to scan the log from the last log
sequence number that was committed to the change table. This ensures data consistency in
the change tables. If transactional replication is disabled in this database, the Log Reader
Agent is removed and the capture job is re-created.

Note

When the Log Reader Agent is used for both change data capture and
transactional replication, replicated changes are first written to the
distribution database. Then, captured changes are written to the change
tables. Both operations are committed together. If there is any latency in
writing to the distribution database, there will be a corresponding latency
before changes appear in the change tables.

Restoring or Attaching a Database Enabled for Change Data Capture
SQL Server uses the following logic to determine if change data capture remains enabled
after a database is restored or attached:

If a database is restored to the same server with the same database name, change
data capture remains enabled.

If a database is restored to another server, by default change data capture is
disabled and all related metadata is deleted.
To retain change data capture, use the KEEP_CDC option when restoring the
database. For more information about this option, see RESTORE.

If a database is detached and attached to the same server or another server, change
data capture remains enabled.

If a database is attached or restored with the KEEP_CDC option to any edition other
than Enterprise, the operation is blocked because change data capture requires SQL
Server Enterprise. Error message 932 is displayed:

SQL Server cannot load database '%.*ls' because change data capture
is enabled. The currently installed edition of SQL Server does not
support change data capture. Either disable change data capture in the
database by using a supported edition of SQL Server, or upgrade the
instance to one that supports change data capture.
You can use sys.sp_cdc_disable_db to remove change data capture from a restored or
attached database.
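
A restore that retains change data capture might look like the following sketch. The backup path and database name are hypothetical, and the job re-creation step is an assumption that the capture and cleanup jobs do not already exist on the target server.

```sql
-- Sketch only: the backup path and database name are hypothetical.
RESTORE DATABASE MyDB
    FROM DISK = N'C:\Backups\MyDB.bak'
    WITH KEEP_CDC
GO
-- Confirm that change data capture is still enabled.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'MyDB'
GO
-- Assumption: the jobs are missing on the target server, so add them.
USE MyDB
GO
EXEC sys.sp_cdc_add_job @job_type = N'capture'
GO
EXEC sys.sp_cdc_add_job @job_type = N'cleanup'
GO
```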

Related Tasks

| Task | Topic |
|---|---|
| Provides an overview of change data capture. | About Change Data Capture (SQL Server) |
| Describes how to enable and disable change data capture on a database or table. | Enable and Disable Change Data Capture (SQL Server) |
| Describes how to administer and monitor change data capture. | Administer and Monitor Change Data Capture (SQL Server) |
| Describes how to work with the change data that is available to change data capture consumers. This topic covers validating LSN boundaries, the query functions, and query function scenarios. | Work with Change Data (SQL Server) |
| Provides an overview of change tracking. | About Change Tracking (SQL Server) |
| Describes how to enable and disable change tracking on a database or table. | Enable and Disable Change Tracking (SQL Server) |
| Describes how to manage change tracking, configure security, and determine the effects on storage and performance when change tracking is used. | Manage Change Tracking (SQL Server) |
| Describes how applications that use change tracking can obtain tracked changes, apply these changes to another data store, and update the source database. This topic also describes the role change tracking plays when a failover occurs and a database must be restored from a backup. | Work with Change Tracking (SQL Server) |

Enable and Disable Change Data Capture (SQL Server)


SQL Server 2012
This topic describes how to enable and disable change data capture for a database and a
table.

Enable Change Data Capture for a Database


Before a capture instance can be created for individual tables, a member of
the sysadmin fixed server role must first enable the database for change data capture. This
is done by running the stored procedure sys.sp_cdc_enable_db (Transact-SQL) in the
database context. To determine if a database is already enabled, query
the is_cdc_enabled column in the sys.databases catalog view.
When a database is enabled for change data capture, the cdc schema, cdc user, metadata
tables, and other system objects are created for the database. The cdc schema contains the
change data capture metadata tables and, after source tables are enabled for change data
capture, the individual change tables that serve as a repository for change data. The cdc
schema also contains associated system functions used to query for change data.
Change data capture requires exclusive use of the cdc schema and cdc user. If either a
schema or a database user named cdc currently exists in a database, the database cannot
be enabled for change data capture until the schema and/or user are dropped or renamed.
See the Enable Database for Change Data Capture template for an example of enabling a
database.

Important

To locate the templates in SQL Server Management Studio, go to View,
click Template Explorer, and then select SQL Server
Templates. Change Data Capture is a sub-folder. Under this folder, you
will find all the templates referenced in this topic. There is also a Template
Explorer icon on the SQL Server Management Studio toolbar.

Transact-SQL
-- ================================
-- Enable Database for CDC template
-- ================================
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO
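
To confirm the result, the is_cdc_enabled column mentioned above can be queried. A minimal sketch, assuming the same placeholder database name MyDB:

```sql
-- Returns 1 in is_cdc_enabled when the database is enabled for change data capture.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'MyDB'
GO
```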

Disable Change Data Capture for a Database


A member of the sysadmin fixed server role can run the stored
procedure sys.sp_cdc_disable_db (Transact-SQL) in the database context to disable change
data capture for a database. It is not necessary to disable individual tables before you
disable the database. Disabling the database removes all associated change data capture
metadata, including the cdc user and schema and the change data capture jobs. However,
any gating roles created by change data capture will not be removed automatically and
must be explicitly deleted. To determine if a database is enabled, query
the is_cdc_enabled column in the sys.databases catalog view.
If a change data capture enabled database is dropped, change data capture jobs are
automatically removed.
See the Disable Database for Change Data Capture template for an example of disabling a
database.

Important

To locate the templates in SQL Server Management Studio, go to View,
click Template Explorer, and then click SQL Server
Templates. Change Data Capture is a sub-folder where you will find all
the templates that are referenced in this topic. There is also a Template
Explorer icon on the SQL Server Management Studio toolbar.

Transact-SQL
-- =================================================
-- Disable Database for Change Data Capture template
-- =================================================
USE MyDB
GO
EXEC sys.sp_cdc_disable_db
GO

Enable Change Data Capture for a Table


After a database has been enabled for change data capture, members of
the db_owner fixed database role can create a capture instance for individual source tables
by using the stored procedure sys.sp_cdc_enable_table. To determine whether a source
table has already been enabled for change data capture, examine the is_tracked_by_cdc
column in the sys.tables catalog view.
The following options can be specified when creating a capture instance:
Columns in the source table to be captured.
By default, all of the columns in the source table are identified as captured columns. If only a
subset of columns need to be tracked, such as for privacy or performance reasons, use
the @captured_column_list parameter to specify the subset of columns.
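
As an illustration of the @captured_column_list parameter, the following sketch captures only two columns. The column names CustomerID and Status are hypothetical, and net changes support is turned off here so that the example does not depend on the table's key columns being in the list.

```sql
-- Sketch only: MyDB, MyTable, and the column names are placeholders.
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'MyTable',
    @role_name            = NULL,
    @captured_column_list = N'CustomerID, Status',  -- hypothetical columns
    @supports_net_changes = 0
GO
```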
A filegroup to contain the change table.
By default, the change table is located in the default filegroup of the database. Database
owners who want to control the placement of individual change tables can use
the @filegroup_name parameter to specify a particular filegroup for the change table
associated with the capture instance. The named filegroup must already exist. Generally, it
is recommended that change tables be placed in a filegroup separate from source tables.
See the Enable a Table Specifying Filegroup Option template for an example showing
use of the @filegroup_name parameter.
Transact-SQL
-- ===================================================
-- Enable a Table Specifying Filegroup Option Template
-- ===================================================
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@filegroup_name = N'MyDB_CT',
@supports_net_changes = 1
GO

A role for controlling access to a change table.


The purpose of the named role is to control access to the change data. The specified role
can be an existing fixed server role or a database role. If the specified role does not already
exist, a database role of that name is created automatically. Members of either
the sysadmin or db_owner role have full access to the data in the change tables. All other
users must have SELECT permission on all the captured columns of the source table. In
addition, when a role is specified, users who are not members of either
the sysadmin or db_owner role must also be members of the specified role.
If you do not want to use a gating role, explicitly set the @role_name parameter to NULL.
See the Enable a Table Without Using a Gating Role template for an example of
enabling a table without a gating role.
Transact-SQL
-- ===================================================
-- Enable a Table Without Using a Gating Role template
-- ===================================================
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = NULL,
@supports_net_changes = 1
GO

A function to query for net changes.


A capture instance will always include a table valued function for returning all change table
entries that occurred within a defined interval. This function is named by appending the
capture instance name to "cdc.fn_cdc_get_all_changes_". For more information,
see cdc.fn_cdc_get_all_changes_<capture_instance> (Transact-SQL).
If the parameter @supports_net_changes is set to 1, a net changes function is also
generated for the capture instance. This function returns only one change for each distinct
row changed in the interval specified in the call. For more information,
see cdc.fn_cdc_get_net_changes_<capture_instance> (Transact-SQL).
To support net changes queries, the source table must have a primary key or unique index to
uniquely identify rows. If a unique index is used, the name of the index must be specified
using the @index_name parameter. The columns defined in the primary key or unique index
must be included in the list of source columns to be captured.
See the Enable a Table for All and Net Changes Queries template for an example
demonstrating the creation of a capture instance with both query functions.
Transact-SQL
-- =======================================================
-- Enable a Table for All and Net Changes Queries template
-- =======================================================
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@supports_net_changes = 1
GO
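
Once the capture instance exists, the two generated functions can be queried. The sketch below assumes the capture instance has the name dbo_MyTable (the source schema and table name joined by an underscore) and reads the full range of available changes.

```sql
-- Sketch only: assumes a capture instance named dbo_MyTable.
USE MyDB
GO
DECLARE @from_lsn binary(10), @to_lsn binary(10);
-- Lowest LSN still available for this capture instance.
SET @from_lsn = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
-- Highest LSN currently recorded for the database.
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

-- Every change table entry in the interval.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

-- One row per distinct source row changed in the interval.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
GO
```

The second query works only if the capture instance was created with @supports_net_changes = 1.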

Note

If change data capture is enabled on a table with an existing primary key,
and the @index_name parameter is not used to identify an alternative
unique index, the change data capture feature will use the primary key.
Subsequent changes to the primary key will not be allowed without first
disabling change data capture for the table. This is true regardless of
whether support for net changes queries was requested when change data
capture was configured. If there is no primary key on a table at the time it is
enabled for change data capture, the subsequent addition of a primary key is
ignored by change data capture. Because change data capture will not use a
primary key that is created after the table was enabled, the key and key
columns can be removed without restrictions.

Disable Change Data Capture for a Table


Members of the db_owner fixed database role can remove a capture instance for individual
source tables by using the stored procedure sys.sp_cdc_disable_table. To determine
whether a source table is currently enabled for change data capture, examine
the is_tracked_by_cdc column in the sys.tables catalog view. If there are no tables
enabled for the database after the disabling takes place, the change data capture jobs are
also removed.
If a change data capture-enabled table is dropped, change data capture metadata that is
associated with the table is automatically removed.
See the Disable a Capture Instance for a Table template for an example of disabling a table.
Transact-SQL
-- ===============================================
-- Disable a Capture Instance for a Table template
-- ===============================================
USE MyDB
GO

EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@capture_instance = N'dbo_MyTable'
GO
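
To see which tables remain enabled after the call, the is_tracked_by_cdc column can be inspected. A minimal sketch, again using the placeholder database MyDB:

```sql
USE MyDB
GO
-- List every table with its change data capture status.
SELECT s.name AS schema_name,
       t.name AS table_name,
       t.is_tracked_by_cdc
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
ORDER BY s.name, t.name
GO
-- Show the configuration of the remaining capture instances.
EXEC sys.sp_cdc_help_change_data_capture
GO
```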

Administer and Monitor Change Data Capture (SQL Server)
SQL Server 2012
This topic describes how to administer and monitor change data capture.

In This Topic

Capture Job

Cleanup Job

Monitor the Change Data Capture Process

Capture Job
The capture job is initiated by running the parameterless stored
procedure sp_MScdc_capture_job. This stored procedure starts by extracting the
configured values for maxtrans, maxscans, continuous, and pollinginterval for the capture
job from msdb.dbo.cdc_jobs. These configured values are then passed as parameters to the
stored procedure sp_cdc_scan. This is used to invoke sp_replcmds to perform the log scan.
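
The configured values that the capture job reads can be inspected directly. A minimal sketch, assuming it is run in a database that is enabled for change data capture (MyDB is a placeholder):

```sql
USE MyDB
GO
-- Report the capture and cleanup job configuration for this database.
EXEC sys.sp_cdc_help_jobs
GO
-- Or read the capture job values from the underlying table.
SELECT job_type, maxtrans, maxscans, continuous, pollinginterval
FROM msdb.dbo.cdc_jobs
WHERE database_id = DB_ID()
  AND job_type = N'capture'
GO
```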

Capture Job Parameters


To understand capture job behavior, you must understand how the configurable parameters
are used by sp_cdc_scan.

maxtrans Parameter
The maxtrans parameter specifies the maximum number of transactions that can be
processed in a single scan cycle of the log. If, during the scan, the number of transactions to
be processed reaches this limit, no additional transactions are included in the current scan.
After a scan cycle is complete, the number of transactions that were processed will always
be less than or equal to maxtrans.

maxscans Parameter
The maxscans parameter specifies the maximum number of scan cycles that are attempted
to drain the log before either returning (continuous = 0) or executing a waitfor (continuous =
1).

continuous Parameter
The continuous parameter controls whether sp_cdc_scan relinquishes control after either
draining the log or executing the maximum number of scan cycles (one shot mode), or
whether sp_cdc_scan continues to run until explicitly stopped (continuous mode).

One Shot Mode


In one shot mode, the capture job requests sp_cdc_scan to perform up to maxscans scans
to try to drain the log and return. Any transactions in addition to maxtrans that are present
in the log will be processed in later scans.
One shot mode is used in controlled tests, where the volume of transactions to be processed
is known, and there are advantages to the fact that the job closes automatically when it
is finished. One shot mode is not recommended for production use. This is because it relies
on the job schedule to manage how frequently the scan cycle is run.
When running in one shot mode, you can compute an upper bound on expected throughput
of the capture job, expressed in transactions per second by using the following computation:
(maxtrans * maxscans) / number of seconds between scans
Even if the time that is required to scan the log and populate the change tables were not
significantly different from 0, the average throughput of the job could not exceed the value
obtained by dividing the maximum allowed transactions for a single scan multiplied by the
maximum allowed scans by the number of seconds separating log processing.
If one shot mode were to be used to regulate log scanning, the number of seconds between
log processing would have to be governed by the job schedule. When this kind of behavior is
desired, running the capture job in continuous mode is a better way to manage rescheduling
the log scan.
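
As a worked example of the computation above, the following sketch uses illustrative values: maxtrans = 500, maxscans = 10, and a hypothetical job schedule that runs the scan every 60 seconds.

```sql
-- Illustrative values only; the 60-second schedule is hypothetical.
DECLARE @maxtrans int = 500,
        @maxscans int = 10,
        @seconds_between_scans int = 60;

-- (maxtrans * maxscans) / number of seconds between scans
-- 5000 / 60, roughly 83 transactions per second at most.
SELECT (@maxtrans * @maxscans) * 1.0 / @seconds_between_scans
       AS max_transactions_per_second;
```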

Continuous Mode and the Polling Interval


In continuous mode, the capture job requests that sp_cdc_scan run continuously. This lets
the stored procedure manage its own wait loop by providing not only for maxtrans and
maxscans but also a value for the number of seconds between log processing (the polling
interval). Running in this mode, the capture job remains active, executing
a WAITFOR between log scanning.

Note

When the value of the polling interval is greater than 0, the same upper
limit on throughput for the recurring one shot job also applies to the job
operation in continuous mode. That is, (maxtrans * maxscans) divided by a
nonzero polling interval will put an upper bound on the average number of
transactions that can be processed by the capture job.
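
The capture job parameters discussed above can be changed with sys.sp_cdc_change_job. The values below are examples only, and the sketch assumes that the job must be stopped and restarted before the new values are used.

```sql
-- Sketch only: MyDB is a placeholder and the values are examples.
USE MyDB
GO
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxtrans        = 1000,
    @maxscans        = 10,
    @continuous      = 1,
    @pollinginterval = 5      -- seconds between log scans
GO
-- Restart the capture job so that it picks up the new configuration.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture'
GO
EXEC sys.sp_cdc_start_job @job_type = N'capture'
GO
```

With these example values, the upper bound on average throughput is (1000 * 10) / 5 = 2000 transactions per second.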

Capture Job Customization


For the capture job, you can apply additional logic to determine whether a new scan begins
immediately or whether a sleep is imposed before it starts a new scan, instead of relying on a
fixed polling interval. The choice could be based merely on time of day, perhaps
enforcing very long sleeps during peak activity times, and even moving to a polling interval
of 0 at close of day when it is important to complete the day's processing and prepare for
nightly runs. Capture process progress could also be monitored to determine when all
transactions committed by midnight had been scanned and deposited in change tables.
This lets the capture job end, to be restarted by a scheduled daily restart. By replacing the
delivered job step calling sp_cdc_scan with a call to a user written wrapper
for sp_cdc_scan, highly customized behavior can be obtained with little additional effort.
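
One possible shape for such a wrapper is sketched below. It is not the delivered job step: the procedure name dbo.usp_cdc_scan_wrapper, the peak hours, and the sleep lengths are all hypothetical, and the loop runs until the job is stopped.

```sql
-- Hypothetical wrapper around sys.sp_cdc_scan; names and timings are assumptions.
USE MyDB
GO
CREATE PROCEDURE dbo.usp_cdc_scan_wrapper
AS
BEGIN
    SET NOCOUNT ON;
    WHILE 1 = 1
    BEGIN
        -- Run one shot: up to 10 scan cycles of at most 500 transactions each.
        EXEC sys.sp_cdc_scan
            @maxtrans   = 500,
            @maxscans   = 10,
            @continuous = 0;

        -- Sleep longer during assumed peak hours (09:00 through 17:59).
        IF DATEPART(HOUR, GETDATE()) BETWEEN 9 AND 17
            WAITFOR DELAY '00:05:00';
        ELSE
            WAITFOR DELAY '00:00:05';
    END
END
GO
```

The job step would then call the wrapper instead of calling sp_cdc_scan directly.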

Cleanup Job
This section provides information about how the change data capture cleanup job works.

Structure of the Cleanup Job


Change data capture uses a retention based cleanup strategy to manage change table size.
The cleanup mechanism consists of a SQL Server Agent Transact-SQL job that is created
when the first database table is enabled. A single cleanup job handles cleanup for all
database change tables and applies the same retention value to all defined capture
instances.

The cleanup job is initiated by running the parameterless stored
procedure sp_MScdc_cleanup_job. This stored procedure starts by extracting the
configured retention and threshold values for the cleanup job from msdb.dbo.cdc_jobs.
The retention value is used to compute a new low watermark for the change tables. The
specified number of minutes is subtracted from the maximum tran_end_time value from
the cdc.lsn_time_mapping table to obtain the new low watermark expressed as a datetime
value. The cdc.lsn_time_mapping table is then used to convert this datetime value to a
corresponding lsn value. If the same commit time is shared by multiple entries in the table,
the lsn that corresponds to the entry that has the smallest lsn is chosen as the new low
watermark. This lsn value is passed to sp_cdc_cleanup_change_tables to remove change
table entries from the database change tables.

Note

The advantage of using the commit time of the recent transaction as the
base for computing the new low watermark is that it lets the changes
remain in change tables for the specified time. This happens even when
the capture process is running behind. All entries that have the same
commit time as the current low watermark continue to be represented
within the change tables by choosing the smallest lsn that has the shared
commit time for the actual low watermark.

When a cleanup is performed, the low watermark for all capture instances is initially updated
in a single transaction. It then tries to remove obsolete entries from the change tables and
the cdc.lsn_time_mapping table. The configurable threshold value limits how many entries
are deleted in any single statement. Failure to perform the delete on any individual table will
not prevent the operation from being attempted on the remaining tables.
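
The retention and threshold values described above can be adjusted with sys.sp_cdc_change_job. A minimal sketch with example values; because the cleanup job reads its configuration when it starts, the next run should pick up the change.

```sql
-- Sketch only: MyDB is a placeholder and the values are examples.
USE MyDB
GO
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080,   -- minutes to keep change rows (7 days)
    @threshold = 5000     -- maximum entries deleted in a single statement
GO
-- Review the stored configuration.
EXEC sys.sp_cdc_help_jobs
GO
```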

Cleanup Job Customization


For the cleanup job, the possibility for customization is in the strategy used to determine
which change table entries are to be discarded. The only supported strategy in the delivered
cleanup job is a time-based one. In that situation, the new low watermark is computed by
subtracting the allowed retention period from the commit time of the last transaction
processed. Because the underlying cleanup procedures are based on lsn instead of time,
any number of strategies can be used to determine the smallest lsn to keep in the change
tables. Only some of these are strictly time-based. Knowledge about the clients, for
example, could be used to provide a failsafe if downstream processes that require access to
the change tables cannot run. Also, although the default strategy applies the same lsn to
clean up all the database's change tables, the underlying cleanup procedure can also be
called to clean up at the capture instance level.
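
Cleanup at the capture instance level might be sketched as follows. The capture instance name dbo_MyTable and the seven-day cutoff are assumptions; the cutoff is mapped to an lsn before sys.sp_cdc_cleanup_change_table is called.

```sql
-- Sketch only: the capture instance name and the cutoff are assumptions.
USE MyDB
GO
DECLARE @cutoff datetime = DATEADD(DAY, -7, GETDATE());

-- Map the cutoff time to the first lsn committed at or after it.
DECLARE @low_water_mark binary(10) =
    sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @cutoff);

-- Skip the cleanup when no lsn exists at or after the cutoff.
IF @low_water_mark IS NOT NULL
    EXEC sys.sp_cdc_cleanup_change_table
        @capture_instance = N'dbo_MyTable',
        @low_water_mark   = @low_water_mark,
        @threshold        = 5000;
GO
```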

Monitor the Change Data Capture Process


Monitoring the change data capture process lets you determine if changes are being written
correctly and with a reasonable latency to the change tables. Monitoring can also help you
to identify any errors that might occur. SQL Server includes two dynamic management views
to help you monitor change data
capture: sys.dm_cdc_log_scan_sessions and sys.dm_cdc_errors.
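
For example, recent errors reported by the capture process can be listed with a query along these lines (a minimal sketch; the column selection is a matter of preference):

```sql
-- Most recent change data capture errors first.
SELECT session_id, entry_time, error_number, error_severity, error_message
FROM sys.dm_cdc_errors
ORDER BY entry_time DESC
GO
```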

Identify Sessions with Empty Result Sets


Every row in sys.dm_cdc_log_scan_sessions represents a log scan session (except the row
with an ID of 0). A log scan session is equivalent to one execution of sp_cdc_scan. During a
session, the scan can either return changes or return an empty result. If the result set is
empty, the empty_scan_count column in sys.dm_cdc_log_scan_sessions is set to 1. If there
are consecutive empty result sets, such as if the capture job is running continuously, the
empty_scan_count in the last existing row is incremented. For example, if
sys.dm_cdc_log_scan_sessions already contains 10 rows for scans that returned changes and
there are five empty results in a row, the view contains 11 rows. The last row has a value of
5 in the empty_scan_count column. To determine sessions that had an empty scan, run the
following query:
SELECT * from sys.dm_cdc_log_scan_sessions where empty_scan_count <> 0

Determine Latency
The sys.dm_cdc_log_scan_sessions management view includes a column that records the
latency for each capture session. Latency is defined as the elapsed time between a
transaction being committed on a source table and the last captured transaction being
committed on the change table. The latency column is populated only for active sessions.
For sessions with a value greater than 0 in the empty_scan_count column, the latency
column is set to 0. The following query returns the average latency for the most recent
sessions:
SELECT latency FROM sys.dm_cdc_log_scan_sessions WHERE session_id = 0
You can use latency data to determine how fast or slow the capture process is processing
transactions. This data is most useful when the capture process is running continuously. If
the capture process is running on a schedule, latency can be high because of the lag
between transactions being committed on the source table and the capture process running
at its scheduled time.

Another important measure of capture process efficiency is throughput. This is the average
number of commands per second that are processed during each session. To determine the
throughput of a session, divide the value in the command_count column by the value in the
duration column. The following query returns the average throughput for the most recent
sessions:
SELECT command_count/duration AS [Throughput] FROM sys.dm_cdc_log_scan_sessions
WHERE session_id = 0
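
When individual sessions matter more than the aggregate row, a per-session variant can be useful. The following is a minimal sketch rather than part of the original procedure. It assumes the start_time and end_time columns of the view in addition to the columns named above, and it guards against a zero duration.

```sql
SELECT session_id, start_time, end_time, latency, command_count, duration,
       -- NULLIF avoids a divide-by-zero error when duration is 0 seconds
       CAST(command_count AS float) / NULLIF(duration, 0) AS throughput
FROM sys.dm_cdc_log_scan_sessions
WHERE session_id <> 0          -- skip the aggregate row
  AND empty_scan_count = 0     -- skip sessions that returned no changes
ORDER BY start_time DESC;
```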

Use Data Collector to Collect Sampling Data


The SQL Server data collector lets you collect snapshots of data from any table or dynamic
management view and build a performance data warehouse. When change data capture is
enabled on a database, it is useful to take snapshots of the sys.dm_cdc_log_scan_sessions
view and the sys.dm_cdc_errors view at regular intervals for later analysis. The following
procedure sets up a data collector for collecting sample data from the
sys.dm_cdc_log_scan_sessions management view.
Configuring Data Collection
1. Enable data collector and configure a management data warehouse. For more
information, see Manage Data Collection.
2. Execute the following code to create a custom collector for change data capture.
Transact-SQL
USE msdb;
DECLARE @schedule_uid uniqueidentifier;
-- Collect and upload data every 5 minutes
SELECT @schedule_uid = (
    SELECT schedule_uid FROM sysschedules_localserver_view
    WHERE name = N'CollectorSchedule_Every_5min');
DECLARE @collection_set_id int;
EXEC dbo.sp_syscollector_create_collection_set
    @name = N'CDC Performance Data Collector',
    @schedule_uid = @schedule_uid,
    @collection_mode = 0,
    @days_until_expiration = 30,
    @description = N'This collection set collects CDC metadata',
    @collection_set_id = @collection_set_id output;
-- Create a collection item using statistics from
-- the change data capture dynamic management view.
DECLARE @parameters xml;
DECLARE @collection_item_id int;
SELECT @parameters = CONVERT(xml,
    N'<TSQLQueryCollector>
        <Query>
            <Value>SELECT * FROM sys.dm_cdc_log_scan_sessions</Value>
            <OutputTable>cdc_log_scan_data</OutputTable>
        </Query>
    </TSQLQueryCollector>');
EXEC dbo.sp_syscollector_create_collection_item
    @collection_set_id = @collection_set_id,
    @collector_type_uid = N'302E93D1-3424-4BE7-AA8E-84813ECF2419',
    @name = N'CDC Performance Data Collector',
    @frequency = 5,
    @parameters = @parameters,
    @collection_item_id = @collection_item_id output;
GO
3. In SQL Server Management Studio, expand Management, and then expand Data
Collection. Right-click CDC Performance Data Collector, and then click Start
Data Collection Set.
4. In the data warehouse you configured in step 1, locate the table
custom_snapshots.cdc_log_scan_data. This table provides a historical snapshot of
data from log scan sessions. This data can be used to analyze latency, throughput,
and other performance measures over time.

How to enable and use SQL Server Change Data Capture

In the previous article, we described the main characteristics of the SQL Server
feature for tracking data inserts, deletes, and updates: Change Data Capture. We
also compared it to another SQL Server auditing feature, SQL Server Change
Tracking.
In this article, we'll show how to enable and use the SQL Server Change Data
Capture feature.

How to set up SQL Server Change Data Capture?


The feature is available only in SQL Server Enterprise and Developer editions,
starting with. It can be enabled only using system stored procedures. SQL Server
Management Studio provides a wide range of code templates for various
feature-related actions.
To open the templates:
1. In the SQL Server Management Studio menu, open View
2. Click Template Explorer
3. Open SQL Server Templates
4. Open the Change Data Capture sub-folder. The T-SQL templates for
administration, configuration, enumeration and meta data querying are available
To set up the feature:


1. Make sure SQL Server Agent is running. If not, right-click it in Object
Explorer and click Start.

2. To enable the feature on the database, open the Enable Database for CDC
template in the Configuration sub-folder, and replace the database name with the
name of the database you want to track:

USE AdventureWorks2012
GO
EXEC sys.sp_cdc_enable_db
GO

The login used must have SQL Server sysadmin privileges and must be a db_owner
of the database. Otherwise, you'll get the following error:

Msg 22830, Level 16, State 1, Procedure sp_cdc_enable_db_internal, Line 193
The failure occurred when executing the command 'SetCDCTracked(Value = 1)'.
The error returned was 15517: 'Cannot execute as the database principal
because the principal "dbo" does not exist, this type of principal cannot be
impersonated, or you do not have permission.'. Use the action and error to
determine the cause of the failure and resubmit the request.

One of the ways to fix it is to change the database owner to sa and execute
sys.sp_cdc_enable_db again:

EXEC sp_changedbowner 'sa'
GO
EXEC sys.sp_cdc_enable_db
GO

3. To check whether Change Data Capture is enabled or disabled for a specific
database, query the sys.databases view. The is_cdc_enabled column value 0
indicates that the feature is disabled; otherwise, it's 1:

SELECT name, is_cdc_enabled
FROM sys.databases

After the feature is enabled on the database, the cdc schema, cdc user, and data
capture metadata tables are automatically created.

4. You have to enable the feature for each table you want to track:

EXEC sys.sp_cdc_enable_table
@source_schema = N'Person',
@source_name = N'Address',
@role_name = NULL,
@supports_net_changes = 1
GO

When the feature on the table is successfully enabled, the following messages are
shown

Job cdc.AdventureWorks2012_capture started successfully.


Job cdc.AdventureWorks2012_cleanup started successfully.
Four parameters are described below: @captured_column_list, @filegroup_name,
@role_name, and @supports_net_changes.
By default, all columns in the table are tracked. If you want to track only specific
ones, use the @captured_column_list parameter. The syntax is:
@captured_column_list = N'AddressLine1, AddressLine2, City'

The @filegroup_name parameter can be used to change the default location of the
change tables, for example
@filegroup_name = N'SECONDARY'

By default, the change table is located in the default filegroup of the database.
Database owners who want to control the placement of individual change tables
can use the @filegroup_name parameter to specify a particular filegroup for the
change table associated with the capture instance. The named filegroup must
already exist. Generally, it is recommended that change tables be placed in a
filegroup separate from source tables.[1]
The template for changing the default filegroup is Enable a Table Specifying
Filegroup Option in the Configuration sub-folder
By default, all members of the sysadmin and db_owner roles have full access to the
captured records. To limit access to the captured change data, create a new role
that provides the necessary permissions on the captured information, and use the
@role_name parameter to grant the permissions only to the role members.
When the @role_name parameter is set to NULL, only members of the sysadmin and
db_owner roles have full access to the captured information. When it is set to a
specific role, other users must be members of that role (called a gating role) to
access the change table. The template for assigning a specified role is Enable a
Table Without Using a Gating Role
@role_name = N'cdc_Admin'

The @supports_net_changes parameter makes it possible to show multiple changes
aggregated as a single one. This parameter can be used only on tables that have a
primary key or unique index.
For example, if a row was first inserted (__$operation = 2) and deleted
(__$operation = 1) afterwards, the net change is that nothing has happened.
The feature captures both transactions, but the @supports_net_changes parameter
makes it possible to see both individual and net changes.

The template for setting the @supports_net_changes parameter is Enable a Table
for All and Net Changes Queries
@supports_net_changes = 1

5. To check whether Change Data Capture is enabled on the table:

SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE name = 'Address'

If a table is tracked, 1 is returned, 0 otherwise.


When the feature is enabled on the table, a change table and up to two query
functions are automatically created. For the Person.Address table, these are the
cdc.Person_Address_CT table, and the cdc.fn_cdc_get_all_changes_Person_Address and
cdc.fn_cdc_get_net_changes_Person_Address table-valued functions. The latter one
is created only when the @supports_net_changes parameter is set to 1. These
functions are used to query change tables.

The change table structure


The first five columns in the change table store specific transaction information:
the start and end log sequence numbers, the sequence value that orders changes within
a transaction, the operation type (delete, insert, update), and a bit mask that
identifies the columns affected by updates. The rest of the columns are identical to
those in the source table and store the captured information.
Whenever a row is inserted or deleted in the source table, a new row identical to the
inserted or deleted one is added to the change table. When a row is updated, 2 rows
are inserted. The first one is identical to the row before the update, and the second
one to the row after the update.

With time, the change tables grow. To maintain their size and keep them from
growing uncontrollably, the cdc.<database_name>_cleanup job is used.
If the structure of the source table is modified, the existing change table is not
restructured to match: changes to the affected columns are ignored unless a new
change table (capture instance) is associated with the source table.

The Change Data Capture jobs


When the Change Data Capture feature is enabled for the first table in the
database, two SQL Server jobs are automatically created: one to capture the
changes and another to clean up the old captured information.

The capture job is in charge of capturing data changes and processing them into
change tables.
It runs continuously, processing a maximum of 1000 transactions per scan cycle
with a wait of 5 seconds between cycles. The cleanup job runs daily at 2 A.M. It
retains change table entries for 4320 minutes or 3 days, removing a maximum of
5000 entries with a single delete statement. [2]
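
These defaults can be inspected and changed with system stored procedures. The following is a minimal sketch; the 10080-minute (7-day) retention is an assumed value chosen only for illustration.

```sql
-- Sketch: show the current capture and cleanup job settings for this database.
EXEC sys.sp_cdc_help_jobs;

-- Keep change rows for 7 days instead of 3 (10080 minutes is an assumed value).
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080;
```

As far as the documentation describes it, changed job parameters take effect only after the job is stopped and started again (sys.sp_cdc_stop_job and sys.sp_cdc_start_job).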
Like other SQL Server jobs, the capture job can be stopped and started. When the
job is stopped, the online transaction log is not scanned for the changes, and
changes are not added to the capture tables. The change capturing process is not
broken, as the changes will be processed once the job is started again. Because the
feature can delay log truncation, the un-scanned transactions will
not be overwritten unless the feature is disabled on the database. However, the
capture job should be stopped only when necessary, such as in peak hours when
scanning logs can add load, and restarted afterwards.
Change Data Capture can be enabled only using code, as SQL Server
Management Studio offers no options for the feature. It has to be enabled for each
table individually. For each tracked table, a new system table and up to two
functions are created, which brings additional load to the database. Although it
captures more information about transactions than SQL Server Change Tracking, it
doesn't answer the who, when, and how questions.

How to analyze and read Change Data Capture (CDC) records

In the previous article, we described the main features of SQL Server Change Data
Capture and showed how to set it up. Now, we will analyze the records stored in
change tables and describe the methods to read them

The system tables created by the feature


The following tables are automatically created in the tracked database
when Change Data Capture is enabled:
cdc.captured_columns contains a row for each column tracked in the tracked
(source) tables
cdc.change_tables contains a row for each change table in the tracked database
cdc.ddl_history contains a row for each structure (Data Definition Language)
change of source tables
cdc.index_columns contains a row for each index column associated with a change
table. The index columns are used to uniquely identify rows in the source tables
cdc.lsn_time_mapping contains a row for each transaction in the source tables. It
maps Log Sequence Number values to the time the transaction was committed
The table dbo.cdc_jobs that stores configuration parameters for capture and
cleanup jobs is the only system table created in the msdb database
When the feature is enabled on a table, the change table
named cdc.<captured_instance>_CT is automatically created in the tracked
database. The table contains a row for each insert and delete on the source table,
and two rows for each update. The first one is identical to the row before the
update, and the second one to the row after the update. To query the table, use
the cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions.

The first five columns contain the metadata necessary for the feature, the rest are
the exact replica of the source table
__$start_lsn: the Log Sequence Number of the committed transaction. Every change
committed in the same transaction has its own row in the change table, but the
same __$start_lsn
__$end_lsn: the column is always NULL in SQL Server 2012; future compatibility is
not guaranteed
__$seqval: the sequence value used to order the row changes within a transaction
__$operation: indicates the change type made on the row
1 = Delete
2 = Insert
3 = Updated row before the change
4 = Updated row after the change
__$update_mask: similar to the update mask available in Change Tracking, a bit
mask used to identify the ordinals of the modified columns

The system table valued functions


The same as with the SQL Server Change Tracking feature, the change
information in SQL Server Change Data Capture is available through table-valued
functions. We will describe and show examples for the ones most frequently used
[1]
cdc.fn_cdc_get_all_changes_<capture_instance> returns a row for each change in
the source table that belongs to the Log Sequence Number in the range specified by
the input parameters

cdc.fn_cdc_get_all_changes_capture_instance(from_lsn, to_lsn, '<row_filter_option>')

The <row_filter_option> parameter affects only the UPDATEs. It can have the
following values:

all: every change is represented with a single row
all update old: UPDATEs are represented by 2 rows, showing the values of the row
before and after the update

cdc.fn_cdc_get_net_changes_<capture_instance> returns one row that represents
multiple changes on a single row aggregated as a single one

cdc.fn_cdc_get_net_changes_capture_instance(from_lsn, to_lsn, '<row_filter_option>')

The rows with a single change are represented the same way as with
the cdc.fn_cdc_get_all_changes function. For example, if a column was first updated
from '1970 Napa Street' to '123 Street' and then to '99 Daisy Street',
the cdc.fn_cdc_get_all_changes function returns all 3 transactions, while
the cdc.fn_cdc_get_net_changes function returns only one.

The <row_filter_option> parameter can have the following values:

all: returns the LSN of the final change; the __$update_mask column is
always NULL
all with mask: returns the LSN of the final change; the __$update_mask
column shows the IDs of the modified columns
all with merge: returns the LSN of the final change. The __$operation value
is 1 for a delete, 5 when the net operation is an insert or an update. The
__$update_mask column is always NULL
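
A net changes query using one of these options could look like the sketch below. It reuses the Person_Address capture instance from this article and assumes the table was enabled with @supports_net_changes = 1. The selected columns are an illustrative subset.

```sql
-- Sketch: net changes for the Person_Address capture instance.
-- Assumes the table was enabled with @supports_net_changes = 1.
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

-- With 'all with merge', __$operation is 1 for a delete and 5 for an insert or update.
SELECT __$start_lsn, __$operation, AddressID, AddressLine1, City
FROM cdc.fn_cdc_get_net_changes_Person_Address(@from_lsn, @to_lsn, N'all with merge');
```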

Both cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions require
two LSN parameters: the minimal and maximal Log Sequence Number (LSN) for the
queried set of records.
To determine the Log Sequence Numbers that can be used in these functions, two
functions are available:
sys.fn_cdc_get_min_lsn: returns the start_lsn column value for the specified
capture instance from the cdc.change_tables system table. This value represents
the low endpoint of the validity interval for the capture instance. Requires
membership in the sysadmin fixed server role or db_owner fixed database role. For
all other users, requires SELECT permission on all captured columns in the source
table and, if a gating role for the capture instance was defined, membership in that
database role. [2]
SELECT sys.fn_cdc_get_min_lsn('Person_Address') AS min_lsn

sys.fn_cdc_get_max_lsn: similar to the sys.fn_cdc_get_min_lsn function, returns the
maximum Log Sequence Number, the high endpoint of the validity interval for all
source tables. No parameters are required
SELECT sys.fn_cdc_get_max_lsn() AS max_lsn

sys.fn_cdc_get_column_ordinal: returns the ordinal of the column in a source table.
These ordinals are used in change tables to reference a specific column
SELECT sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressLine1')

returns 2, as it's the second column in the Person.Address table


Change Data Capture also provides a range of system stored procedures to
configure, maintain, and manage the feature on the database and tables, and to
acquire captured information [3]

Reading the records


In the following example, we inserted three rows into the Person.Address table,
updated one, and deleted one
INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [StateProvinceID])
VALUES (32522, N'1234 Rodeo Drive', NULL, N'New York', 79)

INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [StateProvinceID])
VALUES (32523, N'2345 Red Hills Way', NULL, N'Bellevue', 79)

INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [StateProvinceID])
VALUES (32524, N'3456 Big City Street', NULL, N'Edmonds', 79)

UPDATE [Person].[Address]
SET [AddressLine1] = N'5415 La Valetta Blv.', [City] = N'Seattle'
WHERE [AddressID] = 16

DELETE FROM [Person].[Address] WHERE [AddressID] = 32524

To read the change tables, MSDN doesn't recommend querying the table directly,
but using the system functions instead.
To read all captured information for the Person.Address table, execute:
DECLARE @from_lsn binary(10), @to_lsn binary(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address')
SET @to_lsn = sys.fn_cdc_get_max_lsn()

-- 'all update old' returns both the before and after rows for each update
SELECT *
FROM cdc.fn_cdc_get_all_changes_Person_Address(@from_lsn, @to_lsn, 'all update old')
ORDER BY __$seqval

The first three rows, with __$operation = 2, show the inserted rows
The fourth row, with __$operation = 3, is the updated row before the update
The fifth row, with __$operation = 4, is the updated row after the update
The last row, with __$operation = 1, is the deleted row


Here are the same operations tracked by SQL Server Change Tracking

As shown, Change Data Capture shows the exact values of all tracked columns in
the modified rows, even if the column itself was not updated
The following code is an example of how to use fn_cdc_get_column_ordinal and
__$update_mask to check whether a column has been changed or not
DECLARE @from_lsn binary(10), @to_lsn binary(10)
DECLARE @AddressIDPosition INT
DECLARE @AddressLine1Position INT
DECLARE @AddressLine2Position INT
DECLARE @CityPosition INT
DECLARE @StProvIDPos INT
DECLARE @PostalCode INT

SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
-- Look up the ordinal of each captured column once
SET @AddressIDPosition = sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressID')
SET @AddressLine1Position = sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressLine1')
SET @AddressLine2Position = sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressLine2')
SET @CityPosition = sys.fn_cdc_get_column_ordinal('Person_Address', 'City')
SET @StProvIDPos = sys.fn_cdc_get_column_ordinal('Person_Address', 'StateProvinceID')
SET @PostalCode = sys.fn_cdc_get_column_ordinal('Person_Address', 'PostalCode')

-- Test the bit for each column in the update mask of every change row
SELECT __$operation
,__$update_mask
,sys.fn_cdc_is_bit_set(@AddressIDPosition, __$update_mask) AS 'UpdatedAddressID'
,sys.fn_cdc_is_bit_set(@AddressLine1Position, __$update_mask) AS 'UpdatedLine1'
,sys.fn_cdc_is_bit_set(@AddressLine2Position, __$update_mask) AS 'UpdatedLine2'
,sys.fn_cdc_is_bit_set(@CityPosition, __$update_mask) AS 'UpdatedCity'
,sys.fn_cdc_is_bit_set(@StProvIDPos, __$update_mask) AS 'UpdatedState'
,sys.fn_cdc_is_bit_set(@PostalCode, __$update_mask) AS 'UpdatedPostal'
FROM cdc.fn_cdc_get_all_changes_Person_Address(@from_lsn, @to_lsn, 'all')
ORDER BY __$seqval

Each Updated<column_name> column in the result set shows 1 if that column was modified, 0 otherwise

While Change Tracking shows only what was changed and whether the change
was an insert, update, or delete, Change Data Capture shows the values inserted,
deleted or updated for the modified rows. For updates, it shows both old and new
values of the updated row
The feature doesn't track the user who made the change. To do that, you have to
create a new field where the user's details are stored and updated after each
change. The same goes for the time of the change and the machine used to make
the change. The execution of SELECT statements and object access are not
tracked.
Change capturing is an asynchronous process. First a change is committed to a
source table, and the change is added to the change table afterwards. There is a
delay between these two actions. The captured info has to be obtained using
functions. When a table schema changes, the changes to the affected columns will
be ignored unless a new change table is associated with the source.
The data captured in change tables can grow uncontrollably if you stop the job that
purges the data, or modify it so it doesn't run often enough. The feature is
supported only in Enterprise and Developer editions.
If you need detailed information about the data changes on your tables, Change
Data Capture is a better solution than Change Tracking, as it provides the values
that were inserted, deleted or updated. If the feature doesn't track all the events you
would like or provide all the information you are looking for, there are other
auditing solutions. In the next article in the series, we'll analyze the SQL Server
Auditing feature vs ApexSQL Audit.

CDC: Change Data Capture (Zero Cost ETL Solution)
November 10, 2011

What is CDC?
Change data capture records insert, update, and delete activity that is
applied to a SQL Server table.
How does it work?

The source of change data for change data capture is the SQL Server
transaction log.
As inserts, updates, and deletes are applied to tracked source tables, entries
that describe those changes are added to the log. The log serves as input to the
capture process.
This reads the log and adds information about changes to the tracked table's
associated change table.

Functions are provided to enumerate the changes that appear in the change
tables over a specified range, returning the information in the form of a filtered
result set.

Where is it used?
A good example of a data consumer that is targeted by this technology is an
extraction, transformation, and loading (ETL) application. An ETL application
incrementally loads change data from SQL Server source tables to a data
warehouse or data mart.
With CDC, we can eliminate the use of AFTER UPDATE/DELETE/INSERT triggers.
Note: Change data capture is available only on the Enterprise, Developer, and
Evaluation editions of SQL Server.
Advantages of using CDC :

Minimal impact on the database (even more so if one uses log shipping to
process the logs on a dedicated host).
No need for programmatic changes to the applications that use the database.
Low latency in acquiring changes.
Transactional integrity: log scanning can produce a change stream that
replays the original transactions in the order they were committed. Such a change
stream includes changes made to all tables participating in the captured
transaction.
No need to change the database schema.

Is CDC (Change Data Capture) available in other off-the-shelf products?
Several off-the-shelf products perform change data capture using database
transaction log files.
These include:

Attunity Stream
Centerprise Data Integrator from Astera
DatabaseSync from WisdomForce
GoldenGate Transactional Data Integration
HVR from HVR Software
DBMoto from HiT Software
Shadowbase from Gravic

IBM InfoSphere Change Data Capture (previously DataMirror Transformation Server)
Informatica PowerExchange CDC Option (previously Striva)
Oracle Streams
Oracle Data Guard
Replicate1 from Vision Solutions
SharePlex from Quest Software
FlexCDC, part of Flexviews for MySQL

How to implement CDC ?


Let's get into the details of how to implement CDC in SQL Server 2008.
Before enabling CDC on the tables, we need to enable CDC on the
database.
To determine whether the database is enabled or not, check the
sys.databases view:
SELECT NAME,IS_CDC_ENABLED
FROM SYS.DATABASES
WHERE NAME = 'SQLJUNKIESHARE'

Use the following system stored procedure to enable CDC on a database


USE SQLJUNKIESHARE
EXEC SYS.SP_CDC_ENABLE_DB
GO

After CDC is enabled on the database, the following tables are created among the
system tables:

cdc.captured_columns: Returns the columns tracked for a specific capture
instance.
cdc.change_tables: Returns tables created when CDC is enabled for a table.
Use sys.sp_cdc_help_change_data_capture to query this information rather than
querying this table directly.
cdc.ddl_history: Returns rows for each DDL change made to the table, once
CDC is enabled. Use sys.sp_cdc_get_ddl_history instead of querying this table
directly.
cdc.index_columns: Returns index columns associated with the CDC-enabled
table. Query sys.sp_cdc_help_change_data_capture to retrieve this information
rather than querying this table directly.
cdc.lsn_time_mapping: Helps you map the log sequence number to
transaction begin and end times. Again, avoid querying the table directly, and
instead use the functions sys.fn_cdc_map_lsn_to_time and
sys.fn_cdc_map_time_to_lsn.

After enabling CDC on the database, let's enable CDC on a table.

Now that Change Data Capture is enabled, I can proceed with capturing
changes for tables in the database by using the sys.sp_cdc_enable_table system
stored procedure. The parameters of this stored procedure are described below.


sp_cdc_enable_table Parameters

Parameter: Description
@source_schema: This parameter defines the schema of the object.
@source_name: This parameter specifies the table name.
@role_name: This option allows you to select the name of the user-defined role
that will have permissions to access the CDC data.
@capture_instance: You can designate up to two capture instances for a single
table. This comes in handy if you plan on altering the schema of a table already
captured by CDC. You can alter the schema without affecting the original CDC
(unless it is a data type change), create a new capture instance, track changes in
two tables, and then drop the original capture instance once you are sure the new
schema capture fits your requirements. If you don't designate the name, the default
value is schemaname_sourcename (for example, dbo_EMPLOYEE).
@supports_net_changes: When enabled, this option allows you to show just the
latest change to the data within the LSN range selected. This option requires a
primary key be defined on the table. If no primary key is defined, you can also
designate a unique key in the @index_name option.
@index_name: This parameter allows you to designate the unique key on the table
to be used by CDC if a primary key doesn't exist.
@captured_column_list: If you aren't interested in tracking all column changes,
this option allows you to narrow down the list.
@filegroup_name: This option allows you to designate where the CDC data will be
stored. For very large data sets, isolation on a separate filegroup may yield
better manageability and performance.
@allow_partition_switch: This parameter takes a TRUE or FALSE value designating
whether or not an ALTER TABLE...SWITCH PARTITION command will be allowed against
the CDC-enabled table (default is TRUE).

When we set @supports_net_changes to 1, the table must either have a primary
key or we need to specify the name of a unique index in @index_name.
After making those changes, when we execute the stored procedure the result will
be:
Job cdc.sqljunkieshare_capture started successfully.
Job cdc.sqljunkieshare_cleanup started successfully.
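
A call that combines several of the parameters described above might look like the following sketch. It assumes the dbo.EMPLOYEE table used later in this article has a primary key. The role name cdc_reader is hypothetical.

```sql
-- Sketch: enable CDC on dbo.EMPLOYEE with named parameters.
-- Assumes dbo.EMPLOYEE has a primary key; cdc_reader is a hypothetical role name.
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'EMPLOYEE',
    @role_name            = N'cdc_reader',
    @capture_instance     = N'dbo_EMPLOYEE',
    @supports_net_changes = 1;
GO
```

Naming the capture instance explicitly keeps the generated function names (for example, cdc.fn_cdc_get_all_changes_dbo_EMPLOYEE) predictable if a second capture instance is added later.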

Capture and Cleanup Jobs


Besides the Change Data Capture tables and query functions that have
been created in our example, two SQL Server Agent jobs are created: a
Capture and a Cleanup Job.
The Capture job generally runs continuously and is used to move changed
data to the CDC tables from the transaction log.
The Cleanup job runs on a scheduled basis to remove older data from the
CDC tables so that they don't get too large. By default, data older than
three days is automatically removed from CDC tables by this job.

We can also validate the settings of the newly configured capture instance
using the sys.sp_cdc_help_change_data_capture stored procedure:
EXEC sys.sp_cdc_help_change_data_capture @source_schema = N'dbo', @source_name = N'EMPLOYEE'

The commit LSN both identifies changes that were committed within the
same transaction, and orders those transactions.

The column __$start_lsn identifies the commit log sequence number (LSN)
that was assigned to the change.

The column __$seqval can be used to order more changes that occur in the
same transaction.

The column __$operation records the operation that is associated with the
change:

1 = delete,

2 = insert,

3 = update (before image)

4 = update (after image).

The column __$update_mask is a variable bit mask with one defined bit for
each captured column. For insert and delete entries, the update mask will always
have all bits set. Update rows, however, will only have those bits set that
correspond to changed columns.
Before going into more details, let's look at some CDC functions.

cdc.fn_cdc_get_all_changes_capture_instance ( from_lsn , to_lsn , '<row_filter_option>' )
<row_filter_option> ::= { all | all update old }
It is preferable to use the CDC functions to query the change tables:

DECLARE @from_lsn binary(10), @to_lsn binary(10)


SET @from_lsn =
sys.fn_cdc_get_min_lsn('dbo_EMPLOYEE')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_EMPLOYEE (@from_lsn,
@to_lsn, 'all');
The <row_filter_option> values are:
all: returns all the changes on that table or capture instance, with one row for
each change (i.e., one row for an insert, one row for a delete, one row for an update).
all update old: returns all the changes on that table or capture instance, with two
rows for each update (i.e., one containing the values of the captured columns before
the update and another containing the values of the captured columns after the update).
Another important function:
cdc.fn_cdc_get_net_changes_capture_instance ( from_lsn , to_lsn , '<row_filter_option>' )
<row_filter_option> ::= { all | all with mask | all with merge }
You might be confused about the difference between
cdc.fn_cdc_get_net_changes_capture_instance and
cdc.fn_cdc_get_all_changes_capture_instance, and about when to use net changes and
when to use all changes.
cdc.fn_cdc_get_net_changes_capture_instance returns one net change row for
each source row changed within the specified LSN range. That is, when a source row
has multiple changes during the LSN range, a single row that reflects the final
content of the row is returned by the function. For example, if a transaction inserts a
row in the source table and a subsequent transaction within the LSN range updates
one or more columns in that row, the function returns only one row, which includes
the updated column values.
Let's say we need to get all the net changes in the source table during a period of
24 hours:
DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10),
@to_lsn binary(10);
-- Obtain the beginning of the time interval.
SET @begin_time = GETDATE() -1;
-- DML statements to produce changes in the DBO.Employee table.
-- Obtain the end of the time interval.
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal',
@begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal',
@end_time);
-- Return the net changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn,
@to_lsn, 'all');
A new function used above is sys.fn_cdc_map_time_to_lsn(), which returns the LSN
that corresponds to a given time and relational operator:

sys.fn_cdc_map_time_to_lsn ( <relational_operator> , tracking_time )
<relational_operator> ::= { largest less than | largest less than or equal |
smallest greater than | smallest greater than or equal }
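Because sys.fn_cdc_map_time_to_lsn returns NULL when no LSN satisfies the requested relation, it can be worth guarding the lower bound before passing it to a query function. The following is a sketch under the assumptions of this walkthrough (capture instance dbo_EMPLOYEE, a one-hour window); it also uses sys.fn_cdc_map_lsn_to_time to map the LSN back to a commit time.

```sql
DECLARE @begin_time datetime, @from_lsn binary(10), @min_lsn binary(10);

SET @begin_time = DATEADD(HOUR, -1, GETDATE());
SET @from_lsn   = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SET @min_lsn    = sys.fn_cdc_get_min_lsn('dbo_EMPLOYEE');

IF @from_lsn IS NULL
    PRINT 'No LSN is mapped at or after the requested time.';
ELSE
BEGIN
    -- The query range must not start before the capture instance's lowest valid LSN.
    IF @from_lsn < @min_lsn
        SET @from_lsn = @min_lsn;

    -- Show the LSN and the commit time it corresponds to.
    SELECT @from_lsn AS from_lsn,
           sys.fn_cdc_map_lsn_to_time(@from_lsn) AS commit_time;
END
```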
See the following link for more CDC functions:
http://msdn.microsoft.com/en-us/library/bb510744.aspx
Now let's create a dimension table that serves as the destination table:
CREATE TABLE DBO.DIMEMPLOYEE
(ID INT IDENTITY (1,1) PRIMARY KEY, COUSTEMERID INT UNIQUE, FIRST_NAME VARCHAR(50),
LAST_NAME VARCHAR(50));

INSERT INTO DBO.DIMEMPLOYEE (COUSTEMERID, FIRST_NAME, LAST_NAME)
SELECT ID, FIRST_NAME, LAST_NAME FROM DBO.EMPLOYEE;

Now let's delete some rows and update some rows in the dbo.employee table:

delete from dbo.employee where id between 50 and 100

update dbo.employee
set last_name = 'no lastname'
where id between 150 and 200

Now let's use the following query to get the changes in the table after the update and delete:

DECLARE @begin_time datetime, @end_time datetime,
@from_lsn binary(10), @to_lsn binary(10);
-- Obtain the beginning of the time interval.
SET @begin_time = DATEADD(HH,-1,GETDATE());
-- Obtain the end of the time interval.
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
-- Return the net changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn, @to_lsn, 'all');

Now let's create a simple SSIS package that queries these functions and updates
the dimension table.

Use the following query in the OLE DB source (the change tables can also be queried
directly):
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE
(sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', DATEADD(HH,-1,GETDATE()))
, sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE())
, 'all')

Write simple update and delete queries, and map the id column to the coustemerid
column in the DBO.DIMEMPLOYEE table.
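Outside SSIS, the same mapping can be sketched in plain T-SQL. The example below is one possible set-based version, not the package itself: it assumes the dbo_EMPLOYEE capture instance and DBO.DIMEMPLOYEE table from this walkthrough, and the temporary table #employee_changes and the change_op column are names introduced here for illustration.

```sql
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', DATEADD(HH, -1, GETDATE()));
SET @to_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE());

-- Stage the net changes once so that every statement works on the same rows.
SELECT __$operation AS change_op, ID, FIRST_NAME, LAST_NAME
INTO #employee_changes
FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn, @to_lsn, 'all');

-- Operation 1 = delete: remove the matching dimension rows.
DELETE d
FROM DBO.DIMEMPLOYEE AS d
JOIN #employee_changes AS c ON c.ID = d.COUSTEMERID
WHERE c.change_op = 1;

-- Operation 4 = update: copy the new column values.
UPDATE d
SET d.FIRST_NAME = c.FIRST_NAME,
    d.LAST_NAME  = c.LAST_NAME
FROM DBO.DIMEMPLOYEE AS d
JOIN #employee_changes AS c ON c.ID = d.COUSTEMERID
WHERE c.change_op = 4;

-- Operation 2 = insert: add rows that are not yet in the dimension.
INSERT INTO DBO.DIMEMPLOYEE (COUSTEMERID, FIRST_NAME, LAST_NAME)
SELECT c.ID, c.FIRST_NAME, c.LAST_NAME
FROM #employee_changes AS c
WHERE c.change_op = 2
  AND NOT EXISTS (SELECT 1 FROM DBO.DIMEMPLOYEE AS d WHERE d.COUSTEMERID = c.ID);

DROP TABLE #employee_changes;
```

Staging the function output first means the LSN range is evaluated once, so the three statements cannot disagree about which changes they apply.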

Using CDC, we were able to log those changes without affecting the source table,
and with a simple package we updated the data warehouse environment.

Configuring SQL Server for CDC


You must perform a few configuration tasks to prepare SQL Server for
PowerExchange change data capture (CDC).
If your SQL Server tables have a high level of update activity, use a distributed
server as the host of the distribution database from which change data is captured.
This practice prevents competition between PowerExchange CDC and your production
database for CPU use and disk storage.
1. Install the SQL Server Management Objects (SMO) framework and the following
related packages, if they are
not already installed:
Microsoft Core XML Services (MSXML) 6.0
Microsoft SQL Server 2008 Management Objects
Microsoft SQL Server 2008 Native Client
Microsoft SQL Server 2008 Replication Management Objects
Microsoft SQL Server System CLR Types
You can download these packages from
http://www.microsoft.com/download/en/details.aspx?id=3522.
If you run SQL Server 2005, you can still download the SQL Server 2008 packages.
Note: On a 64-bit system, you must install both the X64 and X86 Microsoft SQL
Server 2008 Management
Objects packages and also the X86 Microsoft SQL Server System CLR Types
package. The X86 packages
provide a required 32-bit component.
2. Start the SQL Server Agent and Log Reader Agent if they are not running. For
more information, see your
Microsoft SQL Server documentation.
3. Configure and enable SQL Server transactional replication on the publication
database. For more information,
see your Microsoft SQL Server documentation.

Tip: The default transactional retention period at the Distributor is 72 hours. If you
use the PowerExchange Logger, accept this default retention period. If you do not use
the PowerExchange Logger, Informatica recommends that you increase the retention
period to 14 days. However, you might need to use a lower value if you have a high
volume of transactions or space constraints.
4. Verify that each source table in the distribution database has a primary key.
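The retention tip in step 3 and the primary key check in step 4 can be sketched in T-SQL as follows. This is an illustration under stated assumptions: the distribution database is assumed to have the default name distribution, the retention statements are assumed to run at the Distributor, and the last query is assumed to run in the database that holds the source tables.

```sql
-- Run at the Distributor. Assumption: the distribution database is named 'distribution'.
-- Show the current settings, including min_distretention and max_distretention (in hours).
EXEC sp_helpdistributiondb @database = N'distribution';

-- Raise the maximum transaction retention period to 14 days (14 * 24 = 336 hours).
EXEC sp_changedistributiondb
     @database = N'distribution',
     @property = N'max_distretention',
     @value    = 336;

-- Run in the database that holds the source tables:
-- list user tables that do not have a primary key.
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
ORDER BY s.name, t.name;
```

Any source table that appears in the last result set would need a primary key before it is registered for capture.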

Managing SQL Server CDC


You might need to stop CDC for source tables occasionally, for example, to change
the table definitions.
Disabling Publication of Change Data for a SQL Server Source
You can disable publication of change data for a SQL Server source. For example,
you might disable publication
to perform some database maintenance, change the table definition, or avoid
capturing unwanted changes.
Open the capture registration for the table, and change the Status setting from
Active to History.
This action disables publication of the SQL Server article for the table to the
distribution database, which
causes change capture to stop.
Warning: After the registration status is set to History, you cannot activate the
registration for CDC use again.
Changing a SQL Server Source Table Definition
If you change the definition of a SQL Server source table that is registered for
change data capture, use this
procedure to enable PowerExchange to use the updated table definition and
preserve access to previously
captured data. Table definition changes include adding, altering, or dropping
columns.
Tip: If you no longer need to capture change data from a column in a table, you can
remove the column from the
extraction map without changing the capture registration. Change data for that
column is still captured but is not
extracted.
To change a SQL Server source table definition:
1. Stop DELETE, INSERT, and UPDATE activity against the table.
2. Verify that any change data that was captured under the previous table definition
has completed extraction
processing. Then stop all workflows that extract change data for the table.
3. Delete the capture registration and extraction map.
4. Use DDL to change the table definition in SQL Server.
5. In the PowerExchange Navigator, create a new capture registration that reflects
the metadata changes and
set its status to Active. PowerExchange creates a corresponding extraction map.
The newly activated capture registration becomes eligible for change data capture.
6. If necessary, change the target table definition to reflect the source table
metadata changes.
7. In the PowerCenter Designer, import the altered source and target definitions.
Edit the mapping if necessary.
8. If necessary, rematerialize the target tables. After materialization completes,
create new restart tokens.
9. Create new restart tokens for the altered table.
10. Re-enable DELETE, INSERT, and UPDATE activity against the table.
11. Cold start the extraction workflows.
Changing the MULTIPUB Parameter Setting After Running Extractions
After you run change data extraction processing, you can change the MULTIPUB
parameter setting in the MSQL
CAPI_CONNECTION statement. You might need to do this task if you add or remove
publication databases that
include sources of CDC interest. To preserve data integrity, you must use the proper
procedure.
The MULTIPUB parameter indicates whether you extract data for articles in a single
publication database or in
multiple publications. For a single publication database, Informatica recommends
that you set MULTIPUB to N so
that PowerExchange can use more efficient extraction processing. For multiple
publications, you must set
MULTIPUB to Y, the default setting. This parameter applies to real time extractions
directly from the change
stream, and PowerExchange Logger for Linux, UNIX, and Windows extractions in
continuous extraction mode.
To switch the MULTIPUB setting from Y to N:
Use this procedure to switch the MULTIPUB parameter from the default of Y to N. If you
use the PowerExchange Logger for Linux, UNIX, and Windows, you must cold start it
after making this change.
1. Stop extraction workflows that process the SQL Server distribution database and
that are running in real-time
extraction mode or continuous extraction mode.
2. If you use the PowerExchange Logger for Linux, UNIX, and Windows, stop the
PowerExchange Logger.
3. In the dbmover configuration file, edit the MSQL CAPI_CONNECTION statement to
switch the MULTIPUB
parameter setting from Y to N.
4. Cold start the PowerExchange Logger.
5. Restart the extraction workflows.
Note: The sequence tokens no longer include a timestamp.
To switch the MULTIPUB setting from N to Y:
Use this procedure to switch the MULTIPUB parameter from N back to Y. If you use the
PowerExchange Logger for Linux, UNIX, and Windows, you do not need to cold start it
after making this change.
1. Stop DELETE, INSERT, and UPDATE activity on the SQL Server source tables.
2. Wait for the extraction workflows to reach the end of log and then stop them.
3. In the dbmover configuration file, edit the MSQL CAPI_CONNECTION statement to
switch the MULTIPUB parameter setting from N to Y.
4. To help avoid performance degradation, define the following index on the
distribution database:
USE [distribution]
GO
/****** Object: Index [IX_MSrepl_transactions] Script Date: 03/31/2012 11:56:07 ******/
CREATE NONCLUSTERED INDEX [IX_MSrepl_transactions] ON [dbo].[MSrepl_transactions]
(
[entry_time] ASC,
[publisher_database_id] ASC,
[xact_seqno] ASC,
[xact_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = ON, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
5. To get the current restart tokens for the end of log, use one of the following
methods:
Run the DTLUAPPL utility with the GENERATE RSTKKN option.
In the PowerExchange Navigator, perform a database row test with a SELECT
CURRENT_RESTART SQL
statement.
Specify the CURRENT_RESTART option on the RESTART1 and RESTART2 special
override statements
in the PWXPC restart token file. When the CDC session runs, PWXPC requests that
PowerExchange
provide restart tokens for the current EOL. PWXPC uses this restart information to
locate the extraction
start point.
6. Add the current restart tokens for the extractions to the restart token file.
7. Allow DELETE, INSERT, and UPDATE activity to resume on the SQL Server tables.
8. Cold start the extraction workflows.
Note: PowerExchange adds a timestamp in the sequence token to combine the data
from multiple publication
databases during extraction processing.
