
This is how the pioneers compressed their databases
SQL 2008 saw the introduction of data compression to SQL Server. (I'm referring specifically to data compression, not backup file compression.) This Enterprise-only feature allows SQL Server to store data on disk and in memory in a compressed format. SQL can perform three types of data compression - no compression, row compression, or page compression. I'm not going to go into the details of each method, but in a nutshell (and seriously over-simplified), row compression changes fixed-length fields into variable length fields and page compression includes row compression and also compresses data that is common across fields in all the rows on a page. The Books Online Entry can be found here.
How do you know if you can benefit from data compression? Microsoft provides a stored procedure that will give you an estimate of how much space you can save using the various compression methods: sp_estimate_data_compression_savings. Before you go enabling data compression, there are a few things to keep in mind.
- This is an Enterprise Edition (or Developer Edition) only feature. If you enable data compression in a database, you will only be able to restore that database to a SQL Server running Enterprise or Developer edition. Keep this in mind when designing your disaster recovery plan.
- Likewise, if you have any servers that will be accessing the compressed databases as a linked server, they will also need to be Enterprise or Developer editions.
- Compression comes with a price. SQL will store the data in memory and on disk in compresed form, so you gain performance by having fewer disk I/Os and being able to cache more data in memory. The flipside, however, is you'll need some extra CPU cycles to work with that data.
- SQL data compression is completely transparent to the user or application accessing the database.
If you've evaluated the pros and cons of data compression and decided it's something you want to implement, how do you go about doing it? SQL Server does not compress an entire database. Rather, it compresses tables and indexes (and indexed views and some other things - see the BOL link above for more details.) Enabling compression on an object requires rebuilding the object, much like an index rebuild. The TSQL commands are ALTER TABLE or ALTER INDEX, depending on which you are compressing. You include the DATA_COMPRESSION clause in the command.
"But Shaun," I hear you howling, "my database has hundreds of tables!!! Are you telling me I have to manually compress each one?" Unfortunately, yes. Furthermore, there is no way to tell SQL Server that all future objects should be compressed, so if your database has tables being created frequently, you'll need to look for uncompressed objects on a regular basis and compress them.
Luckily, compression can be performed in an online operation, meaning the objects are still available to users while the compression is happening. Unfortunately, if your table is a heap, the online compression will be single threaded, meaning it will be S-L-O-W. Multi-threaded compression for heaps only happens when SQL can have exclusive access to the table.
But back to that one-object-at-a-time issue. As all DBAs know, anything that has to be done one object at a time can be scripted and, thus, somewhat automated. I'm about to give you a script that will do this for you.
This script will created a stored procedure called db_compression. It is based on a script first written by Paul Nielsen several years ago. I've modified it a bit. The procedure takes three input paramters: @minCompression, @MaxRunTimeInMinutes, and @ExecuteCompressCommand. @minCompression is the minimum compression ratio threshold you want to hit before an object is compressed, expressed as a decimal. For example, if you only want to compress objects where you can achieve 25% compression, use a value of .25. @MaxRunTimeInMinutes is the maximum time the procecure will spend compressing objects. Note that this does NOT include the time it takes to perform the initial scan of all objects and calculate their estimated savings from compression. It is also not a hard and fast limit. The script only checks this time before each compress command is started, so it's possible a large table may make the procedure run longer than the limit entered here. Finally, the last option is a flag that tells the script if it should actually execute the compression commands or just print them out.
How it works
When first run, the script will create a list of all user tables and indexes in the database. (System tables cannot be compressed.) It writes this list to a table called dbEstimate. There is a flag for each record to indicate if the row has been processed or not and it also notes if the object currently has some sort of compression enabled. Once this table is populated, the script will execute the sp_estimate_data_compression_savings procedure against it. It will run this two times for each record - to get the estimated sizes for the two compression settings the object currently does not have. For example, if a table already has page compression enabled, the script will get estimated sizes for row compression and no compression. This portion of the script is not limited by the @MaxRunTimeInMinutes parameter.
The next portion of the script does the actual work. The dbEstimate table is looped through with a cursor. The cursor will only look at objects that contain data and where the compression ratio exceeds the value specified in the @minCompression parameter. If the @ExecuteCompressCommand parameter is 1, the script will execute a command to compress the object using whichever method gives the greatest compression ratio. If the @ExecuteCompressCommand is 0, it will not perform the compression and simply output the command instead. The output also includes a count of how many objects were compressed.
What happens if all the objects selected for compression do not get compressed before the allotted time is up? If you re-run the procedure, it will detect that the dbEstimate table already exists and, if there are still items than have not been processed, it will pick up compressing those, skipping the initial scan portion (which is often time consuming). If, however, all the items in the table have been processed, the script will wipe out the table and re-scan the entire database. This is useful if you have many tables and / or indexes being created.
If an object is already compressed, the script will not change the object from the existing compression method to the other method (i.e. from page to row or vice-versa), even if the other method may result in greater compression. (This might be something to look into for the next iteration of the script, but I'm not sure if this situation would occur often enough to make the effort worthwhile.)
The procedure is hard-coded to use the SORT_IN_TEMPDB option. It will also start compressing the smallest objects first. It does not compress using the ONLINE option, mainly due to the single-threaded limitation with heaps mentioned above. If you want this to run in online mode, add that option to lines 415 and 421 below. You can use this procedure to compress a large database over multiple days by running it for a couple hours overnight or during times of minimal usage.
Compressing the database will not automatically decrease the size of the data file on the disk. If you want to do that, you'll need to shrink the database (and perform the appropriate defragmentation process afterwards).
CREATE PROC [dbo].[db_compression]
(
@minCompression FLOAT = .25 /* e.g. .25 for minimum of 25% compression */
,@MaxRunTimeInMinutes INT = 60
,@ExecuteCompressCommand BIT = 0 /* 1 to execute command */
)
AS
SET nocount ON;
/*
Original script by Paul Nielsen www.SQLServerBible.com March 13, 2008
Modified by Shaun J. Stuart www.shaunjstuart.com February 27, 2013
Sets compression for all objects and indexs in the database needing adjustment
If estimated gain is equal to or greater than min compression parameter
then enables row or page compression, whichever gives greater space savings
If row and page have same gain then, enables row compression
If estimated gain is less than min compression parameter, then compression is set to none
- SJS 2/27/13
- Added @MaxRunTimeInMinutes to limit run length (checked afer each command, so
may run longer) Note: This timer only applies to the actual compression process.
It does not limit the time it takes to perform a scan of all the tables / indexes
and estimate compression savings.
- Sorted compression cursor to compress smallest tables first
- Added flag to execute compression command or not
- Added sort in tempdb option (always)
- Re-wrote slightly to persist initial scan results. Will only re-scan after all tables
have been processed
*/
DECLARE @CompressedCount INT;
SET @CompressedCount = 0;
DECLARE @StartTime DATETIME2;
SET @StartTime = CURRENT_TIMESTAMP;
CREATE TABLE #ObjEst
(
PK INT IDENTITY
NOT NULL
PRIMARY KEY
,object_name VARCHAR(250)
,schema_name VARCHAR(250)
,index_id INT
,partition_number INT
,size_with_current_compression_setting BIGINT
,size_with_requested_compression_setting BIGINT
,sample_size_with_current_compression_setting BIGINT
,sample_size_with_requested_compresison_setting BIGINT
);
IF NOT EXISTS ( SELECT 1
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[dbEstimate]')
AND type IN (N'U') )
BEGIN
CREATE TABLE dbEstimate
(
PK INT IDENTITY
NOT NULL
PRIMARY KEY
,schema_name VARCHAR(250)
,object_name VARCHAR(250)
,index_id INT
,ixName VARCHAR(255)
,ixType VARCHAR(50)
,partition_number INT
,data_compression_desc VARCHAR(50)
,None_Size INT
,Row_Size INT
,Page_Size INT
,AlreadyProcessed BIT
);
END
/*
If all objects have been processed, rescan and start fresh. Useful for catching
added objects since last scan and / or finding objects that were compressed
initially with one method but now would do better with another method. But beware -
this is I/O intensive and can take a while.
*/
IF NOT EXISTS ( SELECT 1
FROM dbEstimate
WHERE AlreadyProcessed = 0 )
BEGIN
DELETE FROM dbEstimate;
INSERT INTO dbEstimate
(schema_name
,object_name
,index_id
,ixName
,ixType
,partition_number
,data_compression_desc
,AlreadyProcessed)
SELECT S.name
,o.name
,I.index_id
,I.name
,I.type_desc
,P.partition_number
,P.data_compression_desc
,0 AS AlreadyProcessed
FROM sys.schemas AS S
JOIN sys.objects AS O ON S.schema_id = O.schema_id
JOIN sys.indexes AS I ON o.object_id = I.object_id
JOIN sys.partitions AS P ON I.object_id = P.object_id
AND I.index_id = p.index_id
WHERE O.TYPE = 'U';
-- Determine Compression Estimates
DECLARE @PK INT
,@Schema VARCHAR(150)
,@object VARCHAR(150)
,@DAD VARCHAR(25)
,@partNO INT
,@indexID INT
,@ixName VARCHAR(250)
,@SQL NVARCHAR(MAX)
,@ixType VARCHAR(50)
,@Recommended_Compression VARCHAR(10);
DECLARE cCompress CURSOR FAST_FORWARD
FOR
SELECT schema_name
,object_name
,index_id
,partition_number
,data_compression_desc
FROM dbEstimate
WHERE AlreadyProcessed = 0;
OPEN cCompress;
FETCH cCompress INTO @Schema, @object, @indexID, @partNO, @DAD;
WHILE @@Fetch_Status = 0
BEGIN
/* evaluate objects with no compression */
IF @DAD = 'none'
BEGIN
/* estimate Page compression */
INSERT #ObjEst
(object_name
,schema_name
,index_id
,partition_number
,size_with_current_compression_setting
,size_with_requested_compression_setting
,sample_size_with_current_compression_setting
,sample_size_with_requested_compresison_setting)
EXEC sp_estimate_data_compression_savings
@Schema_name = @Schema
,@object_name = @object
,@index_id = @indexID
,@partition_number = @partNO
,@data_compression = 'page';
UPDATE dbEstimate
SET none_size = O.size_with_current_compression_setting
,page_size = O.size_with_requested_compression_setting
FROM dbEstimate D
JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
AND D.Object_name = O.object_name
AND D.index_id = O.index_id
AND D.partition_number = O.partition_number;
DELETE #ObjEst;
-- estimate Row compression
INSERT #ObjEst
(object_name
,schema_name
,index_id
,partition_number
,size_with_current_compression_setting
,size_with_requested_compression_setting
,sample_size_with_current_compression_setting
,sample_size_with_requested_compresison_setting)
EXEC sp_estimate_data_compression_savings
@Schema_name = @Schema
,@object_name = @object
,@index_id = @indexID
,@partition_number = @partNO
,@data_compression = 'row';
UPDATE dbEstimate
SET row_size = O.size_with_requested_compression_setting
FROM dbEstimate D
JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
AND D.Object_name = O.object_name
AND D.index_id = O.index_id
AND D.partition_number = O.partition_number;
DELETE #ObjEst;
END /* end evaluating objects with no compression */
/* evaluate objects with row compression */
IF @DAD = 'row'
BEGIN
/* estimate Page compression */
INSERT #ObjEst
(object_name
,schema_name
,index_id
,partition_number
,size_with_current_compression_setting
,size_with_requested_compression_setting
,sample_size_with_current_compression_setting
,sample_size_with_requested_compresison_setting)
EXEC sp_estimate_data_compression_savings
@Schema_name = @Schema
,@object_name = @object
,@index_id = @indexID
,@partition_number = @partNO
,@data_compression = 'page';
UPDATE dbEstimate
SET row_size = O.size_with_current_compression_setting
,page_size = O.size_with_requested_compression_setting
FROM dbEstimate D
JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
AND D.Object_name = O.object_name
AND D.index_id = O.index_id
AND D.partition_number = O.partition_number;
DELETE #ObjEst;
/* estimate None compression */
INSERT #ObjEst
(object_name
,schema_name
,index_id
,partition_number
,size_with_current_compression_setting
,size_with_requested_compression_setting
,sample_size_with_current_compression_setting
,sample_size_with_requested_compresison_setting)
EXEC sp_estimate_data_compression_savings
@Schema_name = @Schema
,@object_name = @object
,@index_id = @indexID
,@partition_number = @partNO
,@data_compression = 'none';
UPDATE dbEstimate
SET none_size = O.size_with_requested_compression_setting
FROM dbEstimate D
JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
AND D.Object_name = O.object_name
AND D.index_id = O.index_id
AND D.partition_number = O.partition_number;
DELETE #ObjEst;
END /* end evaluating objects with row compression */
/* evalutate objects with page compression */
IF @DAD = 'page'
BEGIN
/* estimate Row compression */
INSERT #ObjEst
(object_name
,schema_name
,index_id
,partition_number
,size_with_current_compression_setting
,size_with_requested_compression_setting
,sample_size_with_current_compression_setting
,sample_size_with_requested_compresison_setting)
EXEC sp_estimate_data_compression_savings
@Schema_name = @Schema
,@object_name = @object
,@index_id = @indexID
,@partition_number = @partNO
,@data_compression = 'row';
UPDATE dbEstimate
SET page_size = O.size_with_current_compression_setting
,row_size = O.size_with_requested_compression_setting
FROM dbEstimate D
JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
AND D.Object_name = O.object_name
AND D.index_id = O.index_id
AND D.partition_number = O.partition_number;
DELETE #ObjEst;
/* estimate None compression */
INSERT #ObjEst
(object_name
,schema_name
,index_id
,partition_number
,size_with_current_compression_setting
,size_with_requested_compression_setting
,sample_size_with_current_compression_setting
,sample_size_with_requested_compresison_setting)
EXEC sp_estimate_data_compression_savings
@Schema_name = @Schema
,@object_name = @object
,@index_id = @indexID
,@partition_number = @partNO
,@data_compression = 'none';
UPDATE dbEstimate
SET none_size = O.size_with_requested_compression_setting
FROM dbEstimate D
JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
AND D.Object_name = O.object_name
AND D.index_id = O.index_id
AND D.partition_number = O.partition_number;
DELETE #ObjEst;
END /* end evaluating objects with page compression */
FETCH NEXT FROM cCompress INTO @Schema, @object,
@indexID, @partNO, @DAD
END
CLOSE cCompress
DEALLOCATE cCompress
PRINT 'Initial scan complete.'
END
/* End evaluating compression savings. Now do the actual compressing. */
PRINT 'Beginning compression.';
/* Do not process objects that do not meet our compression criteria */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Row_Size AS FLOAT) / None_Size)) < @minCompression
AND (Row_Size <= Page_Size) AND None_Size > 0;
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Page_Size AS FLOAT) / None_Size)) < @minCompression
AND (Page_Size <= Row_Size) AND None_Size > 0;
/* Do not set compression on empty objects */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE None_size = 0;
/* Do not process objects that are already using an acceptable compression method */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Page_Size AS FLOAT) / none_Size)) >= @minCompression
AND (Page_Size <= Row_Size) AND None_Size > 0
AND data_compression_desc = 'PAGE';
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Row_Size AS FLOAT) / None_Size)) >= @minCompression
AND (Row_Size <= Page_Size) AND None_Size > 0
AND data_compression_desc = 'ROW';
/* Ignore tables that have not had their sizes calculated. This can occur if the procedure
execution was cancelled before the initial scan completed. */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE None_Size IS NULL
OR Row_Size IS NULL
OR Page_Size IS NULL;
-- set the compression
DECLARE cCompress CURSOR FAST_FORWARD
FOR
SELECT schema_name
,object_name
,partition_number
,ixName
,ixType
,CASE WHEN (1 - (CAST(Row_Size AS FLOAT) / none_Size)) >= @minCompression
AND (Row_Size <= Page_Size) THEN 'Row' WHEN (1 - (CAST(page_Size AS FLOAT) / none_Size)) >= @minCompression
AND (Page_Size <= row_Size) THEN 'Page'
ELSE 'None'
END AS Recommended_Compression
,PK
FROM dbEstimate
WHERE None_Size <> 0
AND (CASE WHEN (1 - (CAST(Row_Size AS FLOAT)
/ none_Size)) >= @minCompression
AND (Row_Size <= Page_Size) THEN 'Row' WHEN (1 - (CAST(page_Size AS FLOAT) / none_Size)) >= @minCompression
AND (Page_Size <= row_Size) THEN 'Page'
ELSE 'None'
END <> data_compression_desc)
AND AlreadyProcessed = 0
ORDER BY None_Size ASC; /* start with smallest tables first */
OPEN cCompress
FETCH cCompress INTO @Schema, @object, @partNO, @ixName, @ixType,
@Recommended_Compression, @PK -- prime the cursor;
WHILE @@Fetch_Status = 0
BEGIN
IF @ixType = 'Clustered'
OR @ixType = 'heap'
SET @SQL = 'ALTER TABLE ' + @Schema + '.' + @object
+ ' Rebuild with (data_compression = '
+ @Recommended_Compression + ', SORT_IN_TEMPDB=ON)';
ELSE
SET @SQL = 'ALTER INDEX ' + @ixName + ' on ' + @Schema
+ '.' + @object
+ ' Rebuild with (data_compression = '
+ @Recommended_Compression + ',SORT_IN_TEMPDB=ON)';
IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
BEGIN
IF @ExecuteCompressCommand = 1
BEGIN
PRINT 'Compressing table/index: '
+ @Schema + '.' + @object;
EXEC sp_executesql
@SQL;
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE PK = @PK;
SET @CompressedCount = @CompressedCount + 1;
END
ELSE
BEGIN
PRINT 'Command execution not enabled. Command is:'
+ @SQL;
END
END
ELSE
BEGIN
PRINT 'Max runtime reached. Exiting...';
BREAK
END
FETCH cCompress INTO @Schema, @object, @partNO, @ixName,
@ixType, @Recommended_Compression, @PK;
END
CLOSE cCompress;
DEALLOCATE cCompress;
PRINT 'Objects compressed: ' + CONVERT(VARCHAR(10), @CompressedCount);
DROP TABLE #ObjEst;
RETURN
