How To List Table Row Counts And Index Statistics

On getting an ETL performance testing system of some sort up and running, one of the Socratic questions that arises at some point is:

“When we run a workload, how much data is there in the database and how does it affect the workload itself?”

OK, so not as Socratic as “What is justice?”, as we can actually measure this one.

The first step, of course, is to identify that data in the first place.
The snippets below provide a starting point for getting it out of SQL Server.

Row Counts Only

The starting point can be to just list row counts of all tables in the target database.
The short example below will do just that.

-- tables only, clustered or heap, nothing else
SELECT
	S.name As SchemaName,
	T.name As TableName,
	SUM (P.rows) AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
WHERE
	T.type = 'U' -- user tables only
	AND P.index_id IN (0, 1) -- HEAP and CLUSTERED only
GROUP BY
	S.name,
	T.name
ORDER BY
	SchemaName,
	TableName
;

Row Counts Per Partition

Next, it can be interesting to look at row counts per partition. This might be helpful if you’re running some crazy updates on specific partitions as part of your ETL process and you’d like to check whether it makes a significant difference to run those over sparsely populated partitions versus highly populated ones.

-- tables only, clustered or heap, counts by partition
SELECT
	S.name As SchemaName,
	T.name As TableName,
	P.partition_number AS PartitionNumber,
	P.rows AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
WHERE
	T.type = 'U' -- user tables only
	AND P.index_id IN (0, 1) -- HEAP and CLUSTERED only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber
;

Row Counts Per Partition & Index

It can also be interesting to see if ETL performance is affected by loads on top of tables with growing indexes, both in number and size.
This can go both ways, of course, but it also prepares for the next step.

-- tables, clustered or heap, plus indexes, counts by partition
SELECT
	S.name As SchemaName,
	T.name As TableName,
	P.partition_number AS PartitionNumber,
	P.index_id AS IndexId,
	I.name AS IndexName,
	I.type_desc AS IndexType,
	P.rows AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = P.object_id
		AND I.index_id = P.index_id
WHERE
	T.type = 'U' -- user tables only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber,
	IndexId
;

Row Counts Per Partition & Index, Plus Index Statistics

If we’re not yet content with the previous data, we can roll up our sleeves by digging into the index stats themselves.
Ask yourself: how many times have you seen a T-SQL-based ETL run incredibly slowly and then magically fix itself once the underlying tables were rebuilt? As in REBUILD, literally.
Fragmentation can have a big impact on any query, and ETL workloads are no exception.
The example below grabs basic fragmentation stats from all indexes so you can confirm or rule out fragmentation as a cause of slowness in your workload.
Be careful running this query on a production system during business hours: it can be quite heavy, especially if you change the mode argument of sys.dm_db_index_physical_stats to increase the scan accuracy.

-- tables, clustered or heap, plus indexes, and physical stats counts by partition
SELECT
	S.name As SchemaName,
	T.name As TableName,
	P.partition_number AS PartitionNumber,
	P.index_id AS IndexId,
	I.name AS IndexName,
	I.type_desc AS IndexType,
	P.rows AS Rows,
	PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
	PIS.index_depth AS IndexDepth,
	PIS.index_level AS IndexLevel,
	PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
	PIS.fragment_count AS IndexFragmentCount,
	PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
	PIS.page_count AS IndexPageCount
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = P.object_id
		AND I.index_id = P.index_id
	CROSS APPLY sys.dm_db_index_physical_stats(
		/* database_id */ DB_ID(),
		/* object_id */ T.object_id,
		/* index_id */ I.index_id,
		/* partition_number */ P.partition_number,
		/* mode */ NULL) AS PIS
WHERE
	T.type = 'U' -- user tables only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber,
	IndexId
;
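
If you do want more accurate numbers, the only change needed is the last argument of sys.dm_db_index_physical_stats: NULL (or 'LIMITED') does the lightest scan, 'SAMPLED' reads a sample of the pages, and 'DETAILED' reads everything. A minimal sketch of just that part of the query, with everything else staying the same:

	CROSS APPLY sys.dm_db_index_physical_stats(
		/* database_id */ DB_ID(),
		/* object_id */ T.object_id,
		/* index_id */ I.index_id,
		/* partition_number */ P.partition_number,
		/* mode */ 'SAMPLED') AS PIS -- or 'DETAILED' for a full scan

The heavier the mode, the longer it runs and the more I/O it generates, so keep that in mind on big tables.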

All For All

At some point you might also need to grab a snapshot of some or all the databases in your server for workload documentation purposes.
Doing this on all databases one by one can be time-consuming.
The example below lets you skip the RBAR pain and grab the same stats as the previous example, but for all the databases you need at once.
Remember to add a WHERE+LIKE filter on the first block to select only the database name patterns you want.
Again, be wary of running this on a production system during business hours.

-- list databases to loop
DECLARE @Databases TABLE
(
	DatabaseKey INT IDENTITY (1, 1) NOT NULL PRIMARY KEY CLUSTERED,
	DatabaseName SYSNAME NOT NULL
)
INSERT INTO @Databases (DatabaseName)
SELECT name FROM sys.databases;

-- this will hold the final result
IF OBJECT_ID('tempdb.dbo.#Data') IS NOT NULL
DROP TABLE #Data;

CREATE TABLE #Data
(
	DatabaseName SYSNAME NOT NULL,
	SchemaName SYSNAME NOT NULL,
	TableName SYSNAME NOT NULL,
	PartitionNumber INT NOT NULL,
	IndexId INT NOT NULL,
	IndexName SYSNAME NULL,
	IndexType SYSNAME NOT NULL,
	Rows BIGINT NOT NULL,
	IndexAllocationUnitType SYSNAME NULL,
	IndexDepth TINYINT NULL,
	IndexLevel TINYINT NULL,
	IndexAverageFragmentationPercent FLOAT NULL,
	IndexFragmentCount BIGINT NULL,
	IndexAverageFragmentSizeInPages FLOAT NULL,
	IndexPageCount BIGINT NULL,

	PRIMARY KEY CLUSTERED
	(
		DatabaseName,
		SchemaName,
		TableName,
		PartitionNumber,
		IndexId
	)
);

DECLARE @DatabaseKey INT = 1;
DECLARE @MaxDatabaseKey INT = (SELECT MAX(DatabaseKey) FROM @Databases);
WHILE @DatabaseKey <= @MaxDatabaseKey
BEGIN

	DECLARE @DatabaseName SYSNAME =
	(
		SELECT
			DatabaseName
		FROM
			@Databases
		WHERE
			DatabaseKey = @DatabaseKey
	);

	DECLARE @SQL NVARCHAR(MAX) = '
		INSERT INTO #Data
		(
			DatabaseName,
			SchemaName,
			TableName,
			PartitionNumber,
			IndexId,
			IndexName,
			IndexType,
			Rows,
			IndexAllocationUnitType,
			IndexDepth,
			IndexLevel,
			IndexAverageFragmentationPercent,
			IndexFragmentCount,
			IndexAverageFragmentSizeInPages,
			IndexPageCount
		)
		SELECT
			''' + @DatabaseName + ''' AS DatabaseName,
			S.name As SchemaName,
			T.name As TableName,
			P.partition_number AS PartitionNumber,
			P.index_id AS IndexId,
			I.name AS IndexName,
			I.type_desc AS IndexType,
			P.rows AS Rows,
			PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
			PIS.index_depth AS IndexDepth,
			PIS.index_level AS IndexLevel,
			PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
			PIS.fragment_count AS IndexFragmentCount,
			PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
			PIS.page_count AS IndexPageCount
		FROM
			' + QUOTENAME(@DatabaseName) + '.sys.tables AS T
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.schemas AS S
				ON S.schema_id = T.schema_id
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.partitions AS P
				ON P.object_id = T.object_id
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.indexes AS I
				ON I.object_id = P.object_id
				AND I.index_id = P.index_id
			CROSS APPLY sys.dm_db_index_physical_stats(
				/* database_id */ DB_ID(''' + @DatabaseName + '''),
				/* object_id */ T.object_id,
				/* index_id */ I.index_id,
				/* partition_number */ P.partition_number,
				/* mode */ NULL) AS PIS
		WHERE
			T.type = ''U'' -- user tables only
		ORDER BY
			SchemaName,
			TableName,
			PartitionNumber,
			IndexId
		;
		';

	EXECUTE (@SQL);

	SET @DatabaseKey += 1;

END

SELECT * FROM #Data;
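
As mentioned above, the insert into @Databases is the place to filter. For example, to pick up only databases whose names start with a given prefix (the 'DW%' pattern below is just an illustration, adjust it to your own naming convention):

INSERT INTO @Databases (DatabaseName)
SELECT name FROM sys.databases
WHERE name LIKE 'DW%';

You may also want to exclude the system databases while you’re at it, e.g. by adding AND database_id > 4.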

How To Extract Serialized Data From String In One Go In T-SQL

Consider the following data:

DECLARE @OrderString VARCHAR(MAX) = '1=1;|2=|3=|4=1;|5=|';

This data was being returned in an externally generated XML file, so out of the control of the developer.

Each element of information is delimited by a pipe (|).
Within each element, the first number, to the left of the equal sign (=), represented an “Item Number”.
The second number, to the right of the equal sign, represented an “Order Quantity”.
This could be any number, and if empty, it meant zero.

The developer needed to return this data in the form of a table.
He also needed to return only the rows with an Order Quantity greater than zero.

In T-SQL, this conversion can be done in one go, using a single query.
Let’s go through the steps.

Remove Unnecessary Characters

First of all, if we take a look at the source data, we can clearly see an oddity there.
The semi-colon character appears only after a declared Order Quantity, and only when there is one.
This is extra information we don’t need, as we already have each block delimited by the pipe character.
So let’s take this out of the equation:

WITH Step0 AS
(
	SELECT
		String = REPLACE(@OrderString, ';', '')
),

Identify Elements

Now, to make our lives easier, let’s split each element, or “block”, into a separate row.
This can be accomplished with a good old recursive CTE:

Step1 AS
(
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step0

	UNION ALL
	
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step1
	WHERE
		LEN(String) > 0
),

Identify Columns

Now that we have a row identified for each block, we can then extract the columns themselves:

Step2 AS
(
	SELECT
		ItemNumber = SUBSTRING(Block, 1, CHARINDEX('=', Block) - 1),
		ItemValue = SUBSTRING(Block, CHARINDEX('=', Block) + 1, LEN(Block))
	FROM
		Step1
),

Convert & Filter

Finally, now with proper columns in place, we can filter out the zero-quantity items. Conveniently, casting an empty string to INT yields 0, which matches the “empty means zero” rule:

Step3 AS
(
	SELECT
		ItemNumber = CAST(ItemNumber AS INT),
		ItemValue = CAST(ItemValue AS INT)
	FROM
		Step2
	WHERE
		CAST(ItemValue AS INT) > 0
)

SELECT
	*
FROM
	Step3;

And that was it. Nothing to it.
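
For the sample @OrderString above, items 2, 3 and 5 carry empty (zero) quantities and get filtered out, so the query returns:

ItemNumber  ItemValue
----------  ---------
1           1
4           1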

Full Code

DECLARE @OrderString VARCHAR(MAX) = '1=1;|2=|3=|4=1;|5=|';

WITH Step0 AS
(
	SELECT
		String = REPLACE(@OrderString, ';', '')
),
Step1 AS
(
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step0

	UNION ALL
	
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step1
	WHERE
		LEN(String) > 0
),

Step2 AS
(
	SELECT
		ItemNumber = SUBSTRING(Block, 1, CHARINDEX('=', Block) - 1),
		ItemValue = SUBSTRING(Block, CHARINDEX('=', Block) + 1, LEN(Block))
	FROM
		Step1
),

Step3 AS
(
	SELECT
		ItemNumber = CAST(ItemNumber AS INT),
		ItemValue = CAST(ItemValue AS INT)
	FROM
		Step2
	WHERE
		CAST(ItemValue AS INT) > 0
)

SELECT
	*
FROM
	Step3;

8 DBCC commands you’ll want to know

DBCC commands can be really useful while debugging your SQL Server. Below I’ve listed 8 commands you’ll want to keep in your pocket.

DBCC DROPCLEANBUFFERS

DBCC DROPCLEANBUFFERS

Amazingly, does what it says on the tin. Drops all clean pages from the buffer pool, forcing subsequent queries to read from disk again. Great for performance testing. Try not to do this on a production system unless you enjoy answering support calls from annoyed users.
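
Worth noting: only clean pages get dropped, so if you want a truly cold buffer pool for a test run, issue a CHECKPOINT first to flush any dirty pages to disk:

CHECKPOINT;
DBCC DROPCLEANBUFFERS;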

DBCC SHRINKDATABASE

DBCC SHRINKDATABASE (N'MyDatabase', 10 /* = Percent To Leave Free */);

Also does what it says on the tin. This will shrink all files in the target database (plus the log) and release unused space back to the operating system. It works by blindly moving pages from the end of the file into the free space at the beginning of the file, without any consideration for the consolidation of your data. Great if you enjoy increasing fragmentation like a maniac, or if you let that development database grow a wee bit too much.

If you do have to run it, consider DBCC SHRINKFILE instead and do it one file at a time, so you don’t have to come back the next day.
It might also be a good idea to rebuild your indexes afterwards, to get those pages back in shape again.
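
A full re-index of the whole database is usually scripted out, but for a single table it comes down to the statement below (dbo.SomeTable is just a placeholder here):

ALTER INDEX ALL ON dbo.SomeTable REBUILD;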

DBCC SHRINKFILE

DBCC SHRINKFILE ('SomeFile', 10 /* = Target File Size In MB */)

Same as DBCC SHRINKDATABASE but for a specific file. Same warnings apply. Best to use this one file at a time. But then again, best to pre-allocate your database files to begin with, if you’re able to.

DBCC FREEPROCCACHE

DBCC FREEPROCCACHE

Dumps all the cached query plans. Useful for progressive load testing when paired with DBCC DROPCLEANBUFFERS. Also useful if SQL Server has gone bonkers and cached a really nasty plan for some query. If that does happen, check whether your statistics are up to date, as the Query Optimizer usually knows best – at least with the information it has.
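
If only one plan has gone bad, you don’t have to nuke the whole cache: DBCC FREEPROCCACHE also accepts a plan handle, which you can dig out of the plan cache DMVs. A rough sketch (the LIKE filter text is just an illustration):

SELECT CP.plan_handle, ST.text
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text(CP.plan_handle) AS ST
WHERE ST.text LIKE '%MyNastyQuery%';

DBCC FREEPROCCACHE (<plan_handle>) -- pass a handle returned by the query above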

DBCC CHECKDB

DBCC CHECKDB /* runs on the current database */
DBCC CHECKDB(MyDatabase) /* runs on the target database */
DBCC CHECKDB(MyDatabase, REPAIR_REBUILD) /* tries to repair the database, if only recoverable corruption has occurred */
DBCC CHECKDB(MyDatabase, REPAIR_ALLOW_DATA_LOSS) /* tries to repair the database, even if data loss occurs */

Checks database consistency and reports corruption.
Errors get returned in nice red messages where you can find “etc etc etc Page (1:789) etc etc etc”.
The 1 refers to the number of the file containing the fault.
The 789 is the faulty page number in the referred file.
Recovering from this feels like theoretical physics from what I’ve seen out there, and seems to range from letting CHECKDB run its course, to selectively retrieving rows from specific backups.
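
For routine checks on big databases, where a full logical check during business hours is too much, a lighter pass is a common compromise (these are standard CHECKDB options):

DBCC CHECKDB (MyDatabase) WITH PHYSICAL_ONLY, NO_INFOMSGS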

DBCC SQLPERF

Shows what your log file usage looks like. Useful to see if some database is getting ready to blow up due to lack of log space, especially if you have a fixed log file size.

DBCC SQLPERF(LOGSPACE)

DBCC INPUTBUFFER

Now this will work magic. If you give it a SPID, it will show you the last query run by that SPID.
A great quick-and-dirty way to figure out what query is blocking some other process.

DBCC INPUTBUFFER(<SPID>)

To look up a list of running processes, you can, for example, run:

SELECT * FROM sys.sysprocesses
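
And since blocking is usually the point here, you can narrow that list down to sessions that are actually blocked, plus the SPID doing the blocking (columns as exposed by sys.sysprocesses), and then feed that blocking SPID into DBCC INPUTBUFFER:

SELECT spid, blocked, waitresource, lastwaittype, cmd, loginame
FROM sys.sysprocesses
WHERE blocked <> 0;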

DBCC HELP

For a refresher on what DBCC can do, DBCC HELP provides usage information for specific DBCC commands, plus a list of the available commands to start with… the ones that have help available, at least.

For a list of stuff just run:

DBCC HELP ('?')

To get help on say, INPUTBUFFER, just run:

DBCC HELP ('INPUTBUFFER')

How To Easily Loop And Index In PowerShell

Just learned these bits today while trying to answer something and thought I’d share them.

Consider these two arrays:

$x = @("a","b","c","d");
$y = @("apple","banana","carrot","dinner");

What I want to see as output is:

a is for apple
b is for banana
c is for carrot
d is for dinner

There are a couple of ways of going about this and then some.

You can use a regular for loop:

for ($i = 0; $i -Lt $x.Count; ++$i)
{
    "$($x[$i]) is for $($y[$i])";
}

You can go with a regular foreach loop too this way, if the values in $x are distinct – we’ll come back to that in a second:

foreach ($v in $x)
{
    "$v is for $($y[$x.IndexOf($v)])"
}

You can shorten it by using the ForEach-Object cmdlet:

$x | ForEach-Object { "$_ is for $($y[$x.IndexOf($_)])" };

You can shorten ForEach-Object further with its % alias:

$x | % { "$_ is for $($y[$x.IndexOf($_)])" };

And finally, you can fix the issue of indexing duplicates by adding an initializer (begin) block to ForEach-Object and keeping your own counter, just like in a for loop:

$x | % { $i = 0 } { "$_ is for $($y[$i])"; ++$i };

You learn something new every day.




How To Create Environment-Aware DacPac Deployment Scripts In PowerShell

I was made “aware” of the idea of environment-aware scripts by a colleague of mine a while ago on a project.

Listening to the principles, I wondered why I hadn’t thought about this myself before. It sounded so darn basic, yet so masterfully effective. I guess the more I learn about stuff, the more I realize I don’t know anything at all. But that’s a good thing, right? Surely realizing one does not know stuff helps one keep an open mind? But anyway, less philosophy, more coding.

An environment-aware script is a script that is aware of its environment.
Ah, do I hear reflexive definition alarm bells ringing? Or is it my imagination?

No, it is actually that simple.
As an example, an environment-aware script may:

  1. Figure out where it is being run. The machine name is a likely candidate in a corporate environment, as we usually know them beforehand.
  2. Decide whether the machine name is an expected target for the script being run.
  3. If so, correctly set itself up for the particularities of that environment and go on to do its thing.
  4. If not, gracefully (well, with a human-readable error) reply “Computer Says No. I don’t know what this machine is and therefore I’m not prepared to go ahead.”.

Environment-aware scripts or processes are all over the place. Take a Windows installation: it’s aware of the properties of the machine it is being installed on. It will tailor the installation of its binaries depending on your hardware. Machines with Intel CPUs will get one set of drivers while AMD machines will get another, but it will still work. However, try to install an x64 release on an x86 system and see how that goes. Computer says no. But it says “no” and tells you why; it doesn’t just error out randomly.

This is so ubiquitous, I really wondered why I hadn’t thought about this earlier.
Kudos to my colleague for reminding me of the basics.

Below is a very basic example of such a script in PowerShell.
It will figure out where it is being run, pick the appropriate configuration and run a deployment of the configured DacPac on the current machine.
If the machine is not recognized it will gracefully fail.
Well, in this case, it will throw a very red error, so it can get trapped by its caller.

Essentially, you can tag along a script like this with your other deployment files to make your life easier.

<# 
.SYNOPSIS 
    Sample script for an environment aware SSDT deployment.
.DESCRIPTION 
.NOTES     
.LINK
.EXAMPLE
    Deploy -Action Test
.EXAMPLE 
    Deploy -Action Publish
.INPUTS 
.OUTPUTS 
.PARAMETER Action
    Test generates a deploy report and script only; Publish deploys outright.
#>

Param
(
    [Parameter(Mandatory = $True)]
    [ValidateSet("Test", "Publish")]
    [string] $Action
)

# configure the supported environments here
# add a hash table block for each environment
# the script will refuse to work in any other environment
$Environments =
    @{
        Name = "ENV1";
        ServerName = "MACHINE1";
        PackageFile = "MyDatabase.dacpac";
        PublishProfile = "MyDatabase.ENV1.publish.xml";
        DeployReportFile = "MyDatabase.ENV1.DeployReport.xml";
        ScriptFile = "MyDatabase.ENV1.Script.sql";
        SqlPackage = "${Env:ProgramFiles(x86)}Microsoft SQL Server110DACbinSqlPackage.exe";
    },
    @{
        Name = "ENV2";
        ServerName = "MACHINE2";
        PackageFile = "MyDatabase.dacpac";
        PublishProfile = "MyDatabase.ENV2.publish.xml";
        DeployReportFile = "MyDatabase.ENV2.DeployReport.xml";
        ScriptFile = "MyDatabase.ENV2.Script.sql";
        SqlPackage = "${Env:ProgramFiles(x86)}Microsoft SQL Server110DACbinSqlPackage.exe";
    };

# detect the environment this script is being run on
$ServerName = $Env:COMPUTERNAME

foreach ($Environment in $Environments)
{
    if ($Environment.ServerName -Eq $ServerName)
    {
        Write-Host "$($Environment.Name) Environment Identified.";
        
        # locate SqlPackage
        Write-Host "Locating SqlPackage...";
        $SqlPackage = $Environment.SqlPackage;

        # ensure SqlPackage exists
        if (Test-Path $SqlPackage)
        {
            Write-Host "Using SqlPackage @ $SqlPackage";
        }
        else
        {
            throw "SqlPackage not found!";
        }
        
        if ($Action -Eq "Test")
        {
            Write-Host "Running DeployReport Action...";
            &$SqlPackage "/Action:DeployReport", "/SourceFile:$($Environment.PackageFile)", "/Profile:$($Environment.PublishProfile)", "/OutputPath:$($Environment.DeployReportFile)";
            
            Write-Host "Running Script Action...";
            &$SqlPackage "/Action:Script", "/SourceFile:$($Environment.PackageFile)", "/Profile:$($Environment.PublishProfile)", "/OutputPath:$($Environment.ScriptFile)";
        }
        elseif ($Action -Eq "Publish")
        {
            Write-Host "Running Publish Action...";
            &$SqlPackage "/Action:Publish", "/SourceFile:$($Environment.PackageFile)", "/Profile:$($Environment.PublishProfile)";
        }
        else
        {
            throw "Action $Action is not supported";
        }
        
        Write-Host "Action Completed.";
        Write-Host "Exiting...";
        
        return;
    }
}

throw "Deployment is not supported on this machine ($ServerName)";

If you save this script as, say, DeployStuff.ps1, you’ll have two ways of running this example:

In Test mode, it will compare the DacPac against the database and output the XML DeployReport and the SQL Publish Script into the same folder.
You can check these to ensure the deployment will go the way you want it to go, and no unexpected wizardry will happen.

.\DeployStuff.ps1 -Action Test

In Publish mode, it will publish the DacPac outright, no more questions asked.

.\DeployStuff.ps1 -Action Publish

Output of the entire process goes to the caller (e.g. the PowerShell window here), so remember to adjust this to what best fits your needs.




How To Access Environment Variables In PowerShell

Because I can never seem to remember this stuff.

Basics

Magical Environment Variable:

$Env

Doesn’t do much on its own though.

List all available properties:

Get-ChildItem Env:

Note to self: Remember to put that blasted colon at the end, and it will save you hours of figuring out why it lists your current folders instead.

Notable Properties

Get the local computer name:

$Env:COMPUTERNAME

Get the processor attributes:

$Env:NUMBER_OF_PROCESSORS
$Env:PROCESSOR_ARCHITECTURE
$Env:PROCESSOR_IDENTIFIER
$Env:PROCESSOR_LEVEL
$Env:PROCESSOR_REVISION

Get the Operating System type:

$Env:OS

Get the executable search path (just like PATH in DOS):

$Env:PATH

Get the TEMP path:

$Env:TEMP

Get user related information:

$Env:USERDNSDOMAIN
$Env:USERDOMAIN
$Env:USERNAME
$Env:USERPROFILE

Get the program files folders:

${Env:ProgramFiles}
${Env:ProgramFiles(x86)}



How To Estimate The Compression Ratio For All Tables In A Database

Wondering how much space you could save by compressing all those disk-eating tables?

This snippet will look at all the tables in your database and estimate how much you would save by applying ROW or PAGE compression on them.
There is nothing special about it; it just runs sp_estimate_data_compression_savings on all your user tables and organizes the results.

The Code

SET NOCOUNT ON;

DECLARE @Objects TABLE
(
	ItemKey INT IDENTITY (1, 1) PRIMARY KEY CLUSTERED,
	SchemaName SYSNAME NOT NULL,
	ObjectName SYSNAME NOT NULL,
	IndexID INT NOT NULL,
	PartitionNumber INT NOT NULL,
	CurrentCompressionType SYSNAME NOT NULL,
	EstimatedSizeWithNoCompressionKB INT NULL,
	EstimatedSizeWithRowCompressionKB INT NULL,
	EstimatedSizeWithPageCompressionKB INT NULL,
	EstimatedCompressionRatioRowVsNone AS 100 - ((EstimatedSizeWithRowCompressionKB * 100) / (NULLIF(EstimatedSizeWithNoCompressionKB, 0))),
	EstimatedCompressionRatioPageVsNone AS 100 - ((EstimatedSizeWithPageCompressionKB * 100) / (NULLIF(EstimatedSizeWithNoCompressionKB, 0)))
);

INSERT INTO @Objects
(
	SchemaName,
	ObjectName,
	IndexID,
	PartitionNumber,
	CurrentCompressionType
)
SELECT
	S.name,
	T.name,
	I.index_id,
	P.partition_number,
	P.data_compression_desc
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = T.object_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
		AND P.index_id = I.index_id
WHERE
	T.type = 'U'
ORDER BY
	S.name,
	T.name,
	I.index_id,
	P.partition_number
;

DECLARE @Results TABLE
(
	ObjectName SYSNAME NOT NULL,
	SchemaName SYSNAME NOT NULL,
	IndexID INT NOT NULL,
	PartitionNumber INT NOT NULL,
	SizeWithCurrentCompression INT NOT NULL,
	SizeWithRequestedCompression INT NOT NULL,
	SampleSizeWithCurrentCompression INT NOT NULL,
	SampleSizeWithRequestedCompression INT NOT NULL
);

DECLARE @ItemKey INT = 1;
DECLARE @LastKey INT = (SELECT MAX(ItemKey) FROM @Objects);
WHILE @ItemKey <= @LastKey
BEGIN

	DECLARE @SchemaName SYSNAME, @ObjectName SYSNAME, @IndexID INT, @PartitionNumber INT;
	SELECT
		@SchemaName = SchemaName,
		@ObjectName = ObjectName,
		@IndexID = IndexID,
		@PartitionNumber = PartitionNumber
	FROM
		@Objects
	WHERE
		ItemKey = @ItemKey;



	DELETE FROM @Results;
	INSERT INTO @Results
	EXECUTE sys.sp_estimate_data_compression_savings
		@schema_name = @SchemaName,
		@object_name = @ObjectName,
		@index_id = @IndexID,
		@partition_number = @PartitionNumber,
		@data_compression = 'NONE';
	UPDATE O
	SET
		O.EstimatedSizeWithNoCompressionKB = R.SizeWithRequestedCompression
	FROM
		@Objects AS O
		CROSS JOIN @Results AS R
	WHERE
		O.ItemKey = @ItemKey
		

	DELETE FROM @Results;
	INSERT INTO @Results
	EXECUTE sys.sp_estimate_data_compression_savings
		@schema_name = @SchemaName,
		@object_name = @ObjectName,
		@index_id = @IndexID,
		@partition_number = @PartitionNumber,
		@data_compression = 'ROW';
	UPDATE O
	SET
		O.EstimatedSizeWithRowCompressionKB = R.SizeWithRequestedCompression
	FROM
		@Objects AS O
		CROSS JOIN @Results AS R
	WHERE
		O.ItemKey = @ItemKey



	DELETE FROM @Results;
	INSERT INTO @Results
	EXECUTE sys.sp_estimate_data_compression_savings
		@schema_name = @SchemaName,
		@object_name = @ObjectName,
		@index_id = @IndexID,
		@partition_number = @PartitionNumber,
		@data_compression = 'PAGE';
	UPDATE O
	SET
		O.EstimatedSizeWithPageCompressionKB = R.SizeWithRequestedCompression
	FROM
		@Objects AS O
		CROSS JOIN @Results AS R
	WHERE
		O.ItemKey = @ItemKey

	SET @ItemKey += 1;
END

SELECT
	ItemKey,
	SchemaName,
	ObjectName,
	IndexID,
	PartitionNumber,
	CurrentCompressionType,
	EstimatedSizeWithNoCompressionKB,
	EstimatedSizeWithRowCompressionKB,
	EstimatedSizeWithPageCompressionKB,
	EstimatedCompressionRatioRowVsNone,
	EstimatedCompressionRatioPageVsNone
FROM
	@Objects
ORDER BY
	ItemKey
;

Thoughts

Remember to evaluate CPU usage when choosing a compression type, and be wary of it if you’re doing big loads into a table in an ETL process of some sort.
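
Once you’ve picked a winner for a given table or index, actually applying it is a one-liner rebuild; just bear in mind the rebuild itself is an expensive operation (the object names below are placeholders):

ALTER TABLE dbo.SomeTable REBUILD WITH (DATA_COMPRESSION = PAGE);
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD WITH (DATA_COMPRESSION = ROW);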

How To Mock Database Objects For (Almost) Isolated Unit Testing In SSDT Database Projects

While writing a Stored Procedure that I wanted to properly unit test in SSDT, I faced this annoying little issue:

  • My Stored Procedure, let’s call it MainProcedure, had to call another Stored Procedure, let’s call it SubProcedure.
  • The SubProcedure had some dodgy and time-consuming side-effects that I didn’t want to see happen during a Unit Test. Plus, the SubProcedure was not the one being tested anyway.
  • Considering this, I wanted to isolate the code in MainProcedure using a Dependency Injection / Inversion Of Control pattern somehow.
  • Unfortunately, this is not supported out-of-the-box in SSDT, plus to be fair, IoC is not something the T-SQL language lends itself to easily at all: You can’t just pass a type-safe reference to a Stored Procedure in your parameters.

After some head scratching and searching the web for what wasn’t there, I arrived at a couple of options:

  • Don’t do anything and live with it. It’s always an option.
  • Mimic an IoC pattern by passing in a name for a dependent stored procedure… And then end up writing loads of dynamic SQL, thereby undermining the point of using SSDT in the first place.
  • Drop the SubProcedure before running the test, create a fake and then issue the original CREATE statement again… And end up with loads of code copied all over the place… And what if I wanted to mock tables?
  • Or swap the underlying SubProcedure before the test with a stub/fake and then swap it back after completion. Definitely cheating… Definitely Not IoC… But I wondered if it worked…

So let’s see what happened.

Main Procedure

First of all, I created some sample code to represent the MainProcedure.
Notice how this sample calls the SubProcedure, passing through its received parameters and returning its return value. This will help us evaluate whether the correct code is being called later on.

CREATE PROCEDURE [dbo].[MainProcedure]
	@Param1 INT,
	@Param2 INT
AS

PRINT 'Running MainProcedure...';

PRINT 'Calling SubProcedure...';

DECLARE @ReturnValue INT;
EXECUTE @ReturnValue = dbo.SubProcedure
	@Param1 = @Param1,
	@Param2 = @Param2;

PRINT 'Done Calling SubProcedure...';

PRINT 'Done Running MainProcedure...';

RETURN @ReturnValue;

Sub Procedure

Then I created some sample code for the SubProcedure.
The WAITFOR is just there to make it really obvious when the sluggish procedure is being called; you can take it out if you wish.

CREATE PROCEDURE [dbo].[SubProcedure]
	@Param1 INT,
	@Param2 INT
AS

	PRINT 'Running SubProcedure...';

	/* do something really heavy here */
	WAITFOR DELAY '00:00:05';
	/* done */

	PRINT 'Done Running SubProcedure...';

RETURN 0

Test

Now it was time to see if I could swap the SubProcedure with a mock.
I created an SSDT Unit Test for the MainProcedure, and then went on to write some testing code.

Below is what I put in the Test code block, to start with.
I also added a Scalar Value Condition to the Test, to verify that @ResultValue would come back as 30. I would then make the Mock SubProcedure return this, as opposed to the simple zero of the original code. This would provide evidence that the mock code was being called.

SET XACT_ABORT ON
BEGIN TRANSACTION

BEGIN TRY

	-- ARRANGE
	DECLARE @Param1 INT = 10;
	DECLARE @Param2 INT = 20;
	DECLARE @ResultValue INT;

	-- ACT
	EXECUTE @ResultValue = dbo.MainProcedure
		@Param1 = @Param1,
		@Param2 = @Param2;

	-- ASSERT
	SELECT @ResultValue;

END TRY
BEGIN CATCH

	ROLLBACK;
	THROW;

END CATCH;

COMMIT;

Pre-Test

So now it was time for some good old cheating.
I stuffed the code below in the Pre-Test phase of this Unit Test.
This creates a Mock at run time, just like you would do in plain .NET Unit Tests, and then swaps it with the original code, whilst backing it up too.
Notice we have to resort to using Dynamic SQL here (sigh…) because we can’t include multiple clean batches in a SQL Server Unit Test (i.e. no GO for you).

SET XACT_ABORT ON
BEGIN TRANSACTION

BEGIN TRY

	-- create a mock object
	EXECUTE sp_executesql N'
	CREATE PROCEDURE dbo.SubProcedure_Mock
		@Param1 INT,
		@Param2 INT
	AS
		PRINT ''Mock SubProcedure Called With { @Param1 = '' + CAST(@Param1 AS VARCHAR(10)) + '', @Param2 = '' + CAST(@Param2 AS VARCHAR(10)) + '' }'';
		RETURN @Param1 + @Param2;
	';

	-- swap out the original object
	EXECUTE sp_rename 'SubProcedure', 'SubProcedure_Original';

	-- swap in the mock object
	EXECUTE sp_rename 'SubProcedure_Mock', 'SubProcedure';

END TRY
BEGIN CATCH

	ROLLBACK;
	THROW;

END CATCH

COMMIT;

Post-Test

To undo the blatant cheating of the Pre-Test phase, I also added the code below to the Post-Test phase.
This swaps in the original code and then drops the mocked procedure that was created earlier.
I couldn’t help but keep recalling a known issue with the Post-Test phase though: it wouldn’t run if the Unit Test code failed in the first place.
But I would get back to that in a minute.

SET XACT_ABORT ON
BEGIN TRANSACTION

BEGIN TRY

	-- swap out the mock object
	EXECUTE sp_rename 'SubProcedure', 'SubProcedure_Mock';

	-- swap in the original object
	EXECUTE sp_rename 'SubProcedure_Original', 'SubProcedure';

	-- remove the mock object
	EXECUTE sp_executesql N'DROP PROCEDURE dbo.SubProcedure_Mock';

END TRY
BEGIN CATCH

	ROLLBACK;
	THROW;

END CATCH;

COMMIT;

Results

This actually worked better than expected.
When I ran the MainProcedure on its own in SSMS, like so:

-- ARRANGE
DECLARE @Param1 INT = 10;
DECLARE @Param2 INT = 20;
DECLARE @ResultValue INT;

-- ACT
EXECUTE @ResultValue = dbo.MainProcedure
	@Param1 = @Param1,
	@Param2 = @Param2;

I got the results I was expecting:

Running MainProcedure...
Calling SubProcedure...
Running SubProcedure...
Done Running SubProcedure...
Done Calling SubProcedure...
Done Running MainProcedure...

However, when running all the test phases together (by copy/pasting from the Unit Test Editor), these were the results:

Running MainProcedure...
Calling SubProcedure...
Mock SubProcedure Called With { @Param1 = 10, @Param2 = 20 }
Done Calling SubProcedure...
Done Running MainProcedure...

It was working! This unclean cheating was actually working!
Running the Test itself in Visual Studio also proved to work, with the Scalar Value Condition being satisfied too.

Test Cleanup

Now there was just one issue to resolve. It is a fact that the code in the Post-Test phase will only run if your test does not throw errors or otherwise fail execution. I verified this by changing the code of MainProcedure to THROW a fake error:

CREATE PROCEDURE [dbo].[MainProcedure]
	@Param1 INT,
	@Param2 INT
AS

PRINT 'Running MainProcedure...';

PRINT 'Calling SubProcedure...';

DECLARE @ReturnValue INT;
EXECUTE @ReturnValue = dbo.SubProcedure
	@Param1 = @Param1,
	@Param2 = @Param2;

THROW 50000, 'Something went wrong...', 1;

PRINT 'Done Calling SubProcedure...';

PRINT 'Done Running MainProcedure...';

RETURN @ReturnValue;

Running the Unit Test now left me with a left-over SubProcedure_Original object in the database, also causing subsequent test runs to fail.
However, I didn’t want to move the Pre-Test and Post-Test code into the Test block itself, as these phases are meant to be run under different user credentials.
Fortunately, this is exactly what the Test Cleanup phase is for.
This phase will always run after all the Unit Tests in a given file, regardless of whether they fail or not. That’s the point of it: to clean up the mess!

So I added the code below to the Test Cleanup phase:

SET XACT_ABORT ON
BEGIN TRANSACTION

BEGIN TRY

	-- check if the original objects exists as a backup
	IF OBJECT_ID('SubProcedure_Original') IS NOT NULL
	BEGIN

		-- check if the current object is present and drop it
		IF OBJECT_ID('SubProcedure') IS NOT NULL
		BEGIN
			EXECUTE sp_executesql N'DROP PROCEDURE SubProcedure';
		END

		-- swap in the original object
		EXECUTE sp_rename 'SubProcedure_Original', 'SubProcedure';

	END

END TRY
BEGIN CATCH

	ROLLBACK;
	THROW;

END CATCH;

COMMIT;

And voilà! Everything now ran perfectly!

Thoughts

SSDT Unit Testing is getting there. It doesn’t yet go head-to-head with the full functionality of tSQLt, such as its dependency isolation capabilities (which, technically, are glorified cheating mechanisms of their own), but it’s improving, and I hope some easier mocking abilities will get added soon.

However, it looks likely that we’ll never see true IoC development in T-SQL unless some major restructuring is done to the language itself. And why break something that works? That doesn’t mean we can’t get stuff done though.

How To List SQL Server Instances In PowerShell

How? As easy as running the code below:

[System.Data.Sql.SqlDataSourceEnumerator]::Instance.GetDataSources() | Format-Table -Auto

And you’ll be presented with something like this (shown names are fake, of course):

ServerName InstanceName IsClustered Version     
---------- ------------ ----------- -------     
SERVER01                No          10.50.1600.1
SERVER02   INSTANCE01   No          11.0.3000.0 
SERVER03   INSTANCE02   No          11.0.3000.0 

Which is a regular PowerShell object, so you’re free to mangle it at will.

If this doesn’t work for you, you can try using SMO:

$Current = Get-Location;
Import-Module SQLPS -DisableNameChecking;
Set-Location $Current;

[Microsoft.SqlServer.Management.Smo.SmoApplication]::EnumAvailableSqlServers()

And if this still doesn’t work for you, or if you’re trying to query a computer that isn’t on a network connection, you can try WMI too:

[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SqlWmiManagement") | Out-Null;

$MC = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer("COMPUTER_NAME_HERE");
foreach ($Instance in ($MC.Services | Where-Object { $_.Type -Eq "SqlServer" }))
{
    Write-Host $Instance.Name;
}

The WMI option has the advantage of listing other services too if you need, like SSAS, SSRS, SQL Agent, etc.

