Query Plans In The Race Of Extended Events

In the previous articles we took a look at some simple ways to hunt for those valuable query plans, and then at how we can accomplish the same using SQL Trace. There is yet another way of discovering and extracting query plans from a live SQL Server, and that is to use Extended Events.
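
As a taste of what’s in store, here’s a minimal sketch of such a session, assuming a database named MyDatabase; note that the query_post_execution_showplan event captures the actual plan of every completed query, so it carries noticeable overhead:

CREATE EVENT SESSION CapturePlans ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan
(
	ACTION (sqlserver.sql_text)
	WHERE (sqlserver.database_name = N'MyDatabase') /* placeholder database name */
)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION CapturePlans ON SERVER STATE = START;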

Continue reading “Query Plans In The Race Of Extended Events”

SQL Trace And The Stream Of Query Plans

In 8 Ways To Hunt T-SQL Query Plans, we looked at some simple ways to get our hands on those valuable query plans. However, sometimes we might want to record query plans as part of a bigger effort to profile and troubleshoot a server, or perhaps optimize it for a given workload.

When this need arises, there are two additional tools we can make use of:

  • SQL Trace
  • Extended Events

In this post we’ll take a look at how we can use SQL Trace to accomplish this goal.
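
As a preview, a server-side trace boils down to a handful of sp_trace_* calls. A minimal sketch follows, where C:\Traces\QueryPlans is a placeholder path and the IDs used (event 122 = Showplan XML, column 1 = TextData) are best double-checked against sys.trace_events and sys.trace_columns:

DECLARE @TraceId INT;
DECLARE @MaxFileSize BIGINT = 50; /* MB; sp_trace_create insists on BIGINT here */
DECLARE @On BIT = 1;

/* create a stopped trace writing to a file */
EXECUTE sp_trace_create @TraceId OUTPUT, 0, N'C:\Traces\QueryPlans', @MaxFileSize;

/* capture the TextData column (1) of the Showplan XML event (122) */
EXECUTE sp_trace_setevent @TraceId, 122, 1, @On;

/* start the trace */
EXECUTE sp_trace_setstatus @TraceId, 1;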

Continue reading “SQL Trace And The Stream Of Query Plans”

8 Ways To Hunt T-SQL Query Plans

In How Can Query Plans Help Us? we looked at how query plans can be made to help us troubleshoot a runaway query. Then in Are Query Plans All We Need? we looked at other tools we need to keep ready in our performance tuning utility belt. With that in mind, it is now time to understand how we can capture query plans to begin with.

So Where Do We Start?

To understand how we can capture query plans, we must first understand what types of query plans there are to capture. Luckily, there are only two of them.
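
As a teaser, the two flavors map to two familiar SET options; a quick sketch, with dbo.MyTable standing in for a real table:

/* estimated plan: the query is compiled but not executed */
SET SHOWPLAN_XML ON;
GO
SELECT * FROM dbo.MyTable;
GO
SET SHOWPLAN_XML OFF;
GO

/* actual plan: returned along with the query results */
SET STATISTICS XML ON;
GO
SELECT * FROM dbo.MyTable;
GO
SET STATISTICS XML OFF;
GO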

Continue reading “8 Ways To Hunt T-SQL Query Plans”

Are Query Plans All We Need?

In How Can Query Plans Help Us?, we looked at the nature of T-SQL and then at some of the benefits that understanding query plans can bring us. But are query plans a silver bullet? Can we overcome all optimization challenges with query plans only or are there other tools we need on our utility belt?

Continue reading “Are Query Plans All We Need?”

How Can Query Plans Help Us?

In T-SQL Query Plan Analysis Prelude we looked at some common reasons why queries sometimes seem to be competing in a snail speed contest, and saw that query plans are one of the ways we can learn how to transform those carefree molluscs into caffeinated cheetahs.

So how exactly can query plans help us do that? Are they like magic wands that turn over-ripe pumpkins into dubious-expiry carriages and make used teeth disappear in return for equally used pennies?

Well… Hold that thought for a minute.

Continue reading “How Can Query Plans Help Us?”

T-SQL Query Plan Analysis Prelude

Have you ever found yourself pulling your own hair with both hands, desperate to understand why that stubborn T-SQL query runs so desperately slow? Were you intrigued by the cool, if complex, visuals of its query plan? Did you try to turn that knowledge into insight but couldn’t tell a Hash Match from a Nested Loop?

Query Plan Example

If so, this series might be for you.

Continue reading “T-SQL Query Plan Analysis Prelude”

How To Add NonClustered ColumnStore Indexes To Every Table

Sometimes I feel really lazy. I’m just tinkering with a proof-of-concept database holding a decent amount of test data, I want to do some moderately fast querying over it, and I don’t really feel like creating proper indexes for everything just yet. I really only want to slap a ColumnStore on everything at once so I can get on with work and run some fast queries over the data.

Luckily, that’s what the cheat script below is for:

/* set the prefix for the indexes here */
DECLARE @IndexPrefix NVARCHAR(MAX) = 'TEST_NCCSI_';

/* dont worry about these */
DECLARE @SQL NVARCHAR(MAX) = '';
DECLARE @NL NVARCHAR(MAX) = NCHAR(13) + NCHAR(10);
DECLARE @TB NVARCHAR(MAX) = NCHAR(9);

/* and off we go */
WITH Query AS
(
	SELECT
		SchemaId = S.schema_id,
		SchemaName = S.name,
		TableId = T.object_id,
		TableName = T.name,
		ColumnId = C.column_id,
		ColumnName = C.name,
		IsFirstColumn = CASE WHEN C.column_id = MIN(C.column_id) OVER (PARTITION BY S.schema_id, T.object_id) THEN 1 ELSE 0 END,
		IsLastColumn = CASE WHEN C.column_id = MAX(C.column_id) OVER (PARTITION BY S.schema_id, T.object_id) THEN 1 ELSE 0 END
	FROM
		sys.tables AS T
		INNER JOIN sys.schemas AS S
			ON S.schema_id = T.schema_id
		INNER JOIN sys.columns AS C
			ON C.object_id = T.object_id
)
SELECT
	@SQL +=

		/* add ddl statement with first column */
		CASE WHEN Q.IsFirstColumn = 1 THEN
				'CREATE NONCLUSTERED COLUMNSTORE INDEX ' + QUOTENAME(@IndexPrefix + Q.TableName) + ' ON ' + QUOTENAME(Q.SchemaName) + '.' + QUOTENAME(Q.TableName) + @NL +
				'('
		ELSE
			''
		END

		/* add column name */
		+
		@NL + @TB + QUOTENAME(Q.ColumnName)

		/* add comma after interim columns */
		+
		CASE WHEN Q.IsLastColumn = 0 THEN ',' ELSE '' END

		/* close ddl after last column */
		+
		CASE WHEN Q.IsLastColumn = 1 THEN @NL + ')' + @NL + @NL ELSE '' END
FROM
	Query AS Q
ORDER BY
	SchemaId,
	TableId,
	ColumnId
;

SELECT @SQL;
PRINT @SQL;
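
If the generated DDL looks right after review, it can be executed straight from the same batch:

/* once reviewed, run the generated DDL */
EXECUTE sys.sp_executesql @SQL;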

Note that this straightforward script doesn’t try to partition-align the indexes, detect unsupported columns or do any other fancy stuff. If you do end up adding those details, feel free to comment and I’ll add them in too.

How To List Table Row Counts And Index Statistics

When getting an ETL performance testing system of some sort up and running, one of the Socratic questions that arises at some point is:

“When we run a workload, how much data is there in the database and how does it affect the workload itself?”

OK, so not as Socratic as “What is justice?”, as we can actually measure this one.

The first step is, of course, to identify that data in the first place.
The snippets below provide a starting point for getting it out of SQL Server.

Row Counts Only

The starting point can be to just list row counts of all tables in the target database.
The short example below will do just that.

-- tables only, clustered or heap, nothing else
SELECT
	S.name AS SchemaName,
	T.name AS TableName,
	SUM(P.rows) AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
WHERE
	T.type = 'U' -- user tables only
	AND P.index_id IN (0, 1) -- HEAP and CLUSTERED only
GROUP BY
	S.name,
	T.name
ORDER BY
	SchemaName,
	TableName
;

Row Counts Per Partition

Next, it can be interesting to look at row counts per partition. This might be helpful if you’re running some crazy updates on specific partitions as part of your ETL process and you’d like to check whether it makes too big a difference to run those over sparsely populated partitions vs highly populated ones.

-- tables only, clustered or heap, counts by partition
SELECT
	S.name AS SchemaName,
	T.name AS TableName,
	P.partition_number AS PartitionNumber,
	P.rows AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
WHERE
	T.type = 'U' -- user tables only
	AND P.index_id IN (0, 1) -- HEAP and CLUSTERED only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber
;

Row Counts Per Partition & Index

It can also be interesting to see whether ETL performance is affected by loads on top of tables whose indexes are growing in both number and size.
This can go both ways, of course, but it also prepares us for the next step.

-- tables, clustered or heap, plus indexes, counts by partition
SELECT
	S.name AS SchemaName,
	T.name AS TableName,
	P.partition_number AS PartitionNumber,
	P.index_id AS IndexId,
	I.name AS IndexName,
	I.type_desc AS IndexType,
	P.rows AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = P.object_id
		AND I.index_id = P.index_id
WHERE
	T.type = 'U' -- user tables only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber,
	IndexId
;

Row Counts Per Partition & Index, Plus Index Statistics

If we’re not yet content with the previous data, we can roll up our sleeves by digging into the index stats themselves.
Ask yourself: how many times have you seen that T-SQL ETL run incredibly slow and then magically fix itself once the underlying tables got rebuilt? As in REBUILD, really.
Fragmentation can have a big impact on any query, and ETL workloads are no exception.
The example below grabs basic fragmentation stats from all indexes, so you can check whether fragmentation is the culprit in your workload.
Be careful running this query on a production system during business hours: it can be quite heavy, especially if you change the mode parameter of sys.dm_db_index_physical_stats to increase the scan accuracy (NULL defaults to LIMITED; SAMPLED and DETAILED scan more pages).

-- tables, clustered or heap, plus indexes, and physical stats counts by partition
SELECT
	S.name AS SchemaName,
	T.name AS TableName,
	P.partition_number AS PartitionNumber,
	P.index_id AS IndexId,
	I.name AS IndexName,
	I.type_desc AS IndexType,
	P.rows AS Rows,
	PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
	PIS.index_depth AS IndexDepth,
	PIS.index_level AS IndexLevel,
	PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
	PIS.fragment_count AS IndexFragmentCount,
	PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
	PIS.page_count AS IndexPageCount
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = P.object_id
		AND I.index_id = P.index_id
	CROSS APPLY sys.dm_db_index_physical_stats(
		/* database_id */ DB_ID(),
		/* object_id */ T.object_id,
		/* index_id */ I.index_id,
		/* partition_number */ P.partition_number,
		/* mode */ NULL) AS PIS
WHERE
	T.type = 'U' -- user tables only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber,
	IndexId
;

All For All

At some point you might also need to grab a snapshot of some or all the databases in your server for workload documentation purposes.
Doing this on all databases one by one can be time-consuming.
The example below lets you skip the RBAR pain and grab the same stats as the previous example, but for all the databases you need at once.
Remember to add a WHERE+LIKE filter on the first block to select only the database name patterns you want.
Again, be wary of running this on a production system during business hours.

-- list databases to loop
DECLARE @Databases TABLE
(
	DatabaseKey INT IDENTITY (1, 1) NOT NULL PRIMARY KEY CLUSTERED,
	DatabaseName SYSNAME NOT NULL
)
INSERT INTO @Databases (DatabaseName)
SELECT name FROM sys.databases
WHERE state_desc = 'ONLINE'; -- skip databases we can't query

-- this will hold the final result
IF OBJECT_ID('tempdb.dbo.#Data') IS NOT NULL
DROP TABLE #Data;

CREATE TABLE #Data
(
	DatabaseName SYSNAME NOT NULL,
	SchemaName SYSNAME NOT NULL,
	TableName SYSNAME NOT NULL,
	PartitionNumber INT NOT NULL,
	IndexId INT NOT NULL,
	IndexName SYSNAME NULL,
	IndexType SYSNAME NOT NULL,
	Rows BIGINT NOT NULL,
	IndexAllocationUnitType SYSNAME NULL,
	IndexDepth TINYINT NULL,
	IndexLevel TINYINT NULL,
	IndexAverageFragmentationPercent FLOAT NULL,
	IndexFragmentCount BIGINT NULL,
	IndexAverageFragmentSizeInPages FLOAT NULL,
	IndexPageCount BIGINT NULL,

	PRIMARY KEY CLUSTERED
	(
		DatabaseName,
		SchemaName,
		TableName,
		PartitionNumber,
		IndexId
	)
);

DECLARE @DatabaseKey INT = 1;
DECLARE @MaxDatabaseKey INT = (SELECT MAX(DatabaseKey) FROM @Databases);
WHILE @DatabaseKey <= @MaxDatabaseKey
BEGIN

	DECLARE @DatabaseName SYSNAME =
	(
		SELECT
			DatabaseName
		FROM
			@Databases
		WHERE
			DatabaseKey = @DatabaseKey
	);

	DECLARE @SQL NVARCHAR(MAX) = '
		INSERT INTO #Data
		(
			DatabaseName,
			SchemaName,
			TableName,
			PartitionNumber,
			IndexId,
			IndexName,
			IndexType,
			Rows,
			IndexAllocationUnitType,
			IndexDepth,
			IndexLevel,
			IndexAverageFragmentationPercent,
			IndexFragmentCount,
			IndexAverageFragmentSizeInPages,
			IndexPageCount
		)
		SELECT
			''' + @DatabaseName + ''' AS DatabaseName,
			S.name AS SchemaName,
			T.name AS TableName,
			P.partition_number AS PartitionNumber,
			P.index_id AS IndexId,
			I.name AS IndexName,
			I.type_desc AS IndexType,
			P.rows AS Rows,
			PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
			PIS.index_depth AS IndexDepth,
			PIS.index_level AS IndexLevel,
			PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
			PIS.fragment_count AS IndexFragmentCount,
			PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
			PIS.page_count AS IndexPageCount
		FROM
			' + QUOTENAME(@DatabaseName) + '.sys.tables AS T
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.schemas AS S
				ON S.schema_id = T.schema_id
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.partitions AS P
				ON P.object_id = T.object_id
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.indexes AS I
				ON I.object_id = P.object_id
				AND I.index_id = P.index_id
			CROSS APPLY sys.dm_db_index_physical_stats(
				/* database_id */ DB_ID(''' + @DatabaseName + '''),
				/* object_id */ T.object_id,
				/* index_id */ I.index_id,
				/* partition_number */ P.partition_number,
				/* mode */ NULL) AS PIS
		WHERE
			T.type = ''U'' -- user tables only
		ORDER BY
			SchemaName,
			TableName,
			PartitionNumber,
			IndexId
		;
		';

	EXECUTE (@SQL);

	SET @DatabaseKey += 1;

END

SELECT * FROM #Data;

How To Extract Serialized Data From String In One Go In T-SQL

Consider the following data:

DECLARE @OrderString VARCHAR(MAX) = '1=1;|2=|3=|4=1;|5=|';

This data was being returned in an externally generated XML file, so its format was out of the developer’s control.

Each element of information is delimited by a pipe (|).
Within each element, the number to the left of the equals sign (=) represents an “Item Number”.
The number to the right of the equals sign represents an “Order Quantity”.
This can be any number, and if empty, it means zero.

The developer needed to return this data in the form of a table.
He also needed to return only the rows with an Order Quantity greater than zero.

In T-SQL, this conversion can be done in one go, using a single query.
Let’s go through the steps.

Remove Unnecessary Characters

First of all, if we take a look at the source data, we can clearly see an oddity there.
The semicolon character appears only after a declared Order Quantity.
This is extra information we don’t need, as each block is already delimited by the pipe character.
So let’s take it out of the equation:

WITH Step0 AS
(
	SELECT
		String = REPLACE(@OrderString, ';', '')
),

Identify Elements

Now, to make our lives easier, let’s split each element, or “block”, into a separate row.
This can be accomplished with a good old recursive CTE:

Step1 AS
(
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step0

	UNION ALL
	
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step1
	WHERE
		LEN(String) > 0
),

Identify Columns

Now that we have a row identified for each block, we can then extract the columns themselves:

Step2 AS
(
	SELECT
		ItemNumber = SUBSTRING(Block, 1, CHARINDEX('=', Block) - 1),
		ItemValue = SUBSTRING(Block, CHARINDEX('=', Block) + 1, LEN(Block))
	FROM
		Step1
),

Convert & Filter

Finally, now with proper columns in place, we can filter out the zero quantity items:

Step3 AS
(
	SELECT
		ItemNumber = CAST(ItemNumber AS INT),
		ItemValue = CAST(ItemValue AS INT)
	FROM
		Step2
	WHERE
		CAST(ItemValue AS INT) > 0
)

SELECT
	*
FROM
	Step3;

And that was it. Nothing to it.

Full Code

DECLARE @OrderString VARCHAR(MAX) = '1=1;|2=|3=|4=1;|5=|';

WITH Step0 AS
(
	SELECT
		String = REPLACE(@OrderString, ';', '')
),
Step1 AS
(
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step0

	UNION ALL
	
	SELECT
		Block = SUBSTRING(String, 1, CHARINDEX('|', String) - 1),
		String = SUBSTRING(String, CHARINDEX('|', String) + 1, LEN(String))
	FROM
		Step1
	WHERE
		LEN(String) > 0
),

Step2 AS
(
	SELECT
		ItemNumber = SUBSTRING(Block, 1, CHARINDEX('=', Block) - 1),
		ItemValue = SUBSTRING(Block, CHARINDEX('=', Block) + 1, LEN(Block))
	FROM
		Step1
),

Step3 AS
(
	SELECT
		ItemNumber = CAST(ItemNumber AS INT),
		ItemValue = CAST(ItemValue AS INT)
	FROM
		Step2
	WHERE
		CAST(ItemValue AS INT) > 0
)

SELECT
	*
FROM
	Step3;
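
For the sample string, both versions return two rows: (1, 1) and (4, 1). As a side note, on SQL Server 2016 and later (database compatibility level 130+), the recursive CTE can be swapped for STRING_SPLIT; a minimal sketch of that variant:

DECLARE @OrderString VARCHAR(MAX) = '1=1;|2=|3=|4=1;|5=|';

SELECT
	ItemNumber = CAST(SUBSTRING(S.value, 1, CHARINDEX('=', S.value) - 1) AS INT),
	ItemValue = CAST(SUBSTRING(S.value, CHARINDEX('=', S.value) + 1, LEN(S.value)) AS INT)
FROM
	STRING_SPLIT(REPLACE(@OrderString, ';', ''), '|') AS S
WHERE
	S.value <> '' -- skip the empty element after the trailing pipe
	AND CAST(SUBSTRING(S.value, CHARINDEX('=', S.value) + 1, LEN(S.value)) AS INT) > 0;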

8 DBCC commands you’ll want to know

DBCC commands can be really useful while debugging your SQL Server. Below I’ve listed 8 commands you’ll want to keep in your pocket.

DBCC DROPCLEANBUFFERS

DBCC DROPCLEANBUFFERS

Amazingly, does what it says on the tin: drops every clean (already persisted to disk) page from the buffer pool. Great for performance testing. Try not to do this on a production system unless you enjoy answering support calls from annoyed users.
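
One detail worth knowing: only clean pages get dropped, so issuing a CHECKPOINT first writes dirty pages to disk and lets the buffer pool empty out completely:

CHECKPOINT; /* flush dirty pages so they become clean */
DBCC DROPCLEANBUFFERS; /* then drop the lot */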

DBCC SHRINKDATABASE

DBCC SHRINKDATABASE (N'MyDatabase', 10 /* = Percent To Leave Free */);

Also does what it says on the tin. This will shrink all files in the target database (plus the log) and release unused space from the files back to the operating system. It works by blindly moving pages from the end of the file into free spaces at the beginning of the file, without any consideration for the ordering of your data. Great if you enjoy increasing fragmentation like a maniac, or if you let that development database grow a wee bit too much.

If you do have to run it, consider DBCC SHRINKFILE instead and do it one file at a time, so you don’t have to come back the next day.
It might also be a good idea to run a full re-index on your database afterwards, to get those pages back in shape again.
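
As a minimal sketch of that re-index step, to be run per affected table (dbo.MyTable being a placeholder):

ALTER INDEX ALL ON dbo.MyTable REBUILD;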

DBCC SHRINKFILE

DBCC SHRINKFILE ('SomeFile', 10 /* = Target File Size In MB */)

Same as DBCC SHRINKDATABASE, but for a specific file. The same warnings apply, and it’s best used one file at a time. But then again, it’s best to pre-allocate your database files to begin with, if you’re able to.

DBCC FREEPROCCACHE

DBCC FREEPROCCACHE

Dumps all the cached query plans. Useful for progressive load testing when paired with DBCC DROPCLEANBUFFERS. Also useful if SQL Server has gone bonkers and cached a really nasty plan for some query. If that does happen, check whether your statistics are up to date, as the Query Optimizer usually knows best – at least with the information it has.
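
Also handy: on SQL Server 2008 and later you can evict a single plan instead of flushing the whole cache. A sketch, where the LIKE pattern is a placeholder for the query you’re hunting:

/* find the offending plan's handle */
DECLARE @PlanHandle VARBINARY(64);
SELECT TOP (1)
	@PlanHandle = CP.plan_handle
FROM
	sys.dm_exec_cached_plans AS CP
	CROSS APPLY sys.dm_exec_sql_text(CP.plan_handle) AS ST
WHERE
	ST.text LIKE '%MyNastyQuery%'; /* placeholder pattern */

/* evict just that plan */
IF @PlanHandle IS NOT NULL
	DBCC FREEPROCCACHE (@PlanHandle);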

DBCC CHECKDB

DBCC CHECKDB /* runs on the current database */
DBCC CHECKDB(MyDatabase) /* runs on the target database */
DBCC CHECKDB(MyDatabase, REPAIR_REBUILD) /* tries to repair the database, if only recoverable corruption has occurred */
DBCC CHECKDB(MyDatabase, REPAIR_ALLOW_DATA_LOSS) /* tries to repair the database, even if data loss occurs */

Verifies database consistency and reports corruption.
Errors get returned in nice red messages where you can find something like “etc etc etc Page (1:789) etc etc etc”.
The 1 refers to the number of the file containing the fault.
The 789 is the faulty page number within that file.
Recovering from this feels like theoretical physics from what I’ve seen out there, with approaches ranging from letting CHECKDB repair run its course to selectively retrieving rows from specific backups.

DBCC SQLPERF

Shows what your log file usage looks like. Useful to see if some database is getting ready to blow up due to lack of log space, especially if you have a fixed log file size.

DBCC SQLPERF(LOGSPACE)
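
On SQL Server 2012 and later, a per-database alternative is the sys.dm_db_log_space_usage DMV:

/* log usage for the current database only */
SELECT
	total_log_size_in_bytes,
	used_log_space_in_bytes,
	used_log_space_in_percent
FROM
	sys.dm_db_log_space_usage;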

DBCC INPUTBUFFER

Now this one works magic. If you give it a SPID, it will show you the last statement run by that SPID.
A great quick-and-dirty way to figure out which query is blocking some other process.

DBCC INPUTBUFFER(<SPID>)

To look up a list of running processes, you can, for example, run:

SELECT * FROM sys.sysprocesses
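
If you prefer the newer DMVs, the same hunt for blockers can start at sys.dm_exec_requests:

/* list blocked requests and the sessions blocking them */
SELECT
	session_id,
	blocking_session_id,
	wait_type,
	command,
	status
FROM
	sys.dm_exec_requests
WHERE
	blocking_session_id <> 0;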

DBCC HELP

For a refresher on what DBCC can do for you, DBCC HELP will provide information on the usage of specific DBCC commands, plus a list of available commands to start with… those that have help available, at least.

For a list of stuff just run:

DBCC HELP ('?')

To get help on say, INPUTBUFFER, just run:

DBCC HELP ('INPUTBUFFER')