MAS - Part 1: Smashing SQL Server's Plan Cache

So, you want to bring your application speed down? Well, I've got you covered. The "Making Applications Slower"-section of my blog should provide you with a few helpful tips. In this part we'll explore how SQL Server handles your application's queries and give you a few tips on how to mess things up.

Theory section - Important

When you open up a SqlConnection and issue a SqlCommand, Sql server will first hash your query, then consult the "Plan Cache" if the plan was seen before. If not, the query is parsed, compiled and then stuck in the Plan Cache. Sql Server does this to optimize things a bit. No reason to compile the same query more than once, right?

The way the query hash is computed, is by reading the raw bytes of your query, sprinkle with your environment variables (like ANSI-padding, COLLATION, stuff like that) and push it through some one-way function. The take-away being, that if you alter anything in you query - it's a new entry in the plan cache.

The size of the Plan Cache is dictated by the amount of memory available to your Sql Server. Not the individual databases, no. The Instance. So, if you have say 10 databases on your instance, well, they have to compete for space in the plan cache. This means that if your plan cache is too small, the database receiving the most traffic will most likely hog the plan cache making all other applications slower by forcing Sql Server to compile each plan at every execution.

Of course I'm putting things a bit on edge here. In practice there's a sliding window of expiration at play, but if your Sql Server is running low on memory, it may well be, that your plans will get evicted very quickly.

How to take advantage?

Yes, theory is all well and good, but how do you make this work for you in practice? Well - you need to consider how you'll be able to take advantage of this weakness.

Remove parameterization

One way would be to flood the plan cache with so many queries, that it'll run out of memory, simply by not parameterizing your queries. Consider this:

-- non parameterized
SELECT FirstName, LastName
FROM Customers
WHERE Id = 140924

-- parameterized
SELECT FirstName, LastName
FROM Customers
WHERE Id = @id

In the first case, Sql Server would have to make a separate plan and put it cache for each Customer Id you would ever ask the names for. The second case, Sql Server can store just one plan in the cache. If you have enough customers, and you request their names often enough you're already done. Cache flooded, which means every query needs to be compiled before running.

Inject comments

As stated above - the raw bytes of the query all go into the hash. So you could just add innocent comments to your query, like this:

/* Executed at [Some timestamp] By user [UserId] */

SELECT FirstName, LastName
FROM Customers
WHERE Id = @id

Now it looks like you parameterized it, but the dynamic timestamp + UserId in the comment forces a new plan. Neat, huh?

Use WHERE IN (@p1, @p2, ..., @pN)

This one is also quite neat. You'll see it happen a lot when using ORMs like Entity Framework or NHibernate, where you be able to write code like this:

List<int> customerIds = new List<int>{2,3,5,7,11,13,17, ..., 1001}
var names = ... ORM Specific Code Here ...
                    .Where(c => c.Id.In(customerIds))

And why would this cause problems? Well, because you get a variable number of parameters. Neither Entity Framework nor NHibernate know how to convert your list to a Table Valued Parameter (TVP), which makes sense, since you need to declare the type explicitly in the database, and the syntax to feed a TVP parameter into a query is a bit involved, because you need to add it as Type = Structured, TypeName = "Name in database". And the query-syntax needs to be somthing like "WHERE Customer.Id IN (select [colname] from @tvptype)"

But I digress - the cool thing here is that EF/NHibernate produces a query with the number of parameters equal to the size of customerIds. So if that is never a constant value, you get a new plan per set of customerids. You should however batch your queries when this list exceeds 2000 customerIds, because Sql Server will not accept more than 2100 parameters.

Of course this only produces up to 2100 plans in the cache, but if you have the pattern more places in your application, it should accumulate nicely.

Don't use OPTION (RECOMPILE) Hints

Just to make sure, this hint does in fact force a recompile of the plan - but only because it works differently than you might think. It is actually more of a OPTION (DONT_PUT_IN_PLANCACHE). So, it'll have to be recompiled every time, yes, but only because it is never added to the plan cache. Which will of course require some time to recompile, but won't flood the plan cache - it's actually detrimental to our efforts.

Remember to disable "Optimize for Ad-hoc Workloads"

You'd almost think they saw it coming, but someone invented this pesky option which basically does the following:
- First time you issue a query, Sql Server stores a stub of the compiled plan but doesn't put it in the plan cache.
- Next time the same query is issued, Sql Server sees that the query is in fact reused and puts it in the plan cache.

This effectively removes all the one-off queries that you so craftily constructed, so make sure this is turned off.

Advanced: Put a "bad" plan in the plan cache

So, the whole idea of the plan cache is to prevent recompiles and promote plan reuse. But what if you could construct a plan (even a parameterized one) that is horrible for performance?

Exploit skewed data

Like - say you have one very important customer that buys a lot. And 10.000 customers who don't. The query to get all the sales for a customer would probably like to go parallel for "Big Spender" but not for the other customers. Also, the amount of memory required for the query is much bigger. So whenever the database starts, make sure to prime it by querying the Salestable for "Big Spender", effectively making sure that the parallel, memory hogging plan is used.

And make sure that your application isn't able to differentiate between "Big Spender"s and regular customers by using different queries for the two.

Exploit Queries that use intervals

Like above, but a bit more obvious. Say you have a query that can query sales for a given period:

SELECT CustomerId, OrderNumber, OrderAmount
FROM Sales
WHERE OrderDate BETWEEN @from AND @to

If your @from and @to variables are close together, and OrderDate is indexed (somehow), it may use this index to speed the query up. But if you prime it using a very large interval (like say @from = DateTime.MinValue and @to = DateTime.MaxValue) - the cached plan will be to scan the entire index. The same goes for any interval you may want to search for. You'll see people trying to get around this by tasting the parameters beforehand and injecting comments specifying "large interval".


But if you successfully "poison" your plan cache, you need to make sure that the plan stays put. If you are successful in flooding the plan cache, your bad plan will soon be evicted unless you make sure to keep it alive, ie. by forcing the plan using Query Store. 

Monitor your progress

Like in performance optimization, you will want to keep track of your progress. To do so, you need some kind of metric and what I recommend is looking at your plancache and monitor 

1) The number of cached plans
2) The median age of all the cached plans.

The first measures how many plans you are caching. As mentioned there is only so much memory available to your plan cache, and the more plans in it, the more likely it is, that older, unused queries will get evicted. The second measurement is to guage how old you cache is, once it's been filled. To get them execute this:

-- Number of cached plans
SELECT count(1/0)
FROM sys.dm_exec_query_stats;

-- median creation_time
SELECT TOP 1 sub.creation_time 
	SELECT TOP 50 PERCENT stats.creation_time as creation_time
	FROM sys.dm_exec_query_stats stats
	ORDER BY stats.creation_time ASC	
) AS sub
ORDER BY sub.creation_time ASC;

You want to make sure that the number of cached plans is always above a certain threshold, which will be specific to your environment. And you want the median age to be as small as possible, which indicates that plans don't survive long in your cache.






Comments are closed