Blog ala Vidar

SQL, AppFrame and other cool technologies

Monthly Archives: April 2009

The how’s and why’s about indexes, part 2

SQL Server uses something called a B+tree. Some of you have probably gone to school and learned what this is, but I’ve used some time to really understand this. What is a B+ tree and what does that have to do with indexes? Well, when you insert a record into a table (with an index), these values will be populated into a tree to make it easier to search.

Download a little application called STX B+ Tree Demo. Set data type to integer and node slots to 4 and click “Insert Random” and choose “Insert 20 Random Integer Pairs”. This will leave you with something like this:

On the top we’ve got a root node (410). This is the KEY. If you look a bit more down, you’ll find 410 once more, but then with 472 under it. 472 is the VALUE for KEY 410. On the left of the root node, you’ll find all keys that are LESS OR EQUAL to the root node. Both 104 and 190 is less than 410. This level is called the intermediate nodes, and on the bottom you’ve got the leaf nodes. So, if we’re for example looking for the key 679, we start at the top. It’s more than 410, and therefore we move to the right. It’s between 575 and 788, therefore it’s in the middle of these. Then there’s only 4 keys to search in till we find what we looked for: 679, and the value is 277.

Now, compare this to our world. If you set an index on three columns, all three of them will be the key in this index. If you add included columns, these will only exist in the leaf node. But, what if you DON’T add included columns in the index. What will then be the value? RID. Row ID. You might have seen “RID lookup” in the execution plan. In the value of the leaf nodes, there’s always stored a RID (and if exists, included columns). This will point it to the correct row in the table. So, the reason for adding included columns is to get rid of the RID lookups. Also, it might be a good idea for covering indexes, since the total amount of bytes in an index (excluding the included columns) can maximum be 900 bytes.

Advertisements

Visit from Jon Jahren

Mr. Jon Jahren visited us yesterday, and we talked about SQL Server for 4 hours. It was fun, and very informative. I learned A LOT! In this blog I’ll try to sum up the most interesting discussions we had. We had a list of "questions" prepared for him:
– Tuning of (the setup of the) main SQL Server
– Specking new servers
– B-trees
– Mobile sync vs Replication
– Resource Governor
– Reporting Services

 

Tuning of (the setup of the) main SQL Server
First of all we logged on to the main SQL Server in Omega and started looking at hardware setup, settings on the server, database etc. The hardware setup was as it should be, but he explained a couple of settings that he recommended.
    "Boost SQL Server priority". This will, as it says, boost the priority of SQL Server on the CPUs. This should be turned off, because IF the server has some queries running 100% of the CPU you might not be able to log on to the server and kill those processes. He also explained that if we had network related problems, it might be a good idea to uncheck one core pr CPU, to let them deal with OS-related stuff (network). That way, SQL Server will not try to use these cores and therefore not be put in a queue.
    Under security on the server, there’s a option called "Enable Common Criteria compliance". This is a security feature that makes SQL Server not only release memory but overwrite memory with white-spaces or similar. Many customers has this as a criteria when setting up SQL Servers, so be aware!
    Named pipes should be disabled when you’ve got more than 500 users because this might cause performance issues if there’s more than 500 connections!
    When we talked about hardware setup, he reminded us about the problems with RAID5 and said he was a member of BARF(.com)

NB! Before SQL Server 2008 it was recommended to have the same amount of MDF files for the database as you’ve got CPU cores. This no longer applies to user databases, but this still applies to tempdb.

 

Specking new servers
Fast Track is collaboration between Microsoft, HP, Dell and Bull. This is a sort of standard for specking new servers. The only "thumb rule" we got was CPU cores X 4 = GB of RAM. We should double the amount of memory on our SQL Server.

 

B-trees
What’s the difference of a column in the index and an included column? First of all, you can’t filter on the included column. The reason they added included columns (or leaf node columns, as Jon wanted it to be called) was that the limit for indexes are 900 bytes. But in some cases you might want to select more than 900 bytes, but only filter on one or two fields.

 

Mobile sync vs Replication
We talked a bit about the differences between SQL Server Compact Edition and Express edition. CE was actually built with "cloud computing" in mind. CE should only be used as a local cache of data while it’s offline. EE could be used as a subscriber, or as a publisher/subscriber in merge-replication.

 

Resource Governor
This is a great tool for allocating resource. I’ve now set up a resource pool that restricts backups to only use 20% of the CPU. This means that users will hardly notice when we run backups (especially transaction log in the middle of the day). You can set various restrictions. For example I could make two pools and check for SUSER_SNAME(). If it’s me, then no restrictions, if it’s anyone else, use max 10% of CPU, RAM etc. I bet that would make my queries even faster! 😀

 

Reporting Services
We’ve changed our reporting engine a couple of times, and I know people are tired of recreating reports. Pr now, Reporting Services isn’t "good enough" for us, but MY PERSONAL OPINION (NB!!!! THIS IS JUST MY PERSONAL OPINION!!!!!) is that when reporting services is good enough, we’ll change to this. SQL Server 10.5 (2008 SP2?) will be released among with Office 14. In this release there will be no big changes to Reporting Services, but the release after this (11… Q2 2011?) will include a lot of changes to the Reporting Services tools. Most of them is actually ready already, but not included. So, no worries. I don’t think we’ll change reporting engine for at least 2-3 years 😉

 

I mentioned office 14. This will include an application codenamed Gemini. This is excel on steroides. The main goal for this application is to suite the needs of the heavy excel experts out there. With this you can connect to almost any database (dataware house), and play with millions of records in a couple of milliseconds. Looking forward to try this!

BTW! Did you know that the S in SUSER_SNAME() stands for Server?

If you log in to teamdocuments you can download all the files he sent me.