Blog ala Vidar

SQL, AppFrame and other cool technologies


SQLXML

Ever tried working with XML in SQL Server? Some find it very difficult. I find it… interesting? I'm not sure. My opinion is: if possible, stay away from XML, but if you need to use it, here are some quick starter tips:

USE tempdb;

IF EXISTS (SELECT *
		FROM sys.sysobjects
		WHERE name = 'xmltest1')
	DROP TABLE xmltest1

CREATE TABLE xmltest1 (
	ID INT,
	Name NVARCHAR(200)
)

INSERT INTO xmltest1
  (ID, Name)
VALUES
  (1, 'Jens Stoltenberg'),
  (2, 'Erna Solberg'),
  (3, 'Siv Jensen'),
  (4, 'Lars Sponheim')

SELECT *
	FROM xmltest1
	FOR XML AUTO

SELECT *
	FROM xmltest1
	FOR XML AUTO, ELEMENTS

DECLARE @MyXML AS XML
SET @MyXML = (SELECT *
		FROM xmltest1
		FOR XML PATH('Politician'), ROOT('Politicians'))

SELECT @MyXML
/*
<Politicians>
  <Politician>
    <ID>1</ID>
    <Name>Jens Stoltenberg</Name>
  </Politician>
  <Politician>
    <ID>2</ID>
    <Name>Erna Solberg</Name>
  </Politician>
  <Politician>
    <ID>3</ID>
    <Name>Siv Jensen</Name>
  </Politician>
  <Politician>
    <ID>4</ID>
    <Name>Lars Sponheim</Name>
  </Politician>
</Politicians>
*/

DECLARE @docHandle AS INTEGER
EXEC sp_xml_preparedocument @docHandle OUTPUT, @MyXML

SELECT *
	FROM OPENXML(@docHandle, N'/Politicians/Politician')
	WITH (ID INTEGER 'ID', Name NVARCHAR(200) 'Name')

SELECT @MyXML.query('/Politicians/Politician/Name')

/* Clean up*/
EXEC sp_xml_removedocument @docHandle
DROP TABLE xmltest1

I'm creating a test table with two columns, and adding some records. These are well-known Norwegian politicians. If you run this script, you'll see it returns 5 result sets. The first one is from FOR XML AUTO. This means it will have one element per record, with attributes for each field. The next one (FOR XML AUTO, ELEMENTS) will return one element per record, with child elements for each field. The third (SELECT @MyXML) will have one root element, Politicians, with one element (Politician) per record, and child elements for each field, as shown in the comments.

The fourth uses OPENXML instead of FOR XML. This means I have to “prepare” a document for the SQL Server. I pass in the XML and get back a handle to that document (in @docHandle). With this I'm querying the XML to get relational results, so the fourth is actually the same as running SELECT * FROM xmltest1. Be aware that I'm mapping the fields: ID INTEGER 'ID' means it picks the ID element and puts it in the ID column. To get attributes instead, you use @ID, for example ID INTEGER '@ID'. You can also reference parent elements with '../ID' or '../@ID'.
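For example, if the XML had used attributes instead of child elements (this is just a hypothetical variation, not part of the script above), the mapping could look roughly like this:

DECLARE @AttrXML AS XML
DECLARE @AttrHandle AS INTEGER
SET @AttrXML = N'<Politicians>
  <Politician ID="1" Name="Jens Stoltenberg" />
  <Politician ID="2" Name="Erna Solberg" />
</Politicians>'

EXEC sp_xml_preparedocument @AttrHandle OUTPUT, @AttrXML

/* @ID and @Name pick the attributes instead of child elements */
SELECT *
	FROM OPENXML(@AttrHandle, N'/Politicians/Politician')
	WITH (ID INTEGER '@ID', Name NVARCHAR(200) '@Name')

EXEC sp_xml_removedocument @AttrHandle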

The last one queries the @MyXML variable directly, using the query() method. The result is an XML fragment containing only the Name elements.

NB!!! I'm using INSERT with multiple value rows. This syntax is only supported from SQL Server 2008. The rest is supported by both 2005 and 2008.
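On 2005 you could get the same test data in with a plain INSERT … SELECT … UNION ALL, something like this:

INSERT INTO xmltest1
  (ID, Name)
SELECT 1, 'Jens Stoltenberg'
UNION ALL
SELECT 2, 'Erna Solberg'
UNION ALL
SELECT 3, 'Siv Jensen'
UNION ALL
SELECT 4, 'Lars Sponheim'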

Clearing the Server List

You've probably created a test account on a SQL server, and then tried to log in using this account. It works, but now Management Studio remembers this account NO MATTER WHAT you do. Even though you press Delete, log in using your own account and check off “Remember password”, it still remembers this test account you've logged in with ONCE. Why?! I have no idea. This must be the most annoying thing in the whole wide world when you're a DBA.

Delete (or rename) this file: %APPDATA%\Microsoft\Microsoft SQL Server\100\Tools\Shell\SqlStudio.bin
For 2005 it’s using this file: %APPDATA%\Microsoft\Microsoft SQL Server\90\Tools\Shell\mru.dat

Thank you Aleksandr N 🙂

NULL?

A couple of days ago Torgeir came over to my desk and proudly told me “you shouldn't use SELECT when setting variables. You should use SET”. Then he told me the reason, and it got me thinking. The reason is simple. If you do this:

DECLARE @PrimKey AS UNIQUEIDENTIFIER
SELECT @PrimKey = PrimKey
	FROM stbl_WinClient_Projects
	WHERE ProjectName = 'Sys.Security.ManageUsers'
PRINT @PrimKey
SELECT @PrimKey = PrimKey
	FROM stbl_WinClient_Projects
	WHERE ProjectName = 'A.Project.That.Does.Not.Exist'
PRINT @PrimKey

Notice that it will return the same PrimKey both times. When you're writing the query, you might think that the last SELECT will give you NULL, but it won't: no records are returned, so the SELECT never touches the variable. The correct way of doing this would then be:

DECLARE @PrimKey AS UNIQUEIDENTIFIER
SET @PrimKey = (SELECT PrimKey
		FROM stbl_WinClient_Projects
		WHERE ProjectName = 'Sys.Security.ManageUsers')
PRINT @PrimKey
SET @PrimKey = (SELECT PrimKey
		FROM stbl_WinClient_Projects
		WHERE ProjectName = 'A.Project.That.Does.Not.Exist')
PRINT @PrimKey
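Another option, if you want to keep using SELECT, is to reset the variable yourself before the second query. A small sketch:

DECLARE @PrimKey AS UNIQUEIDENTIFIER
SELECT @PrimKey = PrimKey
	FROM stbl_WinClient_Projects
	WHERE ProjectName = 'Sys.Security.ManageUsers'
PRINT @PrimKey
SET @PrimKey = NULL /* reset, so an empty result doesn't leave the old value behind */
SELECT @PrimKey = PrimKey
	FROM stbl_WinClient_Projects
	WHERE ProjectName = 'A.Project.That.Does.Not.Exist'
PRINT @PrimKey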

In some cases you might want to add ISNULL. For instance in subselects this can be important. In the next query you'll see what I mean: the first column will return NULL, and the second will return 0.000000:

SELECT
  (SELECT TOP 1 ISNULL(Amount, 0) AS Amount
	FROM atbv_Accounting_Transactions
	WHERE Series = 'SomethingThatDoNotExists'
	ORDER BY Period DESC) AS Amount1,
  ISNULL((SELECT TOP 1 Amount AS Amount
	FROM atbv_Accounting_Transactions
	WHERE Series = 'SomethingThatDoNotExists'
	ORDER BY Period DESC), 0) AS Amount2

Both columns use ISNULL, but the first uses it INSIDE the subquery, and since the subquery never returns any records, it never gets that far. The second uses ISNULL OUTSIDE the subquery, so when the subquery returns NULL, it is replaced with 0.000000.

NB! One last thing… Some bright souls may have noticed that I haven't used NOLOCK on the first queries. I'll send you a prize. Good boy/girl 😀 ALWAYS REMEMBER TO USE WITH (NOLOCK) WHEN SELECTING FROM TABLES!

Restore to point-in-time

What do you do if one of your customers calls and says he's just deleted something and he REALLY needs it back? We do this test about once a month, to check that our backup routines are working. For this example, we'll say that it's a Friday and the current time is 16:00. Your customer deleted what he needed 30 minutes ago (15:30). Here's our setup:
We take a full backup once a week to C:\SQL\Backup\MyDatabase.bak
Differential is taken every night to C:\SQL\Backup\MyDatabase_Diff.bak
Transaction log backup is taken every day at 12:00 to C:\SQL\Backup\MyDatabase.trn
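The backups themselves are taken roughly like this (just a sketch using the paths above, not our actual backup jobs):

/* Weekly full backup */
BACKUP DATABASE MyDatabase
	 TO DISK = 'C:\SQL\Backup\MyDatabase.bak'

/* Nightly differential */
BACKUP DATABASE MyDatabase
	 TO DISK = 'C:\SQL\Backup\MyDatabase_Diff.bak'
	 WITH DIFFERENTIAL

/* Daily transaction log backup at 12:00 */
BACKUP LOG MyDatabase
	 TO DISK = 'C:\SQL\Backup\MyDatabase.trn'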

First thing to do is to take a backup of your transaction log:

BACKUP LOG [MyDatabase]
	 TO DISK = 'C:\SQL\Backup\MyDatabase_CustomLog.trn'
	 WITH NAME = N'MyDatabase-Transaction Log Backup'

When this is done, start restoring your last FULL backup of the database:

RESTORE DATABASE MyDatabase_BU
	 FROM DISK = 'C:\SQL\Backup\MyDatabase.bak'
	 WITH
	 MOVE 'MyDatabase_Data' TO 'C:\SQL\Backup\MyDatabase_BU.MDF',
	 MOVE 'MyDatabase_Log' TO 'C:\SQL\Backup\MyDatabase_BU.LDF',
	 NORECOVERY

We're using “NORECOVERY” because we're going to apply more backups to this database before making it available. In our environment we take a full backup once a week, with differentials every night, and transaction logs at 12:00. Here's the code for adding the diff. NB! You only need the latest differential.

RESTORE DATABASE MyDatabase_BU
	 FROM DISK = 'C:\SQL\Backup\MyDatabase_Diff.bak'
	 WITH FILE = 1,
	 NORECOVERY

After adding the differential, we’ll add the transaction log from 12:00:

RESTORE LOG MyDatabase_BU
	 FROM DISK = 'C:\SQL\Backup\MyDatabase.trn'
	 WITH NORECOVERY

And when that's done, we'll add the transaction log we just created, but with a “point in time” option, STOPAT. Since the customer did the delete at 15:30, we'll restore up to 15:29. Notice that we're using WITH RECOVERY on the last one!

RESTORE LOG MyDatabase_BU
	 FROM DISK = 'C:\SQL\Backup\MyDatabase_CustomLog.trn'
	 WITH RECOVERY, STOPAT = '2008-10-24 15:29'

After this is done, you can insert the records FROM MyDatabase_BU to MyDatabase, and your customer will be VERY happy 🙂
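That last step could look something like this (the table and column names are of course made up for the example):

/* Copy the deleted rows back from the restored copy */
INSERT INTO MyDatabase.dbo.SomeTable
  (PrimKey, SomeColumn)
SELECT
  BU.PrimKey, BU.SomeColumn
	FROM MyDatabase_BU.dbo.SomeTable AS BU
	WHERE NOT EXISTS (SELECT *
				FROM MyDatabase.dbo.SomeTable AS T
				WHERE T.PrimKey = BU.PrimKey)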

Table Valued Parameters?

This is a new feature in SQL Server 2008, where you can send a table to an SP or UDF. I've gotten the question about how to handle this in 2005 many times, and I've now decided to post a blog about it. Before you read more about this, please consider: IS IT REALLY NECESSARY to do this from the client? Can we do everything on the SQL Server? If so, DO IT! My following example could all be done on the server, but it's only to show you how to do it if you really need to send it from the client.

Let's say we've got a bunch of data we want to send to an SP. In this case, I've just built up a dataset from stbv_System_Users, where I've got one column, Login.

Dim vUsersDataSet As New DataSet
Dim vUsers As New afRecordSource("stbv_System_Users")

vUsers.SelectColumns.Add("Login")
vUsers.MaxRecords = 10

vUsersDataSet.Tables.Add(afAccessLayer.GetData(vUsers))

Dim vProcedure As New AppFrame3.Common.Data.afProcedureCall
vProcedure.ProcedureName = "astp_Vidar_Test1"
vProcedure.Parameters.Add("MyXML", vUsersDataSet.GetXml().ToString())

Try
	AppFrame3.Common.Data.afAccessLayer.ExecProcedure(vProcedure)
Catch ex As Exception
	AppFrame3.Win.CR3.HandleException(ex)
End Try

Here you see I’ve put it in a dataset, then used .GetXml to send it as a parameter to my SP:

CREATE PROCEDURE astp_Vidar_Test1 (@MyXML AS XML)
AS

DECLARE @DocID AS INT

EXEC sp_xml_preparedocument @DocID OUTPUT, @MyXML

SELECT *
	FROM OPENXML(@DocID, '/NewDataSet/Table1', 2)
	WITH (Login NVARCHAR(128))

EXEC sp_xml_removedocument @DocID

My SP takes one parameter, and “converts” it to a rowset using OPENXML. Now I could go on, inserting these records somewhere. OPENXML takes three parameters. The first one is the document handle, which is created by sp_xml_preparedocument. The second one is the path in the XML you want to handle. The third one is a flag telling which kind of mapping to use: 2 is element-centric, 1 is attribute-centric. Below you'll see the XML created by the .GetXml method on the dataset. I don't care about the PrimKey field, so I've only specified the Login field in the WITH clause.

<NewDataSet>
  <Table1>
    <Login>vidar</Login>
    <PrimKey>63444aa2-e0b8-4c89-b476-229c81145961</PrimKey>
  </Table1>
  <Table1>
    <Login>vidar2</Login>
    <PrimKey>e72d74d0-3c8f-4256-b188-8001557d9849</PrimKey>
  </Table1>
</NewDataSet>
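For comparison, on SQL Server 2008 the same thing could be done with a real table valued parameter. This is just a sketch (the type name, procedure name and logins are made up for illustration):

CREATE TYPE LoginTableType AS TABLE (Login NVARCHAR(128))
GO

CREATE PROCEDURE astp_Vidar_Test2 (@Logins AS LoginTableType READONLY)
AS
SELECT Login
	FROM @Logins
GO

DECLARE @MyLogins AS LoginTableType
INSERT INTO @MyLogins (Login) VALUES ('vidar'), ('vidar2')
EXEC astp_Vidar_Test2 @Logins = @MyLogins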

Common mistakes

When doing QA on SQL objects, I've come across some very common mistakes. Number one has to be people forgetting that the Inserted and Deleted views inside triggers may contain more than one record.
This is a very good (and simplified) example of code from an insert trigger:

DECLARE @Login AS NVARCHAR(128)

SELECT @Login = Login
	FROM Inserted

IF NOT EXISTS (SELECT * FROM stbl_System_Users WITH (NOLOCK) WHERE Login = @Login)
BEGIN
	INSERT INTO stbl_System_Users (Login) VALUES (@Login)
END

This will only work if you're inserting one record, but what if you're inserting several? It will only run the code you've added for ONE (arbitrary) record.
The solution for this is very simple:

INSERT INTO stbl_System_Users
  (Login)
SELECT
  I.Login
	FROM Inserted AS I
	WHERE NOT EXISTS (SELECT *
				FROM stbl_System_Users WITH (NOLOCK)
				WHERE Login = I.Login)

In some cases you might want to execute an SP for each record instead of doing an INSERT:

DECLARE @Login AS NVARCHAR(128)

SELECT @Login = Login
    FROM Inserted

IF NOT EXISTS (SELECT * FROM stbl_System_Users WITH (NOLOCK) WHERE Login = @Login)
BEGIN
    EXEC astp_Some_SP @Login = @Login
END

Solution to this:

DECLARE @Login AS NVARCHAR(128)

SELECT  I.Login
	INTO #LoopLogins
	FROM Inserted AS I
	WHERE NOT EXISTS (SELECT *
				FROM stbl_System_Users WITH (NOLOCK)
				WHERE Login = I.Login)

WHILE EXISTS (SELECT * FROM #LoopLogins)
BEGIN
	SELECT TOP 1
	  @Login = Login
		FROM #LoopLogins

	EXEC astp_Some_SP @Login = @Login

	DELETE
		FROM #LoopLogins
		WHERE Login = @Login
END

Query Optimization

When doing query optimization, there are a couple of things to remember:

NEVER USE “OR”!
This KILLS performance! Instead of using OR, use UNION (ALL) or IN
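A small, made-up example of rewriting an OR (the Description column is hypothetical):

/* Instead of: */
SELECT *
	FROM stbl_System_Users WITH (NOLOCK)
	WHERE Login = 'vidar'
	  OR Login = 'vidar2'

/* ...use IN when the OR is on the same column: */
SELECT *
	FROM stbl_System_Users WITH (NOLOCK)
	WHERE Login IN ('vidar', 'vidar2')

/* ...or UNION when the OR spans different columns
   (UNION, not UNION ALL, so a row matching both conditions isn't returned twice): */
SELECT *
	FROM stbl_System_Users WITH (NOLOCK)
	WHERE Login = 'vidar'
UNION
SELECT *
	FROM stbl_System_Users WITH (NOLOCK)
	WHERE Description = 'Test user'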

UNION vs UNION ALL
The performance difference between these can be big. UNION removes duplicate rows (which usually means a sort or hash behind the scenes), while UNION ALL just appends the result sets.

EXISTS vs IN
In many cases, you can use:

SELECT *
	FROM table1
	WHERE Login IN (SELECT Login
				FROM table2)

instead of using:

SELECT *
	FROM table1
	WHERE EXISTS (SELECT *
			FROM table2
			WHERE table2.Login = table1.Login)

Subselects vs JOIN
In some cases, like in the query-optimizing webcast, subselects are MUCH quicker than using a JOIN. This is not the case in all queries, so test both!

WITH (NOLOCK)
When you use NOLOCK, you read uncommitted data from the table. This means there MIGHT be data that is not yet committed. In most of our cases this is not a problem.

DO NOT USE *
… unless you need it. When you select every column, the optimizer can rarely answer the query from a narrower (covering) index, so the plan usually has to read the whole table or clustered index.
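A small sketch of what I mean (the index name and the EMail column are just examples):

/* Hypothetical covering index for a query that only needs Login and EMail */
CREATE NONCLUSTERED INDEX IX_SystemUsers_Login
	ON stbl_System_Users (Login)
	INCLUDE (EMail)

/* This can be answered from the index alone... */
SELECT Login, EMail
	FROM stbl_System_Users WITH (NOLOCK)
	WHERE Login = 'vidar'

/* ...while SELECT * has to go to the table/clustered index */
SELECT *
	FROM stbl_System_Users WITH (NOLOCK)
	WHERE Login = 'vidar'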

Indexes and Statistics
Always make good indexes! Make sure indexes and statistics are rebuilt when they need to be! I'll post more about indexes and statistics here later.

SQL Query tips

Happy new year, guys!

I've been very busy lately. In Ølen we're making a new framework, based on WinForms. The first thing we did was DROP TABLE stbl_MSAccess%. So long, suckers! Now there's no turning back! (PS!! We will still support Access in AppFrameR2). So, what are we doing? Well, we're making a framework based on .NET 2.0 and controls from DevExpress. We have already started making some forms and it's starting to look pretty nice! Today I did a quick test: made a simple form with a couple of subforms. It took me no more than 15 minutes! I hope everyone's looking forward to this new framework 🙂

I ran into a little problem when I was making a form today. In one of the subforms we need to list all mails from/to one address, with from, to, cc, subject and the body. The problem was that the users want the CCs in ONE column, comma separated, while they're stored as one record per address in the table. How did I solve this? Of course, with a little help from Torgeir and some XML queries:

SELECT M.FromEMail, M.Subject, MC.TextHTML, M.PrimKey,
  (SELECT (SELECT [data()] = EMail + ','
		FROM stbl_Mail_MessagesRecipients
		WHERE MessageRef=M.PrimKey
		  AND Type='To'
		FOR XML PATH(''), TYPE).value('.', 'varchar(max)')) AS ToEMail,
  (SELECT (SELECT [data()] = EMail + ','
		FROM stbl_Mail_MessagesRecipients
		WHERE MessageRef=M.PrimKey
		  AND Type='Cc'
		FOR XML PATH(''), TYPE).value('.', 'varchar(max)')) AS CCEMail
	FROM stbl_Mail_Messages AS M
	INNER JOIN stbl_Mail_MessagesContent AS MC ON M.PrimKey = MC.MessageRef

Replication in SQL Server

Tuesday we had pizza & learning in Ølen. I held the session, and the topic was replication in SQL Server. We even had guests from a company called Quantum Solutions. Some of you might know Roar or Erik from before.
Replication is a BIG topic. We spent about three hours on it, but we could have spent three years. It's HUGE! In addition to replication, I covered Database Snapshots, Database Mirroring and Log Shipping, because I feel this is very relevant for Omega.

Database Snapshots
This is a “point in time” snapshot of a database. The purpose of this is to have a read-only database to run reports against, or as a quick form of backup. Say, for instance, you are going to make significant changes in a table: delete 100,000 rows or similar. As insurance you take a backup of your database before you run the query. If something fails, you have to restore the database. If we're talking about AppFrameR2 (Omega's main database), backup plus restore takes about 1.5 hours. For one delete statement?! Instead of this, you could create a snapshot before you run the query, and if the query fails, you can just revert to the snapshot. When you make changes to a database with a snapshot attached, the changes are only done on the main database; the original versions of the changed pages are copied to the snapshot. This is why it's called a point in time snapshot. If you're going to revert, you can only have ONE snapshot attached. If not, you can have several.
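Creating and reverting to a snapshot is just a couple of statements. A rough sketch (the logical file name and the paths are assumptions; check sys.database_files for the real ones):

CREATE DATABASE AppFrameR2_Snapshot
	 ON (NAME = AppFrameR2_Data,
	     FILENAME = 'C:\SQL\Snapshots\AppFrameR2_Data.ss')
	 AS SNAPSHOT OF AppFrameR2

/* If the big delete goes wrong, revert: */
RESTORE DATABASE AppFrameR2
	 FROM DATABASE_SNAPSHOT = 'AppFrameR2_Snapshot'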

Database Mirroring
Database mirroring is exactly what the name says: you have one database with a mirror database. The mirror database is not accessible before you change the roles of the databases. There are three different roles in database mirroring: principal, mirror and witness. Principal is the main database, where all users are connected and changes are done. Mirror is a “slave” database which gets transactions from the principal. The purpose of this is failover. If you have a witness server, you can switch the roles: the principal becomes the mirror, and vice versa. You can then take down the old principal (now mirror), run upgrades, add hardware etc., without the database actually going down. When the server is up and running again, it will be synchronized with the running principal (the old mirror), and when that's done, the old mirror is set as mirror again and the old principal is set as principal again. To be able to read from a mirror, you have to make a snapshot of it. It is not possible to access a mirror directly. Not even in Management Studio via right-click and Properties.
The witness server is used for automatic failover. If the principal is not accessible (hardware crash or similar), the witness will promote the mirror to principal until the principal is up and running again. To use database mirroring you must have two Standard (or Enterprise) editions of SQL Server. Developer does not work. The witness may run Express Edition though. For communication between the servers, database mirroring uses endpoints.
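Setting up the endpoints and partners looks roughly like this. This is only a sketch (server names and the port are made up), and the full setup involves more steps, like restoring a backup of the database WITH NORECOVERY on the mirror first:

/* On both principal and mirror */
CREATE ENDPOINT Mirroring
	 STATE = STARTED
	 AS TCP (LISTENER_PORT = 5022)
	 FOR DATABASE_MIRRORING (ROLE = PARTNER)

/* On the mirror server */
ALTER DATABASE MyDatabase SET PARTNER = 'TCP://PrincipalServer:5022'

/* On the principal server */
ALTER DATABASE MyDatabase SET PARTNER = 'TCP://MirrorServer:5022'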
There are three modes in database mirroring: High Availability, High Performance and High Protection.

  • High Availability commits transactions synchronously. The transaction is committed on the principal AFTER it commits on the mirror, and if it rolls back on the mirror, it doesn't run on the principal. This of course takes a bit more time than running a transaction on only one database.
  • High Performance does the opposite. It runs the transaction on the principal, and sends it asynchronously to the mirror once it has committed on the principal.
  • High Protection is the same as High Availability except that it does not use a witness server, so failover must be done manually.

Log Shipping
Log shipping is sort of like database mirroring, except it doesn't ship every transaction over as it happens. In log shipping the roles are called primary, secondary and monitor. The primary server takes a backup of the transaction log and copies it to a network share. Then the secondary (you can have multiple secondary servers) copies it from the network share to a local folder, and restores the transaction log. The monitor's responsibility is to keep track of the process and raise alerts when the process fails.

Replication
Now over to what Microsoft calls replication. There are three different roles: Publisher, Distributor and Subscriber. The publisher is in most cases the only place you can do data modifications. This is the primary server. The distributor is a server which takes care of distributing changes to the subscribers, and a subscriber is the database that receives copies from the publisher (via the distributor). In some cases the subscriber is read-only. The reason for this is that many companies need several read-only databases for reporting etc.
There are two subscription types: push and pull. Push should be used when there's a stable and permanent connection; the distributor copies data TO the subscriber. Pull subscription should be used when there's not a stable connection; the subscriber then copies the data FROM the distributor.

There are four types of replication: Snapshot, Transactional, Peer-to-Peer and Merge.

  • Snapshot replication copies datasets and does not use the transaction log. This should only be used for small amounts of static data.
  • Transactional replication uses the transaction log and sends “bulks” of the transaction log.
  • Peer-to-Peer is the same as transactional except that all servers are publishers, distributors AND subscribers. This means that if you have 4 servers, all of these will send transactions to the other three servers.
  • Merge replication is like snapshot replication, except that all servers allow modifications. All servers are then both publisher and subscriber.

Tuning SQL code

Have you ever thought of tuning your car? Yeah, why not?
It makes your car run faster and use less fuel. Well, for starters it's a bit expensive. But tuning SQL is free, and it's even more fun than tuning your car. Tuning your SQL means your queries run faster and use less fuel (CPU, memory etc.)!

I'm at ConocoPhillips Indonesia at the moment, upgrading to SQL Server 2005. I've spent a couple of days tuning different queries, and at best I got a query down from an average of 4 minutes to about 10 seconds, with basically the same code! Do NOT tell me that this was a temporary solution, because my solution was even easier! 🙂

When writing SQL code, keep these simple rules in mind:
Do not use sub-queries unless it’s ABSOLUTELY necessary
If you are setting several variables from the values of fields in tables/views, use one SELECT statement instead of three SET statements.

For example, do not use:

SET @Variable1 = 'Test'
SET @Variable2 = (SELECT Test FROM Test1 WHERE Test3='asdf')
SET @Variable3 = (SELECT Test2 FROM Test1 WHERE Test3='asdf')

Use:

SELECT TOP 1
  @Variable1 = 'Test',
  @Variable2 = Test,
  @Variable3 = Test2
    FROM Test1
    WHERE Test3 = 'asdf'

You can also use SELECT instead of SET to set static values
If possible, do not use CASE or UNION (ALL)
Use EXISTS instead of 0 < (SELECT COUNT(*) …) (see the sketch below)
Use TOP 1 in scalar sub-queries. Always remember that a sub-query MAY return more than one record; your query might then fail if you're using Field = (SELECT SubField FROM subtable).
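A small example of the EXISTS rule (the table and login are just for illustration):

/* Instead of: */
IF 0 < (SELECT COUNT(*)
		FROM stbl_System_Users WITH (NOLOCK)
		WHERE Login = 'vidar')
BEGIN
	PRINT 'Found'
END

/* ...use: */
IF EXISTS (SELECT *
		FROM stbl_System_Users WITH (NOLOCK)
		WHERE Login = 'vidar')
BEGIN
	PRINT 'Found'
END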

These rules are things that every developer should keep in mind when writing SQL code.

Of course, there might be places where you've followed every rule, but it still takes minutes to run. Then you might want to look at the execution plan. There are a couple of major things to look for here. For example, scans (Index Scan, Clustered Index Scan, Table Scan etc.) are NOT good. Seeks are a good thing.

You might also want to see what's really happening when you're running your query. To turn on the execution plan in SQL Server Management Studio, press Ctrl+M and rerun your query. It will appear as a new tab beside Results and Messages.

And, as always, Profiler might be a good tool. If you let it run for a couple of hours you will discover queries which take longer than others. For example, you can set a filter on the Reads column, for >= 50000.

Do you have any other simple rules you'd like to share with us? Please mail them to me, and I'll publish them here!