Blog ala Vidar

SQL, AppFrame and other cool technologies


Renaming users

One of our customers is moving from one hosting provider to another, and because of this we have to migrate their databases and related data. While doing this, they’re also changing their domain name (in Active Directory), and since that wasn’t enough change, all users will get a new naming convention for their usernames. With this in mind I’ve created a SQL script that takes “OldUser” and “NewUser”, looks for all non-computed NVARCHAR(128) columns, and updates all of them. To make it easier, the script disables each table’s triggers before the update and enables them again when it’s finished.

SET NOCOUNT ON

DECLARE @OldUser AS NVARCHAR(128)
DECLARE @NewUser AS NVARCHAR(128)
DECLARE @TableName AS NVARCHAR(128)
DECLARE @ColumnName AS NVARCHAR(128)
DECLARE @SQL AS NVARCHAR(MAX)

SELECT
  @OldUser = 'OldDomain\NordnesVidar',
  @NewUser = 'NewDomain\NordVida'

SELECT
  O.name AS TableName, C.name AS ColumnName
    INTO #columns
    FROM sys.sysobjects AS O
    INNER JOIN sys.syscolumns AS C ON O.id = C.id
    WHERE O.xtype = 'U'
      AND O.name LIKE '[as]tbl%' -- Naming convention in Omega for tables
      AND C.xusertype = 231 --NVARCHAR
      AND C.length = 256 -- 128 (unicode)
      AND C.iscomputed = 0
      AND C.name NOT IN ('Domain')
    ORDER BY O.name, C.name

WHILE EXISTS (SELECT * FROM #columns)
BEGIN
    SELECT TOP 1
      @TableName = TableName,
      @ColumnName = ColumnName
        FROM #columns

    PRINT @TableName + ' - ' + @ColumnName

    SET @SQL = 'DISABLE TRIGGER ALL ON [' + @TableName + ']'
    EXEC (@SQL)

    SET @SQL = 'UPDATE [' + @TableName + '] '
    SET @SQL = @SQL + 'SET [' + @ColumnName + '] = ''' + @NewUser + ''' '
    SET @SQL = @SQL + 'WHERE [' + @ColumnName + '] = ''' + @OldUser + ''''
    EXEC (@SQL)

    SET @SQL = 'ENABLE TRIGGER ALL ON [' + @TableName + ']'
    EXEC (@SQL)

    DELETE
        FROM #columns
        WHERE TableName = @TableName
          AND ColumnName = @ColumnName
END

DROP TABLE #columns
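
If you want a quick sanity check afterwards, a small sketch like the one below (mine, not part of the original script, but reusing the same column criteria) prints any table/column that still contains the old username:

SET NOCOUNT ON

DECLARE @OldUser AS NVARCHAR(128)
DECLARE @TableName AS NVARCHAR(128)
DECLARE @ColumnName AS NVARCHAR(128)
DECLARE @SQL AS NVARCHAR(MAX)

SET @OldUser = 'OldDomain\NordnesVidar'

-- Same column criteria as in the update script above
SELECT
  O.name AS TableName, C.name AS ColumnName
    INTO #check
    FROM sys.sysobjects AS O
    INNER JOIN sys.syscolumns AS C ON O.id = C.id
    WHERE O.xtype = 'U'
      AND O.name LIKE '[as]tbl%'
      AND C.xusertype = 231
      AND C.length = 256
      AND C.iscomputed = 0
      AND C.name NOT IN ('Domain')

WHILE EXISTS (SELECT * FROM #check)
BEGIN
    SELECT TOP 1
      @TableName = TableName,
      @ColumnName = ColumnName
        FROM #check

    -- Print the table/column if any rows still reference the old user
    SET @SQL = 'IF EXISTS (SELECT * FROM [' + @TableName + '] '
    SET @SQL = @SQL + 'WHERE [' + @ColumnName + '] = ''' + @OldUser + ''') '
    SET @SQL = @SQL + 'PRINT ''' + @TableName + ' - ' + @ColumnName + ''''
    EXEC (@SQL)

    DELETE
        FROM #check
        WHERE TableName = @TableName
          AND ColumnName = @ColumnName
END

DROP TABLE #check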

Columns that are not in use

I’m working on upgrading a client to our new framework. While doing this, we’re creating new, clean and sexy data models, changing column names that are partly Norwegian and partly English into English only.

While I was doing this, I found a table which was heavily used and had A LOT of fields. 173 to be exact. So, how do I check which columns are not in use? The fastest way is to check whether each field has any values at all. This is of course not that accurate, since a field could have been used once, 10 years ago, and never since, but we’ll start with this. Here’s a script that finds all fields in one table that do not have any values:

SET NOCOUNT ON

DECLARE @TableID AS NVARCHAR(128)
SET @TableID = 'your_table'

DECLARE @ColumnName AS NVARCHAR(128)
DECLARE @SQL AS NVARCHAR(MAX)

SELECT
  name AS ColumnName
    INTO #columns
    FROM sys.syscolumns
    WHERE id = OBJECT_ID(@TableID)

WHILE EXISTS (SELECT * FROM #columns)
BEGIN
    SELECT TOP 1
      @ColumnName = ColumnName
        FROM #columns

    SET @SQL = 'IF NOT EXISTS (SELECT * '
    SET @SQL = @SQL + ' FROM ' + @TableID
    SET @SQL = @SQL + ' WHERE ' + @ColumnName + ' IS NOT NULL) '
    SET @SQL = @SQL + 'BEGIN '
    SET @SQL = @SQL + '    PRINT '''+ @ColumnName + ''''
    SET @SQL = @SQL + 'END'
    EXEC(@SQL)

    DELETE
        FROM #columns
        WHERE ColumnName = @ColumnName
END

DROP TABLE #columns
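
A variation I sometimes use (my own sketch, not part of the script above) counts the non-NULL values per column instead of just checking for existence, which gives a better feel for how much each field is actually used. It returns one small result set per column:

SET NOCOUNT ON

DECLARE @TableID AS NVARCHAR(128)
SET @TableID = 'your_table'

DECLARE @ColumnName AS NVARCHAR(128)
DECLARE @SQL AS NVARCHAR(MAX)

SELECT
  name AS ColumnName
    INTO #columns
    FROM sys.syscolumns
    WHERE id = OBJECT_ID(@TableID)

WHILE EXISTS (SELECT * FROM #columns)
BEGIN
    SELECT TOP 1
      @ColumnName = ColumnName
        FROM #columns

    -- COUNT(column) only counts the non-NULL values
    SET @SQL = 'SELECT ''' + @ColumnName + ''' AS ColumnName, '
    SET @SQL = @SQL + 'COUNT([' + @ColumnName + ']) AS NonNullValues '
    SET @SQL = @SQL + 'FROM ' + @TableID
    EXEC(@SQL)

    DELETE
        FROM #columns
        WHERE ColumnName = @ColumnName
END

DROP TABLE #columns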

Missing Indexes

In SQL Server 2005, Microsoft introduced Dynamic Management Views and Functions (DMV/DMFs). These provide real-time data about indexes, hardware, query usage and many, many other things. Using them, you can actually get index suggestions from the SQL Server itself: indexes it believes would improve performance if you implemented them. So, why aren’t these generated automatically? The simple answer is disk usage. If you implemented all of them, you’d end up with A LOT of indexes, and therefore a much bigger database. So, which should you implement?

SELECT
  DENSE_RANK() OVER (ORDER BY MIGS.avg_user_impact DESC, MIG.index_handle) AS IndexRank,
  RIGHT(MID.statement, CHARINDEX('[', REVERSE(MID.statement))) AS table_name,
  MIC.column_id, MIC.column_name, MIC.column_usage,
  MIGS.unique_compiles, MIGS.user_seeks,
  MIGS.avg_total_user_cost, MIGS.avg_user_impact
    FROM sys.dm_db_missing_index_details AS MID
    CROSS APPLY sys.dm_db_missing_index_columns (MID.index_handle) AS MIC
    INNER JOIN sys.dm_db_missing_index_groups AS MIG
        ON MIG.index_handle = MID.index_handle
    INNER JOIN sys.dm_db_missing_index_group_stats AS MIGS
        ON MIG.index_group_handle = MIGS.group_handle
    WHERE statement LIKE '%atbl%'
      AND MIGS.unique_compiles > 10
    ORDER BY
      MIGS.avg_user_impact DESC,
      MIG.index_handle,
      CASE
        WHEN column_usage = 'EQUALITY' THEN 1
        WHEN column_usage = 'INEQUALITY' THEN 2
        ELSE 3
      END,
      column_id

Using this query, you get the suggested indexes, with their columns, ordered by the average user impact. You should consider those that have a relatively high avg_user_impact and/or avg_total_user_cost.
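
When you decide to create one of the suggestions, the EQUALITY columns become the leading key columns, the INEQUALITY columns follow, and the INCLUDE columns go in the INCLUDE clause. The table and column names below are made up just to show the pattern:

-- Example only, with made-up table and column names
CREATE NONCLUSTERED INDEX IX_atbl_Orders_CustomerRef_OrderDate
	ON atbl_Orders (CustomerRef, OrderDate) -- EQUALITY column(s) first, then INEQUALITY
	INCLUDE (Status, Amount)                -- INCLUDE columns from the suggestion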

Backup Stored Procedure

I’ve had a couple of issues with the maintenance plan tool in Management Studio (at least before 2008), and therefore I’ve never really put my trust in it. Instead, I’ve created an SP which is now available as an s-object in AppFrame. It will be available with the next build, but if you need it NOW, here it is:

This will take backups (full, differential or transaction log) using default settings, saving files with the following naming convention:
DatabaseName_backup_201006231500.bak

CREATE PROCEDURE sstp_Database_Maintenance_Backup
(
	@BackupType AS CHAR(4) = 'FULL',
	@BackupPath AS NVARCHAR(300)
)
AS

IF @BackupType NOT IN ('FULL', 'DIFF', 'TRAN')
BEGIN
	RAISERROR('BackupType must be set to FULL, DIFF or TRAN!', 18, 1)
	RETURN
END
ELSE IF ISNULL(@BackupPath, '') = ''
BEGIN
	RAISERROR('BackupPath does not have any value!', 18, 1)
	RETURN
END
ELSE IF RIGHT(@BackupPath, 1) <> '\'
BEGIN
	SET @BackupPath = @BackupPath + '\'
END


DECLARE
  @DBNAME AS NVARCHAR(128),
  @Filename AS NVARCHAR(128),
  @SQL AS NVARCHAR(MAX)


SET @DBNAME = (SELECT DB_NAME())
SET @Filename =
	@DBNAME +
	'_backup_' +
	CONVERT(VARCHAR(8), GETDATE(), 112) +
	REPLACE(CONVERT(VARCHAR(5), GETDATE(), 108), ':', '')

IF @BackupType = 'FULL'
BEGIN
	SET @Filename = @Filename + '.bak'
	SET @SQL = 'BACKUP DATABASE [' + @DBNAME + '] '
END
ELSE IF @BackupType = 'DIFF'
BEGIN
	SET @Filename = @Filename + '.diff'
	SET @SQL = 'BACKUP DATABASE [' + @DBNAME + '] '
END
ELSE IF @BackupType = 'TRAN'
BEGIN
	SET @Filename = @Filename + '.trn'
	SET @SQL = 'BACKUP LOG [' + @DBNAME + '] '
END

SET @SQL = @SQL + 'TO DISK = N''' + @BackupPath + @Filename + ''' WITH '

IF @BackupType = 'DIFF'
BEGIN
	SET @SQL = @SQL + 'DIFFERENTIAL, '
END

SET @SQL = @SQL + 'NOFORMAT, INIT, NAME = N''' + @DBNAME + ' ' + @BackupType
SET @SQL = @SQL + ' Database Backup'', SKIP, NOREWIND, NOUNLOAD, STATS = 10'

EXEC(@SQL)

You can download the .sql file from skydrive.
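
Calling it is straightforward. A full backup to a local folder would look something like this (the path is just an example):

EXEC sstp_Database_Maintenance_Backup
	@BackupType = 'FULL',
	@BackupPath = 'D:\Backup'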

SQL Server 2008 R2

R2 has finally hit RTM (Release To Manufacturing). Is it worth upgrading? Always! But there aren’t many changes that aren’t related to the BI (Business Intelligence) part. Here are the keywords: better SQL Azure compatibility, support for 256(!!!) logical processors (2008 “only” supported 64) and SSMS dashboards.

They’ve also done a couple of things regarding licensing. Datacenter and Parallel Data Warehouse editions have been introduced. Oh yeah, and Express Edition will now support databases up to 10GB, instead of 4GB.

SQL Server Analysis Services

Today a couple of us in Ølen had the pleasure of attending a session by Jan Ove Halvorsen with SSAS (SQL Server Analysis Services) as the topic. It was an introduction to SSAS, although most of it was heavy stuff. I’ve played with SSAS before, but have had problems understanding some of the basics. This might help you on the way.

What is SSAS?

SSAS is the OLAP part of the BI (Business Intelligence) stack in SQL Server. It’s also part of the SQL Server licensing, which means that if you own a SQL Server license, you also own an SSAS license. It is recommended to install it on a separate server from SQL Server, because it’s heavy on RAM usage. To connect to it you can use Visual Studio, Management Studio or an API such as ADOMD.NET.

SSAS Modes

There are three main modes in SSAS:
MOLAP – Multidimensional OLAP – Data, aggregations and structures are stored in OLAP
HOLAP – Hybrid OLAP – Mix of MOLAP and ROLAP
ROLAP – Relational OLAP – Uses data from SQL Server but stores some structural data in OLAP

This means you can decide how live your data needs to be. For instance, you can run MOLAP with monthly updates, using a SQL Server Agent job running every month (or similar) and storing everything in SSAS. This makes it VERY fast to query. If you need more live data, you can run ROLAP, so you get live data all the time. This will not be as fast, and it will also put load on the SQL Server each time a query is executed.

With the 2005 version came a new feature in SSAS enabling it to “subscribe” to updates on the SQL Server. So, if you’re running ROLAP, it will generate the cube and just use that local cube until there are updates in the underlying data (on the SQL Server). When there’s an update, SSAS gets a notification that there are changes and rebuilds the cube, either the whole cube or just parts of it.

OLAP Cubes and Dimensions

In SSAS you’re browsing cubes. A cube is a “collection” of dimensions. A dimension is a set of data, in many cases a one-to-one relation to a table or similar in SQL Server. This might for example be “Products”, “Cities”, “Countries” or similar. You must also have a time dimension. This means that all your data are aggregated so that every possible combination of these (Products, Cities, Countries and Time) is stored as one “table”. This is what makes it so fast to query, because everything is “prejoined”.

Hierarchy

You can create hierarchies in SSAS. The most common is the Year –> Month –> Day hierarchy. This enables you to group by year, and just by clicking expand you get the data grouped by month in your client, without having to fetch new data from the server.

Creating cubes

There are several ways of creating cubes, dimensions and other objects. You can use Visual Studio or Management Studio, or you can use a query language: MDX (MultiDimensional eXpressions), which looks a lot like Transact-SQL but is also VERY different. An example could be:

SELECT
  { [Measures].[Store Sales] } ON COLUMNS,
  { [Date].[2002], [Date].[2003] } ON ROWS
	FROM Sales
	WHERE ( [Store].[USA].[CA] )

In this example, Sales is the name of the cube and Store is one of its dimensions.

Handling strings in SQL Server

In .NET, if you have a string it’s:
VB: Dim vMyString As String
C#: string vMyString;

In SQL Server you’ve got more than one choice: varchar, nvarchar (with or without MAX), char, nchar, text and ntext. That’s why SQL Server is better than .NET. No, just kidding. When you’ve got a String in .NET, what’s the equivalent? nvarchar. If you try to put a varchar into .NET, you’ll end up with an nvarchar (since the text is converted to Unicode).
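
A quick way to see the difference is to cast a string containing characters outside your code page to varchar. Assuming a Latin collation, the varchar version loses the characters while the nvarchar version keeps them:

SELECT
  CAST(N'日本' AS VARCHAR(10)) AS VarcharValue,   -- becomes '??' in a Latin collation
  CAST(N'日本' AS NVARCHAR(10)) AS NvarcharValue  -- keeps the Unicode characters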

SQL Server is built to store large amounts of data and retrieve it faster than lightning. It’s not built to handle strings, but sometimes you need to do that on the SQL Server anyway. That’s when functions like SUBSTRING, LEFT, RIGHT, REVERSE and PARSENAME are useful! The last one, PARSENAME, is a function that’s saved me A LOT of time. Its purpose is to retrieve table names, schema names etc. from a string like “Server.Database.Schema.Table”. If you call PARSENAME with the second parameter set to 1, you’ll always get the Table; 2 will always give you the Schema, and so on. Now, how can we use this for something other than retrieving object names?

DECLARE @Name AS NVARCHAR(100)

SET @Name = 'Nordnes, Vidar Stangeland'
SET @Name = REPLACE(@Name, ',', '.')

SELECT
  PARSENAME(@Name, 1) AS FirstName,
  PARSENAME(@Name, 2) AS LastName

The @Name variable contains a name on the form “LastName, FirstName (MiddleName)”. Replacing the comma with a period enables you to use PARSENAME. The cool thing about it is that it returns NULL if there’s no value for that part. If you didn’t use PARSENAME, you’d have to do something like this:

DECLARE @Name AS NVARCHAR(100)

SET @Name = 'Nordnes, Vidar Stangeland'

SELECT
  CASE WHEN CHARINDEX(',', @Name) > 0
	THEN SUBSTRING(@Name,
			CHARINDEX(',', @Name) + 2,
			LEN(RIGHT(REVERSE(@Name),
				CHARINDEX(',',
					REVERSE(@Name)))))
	ELSE @Name
  END AS FirstName,
  CASE WHEN CHARINDEX(',', @Name) > 0
	THEN LEFT(@Name, CHARINDEX(',', @Name) - 1)
	ELSE @Name
  END AS LastName

WAY more time-consuming, for doing something that should be pretty easy. REVERSE is also a great function to know about, as are REPLICATE, STUFF, ISNULL (or COALESCE), LTRIM and RTRIM.
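
A few quick examples of what those do:

SELECT
  REPLICATE('0', 5) + '42' AS Replicated,      -- '0000042'
  STUFF('Omega AS', 7, 2, 'Areal') AS Stuffed, -- 'Omega Areal'
  ISNULL(NULL, 'Fallback') AS IsNullValue,     -- 'Fallback'
  LTRIM(RTRIM('  asdf  ')) AS Trimmed          -- 'asdf'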

While I’m at it, “asdf “ (with trailing space) isn’t always the same as “asdf” (without trailing space). Just take a look at this example:

SELECT '1' WHERE 'asdf' = ' asdf'
UNION ALL
SELECT '2' WHERE 'asdf' = 'asdf '
UNION ALL
SELECT '3' WHERE 'asdf ' = ' asdf'
UNION ALL
SELECT '4' WHERE 'asdf ' = 'asdf'
UNION ALL
SELECT '5' WHERE 'asdf' LIKE ' asdf'
UNION ALL
SELECT '6' WHERE 'asdf' LIKE 'asdf '
UNION ALL
SELECT '7' WHERE 'asdf ' LIKE ' asdf'
UNION ALL
SELECT '8' WHERE 'asdf ' LIKE 'asdf'

Before you run the code, think about what you THINK will be returned. 1-4 are the same as 5-8, except for = vs LIKE. My first guess was 0 rows. My second guess was 2 and 6. Neither of my guesses was correct though.

Remember also that an nvarchar(MAX) field can contain a huge amount of data. Returning this to the client can be very slow, especially if you include the field in other fields (for instance one field returning HTML data and one without the HTML formatting), or, as we often see here in Omega, in a SearchColumn which is used by our framework for filtering. That field normally consists of all the (n)varchar fields in the table.
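
As an illustration (the table and field names below are made up, and this is not necessarily how the framework defines it), such a SearchColumn could be a computed column concatenating the textual fields:

-- Made-up example of a SearchColumn-style computed column
ALTER TABLE atbl_Projects_Projects
	ADD SearchColumn AS (ISNULL(ProjectName, '') + ' ' +
	                     ISNULL(Description, '') + ' ' +
	                     ISNULL(Responsible, ''))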

Recursive CTEs

Some might panic now and think “OMG WTF does this mean LOL?!?!?!?!”. Let’s start with the second word (or acronym): CTE stands for Common Table Expression and means that you can write something like a named sub-select in T-SQL. An example would be:

WITH MyCTE AS
(
	SELECT TOP 1 *
		FROM Employees
)

SELECT *
	FROM MyCTE

This doesn’t make much sense, as you could just do:

SELECT TOP 1 * FROM Employees

But if we add the other word, “recursive”, which means doing something repeatedly, it starts to make sense. First we’ll create a test table and put in some test data.

CREATE TABLE Employees
(
	ID TINYINT NOT NULL,
	Name VARCHAR(20) NOT NULL,
	ManagerID TINYINT NULL
)

INSERT INTO Employees
  (ID, Name, ManagerID)
SELECT 1, 'Petter', NULL UNION ALL
SELECT 2, 'Morten', 1 UNION ALL
SELECT 3, 'Trygve', 2 UNION ALL
SELECT 4, 'Vidar', 3 UNION ALL
SELECT 5, 'Håvard', 3 UNION ALL
SELECT 6, 'Stian', 5 UNION ALL
SELECT 7, 'Trine', 5 UNION ALL
SELECT 8, 'Werner', 4 UNION ALL
SELECT 9, 'Jakob', 2 UNION ALL
SELECT 10, 'Geir Ove', 9 UNION ALL
SELECT 11, 'Øystein', 10

As you can see from this, Petter is the CEO. He’s only got one person directly under him, and that’s Morten. Under Morten you’ve got Trygve, who has two directly under him: Vidar and Håvard. THIS IS JUST TEST DATA! Now that we’ve got some test data, we can try to find out who has whom as manager. We want it in the correct order, so that the hierarchy is correct: Petter first, then Morten, Trygve, Jakob, and then Trygve and Jakob’s employees. How to solve this? EASY!

WITH MyCTE (ID, Name, ManagerID, SortOrder) AS
(
	SELECT ID, Name, ManagerID, 0 AS SortOrder
		FROM Employees
		WHERE ManagerID IS NULL
	UNION ALL
	SELECT E.ID, E.Name, E.ManagerID, C.SortOrder + 1
		FROM Employees AS E
		INNER JOIN MyCTE AS C ON E.ManagerID = C.ID
)

SELECT *
	FROM MyCTE
	ORDER BY SortOrder, ManagerID

First we select the row where ManagerID IS NULL (the anchor), and then we recursively join the CTE back to Employees and union the results to produce the output. Try it yourself and see!
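
If you want each employee listed directly under his or her manager (a proper depth-first order), a small variation of the same CTE (my own sketch, on the same Employees table) can build a sortable path instead of just a counter:

WITH MyCTE (ID, Name, ManagerID, TreePath) AS
(
	SELECT ID, Name, ManagerID,
	  CONVERT(VARCHAR(MAX), RIGHT('00' + CONVERT(VARCHAR(3), ID), 3)) AS TreePath
		FROM Employees
		WHERE ManagerID IS NULL
	UNION ALL
	SELECT E.ID, E.Name, E.ManagerID,
	  C.TreePath + '.' + RIGHT('00' + CONVERT(VARCHAR(3), E.ID), 3)
		FROM Employees AS E
		INNER JOIN MyCTE AS C ON E.ManagerID = C.ID
)

SELECT Name, TreePath
	FROM MyCTE
	ORDER BY TreePath -- zero-padded IDs make the path sort depth-first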


LINQ to SQL

LINQ stands for Language INtegrated Query and is part of .NET Framework 3.5. You can use LINQ against any object you want as long as it implements the IQueryable interface, including XML and SQL, which makes it very interesting. When I first heard about LINQ I thought it sounded cool, but I was worried about performance. I still am, but it’s even cooler than I imagined! To test it out I started creating TeamDoc in WPF, using LINQ. I connected to one of our SQL servers, dragged and dropped a couple of tables, and sim salabim, my LINQ to SQL model was ready to be used. I then started creating a user control for “recent updates”. The SQL query behind this looks something like this:

SELECT
  *,
  (SELECT TOP 1 Url
		FROM stbl_TeamDoc_Websites WITH (NOLOCK)
		WHERE TeamDocRef = Table1.PrimKey) AS Url
	FROM (SELECT TOP 100
		  C.PrimKey, C.Title, C.Status, C.Updated,
		  CASE WHEN C.Updated > ISNULL(US.LastRead,'1999-01-01')
			     AND C.HideUntil <= GETUTCDATE()
			   THEN 1 ELSE 0 END AS Unread
		  -- (FROM, JOINs and WHERE of the inner query omitted here)
		  ORDER BY CASE WHEN C.Updated > ISNULL(US.LastRead,'1999-01-01')
			     AND C.HideUntil <= GETUTCDATE()
			   THEN 1 ELSE 0 END DESC, C.Updated DESC) AS Table1

This, in LINQ looks something like this:

Dim vUpdatedDocuments = _
	(From d In vTeamDoc.Documents _
	Join us In vTeamDoc.UserSettings _
	On d.PrimKey Equals us.TeamDocRef _
	Where us.Login = Globals.Username _
	  AndAlso us.Notify = "Inbox" _
	  AndAlso d.Status = DocumentTypes.ToString() _
	  AndAlso d.Deleted Is Nothing _
	  AndAlso (From p In vTeamDoc.Permissions _
		   Join gm In vTeamDoc.GroupsMembers _
		   On p.GroupRef Equals gm.GroupRef _
		   Where gm.Login = Globals.Username _
		   Select p.TeamDocRef).Contains(d.PrimKey) _
	Order By If(d.Updated > _
			If(us.LastRead.HasValue, us.LastRead, d.Updated.AddDays(-1)), 1, 0) Descending, _
	  d.Updated Descending _
	Take 30 _
	Distinct _
	Select New UpdatedDocument With _
	  {.Title = d.Title, .Updated = d.Updated, _
	   .LastRead = us.LastRead, .PrimKey = d.PrimKey})

I’m terribly sorry about the indenting and newlines, but I had to cut it to fit in here. Anyway, if you take a look at the LINQ query you’ll probably understand most of it. The only big difference is that I’m doing a Select New UpdatedDocument (near the end of the query). What this does is create one UpdatedDocument (a class I’ve created) for each row returned by the query, setting the properties .Title, .Updated etc. So, how do I loop through this? VERY easy!

For Each vDocument As UpdatedDocument In vUpdatedDocuments
	Dim vTextBlox As New TextBlock With { _
		.Height = 15, _
		.VerticalAlignment = Windows.VerticalAlignment.Top, _
		.Text = vDocument.Title, _
		.Margin = New Thickness(0, vFromTop, 0, 0), _
		.Tag = vDocument _
	}

	If vDocument.Unread Then
		vTextBlox.FontWeight = Windows.FontWeights.Bold
	End If

	Me.RecentUpdates.Children.Add(vTextBlox)
Next

Here I’m creating a new TextBlock (some kind of mix between a TextBox and a Label in WPF), where I set the .Tag property to my UpdatedDocument, so that I can use the object when, for instance, clicking on the TextBlock. If the document is unread, I set the font weight to bold, and then I add it to my RecentUpdates user control.

I’ve struggled with LINQ for a couple of days now, and I’m starting to get the hang of it. It’s still weird since I’m used to T-SQL, but I like the possibilities it gives me when developing. One thing I wasn’t too impressed by, though, was the query it generated on the SQL Server. I used Profiler to check it out, and it’s actually so ugly that I don’t dare put it here 🙂 So, if you want to see it, test it yourself! By the way, Microsoft’s “LINQ To SQL Samples” is a good place to get started.

Incremental for each domain

I’m currently creating a new data model for a customer. They’ve got an Elements table with a field called “Num” (Number) which is supposed to be incremental per domain. How to solve that? Well, you can create a UDF and set the default value of the Num field to it, but then you don’t support inserting more than one row at a time. So, how did I solve it? There are (at least) two ways: you either create an INSTEAD OF trigger, or you do as I did. Here’s an example:

CREATE TABLE MyElements (
	Domain NVARCHAR(128),
	Num INT,
	PrimKey UNIQUEIDENTIFIER
)
GO

ALTER TABLE MyElements
	ADD CONSTRAINT DF_MyElements_PrimKey
	DEFAULT (NEWSEQUENTIALID())
	FOR PrimKey
GO

ALTER TABLE MyElements
	ADD CONSTRAINT DF_MyElements_Num
	DEFAULT (RAND(CONVERT([INT], CONVERT([VARBINARY](36), NEWID(), 0), 0))*(100000))
	FOR Num
GO

CREATE TRIGGER MyElements_ITrig
ON MyElements
FOR INSERT
AS

WITH MyCTE AS
(
	SELECT
	  PrimKey, Num,
	  ROW_NUMBER() OVER(PARTITION BY Domain ORDER BY PrimKey) AS RowID
		FROM MyElements AS E
)

UPDATE E
	SET E.Num = C.RowID
	FROM MyElements AS E
	INNER JOIN Inserted AS I ON E.PrimKey = I.PrimKey
	INNER JOIN MyCTE AS C ON I.PrimKey = C.PrimKey AND C.Num <> C.RowID

First we create a table. Notice that the PrimKey field has NEWSEQUENTIALID() as its default, and that the Num field has a default value with some cool random functionality. Then we create the trigger, which has a CTE that generates a ROW_NUMBER() partitioned by Domain and ordered by PrimKey, and then updates our table. Try it yourself!

INSERT INTO MyElements
  (Domain)
SELECT TOP 1000
  'Omega'
	FROM sys.syscomments
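
After the insert you can verify that Num now runs from 1 and upwards within each domain:

SELECT Domain, Num, PrimKey
	FROM MyElements
	ORDER BY Domain, Num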