Tuesday, April 30, 2013

Database Files, Recovery Models and Backups Part 1

In your Microsoft SQL Server database, there are a minimum of two database files: an .MDF and .LDF file.  The .MDF file contains the data for your database, while the .LDF file is the transaction log file for your database.  Each of these files will grow as your data grows, but depending on your recovery model, your .LDF file may fill up at a slightly quicker pace.

There are three recovery models in Microsoft SQL Server 2012.  They are as follows:
  • Simple
    • The .LDF space is automatically reclaimed when new transactions have been entered.
    • No transaction log backups available.  Since no transaction log backups are created, if your database goes down for any reason, any transaction changes made on the database since last full backup would be lost and need to be re-created.
  • Bulk Logged
    • Transaction log backups are created, which allows for data recovery from last backup, but not point in time.
    • Fully logs regular transactions, but minimally logs bulk transactions.
  • Full
    • The .LDF file continues to grow as you have more transactions in the database.
    • Log backups are available.  Data can be restored from the last log backup, which would first need to be accompanied by a restoration from a full database backup.
For more info on the different recovery models, visit here: Recovery Models and for more info on bulk logging, visit here: Bulk Logs

To know which recovery model you should select, first review the type of data contained within the database.  For example, if your company manages online transactions, you will more than likely want to make sure your data is backed up regularly (Full Recovery Model).  If, on the other hand, your database pulls data from other servers and runs reports on the data, you could get by with the Simple Recovery Model (granted that you have a backup copy of the database schema).

A company I used to work for had our main database, which housed our real time data, and a second server that used SSIS packages to create tables, pull the data and then summarize it.  We could get by with the Simple Recovery Model on the report server, but we needed the Full Recovery Model on the real-time servers.

Tomorrow we will cover what to do about growing database logs and backups.

Monday, April 29, 2013

The Magic of Triggers

I was recently asked a question at work by one of our developers.  He needed some help on updating one of his stored procedures.  He was looking to see if there was some kind of variable that he could use to pass the current table he was querying.  This made me curious as to why he would pass such a variable.  After some discussion, I realized that he was looking to create audit trail for our internal website using that stored procedure.

My suggestion is quite uncommonly known for people new to Microsoft SQL Server.  I introduced him to the idea of triggers.

SQL Triggers

Now, what are triggers and why are they so useful?  Triggers are a special kind of stored procedure that occurs (is triggered) when a certain event happens on a given table.  Different events that can activate a Trigger include:
  • Insert
  • Update
  • Delete
So, now onto why they are useful.  Let's say, for instance, that you have a Customer table and every time a new customer is added, a contact is created in your Contact table.  Now, you could write a stored procedure that you would call once you've added the customer account.  On the other hand, why not create a trigger that would insert the new contact into the Contact table automatically, without a need for calling anything.  The trigger would look something like this:

CREATE TRIGGER tContact
ON Customer
AFTER INSERT 
AS 
INSERT INTO Contact (CustomerId, FirstName, LastName, EMail, PhoneNumber)
SELECT CustomerId, FirstName, LastName, Email, PhoneNumber
from Inserted
GO

SQL Server also creates two temporary tables that can be used during the Trigger: Deleted and Inserted.  The Deleted table contains original data from the table during Update and Delete statements.  The Inserted table contains data that is effected during Insert and Delete statements.  These tables could be used to grab data for whatever the purpose of your trigger is.

Overall, Triggers can be very useful, depending on the purpose.  Don't create too many triggers, though, as this could create too much overhead on the system.  Also, triggers are extremely hard to debug, so for debugging purposes, stored procedures might be better.  For more on the disadvantages triggers, visit the following blog:

Disadvantages of Triggers

Friday, April 26, 2013

SQL Tutorials

Today being Friday, I figure we should all take a break from the long posts.  I wanted to share with you what I used when I was first teaching myself T-SQL around 10 years ago put together by W3Schools.

SQL Tutorials

Here, you can learn much of the SQL basics, including the following:

  • Select statement
  • Where clause
  • Insert statement
  • Joins
  • Creating tables
  • Deleting tables
  • And much more

The great thing about the website is it allows you to try out what you've learned through quizzes and so much more.

Have fun.


Bill

Thursday, April 25, 2013

Database Normalization

Recently I was venturing through a forum on one of the SQL sites I regularly visit.  While there, I found someone who was asking a question about how to run a particular query on their database.  At first, I thought the query would have been pretty easy to run, but after a few attempts I found that I was still not getting the results I expected.  I took a step back and looked at the data again.  Finally, I realized my queries weren't working out as expected because their database wasn't normalized.

What is database normalization?  Database normalization is taking your data and organizing it into a more efficient manner.  The main goals of database normalization include:

  • Avoiding redundant data
  • Creating a table for each group of related data
  • Putting subsets of data into a separate table
  • Building relationships between each of the tables
Let's use, as an example, that you own a company that sells bike parts.  Would it be more efficient to store all your invoice data into one table?  Would you store the invoice details (charge totals, taxes, customer address) in the same table as the order details (quantity ordered, quantity shipped, items)?  You certainly could, but the data becomes ugly to try to query.

You would be better off having different tables, such as the following:
  • Customer (contains customer information, such as: account number, address, contact phone numbers, email addresses, etc.)
  • Invoice (contains account number, invoice number, charge total, taxes, shipping address, amount paid, etc.)
  • Invoice Details (contains invoice detail number, invoice number, item number, item description, item quantity, etc.)
  • Item Detail (contains item number, serial number, quantity available, etc.)
You will notice there is no duplicated data and we have what is called a Primary Key, or PK.  The PK is a value that uniquely identifies each row in the table.  This primary can then be used in other tables, where it then is called a Foreign Key (FK).  These keys can then be used to select data across each table.

Before I show you how primary keys are used, let me outline which is a PK or FK in each table.  See below:
  • Customer
    • Account Number (PK)
  • Invoice
    • Invoice Number (PK)
    • Account Number (FK)
  • Invoice Details
    • Invoice Detail Number (PK)
    • Invoice Number (FK)
    • Item Number (FK)
  • Item Detail
    • Item Number (PK)
Let's say we want to get the customer information for a particular invoice number.  We would run the following query:

SELECT
from Customer c
join Invoice i
on c.AccountNumber = i.AccountNumber
where i.InvoiceNumber = 12

Since we have the account number in both the Customer and Invoice tables, we can use it to join the tables together.

For more information on Database Normalization, take a look at the following article:

Wednesday, April 24, 2013

Using CASE Statements in Your Select

Let's say that you have a table of data that you want to group together.  Your data might be one of the following:

  • You want to give a category or title to a list of ranges (using int, numeric, float, etc.)
  • You want to group values that aren't necessarily in the same range (varchar, etc)
  • You want to give more description to a single value or list of values
  • You rely on more than one condition to fall under a particular category
There are a few ways that you could approach grabbing this data.  You could write separate queries, just grabbing the data you want grouped together, and then tying all data together using UNION ALL.  That query would look something like this:

SELECT 'Less than $10' as Category, count(*)
from Sales
where SalesAmount < 10

UNION ALL

SELECT 'Between $10 and $50' as Category, count(*)
from Sales
where SalesAmount between 10 and 50

UNION ALL

SELECT 'Over $50' as Category, count(*)
from Sales
where SalesAmount > 50

This certainly gets the job done, but leaves too much room for error.  Your other choice is to write a CASE statement in your select statement (which would need to be accompanied by using the same CASE statement in a group by).  Accomplishing the same output, you could do something like this:

SELECT Category = CASE
WHEN SalesAmount < 10 then 'Less Than $10'
WHEN SalesAmount between 10 and 50 then 'Between $10 and $50'
WHEN SalesAmount > 50 then 'Over $50'
END,
count(*)
from Sales
group by CASE
WHEN SalesAmount < 10 then 'Less Than $10'
WHEN SalesAmount between 10 and 50 then 'Between $10 and $50'
WHEN SalesAmount > 50 then 'Over $50'

END

You can also use the CASE statement to provide more than one condition:


SELECT Category = CASE
WHEN SalesAmount < 9 or SalesAmount > 99 then 'Small and Big Sales'
WHEN SalesAmount between 10 and 50 then 'Middle Sales'
END,
count(*)
from Sales
group by CASE
WHEN SalesAmount < 9 or SalesAmount > 99 then 'Small and Big Sales'
WHEN SalesAmount between 10 and 50 then 'Middle Sales'
END

Go here to read more on SQL CASE statements:

SQL CASE

Monday, April 22, 2013

Common Table Expressions or CTE


At work, I've created over 40 reports that run on a daily, weekly or monthly basis.  I've created these reports using a SQL Server tool called SQL Server Integration Services, or SSIS.  The one problem I've found with SSIS is that SQL won't allow the use of temporary tables (tables that are generated on the fly and are dropped as soon as your query is over).  Through my research, there is one option within SQL Server that allows for table called Common Table Expressions, or CTE.

Using Common Table Expressions

A CTE is much like a temporary table, it is generated on the fly and disappears as soon as your query is over.  Creating a CTE is pretty simple.  All you need to provide is the following:


  • Expression Name (table name)
  • Column name list (not required)
  • CTE Query Definition (statement selecting your records)
  • Your actual query on the new CTE

From there, a query can then be run off of the CTE data to summarize or join with other tables.  Here is an example:

WITH Sales_CTE (SalesPersonID, OrderID, SalesDate)
AS
(
    SELECT SalesPersonID, SalesOrderID, YEAR(SalesDate) AS SalesDate
    FROM Sales
    WHERE SalesPersonID IS NOT NULL
)

SELECT SalesPersonID, COUNT(OrderID) AS TotalSales, SalesDateFROM Sales_CTE
GROUP BY SalesDate, SalesPersonID
ORDER BY SalesPersonID, SalesDate

There are a few differences I've seen between CTE's and temporary tables that may discourage you from using them on a regular basis.

  • No option to alter (add or remove columns) the CTE and it's contents
  • No option to update or delete rows from the CTE
If you don't need to do either, then a CTE is your best option.  There are also many advantages to using CTE's

  • Use the CTE multiple times within the same query, resulting in not having to write the same query over and over.
  • Avoid making mistakes like referencing incorrect columns and/or tables.

Friday, April 19, 2013

SQL COALESCE

I know it seems to be a recurring theme with me dealing with NULLs, probably because I come up with situations dealing with NULLs all the time at work.  You can't use equality symbols on NULLs, so if you have NULL, you need to figure out how to deal with them.

Here is another fancy way of dealing with NULLs if you have some equations you have to run.

Let's say that you work for an organization that deals with different salary types:


  • Hourly
  • Salary
  • Commissions
If you work at a company that has a sales force, you more than likely have all three of these salary types.  Now, do you keep a different table for each salary type and store the employees in those different tables?  Sounds like a little too much work for such an easy concept.  That is where SQL COALESCE() comes into play.

SQL COALESCE

COALECE allows you to specify a comma delimited list of expressions and it will return the first non-NULL expression.  So, in our example above, we have columns that show an employee's: hourly wage, salary, commission amount and number of sales.  We can then specify each expression with the wage calculation, as seen below:


SELECT COALESCE(hourly_wage * 40 * 52, salary, commission * salesnum) AS 'Total Salary'
FROM wages

This will then return the wage for each employee, no matter if they are hourly, salary or commission based.

Thursday, April 18, 2013

Working with Divide by Zero in SQL

One of the biggest annoyances when working with numbers in SQL Server is the possibility of dividing by zero.  Nothing can get a programmer more angry than seeing one of these errors in his/her programs:


Msg 8134, Level 16, State 1, Line 1
Divide by zero error encountered.


Some solutions I have tried before include adding the smallest of values to your denominator, but this could throw your results off.  If you are working with thousandths of a penny, like I do, this could really mess up your reports.

select 1/(0+0.0000005)

Most of the time, a zero value is the most appropriate solution, so you shouldn't be adding anything to it.  I have found a pretty simple solution in SQL to this (what could be) complex problem.  Microsoft SQL Server has a fancy expression called NULLIF().

MSSQL NULLIF()

NULLIF works this way: specify the value or expression that may contain the zero and then specify the value you are looking for (in this case, zero).  When the value you are looking for is found, a NULL is returned.  From there, you are working with a NULL value instead of working with an error.

This returns a NULL:

select 1/NULLIF(0, 0)

You could then update your statement so the final result is a zero, if that is appropriate:

select ISNULL(1/NULLIF(0, 0), 0)

In this way, you can avoid getting a divide by zero error while actually showing the zero as a result.

Wednesday, April 17, 2013

Commenting Your Code

At my last job, I was the IT Manager.  I oversaw anything that had to deal with IT, but most of my time was spent reviewing TSQL code created by the programmers.  One of our programmers was horrible at making notes in her code (and by horrible, I mean she didn't do it at all).  When I had to review the code, I had no idea what specifically she was trying to accomplish.

One SQL project she was working on dealt with rating our customer's calls.  Since our rating was quite complicated, so was the code.  Reviewing her code was a long and arduous process and since her program didn't work they way we wanted, I had to inspect each and every single line of code.  Some of the suggestions from the following article would have helped:


Some different things I did while going through her code:
  • Comments
    • With each line of code or query I made a comment about what exactly that code was doing.  I used /*.......*/ for long notes and -- for short notes
    • These notes can be pretty short and to the point, just so people know what it's doing.
  • Formatting Code
    • I like to put my SELECT, FROM, WHERE on their own lines
    • I like to put only about two conditions on a line, depending on how long a condition is
  • Naming Guidelines
    • Unfortunately, I was one of those people who would use table and column aliases that weren't descriptive.  Code is much easier to look at if the aliases give a little information about what the data actually contains
    • Variables with descriptive names are very important, especially for long code.
I promise you that if each of these guidelines are followed, your code will be easier to read and debug, not just for you, but anyone else who could possibly read your code.

Tuesday, April 16, 2013

Working with NULLS in your CASE statement

Lately I've been trying to focus more on database administration (DBA) techniques and how to improve my work.  I've signed up for a few Microsoft SQL Server websites and try to learn something new each day by reading their articles.  Today I came across the following link and am proud to know that I came up with this method for dealing with NULLs long before this article was written:

Writing CASE expressions properly when NULLs are involved

My situation was much like this persons: I went to investigate why a stored procedure wasn't returning values as I expected.  Once I jumped into the code, I found that the NULL in my case wasn't being recognized.  This is the best way to deal with that situation.