Thursday, April 25, 2013

Database Normalization

Recently I was venturing through a forum on one of the SQL sites I regularly visit.  While there, I found someone who was asking a question about how to run a particular query on their database.  At first, I thought the query would have been pretty easy to run, but after a few attempts I found that I was still not getting the results I expected.  I took a step back and looked at the data again.  Finally, I realized my queries weren't working out as expected because their database wasn't normalized.

What is database normalization?  Database normalization is taking your data and organizing it into a more efficient manner.  The main goals of database normalization include:

  • Avoiding redundant data
  • Creating a table for each group of related data
  • Putting subsets of data into a separate table
  • Building relationships between each of the tables
Let's use, as an example, that you own a company that sells bike parts.  Would it be more efficient to store all your invoice data into one table?  Would you store the invoice details (charge totals, taxes, customer address) in the same table as the order details (quantity ordered, quantity shipped, items)?  You certainly could, but the data becomes ugly to try to query.

You would be better off having different tables, such as the following:
  • Customer (contains customer information, such as: account number, address, contact phone numbers, email addresses, etc.)
  • Invoice (contains account number, invoice number, charge total, taxes, shipping address, amount paid, etc.)
  • Invoice Details (contains invoice detail number, invoice number, item number, item description, item quantity, etc.)
  • Item Detail (contains item number, serial number, quantity available, etc.)
You will notice there is no duplicated data and we have what is called a Primary Key, or PK.  The PK is a value that uniquely identifies each row in the table.  This primary can then be used in other tables, where it then is called a Foreign Key (FK).  These keys can then be used to select data across each table.

Before I show you how primary keys are used, let me outline which is a PK or FK in each table.  See below:
  • Customer
    • Account Number (PK)
  • Invoice
    • Invoice Number (PK)
    • Account Number (FK)
  • Invoice Details
    • Invoice Detail Number (PK)
    • Invoice Number (FK)
    • Item Number (FK)
  • Item Detail
    • Item Number (PK)
Let's say we want to get the customer information for a particular invoice number.  We would run the following query:

SELECT
from Customer c
join Invoice i
on c.AccountNumber = i.AccountNumber
where i.InvoiceNumber = 12

Since we have the account number in both the Customer and Invoice tables, we can use it to join the tables together.

For more information on Database Normalization, take a look at the following article:

No comments:

Post a Comment