tomcarnell.com

Wednesday, September 14, 2011

Why database column names should be globally unique

After years of working with relational databases (RDBMS's), I have just started to realise the importance of globally unique names for database columns. Generally I use the term 'globally unique' to mean unique within the database schema, but I it could potentially extend to being unique across all schemas.

The problem that arises from using repeating column names is when two or more tables are used in a single query. For example, if we have two tables: 'customers' (which has a column 'id') and 'customer_orders' (which also has the column 'id') when we perform the following query:

"SELECT * FROM customers c, customer_orders o WHERE c.name = 'tom';"

The "*" clause means we will return every column of both tables. So considering a single row returned from the query, what would be the value of 'id'? Would it be the 'customers.id' field or the 'customer_orders.id' field? What if our application needed to use both values? Of course, we could change our query and give aliases to the columns, which would also mean we would need to explicitly list the columns we wish to be returned:

"SELECT c.id customer_id, o.id order_id, c.name, o.value FROM customers c, customer_orders o WHERE c.name = 'tom';"

So we can now reference either the customers.id value and the order.id value. But this solution makes our query fragile and susceptible to schema changes - what if we later rename the column 'c.name' to 'c.first_name'? The query would then break. In addition, and for the purposes of consistancy of any 'data interface' we might define, we would be forced to use these explict column aliases in all our queries, which would reduce the readability of the SQL and introduce the same problems associated with any form of code duplication.

In reality, the first SQL query would be perfectly good had we used unique column names. For example, 'customers.cst_id' instead of 'customer.id' and 'customer_orders.cso_id' instead of 'customer_orders.id'. Here I have chosen a three character 'prefix' for column names that gives some clue to the name of the containing table. Therefore 'customers.name' would also become 'customers.cst_name'.

Implementing a simple column naming convention across the entire database schema will greatly improve the effectiveness of SQL queries and will reduce the likelihood of application errors that may occur by accidentally referencing the wrong data within an application.