Optimize SQL databases – Denormalization

2011-02-08

In this article I discuss three denormalization techniques that can be used to increase performance in a SQL database: merging, vertical splitting and horizontal splitting. For each technique, I also try to give an example of a situation where it may be used.

Merging

In a normalized database, where tables are kept separate to avoid storing duplicated data, search performance can deteriorate over time as the tables grow. This is mainly because every query that involves multiple tables must join them before the search can run, and with several large tables the intermediate data set can become enormous.
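As a rough sketch, assuming a hypothetical users/posts schema (the table and column names are my own, not from any particular system), this is the kind of join that gets more expensive as both tables grow:

    -- Normalized schema: user data lives only in users,
    -- posts reference it through a foreign key.
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username VARCHAR(50) NOT NULL
    );

    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        body    TEXT
    );

    -- Every search that needs the author's name must join the
    -- tables first.
    SELECT p.id, p.body, u.username
    FROM posts p
    JOIN users u ON u.id = p.user_id;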

Tables that are often joined in searches that require good performance can be merged into one. The risk is that some data becomes redundant, for example if information about a user is stored with every post they have written. Chances are also high that the merged table will contain a large number of null values. Null values can be a bit awkward to handle and, above all, they are unnecessary, but they take almost no time to search through, so their negative effect on performance is close to nonexistent. The merged table will be larger and more complex, but searches in it will run much faster than in a normalized database.
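Continuing the hypothetical users/posts example from above, a merged table could look something like this; the user columns are copied onto every post row, so the search no longer needs a join:

    -- Merged table: user columns are duplicated on every post row.
    CREATE TABLE posts_with_user (
        id       INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL,
        username VARCHAR(50),  -- redundant copy, may be NULL
        body     TEXT
    );

    -- The same search, now without a join.
    SELECT id, body, username
    FROM posts_with_user;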

In a normalized database, it is easy to update data without affecting other data. For example, posts that a user writes typically store a reference to the user, usually the user's ID number, so the user can change their username without any posts needing to be updated. In a merged table of users and posts, every post the user has written must be updated when the user changes something.
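With the hypothetical merged table sketched above, a username change goes from a single-row update to an update of every row the user has ever written:

    -- Normalized: one row to update.
    UPDATE users
    SET username = 'new_name'
    WHERE id = 42;

    -- Merged: every one of the user's posts must be updated.
    UPDATE posts_with_user
    SET username = 'new_name'
    WHERE user_id = 42;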

Vertical split

If you have a large table with many columns, chances are that only a few of the columns are used frequently and need good query performance, while many of the other columns are used less often and may store larger amounts of data that are heavy to search through. Such a table can be broken apart into two: a small table containing the frequently used columns, and a second table that stores the larger volumes of data and does not have the same performance requirements. Each row in the larger table carries the key of the row in the smaller table that it belongs to, so the two tables can still be joined when a search needs columns from both.

An example would be a table that stores user information. The most frequently used columns in such a table are probably the ID number, username and password. These values are usually short, so they are quick to search through. They can be placed in their own table, while other information such as descriptions, pictures, etc. can be stored in a larger table that is used less often and is not subject to the same performance requirements.
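A minimal sketch of such a split, with hypothetical table and column names (the exact binary type varies by database, e.g. BYTEA in PostgreSQL):

    -- Small, hot table: short columns that are searched often.
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username VARCHAR(50) NOT NULL,
        password VARCHAR(60) NOT NULL  -- e.g. a password hash
    );

    -- Large, cold table: bulky columns with lower performance
    -- demands. Each row carries the key of its users row.
    CREATE TABLE user_profiles (
        user_id     INTEGER PRIMARY KEY REFERENCES users (id),
        description TEXT,
        picture     BLOB
    );

    -- The frequent query touches only the small table ...
    SELECT id FROM users
    WHERE username = 'alice' AND password = '...';

    -- ... and the tables are joined only when the extra data
    -- is actually needed.
    SELECT u.username, p.description
    FROM users u
    JOIN user_profiles p ON p.user_id = u.id
    WHERE u.id = 42;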

Horizontal split

In an order management system, for example, you obviously plan to store all new orders, but perhaps also all the old ones, to keep the history and to generate statistics. Searching this table can become very slow over time, since the number of rows can grow quickly. A solution may be to split the table horizontally, so that new orders are stored in their own table and are moved to another table when they are completed. The table with active orders then keeps good performance, while all the old orders are still preserved. You can do this by creating two identical tables, one for each kind of order.
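A sketch of the idea, again with hypothetical table names; moving completed orders inside a transaction keeps an order from ending up in both tables or in neither:

    -- Two identical tables: one for active orders, one for history.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status      VARCHAR(20) NOT NULL,
        created_at  TIMESTAMP NOT NULL
    );

    CREATE TABLE orders_archive (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status      VARCHAR(20) NOT NULL,
        created_at  TIMESTAMP NOT NULL
    );

    -- Move completed orders over; the transaction makes the
    -- move atomic.
    BEGIN;
    INSERT INTO orders_archive
        SELECT * FROM orders WHERE status = 'completed';
    DELETE FROM orders WHERE status = 'completed';
    COMMIT;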

You have to decide for yourself whether it is worth denormalizing the database. For smaller databases that do not contain much data, it probably does not matter, since the queries will run quickly enough anyway. But as the tables grow larger and the queries become more complex, you can start to investigate whether something should be changed to improve performance.
