## Relationary: The Art of Database Normalization

One Fact In One Place

The relational model is said to be based on Cartesian mathematics.  Descartes greatest contribution was a Cartesian geometry which was the creation of a coordinate system for Euclidean geometry and geometry in general.  Whatever database you use, it has a coordinate system which gives it its structure.  Even a single bit is a structure all by itself.  Cartesian geometry took Euclidean geometry out of synthesis based on theorems and logic and moved it into coordinates based on analysis and algebra.  We are dealing with linear analysis and linear algebra and it forms the foundation for calculus and relational calculus which lead to  E. F. Codd’s inventing the relational model for database management, the theoretical basis for relational databases.

The truth about a relational database is conceptually it is only one table.  Only one.  Any relational database could be converted into a single spreadsheet.  So, why all the tables?

First, every database can be separated into the logical structure and the physical content.  The logical structure is called the schema and the physical content is called the data.  Logical and physical separation optimizes the administration, management and employment of the database.  As soon as you separate the logical and physical tables you eliminate an enormous amount of duplication in the system as well as all the effort required to maintain the correct values among all the duplicates.  This optimization is the zero normalization of the database.  A database that does not separate logical and physical data is the single table I mentioned and this I call a “complex normal form”.  Metadata could be regarded as “middle management tables” between the logical and physical tables.

Database normalization eliminates duplicate data which can lead to data anomalies. The normalization process involves getting data to conform to progressive normal forms, and a higher level of normalization cannot be achieved unless the previous levels have been satisfied. Below is a summary of the Normal Forms followed by an example of each.

SUMMARY OF NORMAL FORMS

EXAMPLES OF NORMAL FORMS

ZERO NORMAL FORM

Zero normal form (ZNF) is simply a set of data that has not gone through the process of normalization. It is quite possible that it meets the requirements of normalization already. However, going through the normalization steps on any data set is a best practice.

In our example, the data is not normalized and the normalization process will be followed.

[goto summary]

FIRST NORMAL FORM

The first normal form (1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column.

In our example, the Database column in the MemberCompanyDatabase table contains repeating groups. Normalization requires us to create a new table called DatabaseMember where there is only one column with one value for Database per row.

[goto summary]

SECOND NORMAL FORM

Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with relationships between composite key columns and non-key columns.

The second normal form (or 2NF) any non-key columns must depend on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.

In our example, the Database column was only dependent on part of the MemberDatabase composite key. Normalization requires creating a new table we call Database where the column values are unique.

[goto summary]

THIRD NORMAL FORM

Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).

In our example, the Company column in the MemberCompany table is a transitive dependency. Normalization requires creating a new table we call Company where the column values are unique.

[goto summary]

BOYCE-CODD NORMAL FORM

Boyce-Codd normal form (BCNF) can be referred to as a more stringent version of third normal form only used in rare cases.

Problem

In our example, the purpose of the table is to show which members use which databases. The table’s candidate keys are:

• {MemberID, DatabaseID}
• {MemberSocialSecurityNumber, DatabaseID}

Therefore all three attributes of the table are prime attributes: that is, all three attributes belong to candidate keys.

Recall that 2NF prohibits partial functional dependencies of non-prime attributes on candidate keys, and that 3NF prohibits transitive functional dependencies of non-prime attributes on candidate keys. Since the table above lacks any non-prime attributes, it adheres to both 2NF and 3NF.

BCNF is more stringent than 3NF in that it does not permit any functional dependency in which the determining set of attributes is not a candidate key (or superset thereof). The dependency of MemberID on Member Social Security Number is such a dependency. Accordingly, the table above is not in BCNF.

Any table that falls short of BCNF will be vulnerable to logical inconsistencies. In the table above, there is nothing to prevent two different Member IDs from being shown, illegitimately, as corresponding to the same Member Social Security Number.

Solution

Correcting the problem in this case would be a simple matter of using only one scheme of identifiers for Membersrs: either IDs or Social Security Numbers, but not both. In this case I would move Member Social Security Number into the Member Table.

[goto summary]

FOURTH NORMAL FORM

Problem

Fourth normal form (4NF) requires that independent multivalued facts are correctly and efficiently represented.

In our example, we have three entities with two many to many relationships.

Incorrect Solution

Often what happens is these three tables are combined into a ternary (three way) relationship which satisfies third normal form, but not fourth normal form.

Correct Solution

Notice that because the table has a unique key and no non-key attributes, it does not violate any normal form up to BCNF. But because the manuals a member owns are independent from the databases the member uses, there is redundancy in the table. In formal terms, this is described as manual having a multivalued dependency on database.

To satisfy 4NF, we must place the facts about manuals owned into a different table from the facts about databases used:

[goto summary]

FIFTH NORMAL FORM

Problem

Fifth normal form (5NF) is required to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

No Rule Solution

The member is able to offer support for the given database and when he owns a given manual. In the absence of any rules restricting the valid possible combinations of member, database, and manual, the three-attribute table Member-to-Database-to-Manual is necessary in order to model the situation correctly.

Rule Solution

However, what if the above model the following rule applies: When a member owns a manual M and the member uses database D, then – in the event that the manual M covers the database D– it must be true that the member is able to offer support on database D and is supported by manual M.

With these constraints it is possible to split the relation into three parts.

[goto summary]

DOMAIN/KEY NORMAL FORM

Domain/key normal form (DKNF) is a normal form used in database normalization which requires that the database contains no constraints other than domain constraints and key constraints.

A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table.

The domain/key normal form is achieved when every constraint on the relation is a logical consequence of the definition of keys and domains, and enforcing key and domain restraints and conditions causes all constraints to be met. Thus, it avoids all non-temporal anomalies.

[goto summary]

SIXTH NORMAL FORM

A table is in sixth normal form (6NF) if and only if it satisfies no non-trivial join dependencies at all, meaning that the fifth normal form is also satisfied. The sixth normal form was only defined when extending the relational model to take into account the temporal dimension. Most SQL technologies, as of 2005, do not take into account this work, and most temporal extensions to SQL are not relational. See work by Date, Darwen and Lorentzos for a relational temporal extension, Zimyani for further discussion on Temporal Aggregation in SQL, or TSQL2 for a non-relational approach.

[goto summary]

OPTIMAL NORMAL FORM

At this point, we have done all we can with Entity-Relationship Diagrams (ERD). However, Object Role Modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond 5th. With Optimal Normal Form (OMF) It is defined as a model limited to only simple (elemental) facts, as expressed in ORM.

[goto summary]