Normalisation
What is Normalisation?
Normalisation is the process by which a flat file (list) of data is converted into a set of well-structured relations. A well-structured relation is one that contains the minimum amount of data redundancy and allows users to insert, delete and modify rows in a table without producing anomalies, errors or inconsistencies. The principles of normalisation were first defined by E. F. Codd in 1970 and have since been extended by academics such as Fagin, who defined the fourth and fifth normal forms.

Why Normalise?
Often, when designing a database, an analyst/programmer is provided with a form or spreadsheet that is currently used by an organisation. Such flat, list-style information hides problems that prevent a straight transformation (i.e. one document does not translate to one table). Normalisation provides a framework of rules to ensure that the transformation of data from such sources is carried out systematically and removes any possible anomalies. If left uncorrected, these anomalies may work their way into the final system and later require a much larger maintenance effort to correct.

Data Anomalies
During the life of a database there are three anomalies that signify flaws in the underlying data model; these are:

Insertion Anomalies
An insertion anomaly occurs when adding a new record to the database requires the duplication of data that already exists.

Deletion Anomalies
A deletion anomaly occurs when deleting a single piece of data unintentionally removes other valid data stored in the same row.

Update Anomalies
An update anomaly occurs when the same information is stored multiple times in the one table, so any update to that information requires multiple changes.
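To make the three anomalies concrete, here is a minimal Python sketch using a single flat table; the student and unit data are hypothetical, invented purely for illustration.

```python
# A single flat table mixing student and unit facts (hypothetical data):
# (student_no, student_name, unit_code, unit_name)
flat = [
    (1, "Alice", "CS101", "Databases"),
    (1, "Alice", "CS102", "Networks"),
    (2, "Bob",   "CS101", "Databases"),
]

# Insertion anomaly: enrolling Alice in a new unit forces us to
# duplicate her name, which is already stored above.
flat.append((1, "Alice", "CS103", "Algorithms"))

# Deletion anomaly: deleting Bob's only enrolment also destroys the
# only record of the fact that student 2 is named "Bob".
flat = [row for row in flat if row[0] != 2]

# Update anomaly: renaming unit CS101 means touching every row that
# mentions it; miss one row and the table becomes inconsistent.
flat = [(s, n, c, "Intro to Databases" if c == "CS101" else u)
        for (s, n, c, u) in flat]
```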
 
Keys
Before discussing the process of normalisation, it is necessary to understand the role of keys. The identification of the correct keys is almost the sole purpose of normalisation: if the keys are identified correctly, then none of the three anomalies described above can occur.
The term "key" refers to columns in any data set used to:
1. Uniquely identify a row in a given data set, known as a Primary Key.
2. Link a column in one data set to the primary key of another, known as a Foreign Key.
Primary Keys
As mentioned, primary keys are columns in the data set used to uniquely identify a row, and are conventionally denoted by underlining the names of the primary key columns. Once primary keys have been declared, duplicate values are not allowed in the primary key.
A compound key (a primary key made up of more than one column) means that there can be no repetition in the combined values of its columns: a value may be repeated in one (or more) of the key columns as long as the compound key in its entirety is unique.

Foreign Keys
Foreign keys are columns in a data set that link to the primary key of a related data set. Unlike primary keys, foreign keys are not required to contain unique values, and the same value may appear in many rows.
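As a minimal sketch of how primary, compound and foreign keys behave in practice, the following uses Python's built-in sqlite3 module; the table and column names (Student, Enrolment, student_no, unit_code) anticipate the enrolment example used later in this article, rendered here as hypothetical SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Single-column primary key: duplicate student_no values are rejected.
conn.execute("""
    CREATE TABLE Student (
        student_no   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    )""")

# Compound primary key: either column may repeat on its own, but the
# (student_no, unit_code) combination must be unique. student_no is
# also a foreign key, and may appear in many Enrolment rows.
conn.execute("""
    CREATE TABLE Enrolment (
        student_no INTEGER REFERENCES Student(student_no),
        unit_code  TEXT,
        PRIMARY KEY (student_no, unit_code)
    )""")

conn.execute("INSERT INTO Student VALUES (1, 'Alice')")
conn.execute("INSERT INTO Enrolment VALUES (1, 'CS101')")
conn.execute("INSERT INTO Enrolment VALUES (1, 'CS102')")  # OK: combination unique

try:
    conn.execute("INSERT INTO Enrolment VALUES (1, 'CS101')")  # duplicate compound key
except sqlite3.IntegrityError as exc:
    print("Rejected:", exc)
```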
 
Normalisation: The Process
The normalisation process follows a standard series of steps, which are outlined in the following sections. It is important to note that unless each step is carried out properly, the next step will be flawed.

Gather the Un-Normalised data set
This step is often rushed, but it is perhaps the most critical of the entire normalisation process. If the un-normalised data set contains errors, it is more than likely that these errors will be carried through the entire normalisation, resulting in possible data anomalies.
While the order of the items in an un-normalised data set isn't governed by strict rules, the following principles are often applied to ensure an easier transition through the process:
- Items that are likely primary keys (discussed later) should be placed to the left of their dependent items.
- Repeating groups are usually placed to the right of an un-normalised data set.
Notice how the second data set ignores the multiple repeating groups; this answer is still valid, as it is often preferable to remove or ignore nested repeating groups in an un-normalised data set where the repeating group is seen as minor or is not fully understood. The rationale behind this will be explored later, in first normal form.
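As a sketch of these layout principles, the enrolment example used later in this article can be written as an un-normalised data set in Python, with the likely primary key on the left and the repeating group on the right (the field names are hypothetical renderings of Student#, Student Name, Unit Code and Unit Name):

```python
# R1 = (Student#, Student Name, {Unit Code, Unit Name})
# The braces mark a repeating group: one student, many units.
r1 = {
    "student_no": 1,          # likely primary key, placed to the left
    "student_name": "Alice",
    "units": [                # repeating group, placed to the right
        {"unit_code": "CS101", "unit_name": "Databases"},
        {"unit_code": "CS102", "unit_name": "Networks"},
    ],
}
```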

Convert Un-normalised data set to First Normal Form (1NF)
Once we have identified the un-normalised data set (R1), we must convert it into first normal form (1NF). This is achieved by removing any repeating groups from the un-normalised data set and identifying the primary keys of the resultant data sets.
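Continuing the enrolment sketch from above (the data values remain hypothetical), removing the repeating group splits R1 into two 1NF data sets: R11 keyed on Student#, and R12 keyed on the compound key (Student#, Unit Code).

```python
# R1  = (Student#, Student Name, {Unit Code, Unit Name})   un-normalised
# R11 = (Student#, Student Name)                           1NF, key: Student#
# R12 = (Student#, Unit Code, Unit Name)                   1NF, compound key
r11 = [(1, "Alice"), (2, "Bob")]
r12 = [(1, "CS101", "Databases"),
       (1, "CS102", "Networks"),
       (2, "CS101", "Databases")]   # "Databases" is already duplicated here
```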

Convert 1NF to Second Normal Form (2NF)
Once an un-normalised data set has been converted into first normal form (1NF), the resulting data sets are revised further to upgrade them to second normal form (2NF) by removing all partial functional dependencies. A partial functional dependency is where a column is not wholly dependent on the primary key, i.e. the column depends on only part of the primary key. Thus partial functional dependencies can only occur in data sets whose primary key has more than one attribute (i.e. a compound key).
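In the enrolment sketch, Unit Name depends only on Unit Code, which is just one part of R12's compound key (Student#, Unit Code), so this partial dependency is removed by splitting R12:

```python
# R12  = (Student#, Unit Code, Unit Name)   1NF: Unit Name partially dependent
# R121 = (Student#, Unit Code)              2NF
# R122 = (Unit Code, Unit Name)             2NF
r121 = [(1, "CS101"), (1, "CS102"), (2, "CS101")]
r122 = [("CS101", "Databases"), ("CS102", "Networks")]  # each unit name stored once
```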

Convert 2NF to Third Normal Form (3NF)
When 2NF is reached, you must then check the resultant data sets and upgrade them to third normal form (3NF). To achieve third normal form you must remove any transitive (hidden) dependencies between non-key attributes. A transitive dependency exists where one or more non-key columns depend on another non-key column rather than on the designated primary key(s).
It is important to note that 3NF is often achieved without having to change any of the existing data sets. This is exemplified by the two problems we have been normalising.
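Since the enrolment example contains no transitive dependency, here is a separate hypothetical case (student departments, not drawn from this article's problems) to show the 3NF step:

```python
# Hypothetical 2NF data set with a transitive dependency:
# R  = (Student#, Student Name, Dept Code, Dept Name)
# Dept Name depends on Dept Code (a non-key column), not on Student#.
#
# 3NF removes the transitive dependency by splitting R:
# Ra = (Student#, Student Name, Dept Code)
# Rb = (Dept Code, Dept Name)
ra = [(1, "Alice", "SCI"), (2, "Bob", "SCI")]
rb = [("SCI", "Science")]  # each department name stored once
```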

Name the resultant Data-Sets
The final step in this process is to give each of the final data sets a meaningful name. While this step is not mandatory, it is useful, especially when we come to convert normalised data sets into E-R models.

Thus in the enrolment example:
R11 = (Student#, Student Name) -> Student
R121 = (Student#, Unit Code) -> Enrolment
R122 = (Unit Code, Unit Name) -> Unit
The naming of these data sets is based on the type of data they will be used to store. At first you may find it hard to think of a proper name but through experience you will begin to see trends and gain a better 'feel' for the purpose of each data set.
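As a closing sketch, the three named data sets translate directly into tables; the SQL below again uses Python's sqlite3, and the column names are hypothetical renderings of Student#, Student Name, Unit Code and Unit Name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE Student (
        student_no   INTEGER PRIMARY KEY,          -- Student#
        student_name TEXT NOT NULL                 -- Student Name
    );
    CREATE TABLE Unit (
        unit_code TEXT PRIMARY KEY,                -- Unit Code
        unit_name TEXT NOT NULL                    -- Unit Name
    );
    CREATE TABLE Enrolment (
        student_no INTEGER REFERENCES Student(student_no),
        unit_code  TEXT    REFERENCES Unit(unit_code),
        PRIMARY KEY (student_no, unit_code)        -- compound key
    );
""")
```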
 
In summary:
 
1. 1NF from the un-normalised list: move each repeating group into a new table.
2. 2NF from 1NF: remove partial-key dependencies.
   a. A table whose primary key is a single column goes straight to 2NF.
   b. Only tables with a compound (multi-column) key need checking.
   c. Move any columns that depend on only part of the key into a new table.
3. 3NF from 2NF: remove transitive dependencies.
   a. A table with only one non-key attribute is automatically in 3NF.
   b. Find the stronger link in the table (a non-key column that determines another), then move it out to a new table.
   c. Check the table for any remaining dependent relationships.
 
 
 
 
 
 
 
 