Thứ Tư, 7 tháng 3, 2012

Optimal Normal Form

6. Optimal Normal Form
At this point, we have done all we can with Entity-Relationship Diagrams (ERD). Most people will stop here because this is usually pretty good. However, another modeling style called Object Role Modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond 5th. With Optimal Normal Form (OMF)
It is defined as a model limited to only simple (elemental) facts, as expressed in ORM.

7. Domain-Key Normal Form
This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.
  • "if every constraint on the relation is a logical consequence of the definition of keys and domains"

  • Constraint "a rule governing static values of attributes"

  • Key "unique identifier of a tuple"

  • Domain "description of an attribute’s allowed values"
    1. A relation in DK/NF has no modification anomalies, and conversely.
    2. DK/NF is the ultimate normal form; there is no higher normal form related to modification anomalies
    3. Defn: A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.
    4. Constraint is any rule governing static values of attributes that is precise enough to be ascertained whether or not it is true
    5. E.g. edit rules, intra-relation and inter-relation constraints, functional and multi-valued dependencies.
    6. Not including constraints on changes in data values or time-dependent constraints.
    7. Key - the unique identifier of a tuple.
    8. Domain: physical and a logical description of an attributes allowed values.
    9. Physical description is the format of an attribute.
    10. Logical description is a further restriction of the values the domain is allowed
    11. Logical consequence: find a constraint on keys and/or domains which, if it is enforced, means that the desired constraint is also enforced.
    12. Bottom line on DK/NF: If every table has a single theme, then all functional dependencies will be logical consequences of keys. All data value constraints can them be expressed as domain constraints.
    13. Practical consequence: Since keys are enforced by the DBMS and domains are enforced by edit checks on data input, all modification anomalies can be avoided by just these two simple measures.


  • Rules of Data Normalization

    1NFEliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.
    2NFEliminate Redundant Data - If an attribute depends on only part of a multi-valued key, remove it to a separate table.
    3NFEliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table.
    BCNFBoyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
    4NFIsolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related.
    5NFIsolate Semantically Related Multiple Relationships - There may be practical constrains on information that justify separating logically related many-to-many relationships.
    ONFOptimal Normal Form - a model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
    DKNFDomain-Key Normal Form - a model free from all modification anomalies.
    Important Note!
    All normal forms are additive, in that if a model is in 3rd normal form, it is by definition also in 2nd and 1st.

    http://web.archive.org/web/20080805014412/http://www.datamodel.org/NormalizationRules.html#five

    Source: Datamodel.org

    Thứ Ba, 6 tháng 3, 2012

    SQL SERVER – Difference Between Candidate Keys and Primary Key

    Introduction

    Not long ago, I had an interesting and extended debate with one of my friends regarding which column should be primary key in a table. The debate instigated an in-depth discussion about candidate keys and primary keys. My present article revolves around the two types of keys.
    Let us first try to grasp the definition of the two keys.
    Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.
    Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.
    One needs to be very careful in selecting the Primary Key as an incorrect selection can adversely impact the database architect and future normalization. For a Candidate Key to qualify as a Primary Key, it should be Non-NULL and unique in any domain. I have observed quite often that Primary Keys are seldom changed. I would like to have your feedback on not changing a Primary Key.

    An Example to Understand Keys

    Let us look at an example where we have multiple Candidate Keys, from which we will select an appropriate Primary Key.
    Given below is an example of a table having three columns that can qualify as single column Candidate Key, and on combining more than one column the number of possible Candidate Keys touches seven. A point to remember here is that only one column can be selected as Primary Key. The decision of Primary Key selection from possible combinations of Candidate Key is often very perplexing but very imperative!
    On running the following script it will always give 504 rows in all the options. This proves that they are all unique in database and meet the criteria of a Primary Key.
    Run the following script to verify if all the tables have unique values or not.
    USE AdventureWorks
    GO
    SELECT *FROM Production.Product
    GO
    SELECT DISTINCT ProductIDFROM Production.Product
    GO
    SELECT DISTINCT NameFROM Production.Product
    GO
    SELECT DISTINCT ProductNumberFROM Production.Product
    GO
    All of the above queries will return the same number of records; hence, they all qualify as Candidate Keys. In other words, they are the candidates for Primary Key. There are few points to consider while turning any Candidate Key into a Primary Key.

    Select a key that does not contain NULL

    It may be possible that there are Candidate Keys that presently do not contain value (not null) but technically they can contain null. In this case, they will not qualify for Primary Key. In the following table structure, we can see that even though column [name] does not have any NULL value it does not qualify as it has the potential to contain NULL value in future.
    CREATE TABLE [Production].[Product]([ProductID] [int] IDENTITY(1,1) NOT NULL,[Name] [dbo].[Name] NULL,[ProductNumber] [nvarchar](25) NOT NULL,[Manufacturer] [nvarchar](25) NOT NULL
    )

    Select a key that is unique and does not repeat

    It may be possible that Candidate Keys that are unique at this moment may contain duplicate value. These kinds of Candidate Keys do not qualify for Primary Key. Let us understand this scenario by looking into the example given above. It is absolutely possible that two Manufacturers can create products with the same name; the resulting name will be a duplicate and only the name of the Manufacturer will differ in the table. This disqualifies Name in the table to be a Primary Key.

    Make sure that Primary Key does not keep changing

    This is not a hard and fast rule but rather a general recommendation: Primary Key values should not keep changing. It is quite convenient for a database if Primary Key is static. Primary Keys are referenced in numerous places in the database, from Index to Foreign Keys.  If they keep changing then they can adversely affect database integrity, data statistics as well as internal of Indexes.

    Selection of Primary Key

    Let us examine our case by applying the above three rules to the table and decide on the appropriate candidate for Primary Key. Name can contain NULL so it disqualifies as per Rule 1 and Rule 2. Product Number can be duplicated for different Manufacturers so it disqualifies as per Rule 2. ProductID is Identity and Identity column cannot be modified. So, in this case ProductID qualifies as Primary Key.
    Please note that many database experts suggest that it is not a good practice to make Identity Column as Primary Key. The reason behind this suggestion is that many times Identity Column that has been assigned as Primary Key does not play any role in database. There is no use of this Primary Key in both application and in T-SQL. Besides, this Primary Key may not be used in Joins. It is a known fact that when there is JOIN on Primary Key or when Primary Key is used in the WHERE condition it usually gives better performance than non primary key columns. This argument is absolutely valid and one must make sure not to use such Identity Column. However, our example presents a different case. Here, although ProductID is Identity Column it uniquely defines the row and the same column will be used as foreign key in other tables. If a key is used in any other table as foreign key it is likely that it will be used in joins.

    Quick Note on Other Kinds of Keys

    The above paragraph evokes another question – what is a foreign key? A foreign key in a database table is a key from another table that refers to the primary key in the table being used. A primary key can be referred by multiple foreign keys from other tables. It is not required for a primary key to be the reference of any foreign keys. The interesting part is that a foreign key can refer back to the same table but to a different column. This kind of foreign key is known as “self-referencing foreign key”.

    Summary

    A table can have multiple Candidate Keys that are unique as single column or combined multiple columns to the table. They are all candidates for Primary Key. Candidate keys that follow all the three rules – 1) Not Null, 2) Unique Value in Table and 3) Static – are the best candidates for Primary Key. If there are multiple candidate keys that are satisfying the criteria for Primary Key, the decision should be made by experienced DBAs who should keep performance in mind.
    Reference: Pinal Dave (http://blog.sqlauthority.com)DNS