1NF (First Normal Form) Rule
It has two very distinct rules:
- There must not be duplicate columns within a row of a table.
- There must not be more than one value for any column in a table.
It has
been my observation that a lot of us lack the basic skills required to model
the simplest of database objects. In my
years of data design there is one over-glaring mistake that a lot of
"newbie" data modelers and database administrators make that cause
enormous amounts of problems for themselves and others in the development
chain. It is their inability to conform to the very basics of data modeling
principles. Of those principles, adhering to 1NF is probably the most broken
rule that I have seen.
When the model does not follow 1NF, queries are hard to
develop and producing usable data from the model is even harder. Typically, the
developer will have to put some unusual logic, functions or application code
around or within queries just to make the data manageable.
This
article will walk you through these common mistakes and how you might begin to
start working with those mistakes, if you find yourself in this unfortunate
predicament.
Mistake 1: creating another column for a
relationship
The first
form of breaking 1NF is the creation of another column in a table, which really
just duplicates a relationship to a primary key. For this example, let's assume
that a distributor wants to keep a relationship for all of the cities in each
of the states that they have a presence. Our junior modeler quickly creates a
CITY table with two columns, one to hold the state value and the other to hold
the city.
CITY
|
STATE
|
DENVER
|
COLORADO
|
This
table is modeled quite well but because of the modelers' inexperience and
because he has just learned that there are four other cities in Colorado that
are relevant, he quickly adds four more columns to his CITY table. The table
now looks like the following:
CITY
|
STATE
|
CITY2
|
CITY3
|
CITY4
|
CITY5
|
DENVER
|
COLORADO
|
BOULDER
|
ASPEN
|
MONTROSE
|
PUEBLO
|
DETROIT
|
MICHIGAN
|
LANSING
|
|
|
|
DALLAS
|
TEXAS
|
HOUSTON
|
PARIS
|
|
|
Now you
may say to yourself that this isn't so bad because we will only ever have 5
cities in the state of Colorado and therefore putting all of these for quick
access in one table is quite efficient. While you are correct to some extent
the real problem comes into play when you wish to extract the information.
One of
the first questions you will get is to list all the cities in each of the
states where the company has a distributor. You could
easily issue the following SQL in Listing 1. This works just fine but
when you try and put the result set into an application or a developer requests
that the information be put into a single column output so they do not have to
traverse the list of cities, the real problems begin. This request equates to
producing output where one city is represented on one line with the state to
which it belongs.
To do
this there, the SQL gets a bit interesting and quickly cumbersome to work with.
Listing 2 gives just such an example where we must union together a
select statement for each iteration of the cities contained in the row. The
reason for the NOT NULL is because we do not want to return an empty row where
there is not a city for the individual iteration. In addition, for small tables
this is not quite a big deal, but this query will produce multiple table scans
and if there are more than a few rows, you will have a performance problem.
Just as
you start to pat yourself on the back, the next question you will be asked is
if you can provide a query to determine if the company has a distributor in a
particular city. Listing 3 gives you a solution to this question.
Remember we must build on the prior example since typically, this will be
coming from a developer and they would like the output in a single column
output. This query can quickly be tailored to answer the question of in which
cities in a particular state do we have distributors, or how many cities in a
particular state do we have a distributor. This last question is answered in Listing
4.
Listing
1. Simple method to display
cities for states
SQL>
Select state, city, city2, city3, city4, city5 from CITY;
STATE CITY
CITY2 CITY3 CITY4
CITY5
----------
---------- ---------- ---------- ---------- ----------
TEXAS DALLAS
HOUSTON PARIS
COLORADO DENVER
BOULDER ASPEN MONTROSE
PUEBLO
MICHIGAN DETROIT
LANSING
Listing
2. Display cities for states in
a true single column output
SQL>
select state, city city from CITY where city is not null union
select state,city2 city from CITY where
city2 is not null union
select state,city3 city from CITY where
city3 is not null union
select state,city4 city from CITY where
city4 is not null union
select state,city5 city from CITY where
city5 is not null
order by state;
STATE CITY
----------
----------
COLORADO ASPEN
COLORADO BOULDER
COLORADO DENVER
COLORADO MONTROSE
COLORADO PUEBLO
MICHIGAN DETROIT
MICHIGAN LANSING
TEXAS DALLAS
TEXAS HOUSTON
TEXAS PARIS
Listing
3. Do we have a distributor in
Denver
SQL>
select state,city from (
select state,city city from CITY where city is not null union
select state,city2 city from CITY where
city2 is not null union
select state,city3 city from CITY where
city3 is not null union
select state,city4 city from CITY where
city4 is not null union
select state,city5 city from CITY where city5
is not null)
where city = 'DENVER';
STATE CITY
----------
----------
COLORADO DENVER
Listing
4. How many distributors do we
have in each state
SQL>
select state, count(*) Distributor_Count from (
select state, city city from CITY where city is not null union
select state,city2 city from CITY where
city2 is not null union
select state,city3 city from CITY where
city3 is not null union
select state,city4 city from CITY where city4
is not null union
select state,city5 city from CITY where
city5 is not null)
group by state;
STATE DISTRIBUTOR_COUNT
----------
-----------------
COLORADO 5
MICHIGAN 2
TEXAS 3
Mistake 2: internal array element
This
second form of breaking 1NF takes the form of creating a column that contains
many values of an attribute. Taking from our CITY table example, this mistake
makes our CITY table look like the one in Listing 5.
This is a
much worse form of breaking 1NF than our first mistake. It has the added
difficulty of no boundaries for columns in the city except for the comma
separator. It is prone to errors in data entry around the comma-separated
fields. Sometimes we will encounter spaces before or after the comma that can
play havoc on any parsing that we may want to do.
Listing 5. City table with internal array structure
STATE
|
CITY
|
COLORADO
|
DENVER,BOULDER,ASPEN,MONTROSE,PUEBLO
|
MICHIGAN
|
DETROIT,LANSING
|
TEXAS
|
DALLAS,HOUSTON,PARIS
|
Since
there are no columns and we are not quite sure of the number of cities that any
one column may or may not have, our only solution to this problem is to create
some logic (application) to extract the information from this column.
In Listing
6, there is code to make this mistake behave just like a normal table. In
order to accomplish this task we need to do the following.
- Rename the CITY table. This is done since we are going to want to keep access to this data through the same object name of CITY.
- Create a few abstract data types for manipulating our output.
- Create a function that will be called when requesting information from the CITY table
- Create a view called CITY.
- Now you can just select directly from the CITY view.
Listing
6. Making an internal array listing
behave like normal rows
RENAME
city TO city_tb
/
CREATE
TYPE CITY_TY AS OBJECT
(STATE
CHAR(10),
CITY
VARCHAR2(100))
/
CREATE
TYPE CITY_TY_TB AS TABLE OF CITY_TY
/
CREATE OR
REPLACE FUNCTION CITY_FC
RETURN CITY_TY_TB PIPELINED IS
PRAGMA AUTONOMOUS_TRANSACTION;
TYPE ref0 IS REF CURSOR;
cur0 ref0;
out_rec city_ty
:= city_ty(NULL,NULL);
vcity VARCHAR2(100);
vstartpos
NUMBER;
vendpos NUMBER;
vlastpos NUMBER;
BEGIN
OPEN cur0
FOR 'select state,city,instr(city,'','',1),instr(city,'','',-1) from city_tb';
LOOP
vstartpos := 1;
FETCH cur0 INTO out_rec.state, vcity,
vendpos, vlastpos;
EXIT WHEN cur0%NOTFOUND;
IF vlastpos = 0 THEN
out_rec.city := vcity;
PIPE ROW(out_rec);
END IF;
LOOP
EXIT WHEN vlastpos = 0;
select instr(vcity,',',vstartpos) into
vendpos from dual;
IF vendpos = vlastpos THEN
out_rec.city :=
substr(vcity,vstartpos,vendpos-vstartpos);
PIPE ROW(out_rec);
out_rec.city := substr(vcity,vlastpos+1);
PIPE ROW(out_rec);
EXIT;
END IF;
out_rec.city :=
substr(vcity,vstartpos,vendpos-vstartpos);
PIPE ROW(out_rec);
vstartpos := vendpos+1;
END LOOP;
END LOOP;
CLOSE
cur0;
RETURN;
END
CITY_FC;
/
CREATE OR
REPLACE VIEW CITY AS
SELECT a.state,a.city
FROM TABLE(CITY_FC) a
/
The
breaking of 1NF can and does cause havoc in all of our lives. Until you have
the time and resources to fix the problem within the physical model, you must
provide solutions that are easy for everyone to use.
My
suggestion to you is to come up with a solution that will work and allow you to
fix the real problem with the least amount of impact on your user community.
If you
take an approach to put a view in place of the real table, let your user
community select from that view, you will be able to hide the logic behind the
scenes and hopefully one day fix the real problem and allow yourself to get rid
of the view and underlying logic.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.