refer to :
http://stackoverflow.com/questions/663040/query-several-eav-attributes-in-separate-columns
http://structureddata.org/2009/03/19/the-impact-of-good-table-and-query-design/
refer to :http://www.sqlservercentral.com/articles/Database+Design/62386/
Before going deep into examples of this type of approach, let us look at the cons and pros of this approach towards data modelling. The lists are not confined to the one I have pointed but there could be more that you could add. I have listed here the most obvious once.
1. Advantages
- Very flexible to add an attribute that you don't know in advance without redesigning your database structure
- Can be a good idea to collect the attributes of unknown data . For example, it is difficult to know the attributes of some complex research data and using this methodology would help to assemble this attributes that can feed into proper data modelling.
- Database level complexity is simplified when inserting data into the table. The number of procedures to insert into the above table is only one or at most two if you are going to partition your table to increase performance.
2. Disadvantages
Difficult to query and transform the data into meaningful information. To display a data in
tabular format of meaningful way, your query may involve many case statements, sub-queries, self joins and etc. This will impact on performance of your query specially when the number of rows are getting bigger
- Unless the table is partitioned it can grow fast and querying the table would take longer time.
- It is not possible to enforce business rule constraints and default values for an attribute as attributes are modelled as being data.
- Difficult to the integrity of dat a as again as attributes are modelled as being data.
- Could result in changing the value column to text data type, as the value of name value pair can be significantly different ranging from bit to text data type. This will result in ineffective storage design
- Effective index design strategy is difficult. Indexing of the value column is could be very difficult as it can result in wide index and can impact inserting and updating.
So, looking at the above cons and pro's of using EAV towards data modeling, I would like to conclude that there might be occasions that you probably be better using this approach but for data model where most of the attributes are known and is tailored to do specific thing which most databases are for, it is better not even to consider this approach.
Example:
Table 1 : Attribute table
CREATE TABLE EAV (
subscriber_id INT NOT NULL DEFAULT '0',
attribute_id CHAR(62) NOT NULL DEFAULT '',
attribute_value CHAR(62) NOT NULL DEFAULT '',
PRIMARY KEY (subscriber_id,attribute_id)
)
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (4,'color','blue');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (4,'garment','shirt');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (5,'color','blue');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (5,'size','xl');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (5,'garment','shirt');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (6,'color','red');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (6,'garment','shirt');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (1,'color','red')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (1,'size','xl')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (1,'garment','shirt')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (2,'color','red')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (2,'size','xl')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (2,'garment','pants')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (3,'garment','pants')
Table 2 : Main Table
CREATE TABLE #TEMP_Table
(
subscriber_id INT
)
INSERT INTO #TEMP_Table VALUES (1)
INSERT INTO #TEMP_Table VALUES (2)
INSERT INTO #TEMP_Table VALUES (3)
INSERT INTO #TEMP_Table VALUES (4)
INSERT INTO #TEMP_Table VALUES (5)
INSERT INTO #TEMP_Table VALUES (6)
Question :Find out all the 'subscriber_id' with 'red' color and 'xl' size from main table - #TEMP_Table
Solution 1 : create a helper table (Query Cost : 13%)
CREATE TABLE CRITERIA (
attribute_id CHAR(62) NOT NULL DEFAULT '',
attribute_value CHAR(62) NOT NULL DEFAULT '' )
INSERT INTO CRITERIA (attribute_id, attribute_value) VALUES ('color', 'red')
INSERT INTO CRITERIA (attribute_id, attribute_value) VALUES ('size', 'xl')
SELECT subscriber_id
FROM #TEMP_Table
WHERE subscriber_id IN (
SELECT E.subscriber_id
FROM EAV AS E
RIGHT JOIN CRITERIA AS CR ON (E.attribute_id = CR.attribute_id AND E.attribute_value =CR.attribute_value)
GROUP BY E.subscriber_id
HAVING COUNT(E.subscriber_id)=(SELECT COUNT(attribute_id) FROM CRITERIA)
)
subscriber_id
-------------
1
2
(2 row(s) affected)
Solution 2 : Using 'group by '+ 'Having' (Query Cost : 16%)
SELECT t.subscriber_id
FROM #TEMP_Table t
INNER JOIN EAV e ON t.subscriber_id=e.subscriber_id
WHERE (attribute_id='color' and attribute_value='red')
OR (attribute_id='size' and attribute_value='xl')
group by t.subscriber_id
HAVING count(t.subscriber_id)=2
subscriber_id
-------------
1
2
(2 row(s) affected)
Solution 3 : PIVOT (Query Cost : 20%)
SELECT tt.subscriber_id
FROM EAV
PIVOT(
MAX(attribute_value) FOR [attribute_id] in ([color],[garment],[size])
) AS E
INNER JOIN #TEMP_Table tt ON e.subscriber_id=tt.subscriber_id
WHERE [color]='red' and [size]='xl'
subscriber_id
-------------
1
2
(2 row(s) affected)
Solution 4 : using 'exists' (Query Cost : 8%)
SELECT subscriber_id
FROM
#TEMP_Table E
WHERE
EXISTS (SELECT 1 FROM EAV EE WHERE E.subscriber_id=EE.subscriber_id AND EE.[attribute_id]='color' AND EE.attribute_value='red' )
AND
EXISTS (SELECT 1 FROM EAV EE WHERE E.subscriber_id=EE.subscriber_id AND EE.[attribute_id]='size' AND EE.attribute_value='xl' )
subscriber_id
-------------
1
2
(2 row(s) affected)
Solution 5 : self - join (Query Cost : 23%)
SELECT E.subscriber_id
FROM #TEMP_Table E
INNER JOIN EAV E_1 ON E.subscriber_id=E_1.subscriber_id AND E_1.[attribute_id]='color'
INNER JOIN EAV E_2 ON E.subscriber_id=E_2.subscriber_id AND E_2.[attribute_id]='size'
WHERE E_1.attribute_value='red' and E_2.attribute_value='xl'
subscriber_id
-------------
1
2
(2 row(s) affected)
Solution 6 : using sum (Query Cost : 20%)
SELECT EE.subscriber_id
FROM
(
SELECT subscriber_id,
SUM(CASE WHEN [attribute_id]='color' and [attribute_value]='red' THEN 1 ELSE 0 END)+
SUM(CASE WHEN [attribute_id]='size' and [attribute_value]='xl' THEN 1 ELSE 0 END) AS [COUNT]
FROM EAV
GROUP BY subscriber_id
) E INNER JOIN #TEMP_Table EE ON E.subscriber_id= EE.subscriber_id
WHERE E.[COUNT]=2
subscriber_id
-------------
1
2
(2 row(s) affected)
Solution 7 : using sum
alternative way - XML datatype ( from SQL SERVER 2005)
In short, those '(Query Cost : XX% ) ' come from SQL estimated execute plan.
'Exists' solution - cost the least
'Self Join' solution - cost the most
Solution 8 : Excellent - Partitioned View (Multiple table with different type value)
http://www.jefflei.com/post/159.html
http://blog.csdn.net/envykok/archive/2010/08/03/5783983.aspx
--creat multiple tables
CREATE TABLE varchar _values (
subscriber_id INT ,
attribute_id INT CHECK (attribute_id=1),
attribute_value VARCHAR(255)
);
CREATE TABLE integer _values (
subscriber_id INT ,
attribute_id INT CHECK (attribute_id=2),
attribute_value INT
);
CREATE TABLE double _values (
subscriber_id INT ,
attribute_id INT CHECK (attribute_id=3),
attribute_value Floart
);
CREATE TABLE date _values (
subscriber_id INT ,
attribute_id INT CHECK (attribute_id=4),
attribute_value Datetime
);
CREATE TABLE text _values (
subscriber_id INT ,
attribute_id INT CHECK (attribute_id=5),
attribute_value VARCHAR(MAX)
);
--ADD Index in each table
--ignore here....
--creat partitioned view
CREATE VIEW [dbo].[Test_Partitioned]
AS
SELECT * FROM [ varchar_values]
UNION ALL
SELECT * FROM [ int_values]
UNION ALL
SELECT * FROM [ double_values]
UNION ALL
SELECT * FROM [ date_values]
UNION ALL
SELECT * FROM [ text_values]
GO
--select from partitioned view
SELECT *
FROM Test_Partitioned
WHERE Attribute_id=2 and attribute_value>2