EVA Query Solution (纵表查询)

refer to :

http://stackoverflow.com/questions/663040/query-several-eav-attributes-in-separate-columns

http://structureddata.org/2009/03/19/the-impact-of-good-table-and-query-design/

 

refer to :http://www.sqlservercentral.com/articles/Database+Design/62386/

Before going deep into examples of this type of approach, let us look at the cons and pros of this approach towards data modelling. The lists are not confined to the one I have pointed but there could be more that you could add. I have listed here the most obvious once.

1. Advantages


- Very flexible to add an attribute that you don't know in advance without redesigning your database structure
- Can be a good idea to collect the attributes of unknown data . For example, it is difficult to know the attributes of some complex research data and using this methodology would help to assemble this attributes that can feed into proper data modelling.
- Database level complexity is simplified when inserting data into the table. The number of procedures to insert into the above table is only one or at most two if you are going to partition your table to increase performance.

 

2. Disadvantages

Difficult to query and transform the data into meaningful information. To display a data in

tabular format of meaningful way, your query may involve many case statements, sub-queries, self joins and etc. This will impact on performance of your query specially when the number of rows are getting bigger

- Unless the table is partitioned it can grow fast and querying the table would take longer time.

- It is not possible to enforce business rule constraints and default values for an attribute as attributes are modelled as being data.

- Difficult to the integrity of dat a as again as attributes are modelled as being data.
- Could result in changing the value column to text data type, as the value of name value pair can be significantly different ranging from bit to text data type. This will result in ineffective storage design
- Effective index design strategy is difficult. Indexing of the value column is could be very difficult as it can result in wide index and can impact inserting and updating.

So, looking at the above cons and pro's of using EAV towards data modeling, I would like to conclude that there might be occasions that you probably be better using this approach but for data model where most of the attributes are known and is tailored to do specific thing which most databases are for, it is better not even to consider this approach.

 

Example:


Table 1 : Attribute table

CREATE TABLE EAV (
subscriber_id INT NOT NULL DEFAULT '0',
attribute_id CHAR(62) NOT NULL DEFAULT '',
attribute_value CHAR(62) NOT NULL DEFAULT '',
PRIMARY KEY (subscriber_id,attribute_id)
)
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (4,'color','blue');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (4,'garment','shirt');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (5,'color','blue');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (5,'size','xl');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (5,'garment','shirt');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (6,'color','red');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (6,'garment','shirt');
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (1,'color','red')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (1,'size','xl')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (1,'garment','shirt')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (2,'color','red')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (2,'size','xl')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (2,'garment','pants')
INSERT INTO EAV (subscriber_id, attribute_id, attribute_value) VALUES (3,'garment','pants')


Table 2 : Main Table

CREATE TABLE #TEMP_Table
(
   subscriber_id INT
)
INSERT INTO #TEMP_Table VALUES (1)
INSERT INTO #TEMP_Table VALUES (2)
INSERT INTO #TEMP_Table VALUES (3)
INSERT INTO #TEMP_Table VALUES (4)
INSERT INTO #TEMP_Table VALUES (5)
INSERT INTO #TEMP_Table VALUES (6)


Question :Find out all the 'subscriber_id' with 'red' color and 'xl' size from main table - #TEMP_Table



Solution 1 : create a helper table (Query Cost : 13%)

 

CREATE TABLE CRITERIA (
attribute_id CHAR(62) NOT NULL DEFAULT '',
attribute_value CHAR(62) NOT NULL DEFAULT '' )

INSERT INTO CRITERIA (attribute_id, attribute_value) VALUES ('color', 'red')
INSERT INTO CRITERIA (attribute_id, attribute_value) VALUES ('size', 'xl')


SELECT subscriber_id
FROM #TEMP_Table
WHERE subscriber_id IN (
SELECT E.subscriber_id
FROM EAV AS E
RIGHT JOIN CRITERIA AS CR ON (E.attribute_id = CR.attribute_id AND E.attribute_value =CR.attribute_value)
GROUP BY E.subscriber_id
HAVING COUNT(E.subscriber_id)=(SELECT COUNT(attribute_id) FROM CRITERIA)
)


subscriber_id
-------------
1
2

(2 row(s) affected)

 

Solution 2 : Using 'group by '+ 'Having' (Query Cost : 16%)

 

 

SELECT t.subscriber_id
FROM #TEMP_Table t
INNER JOIN EAV e ON t.subscriber_id=e.subscriber_id
WHERE (attribute_id='color' and attribute_value='red')
OR (attribute_id='size' and attribute_value='xl')
group by t.subscriber_id
HAVING count(t.subscriber_id)=2


subscriber_id
-------------
1
2

(2 row(s) affected)

 

Solution 3 : PIVOT (Query Cost : 20%)

 

SELECT tt.subscriber_id
FROM EAV
PIVOT(
MAX(attribute_value) FOR [attribute_id] in ([color],[garment],[size])
) AS E
INNER JOIN #TEMP_Table tt ON e.subscriber_id=tt.subscriber_id
WHERE [color]='red' and [size]='xl'

 

subscriber_id
-------------
1
2

(2 row(s) affected)

 

Solution 4 : using 'exists' (Query Cost : 8%)

 


SELECT subscriber_id
FROM
#TEMP_Table E
WHERE
EXISTS (SELECT 1 FROM EAV EE WHERE E.subscriber_id=EE.subscriber_id AND EE.[attribute_id]='color' AND EE.attribute_value='red' )
AND
EXISTS (SELECT 1 FROM EAV EE WHERE E.subscriber_id=EE.subscriber_id AND EE.[attribute_id]='size' AND EE.attribute_value='xl' )

 

subscriber_id
-------------
1
2

(2 row(s) affected)

 

Solution 5 : self - join (Query Cost : 23%)

 

SELECT E.subscriber_id
FROM #TEMP_Table E
INNER JOIN EAV E_1 ON E.subscriber_id=E_1.subscriber_id AND E_1.[attribute_id]='color'
INNER JOIN EAV E_2 ON E.subscriber_id=E_2.subscriber_id AND E_2.[attribute_id]='size'
WHERE E_1.attribute_value='red' and E_2.attribute_value='xl'

 

subscriber_id
-------------
1
2

(2 row(s) affected)

 

Solution 6 : using sum (Query Cost : 20%)

 

SELECT EE.subscriber_id
FROM
(
SELECT subscriber_id,
SUM(CASE WHEN [attribute_id]='color' and [attribute_value]='red' THEN 1 ELSE 0 END)+
SUM(CASE WHEN [attribute_id]='size' and [attribute_value]='xl' THEN 1 ELSE 0 END) AS [COUNT]
FROM EAV
GROUP BY subscriber_id
) E INNER JOIN #TEMP_Table EE ON E.subscriber_id= EE.subscriber_id
WHERE E.[COUNT]=2

 

subscriber_id
-------------
1
2

(2 row(s) affected)

 

Solution 7 : using sum


alternative way - XML datatype ( from SQL SERVER 2005)

 

In short, those '(Query Cost : XX% ) ' come from SQL estimated execute plan.

'Exists' solution - cost the least

'Self Join' solution - cost  the most

 

 

Solution 8 : Excellent - Partitioned View (Multiple table with different type value)

http://www.jefflei.com/post/159.html

http://blog.csdn.net/envykok/archive/2010/08/03/5783983.aspx

 

--creat multiple tables

 

CREATE TABLE varchar _values (

subscriber_id INT ,

attribute_id INT CHECK (attribute_id=1),

attribute_value VARCHAR(255) 

 

);

CREATE TABLE integer _values (

subscriber_id INT ,

attribute_id INT CHECK (attribute_id=2),

attribute_value INT

 

);

CREATE TABLE double _values (

subscriber_id INT ,

attribute_id INT CHECK (attribute_id=3),

attribute_value Floart

);

CREATE TABLE date _values (

subscriber_id INT ,

attribute_id INT CHECK (attribute_id=4),

attribute_value Datetime

);

CREATE TABLE text _values (

subscriber_id INT ,

attribute_id INT CHECK (attribute_id=5),

attribute_value VARCHAR(MAX)

);

 

 

--ADD Index in each table

 

--ignore here....

 

--creat partitioned view

 


CREATE VIEW [dbo].[Test_Partitioned]
AS
SELECT * FROM [ varchar_values]
UNION ALL

SELECT * FROM [ int_values]

UNION ALL

SELECT * FROM [ double_values]

UNION ALL

SELECT * FROM [ date_values]

UNION ALL

SELECT * FROM [ text_values]


GO

 

--select from partitioned view

 

 

SELECT *

FROM Test_Partitioned

WHERE Attribute_id=2 and attribute_value>2

 

 

 

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值