最近处理大数据量表头疼了一阵子,表 Trade 600w,仅仅是3个月的数据,以后还要暴涨,还没到3个月,数据查询已经出了问题 ! 虽然对栏位建立了索引index,但是无条件查询时, count(*) 都要花费1分48秒,实在是接受不了。
无意中看到一个朋友的QQ空间转了一个帖子,说是大数据量处理要注意的问题,第一条就提到了 分区视图(看来我对数据库大数据量的知识还要仔细研读研读),所以决定google 分区视图。结果到了MSDN的分区视图,开始啃帮助文档。看着看着怎么又提到分区表,我xxxx ,神马情况, 咋有跳到了分区表。 虽然看懂了分区视图,这分区表,又是啥东东。再search the fuck goolge '分区表 分区视图', 看到了别人的帖子:http://blog.csdn.net/heavstar/article/details/6117346。 看来各有优缺,但是基于我的实际情况,还是决定采用分区表,来实验下。
1. 分区表:参考,分区表 MSDN帮助 http://msdn.microsoft.com/zh-cn/library/ms188706(v=SQL.90).aspx ,又安装了AdventureWorks 示例数据库 和 Samples. 改成如下:
USE [ UDP_SYN_TRADE ]
GO
-- Range partition table [TradeRangePF] 订单分页方式
CREATE PARTITION FUNCTION [ TradeRangePF ] ( datetime )
AS RANGE RIGHT FOR VALUES ( ' 2011-05-1 ' , ' 2011-06-1 ' ,
' 2011-07-1 ' , ' 2011-08-1 ' , ' 2011-09-1 ' , ' 2011-10-1 ' ,
' 2011-11-1 ' , ' 2011-12-1 ' , ' 2012-1-1 ' , ' 2012-2-1 ' , ' 2012-3-1 ' );
GO
-- Range partition table [TradeRangePF] 订单分页文件组
CREATE PARTITION SCHEME [ TradePS ]
AS PARTITION [ TradeRangePF ]
TO ( [ PRIMARY ] , [ PRIMARY ] , [ PRIMARY ]
, [ PRIMARY ] , [ PRIMARY ] , [ PRIMARY ]
, [ PRIMARY ] , [ PRIMARY ] , [ PRIMARY ]
, [ PRIMARY ] , [ PRIMARY ] , [ PRIMARY ] , [ PRIMARY ] );
GO
GO
CREATE TABLE [ dbo ] . [ Trade ] (
[ id ] [ bigint ] IDENTITY ( 1 , 1 ) NOT NULL ,
[ tid ] [ bigint ] NOT NULL ,
[ seller_nick ] [ nvarchar ] ( 200 ) NOT NULL ,
[ buyer_nick ] [ nvarchar ] ( 200 ) NOT NULL ,
[ created ] [ datetime ] NOT NULL CONSTRAINT [ DF_Trade_created ] DEFAULT ( GETDATE ()),
[ seller_rate ] [ bit ] NULL ,
[ buyer_rate ] [ bit ] NULL ,
[ status ] [ nvarchar ] ( 50 ) NULL ,
[ payment ] [ nvarchar ] ( 50 ) NULL ,
[ discount_fee ] [ nvarchar ] ( 50 ) NULL ,
[ adjust_fee ] [ nvarchar ] ( 50 ) NULL ,
[ total_fee ] [ nvarchar ] ( 50 ) NULL ,
[ post_fee ] [ nvarchar ] ( 50 ) NULL ,
[ pay_time ] [ datetime ] NULL ,
[ end_time ] [ datetime ] NULL ,
[ modified ] [ datetime ] NULL ,
[ consign_time ] [ datetime ] NULL ,
[ buyer_obtain_point_fee ] [ bigint ] NULL ,
[ point_fee ] [ bigint ] NULL ,
[ real_point_fee ] [ bigint ] NULL ,
[ received_payment ] [ nvarchar ] ( 50 ) NULL ,
[ commission_fee ] [ nvarchar ] ( 50 ) NULL ,
[ title ] [ nvarchar ] ( 200 ) NULL ,
[ type ] [ nvarchar ] ( 50 ) NULL ,
[ pic_path ] [ nvarchar ] ( 200 ) NULL ,
[ num_iid ] [ bigint ] NULL ,
[ price ] [ nvarchar ] ( 50 ) NULL ,
[ num ] [ bigint ] NULL ,
[ cod_fee ] [ nvarchar ] ( 50 ) NULL ,
[ cod_status ] [ nvarchar ] ( 50 ) NULL ,
[ shipping_type ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_name ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_state ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_city ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_district ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_address ] [ nvarchar ] ( 200 ) NULL ,
[ receiver_zip ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_mobile ] [ nvarchar ] ( 50 ) NULL ,
[ receiver_phone ] [ nvarchar ] ( 50 ) NULL ,
[ last_syntime ] [ datetime ] NULL
-- ,CONSTRAINT [PK_Trade] PRIMARY KEY ([id],[created])
) ON [ TradePS ] ( [ created ] );
-- 导入数据
/*
步骤省略
*/
-- 创建聚集索引分区 ??? 是否需要为 created 建立索引
ALTER TABLE dbo. [ Trade ] WITH CHECK ADD
CONSTRAINT [ PK_Trade_ID ] PRIMARY KEY CLUSTERED
(
[ id ] , [ created ]
) ON [ TradePS ] ( [ created ] );
GO
-- 创建seller_nick,buyer_nick索引分区
CREATE INDEX [ IX_Trade_seller_buyer ]
ON [ Trade ] (seller_nick,buyer_nick)
ON [ TradePS ] ( [ created ] );
-- 创建buyer_nick索引分区
CREATE INDEX [ IX_Trade_buyer ]
ON [ Trade ] (buyer_nick)
ON [ TradePS ] ( [ created ] );
-- 创建tid索引分区
CREATE INDEX [ IX_Trade_tid ]
ON [ Trade ] (tid)
ON [ TradePS ] ( [ created ] );
--
我靠,这效果还真立杆见影,select count(*) from Trade 才用了 1秒。
2. 分页查询:
1)用原来的分页查询看看,第4页,fuck 10秒钟还没出来,代码如下:
---------rownumber 分页代码---------
select row_number() over ( order by id) as rownumber, *
from dbo. [ Trade ] )
as [ Trade ] where rownumber > 100
-- -------rownumber 分页代码结束---------
-- -查询了10秒,没出来受不鸟了----------------
2) 难道是row_number对全表进行了,编码查询,这怎么行呢.... 改一下,自己慢慢写top 吧,呀,还要10秒,看了逻辑读22133 次数,我晕了.
( select * from Trade where id in
(
select top 50 id from (select top 1000 id,created from Trade order by created )
as TBTemp order by created desc
)
) as TBTemp2 order by created
--(50 行受影响)1000
--表 'Trade'。扫描计数 612,逻辑读取 22133 次,物理读取 36 次,预读 81 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
3) 再试试表变量看看,查询前1000条,1秒,哈哈,我笑了,哇哈哈..............查询500w条以后,10秒,勉强可以接受了
insert @tb
select top 50 id -- 页码
from ( select top 1000 id,created from Trade order by created ) as TBTemp -- 结束页码
order by created desc
select Trade. * from Trade inner join @tb as t on Trade.id = t.id order by created
表 '#59490ABA'。扫描计数 0,逻辑读取 50 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
(50 行受影响)
(50 行受影响)
表 'Trade'。扫描计数 600,逻辑读取 723 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
表 '#59490ABA'。扫描计数 1,逻辑读取 1 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
*/
----这是为啥? 暂时不知道,待以后有时间,研究好了,再重新编辑帖子吧 。希望以后可以找到更好的优化办法。
备注:选择按照created排序的原因,如下图
select top 100 id from Trade where 1 = 1 order by created -- 逻辑读取 4535 次
select top 100 id from Trade where 1=1 order by id -- 逻辑读取 20889 次