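My goal is to select, for each PID, the records with the two test_sname values 'want' and 'want2' that occur on the same entry_date. I am doing this for the first two entry_dates that have both test_snames.
This is my query for doing that: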
queryBuilder = """select PID, test_sname, test_value, units, ref_range, entry_date from labs
    where PID=%s and (test_sname='want' or test_sname='want2') and entry_date in
    (select entry_date from labs where PID=%s and test_sname in ('want', 'want2')
    group by entry_date having count(*) = 2)
    order by entry_date limit 10;""" % (pid, pid)
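As a side note, the query above interpolates pid into the SQL text with Python's % operator. Below is a minimal sketch of the same query with the values left for the driver to bind, assuming a mysql-connector cursor is available elsewhere. build_labs_query is a hypothetical helper name, not part of the original code:

```python
def build_labs_query(pid):
    """Return (sql, params); the %s placeholders are left for the driver to bind."""
    sql = (
        "select PID, test_sname, test_value, units, ref_range, entry_date from labs "
        "where PID=%s and (test_sname='want' or test_sname='want2') and entry_date in "
        "(select entry_date from labs where PID=%s and test_sname in ('want', 'want2') "
        "group by entry_date having count(*) = 2) "
        "order by entry_date limit 10"
    )
    # The same pid is needed twice: once in the outer query, once in the subquery.
    return sql, (pid, pid)


sql, params = build_labs_query(10000000)
# With a mysql-connector cursor (assumed to exist): cursor.execute(sql, params)
```

This does not change what the query returns; it only keeps the pid value out of the SQL string itself.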
The query works as expected when an entry_date has exactly two rows with test_sname 'want' or 'want2'.
PID |test_sname |test_value |units |entry_date
10000000 | want | 343 | U/L | 2008-01-01 01:01:01
10000000 | want2 | 984.34 | | 2008-01-01 01:01:01
10000000 | NA1 | 56 | % | 2008-01-01 01:01:01
10000000 | NA2 | 420 | mg/dL | 2008-01-01 01:01:01
10000000 | NA2 | 420 | mg/dL | 2008-01-02 01:01:01
10000000 | want | 343 | U/L | 2008-01-02 01:01:01
10000000 | want2 | 984.34 | | 2008-01-02 01:01:01
10000000 | NA1 | 26 | % | 2008-01-02 01:01:01
10000000 | NA2 | 410 | mg/dL | 2008-01-02 01:01:01
10000000 | NA2 | 455 | mg/dL | 2008-01-02 01:01:01
Query result (correct):
PID |test_sname |test_value |units |entry_date
10000000 | want | 343 | U/L | 2008-01-01 01:01:01
10000000 | want2 | 984.34 | | 2008-01-01 01:01:01
10000000 | want | 343 | U/L | 2008-01-02 01:01:01
10000000 | want2 | 984.34 | | 2008-01-02 01:01:01
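The problem occurs when, for example, there are multiple rows with test_sname 'want' on the same entry_date, because having count(*) = 2 no longer holds. Data like this returns no results.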
PID |test_sname |test_value |units |entry_date
11111111 | want | 343 | U/L | 2009-10-26 07:25:00
11111111 | want2 | 984.34 | | 2009-10-26 07:25:00
11111111 | want | 189 | U/L | 2009-10-26 07:25:00
11111111 | NA1 | 50 | % | 2009-10-26 07:25:00
11111111 | NA2 | 40 | mg/dL | 2009-10-26 07:25:00
11111111 | NA3 | 84.55 | | 2009-10-26 07:25:00
11111111 | NA4 | 4.5 | thou/uL | 2009-10-26 07:25:00
11111111 | NA5 | 14.6 | g/dL | 2009-10-26 07:25:00
11111111 | NA6 | 0.96 | mg/dL | 2009-10-26 07:25:00
11111111 | want | 343 | U/L | 2009-10-30 07:25:00
11111111 | want2 | 984.34 | | 2009-10-30 07:25:00
11111111 | want | 189 | U/L | 2009-10-30 07:25:00
11111111 | NA1 | 6 | % | 2009-10-30 07:25:00
11111111 | NA2 | 40 | mg/dL | 2009-10-30 07:25:00
11111111 | NA3 | 84.55 | | 2009-10-30 07:25:00
11111111 | NA4 | 4.5 | thou/uL | 2009-10-30 07:25:00
11111111 | NA5 | 14.6 | g/dL | 2009-10-30 07:25:00
11111111 | NA6 | 0.96 | mg/dL | 2009-10-30 07:25:00
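To make the failure concrete, the following sketch counts the 'want'/'want2' rows per entry_date, which is what the subquery's group by and count(*) do. The rows list is typed in by hand from the table above, purely for illustration:

```python
from collections import Counter

# (test_sname, entry_date) pairs copied from the 'want'/'want2' rows of the table above
rows = [
    ("want", "2009-10-26 07:25:00"),
    ("want2", "2009-10-26 07:25:00"),
    ("want", "2009-10-26 07:25:00"),
    ("want", "2009-10-30 07:25:00"),
    ("want2", "2009-10-30 07:25:00"),
    ("want", "2009-10-30 07:25:00"),
]

# Equivalent of: group by entry_date, then count(*)
counts = Counter(entry_date for _, entry_date in rows)

# Dates that would pass: having count(*) = 2
surviving = [date for date, n in counts.items() if n == 2]

print(counts)
print(surviving)
```

Both dates have three matching rows, so neither passes the filter and the outer query has no entry_date to match.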
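As a restriction, I tried putting a limit 2 in the subquery (I know that by itself it would not solve the problem), but it gave this error. I believe I have a recent version of MySQL, so apparently I cannot use limit in this subquery.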
This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
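I realize there are several ways around this. I could select all the values and then pick out what I need programmatically in Python, but I am looking for a MySQL query solution that I can run through Python's mysql-connector. I would not complain about a Python solution, though.
I am using Python v3.4.4 with mysql-connector v2.1.3 and MySQL server v5.7.11.
Thanks for your time!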
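Solution
Consider using a correlated subquery to compute a running count within each group. Then filter for rows where RowNo is 1 or 2. This way you do not need to pass a parameter, since all PIDs are processed. The following assumes the labs table has a unique identifier column, ID: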
SELECT *
FROM
  (SELECT PID, test_sname, test_value, units, ref_range, entry_date,
          (SELECT count(*) FROM labs sub
           WHERE sub.test_sname in ('want', 'want2')
             AND sub.PID = labs.PID
             AND sub.entry_date = labs.entry_date
             AND sub.ID <= labs.ID) As RowNo
   FROM labs
   WHERE test_sname in ('want', 'want2')
  ) As dT
WHERE dT.RowNo <= 2;
# PID       test_sname  test_value  units  ref_range  entry_date           RowNo
# 10000000  want        33          U/L    4-40       2008-01-01 01:01:01  1
# 10000000  want2       98.34                         2008-01-01 01:01:01  2
# 10000000  want        33          U/L    4-40       2008-01-02 01:01:01  1
# 10000000  want2       98.34                         2008-01-02 01:01:01  2
# 11111111  want        33          U/L    4-40       2009-10-26 07:25:00  1
# 11111111  want2       98.34                         2009-10-26 07:25:00  2
# 11111111  want        33          U/L    4-40       2009-10-30 07:25:00  1
# 11111111  want2       98.34                         2009-10-30 07:25:00  2
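For trying the query above without a MySQL server, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. That is an assumption for convenience, since the question itself uses MySQL through mysql-connector. It builds an in-memory labs table with an auto-incrementing ID column and loads a few rows for PID 11111111 taken from the question. The ORDER BY clause is added only to make the printed order predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labs (ID INTEGER PRIMARY KEY AUTOINCREMENT, PID INTEGER, "
    "test_sname TEXT, test_value REAL, units TEXT, ref_range TEXT, entry_date TEXT)"
)

# Illustrative subset of the question's data; ID is assigned in insertion order.
sample = [
    (11111111, "want", 343, "U/L", None, "2009-10-26 07:25:00"),
    (11111111, "want2", 984.34, None, None, "2009-10-26 07:25:00"),
    (11111111, "want", 189, "U/L", None, "2009-10-26 07:25:00"),
    (11111111, "NA1", 50, "%", None, "2009-10-26 07:25:00"),
    (11111111, "want", 343, "U/L", None, "2009-10-30 07:25:00"),
    (11111111, "want2", 984.34, None, None, "2009-10-30 07:25:00"),
    (11111111, "want", 189, "U/L", None, "2009-10-30 07:25:00"),
    (11111111, "NA1", 6, "%", None, "2009-10-30 07:25:00"),
]
conn.executemany(
    "INSERT INTO labs (PID, test_sname, test_value, units, ref_range, entry_date) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    sample,
)

# Same running-count query as above, selecting fewer columns for readability.
query = """
SELECT PID, test_sname, test_value, entry_date, RowNo
FROM
  (SELECT PID, test_sname, test_value, units, ref_range, entry_date,
          (SELECT count(*) FROM labs sub
           WHERE sub.test_sname IN ('want', 'want2')
             AND sub.PID = labs.PID
             AND sub.entry_date = labs.entry_date
             AND sub.ID <= labs.ID) AS RowNo
   FROM labs
   WHERE test_sname IN ('want', 'want2')
  ) AS dT
WHERE dT.RowNo <= 2
ORDER BY entry_date, RowNo
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that RowNo follows ID order within each PID and entry_date. With this sample the first two matching rows per date happen to be one 'want' and one 'want2'; if two 'want' rows had lower IDs than the 'want2' row, the 'want2' row would get RowNo 3 and be filtered out.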