pyDatalog: a logic programming engine for Python (for inference, queries, and more)

z2664836046

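I. Preface

While exploring "knowledge reasoning" I came across pyDatalog. It borrows from Datalog, a declarative language, which makes it easy and natural to express logical propositions and mathematical formulas, and it is implemented in Python, currently my favorite language. Once I tried it, its concise and elegant style won me over immediately. Here is an example from the official site that uses it to implement factorial: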

from pyDatalog import pyDatalog
pyDatalog.create_terms('factorial, N')
factorial[N] = N*factorial[N-1] 
factorial[1] = 1 
print(factorial[3]==N)  # prints N=6

An example of inference:

# a small inference rule

pyDatalog.create_terms('X,Y,Z,father,fatherOf,grandfatherOf')
(grandfatherOf[X] == Z) <= ((fatherOf[X]==Y) & (fatherOf[Y]==Z))
fatherOf["乾隆"] = "雍正"
fatherOf["雍正"] = "康熙"
print(grandfatherOf["乾隆"] == X)
X 
--
康熙

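II. Basic tutorial (part 1)

Variables and expressions

The first step is to import pyDatalog. The next step is to declare the variables we will use; their names must start with an uppercase letter. Variables appear in logic queries, which return a printable result: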

from pyDatalog import pyDatalog
pyDatalog.create_terms('X,Y')
# give me all the X so that X is 1
print(X==1)
X
-
1

A query can contain several variables and several conditions ('&' means "and"):

In [2]:

# give me all the X and Y so that X is True and Y is False
print((X==True) & (Y==False))
X    | Y    
-----|------
True | False

Some queries return an empty result:

In [3]:

# give me all the X that are both True and False
print((X==True) & (X==False))
[]

Besides numbers and booleans, variables can represent strings (such as 'Hello'). Queries can also contain Python expressions (such as addition):

In [4]:

# give me all the X and Y so that X is a name and Y is 'Hello ' followed by the first letter of X
# on Python 2, use raw_input instead
print((X==input('Please enter your name : ')) & (Y=='Hello ' + X[0]))
Please enter your name : World
X     | Y      
------|--------
World | Hello W

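In the second equality, X is said to be bound by the first equality: the first equality has to give X a value before the expression involving X in the second equality (the one that defines Y) can be evaluated.

pyDatalog does not have a symbolic resolver (yet)! If a variable in an expression is not bound, the query returns an empty list: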

In [5]:

# give me all the X and Y so that Y is 1 and Y is X+1
print((Y==1) & (Y==X+1))
[]

Variables can also represent (nested) tuples, which can take part in expressions and be sliced (0-based).

In [6]:

print((X==(1,2)+(3,)) & (Y==X[2]))
X         | Y
----------|--
(1, 2, 3) | 3

To use your own functions in logic expressions, define them in Python, then have pyDatalog create logic terms for them:

In [7]:

def twice(a):
    return a+a

pyDatalog.create_terms('twice')
print((X==1) & (Y==twice(X)))
X | Y
--|--
1 | 2

Similarly, pyDatalog variables can be passed to functions from the Python standard library:

In [8]:

# give me all the X and Y so that X is 2 and Y is the square root of X
import math
pyDatalog.create_terms('math')
print((X==2) & (Y==math.sqrt(X)))
X | Y                 
--|-------------------
2 | 1.4142135623730951

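Loops

A loop can be created with the .in_() method (we will see later that there are other ways to create loops): [Note: no == is used here, but a query is still executed, and its results are stored in X.]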

In [9]:

pyDatalog.create_terms('X,Y,Z')
# give me all the X so that X is in the range 0..4
print(X.in_((0,1,2,3,4)))

# equivalent statements in Python
# for x in range(5):
#     print(x)
X
-
4
3
2
1
0

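The result of a query is a set of possible solutions [rows], in no guaranteed order. Each solution has one value for each variable in the query [columns]. The results can be accessed through the .data attribute.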

In [10]:

print(X.in_(range(5)).data)
print(X.in_(range(5)) == set([(0,), (1,), (2,), (3,), (4,)]))
[(4,), (3,), (2,), (1,), (0,)]
True

Similarly, after a query, a variable holds all of its possible values. They can be accessed in the following ways:

In [11]:

print("Data : ",X.data)
print("First value : ",  X.v())
# below, '>=' is a variable extraction operator
print("Extraction of first value of X: ", X.in_(range(5)) >= X)
Data :  [4, 3, 2, 1, 0]
First value :  4
Extraction of first value of X:  4

The '&' operator can be used to filter results.

In [12]:

# give me all X in range 0..4 that are below 2
print(X.in_(range(5)) & (X<2))
X
-
1
0

Loops can easily be nested. Indentation helps readability:

In [13]:

# give me all X, Y and Z so that X and Y are in 0..4, Z is their sum, and Z is below 3
print(X.in_(range(5)) &
          Y.in_(range(5)) &
              (Z==X+Y) &
              (Z<3))
X | Y | Z
--|---|--
2 | 0 | 2
1 | 1 | 2
1 | 0 | 1
0 | 2 | 2
0 | 1 | 1
0 | 0 | 0
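
For comparison, the nested query above corresponds roughly to the plain-Python loops below. This is only a sketch of what the query means, not of how pyDatalog evaluates it; the name solutions is made up for this illustration.

```python
# Plain-Python reading of: X in 0..4, Y in 0..4, Z == X+Y, Z < 3
solutions = set()
for x in range(5):
    for y in range(5):
        z = x + y
        if z < 3:
            solutions.add((x, y, z))

# The rows of a query come back in no guaranteed order, so sort them for display
for row in sorted(solutions, reverse=True):
    print(row)
```

Collecting the rows in a set mirrors the fact that a query result is a set of solutions rather than an ordered list.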

Logic functions and dictionaries

As an example, we will compute the net salary of employees foo and bar.

In [14]:

pyDatalog.create_terms('X,Y,Z, salary, tax_rate, tax_rate_for_salary_above, net_salary')
salary['foo'] = 60
salary['bar'] = 110

# Python equivalent [for illustration only; _salary is never actually defined]
# _salary = dict()
# _salary['foo'] = 60
# _salary['bar'] = 110

# give me all the X and Y so that the salary of X is Y
print(salary[X]==Y)
print({X.data[i]:Y.data[i] for i in range(len(X.data))})        # [how to actually turn the result into a dict]
# python equivalent
# print(_salary.items())
X   | Y  
----|----
bar | 110
foo | 60 
{'bar': 110, 'foo': 60}

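Note that logic function names (such as salary) start with a lowercase letter. A function defines one value for a given argument. It is similar to a Python dictionary.

A function can be queried for its value. A function can have only one value for a given argument. [A later value overwrites the earlier one.]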

In [15]:

# foo now has a salary of 70
salary['foo'] = 70
print(salary['foo']==Y)
Y 
--
70

A function can also be queried for its keys.

In [16]:

# give me all the X that have a salary of 110
print(salary[X]==110)
# procedural equivalent in python
# for i, j in _salary.items():
#     if j==110:
#         print(i, '-->', j)
# Notice that there is an implicit loop in the query.
X  
---
bar

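Note that the query contains an implicit loop. [This is why this kind of query is relatively inefficient.]

A query can test the negation of a criterion.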

In [17]:

# A query can test the negation of a criteria.
print((salary[X]==Y) & ~(Y==110))
X   | Y 
----|---
foo | 70

Now let's define a global tax rate. We will use None as the function argument:

In [18]:

# Let's now define a global tax rate. We'll use None for the function argument:
# the standard tax rate is 33%
+(tax_rate[None]==0.33)

# A function can be called in a formula:
# give me the net salary for all X
print((Z==salary[X]*(1-tax_rate[None])))
X   | Z                
----|------------------
bar | 73.69999999999999
foo | 46.89999999999999

In this case, X is bound by salary[X], so the expression can be evaluated.

A function can also be defined by a clause. Here is a simple example:

In [19]:

# the net salary of X is Y if Y is the salary of X, reduced by the tax rate
net_salary[X] = salary[X]*(1-tax_rate[None])

# give me all the X and Y so that Y is the net salary of X
print(net_salary[X]==Y)
X   | Y                
----|------------------
bar | 73.69999999999999
foo | 46.89999999999999

In [20]:

# give me the net salary of foo
print(net_salary['foo']==Y)
print(net_salary[X]<50)
Y                
-----------------
46.89999999999999
X  
---
foo

Now let's define a progressive tax system: the tax rate is 33% by default, but 50% for salaries above 100.

In [21]:

# Let's now define a progressive tax system: the tax rate is 33 % by default, but 50% for salaries above 100.
(tax_rate_for_salary_above[X] == 0.33) <= (0 <= X)
(tax_rate_for_salary_above[X] == 0.50) <= (100 <= X)
print(tax_rate_for_salary_above[70]==Y)
print(tax_rate_for_salary_above[150]==Y)
Y   
----
0.33
Y  
---
0.5

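This is the first place where "inference" shows up.

"<=" is the important token in the statements above: it is read as "if". [It can be used to define rules for what follows from what.]

Give the most general definition of the function first. When searching for possible answers, pyDatalog starts from the last rule defined, that is, the more specific one, and stops as soon as it finds a valid answer for the function. So although both rules seem to apply to a salary of 150, it is actually the second rule that gives us the 50% tax rate.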
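
The "last rule first, stop at the first valid answer" behaviour described above can be mimicked in plain Python. The sketch below is an analogy only: the list rules and the Python function tax_rate_for_salary_above are written for this illustration, and pyDatalog's real resolution engine is more general.

```python
# Each rule is (condition, value), listed in definition order:
# the most general rule first, the more specific rule after it.
rules = [
    (lambda salary: 0 <= salary, 0.33),
    (lambda salary: 100 <= salary, 0.50),
]

def tax_rate_for_salary_above(salary):
    """Try the rules from the last defined to the first; return the first match."""
    for condition, rate in reversed(rules):
        if condition(salary):
            return rate
    return None  # no rule applies (for example, a negative salary)

print(tax_rate_for_salary_above(70))
print(tax_rate_for_salary_above(150))
```

Swapping the two entries in rules would make the general rule win for every non-negative salary, which is why the general definition has to come first.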

Next, let's redefine the net salary. Before doing so, we remove the original definition:

In [22]:

# retract our previous definition of net_salary
del net_salary[X]
# new definition
net_salary[X] = salary[X]*(1-tax_rate_for_salary_above[salary[X]])
# give me all X and Y so that Y is the net salary of X
print(net_salary[X]==Y)
# Please note that we used f[X]=<expr> above, as a shorter notation for (f[X]==Y) <= (Y==expr)

# This short notation, together with the fact that functions can be defined in any order,
# makes writing a pyDatalog program as easy as creating a spreadsheet.
X   | Y                
----|------------------
bar | 55.0             
foo | 46.89999999999999

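Note that above we used f[X]=<expr> as a shorthand for (f[X]==Y) <= (Y==expr).

This short notation, together with the fact that functions can be defined in any order, makes writing a pyDatalog program as easy as creating a spreadsheet.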
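
As a cross-check of the numbers above, here is the same net-salary computation in plain Python. It is a sketch under the tutorial's data (salaries 70 and 110, rates 33% and 50%); tax_rate and net_salary are ordinary Python functions written for this illustration, not pyDatalog terms.

```python
salary = {'foo': 70, 'bar': 110}

def tax_rate(amount):
    # 50% for salaries of 100 or more, 33% otherwise (mirrors the two rules)
    return 0.50 if amount >= 100 else 0.33

def net_salary(name):
    gross = salary[name]
    return gross * (1 - tax_rate(gross))

for name in sorted(salary):
    print(name, net_salary(name))
```

Unlike the pyDatalog version, this only works in one direction: given a name it returns a net salary, but it cannot be asked which names have a net salary below 50 without writing an explicit loop.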

To illustrate the point, look at this definition of factorial, which could not be any clearer!

In [23]:

# To illustrate the point, this definition of Factorial cannot be any clearer !
pyDatalog.create_terms('N, factorial')
factorial[N] = N*factorial[N-1]
factorial[1] = 1

print(factorial[3]==N)
N
-
6

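pyDatalog can also do some reasoning in a notation close to predicate logic [more precisely, a subset of first-order predicate logic, somewhat less expressive but apparently more efficient]. No such example has appeared so far; that material, along with more interesting examples, follows in the next part.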

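III. Basic tutorial (part 2)

Aggregate functions

Aggregate functions are a special kind of function [they operate on array-like collections of values]. We first create the data needed for the illustration.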

In [1]:

from pyDatalog import pyDatalog
pyDatalog.create_terms('X,Y,manager, count_of_direct_reports')
# the manager of Mary is John
+(manager['Mary'] == 'John')
+(manager['Sam']  == 'Mary')
+(manager['Tom']  == 'Mary')

The most basic aggregate function is len_(), which counts the number of elements:

In [2]:

# the number of direct reports of X is the count of Y whose manager is X
(count_of_direct_reports[X]==len_(Y)) <= (manager[Y]==X)
print(count_of_direct_reports['Mary']==Y)
Y
-
2

pyDatalog looks for all possible solutions Y of manager[Y]=='Mary', then counts them.

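The aggregate functions include:

len_ (P[X]==len_(Y)) <= body : P[X] is the count of the values of Y (associated with X by the body of the clause).

sum_ (P[X]==sum_(Y, for_each=Z)) <= body : P[X] is the sum of Y for each Z. (Z is used to distinguish Y values that may be identical.)

min_, max_ (P[X]==min_(Y, order_by=Z)) <= body : P[X] is the minimum (or maximum) of Y when sorted by Z.

tuple_ (P[X]==tuple_(Y, order_by=Z)) <= body : P[X] is a tuple containing all the values of Y sorted by Z.

concat_ (P[X]==concat_(Y, order_by=Z, sep=',')) <= body : same as 'sum', but for strings. The strings are sorted by Z and separated by ','.

rank_ (P[X]==rank_(group_by=Y, order_by=Z)) <= body : P[X] is the sequence number of X in the list of Y values when that list is sorted by Z.

running_sum_ (P[X]==running_sum_(N, group_by=Y, order_by=Z)) <= body : P[X] is the sum of the values of N for each Y that comes before or is equal to X when the Y values are sorted by Z.

mean_ and linear_regression : see the reference ( https://sites.google.com/site/pydatalog/reference )

(I do not fully understand this part, and the official site has few examples, so this may not be explained very clearly.)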
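
Since the descriptions above are terse, here is a rough plain-Python analog of what a few of these aggregates compute per group. It is one reading of the list above, not pyDatalog's implementation, and the sample rows (manager, report, salary) as well as every variable name are invented for illustration.

```python
from collections import defaultdict

# Invented sample rows, standing in for the solutions of a clause body:
# (X = manager, Y = direct report, Z = salary of the report)
rows = [
    ('Mary', 'Sam', 50),
    ('Mary', 'Tom', 40),
    ('John', 'Mary', 60),
]

# Group the (Y, Z) pairs by X, since every aggregate is computed per X
groups = defaultdict(list)
for x, y, z in rows:
    groups[x].append((y, z))

def by_z(pairs):
    """Sort (Y, Z) pairs by Z, in the spirit of order_by=Z."""
    return sorted(pairs, key=lambda pair: pair[1])

count_ = {x: len(p) for x, p in groups.items()}                # like len_(Y)
total_ = {x: sum(z for _, z in p) for x, p in groups.items()}  # like sum_(Z, for_each=Y)
lowest = {x: by_z(p)[0][0] for x, p in groups.items()}         # like min_(Y, order_by=Z)
highest = {x: by_z(p)[-1][0] for x, p in groups.items()}       # like max_(Y, order_by=Z)
ordered = {x: tuple(y for y, _ in by_z(p)) for x, p in groups.items()}    # like tuple_(Y, order_by=Z)
joined = {x: ','.join(y for y, _ in by_z(p)) for x, p in groups.items()}  # like concat_(Y, order_by=Z, sep=',')

print(count_['Mary'], total_['Mary'], lowest['Mary'], ordered['Mary'], joined['Mary'])
```

rank_ and running_sum_ are left out on purpose, because their exact semantics are not clear enough from the descriptions to sketch with confidence.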

Literals and sets

Just as pyDatalog functions behave much like Python dictionaries, pyDatalog literals behave much like Python sets.

In [3]:

from pyDatalog import pyDatalog
pyDatalog.create_terms('X,Y,Z, works_in, department_size, manager, indirect_manager, count_of_indirect_reports')

Facts are added to the set like this:

In [4]:

# Mary works in Production
+ works_in('Mary', 'Production')
+ works_in('Sam',  'Marketing')

+ works_in('John', 'Production')
+ works_in('John', 'Marketing')

Likewise, literals can be queried by value, which is more concise than the equivalent in plain Python [the loop is hidden].

In [5]:

# give me all the X that work in Marketing
print(works_in(X,  'Marketing'))
# procedural equivalent in Python
# for i in _works_in:
#     if i[1]=='Marketing':
#         print(i[0])
X   
----
Sam 
John

Literals can also be defined by clauses.

[This already shows that the notion of a "literal" is very close in form to predicate logic.]

In [6]:

# one of the indirect manager of X is Y, if the (direct) manager of X is Y
indirect_manager(X,Y) <= (manager[X] == Y)
# another indirect manager of X is Y, if there is a Z so that the manager of X is Z, 
#   and an indirect manager of Z is Y
indirect_manager(X,Y) <= (manager[X] == Z) & indirect_manager(Z,Y)
print(indirect_manager('Sam',X))
X   
----
Mary
John

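Note that the two separate clauses implement an implicit "or".

[My own summary of how "literals" differ from, and relate to, the "functions" of the previous part: 1. Literals use parentheses and have no value (so they cannot be used with ==); functions use square brackets and have a value for the elements inside the brackets. 2. In use, manager[X] == Y and manager(X,Y) are similar, but for a lookup by key, manager['Mary'] == X is more efficient than manager('Mary',X): the former is a hash lookup, while the latter still requires a loop.]

When resolving a query, pyDatalog can remember intermediate results, a process called memoization. This makes queries faster, and it also helps to deal with infinite loops!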

In [7]:

# the manager of John is Mary (whose manager is John !)
manager['John'] = 'Mary'
manager['Mary'] = 'John'
print(indirect_manager('John',X))       # no infinite loop

X   
----
John
Mary

This makes pyDatalog a good tool for implementing recursive algorithms on complex data structures, for example those representing networks.
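
To see why remembering what has already been visited prevents an infinite loop on cyclic data, consider the plain-Python sketch below. It uses a simple "seen" set rather than pyDatalog's memoization of intermediate results, so it illustrates the idea only; the function indirect_managers is written for this example.

```python
# The same (cyclic) management chain as in the cell above
manager = {'John': 'Mary', 'Mary': 'John', 'Sam': 'Mary', 'Tom': 'Mary'}

def indirect_managers(person):
    """Follow the manager chain upward, remembering who was already seen."""
    seen = set()
    current = manager.get(person)
    # Stop at the top of the chain or as soon as the chain loops back
    while current is not None and current not in seen:
        seen.add(current)
        current = manager.get(current)
    return seen

print(sorted(indirect_managers('John')))
print(sorted(indirect_managers('Sam')))
```

Without the check against seen, the loop for 'John' would alternate between 'Mary' and 'John' forever.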

Facts can also be deleted:

In [8]:

# John does not work in Production anymore
- works_in('John', 'Production')
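# Addendum:
# [Rules can be added and removed much like facts, but mind the parentheses: a rule is added without a plus sign, whereas removing it requires the minus sign]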
# - (indirect_manager(X,Y) <= (manager[X] == Z) & indirect_manager(Z,Y))
# print(indirect_manager('John',X))

Aggregate functions can also be defined on literals:

In [9]:

(count_of_indirect_reports[X]==len_(Y)) <= indirect_manager(Y,X)
print(count_of_indirect_reports['John']==Y)             
Y
-
4

In [10]:

# a small inference rule

pyDatalog.create_terms('X,Y,Z,father,fatherOf,grandfatherOf')
(grandfatherOf[X] == Z) <= ((fatherOf[X]==Y) & (fatherOf[Y]==Z))
fatherOf["乾隆"] = "雍正"
fatherOf["雍正"] = "康熙"
print(grandfatherOf["乾隆"] == X)
X 
--
康熙

Trees, graphs, and recursive algorithms

Trees and graphs can be defined by the links between their nodes:

In [11]:

pyDatalog.create_terms('link, can_reach')

# there is a link between node 1 and node 2
+link(1,2)
+link(2,3)
+link(2,4)
+link(2,5)
+link(5,6)
+link(6,7)
+link(7,2)

# undirected graph: every link goes both ways
link(X,Y) <= link(Y,X)

Out[11]:

link(X,Y) <= link(Y,X)

The following two clauses express when a node Y can be reached from a node X:

In [12]:

# can Y be reached from X ?
can_reach(X,Y) <= link(X,Y) # direct link
# via Z
can_reach(X,Y) <= link(X,Z) & can_reach(Z,Y) & (X!=Y)

print (can_reach(1,Y))
Y
-
2
6
7
3
4
5

Note that pyDatalog is smart enough to resolve the query even though the graph contains cycles.
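
One common way to evaluate rules like these on cyclic data is to apply them repeatedly until nothing new is derived. The sketch below is a naive fixed-point computation in plain Python over the same links; it is meant as an illustration of the rules' meaning and is not claimed to describe pyDatalog's own resolution strategy.

```python
# Facts: the same links as above, made symmetric (undirected graph)
links = {(1, 2), (2, 3), (2, 4), (2, 5), (5, 6), (6, 7), (7, 2)}
links |= {(y, x) for (x, y) in links}

# Rule 1: can_reach(X,Y) <= link(X,Y)
can_reach = set(links)

# Rule 2: can_reach(X,Y) <= link(X,Z) & can_reach(Z,Y) & (X!=Y)
# Apply it repeatedly until no new pair is derived (a fixed point).
while True:
    derived = {
        (x, y)
        for (x, z) in links
        for (z2, y) in can_reach
        if z == z2 and x != y
    }
    if derived <= can_reach:
        break
    can_reach |= derived

print(sorted(y for (x, y) in can_reach if x == 1))
```

The loop is guaranteed to stop because the set of node pairs is finite and can_reach only ever grows.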

More graph algorithms can be found in this example ( https://github.com/pcarbonn/pyDatalog/blob/master/pyDatalog/examples/graph.py ).

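The 8 queens problem

By combining what we have learned so far, we can use declarative programming to tackle complex problems and let the computer work out the procedure for solving them. As an example, let's write a program that finds a valid solution to the 8 queens problem. A solution for any N queens problem can be found here ( https://github.com/pcarbonn/pyDatalog/blob/master/pyDatalog/examples/queens_N.py ).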

In [13]:

from pyDatalog import pyDatalog
pyDatalog.create_terms('N,X0,X1,X2,X3,X4,X5,X6,X7')
pyDatalog.create_terms('ok,queens,next_queen')

# the queen in the first column can be in any row
queens(X0)                      <= (X0._in(range(8)))

# to find the queens in the first 2 columns, find the first one first, then find a second one
queens(X0,X1)                   <= queens(X0)                   & next_queen(X0,X1)

# repeat for the following queens
queens(X0,X1,X2)                <= queens(X0,X1)                & next_queen(X0,X1,X2)
queens(X0,X1,X2,X3)             <= queens(X0,X1,X2)             & next_queen(X0,X1,X2,X3)
queens(X0,X1,X2,X3,X4)          <= queens(X0,X1,X2,X3)          & next_queen(X0,X1,X2,X3,X4)
queens(X0,X1,X2,X3,X4,X5)       <= queens(X0,X1,X2,X3,X4)       & next_queen(X0,X1,X2,X3,X4,X5)
queens(X0,X1,X2,X3,X4,X5,X6)    <= queens(X0,X1,X2,X3,X4,X5)    & next_queen(X0,X1,X2,X3,X4,X5,X6)
queens(X0,X1,X2,X3,X4,X5,X6,X7) <= queens(X0,X1,X2,X3,X4,X5,X6) & next_queen(X0,X1,X2,X3,X4,X5,X6,X7)

# the second queen can be in any row, provided it is compatible with the first one
next_queen(X0,X1)                   <= queens(X1)                       & ok(X0,1,X1)

# to find the third queen, first find a queen compatible with the second one, then with the first
# re-use the previous clause for maximum speed, thanks to memoization
next_queen(X0,X1,X2)                <= next_queen(X1,X2)                & ok(X0,2,X2)

# repeat for all queens
next_queen(X0,X1,X2,X3)             <= next_queen(X1,X2,X3)             & ok(X0,3,X3)
next_queen(X0,X1,X2,X3,X4)          <= next_queen(X1,X2,X3,X4)          & ok(X0,4,X4)
next_queen(X0,X1,X2,X3,X4,X5)       <= next_queen(X1,X2,X3,X4,X5)       & ok(X0,5,X5)
next_queen(X0,X1,X2,X3,X4,X5,X6)    <= next_queen(X1,X2,X3,X4,X5,X6)    & ok(X0,6,X6)
next_queen(X0,X1,X2,X3,X4,X5,X6,X7) <= next_queen(X1,X2,X3,X4,X5,X6,X7) & ok(X0,7,X7)

# it's ok to have one queen in row X1 and another in row X2 if they are separated by N columns
ok(X1, N, X2) <= (X1 != X2) & (X1 != X2+N) & (X1 != X2-N)

# give me one solution to the 8-queen puzzle
print(queens(X0,X1,X2,X3,X4,X5,X6,X7).data[0])

(7, 3, 0, 2, 5, 1, 6, 4)
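
For comparison, the same constraint can be written as an ordinary backtracking search. The sketch reuses the ok(X1, N, X2) test from the listing above; the helper names is_valid and solve are invented here, and nothing in it reflects how pyDatalog itself searches.

```python
def ok(row_a, col_gap, row_b):
    """Two queens col_gap columns apart do not attack each other."""
    return row_a != row_b and row_a != row_b + col_gap and row_a != row_b - col_gap

def is_valid(rows):
    """rows[i] is the row of the queen in column i; check every pair of queens."""
    n = len(rows)
    return all(ok(rows[i], j - i, rows[j]) for i in range(n) for j in range(i + 1, n))

def solve(n, placed=()):
    """Yield every solution, placing one queen per column from left to right."""
    if len(placed) == n:
        yield placed
        return
    col = len(placed)
    for row in range(n):
        # Keep the new queen only if it is compatible with all earlier ones
        if all(ok(placed[c], col - c, row) for c in range(col)):
            yield from solve(n, placed + (row,))

# The solution printed by the pyDatalog program passes the same check
print(is_valid((7, 3, 0, 2, 5, 1, 6, 4)))
# One solution found by the backtracking search
print(next(solve(8)))
```

The procedural version has to spell out the search order explicitly, whereas the pyDatalog program only states which placements are acceptable.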