Thesaurus Lab
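Background
Given a dictionary or corpus, it is surprising how often one or more chains of meaning connect two arbitrary words. Start from one word and repeatedly step to one of its synonyms, and you eventually reach the other word, even though the two are often completely unrelated.
This lab first designs an algorithm for the all-shortest-paths (ASP) problem on unweighted directed graphs. It then treats the synonym relation between words as edges and uses that graph to produce the chains of words connecting two given words.
Details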
1 Unweighted Shortest Paths
1.1 Graph type
type vertex = key
type edge = vertex * vertex
type nyi = (vertex seq) Table.table
type graph = nyi
type asp = nyi
The graph is represented as an adjacency table. Each key is a source vertex, and its value is the sequence of all of that vertex's out-neighbors.
1.2 makeGraph
val makeGraph : edge seq -> graph
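Purpose:
Convert the given edge sequence into a graph represented as an adjacency table.
Approach:
Map each edge (u,v) to two pairs, (u,<v>) and (v,<>). The second pair guarantees that v becomes a key even if it has no out-edges. Table.collect then groups the pairs by key, and flattening each key's sequence of sequences yields that vertex's neighbor sequence.
Code: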
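fun makeGraph (E : edge seq) : graph =
  let
    (* (v, <>) for every edge (u,v): makes every target vertex a key *)
    val PreL = map (fn (_, r) => (r, empty ())) E
    (* (u, <v>) for every edge (u,v): records v as an out-neighbor of u *)
    val PreR = map (fn (l, r) => (l, singleton r)) E
    (* group by vertex: each key maps to a seq of neighbor seqs *)
    val Res1 = Table.collect (append (PreL, PreR))
    (* concatenate each key's seqs into a single neighbor seq *)
    val Res2 = Table.map flatten Res1
  in
    Res2
  end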
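Asymptotic complexity (n = |E|):
PreL, PreR: W = O(n), S = O(1)
Res1: W = W(append) + W(Table.collect) = O(n) + O(n log n) = O(n log n)
S = S(append) + S(Table.collect) = O(1) + O(log^2 n) = O(log^2 n)
Res2: W = Σ W(flatten) = O(2n) = O(n), because the 2n pairs produced by PreL and PreR are split among the keys
S = max S(flatten) = O(log n)
Hence W = O(n log n) and S = O(log^2 n).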
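The collect-then-flatten construction can be illustrated without the course library. Below is a minimal sketch. It assumes plain SML lists in place of Seq and Table, and integer vertices. collect and makeGraph here are hypothetical list-based stand-ins for Table.collect and ASP.makeGraph. The grouping uses a linear insert per pair, so the sketch shows the data flow but not the O(n log n) bound above.

```sml
(* Hypothetical list-based stand-ins for Table.collect and ASP.makeGraph. *)
type vertex = int
type graph = (vertex * vertex list) list

(* Group (key, x) pairs by key into a key-sorted list of (key, [x1, x2, ...]),
   keeping each key's xs in input order. *)
fun collect (pairs : (vertex * 'a) list) : (vertex * 'a list) list =
  let
    fun add ((k, x), []) = [(k, [x])]
      | add ((k, x), (k', xs) :: rest) =
          if k = k' then (k', xs @ [x]) :: rest
          else if k < k' then (k, [x]) :: (k', xs) :: rest
          else (k', xs) :: add ((k, x), rest)
  in
    List.foldl add [] pairs
  end

fun makeGraph (edges : (vertex * vertex) list) : graph =
  let
    (* (v, []) for every edge (u, v): every target becomes a key *)
    val preL : (vertex * vertex list) list = List.map (fn (_, r) => (r, [])) edges
    (* (u, [v]) for every edge (u, v): v is an out-neighbor of u *)
    val preR : (vertex * vertex list) list = List.map (fn (l, r) => (l, [r])) edges
  in
    (* group by key, then flatten each key's list of lists *)
    List.map (fn (k, xss) => (k, List.concat xss)) (collect (preL @ preR))
  end
```

For the edges (1,2), (1,3), (2,3), vertex 3 has no out-edges. It still appears as a key with an empty neighbor list, thanks to the (v, []) pairs.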
1.3 numEdges
numEdges : graph -> int
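Purpose:
Return the number of edges in the graph.
Approach:
Sum the lengths of all neighbor sequences, i.e. the out-degrees of all vertices.
Code: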
fun numEdges (G : graph) : int =
  let
    val Pre = Table.map (fn r => length r) G
  in
    Table.reduce (fn (x, y) => x + y) 0 Pre
  end
1.4 numVertices
numVertices : graph -> int
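Purpose:
Return the number of vertices in the graph.
Approach:
Every vertex is a key of the adjacency table (makeGraph also adds edge targets as keys), so the size of the table is the number of vertices.
Code: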
fun numVertices (G : graph) : int =
Table.size G
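The two counting functions can be sketched on a hypothetical list representation that uses plain lists instead of Table. numEdges becomes a map-then-sum over out-degrees, and numVertices becomes a key count. The sample graph includes a vertex with no out-edges. That vertex counts toward the vertices but contributes no edges.

```sml
(* Hypothetical list-based graph: each key paired with its out-neighbors. *)
type vertex = int
type graph = (vertex * vertex list) list

(* Sum of out-degrees, mirroring Table.map length followed by Table.reduce. *)
fun numEdges (g : graph) : int =
  List.foldl op+ 0 (List.map (fn (_, nbrs) => List.length nbrs) g)

(* Every vertex is a key, so the number of keys is the number of vertices. *)
fun numVertices (g : graph) : int = List.length g

(* Vertex 3 has no out-edges but is still a key. *)
val sample : graph = [(1, [2, 3]), (2, [3, 1]), (3, [])]
```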
1.5 outNeighbors
outNeighbors : graph -> vertex -> vertex seq
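Purpose:
Return the out-neighbors of a given vertex.
Approach:
Look the vertex up with Table.find, and return the empty sequence if it is not in the graph.
Code: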
fun outNeighbors (G : graph) (v : vertex) : vertex seq =
  case Table.find G v of
    NONE => empty ()
  | SOME x => x
Asymptotic complexity:
Table.find: W = O(log|V|), S = O(log|V|)
Returning the stored sequence x: W = O(1), S = O(1)
Hence W = O(log|V|), which is within O(d+(v) + log|V|) where d+(v) is the out-degree of v, and S = O(log|V|).
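The rule that a missing vertex has no out-neighbors can be sketched on the same hypothetical list representation. Here List.find, a linear scan, stands in for Table.find, so this version costs O(|V|) rather than O(log|V|).

```sml
(* Hypothetical list-based graph: each key paired with its out-neighbors. *)
type vertex = int
type graph = (vertex * vertex list) list

(* Look v up; a vertex absent from the graph has no out-neighbors. *)
fun outNeighbors (g : graph) (v : vertex) : vertex list =
  case List.find (fn (k, _) => k = v) g of
    NONE => []
  | SOME (_, nbrs) => nbrs

val sample : graph = [(1, [2, 3]), (2, [3]), (3, [])]
```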
1.6 asp type
type asp = (vertex seq) Table.table
1.7 makeASP
makeASP : graph -> vertex -> asp
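Purpose:
Return an asp that encodes the shortest paths from the source v to every vertex reachable from v, stored as an adjacency table.
Approach:
Run a breadth-first search from v, one level at a time. The resulting table maps every vertex reachable from v to the sequence of its parents, i.e. its in-neighbors that lie exactly one level closer to v. The source maps to the empty sequence. Following parent pointers from any vertex back to v traces out its shortest paths.
Code: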
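(* FindParent x G: a (v, x) pair for every out-neighbor v of x,
   i.e. x becomes a candidate parent of each of its out-neighbors *)
fun FindParent x G =
  let
    val Out = outNeighbors G x
    val Parent = map (fn v => (v, x)) Out
  in
    Parent
  end
(* MyFind G v: true iff v is NOT a key of G *)
fun MyFind G v =
  case Table.find G v of
    SOME _ => false
  | NONE => true
fun makeASP (G : graph) (v : vertex) : asp =
  case MyFind G v of
    true => Table.empty ()   (* v is not in the graph *)
  | false =>
      let
        (* X: vertices from earlier levels -> parents; F: current level -> parents *)
        fun BFS (X : asp, F : asp) =
          case Table.size F of
            0 => X
          | _ =>
              let
                (* move the current level into the visited table *)
                val X' = Table.merge (fn (x, y) => append (x, y)) (X, F)
                (* (child, parent) pairs for every edge leaving the current level *)
                val Pre = flatten (map (fn (x, _) => FindParent x G) (Table.toSeq F))
                (* group by child, keeping every parent on the current level *)
                val F' = Table.collect Pre
                (* keep only children not yet visited: they form the next level *)
                val F'' = Table.filterk (fn (k, _) => MyFind X' k) F'
              in
                BFS (X', F'')
              end
      in
        BFS (Table.empty (), Table.singleton (v, empty ()))
      end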
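Asymptotic complexity:
Let D be the number of BFS rounds, i.e. the largest distance from v to a reachable vertex. Every reachable vertex is on the frontier in exactly one round, so FindParent examines each edge once over the whole search. The table operations of a round (Table.find inside outNeighbors, Table.collect, Table.merge, Table.filterk) cost at most O(log|V|) work per element they handle. Each round has O(log^2|V|) span, since Table.collect sorts its input.
W = O(|E| log|V|)
S = O(D log^2|V|)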
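The level-by-level search can be traced on a small graph. Below is a minimal sketch on a hypothetical list representation. `@` and List.filter replace Table.merge and Table.filterk, and a hypothetical collect helper replaces Table.collect. Every operation is therefore linear, and the sketch does not meet the bounds above. In the diamond 1→2, 1→3, 2→4, 3→4, 4→5, vertex 4 is reached in the same round from both 2 and 3, so it keeps both as parents.

```sml
(* Hypothetical list-based graph and parent table (vertex -> BFS parents). *)
type vertex = int
type graph = (vertex * vertex list) list
type asp = (vertex * vertex list) list

fun outNeighbors (g : graph) v =
  case List.find (fn (k, _) => k = v) g of SOME (_, ns) => ns | NONE => []

fun mem (t : asp) v = List.exists (fn (k, _) => k = v) t

(* Group (child, parent) pairs by child, keeping every parent. *)
fun collect (pairs : (vertex * vertex) list) : asp =
  List.foldl
    (fn ((c, p), acc) =>
       if mem acc c
       then List.map (fn (k, ps) => if k = c then (k, ps @ [p]) else (k, ps)) acc
       else acc @ [(c, [p])])
    [] pairs

fun makeASP (g : graph) (src : vertex) : asp =
  let
    (* visited: earlier levels -> parents; frontier: current level -> parents *)
    fun bfs (visited : asp, frontier : asp) : asp =
      if null frontier then visited
      else
        let
          val visited' = visited @ frontier
          (* every edge x -> v leaving the frontier makes x a parent of v *)
          val pairs =
            List.concat (List.map (fn (x, _) =>
              List.map (fn v => (v, x)) (outNeighbors g x)) frontier)
          (* drop children reached in this or an earlier round *)
          val next = List.filter (fn (k, _) => not (mem visited' k)) (collect pairs)
        in
          bfs (visited', next)
        end
  in
    (* a source that is not a key of g yields the empty table *)
    if mem g src then bfs ([], [(src, [])]) else []
  end

val diamond : graph = [(1, [2, 3]), (2, [4]), (3, [4]), (4, [5]), (5, [])]
```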
1.8 All Shortest Paths Reporting
report : asp -> vertex -> vertex seq seq
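Purpose:
Return all shortest paths from the source of A to v, each listed from the source to v.
Approach:
Run a depth-first search over A starting from the target v. The search follows parent pointers until it reaches the source, the only vertex whose parent sequence is empty. Each complete walk, reversed, is one shortest path.
Code: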
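fun report (A : asp) (v : vertex) : vertex seq seq =
  let
    (* DFS Y (Res, m): Y holds the vertices from v up to, but not including, m.
       Parents are followed until the source, whose parent seq is empty;
       each complete walk is reversed and appended to Res. *)
    fun DFS Y (Res, m) =
      let
        val Y' = append (Y, singleton m)
        val Parents = outNeighbors A m
      in
        case length Parents of
          0 => append (Res, singleton (rev Y'))   (* m is the source *)
        | _ => iter (DFS Y') Res Parents          (* explore every parent of m *)
      end
  in
    case Table.find A v of
      NONE => empty ()                            (* v is unreachable: no paths *)
    | SOME _ => DFS (empty ()) (empty (), v)
  end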
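Asymptotic complexity:
Let P be the number of shortest paths and L the number of vertices on each path.
The search walks each of the P paths once, so it visits at most P·L vertices. Each visit looks up the parents of the current vertex in O(log|V|) work.
Because iter processes the parents one after another, the traversal is sequential and its span equals its work. Counting the table lookups, W = S = O(P·L·log|V|). The append calls add copying cost on top of this.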
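Every non-source vertex in a BFS parent table has at least one parent. The shortest paths to v are therefore exactly the shortest paths to each parent of v, extended by v. The sketch below expresses that recurrence directly on a hypothetical list representation. It is a recursive alternative to the iter-based DFS, not the lab's required implementation. sampleASP is the parent table of the diamond graph 1→2, 1→3, 2→4, 3→4, 4→5 searched from 1.

```sml
(* Hypothetical list-based parent table: vertex -> its BFS parents. *)
type vertex = int
type asp = (vertex * vertex list) list

fun parentsOf (a : asp) (v : vertex) : vertex list option =
  case List.find (fn (k, _) => k = v) a of
    SOME (_, ps) => SOME ps
  | NONE => NONE

(* All shortest paths from the source to v, each listed source first. *)
fun report (a : asp) (v : vertex) : vertex list list =
  let
    fun paths m =
      case parentsOf a m of
        NONE => []                 (* m was never reached *)
      | SOME [] => [[m]]           (* m is the source *)
      | SOME ps =>                 (* extend every path to a parent by m *)
          List.map (fn path => path @ [m]) (List.concat (List.map paths ps))
  in
    paths v
  end

val sampleASP : asp = [(1, []), (2, [1]), (3, [1]), (4, [2, 3]), (5, [4])]
```

Here report sampleASP 5 produces the two paths [1,2,4,5] and [1,3,4,5], one through each parent of 4.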
2 Thesaurus Paths
2.1 make
make : (string * string seq) seq -> thesaurus
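Purpose:
Convert the given (word, synonyms) pairs, which describe the edge set, into a thesaurus represented as an adjacency table.
Approach:
Expand each pair (w, <s1, ..., sk>) into the edges (w,s1), ..., (w,sk), then pass the resulting edge sequence to ASP.makeGraph.
Code: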
fun make (S : (string * string seq) seq) : thesaurus =
  let
    (* one edge (l, x) for every synonym x of word l *)
    val Pre = flatten (map (fn (l, r) => map (fn x => (l, x)) r) S)
  in
    ASP.makeGraph Pre
  end
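The expansion step in make is a flatten of a nested map. Below is a minimal sketch with plain lists. toEdges is a hypothetical name. Each (word, synonyms) entry becomes one directed edge per synonym. A word listed with no synonyms therefore contributes no edges, and it only becomes a vertex if it appears as someone else's synonym.

```sml
(* Hypothetical helper: flatten (word, synonyms) entries into directed edges. *)
fun toEdges (entries : (string * string list) list) : (string * string) list =
  List.concat (List.map (fn (w, syns) => List.map (fn s => (w, s)) syns) entries)
```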
2.2 Thesaurus Lookup
numWords : thesaurus -> int
synonyms : thesaurus -> string -> string seq
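Purpose:
numWords returns the number of words in the thesaurus. synonyms T w returns the synonyms of w.
Approach:
Call ASP.numVertices and ASP.outNeighbors, respectively.
Code: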
fun numWords (T : thesaurus) : int =
ASP.numVertices T
fun synonyms (T : thesaurus) (w : string) : string seq =
ASP.outNeighbors T w
2.3 Thesaurus All Shortest Paths
query : thesaurus -> string -> string -> string seq seq
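Purpose:
Return all shortest chains of synonyms from w1 to w2.
Approach:
Call ASP.makeASP on w1, then call ASP.report on the resulting table. The function must be staged: once query T w1 has been applied, the ASP table should be computed only once and reused for every w2.
Code: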
fun query (T : thesaurus) (w1 : string) : string -> string seq seq =
  let
    (* computed once per (T, w1); a fully curried fun body would rerun it for every w2 *)
    val Early = ASP.makeASP T w1
  in
    fn w2 => ASP.report Early w2
  end
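Counting how often an expensive step runs makes the difference between a curried fun and a staged function visible. In this minimal sketch, expensivePrep is a hypothetical stand-in for ASP.makeASP and a membership test stands in for ASP.report. Only the evaluation order matters here.

```sml
(* Counts how many times the expensive precomputation runs. *)
val calls = ref 0

(* Hypothetical stand-in for ASP.makeASP. *)
fun expensivePrep (w1 : string) : string list =
  (calls := !calls + 1; [w1, w1 ^ "!"])

(* Not staged: the body runs only after both arguments are supplied. *)
fun queryUnstaged (w1 : string) (w2 : string) : bool =
  let val prep = expensivePrep w1 in List.exists (fn x => x = w2) prep end

(* Staged: the precomputation runs once, when w1 is supplied. *)
fun queryStaged (w1 : string) : string -> bool =
  let val prep = expensivePrep w1 in fn w2 => List.exists (fn x => x = w2) prep end
```

After val q = queryStaged "a", the calls q "b" and q "c" reuse prep. The unstaged version recomputes it on every call, because fun f a b = e is shorthand for fn a => fn b => e, and e is evaluated only when b arrives.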