Clustering OWL Documents Based on Semantic Analysis

Article link: https://blog.csdn.net/mudboy/article/details/619183

Abstract. Clustering OWL documents on the WWW or the Semantic Web is an
important task in the domain of ontology research and WI research. This
paper analyzes the semantics of OWL documents and proposes a method for
computing semantic similarity between OWL documents. The method considers
inheritance of objects and the representation of complex classes. It can be
used to cluster OWL documents built by experts and OWL documents learned by
automatic tools.
1 Introduction
Ontology, especially domain ontology [1, 2], plays an important role in
information extraction and exchange. These ontology documents must be sound
and complete. At present, most of them have been built by experts. On the one
hand, this work consumes a lot of time; on the other hand, the resulting
ontologies carry personal features. Clustering the many existing ontology
documents on the WWW or the Semantic Web is important for users who want to
refine or integrate ontologies.
One of the typical problems in Web Intelligence (WI) [3] technologies is PSML
(Problem Solving Markup Language). The core of PSML is distributed inference
engines. The precondition of PSML is clustering appropriate contents and
meta-knowledge, such as ontology information, on the Semantic Web. Therefore,
clustering ontology documents on the WWW or the Semantic Web is very important
for PSML.
OWL [4], the standard web ontology language proposed by W3C, has become the
new standard for ontology representation and exchange on the Internet. Its
development drew on the characteristics of other ontology languages. The
clustering research in this paper targets OWL documents. Documents in other
ontology languages, such as RDF, RDFS, and DAML, can be clustered by adapting
the method in this paper.
1.1 Related Work
A key problem in clustering research is computing semantic similarity between
objects. Traditional distance-based methods for computing similarity between
database objects are not suitable for OWL documents. OWL is essentially
semi-structured data, so methods for computing semantic similarity between
XML documents can serve as a reference. The methods in the literature [5-8]
fall into two kinds: structure similarity [5, 7, 8] and semantic similarity
[6]. Structure similarity methods share a common feature: they model an XML
document as an XML tree and evaluate similarity by tree operations [8] or
path structure [5, 7]. Semantic similarity methods first compute the
similarity between basic elements in the documents, then evaluate the whole
documents based on these similarities. However, these methods cannot directly
evaluate similarity between OWL documents, because structure similarity lacks
semantic information, whereas semantic similarity considers only the basic
elements of an XML document. OWL is a knowledge representation language. It
can describe all kinds of objects in the world and the relations between
them. The most important difference from XML is that OWL is an inferential
language with semantics. It strengthens inheritance between objects and the
representation of complex classes.
This paper considers the inheritance relation between objects and the
representation of complex classes in OWL documents. It then proposes a method
for computing semantic similarity between OWL documents and integrates it
with a hierarchical clustering algorithm to cluster OWL documents built by
experts or by automatic tools. The experimental results show that the
algorithm clusters OWL documents effectively.
The contributions of this paper are as follows:
1. The paper proposes a method for computing similarity between simple classes
based on resource similarity and property constraints.
2. When evaluating similarity between complex classes, the paper draws on set
operations.
3. The method in the paper is effective for clustering OWL documents built by
experts or by automatic tools.
1.2 Paper Organization
The paper is organized as follows. Section 2 discusses the method for
computing similarity between classes in OWL documents. Section 3 introduces
how to compute the similarity matrix of a set of OWL documents. Experimental
results are presented in Section 4. Section 5 concludes the paper and
presents future work.
2 Similarity of Two Classes
An OWL document concerns classes, properties, and instances of classes (named
individuals). To compute the similarity of two OWL documents, it is necessary
to compute the similarity of the elements in them. The basic elements in OWL
are classes. Classes are of three types: simple named classes, anonymous
classes, and complex classes. Anonymous classes commonly have no local names
of their own, but they have properties that restrict their instances.
Anonymous classes can be seen as special simple classes. A DatatypeProperty
in OWL denotes a relation between instances of classes and RDF literals or
XML Schema datatypes, and an ObjectProperty denotes a relation between
instances of two classes. The similarity of properties, including
DatatypeProperty and ObjectProperty, only explains the similarity of the two
classes that are the domains of these properties. We propose a method for
computing the similarity of simple classes that considers both the basic
semantics and the properties of the classes.
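To make the vocabulary above concrete, the following Python sketch reads a small OWL document in RDF/XML and lists its named classes together with the DatatypeProperty and ObjectProperty declarations grouped by domain class. The toy ontology (Person, Student, name, advisor) and the helper names `local_names` and `properties_by_domain` are assumptions made for illustration only; they do not come from the paper, and the sketch handles only top-level declarations that use `rdf:ID` and `rdf:resource`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical toy ontology, written only for this illustration.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Person"/>
  <owl:Class rdf:ID="Student">
    <rdfs:subClassOf rdf:resource="#Person"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="name">
    <rdfs:domain rdf:resource="#Person"/>
  </owl:DatatypeProperty>
  <owl:ObjectProperty rdf:ID="advisor">
    <rdfs:domain rdf:resource="#Student"/>
    <rdfs:range rdf:resource="#Person"/>
  </owl:ObjectProperty>
</rdf:RDF>"""


def local_names(root, tag):
    """Return the rdf:ID of every top-level element with the given OWL tag."""
    return [el.get(f"{{{RDF}}}ID") for el in root.findall(f"{{{OWL}}}{tag}")]


def properties_by_domain(root, tag):
    """Map each domain class name to the properties of the given kind."""
    result = {}
    for prop in root.findall(f"{{{OWL}}}{tag}"):
        for dom in prop.findall(f"{{{RDFS}}}domain"):
            cls = dom.get(f"{{{RDF}}}resource").lstrip("#")
            result.setdefault(cls, []).append(prop.get(f"{{{RDF}}}ID"))
    return result


root = ET.fromstring(DOC)
classes = local_names(root, "Class")
datatype_props = properties_by_domain(root, "DatatypeProperty")
object_props = properties_by_domain(root, "ObjectProperty")
print(classes, datatype_props, object_props)
```

Grouping properties by their domain class mirrors the observation that property similarity is only meaningful as evidence about the classes that own the properties.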
2.1 Similarity of Two Simple Classes
Members of simple classes are generally restricted by their direct
superclasses and their properties, as shown in Figure 1. By class
inheritance, a class inherits all the properties of its superclasses. In this
way, restrictions from superclasses can be translated into restrictions on
the properties of the superclasses. In addition to the basic semantic
similarity of classes (the names of simple named classes denote their
semantics), the similarity between the properties that restrict the classes
must be considered when computing the similarity of two classes.
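The inheritance step can be sketched as follows: a class's effective property set is its own declared properties plus those of every direct or indirect superclass. This is a minimal Python sketch under assumptions; the dictionaries `parents` and `declared`, the class names, and the function names are hypothetical and are not taken from the paper.

```python
def superclasses(cls, parents):
    """Return all direct and indirect superclasses of cls."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        sup = stack.pop()
        if sup not in seen:  # guard against cycles and repeated ancestors
            seen.add(sup)
            stack.extend(parents.get(sup, []))
    return seen


def effective_properties(cls, parents, declared):
    """Return the properties declared on cls plus all inherited ones."""
    props = set(declared.get(cls, []))
    for sup in superclasses(cls, parents):
        props |= set(declared.get(sup, []))
    return props


# Hypothetical class hierarchy: PhDStudent -> Student -> Person.
parents = {"Student": ["Person"], "PhDStudent": ["Student"]}
declared = {
    "Person": ["name"],
    "Student": ["advisor"],
    "PhDStudent": ["thesisTitle"],
}

print(sorted(effective_properties("PhDStudent", parents, declared)))
```

With the properties flattened in this way, two classes can be compared on their property sets alone, without walking the hierarchy again during the comparison.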
Properties can be divided into two sorts: DatatypeProperty and
ObjectProperty. Because the two kinds of properties have different restricted
ranges, it is not meaningful to compare the similarity of different property
types. So the similarity of two classes can be computed from the basic
semantic similarity BasicSim(c1, c2) (if c1 or c2 is an anonymous class, then
BasicSim(c1, c2) = 0), the similarity of the DatatypeProperties in the two
classes, and the similarity of the ObjectProperties in the two classes, as in
Equation 1, where

$$
w_1 = \frac{1}{|\mathrm{number}(\mathrm{sum})|}, \quad
w_2 = \frac{|\mathrm{number}(BMDP)|}{|\mathrm{number}(\mathrm{sum})|}, \quad
w_3 = \frac{|\mathrm{number}(BMOP)|}{|\mathrm{number}(\mathrm{sum})|},
$$

$$
\mathrm{number}(\mathrm{sum}) = 1 + |\mathrm{number}(BMDP)| + |\mathrm{number}(BMOP)|,
$$

and number(BMDP) and number(BMOP) are the number of best-mapping
DatatypeProperty pairs and the number of best-mapping ObjectProperty pairs,
respectively.
$$
\mathrm{ClassSim}(c_1, c_2) = w_1\,\mathrm{BasicSim}(c_1, c_2)
+ w_2 \sum_{(p_i, p_j) \in BMDP} \mathrm{DataPropertySim}(c_1.p_i, c_2.p_j)
+ w_3 \sum_{(p_i, p_j) \in BMOP} \mathrm{ObjectPropertySim}(c_1.p_i, c_2.p_j)
\qquad (1)
$$
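Equation 1 can be sketched in Python as a weighted sum. This sketch assumes that the best-mapping property pairs have already been chosen and that their pairwise similarities are passed in as plain lists of numbers. The function name `class_sim` and its parameters are hypothetical, and the weights follow the definitions exactly as stated above.

```python
def class_sim(basic_sim, datatype_pair_sims, object_pair_sims):
    """Combine similarities as in Equation 1.

    basic_sim          -- BasicSim(c1, c2); 0 if either class is anonymous
    datatype_pair_sims -- DataPropertySim value for each pair in BMDP
    object_pair_sims   -- ObjectPropertySim value for each pair in BMOP
    """
    n_dp = len(datatype_pair_sims)   # number(BMDP)
    n_op = len(object_pair_sims)     # number(BMOP)
    total = 1 + n_dp + n_op          # number(sum)
    w1 = 1 / total
    w2 = n_dp / total
    w3 = n_op / total
    return (w1 * basic_sim
            + w2 * sum(datatype_pair_sims)
            + w3 * sum(object_pair_sims))


# Two named classes with one best-mapping DatatypeProperty pair:
# number(sum) = 2, so the result is 0.5 * 0.8 + 0.5 * 0.5.
named = class_sim(0.8, [0.5], [])

# An anonymous class is involved, so BasicSim is 0 and only the
# property terms contribute: (1/3) * 1.0 + (1/3) * 1.0.
anonymous = class_sim(0.0, [1.0], [1.0])

print(named, anonymous)
```

Because number(sum) always includes the constant 1 for the name term, the function is well defined even when both property lists are empty; in that case it returns BasicSim unchanged.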