One of the difficulties in building an SQL-like query language for the Web is the absence of a database schema for this huge, heterogeneous repository of information. However, if we are interested in HTML documents only, we can construct a virtual (66) from the implicit structure of these files. Thus, at the highest level of (67), every such document is identified by its Uniform Resource Locator (URL) and has a title and a text. Also, Web servers provide some additional information, such as the type, length, and last modification date of a document. So, for data mining purposes, we can consider the set of all HTML documents as a relation: Document(url, (68), text, type, length, modify), where all the (69) are character strings. In this framework, an individual document is identified with a (70) in this relation. Of course, if some optional information is missing from the HTML document, the associated fields will be left blank, but this is not uncommon in any database.
Question type: single-choice question
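
As an illustration only, the Document relation described in the passage could be sketched as an in-memory SQLite table in Python. This is a minimal sketch under stated assumptions, not part of the original question: the column name `blank_68` is a hypothetical placeholder for the field left blank as (68), the function name `build_document_table` and the sample URLs and values are invented, and every field is stored as a character string as the passage states. Missing optional information is stored as an empty string.

```python
import sqlite3


def build_document_table(rows):
    """Create an in-memory Document table and load the given records.

    Assumption: blank_68 is a placeholder column name for the field
    that the passage leaves blank as (68); it is not an answer.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Document ("
        "url TEXT PRIMARY KEY, blank_68 TEXT, text TEXT, "
        "type TEXT, length TEXT, modify TEXT)"
    )
    conn.executemany("INSERT INTO Document VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


# Invented sample data: the second document lacks optional information,
# so the corresponding fields are left blank (empty strings).
sample_rows = [
    ("http://example.org/a.html", "value-a", "hello web",
     "text/html", "1024", "1996-05-01"),
    ("http://example.org/b.html", "", "no optional info",
     "text/html", "", ""),
]

conn = build_document_table(sample_rows)

# An SQL query over the relation: HTML documents whose length is known.
with_length = conn.execute(
    "SELECT url FROM Document "
    "WHERE type = 'text/html' AND length <> '' ORDER BY url"
).fetchall()

total = conn.execute("SELECT COUNT(*) FROM Document").fetchone()[0]
```

Here each document is stored once, keyed by its URL, and an ordinary SQL query can filter on any of the string-valued fields, including ones that are blank for some documents.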