Now you can start looking at individual results. The GoogleSearchResult object is really a wrapper around all of the results returned by the query:
...
import com.google.soap.search.GoogleSearchResultElement;
public class GoogleSearchTutorial {
...
GoogleSearchResult result = search.doSearch();
GoogleSearchResultElement[] resultElements = result.getResultElements();
System.out.println("Start index = "+ result.getStartIndex() );
System.out.println("End index = "+ result.getEndIndex() );
} catch (GoogleSearchFault gsf) {
...
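The getResultElements() method returns an array of GoogleSearchResultElement objects, each of which occupies one position in the overall result set. The positions reported by getStartIndex() and getEndIndex() are 1-based, so a search that finds 1,790,000 results returns an array of GoogleSearchResultElement objects whose result indexes run from 1 to 10. (Remember that each request returns at most 10 results.)
Accessing each individual result is as simple as looping through the array: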
import com.google.soap.search.GoogleSearch;
import com.google.soap.search.GoogleSearchResult;
import com.google.soap.search.GoogleSearchFault;
import com.google.soap.search.GoogleSearchResultElement;
public class GoogleSearchTutorial {
public static void main (String[] args) {
String searchTerm = args[0];
try {
GoogleSearch search = new GoogleSearch();
search.setKey("00000000000000000000000000000000");
search.setQueryString(searchTerm);
GoogleSearchResult result = search.doSearch();
GoogleSearchResultElement[] resultElements = result.getResultElements();
// The reported indexes are 1-based; subtract 1 to get array positions.
int startIndex = result.getStartIndex() - 1;
int endIndex = result.getEndIndex() - 1;
for (int i = startIndex; i <= endIndex; i++) {
GoogleSearchResultElement resultElement = resultElements[i];
String title = resultElement.getTitle();
String url = resultElement.getURL();
System.out.println(title);
System.out.println(url);
System.out.println("");
}
} catch (GoogleSearchFault gsf) {
System.out.println("Google Search Fault: "+gsf.getMessage());
}
}
}
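Because Java arrays are zero-based, you subtract 1 from the reported indexes to get the actual starting and ending array positions.
As the listing shows, a GoogleSearchResultElement object represents each result in the array, and its methods give access to information such as the title. For a search on "science fiction", for example, the results might be: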
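The off-by-one adjustment can be isolated in a small helper. The sketch below is illustrative only: the class IndexMapping and the method toArrayPosition are hypothetical names, not part of the Google API, and a plain String array stands in for the GoogleSearchResultElement array so that the example runs without the API library.

```java
// Illustrative sketch: IndexMapping and toArrayPosition are hypothetical names,
// not part of the Google API. A String array stands in for the result array.
class IndexMapping {

    // Converts a 1-based result index (the kind reported by getStartIndex()
    // and getEndIndex()) into a zero-based array position.
    static int toArrayPosition(int oneBasedIndex) {
        if (oneBasedIndex < 1) {
            throw new IllegalArgumentException("Result indexes start at 1");
        }
        return oneBasedIndex - 1;
    }

    public static void main(String[] args) {
        String[] results = {"first", "second", "third"};
        int startIndex = 1;              // stand-in for getStartIndex()
        int endIndex = results.length;   // stand-in for getEndIndex()

        for (int i = startIndex; i <= endIndex; i++) {
            System.out.println(i + ": " + results[toArrayPosition(i)]);
        }
    }
}
```

Keeping the conversion in one place makes it harder to apply the adjustment twice, once when computing the loop bounds and again when indexing the array.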
<b>Science</b> <b>Fiction</b> Weekly
http://www.scifi.com/sfw/
SCIFI.COM
http://www.scifi.com/
<b>Science</b> <b>Fiction</b> and Fantasy Writers of America, Inc.
http://www.sfwa.org/
The Linköping <b>Science</b> <b>Fiction</b> & Fantasy Archive
http://www2.lysator.liu.se/sf_archive/
Asimov's <b>Science</b> <b>Fiction</b>
http://www.asimovs.com/
The SF Site: The Best in <b>Science</b> <b>Fiction</b> and Fantasy
http://www.sfsite.com/home.htm
World <b>Science</b> <b>Fiction</b> Society / Worldcon [Official]
http://worldcon.org/
Analog <b>Science</b> <b>Fiction</b> & Fact
http://www.analogsf.com/
ULTIMATE <b>SCIENCE</b> <b>FICTION</b> WEB GUIDE
http://www.magicdragon.com/UltimateSF/SF-Index.html
Feminist <b>Science</b> <b>Fiction</b>, Fantasy & Utopia
http://www.feministsf.org/femsf/
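Notice that, just as in a Web search, bold tags highlight the search terms in the results; this markup is carried through when the results are returned from the Web service.
The results Google returns are not limited to sites. It also returns the Google category for each result: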
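If the titles are destined for a plain-text display, the highlighting tags may need to be removed first. The following is a minimal sketch of one way to do that with a regular expression; TitleCleaner and stripTags are hypothetical names, not part of the Google API, and the approach assumes simple tags such as <b> and <br> rather than arbitrary HTML.

```java
import java.util.regex.Pattern;

// Illustrative sketch: TitleCleaner and stripTags are hypothetical names,
// not part of the Google API.
class TitleCleaner {

    // Matches simple HTML tags such as <b>, </b>, or <br>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Removes the tags and leaves the text between them untouched.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String title = "<b>Science</b> <b>Fiction</b> Weekly";
        System.out.println(stripTags(title)); // prints "Science Fiction Weekly"
    }
}
```

This sketch does not decode character entities and is not a general HTML parser; it is only meant for the short, lightly marked-up strings shown in the sample output.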
...
import com.google.soap.search.GoogleSearchDirectoryCategory;
public class GoogleSearchTutorial {
...
for (int i = startIndex; i <= endIndex; i++) {
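// i - 1 assumes startIndex and endIndex hold the 1-based values from
// getStartIndex() and getEndIndex(), as in the complete listing later on.
GoogleSearchResultElement resultElement = resultElements[(i-1)];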
String title = resultElement.getTitle();
String url = resultElement.getURL();
GoogleSearchDirectoryCategory category =
resultElement.getDirectoryCategory();
String categoryViewable = category.getFullViewableName();
String categoryString = category.toString();
System.out.println(categoryViewable + " " + categoryString);
System.out.println(title);
System.out.println(url);
System.out.println("");
}
...
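The GoogleSearchDirectoryCategory object carries information on both the Google category name (rendered as a formatted string) and the Open Directory Project category name (returned by getFullViewableName()). The Open Directory Project is covered under Open Directory information.
Running the application returns the directory information along with the results, with the directory information preceding each result: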
Top/Arts/Literature/Genres/Science_Fiction/Magazines_and_E-zines
{SE="", FVN="Top/Arts/Literature/Genres/Science_Fiction/Magazines_and_E-zines"}
<b>Science</b> <b>Fiction</b> Weekly
http://www.scifi.com/sfw/
Top/Arts/Genres/Science_Fiction_and_Fantasy
{SE="", FVN="Top/Arts/Genres/Science_Fiction_and_Fantasy"}
SCIFI.COM
http://www.scifi.com/
Top/Arts/Writers_Resources/Fiction/Science_Fiction_and_Fantasy
{SE="", FVN="Top/Arts/Writers_Resources/Fiction/Science_Fiction_and_Fantasy"}
<b>Science</b> <b>Fiction</b> and Fantasy Writers of America, Inc.
http://www.sfwa.org/
Top/Arts/Literature/Genres/Science_Fiction
{SE="", FVN="Top/Arts/Literature/Genres/Science_Fiction"}
The Linköping <b>Science</b> <b>Fiction</b> & Fantasy Archive
http://www2.lysator.liu.se/sf_archive/
...
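The category names in this output are slash-separated paths, so they can be broken into levels for display or grouping. The sketch below shows one possible way to do that; CategoryPath, segments, and leafLabel are hypothetical names, not part of the Google API, and the input is assumed to look like the getFullViewableName() values shown above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: CategoryPath and its methods are hypothetical names,
// not part of the Google API.
class CategoryPath {

    // Splits a path such as "Top/Arts/Genres" into its levels.
    static List<String> segments(String fullViewableName) {
        if (fullViewableName == null || fullViewableName.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(fullViewableName.split("/"));
    }

    // Returns the last level with underscores shown as spaces,
    // or an empty string when there is no category.
    static String leafLabel(String fullViewableName) {
        List<String> parts = segments(fullViewableName);
        if (parts.isEmpty()) {
            return "";
        }
        return parts.get(parts.size() - 1).replace('_', ' ');
    }

    public static void main(String[] args) {
        String path = "Top/Arts/Literature/Genres/Science_Fiction";
        System.out.println(segments(path));   // prints [Top, Arts, Literature, Genres, Science_Fiction]
        System.out.println(leafLabel(path));  // prints "Science Fiction"
    }
}
```

The empty-input branch is there because this sketch does not assume that every result carries a category.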
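Deciding whether a result is relevant depends in part on seeing the search terms in the context of the rest of the page. The API lets you retrieve the "snippet", the portion of the page that caused it to be included in the results.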
...
String categoryViewable = category.getFullViewableName();
String snippet = resultElement.getSnippet();
System.out.println(categoryViewable + " " + categoryString);
System.out.println(title);
System.out.println(url);
System.out.println(" " + snippet );
System.out.println("");
}
...
Note that the snippet does not necessarily include the search terms. For example, your query returns:
Top/Arts/Literature/Genres/Science_Fiction/Magazines_and_E-zines
{SE="", FVN="Top/Arts/Literature/Genres/Science_Fiction/Magazines_and_E-zines"}
<b>Science</b> <b>Fiction</b> Weekly
http://www.scifi.com/sfw/
<b>...</b> Hollywood will choose to adapt better SF, and
more. <b>...</b> (c)<br> Copyright 2002, <b>Science</b> <b>Fiction</b> Weekly
(tm).
Top/Arts/Genres/Science_Fiction_and_Fantasy
{SE="", FVN="Top/Arts/Genres/Science_Fiction_and_Fantasy"}
SCIFI.COM
http://www.scifi.com/
[ Farscape/Stargate SG-1 ]. Apr 28, 2002, Today's<br> News Hodder Talks
More 'Daredevil' 'Sub-Mariner <b>...</b>
Top/Arts/Writers_Resources/Fiction/Science_Fiction_and_Fantasy
{SE="", FVN="Top/Arts/Writers_Resources/Fiction/Science_Fiction_and_Fantasy"}
<b>Science</b> <b>Fiction</b> and Fantasy Writers of America, Inc.
http://www.sfwa.org/
<b>...</b> not of SFWA. Except where otherwise noted, content and design
copyright 1995-2002<br> by <b>Science</b> <b>Fiction</b> and Fantasy Writers of America, Inc. ("SFWA"). SFWA <b>...</b>
...
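Notice again that the search terms, where they appear, are highlighted with bold tags. Notice too that the snippet is HTML-formatted content from the page.
The Open Directory Project (ODP) is a volunteer effort to catalog the worthwhile sites on the Internet. Rather than an index generated by automated Web crawlers, the ODP is a list of sites compiled by human editors. The ODP database is free to use, and Google draws on this information in its results. The API can retrieve this information directly.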
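Because the highlighting arrives as ordinary bold tags, an application can check whether a given snippet actually contains any search terms. The following sketch is one way to do so under the assumption that highlighted terms and the "..." separators are both wrapped in <b> tags, as in the sample output; SnippetTerms and highlightedTerms are hypothetical names, not part of the Google API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: SnippetTerms and highlightedTerms are hypothetical names,
// not part of the Google API.
class SnippetTerms {

    // Captures the text between <b> and </b>, as short as possible.
    private static final Pattern BOLD = Pattern.compile("<b>(.*?)</b>");

    // Collects the bolded text in a snippet, skipping the "..." separators.
    static List<String> highlightedTerms(String snippet) {
        List<String> terms = new ArrayList<String>();
        Matcher matcher = BOLD.matcher(snippet);
        while (matcher.find()) {
            String text = matcher.group(1).trim();
            if (!text.equals("...")) {
                terms.add(text);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String withTerms =
            "<b>...</b> (c)<br> Copyright 2002, <b>Science</b> <b>Fiction</b> Weekly";
        String withoutTerms =
            "News Hodder Talks More 'Daredevil' 'Sub-Mariner <b>...</b>";

        System.out.println(highlightedTerms(withTerms));     // prints [Science, Fiction]
        System.out.println(highlightedTerms(withoutTerms));  // prints []
    }
}
```

An empty list signals a snippet that, like the SCIFI.COM example, was returned without any of the search terms in it.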
import com.google.soap.search.GoogleSearch;
import com.google.soap.search.GoogleSearchResult;
import com.google.soap.search.GoogleSearchFault;
import com.google.soap.search.GoogleSearchResultElement;
import com.google.soap.search.GoogleSearchDirectoryCategory;
public class GoogleSearchTutorial {
public static void main (String[] args) {
String searchTerm = args[0];
try {
GoogleSearch search = new GoogleSearch();
search.setKey("00000000000000000000000000000000");
search.setQueryString(searchTerm);
GoogleSearchResult result = search.doSearch();
GoogleSearchResultElement[] resultElements = result.getResultElements();
int startIndex = result.getStartIndex();
int endIndex = result.getEndIndex();
for (int i = startIndex; i <= endIndex; i++) {
GoogleSearchResultElement resultElement = resultElements[(i-1)];
String title = resultElement.getTitle();
String url = resultElement.getURL();
GoogleSearchDirectoryCategory category =
resultElement.getDirectoryCategory();
String categoryString = category.toString();
String categoryViewable = category.getFullViewableName();
String snippet = resultElement.getSnippet();
String directoryTitle = resultElement.getDirectoryTitle();
String directorySummary = resultElement.getSummary();
System.out.println(categoryViewable + " " + categoryString);
System.out.println(title);
System.out.println(url);
System.out.println(" " + snippet);
System.out.println("Open Directory Information:");
System.out.println(" " + directoryTitle);
System.out.println(" " + directorySummary);
System.out.println("");
}
} catch (GoogleSearchFault gsf) {
System.out.println("Google Search Fault: "+gsf.getMessage());
}
}
}
Notice that, depending on its length, the summary may be truncated in the results:
Top/Arts/Literature/Genres/Science_Fiction/Magazines_and_E-zines {SE="",
FVN="Top/Arts/Literature/Genres/Science_Fiction/Magazines_and_E-zines"}
<b>Science</b> <b>Fiction</b> Weekly
http://www.scifi.com/sfw/
<b>...</b> Hollywood will choose to adapt better SF, and more. <b>...</b>
(c)<br> Copyright 2002, <b>Science</b> <b>Fiction</b> Weekly(tm).
Open Directory Information: <b>Science</b> <b>Fiction</b> Weekly The leading electronic publication covering the world of <b>Science</b><b> Fiction</b>, with news, reviews, original...
Top/Arts/Genres/Science_Fiction_and_Fantasy {SE="",
FVN="Top/Arts/Genres/Science_Fiction_and_Fantasy"}
SCIFI.COM
http://www.scifi.com/
[ Farscape/Stargate SG-1 ]. Apr 29, 2002, Today's<br> News Hodder Talks
More 'Daredevil' 'Sub-Mariner <b>...</b>
Open Directory Information: SCIFI.COM News, reviews, games and shopping from the SCI FI Channel.
Top/Arts/Writers_Resources/Fiction/Science_Fiction_and_Fantasy {SE="",
FVN="Top/Arts/Writers_Resources/Fiction/Science_Fiction_and_Fantasy"}
<b>Science</b> <b>Fiction</b> and Fantasy Writers of America, Inc.
http://www.sfwa.org/
<b>...</b> not of SFWA. Except where otherwise noted,
content and design copyright 1995-2002
<br> by <b>Science</b> <b>Fiction</b> and Fantasy Writers of
America, Inc. ("SFWA"). SFWA <b>...</b>
Open Directory Information: <b>Science</b> <b>Fiction</b> & Fantasy
Writers of America, Inc. The official Website of the premier professional <b>science</b>
<b>fiction</b> writer's association. The SFWA site... ...
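So far, the example searches have requested only the first 10 records. The API supports requesting any records within the first 1,000 results, as long as no more than 10 are requested at a time. By setting the starting position and the maximum number of results, an application can reach any of the available results.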
...
search.setQueryString(searchTerm);
int startResult = 100;
search.setStartResult(startResult);
int maxResult = 5;
search.setMaxResults(maxResult);
GoogleSearchResult result = search.doSearch();
GoogleSearchResultElement[] resultElements = result.getResultElements();
int startIndex = result.getStartIndex() - 1 - startResult ;
int endIndex = result.getEndIndex() - 1 - startResult ;
for (int i = startIndex; i <= endIndex; i++) {
GoogleSearchResultElement resultElement = resultElements[i];
...
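Note that the starting position does not work quite the way you might expect: startResult actually represents the index of the last record that is not included. The code above, for example, returns results 101 through 105, not 100 through 104.
For that reason, startIndex and endIndex must be adjusted not only by startResult but also by 1, as before.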
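The paging arithmetic can be tried out without calling the service. The sketch below plans a series of requests and maps result indexes to array positions; PagePlanner, plan, and toArrayPosition are hypothetical names, not part of the Google API. It assumes that the first page starts at offset 0 and that each pair it produces would be passed to setStartResult() and setMaxResults().

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: PagePlanner and its methods are hypothetical names,
// not part of the Google API.
class PagePlanner {
    static final int MAX_PER_REQUEST = 10;   // per-request limit stated in the text
    static final int MAX_REACHABLE = 1000;   // only the first 1,000 results are reachable

    // Returns {startResult, maxResults} pairs covering the first `wanted` results.
    // Assumption: the first page starts at offset 0.
    static List<int[]> plan(int wanted) {
        int total = Math.max(0, Math.min(wanted, MAX_REACHABLE));
        List<int[]> requests = new ArrayList<int[]>();
        for (int start = 0; start < total; start += MAX_PER_REQUEST) {
            requests.add(new int[] {start, Math.min(MAX_PER_REQUEST, total - start)});
        }
        return requests;
    }

    // Maps a 1-based result index to its position in the array returned
    // for a request that began at startResult.
    static int toArrayPosition(int resultIndex, int startResult) {
        return resultIndex - 1 - startResult;
    }

    public static void main(String[] args) {
        for (int[] request : plan(25)) {
            int startResult = request[0];
            int maxResults = request[1];
            int firstIndex = startResult + 1;          // expected start index
            int lastIndex = startResult + maxResults;  // expected end index for a full page
            System.out.println("start=" + startResult + " max=" + maxResults
                    + " covers results " + firstIndex + " to " + lastIndex
                    + " at array positions " + toArrayPosition(firstIndex, startResult)
                    + " to " + toArrayPosition(lastIndex, startResult));
        }
    }
}
```

With startResult set to 100, for instance, toArrayPosition(101, 100) yields 0, which matches the adjustment made in the listing above. A real request may return fewer results than requested, so the end index reported by the service should take precedence over the computed one.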