An interesting article I found on CodeProject. I am studying it for now and will translate it when I have time:


Threads view
The threads view lists all threads that are currently working. Each thread takes a URI from the URI queue and connects to it.
GET / HTTP/1.0
Host: www.cnn.com
Connection: Keep-Alive

HTTP/1.0 200 OK
Date: Sun, 19 Mar 2006 19:39:05 GMT
Content-Length: 65730
Content-Type: text/html
Expires: Sun, 19 Mar 2006 19:40:05 GMT
Cache-Control: max-age=60, private
Connection: keep-alive
Proxy-Connection: keep-alive
Server: Apache
Last-Modified: Sun, 19 Mar 2006 19:38:58 GMT
Vary: Accept-Encoding,User-Agent
Via: 1.1 webcache (NetCache NetApp/6.0.1P3)

Parsing page
Found: 356 ref(s)
http://www.cnn.com/
http://www.cnn.com/search/
http://www.cnn.com/linkto/intl.html
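File types
These are the file types the crawler is able to download. The crawler ships with a default set of MIME types, and the user can add, edit, and delete MIME types. The user can also choose to allow all MIME types.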
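To make the MIME type setting concrete, here is a minimal sketch of how such a filter could work. The class name MimeFilter, its members, and the two default types are assumptions made for illustration; the crawler's real list and code are not shown in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's code.
public static class MimeFilter
{
    // Assumed defaults; the crawler's real default list is not shown in this post.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/html",
            "text/plain"
        };

    // When true, every MIME type is accepted.
    public static bool AllowAll = false;

    public static void Add(string mimeType) { Allowed.Add(mimeType.Trim()); }
    public static void Remove(string mimeType) { Allowed.Remove(mimeType.Trim()); }

    // contentType is the raw Content-Type header value, e.g. "text/html; charset=utf-8".
    public static bool IsAllowed(string contentType)
    {
        if (AllowAll) return true;
        if (string.IsNullOrEmpty(contentType)) return false;
        // Drop parameters such as "; charset=utf-8" before the lookup.
        int semicolon = contentType.IndexOf(';');
        string mimeType = semicolon >= 0 ? contentType.Substring(0, semicolon) : contentType;
        return Allowed.Contains(mimeType.Trim());
    }
}
```

A download thread could call MimeFilter.IsAllowed with the Content-Type value of the response header and skip the body when it returns false.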



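Advanced
Advanced settings:
Code page used to encode the downloaded text pages
User-defined list of restricted words, which lets the user block unwanted pages
User-defined list of restricted host extensions, to avoid being blocked by those hosts
User-defined list of restricted file extensions, to avoid parsing non-text data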
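The three restriction lists could be applied roughly as follows. This is only a sketch: the class CrawlRestrictions, its method names, and every sample value in the lists are invented for illustration and do not come from the article.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper; all list contents below are made-up sample values.
public static class CrawlRestrictions
{
    public static string[] RestrictedWords = { "casino" };
    public static string[] RestrictedHostExtensions = { ".gov", ".mil" };
    public static string[] RestrictedFileExtensions = { ".jpg", ".gif", ".zip", ".exe" };

    // True if the host name ends with a restricted extension.
    public static bool IsHostRestricted(Uri uri)
    {
        return RestrictedHostExtensions.Any(ext =>
            uri.Host.EndsWith(ext, StringComparison.OrdinalIgnoreCase));
    }

    // True if the URI path ends with an extension that should not be parsed as text.
    public static bool IsFileRestricted(Uri uri)
    {
        string extension = Path.GetExtension(uri.AbsolutePath);
        return RestrictedFileExtensions.Any(ext =>
            string.Equals(ext, extension, StringComparison.OrdinalIgnoreCase));
    }

    // True if the downloaded page text contains any restricted word.
    public static bool ContainsRestrictedWord(string pageText)
    {
        return RestrictedWords.Any(word =>
            pageText.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

The host and file checks can run before a URI is queued, while the word check needs the downloaded page text.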

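Points of interest
Keep-Alive connection:
Keep-Alive is a request from the client to the server to keep the connection open after the response is complete. The client asks for it by adding an HTTP header to the request it sends to the server, as in the following request: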
GET /CNN/Programs/nancy.grace/ HTTP/1.0
Host: www.cnn.com
Connection: Keep-Alive

The server states its decision in the Connection header of its reply. Here it agrees to keep the connection open:
HTTP/1.0 200 OK
Date: Sun, 19 Mar 2006 19:38:15 GMT
Content-Length: 29025
Content-Type: text/html
Expires: Sun, 19 Mar 2006 19:39:15 GMT
Cache-Control: max-age=60, private
Connection: keep-alive
Proxy-Connection: keep-alive
Server: Apache
Vary: Accept-Encoding,User-Agent
Last-Modified: Sun, 19 Mar 2006 19:38:15 GMT
Via: 1.1 webcache (NetCache NetApp/6.0.1P3)

The server can also refuse, replying with Connection: Close:
HTTP/1.0 200 OK
Date: Sun, 19 Mar 2006 19:38:15 GMT
Content-Length: 29025
Content-Type: text/html
Expires: Sun, 19 Mar 2006 19:39:15 GMT
Cache-Control: max-age=60, private
Connection: Close
Server: Apache
Vary: Accept-Encoding,User-Agent
Last-Modified: Sun, 19 Mar 2006 19:38:15 GMT
Via: 1.1 webcache (NetCache NetApp/6.0.1P3)
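
As a small illustration of the request side, the sketch below builds the text of a GET request with the Connection header set. RequestBuilder and BuildGet are hypothetical names chosen for this example; this is not the article's MyWebRequest class.

```csharp
using System;
using System.Text;

// Hypothetical helper, not the article's MyWebRequest class.
public static class RequestBuilder
{
    // Builds the text of an HTTP/1.0 GET request for the given URI.
    public static string BuildGet(Uri uri, bool keepAlive)
    {
        StringBuilder request = new StringBuilder();
        request.Append("GET " + uri.PathAndQuery + " HTTP/1.0\r\n");
        request.Append("Host: " + uri.Host + "\r\n");
        // Ask the server to keep the socket open, or to close it after replying.
        request.Append("Connection: " + (keepAlive ? "Keep-Alive" : "Close") + "\r\n");
        // An empty line ends the header block.
        request.Append("\r\n");
        return request.ToString();
    }
}
```

The resulting string would be converted with Encoding.ASCII.GetBytes and sent over the socket. Whether the connection actually stays open still depends on the Connection header in the server's reply.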
WebRequest and WebResponse problems:
// download using the framework's WebRequest and WebResponse classes
WebRequest request = WebRequest.Create(uri);
WebResponse response = request.GetResponse();
Stream streamIn = response.GetResponseStream();
BinaryReader reader = new BinaryReader(streamIn, TextEncoding);
byte[] RecvBuffer = new byte[10240];
int nBytes, nTotalBytes = 0;
// read until the stream reports no more data
while((nBytes = reader.Read(RecvBuffer, 0, 10240)) > 0)
{
    nTotalBytes += nBytes;

}
reader.Close();
streamIn.Close();
response.Close();

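// the same download using the article's MyWebRequest and MyWebResponse classes
request = MyWebRequest.Create(uri, request/*to Keep-Alive*/, KeepAlive);
MyWebResponse response = request.GetResponse();
byte[] RecvBuffer = new byte[10240];
int nBytes, nTotalBytes = 0;
// read the body directly from the socket
while((nBytes = response.socket.Receive(RecvBuffer, 0, 10240, SocketFlags.None)) > 0)
{
    nTotalBytes += nBytes;

    // on a kept-alive connection the server does not close the socket,
    // so stop once Content-Length bytes have arrived
    if(response.KeepAlive && nTotalBytes >= response.ContentLength && response.ContentLength > 0)
        break;
}
// close the connection only when it is not being kept alive
if(response.KeepAlive == false)
    response.Close();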

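/* reading response header */
Header = "";
byte[] bytes = new byte[10];
// read one byte at a time so that nothing past the header is consumed
while(socket.Receive(bytes, 0, 1, SocketFlags.None) > 0)
{
    Header += Encoding.ASCII.GetString(bytes, 0, 1);
    // an empty line (CRLF CRLF) marks the end of the header
    if(bytes[0] == '\n' && Header.EndsWith("\r\n\r\n"))
        break;
}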
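
Once the header text has been read, the KeepAlive and ContentLength values used in the download loop have to be extracted from it. The following is one possible sketch of that step; the ParsedHeader class is hypothetical, since the internals of the article's MyWebResponse are not shown in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser; the article's MyWebResponse internals are not shown in this post.
public class ParsedHeader
{
    public string StatusLine = "";
    public Dictionary<string, string> Fields =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // -1 when the server sent no usable Content-Length field.
    public int ContentLength
    {
        get
        {
            string value;
            int length;
            if (Fields.TryGetValue("Content-Length", out value) && int.TryParse(value, out length))
                return length;
            return -1;
        }
    }

    // True only when the server answered "Connection: keep-alive".
    public bool KeepAlive
    {
        get
        {
            string value;
            return Fields.TryGetValue("Connection", out value)
                && string.Equals(value, "keep-alive", StringComparison.OrdinalIgnoreCase);
        }
    }

    public static ParsedHeader Parse(string header)
    {
        ParsedHeader parsed = new ParsedHeader();
        string[] lines = header.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return parsed;
        parsed.StatusLine = lines[0];
        for (int i = 1; i < lines.Length; i++)
        {
            // Split "Name: value" at the first colon only; values may contain colons.
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue;
            parsed.Fields[lines[i].Substring(0, colon).Trim()] = lines[i].Substring(colon + 1).Trim();
        }
        return parsed;
    }
}
```

Field names are compared without regard to case, so "Connection: keep-alive" and "connection: Keep-Alive" are treated alike.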
// starts or resumes worker threads when the thread count changes
private int ThreadCount
{
    get { return nThreadCount; }
    set
    {
        Monitor.Enter(this.listViewThreads);
        try
        {
            for(int nIndex = 0; nIndex < value; nIndex++)
            {
                // thread not created yet, or not suspended
                if(threadsRun[nIndex] == null || threadsRun[nIndex].ThreadState != ThreadState.Suspended)
                {
                    // create and start a new thread
                    threadsRun[nIndex] = new Thread(new ThreadStart(ThreadRunFunction));
                    threadsRun[nIndex].Name = nIndex.ToString();
                    threadsRun[nIndex].Start();
                    // add a row to the threads list if this thread has none yet
                    if(nIndex == this.listViewThreads.Items.Count)
                    {
                        ListViewItem item = this.listViewThreads.Items.Add((nIndex+1).ToString(), 0);
                        string[] subItems = { "", "", "", "0", "0%" };
                        item.SubItems.AddRange(subItems);
                    }
                }
                // thread is suspended: resume it
                else if(threadsRun[nIndex].ThreadState == ThreadState.Suspended)
                {
                    ListViewItem item = this.listViewThreads.Items[nIndex];
                    item.ImageIndex = 1;
                    item.SubItems[2].Text = "Resume";
                    threadsRun[nIndex].Resume();
                }
            }
            nThreadCount = value;
        }
        finally
        {
            // release the lock even if an exception is thrown
            Monitor.Exit(this.listViewThreads);
        }
    }
}
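// thread-safe access to the shared URI queue
void EnqueueUri(MyUri uri)
{
    Monitor.Enter(queueURLS);
    try
    {
        queueURLS.Enqueue(uri);
    }
    catch(Exception)
    {
        // errors are ignored; the URI is simply not queued
    }
    finally
    {
        Monitor.Exit(queueURLS);
    }
}

MyUri DequeueUri()
{
    Monitor.Enter(queueURLS);
    MyUri uri = null;
    try
    {
        uri = (MyUri)queueURLS.Dequeue();
    }
    catch(Exception)
    {
        // Dequeue throws when the queue is empty; null is returned in that case
    }
    finally
    {
        Monitor.Exit(queueURLS);
    }
    return uri;
}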
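
To show how several threads might share such a queue, here is a small self-contained sketch. QueueDemo, Worker, and Run are hypothetical names, and plain strings stand in for the article's MyUri class. The C# lock statement is used in place of explicit Monitor.Enter and Monitor.Exit calls; it compiles to the same Monitor calls wrapped in try/finally.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

// Hypothetical demo; string URIs stand in for the article's MyUri class.
public static class QueueDemo
{
    private static readonly Queue queueURLS = new Queue();
    private static readonly List<string> processed = new List<string>();

    static void EnqueueUri(string uri)
    {
        lock (queueURLS) { queueURLS.Enqueue(uri); }
    }

    // Returns null when the queue is empty.
    static string DequeueUri()
    {
        lock (queueURLS)
        {
            return queueURLS.Count > 0 ? (string)queueURLS.Dequeue() : null;
        }
    }

    // Each worker takes URIs until the queue is empty.
    static void Worker()
    {
        string uri;
        while ((uri = DequeueUri()) != null)
        {
            // A real crawler would download and parse the URI here.
            lock (processed) { processed.Add(uri); }
        }
    }

    // Runs the given number of workers over the URIs and returns how many were processed.
    public static int Run(string[] uris, int threadCount)
    {
        processed.Clear();
        foreach (string uri in uris) EnqueueUri(uri);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(new ThreadStart(Worker));
            threads[i].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return processed.Count;
    }
}
```

Checking Count inside the lock avoids relying on an exception to detect an empty queue, which is a design choice of this sketch rather than something taken from the article.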
Posted @ 2007-05-02 00:13:00