6.3.2 head
Venkat writes that he has a text file (CSV) containing over 50,000 URLs: "I want to run a program that will take this file as input and output a text file which contains only the valid URLs. Basically I need a URL/Link Validator that can perform this job. I tried to put together a custom C# program to do this, but it takes several minutes just to do a hundred URLs. Is there any program/code you are aware of that can do this?"
I recommended a Range Retrieval Request, such as those used by GetRight. You can do this in .NET by just adding the name/value for Range to the Headers collection. NOTE: The server CAN (and many will) ignore this request. If you get partial content, you won't get a 200 OK, you'll get a 206, and the Content-Length will hold the amount of data actually included in the response.
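To make the Range idea concrete, here is a minimal sketch in Python (the original discussion was about C#/.NET, so treat this as an illustration rather than the code Venkat used). The function name `probe_with_range` and its parameters are hypothetical. It asks for only the first byte of a resource and returns the status code together with the Content-Length header, so a 206 means the server honored the range and a 200 means it ignored it.

```python
import http.client
import urllib.error
import urllib.request


def probe_with_range(url, first_bytes=1, timeout=10.0):
    """Request only the first `first_bytes` bytes of `url`.

    Returns (status, content_length_header). The status is 206 when the
    server honors the Range header and 200 when it ignores it; it is
    None when no HTTP response was received at all.
    """
    try:
        # "bytes=0-0" asks for exactly one byte, "bytes=0-9" for ten, etc.
        request = urllib.request.Request(
            url, headers={"Range": f"bytes=0-{first_bytes - 1}"}
        )
        # The body is never read; leaving the `with` block closes the socket.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Length")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code and headers.
        return err.code, err.headers.get("Content-Length")
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None, None
```

A caller would typically treat 200 and 206 alike as "the URL exists"; the difference only tells you whether the server saved you the download.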
However, another fellow, more clever than I am, wrote to say that a HEAD (rather than a GET) should provide enough information, namely the headers, to determine whether a page exists, without the trouble of the HTTP body content. Good stuff!
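A minimal sketch of the HEAD approach, again in Python rather than the C# of the original question. The names `head_status` and `filter_valid`, the worker count, and the rule "any status below 400 counts as valid" are all assumptions made for illustration. The thread pool is there because checking 50,000 URLs one at a time is dominated by network latency.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=10.0):
    """Send a HEAD request; return the status code, or None on failure."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # No usable HTTP response (DNS error, timeout, malformed URL, ...).
        return None


def filter_valid(urls, workers=32, timeout=10.0):
    """Return the URLs whose HEAD status is below 400, in input order.

    Assumption: "valid" means the server answered with a non-error status.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so statuses line up with urls.
        statuses = list(pool.map(lambda u: head_status(u, timeout), urls))
    return [u for u, s in zip(urls, statuses) if s is not None and s < 400]
```

Some servers reject HEAD outright (for example with 405 Method Not Allowed), so a fuller validator might retry those URLs with a GET or a Range request before declaring them invalid.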
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.2
http://www.vbip.com/winsock/winsock_http_08_01.asp
Translated from: https://www.hanselman.com/blog/http-head-and-range-requests