Scraping holiday market-closure data from the official TDX (通达信) website

Can the holiday market-closure data at https://www.tdx.com.cn/url/holiday/ on the official TDX website be scraped?

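Open the page and press F12 to view the source: the data for the whole year is embedded in the page itself, inside a hidden <textarea id="data"> element.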

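Use AutoHotkey to download the page source, extract the records that contain "深圳市场" (Shenzhen market), and parse them. The script below writes the result to holiday.txt:
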
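; AutoHotkey v1 script. Save it as UTF-8 with BOM so the Chinese literal below is read correctly.
url:="https://www.tdx.com.cn/url/holiday/"
source:=UrlDownloadToVar(url,"GBK")	; decode the response as GBK
get=<textarea id="data" style="display:none;">
table:=GetNestedTag(source,get)	; the hidden textarea that holds the data
out:=[]
str:=""
loop, Parse, table, `n, `r
{
	; keep only the Shenzhen market records
	if InStr(A_LoopField,"深圳市场")
	{
		arr:= StrSplit(A_LoopField, "|")	; fields are separated by "|"
		out.push(arr)
		str.=arr[1] . "," . arr[2] . "`n"	; keep the first two fields
	}
}
FileDelete holiday.txt
FileAppend,%str%, holiday.txt, UTF-8
Run holiday.txt
return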

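;;;;;;;;; Helper functions ;;;;;;;;

; Returns the occurrence-th element opened by tag, including its opening and closing tags.
; Nested tags of the same name are balanced; an empty string is returned if no balanced match is found.
GetNestedTag(data,tag,occurrence="1")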
{
	Start:=InStr(data,tag,false,1,occurrence)
	RegExMatch(tag,"i)<([a-z]*)",basetag)
	loop
	{
		until:=InStr(data, "</" basetag1 ">", false, Start, A_Index) + StrLen(basetag1) + 3
		Strng:=SubStr(data, Start, until - Start)
		StringReplace, strng, strng, <%basetag1%, <%basetag1%, UseErrorLevel
		OpenCount:=ErrorLevel
		StringReplace, strng, strng, </%basetag1%, </%basetag1%, UseErrorLevel
		CloseCount:=ErrorLevel
		if (OpenCount = CloseCount)
			break
		if (A_Index > 250)
		{
			strng=
			break
		}
	}
	if (StrLen(strng) < StrLen(tag))
		strng=
	return strng
}
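; Downloads URL with WinHttpRequest and returns the response as text.
; If Charset is given, the raw response bytes are decoded with that charset through ADODB.Stream.
UrlDownloadToVar(URL,Charset="",URLCodePage="",Proxy="",ProxyBypassList="",Cookie="",Referer="",UserAgent="",EnableRedirects="",Timeout=-1)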
{
	ComObjError(0)
	WebRequest := ComObjCreate("WinHttp.WinHttpRequest.5.1")
	if (URLCodePage<>"")
		WebRequest.Option(2):=URLCodePage
	if (EnableRedirects<>"")
		WebRequest.Option(6):=EnableRedirects
	if (Proxy<>"")
		WebRequest.SetProxy(2,Proxy,ProxyBypassList)
	WebRequest.Open("GET", URL, true)
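	if (Cookie<>"")
	{
		; The header is set twice: a dummy value first, then the real one.
		; This appears to be a workaround for the first Cookie header being dropped.
		WebRequest.SetRequestHeader("Cookie","tuzi")
		WebRequest.SetRequestHeader("Cookie",Cookie)
	}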
	if (Referer<>"")
		WebRequest.SetRequestHeader("Referer",Referer)
	if (UserAgent<>"")
		WebRequest.SetRequestHeader("User-Agent",UserAgent)
	WebRequest.Send()
	WebRequest.WaitForResponse(Timeout)
	if (Charset="")
		return,WebRequest.ResponseText()
	else
	{
		ADO:=ComObjCreate("adodb.stream")
		ADO.Type:=1
		ADO.Mode:=3
		ADO.Open()
		ADO.Write(WebRequest.ResponseBody())
		ADO.Position:=0
		ADO.Type:=2
		ADO.Charset:=Charset
		return,ADO.ReadText()
	}
}
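
The filter-and-split step in the main script can be tried offline with a small sketch like the one below. It is AutoHotkey v1, like the script above, and the file needs to be saved as UTF-8 with BOM. The sample records and the variable names `sample`, `fields` and `result` are made up for illustration; the real page may use a different field layout, so check the downloaded text before relying on field positions.

```autohotkey
; Offline sketch of the filter-and-split step (AutoHotkey v1).
; The records below are hypothetical, not copied from the TDX page.
sample := "20250101|New Year's Day|深圳市场`n"
        . "20250101|New Year's Day|Other market`n"
        . "20250128|Spring Festival|深圳市场"

result := ""
Loop, Parse, sample, `n, `r
{
	; skip every record that does not mention the Shenzhen market
	if !InStr(A_LoopField, "深圳市场")
		continue
	fields := StrSplit(A_LoopField, "|")	; split the record on "|"
	result .= fields[1] . "," . fields[2] . "`n"	; keep the first two fields
}
MsgBox, % result
```

With this sample, only the first and third records pass the filter, so the message box shows two comma-separated lines.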

 
