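Scraping Jiayuan (世纪佳缘) member data into MySQL, downloading images locally, then selecting accounts from the database, messaging them, and updating the send status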

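With this approach, all member information can be stored, and messages can be sent from multiple threads, so a few hundred members can be messaged in about 10 seconds.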

 

The first step is to apply the search filters and then scrape the account information:

#-*-coding:utf-8-*-
import requests,re,json,time,threadpool,os
from mydba import MySql
from gevent import monkey


#monkey.patch_all()

header={
    'Cookie':'guider_quick_search=on; SESSION_HASH=xxxxxxxxcd4523713c3d3700579350daa9992f1a; user_access=1; save_jy_login_name=1314xxxx; sl_jumper=%26cou%3D17; last_login_time=1498576743; user_attr=000000; pclog=%7B%22160961843%22%3A%221498576757268%7C1%7C0%22%7D; IM_S=%7B%22IM_CID%22%3A8677484%2C%22svc%22%3A%7B%22code%22%3A0%2C%22nps%22%3A0%2C%22unread_count%22%3A%220%22%2C%22ocu%22%3A0%2C%22ppc%22%3A0%2C%22jpc%22%3A0%2C%22regt%22%3A%221486465367%22%2C%22using%22%3A%2240%2C33%2C2%2C%22%2C%22user_type%22%3A%2210%22%2C%22uid%22%3A160961843%7D%2C%22IM_SV%22%3A%22123.59.161.3%22%2C%22m%22%3A26%2C%22f%22%3A0%2C%22omc%22%3A0%7D; FROM_BD_WD=%25E4%25B8%2596%25E7%25BA%25AA%25E4%25BD%25B3%25E7%25BC%2598; FROM_ST_ID=416640; FROM_ST=.jiayuan.com; REG_ST_ID=15; REG_ST_URL=http://bzclk.baidu.com/adrc.php?t=06KL00c00f7t0wC0Gfum0QkHAsjnX7Fu00000PNeYH300000uybcI1.THL2sQ1PEPZRVfK85yF9pywd0ZnqryRkryfdryDsnj0kuWR3u0Kd5RuKP17Knj97rDDvPRNAPH9APbRkwHc4njIAnDPjP1NA0ADqI1YhUyPGujYzrH0dnWTYnHckFMKzUvwGujYkP6K-5y9YIZ0lQzqzuyT8ph-9XgN9UB4WUvYETLfE5v-b5HfkPWmYnaudThsqpZwYTjCEQLILIz4Jpy74Iy78QhPEUfKWThnqnWRdnjn&tpl=tpl_10762_15668_1&l=1053916117&attach=location%3D%26linkName%3D%25E6%25A0%2587%25E9%25A2%2598%26linkText%3D%25E4%25B8%2596%25E7%25BA%25AA%25E4%25BD%25B3%25E7%25BC%2598%25E7%25BD%2591%25EF%25BC%2588Jiayuan.com%25EF%25BC%2589%253A2017%25EF%25BC%258C%26xp%3Did(%2522m78828180%2522)%252FDIV%255B1%255D%252FDIV%255B1%255D%252FDIV%255B1%255D%252FH2%255B1%255D%252FA%255B1%255D%26linkType%3D%26checksum%3D191&ie=UTF-8&f=8&tn=baidu&wd=%E4%B8%96%E7%BA%AA%E4%BD%B3%E7%BC%98&oq=%E4%B8%96%E7%BA%AA%E4%BD%B3%E7%BC%98&rqlang=cn; REG_REF_URL=http://www.jiayuan.com/usercp/profile.php?action=work; PHPSESSID=d5557cc5a4560c14c2dc685bcb7fc009; stadate1=159xxxx; myloc=44%7C4403; myage=xx; PROFILE=160xxxxxx%3A%25E9%25A3%258E%3Am%3Aat1.jyimg.com%2F42%2Ffb%2F4558c12f370e446240ba148bcac4%3A1%3A%3A1%3A4558c12f3_2_avatar_p.jpg%3A1%3A1%3A61%3A10; mysex=m; myuid=159xxxxxxx; myincome=40; mylevel=2; main_search:160961843=%7C%7C%7C00; RAW_HASH=J3ewrCGVZG5eU2agrymP2-bNz0IQQpivXOncdsOO63oS3%2AH4o4%2AsAifV0twuNFXqWippm3rMHXvTK%2APmGbap-ZYZpz-18ogWLnBXkcpY87GlFps.; COMMON_HASH=424558c12fxxxxx40ba148bcac4fb; IM_CON=%7B%22IM_TM%22%3A1498577459991%2C%22IM_SN%22%3A3%7D; IM_M=%5B%5D; pop_time=1498577573654; IM_CS=2; IM_ID=12; is_searchv2=1; IM_TK=1498577921417',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
}

def fun(i):
    # Fetch page i of the search results; the search filters are encoded in the query string.
    url='http://search.jiayuan.com/v2/search_v2.php?sex=f&key=&stc=1%3A4403%2C2%3A23.27%2C3%3A155.170%2C23%3A1&sn=default&sv=1&p='+str(i)+'&f=select&listStyle=bigPhoto&pri_uid=xxxx&jsversion=v5'

    # Retry until the request succeeds, pausing briefly between attempts.
    while True:
        try:
            res=requests.get(url,headers=header,timeout=10)
            break
        except Exception as e:
            print(e)
            time.sleep(1)



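    content=res.text   # decoded text, so the str regex below works on both Python 2 and 3
    # The response is JSON wrapped in ##jiayser## markers, so extract the JSON part before parsing it.
    dictx=json.loads(re.findall(r'##jiayser##([\s\S]*?)##jiayser##',content)[0])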
    #print dictx
    for dcx in dictx['userInfo']:
        #print dcx
        uhash=dcx['helloUrl
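
The scraper above pulls the JSON out of a response that is wrapped in `##jiayser##` markers. As a minimal standalone sketch of that one step, the helper below does the same extraction but raises a clear error when the markers are missing, instead of failing with an `IndexError` on `[0]`. The function name `extract_wrapped_json` and the sample payload (including its `uid` field) are made up for illustration and are not taken from the real response.

```python
import json
import re


def extract_wrapped_json(text, marker="##jiayser##"):
    """Return the JSON value found between two copies of `marker`."""
    # Escape the marker so it is matched literally, and capture lazily
    # so that only the text up to the next marker is taken.
    pattern = re.escape(marker) + r"([\s\S]*?)" + re.escape(marker)
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("marker-delimited JSON not found")
    return json.loads(match.group(1))


# Hypothetical payload for illustration only; the real response has other fields.
sample = '##jiayser##{"userInfo": [{"uid": 1}, {"uid": 2}]}##jiayser##//'
parsed = extract_wrapped_json(sample)
print(len(parsed["userInfo"]))
```

Raising `ValueError` makes it easier to tell a changed page format or an expired login apart from a network failure, which the retry loop already handles.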