spider-python (scraping media information)

Environment setup

selenium-3.8.1+python2.7+chromedriver
For detailed setup steps, search online (for example, on Baidu).
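
A quick sanity check before running the script is to confirm that a chromedriver binary is reachable, since webdriver.Chrome() called without an explicit path looks for it on PATH. The following is a minimal sketch using only the standard library; find_on_path is a hypothetical helper name chosen for illustration, not part of Selenium.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of an executable called `name`, or None.

    `find_on_path` is a hypothetical helper for illustration only.
    `path` defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows the binary is named chromedriver.exe, so try both spellings.
    candidates = [name, name + ".exe"] if os.name == "nt" else [name]
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                return full
    return None


if __name__ == "__main__":
    location = find_on_path("chromedriver")
    if location:
        print("chromedriver found at %s" % location)
    else:
        print("chromedriver not found on PATH")
```

If the helper reports that nothing was found, the directory containing chromedriver probably still needs to be added to PATH.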

Example: scraping basic media information

app-spider.py
# coding: UTF-8
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
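import os
import sys
# Python 2 only: make UTF-8 the default encoding so that non-ASCII app names
# can be concatenated with byte strings and written to the output file.
reload(sys)
sys.setdefaultencoding('utf8')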



driver = webdriver.Chrome() 

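def getAppName(key):
    """Look up `key` on qimai.cn and return the app name shown in the results."""
    driver.get("https://www.qimai.cn/")
    driver.set_window_size(1000,1000)
    # Hover over the platform dropdown so that its options become clickable.
    dropdown = driver.find_element_by_class_name("dropdown-box")
    ActionChains(driver).move_to_element(dropdown).perform()
    time.sleep(1)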

    # An all-digit key is treated as an iOS App Store ID; anything else is
    # treated as an Android package name.
    if key.isdigit():
        left_click = driver.find_element_by_xpath("//i[@class='iconfont icon-ios']/../../li[1]")
    else:
        left_click = driver.find_element_by_xpath("//i[@class='iconfont icon-anzhuo']/..")

    ActionChains(driver).click(left_click).perform()

    # Type the key into the search box and submit it.
    item_inp = driver.find_element_by_xpath("//div[@class='search-wrap']/div[1]/input[@class='ivu-input']")
    item_inp.send_keys(key.decode('utf-8'))
    item_inp.send_keys(Keys.RETURN)
    time.sleep(3)  # crude fixed wait for the results page to load
    current_url = driver.current_url
    print current_url
    appname = driver.find_element_by_xpath("//div[@class='appname']").text
    return appname


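def main():
    print '--start--'

    # Sample keys: a numeric iOS App Store ID or an Android package name.
    # key = '414478124'
    # key = 'com.tencent.mm'
    # key = 'com.wedobest.xiangqi.mz'

    id_file = open("appid")
    # Mode "w" creates the output file or empties an existing one, so the
    # run no longer crashes when "appname" does not exist yet.
    fo = open("appname", "w")
    while 1:
        lines = id_file.readlines(100000)
        if not lines:
            break
        for line in lines:
            appid = line.strip()
            if not appid:
                continue
            try:
                id_name = appid+","+getAppName(appid)+","
                fo.write(id_name+'\n')
            except Exception as e:
                # Skip IDs whose lookup fails, but report the reason.
                print 'skipped', appid, e
                continue

            print id_name
            time.sleep(3)

    driver.quit()
    id_file.close()
    fo.close()

    print '--end--'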


if __name__ == "__main__":
    main()
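
The platform choice in getAppName rests on one rule: a key made only of digits is treated as an iOS App Store ID, and anything else as an Android package name. The sketch below isolates that rule, together with the appid,appname, output format, so both can be checked without opening a browser. The names classify_key, format_record and read_ids are hypothetical helpers introduced for illustration; they are not part of the original script or of Selenium. Unlike the original, this version strips all surrounding whitespace rather than only the newline.

```python
def classify_key(key):
    """Return "ios" for an all-digit key, "android" otherwise.

    Mirrors the `key.isdigit()` test in getAppName (hypothetical helper).
    """
    key = key.strip()
    return "ios" if key.isdigit() else "android"


def format_record(appid, appname):
    """Build one output line in the script's `appid,appname,` format."""
    return appid + "," + appname + ","


def read_ids(lines):
    """Yield non-empty IDs from an iterable of raw input lines."""
    for line in lines:
        appid = line.strip()
        if appid:
            yield appid


if __name__ == "__main__":
    # The two sample keys come from the comments in main().
    sample = ["414478124\n", "com.tencent.mm\n", "\n"]
    for appid in read_ids(sample):
        print("%s -> %s" % (appid, classify_key(appid)))
```

Keeping this rule in a small pure function makes it easy to see that blank lines in the appid file would otherwise be sent to the site as Android lookups.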
References:
http://seleniumhq.github.io/selenium/docs/api/py/index.html