Scraping a Friend's WeChat Moments with appium + Python (Part 1)

未央~

Category: appium. Tags: appium, WeChat Moments, python, word cloud

Article link: https://blog.csdn.net/weixin_42138362/article/details/96478733

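WeChat Moments has no public API, so getting Moments data is difficult. This post uses appium + Python to scrape the text of your own Moments or those of any friend, optionally limited to specific years. Once the text is scraped, Python extracts keywords and the wordcloud package visualizes them.
Here is the final result:
[Image: final result]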

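First, tap through to the Moments page of the chosen friend (or yourself). Element id values differ between WeChat versions (mine is WeChat 7.0.3), so adjust them to your setup; uiautomatorviewer will show you the ids.

import time

# Written against the older Appium-Python-Client API (find_element_by_id);
# newer clients use driver.find_element(AppiumBy.ID, ...) instead.

# Tap through to the Moments page of the friend whose nickname is `name`
def enter_pengyouquan(name):
    driver.find_element_by_id('com.tencent.mm:id/iq').click()  # tap the search icon
    time.sleep(2)
    driver.find_element_by_id('com.tencent.mm:id/kh').send_keys(name)  # type the search text
    time.sleep(2)
    driver.find_element_by_id('com.tencent.mm:id/q0').click()  # tap the first search result
    driver.find_element_by_id('com.tencent.mm:id/jy').click()  # tap the three dots at the top right of the chat screen
    driver.find_element_by_id('com.tencent.mm:id/e0c').click()  # tap the avatar
    driver.find_element_by_id('com.tencent.mm:id/d7w').click()  # tap Moments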
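
The fixed time.sleep(2) calls above can be too short on a slow phone and wasteful on a fast one. As a rough sketch of an alternative, the helper below polls until a lookup stops raising. The name wait_until and its parameters are my own and not part of appium; Selenium's WebDriverWait offers the same idea ready-made.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.5):
    """Call probe() repeatedly until it returns without raising.

    Re-raises the last exception once `timeout` seconds have passed.
    Catching Exception is deliberately broad for this sketch; in a real
    script it would be narrowed to the driver's "element not found" error.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return probe()
        except Exception:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Intended use inside enter_pengyouquan (needs a live driver, so commented out):
# wait_until(lambda: driver.find_element_by_id('com.tencent.mm:id/q0')).click()
```

With this in place, each sleep-then-find pair could become a single wait_until call, so the script moves on as soon as the element appears.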

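Define a swipe-up helper. Set width and height to match your phone's screen size.

# Swipe-up helper
def swipe_up(distance, duration):  # distance: fraction of the screen height to swipe; duration: swipe time in ms
    width = 1080
    height = 2150  # width and height depend on the phone
    start_y = round(9 / 10 * height)
    end_y = round((9 / 10 - distance) * height)
    driver.swipe(width // 2, start_y, width // 2, end_y, duration)  # swipe expects integer pixel coordinates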
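
To avoid hard-coding the screen size, the coordinate arithmetic can be pulled into a pure function and fed from the driver. This is only a sketch: swipe_points and start_ratio are names I made up, while driver.get_window_size() is the standard WebDriver call that returns a dict with 'width' and 'height'.

```python
def swipe_points(width, height, distance, start_ratio=0.9):
    """Return (start_x, start_y, end_x, end_y) for an upward swipe.

    The swipe starts at start_ratio of the screen height, on the vertical
    centre line, and travels `distance` (a fraction of the height) upward.
    """
    x = round(width / 2)
    start_y = round(start_ratio * height)
    end_y = round((start_ratio - distance) * height)
    return x, start_y, x, end_y


# Intended use with a live driver (commented out because it needs a device):
# size = driver.get_window_size()
# driver.swipe(*swipe_points(size['width'], size['height'], 1 / 2), 2000)

print(swipe_points(1080, 2150, 0.5))  # (540, 1935, 540, 860)
```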

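Collect the on-screen elements that hold Moments text. Captions of photo posts and video posts share one element id; captions of shared-link posts and text-only posts share another.

def get_onepage_elementlist():
    pict_list = driver.find_elements_by_id('com.tencent.mm:id/nm')  # captions of photo posts and video posts
    link_list = driver.find_elements_by_id('com.tencent.mm:id/kt')  # captions of link posts, and text-only posts
    elementlist = pict_list + link_list
    return elementlist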

Extract the text from the collected elements.

def get_onepage():
    eleLst = get_onepage_elementlist()
    pagetext = []
    for e in eleLst:
        pagetext.append(e.get_attribute('text'))
    return pagetext

Define get_pages, which swipes up and extracts text as it goes, stopping once it reaches the target year. The year_count parameter sets how many years of Moments to scrape. It is 2019 now, so year_count = 1 scrapes all Moments from 2019, year_count = 2 scrapes 2018 and 2019, and so on.
One small problem needs handling. Suppose year_count is 2: when the scroll reaches the first few Moments of early 2018, some 2017 Moments are also on screen. get_onepage grabs every matching element on the page, so part of 2017 gets scraped too. To fix this, the 2017 entries are removed at the end.

from selenium.common.exceptions import NoSuchElementException

# Collect all Moments from year_count years back up to now
def get_pages(year_count):
    pagestext = []

    def collect_page():  # add the texts on the current screen, skipping ones already collected
        for t in get_onepage():
            if t not in pagestext:
                pagestext.append(t)

    current_year = driver.find_element_by_id("com.tencent.mm:id/ekg").get_attribute("text")  # current year, e.g. "2019年"
    end_year = str(int(current_year[0:4]) - year_count) + "年"
    while True:
        try:
            # Look for the year header on screen; if it is absent the lookup raises and we keep swiping
            y = driver.find_element_by_id("com.tencent.mm:id/ekg").get_attribute("text")
            if y == end_year:  # reached the end year
                break
        except NoSuchElementException:
            pass
        # End year not reached yet: collect this page and keep swiping
        collect_page()
        swipe_up(1 / 2, 2000)

    collect_page()
    while True:
        try:
            driver.find_element_by_id("com.tencent.mm:id/ekg")
            swipe_up(1 / 12, 500)  # keep swiping slowly so the last page holds only out-of-range posts
        except NoSuchElementException:
            break
    # Remove the out-of-range texts picked up on the last page
    lastPage = get_onepage()
    for t in lastPage:
        if t in pagestext:
            pagestext.remove(t)
    return pagestext
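
The collect-then-trim logic is easier to check without a phone attached. The sketch below replays it over a hand-written list of fake "screens", each with the year header visible on it (or None) and the texts on it. The function names, the page structure, and the sample data are all invented for illustration, and the list is assumed to end with a screen on which the header has scrolled out of view.

```python
def merge_new(collected, texts):
    """Append texts not seen before, keeping first-seen order."""
    for t in texts:
        if t not in collected:
            collected.append(t)


def collect_until(pages, end_year):
    collected = []
    i = 0
    # Phase 1: absorb every screen until the end-year header shows up.
    while pages[i]["year"] != end_year:
        merge_new(collected, pages[i]["texts"])
        i += 1
    merge_new(collected, pages[i]["texts"])  # the boundary screen itself
    # Phase 2: keep scrolling (without collecting) until the header is gone.
    while pages[i]["year"] is not None:
        i += 1
    # Phase 3: everything on this last screen is out of range; drop it.
    for t in pages[i]["texts"]:
        if t in collected:
            collected.remove(t)
    return collected


pages = [
    {"year": "2019年", "texts": ["a", "b"]},
    {"year": None, "texts": ["b", "c"]},
    {"year": "2018年", "texts": ["c", "d", "x"]},  # "x" is an out-of-range post
    {"year": "2018年", "texts": ["d", "x", "y"]},  # header still on screen
    {"year": None, "texts": ["x", "y", "z"]},      # only out-of-range posts left
]
print(collect_until(pages, "2018年"))  # ['a', 'b', 'c', 'd']
```

One side effect of comparing by text is that two different posts with identical wording are stored only once.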

Save the scraped Moments text.

def store_PYQText(PYQ_list, store_path):  # write the Moments text to the given path
    with open(store_path, 'w', encoding='utf-8') as f:
        for text in PYQ_list:
            f.write(text + '\n\n')

Emoji in the scraped text come through as bracketed labels such as [捂脸] (facepalm) or [笑哭] (tears of joy); remove them.

import re

def remove_icondesc(texts, store_path):
    pattern = re.compile(r'\w+(?![\u4e00-\u9fa5]*\])')  # match every run of text except the emoji labels
    with open(store_path, 'w', encoding='utf-8') as f:
        for s in texts:
            splitted_sentences = re.findall(pattern, s)
            for p in splitted_sentences:
                f.write(p + '\n')
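
A quick check of what the pattern does on sample strings. The negative lookahead only recognizes labels made of Chinese characters, so a label containing Latin letters can leak a fragment. The two-step strip_labels variant below is my own alternative, not the post's method: it blanks out any bracketed label first, then splits into word runs.

```python
import re

# The post's pattern: runs of word characters that are not followed by
# "zero or more Chinese characters, then ]".
ORIGINAL = re.compile(r'\w+(?![\u4e00-\u9fa5]*\])')

# Hypothetical alternative: one bracketed label, whatever it contains.
BRACKETED = re.compile(r'\[[^\[\]]+\]')


def strip_labels(text):
    """Replace bracketed labels with a space, then return the word runs."""
    return re.findall(r'\w+', BRACKETED.sub(' ', text))


print(ORIGINAL.findall('今天真开心[笑哭]哈哈'))  # ['今天真开心', '哈哈']
print(ORIGINAL.findall('好的[OK]收到'))          # ['好的', 'O', '收到']
print(strip_labels('好的[OK]收到'))              # ['好的', '收到']
```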

Main code

from appium import webdriver

if __name__ == '__main__':
    desired_caps = {
        'platformName': 'Android',
        'deviceName': '37KNW18710001152',  # device name
        'platformVersion': '9',
        'appPackage': 'com.tencent.mm',  # apk package name
        'appActivity': 'com.tencent.mm.ui.LauncherUI',  # the apk's launcher activity
        'noReset': True,  # keep app state so WeChat does not ask for the password on every run
        'unicodeKeyboard': True,  # send strings through the unicodeKeyboard input method
        'resetKeyboard': True  # restore the original keyboard once the run ends
    }
    driver = webdriver.Remote('http://127.0.0.1:4723/wd/hub', desired_caps)
    time.sleep(5)
    enter_pengyouquan('哈罗皮')
    PYQ_list = get_pages(1)  # scrape the most recent year of Moments
    store_PYQText(PYQ_list, r'D:\词云\哈罗皮完整朋友圈.txt')  # save the raw Moments text
    remove_icondesc(PYQ_list, r'd:\词云\哈罗皮已处理.txt')  # save the text with emoji labels and punctuation removed, ready for the word cloud
    driver.quit()