Scraping Baidu Tieba threads
Inspect the structure of a single thread in the forum's thread list:
<ul id="thread_list" class="threadlist_bright j_threadlist_bright">
<!-- Some markup omitted here; every thread after this point is an <li> element -->
<li class=" j_thread_list clearfix thread_item_box" data-field='{"id":7329474777,"author_name":"tt199402044","author_nickname":"\u4e0d\u6b7b\u8eab\u7684tt\ud83d\ude1c","author_portrait":"tb.1.29a182d6.QHxcL51roDeIycneJe8sFg","first_post_id":139094214678,"reply_num":0,"is_bakan":null,"vid":"","is_good":null,"is_top":null,"is_protal":null,"is_membertop":null,"is_multi_forum":null,"frs_tpoint":null}' data-tid='7329474777' data-thread-type="0" data-floor='1''>
<div class="t_con cleafix">
<div class="col2_left j_threadlist_li_left">
<span class="threadlist_rep_num center_text"
title="回复">0</span>
</div>
<div class="col2_right j_threadlist_li_right ">
<div class="threadlist_lz clearfix">
<div class="threadlist_title pull_left j_th_tit ">
<a rel="noreferrer" href="/p/7329474777" title="其实现在王路飞事在鹤凯多一成实力斗殴???" target="_blank" class="j_th_tit ">其实现在王路飞事在鹤凯多一成实力斗殴???</a>
</div><div class="threadlist_author pull_right">
<span class="tb_icon_author "
title="主题作者: 不死身的tt😜"
data-field='{"user_id":719230442}' ><i class="icon_author"></i><span class="frs-author-name-wrap"><a rel="noreferrer" data-field='{"un":"tt199402044","id":"tb.1.29a182d6.QHxcL51roDeIycneJe8sFg"}' class="frs-author-name j_user_card " href="/home/main/?un=tt199402044&ie=utf-8&id=tb.1.29a182d6.QHxcL51roDeIycneJe8sFg&fr=frs" target="_blank">不死身的t...</a></span><span class="icon_wrap icon_wrap_theme1 frs_bright_icons "></span> </span>
<span class="pull-right is_show_create_time" title="创建时间">15:40</span>
</div>
</div>
<div class="threadlist_detail clearfix">
<div class="threadlist_text pull_left">
<div class="threadlist_abs threadlist_abs_onlyline ">
凯多实力保留不是手环,而是他用九层法力在托鬼岛往花之都前进 所以真正决战 还是到了花之都吧
</div>
<div class="small_wrap j_small_wrap">
<a rel="noreferrer" href="#" onclick="return false;" class="small_btn_pre j_small_pic_pre" style="display:none"></a>
<a rel="noreferrer" href="#" onclick="return false;" class="small_btn_next j_small_pic_next" style="display:none"></a>
<div class="small_list j_small_list cleafix">
<div class="small_list_gallery">
<ul class="threadlist_media j_threadlist_media clearfix" id="fm7329474777"><li><a rel="noreferrer" class="thumbnail vpic_wrap"><img src="https://tb3.bdstatic.com/public/img/icon_pc_picheader_n.432946a7.png" attr="86901" data-original="http://tiebapic.baidu.com/forum/wh%3D200%2C90%3B/sign=391f1b92a41bb0518f71bb2a064af68d/6310f8dcd100baa15d79500b5010b912c9fc2ed2.jpg" bpic="http://tiebapic.baidu.com/forum/w%3D580%3B/sign=5b7b6bd41e23dd542173a760e132b2de/738b4710b912c8fc53a6e17deb039245d688212c.jpg" class="threadlist_pic j_m_pic " /></a><div class="threadlist_pic_highlight j_m_pic_light"></div></li></ul>
</div>
</div>
</div> </div>
<div class="threadlist_author pull_right">
<span class="tb_icon_author_rely j_replyer " title="最后回复人: 不死身的tt😜">
<i class="icon_replyer"></i>
<a rel="noreferrer" data-field='{"un":"tt199402044","id":"tb.1.29a182d6.QHxcL51roDeIycneJe8sFg"}' class="frs-author-name j_user_card " href="/home/main/?un=tt199402044&ie=utf-8&id=tb.1.29a182d6.QHxcL51roDeIycneJe8sFg&fr=frs" target="_blank">不死身的t...</a> </span>
<span class="threadlist_reply_date pull_right j_reply_data" title="最后回复时间">
15:40 </span>
</div>
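To turn this markup into records, one option is Python's standard-library `html.parser`, which needs no third-party packages. The sketch below does the following for every `<li>` whose class list contains `j_thread_list`:

- It decodes the JSON in the `data-field` attribute to get the thread `id`, `author_name`, `author_nickname` and `reply_num`.
- It reads the visible fields: the title and link from `a.j_th_tit`, and the creation time from `span.is_show_create_time`.
- It also reads the abstract from `div.threadlist_abs`, the image URLs from `img.threadlist_pic` and the last reply time from `span.threadlist_reply_date`.

`ThreadListParser`, `parse_thread_list`, `TEXT_FIELDS` and `BASE_URL` are hypothetical names chosen for illustration. The class names come from this single sample and are assumptions about the live page, which may use different markup or change over time.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://tieba.baidu.com"  # assumed host for relative links like /p/<tid>

# Assumed mapping: CSS class of a text-only element -> output key.
TEXT_FIELDS = {
    "is_show_create_time": "create_time",
    "threadlist_abs": "abstract",
    "threadlist_reply_date": "last_reply_time",
}


class ThreadListParser(HTMLParser):
    """Collect one dict per <li class="... j_thread_list ..."> thread item.

    Class names are copied from the sample markup, not from any documented
    interface, so they may break when Tieba changes its page templates.
    """

    def __init__(self):
        super().__init__()
        self.threads = []
        self._field = None  # output key whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "j_thread_list" in classes:
            # data-field holds JSON; json.loads also decodes the \uXXXX escapes
            meta = json.loads(attrs.get("data-field") or "{}")
            self.threads.append({
                "tid": meta.get("id"),
                "author_name": meta.get("author_name"),
                "author_nickname": meta.get("author_nickname"),
                "reply_num": meta.get("reply_num"),
                "title": None,
                "url": None,
                "create_time": "",
                "abstract": "",
                "last_reply_time": "",
                "images": [],
            })
            return
        if not self.threads:
            return  # markup before the first thread item
        cur = self.threads[-1]
        if tag == "a" and "j_th_tit" in classes:
            cur["title"] = attrs.get("title")
            cur["url"] = urljoin(BASE_URL, attrs.get("href") or "")
        elif tag == "img" and "threadlist_pic" in classes:
            # src looks like a placeholder icon; prefer bpic (larger), then data-original
            cur["images"].append(attrs.get("bpic") or attrs.get("data-original"))
        elif tag in ("span", "div"):
            for cls in classes:
                if cls in TEXT_FIELDS:
                    self._field = TEXT_FIELDS[cls]

    def handle_data(self, data):
        if self._field:
            self.threads[-1][self._field] += data

    def handle_endtag(self, tag):
        # the tracked elements hold plain text only, so the first close ends the field
        if self._field and tag in ("span", "div"):
            cur = self.threads[-1]
            cur[self._field] = cur[self._field].strip()
            self._field = None


def parse_thread_list(html_text):
    """Return a list of thread dicts extracted from thread-list HTML."""
    parser = ThreadListParser()
    parser.feed(html_text)
    parser.close()
    return parser.threads


# Trimmed copy of the thread item above (raw string keeps the JSON \u escapes intact).
SAMPLE = r"""
<ul id="thread_list" class="threadlist_bright j_threadlist_bright">
<li class=" j_thread_list clearfix thread_item_box" data-field='{"id":7329474777,"author_name":"tt199402044","author_nickname":"\u4e0d\u6b7b\u8eab\u7684tt\ud83d\ude1c","reply_num":0}' data-tid='7329474777'>
<a rel="noreferrer" href="/p/7329474777" title="其实现在王路飞事在鹤凯多一成实力斗殴???" target="_blank" class="j_th_tit ">其实现在王路飞事在鹤凯多一成实力斗殴???</a>
<span class="pull-right is_show_create_time" title="创建时间">15:40</span>
<div class="threadlist_abs threadlist_abs_onlyline ">
凯多实力保留不是手环
</div>
<img src="https://tb3.bdstatic.com/public/img/icon_pc_picheader_n.432946a7.png" data-original="http://tiebapic.baidu.com/forum/wh%3D200%2C90%3B/sign=391f1b92a41bb0518f71bb2a064af68d/6310f8dcd100baa15d79500b5010b912c9fc2ed2.jpg" bpic="http://tiebapic.baidu.com/forum/w%3D580%3B/sign=5b7b6bd41e23dd542173a760e132b2de/738b4710b912c8fc53a6e17deb039245d688212c.jpg" class="threadlist_pic j_m_pic " />
<span class="threadlist_reply_date pull_right j_reply_data" title="最后回复时间">
15:40 </span>
</li>
</ul>
"""

for thread in parse_thread_list(SAMPLE):
    print(thread["tid"], thread["reply_num"], thread["url"])
```

The parser matches on individual class tokens (the attribute split on whitespace) rather than the whole class string. This tolerates the stray spaces seen in values like `" j_thread_list clearfix thread_item_box"` and `"j_th_tit "`. It also decodes the escaped `author_nickname` to the same name shown in the `title="主题作者: 不死身的tt😜"` attribute.

This sketch assumes the thread list reaches the parser as ordinary tags. If a raw HTTP response wraps the list in an HTML comment, `HTMLParser` reports it through `handle_comment` instead, so the comment markers would need removing before parsing.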