Klose_10 - CSDN Blog

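Original: FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.me

If Hive has never been run before, MySQL must be initialized first. If Hive is configured in remote mode, Hive cannot start the metastore on its own, so the metastore service has to be started separately with hive --service metastore & or nohup hive --service metastore &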

2021-04-14 09:01:33 849

Original: Problems encountered when analyzing text data with sklearn

Text representation: using the CountVectorizer() class
from sklearn.feature_extraction.text import CountVectorizer
vec = CountVectorizer()  # the usual calling pattern for sklearn classes
corpus = [ 'This is the first document.', 'This is the second second document.', 'And the third one.', 'Is this the
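To make the counting step concrete, here is a minimal standard-library sketch of the bag-of-words idea behind CountVectorizer. It is not sklearn's implementation: the helper names `tokenize` and `bag_of_words` are hypothetical, and the tokenization rule (lowercase, words of two or more characters) is an assumption meant to mirror the class's documented defaults. Only the three complete documents from the excerpt are used, since the fourth is cut off.

```python
import re
from collections import Counter

# The three complete documents from the excerpt (the fourth is truncated there).
corpus = [
    "This is the first document.",
    "This is the second second document.",
    "And the third one.",
]

# Assumption: lowercase the text and keep word tokens of 2+ characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Split a document into lowercase word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def bag_of_words(docs):
    """Return (sorted vocabulary, one row of term counts per document)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = bag_of_words(corpus)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row has one column per vocabulary term, so "second" appearing twice in the second document shows up as a 2 in that row.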

2020-10-27 09:46:06 380

Original: The requests library

1. Introduction
A brief introduction to the basic usage of the requests library.
2. Installation
Install with pip: pip install requests
3. Basic requests
req = requests.get("http://www.baidu.com")
req = requests.post("http://www.baidu.com")
req = requests.put("http://www.baidu.com")
req = requests.delete("http://www.baidu.com")
req
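The four basic calls can be tried without touching a public site. The sketch below is an illustration under assumptions: it starts a throwaway local server (the hypothetical `EchoHandler`, which just replies with the HTTP method it received) and sends one request per verb to it. It needs the third-party requests package installed.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party: pip install requests


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: answers every verb with the method name."""

    def _reply(self):
        # Read any request body so nothing is left unread on the socket.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = self.command.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_POST = do_PUT = do_DELETE = _reply

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

responses = {
    "GET": requests.get(url, timeout=5),
    "POST": requests.post(url, data={"key": "value"}, timeout=5),
    "PUT": requests.put(url, data={"key": "value"}, timeout=5),
    "DELETE": requests.delete(url, timeout=5),
}
# Each response exposes the status code and the decoded body.
results = {verb: (r.status_code, r.text) for verb, r in responses.items()}
server.shutdown()
print(results)
```

Passing an explicit `timeout` is a choice made for this sketch; without it a request can wait indefinitely on an unresponsive server.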

2020-10-18 05:50:19 153

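Original: Scraping web pages with Selenium, explained

Selenium
Selenium is a web automation testing tool, originally developed for automated website testing. It resembles the key-macro tools used for games: it carries out operations automatically according to the commands it is given. The difference is that Selenium runs directly in the browser, and it supports all mainstream browsers (including headless browsers such as PhantomJS). Following our instructions, Selenium can have the browser load pages automatically, fetch the data we need, take page screenshots, or check whether certain actions have happened on a site. Selenium does not ship with a browser and does not provide browser functionality itself; it has to be combined with a third-party browser before it can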

2020-10-10 05:51:25 3185 1

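Original: Disguising a crawler

Some sites do not allow programs to access them directly. If a request fails the site's identification check, the site simply will not respond, so to fully mimic how a browser works the crawler has to disguise itself.
1.1 Setting request headers
The User-Agent header identifies which browser is making the request. The code is as follows:
from urllib.request import urlopen
from urllib.request import Request
url = 'http://www.server.com/login'
user_agent = 'Mozilla/4.0 (compatible; MSIE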
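As a rough illustration of the header-setting step, the sketch below attaches a User-Agent to a urllib `Request` and inspects it without sending anything. The User-Agent string here is a made-up placeholder, because the value in the excerpt is cut off; the URL is the one from the excerpt.

```python
from urllib.request import Request

url = "http://www.server.com/login"  # URL from the excerpt

# Hypothetical placeholder value; the original string is truncated in the source.
user_agent = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

# Headers are passed as a dict when the Request object is built.
req = Request(url, headers={"User-Agent": user_agent})

# Request normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

The prepared `req` object would then be passed to `urlopen(req)` in place of the bare URL.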

2020-10-06 22:10:25 307

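Original: Python crawler request headers

Request headers
Fetching a page: use urlopen
request.urlopen(url, data, timeout)
The first parameter, url, is the URL. The second, data, is the data to send when accessing the URL. The third, timeout, sets the timeout. The second and third parameters are optional: data defaults to None, and timeout defaults to socket._GLOBAL_DEFAULT_TIMEOUT. The first parameter, url, is required. After urlopen runs it returns a response object, which holds the returned information.
The response object: re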
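The effect of the `data` and `timeout` parameters can be seen with a small self-contained sketch. It assumes a throwaway local server (the hypothetical `MethodHandler`, which replies with the HTTP method it saw) instead of a real site, so the only thing being demonstrated is how `urlopen` behaves with and without `data`.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class MethodHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that replies with the HTTP method it saw."""

    def _reply(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the POST body, if any
        body = self.command.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_POST = _reply

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), MethodHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Only url is given: data defaults to None, so this is a GET.
with request.urlopen(url, timeout=5) as response:
    get_status, get_body = response.status, response.read().decode()

# data must be bytes; supplying it turns the request into a POST.
payload = parse.urlencode({"key": "value"}).encode("utf-8")
with request.urlopen(url, data=payload, timeout=5) as response:
    post_status, post_body = response.status, response.read().decode()

server.shutdown()
print(get_status, get_body)
print(post_status, post_body)
```

The response object is used as a context manager here so the connection is closed once the body has been read.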

2020-10-06 21:21:11 4184 3
