CS109 Lecture 7

ZJun310

Category: Data Science. Tags: CS109

Article link: https://blog.csdn.net/u014135091/article/details/52066700

Data Scraping

Sources

From a website
With an API

Copyright and permissions

Be careful and polite
Give credit
Care about media law
Don’t be evil
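
One common reading of "be careful and polite" (an assumption here; the notes do not spell it out) is to honor a site's robots.txt and to pause between requests. A minimal sketch with Python's standard `urllib.robotparser`; the robots.txt rules, the `example.com` URLs, the `my-scraper` user agent, and the `allowed` helper are all hypothetical:

```python
import urllib.robotparser

# Hypothetical robots.txt content. A real scraper would load it from the site
# with rp.set_url(".../robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="my-scraper"):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return rp.can_fetch(user_agent, url)


print(allowed("https://example.com/events/"))
print(allowed("https://example.com/private/data"))
# Seconds to wait between requests, if the site states a Crawl-delay
print(rp.crawl_delay("my-scraper"))
```

In a real crawl, the delay value could be passed to `time.sleep` between requests.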

Useful tags

<h1></h1>
<p></p>
<br>
<a href="url">Link</a>
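
To see how these tags appear to a program, here is a small sketch that uses only Python's standard `html.parser` module to pull the `href` out of every `<a>` tag. The page fragment, the `example.com` URLs, and the `LinkCollector` class are hypothetical, made up for illustration:

```python
from html.parser import HTMLParser

# Hypothetical page fragment built from the four tags listed above.
HTML = """
<h1>Events</h1>
<p>First line<br>second line</p>
<a href='https://example.com/a'>Link A</a>
<a href='https://example.com/b'>Link B</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


collector = LinkCollector()
collector.feed(HTML)
print(collector.links)
```

Libraries such as BeautifulSoup wrap this kind of event-driven parsing in a searchable tree, which is usually more convenient for scraping.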

Useful Libraries for Scraping

urllib (urllib2 in Python 2)
BeautifulSoup (bs4)
pattern
lxml

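Get Data From a Website

import urllib.request  # urllib2 in Python 2
import bs4

url = 'url'  # placeholder: the page to scrape
source = urllib.request.urlopen(url).read()

soup = bs4.BeautifulSoup(source, 'html.parser')
soup.findAll('a')  # find all <a></a> tags

tag = soup.find('a')  # first <a> tag only
tag.get('href')       # value of its href attribute

C = soup.findAll('p', {'class': 'Event'})  # <p> tags with class "Event"
t = C[0]
t.findNextSiblings()  # tags that follow t under the same parent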
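
The same calls can be tried offline on an inline HTML string. This is a sketch, assuming the `bs4` package is installed; the markup and the `example.com` URL are hypothetical. It uses `find_all` and `find_next_siblings`, the current spellings of the `findAll` and `findNextSiblings` names used in the notes:

```python
import bs4

# Hypothetical markup; the class name "Event" mirrors the lookup in the notes.
HTML = """
<div>
  <p class="Event">Opening talk</p>
  <p>Room 101</p>
  <p>9:00 am</p>
  <a href="https://example.com/schedule">Schedule</a>
</div>
"""

soup = bs4.BeautifulSoup(HTML, "html.parser")

# find() returns the first match, find_all() returns every match
first_link = soup.find("a")
print(first_link.get("href"))

events = soup.find_all("p", {"class": "Event"})
event = events[0]
print(event.get_text())

# Sibling tags that follow the matched <p> under the same parent
siblings = event.find_next_siblings()
print([s.name for s in siblings])
```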

Get Data With an API

import json  # JavaScript Object Notation
import requests

api_key = 'mykey'
url = 'url' + api_key  # placeholder: endpoint URL plus the key
source = requests.get(url).text  # response body as a string

#---simple example--------
a = {'a': 1, 'b': 2}
s = json.dumps(a)   # dict -> JSON string
a2 = json.loads(s)  # JSON string -> dict
#-------------------------
dataDict = json.loads(source)
dataDict.keys()
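
A typical API workflow is to build the request URL, then parse the JSON body into dicts and lists. The sketch below runs without network access: the endpoint `api.example.com`, the `apikey` and `q` parameter names, and the response text are all hypothetical stand-ins, not any real service's API:

```python
import json
from urllib.parse import urlencode

# Build a query string instead of concatenating by hand; urlencode escapes
# spaces and special characters. Endpoint and parameter names are made up.
params = {"apikey": "mykey", "q": "data science"}
url = "https://api.example.com/search?" + urlencode(params)
print(url)

# Hypothetical response body, standing in for the string an API would return.
response_text = (
    '{"total": 2, "movies": ['
    '{"title": "A", "year": 2001}, {"title": "B", "year": 2010}]}'
)

data = json.loads(response_text)  # JSON object -> dict, JSON array -> list
print(sorted(data.keys()))

# Walk the nested structure with ordinary dict and list operations
titles = [movie["title"] for movie in data["movies"]]
print(titles)

# Round trip back to a JSON string, e.g. to cache the response on disk
cached = json.dumps(data, indent=2)
print(json.loads(cached) == data)
```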
