Finding an Element with the select() Method
To retrieve a web page element from a BeautifulSoup object, call the select() method and pass it a string containing a CSS selector for the element you are looking for.
A full discussion of CSS selector syntax is beyond the scope of this book (there’s a good selector tutorial in the resources at http://nostarch.com/automatestuff/), but here’s a short introduction to selectors. Table 11-2 shows examples of the most common CSS selector patterns.
Table 11-2. Examples of CSS Selectors
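As a rough illustration of the kinds of selector patterns mentioned above, the sketch below runs a few common CSS selectors against a small HTML string. The inline HTML is a hypothetical stand-in modeled on the elements shown in this section, not the actual example.html file, and the variable names are only for illustration.

```python
import bs4

# Hypothetical inline HTML, modeled on the elements shown in this section.
html = """
<html><body>
<p>Download my <strong>Python</strong> book from
<a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

by_tag = soup.select('p')            # every <p> element
by_id = soup.select('#author')       # the element whose id is "author"
by_class = soup.select('.slogan')    # elements whose class is "slogan"
nested = soup.select('p span')       # <span> elements anywhere inside a <p>
direct = soup.select('p > strong')   # <strong> elements directly inside a <p>
with_attr = soup.select('a[href]')   # <a> elements that have an href attribute

print(len(by_tag), len(by_id), len(by_class))
print(nested[0].getText(), direct[0].getText(), with_attr[0].get('href'))
```

Each call returns a list-like collection of Tag objects, so the results can be indexed or looped over in the same way regardless of which selector pattern was used.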
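The following interactive shell session uses example.html:
>>> import bs4
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')  # read() returns the file's contents as one string
>>> elems = exampleSoup.select('#author')
>>> type(elems)
<class 'list'>
>>> len(elems)
1
>>> type(elems[0])
<class 'bs4.element.Tag'>
>>> elems[0].getText()
'Al Sweigart'
>>> str(elems[0])
'<span id="author">Al Sweigart</span>'
>>> elems[0].attrs
{'id': 'author'}
This code pulls the element with id="author" out of the example HTML document. We store the resulting list of Tag objects in the variable elems, and len(elems) tells us there is one Tag object in the list. Calling getText() on the element returns the element's text content. The attrs attribute returns the element's attributes as a dictionary, and passing the element to str() returns a string that includes the opening and closing tags.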
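Because select() hands back a list, indexing it with [0] raises an IndexError when nothing matched. One possible way to guard against that is sketched below; first_text() is a hypothetical helper written for this note, not part of Beautiful Soup, and the inline HTML is an assumed stand-in for example.html.

```python
import bs4

def first_text(soup, selector, default=None):
    """Return the text of the first element matching selector, or default."""
    elems = soup.select(selector)
    if not elems:              # select() found no matching elements
        return default
    return elems[0].getText()

# Hypothetical inline HTML standing in for example.html.
soup = bs4.BeautifulSoup('<p>By <span id="author">Al Sweigart</span></p>',
                         'html.parser')

print(first_text(soup, '#author'))
print(first_text(soup, '#missing', default='(not found)'))
```

Checking the list before indexing keeps a scraping script from crashing when a page's layout changes and the selector no longer matches anything.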
You can also pull all the <p> elements from the BeautifulSoup object:
>>> pElems = exampleSoup.select('p')
>>> str(pElems[0])
'<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>'
>>> pElems[0].getText()
'Download my Python book from my website.'
>>> str(pElems[1])
'<p class="slogan">Learn Python the easy way!</p>'
>>> pElems[1].getText()
'Learn Python the easy way!'
>>> str(pElems[2])
'<p>By <span id="author">Al Sweigart</span></p>'
>>> pElems[2].getText()
'By Al Sweigart'
This time, select() returns a list of three matches, which we store in pElems.
Getting Data from an Element’s Attributes
>>> import bs4
>>> soup = bs4.BeautifulSoup(open('example.html'), 'html.parser')
>>> spanElem = soup.select('span')[0]
>>> str(spanElem)
'<span id="author">Al Sweigart</span>'
>>> spanElem.get('id')
'author'
>>> spanElem.get('some_nonexistent_addr') == None
True
>>> spanElem.attrs
{'id': 'author'}
Here we use select() to find any <span> elements and store the first matched element in spanElem. Passing the attribute name 'id' to get() returns the attribute's value, 'author'.
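The same get() call can be applied to every element that select() returns. The sketch below collects the href value of each <a> element; the inline HTML, including the second link without an href, is a hypothetical example made up for this note rather than content from example.html.

```python
import bs4

# Hypothetical inline HTML: one link with an href attribute and one without.
html = ('<p><a href="http://inventwithpython.com">my website</a> '
        '<a>no link</a></p>')
soup = bs4.BeautifulSoup(html, 'html.parser')

links = soup.select('a')

# get() returns None when the attribute is missing instead of raising an error.
hrefs = [link.get('href') for link in links]

# A second argument to get() supplies a fallback value for a missing attribute.
fallback = links[1].get('href', '(none)')

print(hrefs)
print(fallback)
```

Using get() rather than indexing attrs directly means a missing attribute yields None (or the fallback) instead of a KeyError.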