[Special Topic] Web Scraping Final Exam Practice Questions

  • Which of the following arrays is valid?

    A. [1, 0.3, 8, 6.4]

    B. ["Lucy", 16, "Susan", 23, "Carrie", 37]

    C. [True, False, "False", True]

    D. [3.14j, 7.3j, 5.1j, 2j]

  • Which function is most useful to convert a multidimensional array into a one-dimensional array?

    A. ravel()

    B. reshape()

    C. resize() and reshape()

    D. All of the above
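
    A minimal sketch for the question above, assuming NumPy is installed: ravel() flattens a multidimensional array, and reshape(-1) produces the same result.

    import numpy as np

    a = np.array([[1, 2], [3, 4]])
    print(a.ravel())        # [1 2 3 4] -- flattened to one dimension
    print(a.reshape(-1))    # [1 2 3 4] -- same shape change via reshape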

  • The np.trace() method gives the sum of ____.

    A. the entire array

    B. the diagonal elements from left to right

    C. the diagonal elements from right to left

    D. consecutive rows of an array
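
    A quick check of np.trace(): it sums the main (left-to-right) diagonal.

    import numpy as np

    m = np.array([[1, 2],
                  [3, 4]])
    print(np.trace(m))   # 5 -- the diagonal elements 1 + 4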

  • The function np.transpose(), when applied to a one-dimensional array, gives ____?

    A. a reverse array

    B. an unchanged original array

    C. an inverse array

    D. all elements with zeroes
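
    A one-line demonstration: transposing a 1-D array returns it unchanged, since there is no second axis to swap.

    import numpy as np

    v = np.array([1, 2, 3])
    print(np.transpose(v))   # [1 2 3] -- the original array, unchanged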

  • Which library is used for n-dimensional arrays?

    A. numpy

    B. pandas

    C. Beautiful Soup

    D. re

  • What is another name for ndim?

    A. shape

    B. size

    C. rank

    D. None of the above

  • What is the product of the elements of the shape tuple called?

    A. shape

    B. size

    C. rank

    D. None of the above
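
    A small sketch tying the last two questions together: ndim is the number of axes (also called the rank), and size is the product of the shape tuple.

    import numpy as np

    a = np.arange(12).reshape(3, 4)
    print(a.ndim)    # 2  -- number of axes, i.e. the rank
    print(a.shape)   # (3, 4)
    print(a.size)    # 12 -- product of the shape tuple, 3 * 4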

  • Which function is used to flatten an array?

    A. ravel()

    B. reshape()

    C. resize()

    D. vsplit()

  • Which function is used to reshape or change the dataset?

    A. ravel()

    B. reshape()

    C. resize()

    D. vsplit()
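
    A minimal sketch of the difference: reshape() returns a reshaped view and leaves the original untouched, while resize() changes the array in place.

    import numpy as np

    a = np.arange(6)
    b = a.reshape(2, 3)       # returns a (2, 3) view; a keeps its shape
    print(a.shape, b.shape)   # (6,) (2, 3)

    c = np.arange(6)
    c.resize(3, 2)            # modifies c in place
    print(c.shape)            # (3, 2)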

  • Which function is used to split an array horizontally?

    A. ravel()

    B. reshape()

    C. hsplit()

    D. hstack()

  • Which function is used to stack arrays together?

    A. ravel()

    B. reshape()

    C. hsplit()

    D. hstack()
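
    A sketch covering this question and the previous one: hsplit() splits an array into equal column blocks, and hstack() stacks them back together.

    import numpy as np

    a = np.arange(8).reshape(2, 4)
    left, right = np.hsplit(a, 2)      # two (2, 2) column blocks
    print(np.hstack((left, right)))    # stacking restores the original array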

  • Which function is used to sort an array?

    A. np.range()

    B. np.sort()

    C. np.eye()

    D. np.one()
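
    A quick check: np.sort() returns a sorted copy and leaves the input unchanged.

    import numpy as np

    a = np.array([3, 1, 2])
    print(np.sort(a))   # [1 2 3]
    print(a)            # [3 1 2] -- the original is untouched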

  • Which library is used for data analysis and manipulation?

    A. numpy

    B. pandas

    C. Beautiful Soup

    D. re

  • Which data structure is an array-like object containing data and labels (or an index)?

    A. Series

    B. DataFrame

    C. Panel

    D. Panel 4D

  • Which function is used to create a pandas Series?

    A. np.series()

    B. pd.Series()

    C. both of the above

    D. None of the above
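
    A minimal sketch, assuming pandas is installed; note that the constructor is spelled pd.Series() with a capital S.

    import pandas as pd

    s = pd.Series([16, 23, 37], index=['Lucy', 'Susan', 'Carrie'])
    print(s['Susan'])   # 23 -- each value is paired with a label in the index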

  • Which elements are accessed using a[0:5]?

    A. 0,1,2,3,4,5

    B. 1,2,3,4,5

    C. 0,1,2,3,4

    D. None of the above
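
    A quick check of the slice above: the stop index is excluded, so a[0:5] accesses the elements at indices 0 through 4.

    a = [10, 11, 12, 13, 14, 15]
    print(a[0:5])   # [10, 11, 12, 13, 14] -- five elements, indices 0-4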

  • Which function is used to look up a column by name or index?

    A. pd.Series()

    B. .iloc[]

    C. .loc[]

    D. np.column()

  • Which function is used to look up a column by position?

    A. pd.Series()

    B. .iloc[]

    C. .loc[]

    D. np.column()
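
    A small sketch covering the last two questions; .loc and .iloc are indexers, so they are used with square brackets (the two-column frame here is purely illustrative).

    import pandas as pd

    df = pd.DataFrame({'name': ['Lucy', 'Susan'], 'age': [16, 23]})
    print(df.loc[:, 'age'])    # column looked up by label
    print(df.iloc[:, 1])       # the same column looked up by position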

  • Which data structure is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)?

    A. Series

    B. DataFrame

    C. Panel

    D. Panel 4D

  • Which function is used to create a DataFrame in Python?

    A. np.dataframe

    B. pd.dataframe

    C. pd.dataFrame

    D. pd.DataFrame()
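
    A minimal sketch of creating a DataFrame; the constructor is pd.DataFrame() with exactly that capitalization.

    import pandas as pd

    data = pd.DataFrame({'name': ['Lucy', 'Susan', 'Carrie'],
                         'age': [16, 23, 37]})
    print(data)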

  • Which function is used to select the top 5 rows of a DataFrame?

    A. np.head()

    B. pd.head

    C. data.head()

    D. pd.top()

  • Which function is used to select the bottom 5 rows of a DataFrame?

    A. np.bottom

    B. pd.bottom

    C. data.tail()

    D. pd.tail()

  • How do you drop a column from the original data?

    A. data.drop([column_name], axis=1, inplace=True)

    B. data.drop(column_name)

    C. data.delete()

    D. data.remove()
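
    A sketch tying the last three questions together: head() and tail() select the first and last 5 rows, and drop() with axis=1 and inplace=True removes a column from the original DataFrame (note the capital T in True).

    import pandas as pd

    data = pd.DataFrame({'name': ['Lucy', 'Susan'], 'age': [16, 23]})
    print(data.head())                        # first 5 rows (here, both)
    print(data.tail())                        # last 5 rows
    data.drop(['age'], axis=1, inplace=True)  # removes the column in place
    print(data.columns.tolist())              # ['name']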

  • What is the result of DataFrame[3:9]?

    A. Series with sliced index from 3 to 9

    B. dict of index positions 3 and 9

    C. DataFrame of sliced rows index from 3 to 9

    D. DataFrame with data elements at index 3 to 9

  • What is the name of the process by which we automate gathering data from the internet?

    A. Web scraping

    B. netscraping

    C. internet explorer

    D. search engine

  • Which library is used to parse unwanted data and helps organize and format messy web data by fixing bad HTML?

    A. urllib.request

    B. re

    C. BeautifulSoup

    D. None of the above

  • Which tool is used to interpret and render the information from a web document, and also to validate the input before processing it?

    A. requests

    B. parser

    C. BeautifulSoup

    D. None of the above

  • Which function is used for finding tags based on their name and attributes?

    A. .findAll()

    B. bs4

    C. parser

    D. All of the above
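
    A minimal sketch of tag lookup, assuming bs4 is installed; findAll() (and its modern alias find_all()) matches tags by name and attributes, and get_text() strips the markup.

    from bs4 import BeautifulSoup

    html = "<div><p class='title'>Hello</p><p>World</p></div>"
    soup = BeautifulSoup(html, 'html.parser')
    for p in soup.findAll('p', {'class': 'title'}):
        print(p.get_text())   # Hello -- text only, tags removed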

  • Which protocol can be used to retrieve web pages using Python?

    A. bs4

    B. urllib

    C. HTTP

    D. GET

  • Which Python library can be used to send and receive data over HTTP?

    A. port

    B. urllib

    C. HTTP

    D. header

  • What does the following regex match? http[s]?://.+?

    A. An exact match to 'http[s]?://.+?'

    B. 'http://' or 'http[s]://' followed by one or more characters

    C. 'http://' or 'https://' followed by one or more characters.

    D. 'https://' followed by one or more characters.
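
    A quick check of the pattern above: the lazy .+? matches as little as the engine allows, while a greedy .+ runs to the end of the line.

    import re

    text = 'visit https://example.com today'
    print(re.search(r'http[s]?://.+?', text).group())  # 'https://e' -- lazy, one char minimum
    print(re.search(r'http[s]?://.+', text).group())   # 'https://example.com today' -- greedy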

  • Which of the following is the only XML parser supported by BeautifulSoup?

    A. html.parser

    B. lxml

    C. lxml.xml

    D. html5lib

  • Which function is used to make a request to a web page and get its HTML?

    A. re.get()

    B. get.request()

    C. requests.get()

    D. None of the above

  • Which method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string?

    A. urllib.request

    B. BeautifulSoup

    C. prettify()

    D. All of the above
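
    A sketch tying the last two questions together (example.com is just a stand-in URL): requests.get() fetches the page, and prettify() renders the parse tree with one tag or string per line.

    import requests
    from bs4 import BeautifulSoup

    r = requests.get('http://example.com')
    soup = BeautifulSoup(r.text, 'html.parser')
    print(soup.prettify())   # nicely indented Unicode string of the tree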

  • ____ is the process of fetching all the web pages linked to a website.

    A. Indexing

    B. Calculating relevancy

    C. Crawling

    D. Processing

  • Websites fetched by a crawler are indexed and kept in a huge database. What is this process called?

    A. Indexing

    B. Optimizing

    C. Crawling

    D. None

  • What is a web crawler also called?

    A. Search optimizer

    B. Link Directory

    C. web manager

    D. web spider

  • Which of the following is NOT a step in basic web crawling?

    A. Parsing HTML content

    B. Scraping data from web pages

    C. Rendering JavaScript

    D. Making HTTP requests

  • Which HTTP method is typically used by web crawlers to retrieve web pages?

    A. GET

    B. PUT

    C. POST

    D. DELETE

  • Which of the following is NOT a common challenge in web crawling?

    A. Handling dynamic content

    B. Dealing with anti-scraping measures

    C. Parsing JSON data

    D. Avoiding IP blocking

  • Which Python library provides a framework for building web crawlers and scrapers with a high level of abstraction?

    A. Requests

    B. BeautifulSoup

    C. Pandas

    D. Scrapy

  • What is a benefit of storing media files by reference?

    A. It consumes less space on the host server

    B. It allows for easier code writing without dealing with file downloads

    C. It ensures the file will never be subject to change

    D. It reduces the load on the scraper’s bandwidth

  • Which Python module can be used to download files from a remote URL?

    A. os

    B. urllib.request

    C. bs4

    D. BeautifulSoup

  • What does the os.makedirs() function do in the provided script?

    A. Retrieves the target directory for each download

    B. Creates missing directories along the path if needed

    C. Downloads files to their respective paths

    D. Cleans and normalizes URLs

  • Which module is used to manipulate file paths, create directories, and perform other OS-related tasks in Python?

    A. os

    B. urllib.request

    C. bs4

    D. BeautifulSoup
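
    A sketch covering the last few questions (the URL and target path are hypothetical): os.makedirs() creates any missing directories along a path, after which urllib.request can download a remote file into it.

    import os
    from urllib.request import urlretrieve

    url = 'http://example.com/logo.png'    # hypothetical remote file
    path = 'downloaded/images/logo.png'    # hypothetical target path
    os.makedirs(os.path.dirname(path), exist_ok=True)  # create missing dirs
    urlretrieve(url, path)                 # fetch the file to that path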

  • What is the purpose of the csv library in Python?

    A. To create HTML tables

    B. To manipulate image files

    C. To work with CSV files

    D. To generate random numbers
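
    A minimal sketch of the csv library writing and then reading back a small file.

    import csv

    with open('demo.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'age'])
        writer.writerow(['Lucy', 16])

    with open('demo.csv') as f:
        for row in csv.reader(f):
            print(row)   # ['name', 'age'], then ['Lucy', '16']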

  • What does the get_text() function do in BeautifulSoup?

    A. Retrieves the HTML content of a webpage

    B. Cleans HTML tags and returns the text

    C. Downloads images from a webpage

    D. Retrieves metadata from a webpage

  • What is the purpose of the findAll method in BeautifulSoup?

    A. To find all occurrences of a specified tag

    B. To find the first occurrence of a specified tag

    C. To find all HTML attributes within a tag

    D. To find the last occurrence of a specified tag

  • What is the role of the try and finally blocks in the provided Python code?

    A. To handle errors that may occur during file operations

    B. To repeat a section of code until a condition is met

    C. To execute code only if a certain condition is true

    D. To define a loop that iterates over each row of the CSV file
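
    The referenced code is not reproduced here, but the try/finally idiom it relies on typically looks like this sketch: the finally block runs whether or not an exception occurs, so the file is always closed.

    csv_file = open('demo.csv', 'w', newline='')
    try:
        csv_file.write('name,age\n')   # work that might raise an exception
    finally:
        csv_file.close()               # always executed, error or not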

  • Which of the following is NOT a common characteristic of CSV files?

    A. They are widely supported by spreadsheet applications.

    B. They use commas to separate data fields.

    C. They can contain formatting such as colors and fonts.

    D. They are plain text files.

  • What is the main advantage of using the Requests library for handling HTTP requests?

    A. It requires less code to perform complicated HTTP tasks

    B. It provides better compatibility with Microsoft Excel

    C. It is built into Python’s core libraries

    D. It supports complex file uploads

  • What is the primary purpose of using cookies in web applications?

    A. To store sensitive user data securely

    B. To enhance the visual appearance of web pages

    C. To track user authentication and session information

    D. To prevent unauthorized access to web forms

  • How does the Requests library handle cookie management for web scraping?

    A. By automatically storing cookies in a global variable

    B. By prompting the user to manually enter cookie data

    C. By allowing users to define custom cookie policies

    D. By providing built-in functions for cookie retrieval and usage

  • What is the primary advantage of using a Requests session object for handling cookies?

    A. It allows for parallel processing of multiple requests

    B. It simplifies cookie management for complex websites

    C. It automatically generates secure session tokens

    D. It improves browser compatibility for web scraping
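
    A minimal sketch of cookie handling with a Session object (the login endpoint and credentials are hypothetical): the session stores the cookies the site returns and resends them automatically on later requests.

    import requests

    session = requests.Session()
    session.post('https://pythonscraping.com/pages/cookies/welcome.php',
                 {'username': 'user', 'password': 'password'})
    print(session.cookies.get_dict())   # cookies set by the site
    # later session.get()/session.post() calls reuse these cookies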

  • What is the purpose of the given code?

    import requests

    # form fields to submit
    params = {'firstname': 'Price', 'lastname': 'pan'}
    # POST the form data to the processing script; the second positional
    # argument of requests.post() is the request body (the form data)
    r = requests.post(
        'https://pythonscraping.com/pages/files/processing.php',
        params)
    print(r.text)
    

    A. It submits a form with parameters to a processing script and prints the response.

    B. It sends an email newsletter subscription request to a website.

    C. It retrieves data from a web server and stores it in a variable

    D. It generates a random string and prints it to the console.

—— written by Pan Qifan (潘琦藩) ——
