[Special Topic] Web Scraping Final Exam Practice Questions

  • Which of the following arrays is valid?

    A. [1, 0.3, 8, 6.4]

    B. ["Lucy", 16, "Susan", 23, "Carrie", 37]

    C. [True, False, "False", True]

    D. [3.14j, 7.3j, 5.1j, 2j]

  • Which function is most useful to convert a multidimensional array into a one-dimensional array?

    A. ravel()

    B. reshape()

    C. resize() and reshape()

    D. All of the above
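
    A minimal sketch for the question above, assuming NumPy is installed: ravel() flattens a multidimensional array, and reshape(-1) produces the same result.

    import numpy as np

    a = np.array([[1, 2], [3, 4]])
    print(a.ravel())        # [1 2 3 4] -- flattened to one dimension
    print(a.reshape(-1))    # [1 2 3 4] -- same shape change via reshape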

  • The np.trace() method gives the sum of ____.

    A. the entire array

    B. the diagonal elements from left to right

    C. the diagonal elements from right to left

    D. consecutive rows of an array
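
    A quick check of np.trace(): it sums the main (left-to-right) diagonal.

    import numpy as np

    m = np.array([[1, 2],
                  [3, 4]])
    print(np.trace(m))   # 5 -- the diagonal elements 1 + 4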

  • The function np.transpose(), when applied to a one-dimensional array, gives ____?

    A. a reverse array

    B. an unchanged original array

    C. an inverse array

    D. all elements with zeroes
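
    A one-line demonstration: transposing a 1-D array returns it unchanged, since there is no second axis to swap.

    import numpy as np

    v = np.array([1, 2, 3])
    print(np.transpose(v))   # [1 2 3] -- the original array, unchanged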

  • Which library is used for n-dimensional arrays?

    A. numpy

    B. pandas

    C. Beautiful Soup

    D. re

  • What is another name for ndim?

    A. shape

    B. size

    C. rank

    D. None of the above

  • What is the product of the elements of the shape tuple called?

    A. shape

    B. size

    C. rank

    D. None of the above
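
    A small sketch tying the last two questions together: ndim is the number of axes (also called the rank), and size is the product of the shape tuple.

    import numpy as np

    a = np.arange(12).reshape(3, 4)
    print(a.ndim)    # 2  -- number of axes, i.e. the rank
    print(a.shape)   # (3, 4)
    print(a.size)    # 12 -- product of the shape tuple, 3 * 4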

  • Which function is used to flatten an array?

    A. ravel()

    B. reshape()

    C. resize()

    D. vsplit()

  • Which function is used to reshape or change the dataset?

    A. ravel()

    B. reshape()

    C. resize()

    D. vsplit()
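
    A minimal sketch of the difference: reshape() returns a reshaped view and leaves the original untouched, while resize() changes the array in place.

    import numpy as np

    a = np.arange(6)
    b = a.reshape(2, 3)       # returns a (2, 3) view; a keeps its shape
    print(a.shape, b.shape)   # (6,) (2, 3)

    c = np.arange(6)
    c.resize(3, 2)            # modifies c in place
    print(c.shape)            # (3, 2)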

  • Which function is used to split an array horizontally?

    A. ravel()

    B. reshape()

    C. hsplit()

    D. hstack()

  • Which function is used to stack arrays together?

    A. ravel()

    B. reshape()

    C. hsplit()

    D. hstack()
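
    A sketch covering this question and the previous one: hsplit() splits an array into equal column blocks, and hstack() stacks them back together.

    import numpy as np

    a = np.arange(8).reshape(2, 4)
    left, right = np.hsplit(a, 2)      # two (2, 2) column blocks
    print(np.hstack((left, right)))    # stacking restores the original array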

  • Which function is used to sort an array?

    A. np.range()

    B. np.sort()

    C. np.eye()

    D. np.one()
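
    A quick check: np.sort() returns a sorted copy and leaves the input unchanged.

    import numpy as np

    a = np.array([3, 1, 2])
    print(np.sort(a))   # [1 2 3]
    print(a)            # [3 1 2] -- the original is untouched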

  • Which library is used for data analysis and manipulation?

    A. numpy

    B. pandas

    C. Beautiful Soup

    D. re

  • Which data structure is an array-like object containing data and labels (or an index)?

    A. Series

    B. DataFrame

    C. Panel

    D. Panel 4D

  • Which function is used to create a pandas Series?

    A. np.series()

    B. pd.Series()

    C. both of the above

    D. None of the above
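
    A minimal sketch, assuming pandas is installed; note that the constructor is spelled pd.Series() with a capital S.

    import pandas as pd

    s = pd.Series([16, 23, 37], index=['Lucy', 'Susan', 'Carrie'])
    print(s['Susan'])   # 23 -- each value is paired with a label in the index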

  • Which elements are accessed using a[0:5]?

    A. 0,1,2,3,4,5

    B. 1,2,3,4,5

    C. 0,1,2,3,4

    D. None of the above
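
    A quick check of the slice above: the stop index is excluded, so a[0:5] accesses the elements at indices 0 through 4.

    a = [10, 11, 12, 13, 14, 15]
    print(a[0:5])   # [10, 11, 12, 13, 14] -- five elements, indices 0-4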

  • Which function is used to look up a column by name or index?

    A. pd.Series()

    B. .iloc[]

    C. .loc[]

    D. np.column()

  • Which function is used to look up a column by position?

    A. pd.Series()

    B. .iloc[]

    C. .loc[]

    D. np.column()
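
    A small sketch covering the last two questions; .loc and .iloc are indexers, so they are used with square brackets (the two-column frame here is purely illustrative).

    import pandas as pd

    df = pd.DataFrame({'name': ['Lucy', 'Susan'], 'age': [16, 23]})
    print(df.loc[:, 'age'])    # column looked up by label
    print(df.iloc[:, 1])       # the same column looked up by position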

  • Which data structure is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)?

    A. Series

    B. DataFrame

    C. Panel

    D. Panel 4D

  • Which function is used to create a DataFrame in Python?

    A. np.dataframe

    B. pd.dataframe

    C. pd.dataFrame

    D. pd.DataFrame()
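
    A minimal sketch of creating a DataFrame; the constructor is pd.DataFrame() with exactly that capitalization.

    import pandas as pd

    data = pd.DataFrame({'name': ['Lucy', 'Susan', 'Carrie'],
                         'age': [16, 23, 37]})
    print(data)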

  • Which function is used to select the top 5 rows of a DataFrame?

    A. np.head()

    B. pd.head

    C. data.head()

    D. pd.top()

  • Which function is used to select the bottom 5 rows of a DataFrame?

    A. np.bottom

    B. pd.bottom

    C. data.tail()

    D. pd.tail()

  • How do you drop a column from the original data?

    A. data.drop([column_name], axis=1, inplace=True)

    B. data.drop(column_name)

    C. data.delete()

    D. data.remove()
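
    A sketch tying the last three questions together: head() and tail() select the first and last 5 rows, and drop() with axis=1 and inplace=True removes a column from the original DataFrame (note the capital T in True).

    import pandas as pd

    data = pd.DataFrame({'name': ['Lucy', 'Susan'], 'age': [16, 23]})
    print(data.head())                        # first 5 rows (here, both)
    print(data.tail())                        # last 5 rows
    data.drop(['age'], axis=1, inplace=True)  # removes the column in place
    print(data.columns.tolist())              # ['name']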

  • What is the result of DataFrame[3:9]?

    A. Series with sliced index from 3 to 9

    B. dict of index positions 3 and 9

    C. DataFrame of sliced rows index from 3 to 9

    D. DataFrame with data elements at index 3 to 9

  • What is the name of the process by which we automate gathering data from the internet?

    A. Web scraping

    B. netscraping

    C. internet explorer

    D. search engine

  • Which library is used to parse unwanted data and helps organize and format messy web data by fixing bad HTML?

    A. urllib.request

    B. re

    C. BeautifulSoup

    D. None of the above

  • Which tool is used to interpret and render the information from a web document, and also to validate the input before processing it?

    A. requests

    B. parser

    C. BeautifulSoup

    D. None of the above

  • Which function is used for finding tags based on their name and attributes?

    A. .findAll()

    B. bs4

    C. parser

    D. All of the above
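
    A minimal sketch of tag lookup, assuming bs4 is installed; findAll() (and its modern alias find_all()) matches tags by name and attributes, and get_text() strips the markup.

    from bs4 import BeautifulSoup

    html = "<div><p class='title'>Hello</p><p>World</p></div>"
    soup = BeautifulSoup(html, 'html.parser')
    for p in soup.findAll('p', {'class': 'title'}):
        print(p.get_text())   # Hello -- text only, tags removed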

  • Which protocol can be used to retrieve web pages using Python?

    A. bs4

    B. urllib

    C. HTTP

    D. GET

  • Which Python library can be used to send and receive data over HTTP?

    A. port

    B. urllib

    C. HTTP

    D. header

  • What does the following regex match? http[s]?://.+?

    A. An exact match to 'http[s]?://.+?'

    B. 'http://' or 'http[s]://' followed by one or more characters

    C. 'http://' or 'https://' followed by one or more characters.

    D. 'https://' followed by one or more characters.
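
    A quick check of the pattern above: the lazy .+? matches as little as the engine allows, while a greedy .+ runs to the end of the line.

    import re

    text = 'visit https://example.com today'
    print(re.search(r'http[s]?://.+?', text).group())  # 'https://e' -- lazy, one char minimum
    print(re.search(r'http[s]?://.+', text).group())   # 'https://example.com today' -- greedy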

  • Which of the following is the only XML parser supported by BeautifulSoup?

    A. html.parser

    B. lxml

    C. lxml.xml

    D. html5lib

  • Which function is used to make a request to a web page and get its HTML?

    A. re.get()

    B. get.request()

    C. requests.get()

    D. None of the above

  • Which method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string?

    A. urllib.request

    B. BeautifulSoup

    C. prettify()

    D. All of the above
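
    A sketch tying the last two questions together (example.com is just a stand-in URL): requests.get() fetches the page, and prettify() renders the parse tree with one tag or string per line.

    import requests
    from bs4 import BeautifulSoup

    r = requests.get('http://example.com')
    soup = BeautifulSoup(r.text, 'html.parser')
    print(soup.prettify())   # nicely indented Unicode string of the tree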

  • ____ is the process of fetching all the web pages linked to a website.

    A. Indexing

    B. Calculating relevancy

    C. Crawling

    D. Processing

  • Websites fetched by a crawler are indexed and kept in a huge database. What is this process called?

    A. Indexing

    B. Optimizing

    C. Crawling

    D. None

  • What is a web crawler also called?

    A. Search optimizer

    B. Link Directory

    C. web manager

    D. web spider

  • Which of the following is NOT a step in basic web crawling?

    A. Parsing HTML content

    B. Scraping data from web pages

    C. Rendering JavaScript

    D. Making HTTP requests

  • Which HTTP method is typically used by web crawlers to retrieve web pages?

    A. GET

    B. PUT

    C. POST

    D. DELETE

  • Which of the following is NOT a common challenge in web crawling?

    A. Handling dynamic content

    B. Dealing with anti-scraping measures

    C. Parsing JSON data

    D. Avoiding IP blocking

  • Which Python library provides a framework for building web crawlers and scrapers with a high level of abstraction?

    A. Requests

    B. BeautifulSoup

    C. Pandas

    D. Scrapy

  • What is a benefit of storing media files by reference?

    A. It consumes less space on the host server

    B. It allows for easier code writing without dealing with file downloads

    C. It ensures the file will never be subject to change

    D. It reduces the load on the scraper’s bandwidth

  • Which Python module can be used to download files from a remote URL?

    A. os

    B. urllib.request

    C. bs4

    D. BeautifulSoup

  • What does the os.makedirs() function do in the provided script?

    A. Retrieves the target directory for each download

    B. Creates missing directories along the path if needed

    C. Downloads files to their respective paths

    D. Cleans and normalizes URLs

  • Which module is used to manipulate file paths, create directories, and perform other OS-related tasks in Python?

    A. os

    B. urllib.request

    C. bs4

    D. BeautifulSoup
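
    A sketch covering the last few questions (the URL and target path are hypothetical): os.makedirs() creates any missing directories along a path, after which urllib.request can download a remote file into it.

    import os
    from urllib.request import urlretrieve

    url = 'http://example.com/logo.png'    # hypothetical remote file
    path = 'downloaded/images/logo.png'    # hypothetical target path
    os.makedirs(os.path.dirname(path), exist_ok=True)  # create missing dirs
    urlretrieve(url, path)                 # fetch the file to that path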

  • What is the purpose of the csv library in Python?

    A. To create HTML tables

    B. To manipulate image files

    C. To work with CSV files

    D. To generate random numbers
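
    A minimal sketch of the csv library writing and then reading back a small file.

    import csv

    with open('demo.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'age'])
        writer.writerow(['Lucy', 16])

    with open('demo.csv') as f:
        for row in csv.reader(f):
            print(row)   # ['name', 'age'], then ['Lucy', '16']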

  • What does the get_text() function do in BeautifulSoup?

    A. Retrieves the HTML content of a webpage

    B. Cleans HTML tags and returns the text

    C. Downloads images from a webpage

    D. Retrieves metadata from a webpage

  • What is the purpose of the findAll method in BeautifulSoup?

    A. To find all occurrences of a specified tag

    B. To find the first occurrence of a specified tag

    C. To find all HTML attributes within a tag

    D. To find the last occurrence of a specified tag

  • What is the role of the try and finally blocks in the provided Python code?

    A. To handle errors that may occur during file operations

    B. To repeat a section of code until a condition is met

    C. To execute code only if a certain condition is true

    D. To define a loop that iterates over each row of the CSV file
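
    The referenced code is not reproduced here, but the try/finally idiom it relies on typically looks like this sketch: the finally block runs whether or not an exception occurs, so the file is always closed.

    csv_file = open('demo.csv', 'w', newline='')
    try:
        csv_file.write('name,age\n')   # work that might raise an exception
    finally:
        csv_file.close()               # always executed, error or not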

  • Which of the following is NOT a common characteristic of CSV files?

    A. They are widely supported by spreadsheet applications.

    B. They use commas to separate data fields.

    C. They can contain formatting such as colors and fonts.

    D. They are plain text files.

  • What is the main advantage of using the Requests library for handling HTTP requests?

    A. It requires less code to perform complicated HTTP tasks

    B. It provides better compatibility with Microsoft Excel

    C. It is built into Python’s core libraries

    D. It supports complex file uploads

  • What is the primary purpose of using cookies in web applications?

    A. To store sensitive user data securely

    B. To enhance the visual appearance of web pages

    C. To track user authentication and session information

    D. To prevent unauthorized access to web forms

  • How does the Requests library handle cookie management for web scraping?

    A. By automatically storing cookies in a global variable

    B. By prompting the user to manually enter cookie data

    C. By allowing users to define custom cookie policies

    D. By providing built-in functions for cookie retrieval and usage

  • What is the primary advantage of using a Requests session object for handling cookies?

    A. It allows for parallel processing of multiple requests

    B. It simplifies cookie management for complex websites

    C. It automatically generates secure session tokens

    D. It improves browser compatibility for web scraping
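
    A minimal sketch of cookie handling with a Session object (the login endpoint and credentials are hypothetical): the session stores the cookies the site returns and resends them automatically on later requests.

    import requests

    session = requests.Session()
    session.post('https://pythonscraping.com/pages/cookies/welcome.php',
                 {'username': 'user', 'password': 'password'})
    print(session.cookies.get_dict())   # cookies set by the site
    # later session.get()/session.post() calls reuse these cookies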

  • What is the purpose of the given code?

    import requests

    # form fields to submit
    params = {'firstname': 'Price', 'lastname': 'pan'}
    # POST the form data to the processing script; the second positional
    # argument of requests.post() is the request body (the form data)
    r = requests.post(
        'https://pythonscraping.com/pages/files/processing.php',
        params)
    print(r.text)
    

    A. It submits a form with parameters to a processing script and prints the response.

    B. It sends an email newsletter subscription request to a website.

    C. It retrieves data from a web server and stores it in a variable

    D. It generates a random string and prints it to the console.

—— written by Pan Qifan (潘琦藩) ——
