Crawler Code Analysis Questions

Code Analysis Questions with Answers

1. What is the output of the following code?

import numpy as np

CURVE_CENTER = 80
grades = np.array([72, 35, 64, 88, 51, 90, 74, 12])

def curve(arrayData):
    average = arrayData.mean()
    change = CURVE_CENTER - average
    new_grades = arrayData + change
    return np.clip(new_grades, 0, 100)

print(curve(grades))

Answer: B: [91.25, 54.25, 83.25, 100.0, 70.25, 100.0, 93.25, 31.25]
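
The result can be checked by hand: the eight grades sum to 486, so their mean is 60.75, every grade is shifted by 80 - 60.75 = 19.25, and anything above 100 is clipped. The following is a minimal pure-Python sketch of the same arithmetic, without NumPy; the helper name curve_plain is hypothetical and not part of the question.

```python
CURVE_CENTER = 80
grades = [72, 35, 64, 88, 51, 90, 74, 12]

def curve_plain(values):
    """Shift values so their mean moves to CURVE_CENTER, then clip to [0, 100]."""
    average = sum(values) / len(values)   # 486 / 8 = 60.75
    change = CURVE_CENTER - average       # 80 - 60.75 = 19.25
    return [min(max(v + change, 0), 100) for v in values]

curved = curve_plain(grades)
print(curved)
```

The two grades that would exceed 100 after the shift (88 and 90) are the ones that end up clipped to 100.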

2. Create a 5*5 two-dimensional matrix in which every row holds the values 0 to 4.

matrix = np.zeros((5,5), dtype='int32')
# Choose the correct line to complete the matrix initialization
# matrix *= np.arange(5)  # Incorrect option (zeros stay zeros)
# matrix -= np.arange(5)  # Incorrect option
matrix += np.arange(5)

Answer: C: matrix += np.arange(5)
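
The reason += works is broadcasting: the one-dimensional array np.arange(5) is stretched across every row of the 5*5 matrix. A minimal sketch that contrasts it with *=, assuming only NumPy (the variable names row and zeros are illustrative):

```python
import numpy as np

matrix = np.zeros((5, 5), dtype='int32')
row = np.arange(5)          # shape (5,), values 0..4

# Broadcasting adds `row` to each of the five rows of `matrix`.
matrix += row

# Multiplying zeros by anything leaves zeros, so `*=` cannot fill the matrix.
zeros = np.zeros((5, 5), dtype='int32')
zeros *= row

print(matrix)
print(zeros.any())
```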

3. Which option creates a 7*7 matrix whose boundary elements are 1 and whose remaining elements are 0?

matrix = np.ones((7,7))
# Choose the correct operation to create the desired matrix
# matrix[1:6,1:6] = 1  # Incorrect option
matrix[1:6,1:6] = 0
# matrix = np.zeros((7,7))  # Incorrect option

Answer: B: matrix[1:6,1:6] = 0
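
The slice 1:6 selects indices 1 through 5, so matrix[1:6, 1:6] is the 5*5 interior and the outer ring of 24 cells is left untouched. A short sketch that checks this, assuming only NumPy:

```python
import numpy as np

matrix = np.ones((7, 7))
matrix[1:6, 1:6] = 0        # rows 1..5 and columns 1..5 form the interior

print(matrix)
print(matrix.sum())         # number of cells that are still 1
```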

4. Which of the following options creates a vector of 30 random numbers and calculates their arithmetic mean?

vector = np.random.random(30)
# Choose the correct way to calculate the mean
m = vector.mean()
# m = vector.total()  # Incorrect option
# m = vector.average()  # Incorrect option
# m = vector.sum()  # Incorrect option

Answer: A: m = vector.mean()

5. To create a 3*3 matrix (2D array) with element values from 0 to 8, which of the following is correct?

matrix = np.arange(9).reshape(3,3)
# Check if the following options are correct
# matrix = np.arange(3,3).reshape(9)  # Incorrect option
# matrix = np.arange(3,3).reshape(3,3)  # Incorrect option
# None of the above  # Incorrect option

Answer: A: matrix = np.arange(9).reshape(3,3)

6. Given the vector vector = np.zeros(10), which option correctly calculates the size of the memory it occupies?

# Assume vector is defined as above; choose the correct calculation for memory size
# msize = size(vector) * itemsize(vector)  # Incorrect option (no such built-in functions)
# msize = vector.size() * vector.itemsize()  # Incorrect option (size and itemsize are attributes, not methods)
msize = vector.size * vector.itemsize  # Correct option
print(f'Memory used: {msize} bytes')

Answer: msize = vector.size * vector.itemsize
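
Note that size and itemsize are attributes of a NumPy array rather than methods, so they are read without parentheses. A minimal sketch, assuming the default float64 dtype that np.zeros uses:

```python
import numpy as np

vector = np.zeros(10)                    # default dtype is float64

msize = vector.size * vector.itemsize    # element count * bytes per element
print(msize, vector.nbytes)              # nbytes reports the same quantity

try:
    vector.size()                        # size is an int attribute, not a method
except TypeError as exc:
    print('TypeError:', exc)
```

The nbytes attribute gives the same figure directly and can serve as a cross-check.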

7. Which of the following correctly creates a vector of 10 evenly spaced values strictly between 0 and 1, so that neither 0 nor 1 is included?

# Choose the correct method to create the specified vector
# matrix = np.linspace(0, 1, 11, endpoint=False)[1:-1]  # Incorrect option (only 9 values remain)
# matrix = np.linspace(0,1,11,endpoint=False)[:-1]  # Incorrect option
matrix = np.linspace(0, 1, 11, endpoint=False)[1:]  # Correct option
# matrix = np.linspace(0,1,11,endpoint=False)  # Incorrect option

Answer: C: matrix = np.linspace(0, 1, 11, endpoint=False)[1:]
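
With endpoint=False, np.linspace(0, 1, 11) yields the 11 values 0, 1/11, ..., 10/11, so the endpoint 1 is already absent and only the leading 0 has to be dropped. A small sketch that verifies the count and spacing (the names points and vector are illustrative):

```python
import numpy as np

points = np.linspace(0, 1, 11, endpoint=False)   # 0, 1/11, ..., 10/11
vector = points[1:]                               # drop the leading 0

print(len(vector), vector.min(), vector.max())
print(len(points[1:-1]))                          # the [1:-1] variant is one value short
```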

8. What is the output of the following code?

import numpy as np
import pandas as pd

participation = np.array([205, 204, 201, 200, 197])
countries = np.array(['London', 'Beijing', 'Athens', 'Sydney', 'Atlanta'])
np_year = np.array([2012, 2008, 2004, 2000, 1996])
df = pd.DataFrame({'year': np_year, 'Host Cities': countries, 'No. of Participating Countries': participation})
print(df)

Answer: C: The output is a DataFrame with the provided data correctly displayed.

9. Given the following code, select the correct output:

import numpy as np
import pandas as pd

index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(np.random.randn(5), index=index)

print(s)
print(s[1:] + s[:-1])

Answer: C: The output shows the original Series, followed by the sum of s[1:] and s[:-1]. Pandas aligns the two slices by index label rather than by position, so 'b', 'c' and 'd' each hold twice their original value, while 'a' and 'e' are NaN because each of them appears in only one of the slices.
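
The alignment behavior is easier to see with fixed values instead of random ones. The sketch below replaces np.random.randn(5) with the numbers 1.0 to 5.0 purely for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# s[1:] holds b..e and s[:-1] holds a..d; addition matches labels, not positions.
total = s[1:] + s[:-1]
print(total)
```

Labels present in both slices are added, and labels present in only one of them produce NaN.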

10. There is the following code:

arr = [0, 1, 2, 3, 4]
s1 = pd.Series(arr)
# Options to change the index of s1 to 'A', 'B', 'C', 'D', 'E'

Answer: D: The correct way to change the index is s1.index = ['A', 'B', 'C', 'D', 'E']
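
A minimal sketch of that assignment, assuming pandas is imported as pd as in the other questions:

```python
import pandas as pd

arr = [0, 1, 2, 3, 4]
s1 = pd.Series(arr)                      # default index is 0..4
s1.index = ['A', 'B', 'C', 'D', 'E']     # replace the labels in place

print(s1)
```

The new list must have the same length as the Series, otherwise pandas raises a ValueError.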

11. Each of the following options creates a DataFrame from a NumPy array. Which one is correct?

import numpy as np
import pandas as pd

dates = pd.date_range('today', periods=6)
num_arr = np.random.randn(6, 4)
columns = ['A', 'B', 'C', 'D']
# Choose the correct way to create the DataFrame
df1 = pd.DataFrame(num_arr, index=dates, columns=columns)
# df1 = pd.DataFrame(num_arr, index=dates, columns=['A', 'B', 'C', 'E'])  # Incorrect option
# df1 = pd.DataFrame(num_arr, index=dates, columns=['A', 'B', 'C'])  # Incorrect option
# df1 = pd.DataFrame(num_arr, index=dates, columns=['A', 'B', 'C', 'D', 'E'])  # Incorrect option
print(df1)

Answer: B: df1 = pd.DataFrame(num_arr, index=dates, columns=columns)

12. Choose which of the following descriptions of the code is correct.

import requests
from bs4 import BeautifulSoup

headerData = {
    'User-Agent': 'Mozilla/5.0 (compatible; Python requests/2.25.1)'
}
res = requests.get('https://baike.baidu.com', headers=headerData)

soup = BeautifulSoup(res.text, 'lxml')
for link in soup.findAll('a'):
    if 'href' in link.attrs:
        print(link.attrs['href'])

Answer: A: The code uses the get method to request https://baike.baidu.com, parses the page to find all <a> tags, and prints the href attributes of the links they contain.

13. Please select the incorrect description:

from urllib import request, error

url = "http://pythonscraping.com/pages/page1.html"
UAdata = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"
}
req = request.Request(url, headers=UAdata)
try:
    response = request.urlopen(req)
    print(response.read().decode())
except error.HTTPError as e:
    if e.code == 404:
        print('page not found error')

Answer: B: The statement “The requests library is not used, so it doesn’t work” is incorrect because the urllib.request module is used in the code.
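
How urllib attaches the User-Agent header can be inspected without sending anything over the network. In this sketch the header value is a shortened, made-up string used only for illustration:

```python
from urllib import request

url = "http://pythonscraping.com/pages/page1.html"
headers = {"User-Agent": "Mozilla/5.0 (example)"}   # hypothetical UA string

req = request.Request(url, headers=headers)

# Request stores header names via str.capitalize(), hence 'User-agent'.
print(req.full_url)
print(req.get_header("User-agent"))
print(req.get_method())
```

Nothing is sent until request.urlopen(req) is called, which is the point where an HTTPError such as 404 can be raised.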

14. Which of the options is a possible output of the following code?

import requests
from bs4 import BeautifulSoup
import re

response = requests.get('http://www.pythonscraping.com/pages/page3.html')
soup = BeautifulSoup(response.text, 'lxml')

images = soup.find_all('img', {'src': re.compile(r'\.\./img/gifts/img.*\.jpg')})
for image in images:
    print(image['src'])

Answer: B: The output will be a list of image source URLs that match the given regular expression, ending with “.jpg”.
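
The filtering can be tried on its own with the re module. The src values below are made up for illustration and are not taken from the actual page:

```python
import re

pattern = re.compile(r'\.\./img/gifts/img.*\.jpg')

# Hypothetical src values, made up for illustration.
candidates = [
    '../img/gifts/img1.jpg',
    '../img/gifts/logo.jpg',
    '../img/gifts/img2.png',
    'img/gifts/img3.jpg',
]

matches = [src for src in candidates if pattern.search(src)]
print(matches)
```

Only paths that start with ../img/gifts/img and contain .jpg pass the filter. The escaped dots (\.) match literal periods rather than any character.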

15. Given the following page source:

<html>
<head>
    <title>demo</title>
</head>
<body>
    <h1>title</h1>
    <p>please click <a href="link">here</a></p>
</body>
</html>

Which is the correct description of the relationship between the <head> and <title> tags in the above page?
Answer: A: The <head> is the parent node of <title>.
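
The parent/child relationships can be confirmed with the standard library's html.parser. This is only a sketch: ParentTracker is a hypothetical helper that keeps a stack of open tags and assumes well-formed markup like the page above.

```python
from html.parser import HTMLParser

PAGE = """<html>
<head>
    <title>demo</title>
</head>
<body>
    <h1>title</h1>
    <p>please click <a href="link">here</a></p>
</body>
</html>"""

class ParentTracker(HTMLParser):
    """Record the parent tag of every element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # tags that are currently open
        self.parents = {}    # tag name -> parent tag name

    def handle_starttag(self, tag, attrs):
        self.parents[tag] = self.stack[-1] if self.stack else None
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

tracker = ParentTracker()
tracker.feed(PAGE)
print(tracker.parents)
```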

16. Select the correct option to fill in the missing content.

import requests
from requests import exceptions

def do_request(url, endpoint=None):
    res = None
    timeout = 5
    try:
        if endpoint:
            res = requests.get('/'.join([url, endpoint]), timeout=timeout)
        else:
            res = requests.get(url, timeout=timeout)
        res.____  # first blank
    except exceptions.Timeout as e:
        print('Connection timeout:', e)
    except exceptions.ConnectionError as e:
        print('Connection error:', e)
    except exceptions.____ as e:  # second blank
        print('HTTP error:', e)

    return res

if __name__ == '__main__':
    url = "http://pythonscraping.com/pages"
    endpoint = 'page11.html'
    response = do_request(url, endpoint)
    if response:
        print(response.text)
    else:
        print('Connection fail!')

Answer:

  • First blank (the call on res): res.raise_for_status()
  • Second blank (the exception class): HTTPError
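
With both blanks filled in, the function could look like the sketch below. It follows the question's structure; making timeout a keyword parameter and using the local name target are small illustrative changes, not part of the original.

```python
import requests
from requests import exceptions

def do_request(url, endpoint=None, timeout=5):
    """Fetch url (or url/endpoint); return the response, or None if no response arrived."""
    res = None
    try:
        target = '/'.join([url, endpoint]) if endpoint else url
        res = requests.get(target, timeout=timeout)
        res.raise_for_status()            # raises HTTPError for 4xx/5xx statuses
    except exceptions.Timeout as e:
        print('Connection timeout:', e)
    except exceptions.ConnectionError as e:
        print('Connection error:', e)
    except exceptions.HTTPError as e:
        print('HTTP error:', e)
    return res
```

When raise_for_status() raises, res has already been assigned, so the function still returns the response object. A requests Response is falsy when its status is an error, which is why the caller's "if response:" check falls through to the failure branch in that case.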

17. Please select the correct option to complete the missing section.

import re
import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

url = "http://www.pythonscraping.com"
response = requests.get(url)
response.encoding='utf-8'
soup = BeautifulSoup(response.text, 'lxml')
try:
    content = soup.find('div', {'style': re.compile('(background-image:.*)')})['style']
    imgLocation = re.findall(r"\('(http.*)'\)", content)[0]
    ____(imgLocation, imgLocation.split('/')[-1])  # blank to fill in
    print('Done')
except (AttributeError, TypeError, IndexError):
    print('not found in page')

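Answer: The blank is urlretrieve, which downloads the image at imgLocation and saves it under the file name taken from the last segment of the URL.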
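
The URL extraction step can be tried without any network access. The style string below is made up for illustration and assumes the URL is wrapped in ('...') as the regular expression expects:

```python
import re

# Hypothetical style attribute, made up for illustration.
content = "background-image: url('http://example.com/img/banner.jpg'); height: 200px"

# Capture the URL between (' and ').
img_location = re.findall(r"\('(http.*)'\)", content)[0]
file_name = img_location.split('/')[-1]

print(img_location)
print(file_name)
```

Calling urlretrieve(img_location, file_name) from urllib.request would then download the image and save it under that file name in the current directory.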

18. Which of the following pieces of code submits the data of the form below?

<form method="post" action="processing.php">
    First name: <input type="text" name="firstname"><br>
    Last name: <input type="text" name="lastname"><br>
    <input type="submit" value="Submit">
</form>

import requests

# Choose the correct way to submit the form data
params = {'firstname': 'Michael', 'lastname': 'Scofield'}
res = requests.post('https://pythonscraping.com/pages/files/processing.php', data=params)
print(res.text)
# Other options are incorrect as they use 'get' instead of 'post' or omit the 'data' parameter.

Answer: A: The correct way to submit the form data is using requests.post with the data parameter.
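
When a dict is passed as data, requests form-encodes it into the request body, just as a browser does for this form. The standard library's urlencode produces the same kind of encoding, so the body can be previewed with this sketch, which sends nothing:

```python
from urllib.parse import urlencode

params = {'firstname': 'Michael', 'lastname': 'Scofield'}

# Keys must match the name attributes of the form's input fields.
body = urlencode(params)
print(body)
```

By contrast, requests.get with params would append the same string to the URL as a query string instead of sending it in the body, which does not match a form declared with method="post".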

19. To run the crawler traincast conveniently from PyCharm, a launcher script is created. Which of the following options is the correct code?

from scrapy import cmdline

# Choose the correct way to execute the scrapy crawl command
cmdline.execute("scrapy crawl traincast".split(" "))

Answer: B: The correct way to execute the scrapy crawl command in a script is cmdline.execute("scrapy crawl traincast".split(" "))
