Introduction
For anyone who collects data, scraping websites is an indispensable skill, and one of the most basic and widely used tools for it is the requests library. GET and POST are the two request methods used most often with requests. Today, we are going to delve into POST requests in the requests library and provide some code examples.
What is a POST request?
When scraping web data with the requests library, we usually start with GET requests, for example fetching a URL directly to retrieve the page source for further processing. Sometimes this is not enough, because the site requires a login or expects form data to be submitted. In such cases, we need POST requests. A POST request submits data to the server in the request body, which is why it is used for things like usernames and passwords. Generally, GET requests are used to retrieve data, while POST requests are used to submit data.
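To make the difference concrete, the following sketch builds one GET and one POST request without sending them, using requests.Request(...).prepare(), and shows where the data ends up. The URL and field names are only illustrative.

```python
import requests

url = "http://httpbin.org/anything"  # illustrative endpoint; nothing is sent
fields = {"name": "Tom", "age": 20}

# GET: the data becomes part of the URL as a query string.
get_req = requests.Request("GET", url, params=fields).prepare()
# POST: the data travels in the request body instead.
post_req = requests.Request("POST", url, data=fields).prepare()

print(get_req.url)    # data is visible in the URL
print(get_req.body)   # a GET built this way has no body
print(post_req.url)   # URL is unchanged
print(post_req.body)  # form-encoded data
```

Because POST data is not part of the URL, it does not show up in places where URLs are typically recorded, such as browser history.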
Implementation of POST requests
Similar to GET requests, using the requests library to initiate POST requests is also very simple. It only takes a few lines of code.
import requests
url = "http://httpbin.org/post"
data = {
    "name": "Tom",
    "age": 20,
}
response = requests.post(url, data=data)
print(response.text)
In the above code, we first import the requests library and specify the URL of the POST request. Then we define data, a dictionary holding the fields to submit. Finally, we call the post function of the requests library, passing in the URL and the data, store the server’s response in response, and print its body with response.text.
Common parameters and invocation methods of POST requests
The requests.post function accepts several parameters. Only url is required; the others are optional keyword arguments that control what is sent and how. Let’s introduce the most common ones one by one.
url
This parameter is easy to understand: it is the target URL we are accessing.
data
The data parameter carries the data to be submitted in the body of the POST request. It accepts a dictionary, a list of tuples, bytes, or a file-like object. A dictionary or a list of tuples is form-encoded before sending; a list of tuples is useful when the same field name appears more than once.
For example, if there is a simple form with two data fields, namely “name” and “age”:
<form method="post" action="http://httpbin.org/post">
<input type="text" name="name" value="" placeholder="Please enter your name">
<input type="number" name="age" value="" placeholder="Please enter your age">
<button type="submit">Submit</button>
</form>
Then, we can use the following code to simulate submitting this form and use response to get the server’s response:
import requests
url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
response = requests.post(url, data=data)
print(response.text)
The response body printed here is the JSON reply from httpbin, which echoes the submitted form data under the form key.
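The other types accepted by data can be checked without sending anything. The sketch below only prepares the requests and shows a list of tuples with a repeated field name, followed by a pre-encoded byte string. The tag field is a made-up example.

```python
import requests

url = "http://httpbin.org/post"

# A list of tuples keeps repeated keys, which a dict cannot express.
pairs = [("name", "Tom"), ("tag", "a"), ("tag", "b")]
from_pairs = requests.Request("POST", url, data=pairs).prepare()
print(from_pairs.body)

# Bytes are sent as-is, and requests sets no Content-Type for them,
# so declare one yourself if the server needs it.
raw = requests.Request("POST", url, data=b"name=Tom&age=20").prepare()
print(raw.body)
print(raw.headers.get("Content-Type"))
```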
json
If we need to submit JSON data to the server, we use the json parameter. requests serializes the object to JSON and sets the Content-Type header to application/json for us.
For example, if we have JSON data:
{
"name": "Tom",
"age": 20
}
We can pass JSON data using the following code and use response to get the server’s response:
import requests
url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
response = requests.post(url, json=data)
print(response)
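The practical difference between data and json is the encoding of the body and the Content-Type header that goes with it. As a minimal sketch, we can prepare the same dictionary both ways, without sending it, and compare the results.

```python
import json
import requests

url = "http://httpbin.org/post"
payload = {"name": "Tom", "age": 20}

as_form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

print(as_form.headers["Content-Type"])  # form encoding
print(as_json.headers["Content-Type"])  # JSON encoding
print(as_json.body)                     # the serialized JSON document

# Round trip: the JSON body decodes back to the original dictionary.
decoded = json.loads(as_json.body)
print(decoded == payload)
```

A server that expects JSON will usually reject a form-encoded body, so the choice between the two parameters should follow what the target endpoint documents.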
headers
The headers parameter is used to pass request header information, such as browser information and data encoding information. The request headers tell the server how to parse the transmitted data correctly.
An example of headers:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
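A headers dictionary like this is passed to post through the headers keyword argument. The sketch below prepares such a request without sending it, so we can inspect the result. The User-Agent value is a hypothetical client name, not a recommendation.

```python
import requests

url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
headers = {
    "User-Agent": "my-scraper/0.1",  # hypothetical client name
    "Accept": "application/json",
}

# Equivalent call that would actually send the request:
#   response = requests.post(url, data=data, headers=headers)
prepared = requests.Request("POST", url, data=data, headers=headers).prepare()

# Header lookups in requests are case-insensitive.
print(prepared.headers["user-agent"])
# Content-Type was not supplied, so requests filled it in from the data.
print(prepared.headers["Content-Type"])
```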
params
The params parameter passes query-string parameters, such as search keywords, which are appended to the URL. It is most familiar from the get method, but it works with post as well and can be combined with data.
For example:
import requests
url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
params = {"search": "python"}
response = requests.post(url, data=data, params=params)
print(response)
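To see how params and data are kept apart, we can prepare the request without sending it and look at the final URL and body. This is a small sketch; the page parameter is an illustrative addition.

```python
import requests

url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
params = {"search": "python", "page": 2}  # "page" is an illustrative extra

prepared = requests.Request("POST", url, data=data, params=params).prepare()

print(prepared.url)   # query string appended to the URL
print(prepared.body)  # form data stays in the body
```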
auth
If the URL being accessed requires authentication such as a username and password, we can pass them with the auth parameter. A (username, password) tuple is sent as HTTP Basic authentication, as shown in the following code:
import requests
url = "http://httpbin.org/post"
payload = {"name": "may", "age": 25}
auth = ("username", "password")
response = requests.post(url, data=payload, auth=auth)
print(response)
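Under the hood, a (username, password) tuple becomes an Authorization header. The following sketch prepares the request without sending it and decodes that header again, which shows that Basic authentication only encodes the credentials and does not encrypt them. The credentials are placeholders.

```python
import base64
import requests

url = "http://httpbin.org/post"
payload = {"name": "may", "age": 25}
auth = ("username", "password")  # placeholder credentials

prepared = requests.Request("POST", url, data=payload, auth=auth).prepare()
auth_header = prepared.headers["Authorization"]
print(auth_header)

# The value is "Basic " followed by base64("username:password").
scheme, token = auth_header.split(" ", 1)
decoded = base64.b64decode(token).decode("utf-8")
print(decoded)
```

Since anyone who can read the request can recover the password this way, Basic authentication should only be used over HTTPS.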
timeout
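The timeout parameter sets how long, in seconds, to wait for the server before giving up. If no timeout is set, requests does not time out on its own, so it is worth setting one in scraping code.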
For example:
import requests
url = "http://httpbin.org/post"
payload = {"name": "may", "age": 25}
timeout = 5
response = requests.post(url, data=payload, timeout=timeout)
print(response)
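When the timeout expires, requests raises requests.exceptions.Timeout, which we normally want to catch. The sketch below demonstrates this against a local socket that never answers, standing in for an unresponsive server. The helper post_or_none is a hypothetical name, and the Session with trust_env disabled is used only so that proxy settings from the environment do not interfere with the local demo.

```python
import socket
import requests

# Stand-in for an unresponsive server: the OS accepts the connection,
# but nothing ever reads the request or sends a reply.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
url = f"http://127.0.0.1:{silent.getsockname()[1]}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo


def post_or_none(payload, timeout):
    """Return the response, or None if no reply arrived within `timeout` seconds."""
    try:
        return session.post(url, data=payload, timeout=timeout)
    except requests.exceptions.Timeout:
        return None


result = post_or_none({"name": "may", "age": 25}, timeout=0.5)
print(result)
silent.close()
```

In a real scraper, the except branch is where a retry or a log message would go.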
Response information of POST requests
After initiating a POST request with the requests library, we need to process the server’s response. This mainly includes:
- Accessing response status code
- Accessing response content
- Accessing response header information
Here is a simple example:
import requests
url = 'http://httpbin.org/post'
data = {"name": "Tom", "age": 20}
response = requests.post(url, data=data)
# Accessing status code
print(response.status_code)
# Accessing and outputting response content
print(response.text)
# Outputting response headers
print(response.headers)
By executing the above code, we will receive the server’s response, which will include the status code, response content, and response headers.
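In practice, we usually check for errors and parse the body instead of only printing it. The sketch below uses a small local server as a hypothetical stand-in for httpbin, so that it runs without network access; EchoHandler is an assumption made for this demo and is not part of requests. The calls raise_for_status() and json() are the relevant parts.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin: echoes form fields as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        body = json.dumps({"form": {k: v[0] for k, v in form.items()}}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for localhost

response = session.post(
    f"http://127.0.0.1:{server.server_port}/post",
    data={"name": "Tom", "age": 20},
    timeout=5,
)
response.raise_for_status()              # raises requests.HTTPError on 4xx/5xx
print(response.status_code)              # numeric status code
print(response.headers["content-type"])  # header lookup is case-insensitive
print(response.json()["form"])           # JSON body parsed into a dict
server.shutdown()
```

Form values arrive as strings, so the age comes back as "20" and not as the integer that was submitted.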
Common scenarios of POST requests
In real projects, POST requests come up frequently. Let’s introduce some common scenarios for using them.
Data submission
For some scenarios that require submitting data to the server, such as form submissions, comment submissions, etc., POST requests are often used. For example, if there is a form data submission scenario on our webpage, we need to submit the form data to the server for processing and data saving.
import requests
url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
response = requests.post(url, data=data)
print(response)
Data updating
In back-end administration systems, we often need to update data, and we can use POST requests to send the updated data to the server.
import requests
url = "http://httpbin.org/post"
data = {"name": "Tom", "age": 20}
response = requests.post(url, data=data)
print(response)
File uploading
In many web applications, uploading files is a necessary function, and the requests library supports it. requests does not fill in an HTML form for us; instead, we pass the open file through the files parameter, and requests encodes it as a multipart/form-data body, which is the same format a browser uses for a file upload form.
import requests
url = "http://httpbin.org/post"
# Open the file in binary mode; the with block closes it after the upload.
with open("example.txt", "rb") as f:
    response = requests.post(url, files={"file": f})
print(response)
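The files parameter also accepts a (filename, file object, content type) tuple, which lets us control the metadata of the uploaded part. The sketch below prepares such an upload without sending it; an in-memory file stands in for open("example.txt", "rb"), and the description field is a hypothetical extra form field.

```python
import io
import requests

url = "http://httpbin.org/post"

# An in-memory file stands in for a real file opened in binary mode.
fake_file = io.BytesIO(b"hello from example.txt\n")

# (filename, file object, content type) sets the metadata of the file part.
files = {"file": ("example.txt", fake_file, "text/plain")}
data = {"description": "demo upload"}  # hypothetical extra form field

prepared = requests.Request("POST", url, files=files, data=data).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data with a boundary
print(prepared.body.decode("utf-8"))     # the encoded parts, one per field
```

Ordinary form fields passed through data are sent as additional parts of the same multipart body.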
Web API requests
In many scenarios, we need to interact with Web APIs to obtain the data we need, for example fetching trending news or searching for images. APIs that take their parameters in the request body can be called with POST requests.
import requests
url = "http://httpbin.org/post"
data = {"query": "python learning", "page": 1, "size": 10}
response = requests.post(url, data=data)
print(response)
Conclusion
In this article, we have provided a detailed introduction to using the requests library to implement POST requests. We have explained the parameters of POST requests and provided code examples to illustrate their application in common scenarios such as data submission, data updating, file uploading, and Web API requests.
When sending a POST request, the requester needs to specify three main components: URL, request headers, and request body. URL represents the address to be requested; request headers contain metadata to be sent, such as body length, data type, etc.; request body is the specific data we want to send, such as forms, and these data will be sent to the server.
With the requests library, specifying the URL and request body is usually sufficient, as headers such as Content-Type and Content-Length are added automatically as needed. When using POST requests, pay special attention to the format of the request body, which must be encoded the way the server expects, for example as JSON or as URL-encoded form data.
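The three components can be inspected on a prepared request. In the sketch below, the body is encoded by hand, so the matching Content-Type is declared by hand as well, while Content-Length is still computed by requests. Nothing is sent.

```python
import json
import requests

url = "http://httpbin.org/post"
record = {"name": "Tom", "age": 20}

# Body encoded by hand, so the matching Content-Type is declared by hand too.
body = json.dumps(record)
prepared = requests.Request(
    "POST", url, data=body, headers={"Content-Type": "application/json"}
).prepare()

print(prepared.method)                     # request method
print(prepared.url)                        # target address
print(prepared.headers["Content-Type"])    # set explicitly above
print(prepared.headers["Content-Length"])  # computed by requests from the body
print(prepared.body)                       # the data that would be sent
```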
In conclusion, the POST request method provided by the requests library is very practical for data scraping, as it can meet the needs of various request scenarios. Most importantly, mastering the usage of POST requests enables us to efficiently obtain the data we need, laying a solid foundation for deeper data analysis and exploration.