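Many readers do not have a MATLAB release recent enough to include the webread function (introduced in R2014b). The MathWorks File Exchange offers urlread2, an extended version of urlread written by Jim Hokanson on top of Java. The article below, "Expanding urlread capabilities", describes it.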
【Introduction】
HTTP is the underlying computer
networking protocol that enables us to read webpages on the
Internet. It consists of a request made by the user to an Internet
server (typically located via URL), and a response from that
server. Importantly, the request and response consist of three main
parts: a resource line (for requests) or status line (for
responses), followed by headers, and optionally a message
body.
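To make the three-part structure concrete, here is a minimal sketch that assembles the text of a GET request by hand. The host, path and header values are made-up placeholders, and nothing is sent over the network.

```matlab
% Part 1: resource (request) line. Path and host are illustrative only.
requestLine = 'GET /index.html HTTP/1.1';

% Part 2: headers, one "Name: value" pair per line.
headerLines = {'Host: www.example.com', 'Accept: text/html'};

% Part 3: optional message body (empty for a typical GET).
body = '';

% Each line ends with CRLF; an empty line separates headers from the body.
request = [sprintf('%s\r\n', requestLine, headerLines{:}) sprintf('\r\n') body];
disp(request)
```

A response has the same layout, except that the first line is a status line such as "HTTP/1.1 200 OK".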
Matlab’s built-in urlread
function enables Matlab users to easily read the server’s response
text into a Matlab string:
text = urlread('http://www.google.com');
This is done internally using Java code that connects to the specified URL and reads the information sent by the URL's server.
urlread accepts optional
additional inputs specifying the request type (‘get’ or ‘post’) and
parameter values for the request.
Unfortunately, urlread has the following limitations:
1. It does not allow specification of request headers.
2. It makes assumptions as to the request headers needed based on the input method.
3. It does not expose the response headers and status line.
4. It assumes the response body contains text, and not a binary payload.
5. It does not enable uploading binary contents to the server.
6. It does not enable specifying a timeout in case the server is not responding.
urlread2
The urlread2 function addresses all of these problems. The overall design decision for this function was to make it more general: it requires more up-front work in some cases, but offers more flexibility in return.
【Syntax】
For reference, the following is the calling format for urlread2 (which is reminiscent of urlread's):
urlread2(url,*method,*body,*headersIn,varargin)
The * marks optional inputs; they are positional, so their order must be maintained.
url – (string), url to request
method – (string, default GET) HTTP request method
body – (string, default ''), body of the request
headersIn – (structure, default []), see the following section
varargin – extra properties that need to be specified via property/value pairs
Addressing Problem 1 – Request header
urlread internally uses a Java
object called urlConnection that is generally an instance of the
class sun.net.www.protocol.http.HttpURLConnection. The method
setRequestProperty() can be used to set headers for the request.
This method has two inputs, the header name and the value of that
header. A simple example of this can be seen below:
urlConnection.setRequestProperty('Content-Type','application/x-www-form-urlencoded');
Here ‘Content-Type’ is the
header name and the second input is the value of that property. My
function requires passing in nearly all headers as a structure
array, with fields for the name and value. The preceding header
would be created using a helper function
http_createHeader.m:
header = http_createHeader('Content-Type','application/x-www-form-urlencoded');
Multiple headers can be passed
in to the function by concatenating header structures into a
structure array.
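As a rough illustration of the idea, the sketch below builds two headers, concatenates them into a structure array, and applies them to a connection object with setRequestProperty(). The makeHeader helper is a hypothetical stand-in for http_createHeader (it only assumes the name and value fields mentioned above), the Accept header and example.com URL are placeholders, and the connection is created directly with java.net.URL rather than through urlread2. No request is actually sent.

```matlab
% Hypothetical stand-in for http_createHeader: one struct per header,
% with 'name' and 'value' fields as described in the text.
makeHeader = @(name, value) struct('name', name, 'value', value);

contentType = makeHeader('Content-Type', 'application/x-www-form-urlencoded');
accept      = makeHeader('Accept', 'text/html');

% Multiple headers are passed as a structure array (here 1x2).
headers = [contentType, accept];

% Create a connection object; nothing is transmitted until the
% connection is actually used (e.g. by reading its input stream).
url = java.net.URL('http://www.example.com/');
urlConnection = url.openConnection();

% Apply every header in the array to the pending request.
for k = 1:numel(headers)
    urlConnection.setRequestProperty(headers(k).name, headers(k).value);
end

% Read one header back to confirm it was registered.
storedValue = char(urlConnection.getRequestProperty('Content-Type'));
disp(storedValue)
```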
Addressing Problem 2 – Request parameters
When making a POST request,
parameters are generally specified in the message body using the
following format:
[property]=[value]&[property]=[value]
The properties and values are
also encoded in a particular way, generally termed urlencoded
(encoding and decoding can be done using Matlab’s built-in
urlencode and urldecode functions). For GET requests this string is
appended to the url with the “?” symbol. Since urlencoding methods
can vary, and in the spirit of reducing assumptions, I use separate
functions to generate these strings outside of urlread2, and then
pass the result in either as the url (for GET) or as the body input
(for POST). As an example, I might search the Mathworks website
using the upper right search bar on its site for “undocumented
matlab” under file exchange (hmmm… pretty cute stuff there!). Doing
this performs a GET request with the following property/value
pairs:
params = {'search_submit','fileexchange', 'term','undocumented matlab', 'query','undocumented matlab'};
These property/value pairs are
somewhat obvious from looking at the URL, but could also be
determined by using programs such as Fiddler, Firebug, or
HttpWatch.
After urlencoding and
concatenating, we would form the following string:
search_submit=fileexchange&term=undocumented+matlab&query=undocumented+matlab
This functionality is normally
accomplished internally in urlread, but I use a function
http_paramsToString to produce that result. That function also
returns the required header for POST requests. The following is an
example of both GET and POST requests:
[queryString,header] = http_paramsToString(params,1);
% For GET:
url = [url '?' queryString];
urlread2(url)
% For POST:
urlread2(url,'POST',queryString,header)
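For readers who want to see what the encoding step involves, here is a minimal sketch that builds the same query string using only Matlab's built-in urlencode. It is not the code of http_paramsToString, just one plausible way to get an equivalent result for this example; the base URL is a placeholder.

```matlab
params = {'search_submit','fileexchange', 'term','undocumented matlab', ...
          'query','undocumented matlab'};

% Split the flat cell array into names (odd entries) and values (even entries).
names  = params(1:2:end);
values = params(2:2:end);

% urlencode each side and join them as name=value.
pairs = cell(1, numel(names));
for k = 1:numel(names)
    pairs{k} = [urlencode(names{k}) '=' urlencode(values{k})];
end

% Pairs are separated by '&'.
queryString = strjoin(pairs, '&');

% For a GET request the string is appended to the URL after '?'.
baseUrl = 'http://www.example.com/search';   % placeholder URL
getUrl  = [baseUrl '?' queryString];
disp(getUrl)
```

For a POST request the same queryString would be passed as the body instead, together with a Content-Type header of application/x-www-form-urlencoded.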
Addressing Problem 3 – Response header
According to the HTTP protocol,
each server response starts with a simple header that indicates a
numeric response status. The following Matlab code provides access
to the status line using the urlConnection object:
status = struct('value',urlConnection.getResponseCode(), 'msg',char(urlConnection.getResponseMessage))
status =
    value: 200
    msg: 'OK'
urlConnection's getHeaderField() and getHeaderFieldKey() methods enable reading the specific parts of the response header:
headerValue = char(urlConnection.getHeaderField(headerIndex));
headerName = char(urlConnection.getHeaderFieldKey(headerIndex));
headerIndex starts at 0 and
increases by 1 until both headerValue and headerName return
empty.
It is important to note that
header keys (names) can be repeated for different values. Sometimes
this is desired, such as if there are multiple cookies being sent
to the user. To generically handle this case, two header structures
are returned. In both cases the header names are the field names in
the structure, after replacing hyphens with underscores. In one
case, allHeaders, the values are cell arrays of strings containing
all values presented with the particular key. The other structure,
firstHeaders, contains only the first instance of the header as a
string to avoid needing to dereference a cell array.
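The grouping described above can be sketched as follows. The header names and values are made-up sample data standing in for what the getHeaderFieldKey()/getHeaderField() loop would return, and the code is an illustration of the idea rather than urlread2's actual implementation.

```matlab
% Sample response headers (illustrative data only); note the repeated key.
names  = {'Content-Type', 'Set-Cookie', 'Set-Cookie'};
values = {'text/html',    'a=1',        'b=2'};

allHeaders   = struct();
firstHeaders = struct();
for k = 1:numel(names)
    % Hyphens are not allowed in field names, so replace them.
    field = strrep(names{k}, '-', '_');
    if isfield(allHeaders, field)
        % Repeated key: append to the existing cell array.
        allHeaders.(field){end+1} = values{k};
    else
        allHeaders.(field)   = values(k);   % 1x1 cell holding the first value
        firstHeaders.(field) = values{k};   % plain string, no cell to unwrap
    end
end

disp(allHeaders.Set_Cookie)     % both cookie values, as a cell array
disp(firstHeaders.Set_Cookie)   % only the first cookie value
```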
Addressing Problem 4 – Response body
urlread assumes text output.
This is fine for most webpages, which use HTML and are therefore
text-based. However, urlread fails when trying to download any
non-text resource such as an image, a ZIP file, or a PDF document.
I have added a flag in urlread2 called CAST_OUTPUT, which defaults
to true, i.e. text response, just as urlread assumes. Using
varargin, this flag can be set to false ({'CAST_OUTPUT',false}) to
indicate a binary response.
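The difference between the two modes comes down to whether the received bytes are converted to characters. The sketch below shows that distinction on a made-up byte sequence; the UTF-8 encoding and the variable names are assumptions for illustration and do not reproduce urlread2's internals.

```matlab
% Bytes as they might arrive from the server (made-up payload: 'Hello').
rawBytes = uint8([72 101 108 108 111]);

% Text mode (the default behaviour described above): decode to a string.
textOutput = native2unicode(rawBytes, 'UTF-8');

% Binary mode: keep the bytes untouched, e.g. for an image, ZIP or PDF.
binaryOutput = rawBytes;

disp(textOutput)
disp(class(binaryOutput))
```

Binary output of this kind can then be written to disk unchanged with fopen/fwrite.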
Summary
urlread2's functionality has
been expanded to also address other limitations of urlread: It
enables binary inputs, better character-set handling of the output,
redirection following, and read timeouts.
The modifications described
above provide direct access to the key components of the HTTP
request and response messages. Its more generic nature lets
urlread2 focus on HTTP transmission, and leaves request formation
and response interpretation up to the user. I think ultimately this
approach is better than providing one-off modifications of the
original urlread function to suit a particular need. urlread2 and
supporting files can be found on the Matlab File
Exchange.