Hand-Rolling HTTP File Upload in C
Background
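Here is the story: we are building an electronic student ID product.
On top of it we want to develop an embedded AI application with spoken-language conversation practice, voice-to-image generation, and story creation.
The HTTP library that ships with this embedded solution is crude and incomplete, and many features are simply not implemented. File upload is one of them.
In the end our only option was to open a socket to port 80 on the HTTP server and speak the HTTP protocol ourselves to get file upload, speech recognition, and the other features working.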
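As a minimal sketch of what "speaking HTTP ourselves" means, the function below formats the request line and headers of an upload request into a buffer. The name `build_upload_header`, the `Connection: close` choice, and every parameter are assumptions made for illustration, not the code from the actual project. Opening the socket and sending the bytes is left out because the socket API differs between embedded SDKs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the request line and headers of a multipart/form-data POST.
 * body_len must be the exact byte length of the body that follows the
 * blank line. Returns the header length, or -1 if buf is too small.
 * (Hypothetical helper; names are illustrative.)
 */
int build_upload_header(char *buf, size_t cap, const char *host,
                        const char *path, const char *boundary,
                        size_t body_len)
{
    assert(buf != NULL && host != NULL && path != NULL && boundary != NULL);

    /* %lu with a cast avoids relying on %zu, which old C libraries lack. */
    int n = snprintf(buf, cap,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: multipart/form-data; boundary=%s\r\n"
                     "Content-Length: %lu\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host, boundary, (unsigned long)body_len);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

The caller would write the returned number of bytes to the connected socket, followed by exactly `body_len` bytes of body. A wrong `Content-Length` is the kind of mistake that is easiest to spot with a packet sniffer.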
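Luckily I am a programmer born in the 1980s, and I did plenty of this kind of work back in the day. I have done it in PHP and in Perl, and I have hand-rolled SMTP as well as NNTP (the newsgroup protocol, which people born in the 90s and 00s have probably never seen).
But I had not touched C in more than 20 years. Back then I learned it from Tan Haoqiang's textbook using Turbo C 2.0, a product of the DOS 6.22 era.
I had also never done any embedded development, and I needed to build a demo for a customer on the device I had been handed, an electronic student ID.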
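From getting my hands on the device to finishing the job and demoing it to the customer took me 2 weeks. What made it that fast was mostly mindset. This is my own startup project, so I could start working the moment I opened my eyes in the morning and keep going until the small hours, holding off bathroom trips until my bladder was about to burst. When you work for yourself the motivation is different. What I build turns directly into income, so I worked flat out and did not even feel tired.
An employee who worked like this would burn out, and would not have the drive to pick up an unfamiliar technology this quickly: you cannot see how it pays off for you, so you instinctively resist it. So as an employee you might not manage it in 2 weeks. Heh.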
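The first week went mostly into getting familiar with the hardware and the development environment, and into remembering how to write C. That left roughly 1 week for the actual features. What I built is an AI product: speech recognition, voice-to-image generation, and speech synthesis. The technologies involved were the LVGL graphics library (which I had never used before) and PCM and AMR encoding/decoding. And because the HTTP library provided by ASR is incomplete, I also had to hand-roll HTTP over a raw socket, which took a long time to debug with a packet sniffer.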
RFC 1867
Understanding how a file gets uploaded is simple: just read the RFC 1867 document.
Network Working Group E. Nebel
Request For Comments: 1867 L. Masinter
Category: Experimental Xerox Corporation
November 1995
Form-based File Upload in HTML
Status of this Memo
This memo defines an Experimental Protocol for the Internet
community. This memo does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.
1. Abstract
Currently, HTML forms allow the producer of the form to request
information from the user reading the form. These forms have proven
useful in a wide variety of applications in which input from the user
is necessary. However, this capability is limited because HTML forms
don't provide a way to ask the user to submit files of data. Service
providers who need to get files from the user have had to implement
custom user applications. (Examples of these custom browsers have
appeared on the www-talk mailing list.) Since file-upload is a
feature that will benefit many applications, this proposes an
extension to HTML to allow information providers to express file
upload requests uniformly, and a MIME compatible representation for
file upload responses. This also includes a description of a
backward compatibility strategy that allows new servers to interact
with the current HTML user agents.
The proposal is independent of which version of HTML it becomes a
part.
2. HTML forms with file submission
The current HTML specification defines eight possible values for the
attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
PASSWORD, RADIO, RESET, SUBMIT, TEXT.
In addition, it defines the default ENCTYPE attribute of the FORM
element using the POST METHOD to have the default value
"application/x-www-form-urlencoded".
Nebel & Masinter Experimental [Page 1]
RFC 1867 Form-based File Upload in HTML November 1995
This proposal makes two changes to HTML:
1) Add a FILE option for the TYPE attribute of INPUT.
2) Allow an ACCEPT attribute for INPUT tag, which is a list of
media types or type patterns allowed for the input.
In addition, it defines a new MIME media type, multipart/form-data,
and specifies the behavior of HTML user agents when interpreting a
form with ENCTYPE="multipart/form-data" and/or <INPUT type="file">
tags.
These changes might be considered independently, but are all
necessary for reasonable file upload.
The author of an HTML form who wants to request one or more files
from a user would write (for example):
<FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
File to process: <INPUT NAME="userfile1" TYPE="file">
<INPUT TYPE="submit" VALUE="Send File">
</FORM>
The change to the HTML DTD is to add one item to the entity
"InputType". In addition, it is proposed that the INPUT tag have an
ACCEPT attribute, which is a list of comma-separated media types.
... (other elements) ...
<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
RADIO | SUBMIT | RESET |
IMAGE | HIDDEN | FILE )">
<!ELEMENT INPUT - 0 EMPTY>
<!ATTLIST INPUT
TYPE %InputType TEXT
NAME CDATA #IMPLIED -- required for all but submit and reset
VALUE CDATA #IMPLIED
SRC %URI #IMPLIED -- for image inputs --
CHECKED (CHECKED) #IMPLIED
SIZE CDATA #IMPLIED --like NUMBERS,
but delimited with comma, not space
MAXLENGTH NUMBER #IMPLIED
ALIGN (top|middle|bottom) #IMPLIED
ACCEPT CDATA #IMPLIED --list of content types
>
... (other elements) ...
3. Suggested implementation
While user agents that interpret HTML have wide leeway to choose the
most appropriate mechanism for their context, this section suggests
how one class of user agent, WWW browsers, might implement file
upload.
3.1 Display of FILE widget
When a INPUT tag of type FILE is encountered, the browser might show
a display of (previously selected) file names, and a "Browse" button
or selection method. Selecting the "Browse" button would cause the
browser to enter into a file selection mode appropriate for the
platform. Window-based browsers might pop up a file selection window,
for example. In such a file selection dialog, the user would have the
option of replacing a current selection, adding a new file selection,
etc. Browser implementors might choose let the list of file names be
manually edited.
If an ACCEPT attribute is present, the browser might constrain the
file patterns prompted for to match those with the corresponding
appropriate file extensions for the platform.
3.2 Action on submit
When the user completes the form, and selects the SUBMIT element, the
browser should send the form data and the content of the selected
files. The encoding type application/x-www-form-urlencoded is
inefficient for sending large quantities of binary data or text
containing non-ASCII characters. Thus, a new media type,
multipart/form-data, is proposed as a way of efficiently sending the
values associated with a filled-out form from client to server.
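To put a number on "inefficient", here is a small sketch that computes how many bytes a buffer would occupy once encoded as application/x-www-form-urlencoded. The function name `urlencoded_len` is made up for this example, and the set of characters left unescaped (letters, digits, `*`, `-`, `.`, `_`, with space sent as `+`) is an assumption that follows common browser behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed to send data as application/x-www-form-urlencoded.
 * Letters, digits, '*', '-', '.', '_' and space (sent as '+') cost 1 byte;
 * every other byte becomes %XX and costs 3.
 * (Hypothetical helper for illustration.)
 */
size_t urlencoded_len(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[i];
        int plain = (c >= '0' && c <= '9') ||
                    (c >= 'A' && c <= 'Z') ||
                    (c >= 'a' && c <= 'z') ||
                    c == '*' || c == '-' || c == '.' || c == '_' ||
                    c == ' ';
        out += plain ? 1 : 3;
    }
    return out;
}
```

Plain ASCII text costs nothing extra, but audio data such as PCM or AMR is mostly bytes outside the unescaped set, so it can grow to nearly three times its size. On a small embedded device that is both extra memory and extra air time, which is why multipart/form-data sends the bytes as they are.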
3.3 use of multipart/form-data
The definition of multipart/form-data is included in section 7. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilisticly.) Each field of the form
is sent, in the order in which it occurs in the form, as a part of
the multipart stream. Each part identifies the INPUT name within the
original HTML form. Each part should be labelled with an appropriate
content-type if the media type is known (e.g., inferred from the file
extension or operating system typing information) or as
application/octet-stream.
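The paragraph above can be sketched in C for the single-file case. `boundary_in_data` checks that the chosen boundary does not occur in the data, and `build_file_body` lays out one part followed by the closing delimiter. Both function names, the buffer-based interface, and the choice to return 0 on any failure are assumptions for illustration. A real device with little RAM would more likely stream the file to the socket in chunks than build the whole body in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if boundary occurs anywhere in data, else 0. */
int boundary_in_data(const unsigned char *data, size_t len,
                     const char *boundary)
{
    size_t blen = strlen(boundary);
    if (blen == 0 || blen > len)
        return 0;
    for (size_t i = 0; i + blen <= len; i++)
        if (memcmp(data + i, boundary, blen) == 0)
            return 1;
    return 0;
}

/*
 * Write a one-file multipart/form-data body into out:
 *   --boundary CRLF part-headers CRLF CRLF file-bytes CRLF --boundary-- CRLF
 * Returns the body length, or 0 if out is too small or the boundary
 * collides with the file data. (Hypothetical helper.)
 */
size_t build_file_body(unsigned char *out, size_t cap,
                       const char *boundary, const char *field,
                       const char *filename, const char *content_type,
                       const unsigned char *data, size_t len)
{
    assert(out && boundary && field && filename && content_type);
    assert(data != NULL || len == 0);

    if (boundary_in_data(data, len, boundary))
        return 0;

    /* Part headers: field name, original file name, media type. */
    int head = snprintf((char *)out, cap,
        "--%s\r\n"
        "Content-Disposition: form-data; name=\"%s\"; filename=\"%s\"\r\n"
        "Content-Type: %s\r\n"
        "\r\n",
        boundary, field, filename, content_type);
    if (head < 0 || (size_t)head >= cap)
        return 0;

    size_t blen = strlen(boundary);
    size_t pos = (size_t)head;
    if (pos + len + blen + 8 > cap)   /* 8 = CRLF "--" ... "--" CRLF */
        return 0;

    /* File bytes are copied as-is; binary data needs no escaping. */
    if (len > 0)
        memcpy(out + pos, data, len);
    pos += len;

    /* Closing delimiter. */
    memcpy(out + pos, "\r\n--", 4);   pos += 4;
    memcpy(out + pos, boundary, blen); pos += blen;
    memcpy(out + pos, "--\r\n", 4);   pos += 4;
    return pos;
}
```

The returned length is the value that belongs in the request's Content-Length header. If the function returns 0 because of a collision, the caller can pick another boundary and try again.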
If multiple files are selected, they should be transferred together
using the multipart/mixed format.
While the HTTP protocol can transport arbitrary BINARY data, the
default for mail transport (e.g., if the ACTION is a "mailto:" URL)
is the 7BIT encoding. The value supplied for a part may need to be
encoded and the "content-transfer-encoding" header supplied if the
value does not conform to the default encoding. [See section 5 of
RFC 1521 for more details.]
The original local file name may be supplied as well, either as a
'filename' parameter either of the 'content-disposition: form-data'
header or in the case of multiple files in a 'content-disposition:
file'