A Beginner’s Guide to LibCurl

zmylamyer

Background/Introduction:

LibCurl is an open-source file transfer library that provides developers with a simple yet robust interface for all common transfer-related needs. While intended primarily for use with C/C++, bindings are available for most languages. Furthermore, LibCurl runs on almost every common platform, making it one of the most versatile libraries of its kind.
In this article, we’re going to look at using LibCurl to make simple HTTP requests with its built-in cookie processor to programmatically log in to websites and download web pages. We’re going to be using C for this, as it’s LibCurl’s native language. Afterwards, we’ll look at how to use this from a practical perspective, including how to use LibCurl to solve several HTS Programming Challenges.

Setup / Installation:

The download page for LibCurl can be found here. I won’t go into too much detail (or any, really) about installation simply because the process is fairly straightforward. The developers even provide an intuitive download wizard at the link provided. The installation guide, if necessary, is well-written and can be found here. When in doubt, ask Google. If all else fails, ask your fellow HTS members.
You will, of course, need to link to the LibCurl libraries when building your programs. You may also have to configure your compiler/linker to look in the appropriate directories for the LibCurl include files. Once again, the process is usually simple, but as all compilers are different, you may have to seek external help in getting things set up if you’re new to this. Once LibCurl is set up, you’re ready to start building your own file transfer programs.

Retrieving a webpage:

Hereafter, I will refer to LibCurl simply as LC, for convenience.

All of the code below has been tested and compiled with Code::Blocks on Windows Vista.

First we’re going to be using LC to retrieve a webpage using the following code snippets. After that, we’ll look at using cookies and HTTP POST data.

First, our main LC include file.

CODE :

#include <curl/curl.h>

Before making any calls to LC functions, you must initialize LC globally. For our purposes, we can do this with one simple line of code:

CODE :

curl_global_init( CURL_GLOBAL_ALL );

This call will initialize all known LC sub-modules. After this, we’re ready to create a session handle. For this, we’re going to use LC’s “easy interface.”

CODE :

CURL *myHandle;
CURLcode result; // We'll store the result of CURL's webpage retrieval, for simple error checking.

myHandle = curl_easy_init(); // Notice the lack of major error checking, for brevity

With our new handle, we can now start transferring files. To do this, we must tell LC how to operate: What URL to connect to, amongst other things. LC operates as a state machine, much like OpenGL. This means that once we send parameters to LC telling it how to operate, those settings will stay in effect until we explicitly change them. This can be great for avoiding redundancy between retrievals.

CODE :

curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.example.com");
result = curl_easy_perform(myHandle);
curl_easy_cleanup(myHandle);

That’s all there is to retrieving a webpage with LC. Yes, really. The following C program, made from the code snippets above, will output the HTML from example.com to a console.

CODE :

#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

int main()
{
    curl_global_init(CURL_GLOBAL_ALL);

    CURL *myHandle;
    CURLcode result; // We'll store the result of CURL's webpage retrieval, for simple error checking.

    myHandle = curl_easy_init(); // Notice the lack of major error checking, for brevity

    curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.example.com");
    result = curl_easy_perform(myHandle); // By default, LC writes the page to stdout
    curl_easy_cleanup(myHandle);
    curl_global_cleanup(); // Release what curl_global_init() set up

    printf("LibCurl rules!\n");
    return 0;
}

But what if you don’t want to output to the console? What if you want to output to, say, a file? Easy. First we define a function that accepts LC’s output into a C-style struct, then pass a pointer to that function to LC, and finally send the contents of the struct to a file. I won’t go into much detail here; the source should speak for itself. The following code will download the HTML from example.com and save it to a file, example.html. There’s not much error checking, in order to save space. If you have any questions not related to LibCurl, look around here.

CODE :

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h> // Also pulls in curl/easy.h; curl/types.h no longer exists in current LC releases

// Define our struct for accepting LC's output
struct BufferStruct
{
    char *buffer;
    size_t size;
};

// This is the function we pass to LC, which writes the output to a BufferStruct
static size_t WriteMemoryCallback(void *ptr, size_t size, size_t nmemb, void *data)
{
    size_t realsize = size * nmemb;
    struct BufferStruct *mem = (struct BufferStruct *) data;

    char *grown = realloc(mem->buffer, mem->size + realsize + 1);
    if (!grown)
        return 0; // Out of memory: returning less than realsize makes LC abort the transfer

    mem->buffer = grown;
    memcpy(&(mem->buffer[mem->size]), ptr, realsize);
    mem->size += realsize;
    mem->buffer[mem->size] = 0;
    return realsize;
}

int main()
{
    curl_global_init(CURL_GLOBAL_ALL);

    CURL *myHandle;
    CURLcode result; // We'll store the result of CURL's webpage retrieval, for simple error checking.
    struct BufferStruct output; // Create an instance of our BufferStruct to accept LC's output
    output.buffer = NULL;
    output.size = 0;

    myHandle = curl_easy_init(); /* Notice the lack of major error checking, for brevity */
    curl_easy_setopt(myHandle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback); // Passing the function pointer to LC
    curl_easy_setopt(myHandle, CURLOPT_WRITEDATA, (void *)&output); // Passing our BufferStruct to LC
    curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.example.com");
    result = curl_easy_perform(myHandle);
    curl_easy_cleanup(myHandle);

    if (result == CURLE_OK && output.buffer)
    {
        FILE *fp = fopen("example.html", "w");
        if (!fp)
        {
            free(output.buffer);
            return 1;
        }
        fputs(output.buffer, fp); // fputs, not fprintf: the page may contain '%' characters
        fclose(fp);
    }

    free(output.buffer);
    output.buffer = NULL;
    output.size = 0;

    curl_global_cleanup();
    printf("LibCurl rules!\n");
    return 0;
}

Retrieving a webpage (With cookies!):

The example above is cool if you’re just starting out, but it’s fairly useless. You’ll notice that if you try to retrieve, say, a Facebook profile, LC will just retrieve the Facebook login page. The reason? Cookies. When you log in to a website, the remote server will send you a cookie, and expect the browser to resend that cookie for later HTTP requests. No cookie, no webpage. At least, not the one you were expecting. If you’re new to cookies, it’s imperative to at least read over the CookieCentral FAQ. I also highly suggest reading over how HTTP works.

FACT: Reading doesn’t make you a hacker, but you won’t become a hacker without reading almost every technical guide, FAQ, and manual you can find.

Now, we’re going to look at some more LC commands to send HTTP POST data and then store a remote server’s cookies. This will allow us to simulate filling out an HTML form in your web browser and then log in to a website of choice. First, we’re going to add a line to define our user agent, since some websites reject HTTP requests without a proper user agent. We’ll also tell LC to follow redirects, as many login pages redirect users to a home screen. Remember, we only need to do this once. LC will keep these settings in effect unless we change them.

CODE :

curl_easy_setopt(myHandle, CURLOPT_USERAGENT, "Mozilla/4.0");
curl_easy_setopt(myHandle, CURLOPT_AUTOREFERER, 1L); // Fill in the Referer header automatically when following a redirect
curl_easy_setopt(myHandle, CURLOPT_FOLLOWLOCATION, 1L); // Numeric options must be passed as long, hence 1L

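Next, we’ll enable LC’s automatic processing of cookies. With this line, we won’t have to manually work with the cookies ourselves. Passing an empty string as the file name turns the cookie engine on without reading any cookies from disk.

CODE :

curl_easy_setopt(myHandle, CURLOPT_COOKIEFILE, "");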

Finally, we have to send the appropriate HTTP POST data, which your browser usually does behind the scenes after you fill out an HTML form. The format for this is a C-style string of name=value pairs joined by ampersands, as follows:

CODE :

const char *submit_this_please = "name_of_form_input=some_value&name_of_other_input=other_value";

The following code will submit the appropriate data to log in to HackThisSite.

CODE :

const char *data = "username=your_username_here&password=your_password_here";
curl_easy_setopt(myHandle, CURLOPT_POSTFIELDS, data);

The following is a complete, yet brief, example that will show you how to use LC to log in to HackThisSite. Obviously, you will have to modify the hard-coded username and password to log in to your own account. To save space, the output will be sent simply to the console. Additionally, I have not included any real error checking.

CODE :

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h> // Also pulls in curl/easy.h; curl/types.h no longer exists in current LC releases

int main()
{
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *myHandle = curl_easy_init();

    // Set up a couple of initial parameters that we will not need to modify later.
    curl_easy_setopt(myHandle, CURLOPT_USERAGENT, "Mozilla/4.0");
    curl_easy_setopt(myHandle, CURLOPT_AUTOREFERER, 1L);
    curl_easy_setopt(myHandle, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(myHandle, CURLOPT_COOKIEFILE, "");

    // Visit the login page once to obtain a PHPSESSID cookie
    curl_easy_setopt(myHandle, CURLOPT_URL, "http://www.hackthissite.org/user/login/");
    curl_easy_perform(myHandle);

    // Now we can actually log in. First we forge the HTTP referer field, or HTS will deny the login
    curl_easy_setopt(myHandle, CURLOPT_REFERER, "http://www.hackthissite.org/user/login/");

    // Next we tell LibCurl what HTTP POST data to submit
    const char *data = "username=your_username_here&password=your_password_here";
    curl_easy_setopt(myHandle, CURLOPT_POSTFIELDS, data);
    curl_easy_perform(myHandle);

    curl_easy_cleanup(myHandle);
    curl_global_cleanup();
    return 0;
}

Applying LibCurl to HTS Challenges:

None of these examples will directly help you solve the Programming Challenges. However, several challenges consist of obtaining dynamically provided information, using the information to generate a text value, and then submitting that value. By combining the concepts covered in these examples, you should be able to programmatically log in to HTS, download arbitrary webpages, and save those pages into a buffer. It’s up to you to write those programs – I doubt that you’re going to get much more information than what is provided here without doing your own research.

For example, to solve at least one of the challenges, you will have to URL-escape the POST data. If you take initiative, you can easily figure out how to do this by referring to the LibCurl website.

Real-world uses:

As you can read here, LibCurl is used in numerous real-world applications. If you’re looking for practice, try using LibCurl to build some of the following examples:
A Simple Text-Based Web Browser
An FTP File Uploader
A Flexible Downloading Tool
A Dictionary Tool (to look up information from, say, dictionary.reference.com)
A Cookie Forging Tool (that can accept and use stolen cookies)
(Insert any file transfer utility here)

Closing remarks:

You should now have a general idea of how to use LibCurl. As stated in the introduction, LC is an immensely flexible library that can work with a large variety of protocols, and we’ve hardly begun to scratch the surface. I highly suggest reading through the provided links to gain a better appreciation for the capabilities of this library, and just as strongly suggest compiling the provided examples and experimenting with your own programs. Good luck, and happy hacking.