How to Make a File Download Script with Perl

cxygs5788

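Note: If you only need the perl code, you can skip to the end of this article.

Introduction

Many websites have a form or link you can use to download a file. You click a form button or a link, and a moment later a file download dialog pops up in your web browser and prompts you for instructions such as "Open" or "Save". I am going to show you how to do this with a Perl script.

What you need

Any recent version of perl (5.06 or newer should be fine) and a server to run the script on. A server that lets you store files above the web root is preferable but not required. Knowing a little HTML will help, but it is not required either. Typically you upload the script to the cgi-bin folder and set the file permissions to 755. The folder and permissions your server requires may be different.

Perl code

Nearly all perl scripts that run as a CGI process need to start with the shebang line. The most common shebang line is: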

#!/usr/bin/perl

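It simply tells the server where to find perl. The shebang line your server requires may be different. Most web hosts post this information somewhere on their site. For good perl coding practice and CGI security, we are going to add a switch to the shebang line: -T. Note: it must be an uppercase T.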

#!/usr/bin/perl -T

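The T stands for "taint" mode. It is there to prevent you, the programmer of the script, from making a terrible mistake and allowing users of your CGI form to send data to the server that could be used in an insecure way. All perl scripts that run as a CGI process should use the -T switch, so I include it here.

Modules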

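Modules are a bit like separate perl programs that you can use in your own perl program. Many people have written modules that have become standards other perl programmers use all the time. We will be using the following modules: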


use strict;
use warnings;
use CGI;
# Uncomment the next line only for debugging the script.
#use CGI::Carp qw/fatalsToBrowser/;

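The first two are not really modules, they are pragmas. They affect how perl itself behaves. I am not going to explain them in this article; you will have to trust me that they are important in nearly every perl program. The "CGI" module does most of the work for us: it processes the form data, prints the HTTP headers, and more. The "CGI::Carp" module is for debugging and can help you get the script running if there is a problem. If a fatal error causes the script to fail, it displays an error message in the browser. These errors are also written to the server error log.

The next two lines of the program set some important parameters: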


$CGI::POST_MAX = 1024;
$CGI::DISABLE_UPLOADS = 1;

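"POST_MAX" sets, in bytes, the maximum amount of data the script accepts; anything larger is treated as too much and makes the script return an error. I set the limit low (1 KB) because this script needs very little data sent to it. The second line tells the script not to accept file uploads. That makes sense: we want to download files, not upload them. It stops a user from trying to send files to your script with an altered form. Anyone can save a form, change the HTML code, and send whatever they like to the script; what the server does with it is up to you. Everything the user does is completely out of your control.

Setting paths and options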


####################################
#### User Configuration Section ####
#################################### 
# The path to where the downloadable files are. 
# Preferably this should be above the web root folder.
my $path_to_files = '/home/user/downloads/'; 
# The path to the error log file
my $error_log     = '/home/user/downloads/logs/errors.txt'; 
# Option to log errors: 1 = yes, 0 = no
my $log           = 1; 
# To prevent hot-linking to your script
my $url = 'http://www.yoursite.com';
########################################
#### End User Configuration Section ####
########################################

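$path_to_files is the directory where the files to be downloaded are stored. I recommend keeping them in a folder that is not accessible from the web. Usually you can do that by putting them in a folder parallel to, or above, the web root folder (public_html or www).

$error_log is the path to the errors.txt file, which records the errors the script generates.

$log turns error logging on or off.

$url should be the name of your website, including the "http://" part.

Creating the CGI object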

my $q = CGI->new;

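$q is the object we will use to call the various methods of the CGI module. I like to think of it as a butler. You tell the butler what you want, he knows how to get it done, and you don't have to worry about the details. Our "butler", $q, knows how to handle the "commands" we give him.

In fact, the CGI module has many "commands" you can give the "butler". We will use a few of them. Learning to use the CGI module is almost like learning a small programming language. The beauty is that you only need to know what a command does, not how it does it. As with a real butler, you trust that he knows what he is doing and gets the job done efficiently without fuss. I recommend that you take some time to read the CGI module documentation; even if you don't understand all of it, you should at least become familiar with the basic form-processing methods. I'll leave that to you.

Security checkpoints

Never underestimate the need for security when running a script as a CGI. We will use three "checkpoints" to detect suspicious activity. The first checks the amount of data sent to the script. We give the cgi_error() command to our trusty butler, $q, and he returns a response in $error. A response starting with "413" means the limit we set in $CGI::POST_MAX was exceeded, so that is the response we check for. Note: throughout this article I use command and method interchangeably to mean the same thing.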


if (my $error = $q->cgi_error()){
   if ($error =~ /^413\b/o) {
      error('Maximum data limit exceeded.');
   }
   else {
      error('An unknown error has occurred.'); 
   }
}

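Next we check whether someone is trying to upload a file to the script. To send a file, a CGI form has to use "multipart/form-data" in its "enctype" attribute.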


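# CONTENT_TYPE is usually not set for plain GET requests, so default it
# to an empty string to avoid an "uninitialized value" warning.
if (($ENV{'CONTENT_TYPE'} || '') =~ m|^multipart/form-data|io ) {
   error('Invalid Content-Type : multipart/form-data.');
}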

Next we check that the request to use the script came from your website.

if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) {
   error('Access forbidden.')
}
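
The check above can be tried out in isolation. The following is a minimal sketch, not part of the original script: the helper names referer_ok and referer_ok_strict are hypothetical, and the stricter variant is my own assumption about how the prefix match could be tightened. It shows that a plain prefix match also accepts a look-alike host name that merely starts with the configured URL.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither name appears in the original script.
my $url = 'http://www.yoursite.com';

# Same logic as the check above: an empty referer is allowed,
# otherwise the referer must start with $url (case-insensitive).
sub referer_ok {
    my ($referer) = @_;
    return 1 unless $referer;
    return $referer =~ m|^\Q$url|i ? 1 : 0;
}

# Stricter variant (an assumption): after $url the referer must either
# end or continue with a slash, so look-alike host names are rejected.
sub referer_ok_strict {
    my ($referer) = @_;
    return 1 unless $referer;
    return $referer =~ m{^\Q$url\E(?:/|\z)}i ? 1 : 0;
}

for my $ref ('http://www.yoursite.com/downloads.html',
             'http://www.yoursite.com.evil.example/',
             'http://elsewhere.example/') {
    printf "%-45s plain=%d strict=%d\n",
        $ref, referer_ok($ref), referer_ok_strict($ref);
}
```

Keep in mind that the Referer header is supplied by the client, so it can be missing or forged. Treat this check as a way to discourage hot-linking, not as access control.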

Getting the file name

I use the Vars method to turn all the parameters sent to the script into a hash. Once again we call on $q to do the actual work.

my %IN = $q->Vars;

Now we make sure there is a parameter named "file".

my $file = $IN{'file'} or error('No file selected.');

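Validate, validate, validate

It can't be said often enough: you must validate all data sent to a CGI script. If we simply let anything through, someone could send something like /foo/bar and, depending on the path you attach to it, the script would dutifully go looking for the foo directory and download the bar file. People can of course try much worse, but this is not an article about breaking into websites through the front door. To keep users from pulling dangerous stunts like that, we need to validate the data sent to the script.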


if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
   $file = $1;
}
else {
   error('Invalid characters in filename.');
}

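The cryptic-looking part of that code, $file =~ /^(\w+[\w.-]+\.\w+)$/, is called a regular expression (regexp). Regular expressions are what you normally use to validate or filter form data. They are beyond the scope of this article; if you want to understand this one, you will need to read a regexp tutorial. See the online resources at the end of the article. Basically, it checks that the data looks like this: frog.gif, puppy-dog.jpg or meatloaf.txt. It checks for the restricted character set "a-zA-Z0-9_-." in a basic filename.ext format and rejects anything containing other characters.

The code above also "untaints" the data. Because the data will be used to open a file on the server, we must untaint it to satisfy the -T switch that we are not doing anything insecure. The standard way to untaint data is to capture it with a regexp. The parentheses in the regexp store the matched pattern in memory, and we get that value from $1. We then assign the value back to $file; the data used to open the file now originates inside the script, and the -T switch will consider it safe to use. It is up to you to make sure the validation/filtering is sufficient for the task. For example, if you used the pattern /(.*)/ in the regexp, the -T switch would not complain, but the data would reach the script exactly as it was entered in the form or sent through the hyperlink. That would be a foolish thing to do.

If the data fails the validation routine, a message is sent to the error subroutine and the user is alerted.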
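
To get a feel for what the pattern accepts, here is a small sketch that runs it against a few candidate names. The helper name clean_filename and the sample names are hypothetical and only for illustration; the pattern itself is the one used in the script.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the article's pattern. It returns the
# captured (and therefore untainted) name, or undef if the name is rejected.
sub clean_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /^(\w+[\w.-]+\.\w+)$/ ? $1 : undef;
}

for my $candidate ('frog.gif', 'puppy-dog.jpg', 'archive.tar.gz',
                   '/foo/bar', '../secret.txt', 'a.txt') {
    my $clean = clean_filename($candidate);
    printf "%-16s %s\n", $candidate,
        defined $clean ? "accepted as $clean" : 'rejected';
}
```

Names containing a slash or starting with a dot are rejected, which is the point of the check. One side effect worth knowing: the pattern needs at least two characters before the final dot, so a name like a.txt is rejected as well.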

Preparing the download

download($file) or error('An unknown error has occurred.');

If the file download fails, a message is sent to the error subroutine and the user is alerted.

The download() subroutine


sub download {
   my $file = $_[0] or return(0); 
   # Uncomment the next line only for debugging the script 
   #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!"; 
   # Comment the next line if you uncomment the above line 
   open(my $DLFILE, '<', "$path_to_files/$file") or return(0); 
   # this prints the download headers with the file size included
   # so you get a progress bar in the dialog box that displays during file downloads. 
   print $q->header(-type            => 'application/x-download',
                    -attachment      => $file,
                    -Content_length  => -s "$path_to_files/$file",
   ); 
   binmode $DLFILE;
   print while <$DLFILE>;
   undef ($DLFILE);
   return(1);
}
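
The subroutine above reads the file with the line-reading operator, which works but can pull a lot of data into memory when a binary file contains few newlines. As an alternative, here is a minimal sketch that copies the file in fixed-size chunks and sets binary mode on both handles. The helper name send_file and the 8192-byte chunk size are my own assumptions, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy a file to an output handle in fixed-size
# binary chunks. Returns the number of bytes copied, or undef on failure.
sub send_file {
    my ($path, $out) = @_;
    open(my $in, '<', $path) or return undef;
    binmode $in;     # read raw bytes
    binmode $out;    # write raw bytes (matters on Windows)
    my $total = 0;
    while (my $read = read($in, my $buffer, 8192)) {
        print {$out} $buffer or return undef;
        $total += $read;
    }
    close $in;
    return $total;
}

# Demo: write a small binary file, then copy it into an in-memory buffer.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "line one\r\nline two\n\0\xFF";
close $tmp_fh;

open(my $mem, '>', \my $copy) or die "Cannot open in-memory handle: $!";
my $bytes = send_file($tmp_name, $mem);
close $mem;
print "copied $bytes bytes\n";
```

In the download script you would pass \*STDOUT as the output handle, after the headers have been printed.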

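The first line of the subroutine gets the file name, or returns 0 (zero) to the caller to indicate failure. There are two lines that open the file: one for debugging, and one for running the script once everything works. The next section of code prints the headers that make the web browser download the file instead of trying to display it.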


   print $q->header(-type            => 'application/x-download',
                    -attachment      => $file,
                    -Content_length  => -s "$path_to_files/$file",
   );

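The "type" option of the header() method is the specific header that triggers the download. The "attachment" option defines the name of the file to be downloaded. You can give the file any name; it doesn't have to be the real file name. That can be useful if you have a reason to hide the real name of the file or need to serve it under a different name. The "Content-length" option uses the -s file test operator to get the size of the file. This lets the file download dialog show the file size and a progress bar, and estimate how long the download will take.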
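
To make it clearer what ends up on the wire, here is a sketch that builds roughly the same headers by hand. The helper name download_headers is hypothetical, and the exact order and formatting CGI.pm produces may differ (it can also add other headers); the sketch only shows the three headers discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the download headers by hand, roughly what
# the header() call above asks CGI.pm to generate.
sub download_headers {
    my ($name, $size) = @_;
    return join("\r\n",
        'Content-Type: application/x-download',
        qq{Content-Disposition: attachment; filename="$name"},
        "Content-Length: $size",
        '', '');   # the empty line ends the header block
}

my $headers = download_headers('frog.gif', 1234);
print $headers;
```

The "attachment" option becomes the Content-Disposition header, which is what tells the browser to offer a save dialog under the given file name.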

The last four lines of the subroutine complete the download process.


   binmode $DLFILE;
   print while <$DLFILE>;
   undef ($DLFILE);
   return(1);

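The binmode() function tells perl to treat the file as "binary" data. There is a small chance that using binary mode will corrupt the file on the receiving end, but generally it causes no problems and in some cases it is necessary. If you have problems with binmode, remove that line or comment it out. See the binmode function documentation for more details. The "print" line is the one that actually transfers the file from the server to the client. "undef" closes the file, because I used an indirect (lexical) filehandle. We return 1 (one) at the end of the subroutine to indicate success.

The error subroutines

The "error" subroutine is very simple. It uses a few HTML-generating methods to print a basic HTML document showing the error message we sent to it, which is stored in $_[0]. Each of the methods is discussed in the CGI module documentation. If you have error logging turned on, the "log_error" function is called as well.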


sub error {
   print $q->header(-type=>'text/html'),
         $q->start_html(-title=>'Error'),
         $q->h3("Error: $_[0]"),
         $q->end_html;
   log_error($_[0]) if $log;
   exit(0);
}
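
The messages passed to the error subroutine in this script are fixed strings, so they are safe to print as they are. If you ever decide to echo something the user sent (for example the rejected file name), it should be escaped first. A minimal sketch, assuming a hypothetical helper named escape_html that is not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that are special in HTML
# so user-supplied text is displayed instead of being interpreted.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

my $message = escape_html('Invalid name: <script>alert("x")</script>');
print "$message\n";
```

You would then call something like error("Invalid filename: " . escape_html($name)) instead of passing the raw value.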

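Next is the "log_error" subroutine. Every error the script detects can be logged, so you can see how visitors to your site are misusing the script. This is good information to keep track of. It may be overkill, but I am a big believer in tracking errors: they help you write more secure scripts and alert you to bots or people trying to abuse the script. The subroutine appends the error and some other information to a file. I personally like to log the name/value pairs sent to the script to see whether users are altering the form or the query string. These values are in $params, in this format: "name=value:::name=value:::name=value". "scalar localtime()" is there for your convenience so you can easily read the date/time of the error. "time" records the date/time in epoch seconds, a standard way of recording date/time that computer programs and scripts understand. In the end it is up to you what you do with this information. I recommend checking the error log from time to time. You can delete it and the script will create a new one. Or turn error logging off entirely in the "User Configuration" section of the script.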


sub log_error {
   my $error = $_[0]; 
   # Uncomment the next line only for debugging the script
   #open (my $log, ">>", $error_log) or die "Can't open error log: $!"; 
   # Comment the next line if you uncomment the above line
   open (my $log, ">>", $error_log) or return(0); 
   flock $log,2;
   my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params';
   print $log '"', join('","',time, 
                      scalar localtime(),
                      $ENV{'REMOTE_ADDR'},
                      $ENV{'SERVER_NAME'},
                      $ENV{'HTTP_HOST'},
                      $ENV{'HTTP_REFERER'},
                      $ENV{'HTTP_USER_AGENT'},
                      $ENV{'SCRIPT_NAME'},
                      $ENV{'REQUEST_METHOD'},
                      $params,
                      $error),
                      "\"\n";
}
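
Some of the environment variables logged above (HTTP_REFERER, for instance) may be undefined, which triggers warnings under "use warnings", and a value containing a double quote would break the quoted format. Here is a sketch of one way to handle both. The helper names format_log_line and append_log are hypothetical, and doubling embedded quotes is my own assumption about the desired log format.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # provides LOCK_EX in place of the bare 2

# Hypothetical helper: build one quoted, comma-separated log line.
# Undefined fields become empty strings; embedded double quotes are doubled.
sub format_log_line {
    my @fields = map { defined $_ ? $_ : '' } @_;
    s/"/""/g for @fields;
    return '"' . join('","', @fields) . qq{"\n};
}

# Hypothetical helper: append a line to the log under an exclusive lock.
sub append_log {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or return 0;
    flock($fh, LOCK_EX)       or return 0;
    print {$fh} $line;
    close $fh                 or return 0;
    return 1;
}

my $line = format_log_line(time, scalar localtime(), undef, 'say "hi"');
print $line;
```

The lock is released when the filehandle is closed, so there is no need to unlock it explicitly.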

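The interface

The only decision left is how visitors to your site will access the download script. You can use a hyperlink, an HTML form, or some combination of the two. You could have another script (or even the same one) generate the interface. The basic concept is to pass the name of the file to download to the download script. An example using a hyperlink: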

<a href="cgi-bin/download.pl?file=frog.jpg">Download the Frog Image</a>
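
If another script generates the interface, links like the one above can be built from a list of files. A minimal sketch, assuming a hypothetical helper named download_link and made-up file names and labels; the script path is the one from the hyperlink above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a download link for a file name.
# The script path 'cgi-bin/download.pl' is taken from the example above.
sub download_link {
    my ($file, $label) = @_;
    # Percent-encode anything outside the unreserved URL characters.
    (my $escaped = $file) =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return qq{<a href="cgi-bin/download.pl?file=$escaped">$label</a>};
}

for my $item (['frog.jpg',      'Download the Frog Image'],
              ['puppy-dog.jpg', 'Download the Puppy Image']) {
    my $html = download_link($item->[0], $item->[1]);
    print "$html\n";
}
```

File names that pass the validation in the download script contain only letters, digits, underscores, dots and hyphens, so the encoding step leaves them unchanged; it is there as a safeguard for other input.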

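I will leave it to you to discover other ways to create the interface to the download script.

Conclusion

This is a very basic script. You could add more features, for example a counter file that tracks how many times each file has been downloaded. You could add authentication so that your users have to log in before they can download files. You could tie the script to a database instead of storing the files on the server.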
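
As a starting point for the counter idea, here is a minimal sketch that keeps one number per counter file and updates it under an exclusive lock. The helper name bump_counter, the one-file-per-download layout and the ".count" file name are all assumptions for illustration; they are not part of the script in this article.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);
use File::Temp qw(tempdir);

# Hypothetical helper: increment the number stored in $counter_file and
# return the new value, or undef if the file cannot be updated.
sub bump_counter {
    my ($counter_file) = @_;
    sysopen(my $fh, $counter_file, O_RDWR | O_CREAT) or return undef;
    flock($fh, LOCK_EX) or return undef;
    my $line  = <$fh>;
    my $count = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $count++;
    seek($fh, 0, SEEK_SET) or return undef;   # rewind before rewriting
    truncate($fh, 0)       or return undef;   # drop the old value
    print {$fh} "$count\n";
    close $fh or return undef;
    return $count;
}

# Demo in a temporary directory that is removed when the program exits.
my $dir     = tempdir(CLEANUP => 1);
my $counter = "$dir/frog.gif.count";
bump_counter($counter) for 1 .. 3;
my $total = bump_counter($counter);
print "downloads so far: $total\n";
```

In the download script, the call would go right after a successful download, with the counter files kept outside the web root like the downloadable files themselves.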

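Kevin (aka KevinADC)

This article is licensed under a Creative Commons License.

Resources

Perldoc website: all the online perl documentation.
CGI.pm: the CGI module documentation (on perldoc).
Search CPAN: the Comprehensive Perl Archive Network, a gigantic repository of perl modules and more.
CGI Security: an introduction to CGI security.

The complete script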


#!/usr/bin/perl -T 
## Load pragmas and modules
use strict;
use warnings;
use CGI;
# Uncomment the next line only for debugging the script.
#use CGI::Carp qw/fatalsToBrowser/; 
# The next two lines are very important. Do not modify them
# if you do not understand what they do.
$CGI::POST_MAX = 1024;
$CGI::DISABLE_UPLOADS = 1;   
####################################
#### User Configuration Section ####
#################################### 
# The path to where the downloadable files are. 
# Preferably this should be above the web root folder.
my $path_to_files = '/home/user/downloads/'; 
# The path to the error log file
my $error_log     = '/home/user/downloads/logs/errors.txt'; 
# Option to log errors: 1 = yes, 0 = no
my $log           = 1; 
# To prevent hot-linking to your script
my $url = 'http://www.yoursite.com'; 
####################################
## End User Configuration Section ##
#################################### 
# Edit below here at your own risk 
my $q = CGI->new; 
######################################
## This section checks for a number ##
## of possible errors or suspicious ##
## activity.                        ##
###################################### 
# check to see if data limit is exceeded
if (my $error = $q->cgi_error()){
   if ($error =~ /^413\b/o) {
      error('Maximum data limit exceeded.');
   }
   else {
      error('An unknown error has occurred.'); 
   }
} 
# Check to see if the content-type is acceptable.
# multipart/form-data indicates someone is trying
# to upload data to the script with a hacked form.
# $CGI::DISABLE_UPLOADS prevents uploads. This routine
# is to catch the attempt and log it. CONTENT_TYPE may be
# undefined for GET requests, so it defaults to ''.
if (($ENV{'CONTENT_TYPE'} || '') =~ m|^multipart/form-data|io ) {
   error('Invalid Content-Type : multipart/form-data.')
}        
# Check if the request came from your website, if not
# it indicates remote access or hot linking.
if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) {
   error('Access forbidden.')
} 
################################
## End error checking section ##
################################ 
# Get the data sent to the script.
my %IN = $q->Vars; 
# Parse the "file" parameter sent to the script.
my $file = $IN{'file'} or error('No file selected.'); 
# Here we untaint the filename and make sure there are no characters like '/' 
# in the name that could be used to download files from any folder on the website.
if ($file =~ /^(\w+[\w.-]+\.\w+)$/o) {
   $file = $1;
}
else {
   error('Invalid characters in filename.');
}     
# Check if the download succeeded
download($file) or error('An unknown error has occurred.');  
#################
## SUBROUTINES ##
################# 
# download the file
sub download {
   my $file = $_[0] or return(0); 
   # Uncomment the next line only for debugging the script 
   #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!"; 
   # Comment the next line if you uncomment the above line 
   open(my $DLFILE, '<', "$path_to_files/$file") or return(0); 
   # This prints the download headers with the file size included
   # so you get a progress bar in the dialog box that displays during file downloads. 
   print $q->header(-type            => 'application/x-download',
                    -attachment      => $file,
                    'Content-length' => -s "$path_to_files/$file",
   ); 
   binmode $DLFILE;
   print while <$DLFILE>;
   undef ($DLFILE);
   return(1);
} 
# This is a very generic error page. You should make a better one.
sub error {
   print $q->header(-type=>'text/html'),
         $q->start_html(-title=>'Error'),
         $q->h3("Error: $_[0]"),
         $q->end_html;
   log_error($_[0]) if $log;
   exit(0);
} 
# Log the error to a file
sub log_error {
   my $error = $_[0]; 
   # Uncomment the next line only for debugging the script
   #open (my $log, ">>", $error_log) or die "Can't open error log: $!"; 
   # Comment the next line if you uncomment the above line
   open (my $log, ">>", $error_log) or return(0); 
   flock $log,2;
   my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params';
   print $log '"', join('","',time, 
                      scalar localtime(),
                      $ENV{'REMOTE_ADDR'},
                      $ENV{'SERVER_NAME'},
                      $ENV{'HTTP_HOST'},
                      $ENV{'HTTP_REFERER'},
                      $ENV{'HTTP_USER_AGENT'},
                      $ENV{'SCRIPT_NAME'},
                      $ENV{'REQUEST_METHOD'},
                      $params,
                      $error),
                      "\"\n";
}