Downloading Youku Videos with awk

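        awk is an excellent text-processing tool, and it makes working with data in text files very convenient. Most of what we use today is gawk, that is, GNU awk. GNU software has a consistently good track record, and compared with other awk implementations, gawk adds networking support. For example, I can use awk to act like a browser and send an HTTP request to a server, then filter the returned page with regular expressions. For instance, here is a shell program that combines awk and sed to fetch the league tables of the five major football leagues.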
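To make the networking support concrete, here is a minimal sketch of an HTTP GET through gawk's `/inet/tcp` special files and the `|&` two-way pipe. The helper names `build_request` and `http_get` are mine, not part of the program discussed in this post, and the sketch assumes plain HTTP on port 80.

```awk
#!/usr/bin/gawk -f
# Sketch only: build_request() and http_get() are hypothetical helper names.

# Build a bare-bones HTTP/1.0 request; every header line ends in CRLF.
function build_request(host, path,    req) {
    req = "GET " path " HTTP/1.0\r\n"
    req = req "Host: " host "\r\n"
    req = req "Connection: close\r\n"
    req = req "\r\n"
    return req
}

# Send the request and return the whole reply (headers and body) as one string.
function http_get(host, path,    service, line, reply, saved_rs) {
    service = "/inet/tcp/0/" host "/80"
    saved_rs = RS
    RS = "\r?\n"                      # accept CRLF or LF line endings
    printf "%s", build_request(host, path) |& service
    while ((service |& getline line) > 0)
        reply = reply line "\n"
    close(service)
    RS = saved_rs                     # do not leak the change to other reads
    return reply
}

BEGIN {
    # Usage: gawk -f http_get.awk HOST
    if (ARGC == 2)
        printf "%s", http_get(ARGV[1], "/")
}
```

Saving and restoring `RS` keeps the network read from changing how any other input is split into records.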

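        The most authoritative reference for gawk programming is its info manual. What makes that manual commendable is not its comprehensive reference material but the large number of worked awk examples it contains. Network programming in gawk may look like a clever hack, but compared with doing the same job in C, awk is quite productive.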

        The program below downloads videos from Youku. When it runs, it looks like this:

        [Screenshot of the program running]

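        The program works by having gawk send an HTTP request, read what the server returns, process that information, and send a new request. After three requests, Youku returns the real FLV address, and the video can be downloaded from that address. Because gawk's I/O facilities are weak, I call curl through system() from within gawk to perform this final download.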
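The hand-off to curl can be sketched as follows. This is an illustration rather than the code used in the listing: `shell_quote` and `curl_command` are hypothetical helpers, and the URL is a placeholder. Quoting matters because a video URL usually contains `?` and `&`, which the shell would otherwise interpret.

```awk
#!/usr/bin/gawk -f
# Sketch only: shell_quote() and curl_command() are hypothetical helper names.

# Wrap s in single quotes for the shell, escaping embedded single quotes.
function shell_quote(s,    parts, n, i, out) {
    n = split(s, parts, "'")
    out = parts[1]
    for (i = 2; i <= n; i++)
        out = out "'\\''" parts[i]
    return "'" out "'"
}

# Build the command line that saves url to file with curl's -o option.
function curl_command(url, file) {
    return "curl -o " shell_quote(file) " " shell_quote(url)
}

BEGIN {
    # Placeholder URL; a real run would pass the address returned by the server.
    cmd = curl_command("http://example.com/clip.flv?K=abc&x=1", "my clip.flv")
    print cmd
    # prints: curl -o 'my clip.flv' 'http://example.com/clip.flv?K=abc&x=1'
    # To actually download, run: system(cmd)
}
```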

        The program is invoked from the command line like this:

         gawk -f get_youku.awk youku.txt

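        Here youku.txt lists the videos to fetch. Each line holds the address of the page that hosts the video, followed by whitespace and the name to save the download under; the main rule of the script reads these as $1 and $2.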
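As a small illustration of that two-field format, the sketch below checks each line of the list before anything is downloaded. The helper name `parse_entry` is mine, and the idea of skipping blank or malformed lines is an assumption; the original script does no such validation.

```awk
#!/usr/bin/gawk -f
# Sketch only: parse_entry() is a hypothetical helper, not part of get_youku.awk.

# Split a line into a page URL and a save name; return 1 on success, 0 otherwise.
function parse_entry(line, entry,    n, f) {
    n = split(line, f)                # default splitting on runs of whitespace
    if (n != 2)
        return 0
    entry["url"] = f[1]
    entry["name"] = f[2]
    return 1
}

NF == 0 { next }                      # ignore blank lines

{
    if (parse_entry($0, e))
        printf "%s -> %s.flv\n", e["url"], e["name"]
    else
        printf "line %d: expected 'URL name'\n", FNR > "/dev/stderr"
}
```

Running it as `gawk -f check_list.awk youku.txt` (the script name is arbitrary) prints one line per valid entry and reports the rest on standard error.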

 

 

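        The CSDN blog code template has no awk option, and the code runs to more than 300 lines. The code follows; it may look a bit messy. If you would like to study it closely, leave your email address and I will send you the source.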

#! /usr/bin/gawk -f
################################################################################
#
# Youku video downloader
#
# Author: hailongchang@163.com
#
# Date:   11/15/2008
#
################################################################################
{
    # Field 1 is the page address, field 2 is the name to save the video under.
    adr = $1;
    fn = $2;
    download_video(adr, fn);
}
################################################################################
# The download driver. url is the address of the page that hosts the video;
# filename is the name to save the download under.
################################################################################
function download_video(url, filename)
{
    Get_Info(Get_Vid(url));
    system("echo ========================================================================================");
    for (i = 1; i <= video_info["clipcn"]; i++)
    {
        # Build each clip's name from the base name so suffixes do not pile up.
        clip_name = filename;
        if (video_info["clipcn"] > 1)
        {
            clip_name = clip_name "_" i;
        }
        tlink = "url_" i;
        clip_name = clip_name ".flv";
        echo_hint = "Downloading: " clip_name;
        echo_command = "echo " echo_hint;
        system(echo_command);
        system("echo");
        # Quote the URL and the file name so the shell does not interpret "?" or "&".
        command = "curl '" Identify_video(video_info[tlink]) "' > '" clip_name "'";
        system(command);
        system("echo");
        system("echo ========================================================================================");
    }
}
################################################################################
# Extract the request path. web_url comes from youku.txt and is the address
# of the page that hosts the video.
################################################################################
function Get_url(web_url)
{
    gsub(/http:\/\//, "", web_url)
    gsub(/v\.youku\.com/, "", web_url)
    return web_url;
}
################################################################################
# Extract the video id
################################################################################
function Get_Vid(web_url)
{
    # RS is global, so this setting stays in effect after the function returns;
    # the later requests rely on it.
    RS = "\r\n"
    url = Get_url(web_url)
    InetFile = "/inet/tcp/0/v.youku.com/80"
    Request = "GET " url " HTTP/1.1\r\n"
    Request = Request "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*\r\n"
    Request = Request "Accept-Language: zh-cn\r\n"
    Request = Request "UA-CPU: x86\r\n"
    Request = Request "Accept-Encoding: unzip, deflate\r\n"
    Request = Request "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT5.1; .NET CLR 1.1.4322)\r\n"
    Request = Request "Host: v.youku.com\r\n\r\n"

    print Request |& InetFile;
    while ((InetFile |& getline) > 0)
    {
        # Look for a line such as: videoId = '12345'
        if (match($0, /videoId = '[0-9]*'/, matchtext))
        {
            if (match(matchtext[0], /'[0-9]*'/, array_vid))
            {
                vid = array_vid[0];
                gsub(/'/, "", vid);
            }
        }
    }
    close(InetFile);
    return vid;
}
################################################################################
# Extract a key sent by the server
################################################################################
function Get_key(item)
{
    split(item, item_info, ":")
    gsub(/"/, "", item_info[2])
    return item_info[2]
}
################################################################################
# Extract the size of the video
################################################################################
function Get_size(item)
{
    split(item, item_info, ":")
    gsub(/"/, "", item_info[3])
    gsub(/}/, "", item_info[3])
    return item_info[3]
}
################################################################################
# Extract the video's seed
################################################################################
function Get_seed(item)
{
    split(item, item_info, ":")
    return item_info[2]
}
################################################################################
# A random number generator
################################################################################
function Genrate_rand()
{
    seed = (seed * 211 + 30031) % 65536;
    num = seed / 65536;
    return num;
}
################################################################################
# Decode the file id: shuffle the alphabet with the seed, then use each number
# in fileid as an index into the shuffled alphabet
################################################################################
function convert_fileid(fileid)
{
    split(fileid, fid, "*");
    i = 1;
    while (fid[i] != "")
    {
        i++;
    }
    fid_length = i - 1;
    cg_str = ""
    str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/\\:._-1234567890"
    seed = video_info["seed"];
    str_length = length(str);
    for (i = 1; i <= str_length; ++i)
    {
        seed = (seed * 211 + 30031) % 65536;
        num = seed / 65536;
        pos = int(length(str) * num);
        pos += 1;
        ch = substr(str, pos, 1);
        cg_str = cg_str ch;
        # Remove the chosen character from the remaining alphabet.
        str = substr(str, 1, pos - 1) substr(str, pos + 1);
    }
    id = ""
    for (i = 1; i <= fid_length; ++i)
    {
        id = id substr(cg_str, fid[i] + 1, 1);
    }
    return (id);
}
################################################################################
# Extract the fileid
################################################################################
function Get_fileid(item)
{
    split(item, item_info, ":")
    gsub(/"/, "", item_info[2])
    split(item_info[2], fileid, "*")

    return item_info[2]
}
################################################################################
# Convert one hexadecimal digit to a number
################################################################################
function hex_convention(ch)
{
    if (ch == "a")
        num = 10;
    else if (ch == "b")
        num = 11;
    else if (ch == "c")
        num = 12;
    else if (ch == "d")
        num = 13;
    else if (ch == "e")
        num = 14;
    else if (ch == "f")
        num = 15;
    else
        num = ch;
    return num;
}
################################################################################
# Convert a hexadecimal string to a decimal number
################################################################################
function HexStr_int(str)
{
    sum = 0;
    for (i = length(str); i >= 1; i--)
    {
        n = substr(str, i, 1);
        tmp = 16 ** (length(str) - i);
        sum += (hex_convention(n)) * tmp;
    }
    return sum;
}
################################################################################
# Fetch the video's metadata
################################################################################
function Get_Info(video_id)
{
    url = "/player/getPlayList/VideoIDS/" video_id "/version/v1.0.0312/source/video/password//Type/flv"
    flvHttpFile = "/inet/tcp/0/v.youku.com/80"
    Request = "GET " url " HTTP/1.1\r\n"
    Request = Request "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*\r\n"
    Request = Request "Accept-Language: zh-cn\r\n"
    Request = Request "UA-CPU: x86\r\n"
    Request = Request "Accept-Encoding: unzip, deflate\r\n"
    Request = Request "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT5.1; .NET CLR 1.1.4322)\r\n"
    Request = Request "Host: v.youku.com\r\n\r\n"

    print Request |& flvHttpFile
    # Each record overwrites match_info, so only the last record is kept.
    while ((flvHttpFile |& getline) > 0)
    {
        split($0, match_info, ",");
    }
    close(flvHttpFile);
    i = 1;
    while (match_info[i] != "")
    {
        if (0 != match(match_info[i], /"seed".*/))
        {
            video_info["seed"] = Get_seed(match_info[i]);
        }
        if (0 != match(match_info[i], /"streamsizes".*/))
        {
            video_info["size"] = Get_size(match_info[i]);
        }

        if (0 != match(match_info[i], /"fileid"/))
        {
            video_info["fileid"] = Get_fileid(match_info[i]);
        }
        if (0 != match(match_info[i], /"key1".*/))
        {
            video_info["key1"] = Get_key(match_info[i]);
        }
        if (0 != match(match_info[i], /"key2".*/, match_key2))
        {
            video_info["key2"] = Get_key(match_info[i]);
        }
        i++;
    }
#     printf("\n\n");
#     printf("seed = %s\n",video_info["seed"]);
#     printf("size = %s\n",video_info["size"]);
#     printf("fileid = %s\n",video_info["fileid"]);
#     printf("key1 = %s\n",video_info["key1"]);
#     printf("key2 = %s\n\n",video_info["key2"]);
#     printf("\n\n")
    file_id = convert_fileid(video_info["fileid"]);
    key_stand = sprintf("%d", 0xA55AA5A5);
    key1 = HexStr_int(video_info["key1"]);
    video_info["key1"] = sprintf("%x", xor(key1, key_stand));
    video_info["clipcn"] = int(substr(file_id, 7, 2));
    if (video_info["clipcn"] == 1)
    {
        last_url = "http://f.youku.com/player/getFlvPath/sid/00_00/st/flv/fileid/"
        last_url = last_url file_id "?K=" video_info["key2"];
        last_url = last_url video_info["key1"];
        video_info["url_1"] = last_url;
    }
    else
    {
        for (i = 1; i <= video_info["clipcn"]; i++)
        {
            # lev is only set when there are at most 10 clips.
            if (video_info["clipcn"] <= 10)
            {
                lev = "0" (i - 1);
            }
            last_url = "http://f.youku.com/player/getFlvPath/sid/00_00/st/flv/fileid/"
            last_url = last_url substr(file_id, 1, 8);
            last_url = last_url lev;
            last_url = last_url substr(file_id, 11, length(file_id) - 10);
            last_url = last_url "?K=";
            last_url = last_url video_info["key2"];
            last_url = last_url video_info["key1"];
            tlink = "url_" i;
            video_info[tlink] = last_url;
        }
    }
    return;
}
################################################################################
# Send the final HTTP request; the server returns the real video address
################################################################################
function Identify_video(req)
{
    InetDown = "/inet/tcp/0/f.youku.com/80"
    # Strip the scheme and host so that only the request path remains.
    gsub(/http:\/\/f\.youku\.com/, "", req);
    Request = "GET " req " HTTP/1.1\r\n";
    Request = Request "Accept: */*\r\n";
    Request = Request "Cache-Control: no-cache\r\n";
    Request = Request "Connection: close\r\n";
    Request = Request "Host: f.youku.com\r\n";
    Request = Request "Pragma: no-cache\r\n";
    Request = Request "Referer:  http://f.youku.com/player/getFlvPath/sid/00_00/st/flv/fileid/\r\n";
    Request = Request "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; )\r\n"
    Request = Request "\r\n";
    print Request |& InetDown;
    while ((InetDown |& getline) > 0)
    {
        pos = match($0, /http:\/\//);
        if (0 != pos)
        {
            flvAddr = substr($0, pos, length($0) - 10);
        }
    }
    close(InetDown);
    return flvAddr;
}
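The file-id decoding in convert_fileid is the least obvious step, so here is a self-contained restatement that takes the seed and the encoded id as parameters instead of reading globals. The names `shuffle_alphabet` and `decode_fileid` are mine, and the alphabet assumes that the garbled `///:` in the listing was originally `/\\:`. Treat it as a sketch of the idea, not as verified against the Youku service.

```awk
#!/usr/bin/gawk -f
# Sketch only: shuffle_alphabet() and decode_fileid() are hypothetical names
# that restate convert_fileid() from the listing with explicit parameters.

# Shuffle the alphabet with the linear congruential generator from the listing.
function shuffle_alphabet(seed,    src, out, n, i, pos) {
    src = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/\\:._-1234567890"
    n = length(src)
    for (i = 1; i <= n; i++) {
        seed = (seed * 211 + 30031) % 65536
        pos = int(seed / 65536 * length(src)) + 1       # 1-based pick
        out = out substr(src, pos, 1)
        src = substr(src, 1, pos - 1) substr(src, pos + 1)  # drop picked char
    }
    return out
}

# Map each "*"-separated number in fileid to a character of the shuffled alphabet.
function decode_fileid(fileid, seed,    table, idx, n, i, id) {
    table = shuffle_alphabet(seed)
    n = split(fileid, idx, "*")
    for (i = 1; i <= n; i++)
        if (idx[i] != "")
            id = id substr(table, idx[i] + 1, 1)
    return id
}

BEGIN {
    # With seed 0 the first pick is position 32 of the alphabet, the letter F.
    print decode_fileid("0*", 0)      # prints F
}
```

Keeping every working variable local (the extra names after the parameters) avoids the shared global `i` and `seed` that the listing relies on.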