Batch-fetching MP3 URLs from podcasts on 大耳朵英语 (ebigear)

       No preamble; here is the code.

#!C:\Perl\bin\perl.exe
use strict;
use warnings;
use LWP::Simple;
++$|;
my $BaseUrl = 'http://oral.ebigear.com';

# To add a new podcast later, just add its entry here.
# http://oral.ebigear.com/mypodcastlist-3065994-1.html is teacher Faith's podcast page; her user ID is 3065994,
# and 13 is the total number of pages in her podcast list.
my %TeacherInfo = (
	'Faith'         => '3065994,13',
	'xydj'          => '1507470,3',
	'Creature X'    => '1555304,22',
);

# Choose the podcast user to download
my $Username = 'Creature X';  
#my $Username = 'Faith';
#my $Username = 'xydj';

my ($UserId, $PageSum) = split /,/, $TeacherInfo{$Username};
print << "PRINT";
Getting $Username info: 
UserId is $UserId
Podcast PageSum is $PageSum
PRINT

my %Count = ();
my @Mp3List = ();
my @FinalList = ();
my $Destination = "D:\\${UserId}_mp3_list.txt";

for my $Page (1 .. $PageSum)
{
	my $WebSite = "$BaseUrl/mypodcastlist-$UserId-$Page.html";
	print "\n=========== Here is $WebSite ===========\n";
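	my $Content = get($WebSite);
	# get() returns undef on failure, so skip pages that could not be fetched
	unless (defined $Content)
	{
		warn "Failed to fetch $WebSite, skipping.\n";
		next;
	}
	my @Results = ($Content =~ /(?<=window\.location\.href='\/)(podcast-\d-\d{5}\.html)/g);
	# Keep only the first occurrence of each child URL
	my @ChildUrlList = grep {++$Count{$_} < 2;} @Results;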

	for my $ChildUrl (@ChildUrlList)
	{
		my $ChildWebSite = "$BaseUrl/$ChildUrl";
		print "Catching $ChildWebSite.\n";
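		my $ChildContent = get($ChildWebSite);
		# Use $1 only when the fetch and the match both succeed; otherwise it would hold a stale value
		if (defined $ChildContent && $ChildContent =~ /(http:.*?\Q$UserId\E\.mp3)/)
		{
			push @Mp3List, $1;
			print "    Get $1\n";
		}
		else
		{
			warn "    No mp3 URL found in $ChildWebSite\n";
		}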
	}
}

%Count= ();
@FinalList = grep {++$Count{$_} < 2;} @Mp3List;
print "\n\nCongratulations! Obtained @{[scalar(@FinalList)]} mp3 URLs successfully!\n";

open my $fh, '>', $Destination or die "Cannot open $Destination: $!";
print $fh (join "\n", @FinalList);
close $fh or die "Cannot close $Destination: $!";

print "You can find the mp3 list in $Destination.\n";
exit;
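
The two core steps of the script above, pulling child-page URLs out of a list page and dropping duplicates, can be tried offline. The following is only a sketch: the HTML fragment is hypothetical (the real ebigear markup may differ), and `%seen` is the common shorthand for the `++$Count{$_} < 2` filter used in the script.

```perl
use strict;
use warnings;

# Hypothetical list-page fragment; the real page markup may differ.
my $html = <<'HTML';
<div onclick="window.location.href='/podcast-1-12345.html'">A</div>
<div onclick="window.location.href='/podcast-1-12345.html'">A again</div>
<div onclick="window.location.href='/podcast-2-67890.html'">B</div>
HTML

# Same pattern as the script: in list context, /g returns every captured URL.
my @results = ($html =~ /(?<=window\.location\.href='\/)(podcast-\d-\d{5}\.html)/g);

# Keep only the first occurrence of each URL, preserving the original order.
my %seen;
my @unique = grep { !$seen{$_}++ } @results;

print "$_\n" for @unique;
```

Because the filter only passes an item the first time it is counted, the order of the remaining URLs matches the order in which they appeared on the page.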


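When the code above fetches the pages, the printed content shows up as garbled text on some platforms, but this does not affect keyword extraction with my regular expressions.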
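
One likely reason the extraction is unaffected is that the patterns contain only ASCII characters, and both GB2312 and UTF-8 leave ASCII bytes unchanged. The sketch below illustrates this with a hypothetical page fragment (not the site's real markup): the same pattern finds the same URL whichever way the surrounding Chinese text is encoded.

```perl
use strict;
use warnings;
use utf8;                       # the literal below contains non-ASCII characters
use Encode qw(encode);

# Hypothetical markup, used only for illustration.
my $page = qq{<a href="/podcast-1-12345.html">口语课堂</a>};

# ASCII-only pattern, like the ones in the script.
my $pattern = qr/(podcast-\d-\d{5}\.html)/;

my %found;
for my $charset (qw(gb2312 UTF-8)) {
    my $bytes = encode($charset, $page);    # byte string in that encoding
    ($found{$charset}) = $bytes =~ $pattern;
    print "$charset: $found{$charset}\n";
}
```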

If the garbled output really bothers you, use encode to convert the text.

#!C:\Perl\bin\perl.exe
use strict;
use warnings;
use LWP::Simple;
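use Encode qw{encode}; # re-encode the page for display

my $content =  get("http://oral.ebigear.com/mypodcastlist-3065994-1.html");
die "Failed to fetch the page\n" unless defined $content;   # get() returns undef on failure
my $transfor = encode('gb2312', $content);   # converting to gb2312 is enough
print $transfor,"\n";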
my $prefix = qr/href="/;
my $suffix = qr/" target="_blank">大耳朵FAITH口语课堂-天天学/;
my @list = ($transfor =~ /(?<=$prefix)(.*?)(?=$suffix)/mg);
print join "\n", @list;
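
To see what the transcoding step does without touching the network, here is a minimal sketch using a short sample string in place of a fetched page. The variable names are illustrative only; `encode` turns Perl characters into bytes in the target charset, and `decode` reverses it.

```perl
use strict;
use warnings;
use utf8;                       # the literal below contains non-ASCII characters
use Encode qw(encode decode);

my $title = '大耳朵';                        # three characters

my $gb   = encode('gb2312', $title);        # two bytes per character
my $utf8 = encode('UTF-8',  $title);        # three bytes per character
my $back = decode('gb2312', $gb);           # bytes back to characters

printf "chars=%d gb2312 bytes=%d UTF-8 bytes=%d\n",
    length($title), length($gb), length($utf8);
print "round trip ok\n" if $back eq $title;
```

A terminal shows readable text only when the bytes it receives are in the charset it expects, which is why the script re-encodes to gb2312 before printing.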


