Reposted from: http://www.cnblogs.com/bobzhou/archive/2013/02/16/bobzhou.html
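Lately I've been listening to music on Xiami a lot, and some of the songs are quite good. Yesterday, heading back to Shanghai, I wanted to download a few to listen to on the way, only to find that downloads have to be paid for with Xiami coins. My first thought was to press F12 in Chrome and look at the requests in the Network panel. I easily found a link like http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3, which is the song's real address, and the file can be downloaded from it directly. A side note: many people ask how to save online videos or music locally, and the answers you find online vary, from sniffing tools to digging through the browser cache. In fact, the packet-capture tools built into Chrome and other browsers make the address easy to find.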
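Once the real address shows up in the Network panel, saving the file is a single GET request. A minimal Java sketch, assuming the server returns the file to a plain request (it may also check headers such as Referer or cookies, which this sketch does not send); the class name Mp3Downloader is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies whatever the address returns into a local file.
final class Mp3Downloader {
    // Returns the number of bytes written to target.
    static long download(String address, Path target) {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            // Overwrite an existing file so a retry does not fail.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, Mp3Downloader.download("http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3", Path.of("rainbow-trees.mp3")) would save the song, provided the server still accepts the request.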
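That is the simplest method, but it takes a lot of manual work. Below I extract the address programmatically; more importantly, this gives a general approach to this kind of problem.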
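Let's start by analyzing this song. Its page is http://www.xiami.com/song/1769939716. The page shows the title Rainbow Trees, the artist Robert de Boron and the album Diaspora. In the page source a few numbers stand out: 1769939716, 417559 and 78926. Comparing them with the real mp3 address http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3: 1769939716 is the song ID, 417559 is the album ID and 78926 is the artist ID, so the URL is built as http://f3.xiami.net/<artist ID>/<album ID>/08%20<song ID>_1875663.mp3.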
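The structure inferred above can be checked mechanically against a captured address. A minimal sketch with a regular expression; the field names follow the author's reading of the numbers and are assumptions, not a documented Xiami format, and the class name XiamiUrlParts is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits an address of the observed form
// http://f3.xiami.net/<artist ID>/<album ID>/<track>%20<song ID>_<unknown>.mp3
final class XiamiUrlParts {
    private static final Pattern REAL_URL = Pattern.compile(
            "http://f\\d+\\.xiami\\.net/(\\d+)/(\\d+)/(\\d+)%20(\\d+)_(\\d+)\\.mp3");

    final String artistId, albumId, track, songId, suffix;

    private XiamiUrlParts(Matcher m) {
        artistId = m.group(1);
        albumId = m.group(2);
        track = m.group(3);
        songId = m.group(4);
        suffix = m.group(5); // meaning still unknown at this point of the analysis
    }

    static XiamiUrlParts parse(String url) {
        Matcher m = REAL_URL.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected address: " + url);
        }
        return new XiamiUrlParts(m);
    }
}
```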
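Some pieces are still missing: what is 08, and what is 1875663? %20 we know is an encoded space. The album page http://www.xiami.com/album/417559 shows that Rainbow Trees is the 8th track, which explains 08. But what is 1875663? I went through every request Chrome sent and the source of every related page without finding what the number means. With no other option, I found an SWF decompiler online, decompiled the player, and recovered some of its source code.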
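Going the other way, the address can be assembled from its parts, except that the trailing number cannot be derived yet and must be supplied by the caller. A sketch assuming the track number is always zero-padded to two digits (only track 8 was observed) and that the host f3.xiami.net from the captured URL applies; the class name XiamiUrlBuilder is hypothetical:

```java
// Hypothetical builder mirroring the observed address layout.
final class XiamiUrlBuilder {
    static String build(long artistId, long albumId, int track, long songId, long suffix) {
        // "%02d" pads 8 to "08"; "%%20" emits the literal encoded space "%20".
        return String.format("http://f3.xiami.net/%d/%d/%02d%%20%d_%d.mp3",
                artistId, albumId, track, songId, suffix);
    }
}
```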
The code below appears to obtain the song's location; following it further led me to the getLocation method.
```actionscript
var dataStr:* = evt.target.data;
dataStr = dataStr.replace(" xmlns=\"http://xspf.org/ns/0/\"", "");
var xmlData:* = new XML(dataStr);
xmlData.ignoreWhitespace = true;
uid = xmlData.uid;
clearList = xmlData.clearlist;
var songArr:* = xmlData.trackList.track;
var tLoadArr:* = [];
var backgroundStr:* = "";
var firstSongId:* = 0;
var addSongTmpArr:* = [];
var oldDataArr:* = [];
if (songArr[0] != undefined){
    for (i in songArr) {
        tData = songArr[i];
        songLocation = "";
        thisLocation = tData.location;
        if (thisLocation.indexOf("http://") < 0){
            // Not a plain URL: decode it
            try {
                songLocation = locationDec.getLocation(tData.location);
            } catch (e) {
            };
        } else {
            songLocation = thisLocation;
        };
        // ... (rest of the loop body not included in this excerpt)
    }
}
```
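The branch in this excerpt decides whether a location needs decoding at all. A Java sketch of just that decision; the decoder is passed in as a function (in the player it is locationDec.getLocation), and the class name LocationResolver is hypothetical. Like the ActionScript, it swallows a decoding error and leaves an empty string:

```java
import java.util.function.UnaryOperator;

// Mirrors the player's branch: plain URLs pass through, anything else is decoded.
final class LocationResolver {
    static String resolve(String location, UnaryOperator<String> decoder) {
        if (location.indexOf("http://") < 0) {
            try {
                return decoder.apply(location);
            } catch (RuntimeException e) {
                return ""; // the ActionScript ignores the error and keeps ""
            }
        }
        return location;
    }
}
```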
Here is the getLocation method:
```actionscript
public function getLocation(_arg1:String):String {
    var _local10:*;
    var _local2:* = Number(_arg1.charAt(0));                // row count: the leading digit
    var _local3:* = _arg1.substring(1);                     // the scrambled text after it
    var _local4:* = Math.floor((_local3.length / _local2)); // base row length
    var _local5:* = (_local3.length % _local2);             // rows that are one character longer
    var _local6:* = new Array();                            // the rows
    var _local7:* = 0;
    while (_local7 < _local5) {
        if (_local6[_local7] == undefined){
            _local6[_local7] = "";
        };
        _local6[_local7] = _local3.substr(((_local4 + 1) * _local7), (_local4 + 1));
        _local7++;
    };
    _local7 = _local5;
    while (_local7 < _local2) {
        _local6[_local7] = _local3.substr(((_local4 * (_local7 - _local5)) + ((_local4 + 1) * _local5)), _local4);
        _local7++;
    };
    // Read the rows column by column (charAt past the end returns "")
    var _local8:* = "";
    _local7 = 0;
    while (_local7 < _local6[0].length) {
        _local10 = 0;
        while (_local10 < _local6.length) {
            _local8 = (_local8 + _local6[_local10].charAt(_local7));
            _local10++;
        };
        _local7++;
    };
    _local8 = unescape(_local8);
    // Replace every "^" with "0"
    var _local9:* = "";
    _local7 = 0;
    while (_local7 < _local8.length) {
        if (_local8.charAt(_local7) == "^"){
            _local9 = (_local9 + "0");
        } else {
            _local9 = (_local9 + _local8.charAt(_local7));
        };
        _local7++;
    };
    _local9 = _local9.replace("+", " ");
    return (_local9);
}
```
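The first half of getLocation only decides how long each row is: the leading digit is the row count, and the first (length % rows) rows get one extra character. A small Java sketch of that calculation (the class name RowSplit is hypothetical), checked against the location value from the playlist XML captured below:

```java
// Row lengths used by getLocation: _local4 = length / rows, _local5 = length % rows.
final class RowSplit {
    static int[] rowLengths(String location) {
        int rows = Character.digit(location.charAt(0), 10); // the leading digit
        int length = location.length() - 1;                 // characters after the digit
        int[] lengths = new int[rows];
        for (int i = 0; i < rows; i++) {
            // The first (length % rows) rows hold one extra character.
            lengths[i] = length / rows + (i < length % rows ? 1 : 0);
        }
        return lengths;
    }
}
```

For the 78-character example this gives [20, 20, 19, 19].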
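This code looks very much like the key to obtaining the address. Tracing tData.location back up the first excerpt leads to an XML document (xmlData), so that XML should contain a location tag, and finding it is the crucial step. Back in the browser, I captured the traffic again and found this request: http://www.xiami.com/song/playlist/id/1769939716/object_name/default/object_id/0 (1769939716 is the song ID). Its content is: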
```xml
<?xml version="1.0" encoding="utf-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
  <trackList>
    <track>
      <title><![CDATA[Rainbow Trees]]></title>
      <song_id>1769939716</song_id>
      <album_id>417559</album_id>
      <album_name><![CDATA[Diaspora]]></album_name>
      <object_id>1</object_id>
      <object_name>default</object_name>
      <insert_type>1</insert_type>
      <background>http://img.xiami.com/res/player/bimg/bg-5.bak.jpg</background>
      <grade>-1</grade>
      <artist><![CDATA[Robert de Boron]]></artist>
      <location>4h%2Fxit7645F8219186pt3Ffi.%8%19%%%736733tA%3an2927%52569_5.p%2.meF2F52E5E9716m</location>
      <ms></ms>
      <lyric>http://www.xiami.com/song/lyrictxt/id/1769939716</lyric>
      <pic>http://img.xiami.com/images/album/img26/78926/4175591312340942_1.jpg</pic>
    </track>
  </trackList>
  <uid>12390378</uid>
  <type>default</type>
  <type_id>1</type_id>
  <clearlist></clearlist>
</playlist>
```
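Pulling the location out of such a playlist takes a few lines with the JDK's DOM parser. A sketch assuming the response has the shape shown above; the playlist URL pattern is the one captured in the browser, and the class and method names are hypothetical. The parser is left non-namespace-aware (the DocumentBuilderFactory default), so the plain tag name location matches despite the xmlns declaration the player strips:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PlaylistReader {
    // Playlist address observed in the browser for a given song ID.
    static String playlistUrl(long songId) {
        return "http://www.xiami.com/song/playlist/id/" + songId
                + "/object_name/default/object_id/0";
    }

    // Returns the text of every <location> element, in document order.
    static String[] locations(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("location");
            String[] result = new String[nodes.getLength()];
            for (int i = 0; i < result.length; i++) {
                result[i] = nodes.item(i).getTextContent().trim();
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse playlist", e);
        }
    }
}
```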
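This playlist contains the location value I was looking for. With the source code and the location value in hand, the scheme becomes clear. Take the string 4h%2Fxit7645F8219186pt3Ffi.%8%19%%%736733tA%3an2927%52569_5.p%2.meF2F52E5E9716m, remove its first character, 4, and split the remaining characters into 4 rows. If the length divides evenly, every row has the same length; otherwise the first (length % rows) rows get one extra character. Here 78 characters remain: 78 / 4 = 19 remainder 2, so the first 2 rows have 20 characters and the last 4 − 2 = 2 rows have 19, i.e. lengths [20, 20, 19, 19] and rows [h%2Fxit7645F8219186p, t3Ffi.%8%19%%%736733, tA%3an2927%52569_5., p%2.meF2F52E5E9716m]. Treating the rows as a two-dimensional character array, read them column by column starting from the first character of the first row: [0][0], [1][0], [2][0], [3][0], [0][1], [1][1], [2][1], [3][1], ... (the two 19-character rows simply have nothing in the last column). This yields http%3A%2F%2Ff3.xiami.net%2F78926%2F417559%2F%5E8%252%5E1769939716_1875663.mp3. URL-decoding it gives http://f3.xiami.net/78926/417559/^8%2^1769939716_1875663.mp3, and finally replacing every ^ with the character 0 gives http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3, the address seen in the browser.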
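The reading order and the two clean-up steps can be replayed on the four rows listed above. A minimal Java sketch (the class name ColumnRead is hypothetical) that takes the rows as already split, reads them column by column, URL-decodes with java.net.URLDecoder (the Charset overload needs Java 10 or later), and replaces ^ with 0:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class ColumnRead {
    // Reads rows[0][c], rows[1][c], ... for each column c; shorter rows just end early.
    static String readColumns(String[] rows) {
        StringBuilder out = new StringBuilder();
        for (int c = 0; c < rows[0].length(); c++) {
            for (String row : rows) {
                if (c < row.length()) {
                    out.append(row.charAt(c));
                }
            }
        }
        return out.toString();
    }

    // URL-decode, then map each '^' back to '0'.
    static String cleanUp(String joined) {
        return URLDecoder.decode(joined, StandardCharsets.UTF_8).replace('^', '0');
    }
}
```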
I normally work in Java, so I translated this code into Java:
```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public final class XiamiLocationDecoder {

    // Java port of the player's getLocation(); comments name the matching ActionScript locals.
    public static String getLocation(String location) throws UnsupportedEncodingException {
        int rows = Integer.parseInt(location.substring(0, 1)); // _local2: leading digit = row count
        String body = location.substring(1);                   // _local3: the scrambled text
        int rowLen = body.length() / rows;                     // _local4: base row length
        int extra = body.length() % rows;                      // _local5: rows with one extra char

        // _local6: the first `extra` rows hold rowLen + 1 characters, the rest hold rowLen.
        String[] parts = new String[rows];
        for (int i = 0; i < extra; i++) {
            int start = (rowLen + 1) * i;
            parts[i] = body.substring(start, start + rowLen + 1);
        }
        for (int i = extra; i < rows; i++) {
            int start = rowLen * (i - extra) + (rowLen + 1) * extra;
            parts[i] = body.substring(start, start + rowLen);
        }

        // _local8: read column by column. Java's charAt throws past the end
        // (ActionScript returns ""), so shorter rows are skipped explicitly.
        StringBuilder joined = new StringBuilder();
        for (int col = 0; col < parts[0].length(); col++) {
            for (String part : parts) {
                if (col < part.length()) {
                    joined.append(part.charAt(col));
                }
            }
        }

        // _local9: URL-decode, turn every '^' back into '0', and mirror the
        // ActionScript's final "+" -> " " replacement.
        String decoded = URLDecoder.decode(joined.toString(), "UTF-8");
        return decoded.replace('^', '0').replace("+", " ");
    }
}
```
Passing the content of the location tag to this method returns the real mp3 address I was after.
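Finally, here is the approach I use for this kind of problem; it works for finding the real addresses of both videos and music. Start with packet capture in the browser, which usually reveals the real address directly. That is not enough for a program that fetches addresses automatically, though: you need to know how the address is generated. For Tudou videos, for example, one request returns an XML document that contains the video address, which is the simplest case. For Youku, packet capture alone does not show how the real address is computed; in that case you need to decompile the Flash player and translate its code into the language you want.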