Developing desktop applications with Flask
by Tushar Agrawal
Basically, if I have no intention of using a service then I won’t bother reverse-engineering it. — Jon Lech Johansen
As is evident from my bio, I am crazy about music and pretty much anything related to it. And I believe that music videos, if well directed, are possibly the best way to feel the inherent soul of music.
So, it all began with me watching the music video for the song “Heavydirtysoul” by Twenty One Pilots. The music video was so dope that I didn’t even care about the lyrics. It was only after I had listened to it a few times that I realized I hadn’t caught much of the lyrics except the chorus.
This is a real problem for many ESL (English as a Second Language) speakers: you can’t enjoy a song to the fullest if you don’t catch the lyrics.
It was then that I thought: what if I could play the lyrics of a song alongside its music video (much like subtitles)? It would be awesome if I could create subtitle files for my music videos and then play them in my video player!
Initial Approach and Finding Musixmatch
I then began a comprehensive search for sites or APIs that could provide the lyrics for a song. As expected, I found a dozen sites that did. Cool… isn’t it?
Nah. What I really needed was timed lyrics, much like the subtitles for a movie: I wanted the lyrics text to sync with the current video frame on the screen. After much searching, I was unable to find any such service.
It was only a week later that someone told me to use Musixmatch, a Chrome extension that embeds lyrics in YouTube videos. So, yeah, someone out there was already doing what I had thought of. It was the same story as most of my other well-thought-out, so-called new ideas... and I was just a step away from fetching SubRip “srt” subtitle files for my favorite music videos.
And the hacking started…
I already had a bit of experience working with the Chrome Developer Tools (thanks to Node.js and front-end design). So I put on my hacker glasses and fired up Chrome DevTools. I switched to the Network tab and began looking for any text file that could contain the lyrics.
But I was analyzing requests on a page that was playing YouTube videos, so there were plenty of requests. And since the extension was fetching the lyrics, the request I was after had to involve the Musixmatch domain.
So I filtered using the keyword ‘musix’, looked patiently for my file, and finally found it: the lyrics along with their timestamps. I noted the URL of that request and, frankly, it all looked like gibberish to me. Anyway, I copied the URL string as is, pasted it into the address bar, and voilà, I got the lyrics.
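The same keyword filter can also be run offline. Chrome DevTools can export the Network tab’s captured traffic as a HAR file, a JSON document whose log.entries list holds every request. Here is a minimal sketch, assuming such an export; the file name youtube.har is hypothetical. It lists the URLs of requests that contain a keyword, the way the filter box does.

```python
import json


def load_har(path: str) -> dict:
    """Load a HAR file exported from the DevTools Network tab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_requests(har: dict, keyword: str) -> list:
    """Return the URLs of captured requests whose URL contains keyword
    (case-insensitive), mimicking the Network tab's filter box."""
    keyword = keyword.lower()
    return [
        entry["request"]["url"]
        for entry in har.get("log", {}).get("entries", [])
        if keyword in entry["request"]["url"].lower()
    ]
```

For example, find_requests(load_har("youtube.har"), "musix") returns the candidate URLs. You then check each one to see which returns the timed lyrics.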
So the only thing left was to figure out how the URL was built and what its parameters were.
Parameters and what?
After all the analyzing and filtering, I finally ended up with this: a long URL with a bunch of unknown parameters.
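The Python standard library can split a long URL into its parameters. Dropping one parameter at a time is a quick way to see which ones the server actually needs. In this sketch the host lyrics.example.com and the extra parameter are placeholders, not the real endpoint.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def query_params(url: str) -> dict:
    """Map each query parameter name to its value (the last one wins if repeated)."""
    return dict(parse_qsl(urlsplit(url).query))


def without_param(url: str, name: str) -> str:
    """Rebuild url with one parameter removed, to test whether it matters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "https://lyrics.example.com/endpoint?res=90a120&v=ZQeq_T_2VE8&extra=1"
print(query_params(url))
print(without_param(url, "extra"))
```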
I needed to dig deeper to understand what each parameter actually did. At a glance, it was clear that the only parameters that really mattered were ‘res’ and ‘v’; the others were just for housekeeping. Then I began to explore the options and ended up wasting an hour, only to find that the parameter ‘v’ is nothing but the YouTube video ID.
For example, the video ID, or ‘v’, of the YouTube video with the URL https://www.youtube.com/watch?v=ZQeq_T_2VE8 is ZQeq_T_2VE8. Now that I had unveiled the mystery of ‘v’, I thought it would take me barely another hour to figure out ‘res’, but boy, was I wrong.
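Here is a minimal sketch of pulling ‘v’ out of a YouTube link. It handles youtube.com/watch URLs and, as an extra assumption, short youtu.be links. It checks the result against the shape of a video ID: 11 characters drawn from letters, digits, ‘-’ and ‘_’.

```python
import re
from urllib.parse import parse_qs, urlsplit

# 11 characters from letters, digits, '-' and '_'
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def youtube_video_id(url: str) -> str:
    """Return the video ID ('v') of a youtube.com/watch or youtu.be URL."""
    parts = urlsplit(url)
    if parts.netloc.endswith("youtu.be"):
        candidate = parts.path.lstrip("/")
    else:
        candidate = parse_qs(parts.query).get("v", [""])[0]
    if not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"no video ID found in {url!r}")
    return candidate


print(youtube_video_id("https://www.youtube.com/watch?v=ZQeq_T_2VE8"))  # ZQeq_T_2VE8
```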
The curious case of the parameter ‘res’
An hour of deep analysis and research gave me nothing. A little later, I realized that the URL still worked even when I changed a few of the letters. I kept digging, and by the end of 3 hours I had figured out that the letters in the string didn’t mean anything; they were just inserted at random.
A typical value of ‘res’: 90rt120b114xz70xv82w85vv90a94hn90vb102av86
So I was done with the letters, but the numbers were still alien to me. The next thing I could think of was applying a bit of reverse engineering to analyze the numbers.
I began by removing all the letters, since they didn’t mean anything, and the first thing I noticed was that the count of numbers was fixed: there were always 11. I tried it with many other videos, and the count remained constant.
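A regular expression handles the letter-stripping step in one line. This small sketch applies it to the typical ‘res’ value quoted above and confirms the count of 11 numbers.

```python
import re


def res_numbers(res: str) -> list:
    """Throw away the letters in a 'res' value and keep the numbers between them."""
    return [int(run) for run in re.findall(r"[0-9]+", res)]


sample = "90rt120b114xz70xv82w85vv90a94hn90vb102av86"  # the typical value above
numbers = res_numbers(sample)
print(len(numbers), numbers)  # 11 [90, 120, 114, 70, 82, 85, 90, 94, 90, 102, 86]
```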
Suddenly it struck me: the video ID ‘v’ we discussed earlier also has 11 characters. However, each character in ‘v’ can be a letter, a digit, or even a ‘-’ or ‘_’, unlike ‘res’, which contains only numbers once the random letters are stripped.
So I tried the most obvious mapping from a character to a number, ASCII, and voilà, that was it. Each character of the video ID was ASCII-encoded, and letters were inserted at random between the numbers to make the whole string look more random, I guess.
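The ASCII mapping can be checked in both directions. In this minimal sketch, decode_res reads each run of digits as one character code, and encode_res builds a string of the same shape from a video ID. The padding rule (one or two random lowercase letters between codes) is an assumption modeled on the sample value, not a documented format. The one property that matters is that at least one letter separates consecutive codes; otherwise two codes would run together.

```python
import random
import re
import string
from typing import Optional


def decode_res(res: str) -> str:
    """Each run of digits is the ASCII code of one video-ID character."""
    return "".join(chr(int(run)) for run in re.findall(r"[0-9]+", res))


def encode_res(video_id: str, rng: Optional[random.Random] = None) -> str:
    """Inverse of decode_res: ASCII codes separated by 1-2 random lowercase
    letters (an assumed padding rule, modeled on the sample value)."""
    if rng is None:
        rng = random.Random()
    pieces = []
    for i, ch in enumerate(video_id):
        if i:  # at least one letter between codes keeps them unambiguous
            k = rng.randint(1, 2)
            pieces.append("".join(rng.choices(string.ascii_lowercase, k=k)))
        pieces.append(str(ord(ch)))
    return "".join(pieces)


res = encode_res("ZQeq_T_2VE8")
print(res, decode_res(res))
```

Because the letters are random, encode_res returns a different string on each call. decode_res still maps every one of them back to the same video ID.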
At this point, I was delighted. After all, I had figured out all the parameters and was only a step away from writing my own handy script to download the lyrics file in “srt” format. Just to be sure, I checked with different videos, and there seemed to be no issue whatsoever. I also shared the URL with one of my friends (yeah, a music lover).
I got a quick reply saying, “What is it? There’s nothing.” I cross-checked the URL, and it was working fine in my browser.
Who was the culprit? :P
I don’t get sent anything strange like underwear. I get sent cookies. :P — Jennifer Aniston
I fired up the developer tools again and copied the link for a new song. It worked again. Then I switched to an incognito tab and pasted the same URL. It didn’t work.
My experience with CTF (Capture The Flag) contests immediately told me that it had something to do with cookies. That’s the most likely cause when a URL works in one browser window but not in another.
I switched to the developer console and saw that the browser was indeed sending a cookie. To be sure, I analyzed the request many times. It finally occurred to me that the cookie being sent was the same one the Musixmatch server sets in its response. Also, each cookie is valid only for a certain period of time.
So I wrote a Python script using urllib that first gets the cookie from an ordinary HTTP response, which works because the cookie is valid across the whole domain. The cookie, along with the other parameters, was then framed into an HTTP request, and I got the lyrics... Finally!!
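Python’s standard library can handle the cookie round trip with http.cookiejar. An opener built with HTTPCookieProcessor stores the Set-Cookie header from the first response and replays the cookie on later matching requests. This is one way to do it, not necessarily how the original script did it. Both URLs passed to fetch_with_fresh_cookie are placeholders.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return an opener plus its cookie jar. Cookies set by any response the
    opener receives are stored in the jar and sent on later matching requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_fresh_cookie(landing_url: str, lyrics_url: str,
                            timeout: float = 10.0) -> bytes:
    """Step 1: request an ordinary page so the server sets its cookie.
    Step 2: request the lyrics URL; the opener attaches that cookie itself."""
    opener, _jar = make_session()
    with opener.open(landing_url, timeout=timeout) as resp:
        resp.read()  # only the Set-Cookie header matters here
    with opener.open(lyrics_url, timeout=timeout) as resp:
        return resp.read()
```

Each cookie expires after a while, so a fresh session per run is simpler than saving the cookie and reusing it later.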
Preparing the parameters for a successful request
The Python code for all the steps discussed above first generates the parameters and then makes a request to get the cookie. The URL is then prepared from the parameters. Next, the cookie is set in the request headers, along with other header fields like ‘Host’ and ‘User-agent’, to make it look more like an authentic request.
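Here is a minimal sketch of framing that request with urllib. It assumes the query carries only ‘res’ and ‘v’, because the housekeeping parameters and the real endpoint aren’t shown. The endpoint, the cookie value and the default User-Agent string are placeholders.

```python
import urllib.request
from urllib.parse import urlencode, urlsplit


def build_lyrics_request(endpoint: str, video_id: str, res: str, cookie: str,
                         user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Frame the lyrics request: query string from the parameters, plus the
    cookie and browser-like headers so it resembles the extension's request."""
    url = f"{endpoint}?{urlencode({'res': res, 'v': video_id})}"
    headers = {
        "Host": urlsplit(endpoint).netloc,  # urllib would add this anyway
        "User-Agent": user_agent,
        "Cookie": cookie,  # the "name=value" pair obtained from the first response
    }
    return urllib.request.Request(url, headers=headers)


req = build_lyrics_request("https://lyrics.example.com/endpoint",
                           "ZQeq_T_2VE8", "90a81", "session=abc")
print(req.full_url)
```

You would then send it with urllib.request.urlopen(req, timeout=10). urllib stores header names with capitalize(), so the stored name is ‘User-agent’, the same spelling used above.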
Parsing the raw timed lyrics into srt format
Now the only major task left was to convert the raw timed lyrics data into a proper srt (SubRip Text) format. Here is what the Musixmatch lyrics format looked like.
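Below is the proper format for an srt file. These files contain lines of plain text in groups separated by a blank line. Each group has a sequence number (subtitles are numbered sequentially, starting at 1), a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and one or more lines of subtitle text, as shown below.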
1
00:00:00,350 --> 00:00:03,450
71 buildings exploded
or caught fire.

2
00:00:03,490 --> 00:00:05,020
Elliot, tell me what it is
that you think he did.

3
00:00:05,060 --> 00:00:06,930
Sorry.
I don't know if I can say.
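The timing line is where most of the formatting work lies. Here is a minimal sketch that turns a time in seconds into an SRT timestamp and assembles one numbered group in the layout shown above.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_block(index: int, start: float, end: float, lines: list) -> str:
    """One subtitle: number, timing line, text lines, then a blank line."""
    timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
    return "\n".join([str(index), timing, *lines]) + "\n\n"


print(srt_block(1, 0.35, 3.45, ["71 buildings exploded", "or caught fire."]), end="")
```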
It sounded like a whole lot of work, since the data was yet to be properly formatted. But if you have the required data and a knowledge of Python, all it takes is a simple script to process it, and that’s exactly what I wrote. The HTML tags annoyed me a bit during parsing, but guess what: there is an awesome library just for HTML parsing, which made the whole process very easy. No points for guessing the library’s name: HTMLParser :-).
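Here is a minimal sketch of that conversion using Python 3’s html.parser.HTMLParser. It assumes the raw lyrics arrive as (start time in seconds, text) pairs that may contain HTML tags and entities. That input shape is hypothetical, since the Musixmatch format isn’t reproduced in the text. The parser keeps only the text, and each line lasts until the next one starts.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps the text of an HTML fragment and drops its tags.
    convert_charrefs=True turns entities such as &amp; into characters."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def strip_html(fragment: str) -> str:
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks).strip()


def srt_timestamp(seconds: float) -> str:
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def timed_lyrics_to_srt(cues: list, song_end: float) -> str:
    """cues: [(start_seconds, raw_text), ...] sorted by start time (hypothetical
    shape). Each line lasts until the next starts; the last until song_end."""
    blocks = []
    for i, (start, raw) in enumerate(cues):
        end = cues[i + 1][0] if i + 1 < len(cues) else song_end
        text = strip_html(raw)
        if not text:  # skip empty (e.g. instrumental) cues, keep numbering tight
            continue
        number = len(blocks) + 1
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

The empty-cue rule is a design choice. It keeps the subtitle numbers consecutive, as the SRT layout expects, and the previous line simply ends when the gap begins.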
Final words
So I put this script together with some modifications and a simple front end on a Flask server, and I had my own lyrics-fetching interface, possibly the only one of its kind in the whole world!!
By the way, if you are into music, have a look at Musixmatch. It is really awesome. This exercise was just for educational purposes and wasn’t used in any way to violate Musixmatch’s copyright.