Modern Web Automation With Python and Selenium

In this tutorial you’ll learn advanced Python web automation techniques: using Selenium with a “headless” browser, exporting the scraped data to CSV files, and wrapping your scraping code in a Python class.

Motivation: Tracking Listening Habits

Suppose that you have been listening to music on bandcamp for a while now, and you find yourself wishing you could remember a song you heard a few months back.

Sure, you could dig through your browser history and check each song, but that might be a pain… All you remember is that you heard the song a few months ago and that it was in the electronic genre.

“Wouldn’t it be great,” you think to yourself, “if I had a record of my listening history? I could just look up the electronic songs from two months ago and I’d surely find it.”

Today, you will build a basic Python class, called BandLeader, that connects to bandcamp.com, streams music from the “discovery” section of the front page, and keeps track of your listening history.

The listening history will be saved to disk in a CSV file. You can then explore that CSV file in your favorite spreadsheet application or even with Python.

If you have had some experience with web scraping in Python, you are familiar with making HTTP requests and using Pythonic APIs to navigate the DOM. You will do more of the same today, except with one difference.

Today you will use a full-fledged browser running in headless mode to do the HTTP requests for you.

A headless browser is just a regular web browser, except that it contains no visible UI element. Just like you’d expect, it can do more than make requests: it can also render HTML (though you cannot see it), keep session information, and even perform asynchronous network communications by running JavaScript code.

If you want to automate the modern web, headless browsers are essential.

Free Bonus: Click here to download a “Python + Selenium” project skeleton with full source code that you can use as a foundation for your own Python web scraping and automation apps.

Setup

Your first step, before writing a single line of Python, is to install a Selenium-supported WebDriver for your favorite web browser. In what follows, you will be working with Firefox, but Chrome could easily work too.

So, assuming that the path ~/.local/bin is in your execution PATH, here’s how you would install the Firefox webdriver, called geckodriver, on a Linux machine:

$ wget https://github.com/mozilla/geckodriver/releases/download/v0.19.1/geckodriver-v0.19.1-linux64.tar.gz
$ tar xvfz geckodriver-v0.19.1-linux64.tar.gz
$ mv geckodriver ~/.local/bin

Next, you install the selenium package, using pip or however else you like. If you made a virtual environment for this project, you just type:

$ pip install selenium

[ If you ever feel lost during the course of this tutorial, the full code demo can be found on GitHub. ]

Now it’s time for a test drive:

Test Driving a Headless Browser

To test that everything is working, you decide to try out a basic web search via DuckDuckGo. You fire up your preferred Python interpreter and type:

>>> from selenium.webdriver import Firefox
>>> from selenium.webdriver.firefox.options import Options
>>> opts = Options()
>>> opts.set_headless()
>>> assert opts.headless  # Operating in headless mode
>>> browser = Firefox(options=opts)
>>> browser.get('https://duckduckgo.com')

So far, you have created a headless Firefox browser and navigated to https://duckduckgo.com. You made an Options instance and used it to activate headless mode when you passed it to the Firefox constructor. This is akin to typing firefox -headless at the command line.

Python Web Scraping: DuckDuckGo Screenshot

Now that a page is loaded you can query the DOM using methods defined on your newly minted browser object. But how do you know what to query? The best way is to open your web browser and use its developer tools to inspect the contents of the page. Right now you want to get ahold of the search form so you can submit a query. By inspecting DuckDuckGo’s home page you find that the search form <input> element has an id attribute "search_form_input_homepage". That’s just what you needed:
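
A minimal sketch of that lookup, wrapped in a helper so it can be reused, might look like the following. It assumes the Selenium 3 style find_element_by_id method that matches the other listings in this tutorial (Selenium 4 replaced it with find_element(By.ID, ...)). The function name search_duckduckgo is made up for illustration; only the id value comes from the page inspection described above.

```python
def search_duckduckgo(browser, query):
    """Type `query` into DuckDuckGo's home-page search box and submit it.

    `browser` is assumed to be a Selenium 3 style WebDriver that is
    already showing https://duckduckgo.com.
    """
    # The id value is the one found with the developer tools
    search_form = browser.find_element_by_id('search_form_input_homepage')
    search_form.send_keys(query)   # Fill in the text box
    search_form.submit()           # Submit the enclosing form
    return search_form


# Usage with the browser created earlier:
# search_duckduckgo(browser, 'real python')
```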

You found the search form, used the send_keys method to fill it out, and then used the submit method to perform your search for "Real Python". You can check out the top result:

>>> results = browser.find_elements_by_class_name('result')
>>> print(results[0].text)

Real Python - Real Python
Get Real Python and get your hands dirty quickly so you spend more time making real applications. Real Python teaches Python and web development from the ground up ...
https://realpython.com

Everything seems to be working. In order to prevent invisible headless browser instances from piling up on your machine, you close the browser object before exiting your Python session.
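
In an interactive session, calling browser.quit() before leaving the interpreter is enough. In a script, one way to make sure the cleanup always happens is a small context manager. The sketch below is only an illustration: closing_browser is a hypothetical helper name, and the only thing it assumes about the browser object is the standard WebDriver quit() method.

```python
from contextlib import contextmanager


@contextmanager
def closing_browser(browser):
    """Yield `browser`, then end its session even if the body raises."""
    try:
        yield browser
    finally:
        # quit() ends the WebDriver session and the browser process behind it
        browser.quit()


# Usage with a real driver (needs Selenium and geckodriver):
# with closing_browser(Firefox(options=opts)) as browser:
#     browser.get('https://duckduckgo.com')
```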

Groovin on Tunes

You’ve tested that you can drive a headless browser using Python. Now to put it to use:

  1. You want to play music.
  2. You want to browse and explore music.
  3. You want information about what music is playing.

To start, you navigate to https://bandcamp.com and start to poke around in your browser’s developer tools. You discover a big shiny play button towards the bottom of the screen with a class attribute that contains the value "playbutton". You check that it works:

Python Web Scraping: Bandcamp Discovery Section

>>> opts = Options()
>>> opts.set_headless()
>>> browser = Firefox(options=opts)
>>> browser.get('https://bandcamp.com')
>>> browser.find_element_by_class_name('playbutton').click()

You should hear music! Leave it playing and move back to your web browser. Just to the side of the play button is the discovery section. Again, you inspect this section and find that each of the currently visible available tracks has a class value of "discover-item", and that each item seems to be clickable. In Python, you check this out:
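
A minimal sketch of that check follows. It assumes the same Selenium 3 style find_elements_by_class_name method used above; play_discover_item is a hypothetical helper name, and only the "discover-item" class comes from inspecting the page.

```python
def play_discover_item(browser, index):
    """Click one of the tracks in the discovery section.

    Returns how many elements carried the 'discover-item' class, which is
    handy for checking how many tracks the page exposed.
    """
    tracks = browser.find_elements_by_class_name('discover-item')
    tracks[index].click()   # Clicking a track starts playing it
    return len(tracks)


# Usage with the browser created earlier:
# play_discover_item(browser, 3)
```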

A new track should be playing! This is the first step to exploring bandcamp using Python! You spend a few minutes clicking on different tracks in your Python environment but soon grow tired of the meager library of 8 songs.

Exploring the Catalogue

Looking back at your browser, you see the buttons for exploring all of the tracks featured in bandcamp’s music discovery section. By now this is familiar: each button has a class value of "item-page". The very last button is the “next” button that will display the next eight tracks in the catalogue. You go to work:

>>> next_button = [e for e in browser.find_elements_by_class_name('item-page')
...                if e.text.lower().find('next') > -1][0]  # Take the first match
>>> next_button.click()

Great! Now you want to look at the new tracks, so you think “I’ll just repopulate my tracks variable like I did a few minutes ago”. But this is where things start to get tricky.

First, bandcamp designed their site for humans to enjoy using, not for Python scripts to access programmatically. When you call next_button.click() the real web browser responds by executing some JavaScript code. If you try it out in your browser, you see that some time elapses as the catalogue of songs scrolls with a smooth animation effect. If you try to repopulate your tracks variable before the animation finishes, you may not get all the tracks and you may get some that you don’t want.

The solution? You can just sleep for a second. If you are running all of this in a Python shell, you probably won’t even notice the delay, since it takes time for you to type too.
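
A fixed sleep is the simplest option. If you would rather not guess at the delay, a small polling helper is one alternative. The sketch below uses only the standard library; wait_until is a hypothetical name, and the condition you pass in (for example, "exactly eight tracks are visible") is up to you. Selenium also ships its own explicit-wait support in selenium.webdriver.support.ui.WebDriverWait, which serves a similar purpose.

```python
from time import monotonic, sleep


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or time runs out.

    Returns the truthy value, or None if `timeout` seconds pass first.
    """
    deadline = monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if monotonic() >= deadline:
            return None
        sleep(interval)   # Pause briefly before checking again


# Usage idea: wait until the page shows some tracks before reading them
# wait_until(lambda: browser.find_elements_by_class_name('discover-item'))
```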

Another slight kink is something that can only be discovered through experimentation. You run the same query for elements with the class "discover-item" again to repopulate your tracks variable.

But you notice something strange: len(tracks) is not equal to 8, even though only the next batch of 8 should be displayed. Digging a little further, you find that your list contains some tracks that were displayed before. To get only the tracks that are actually visible in the browser, you need to filter the results a little.

After trying a few things, you decide to keep a track only if its x coordinate on the page falls within the bounding box of the containing element. The catalogue’s container has a class value of "discover-results". Here’s how you proceed:

>>> discover_section = browser.find_element_by_class_name('discover-results')
>>> left_x = discover_section.location['x']
>>> right_x = left_x + discover_section.size['width']
>>> discover_items = browser.find_elements_by_class_name('discover-item')
>>> tracks = [t for t in discover_items
...           if t.location['x'] >= left_x and t.location['x'] < right_x]
>>> assert len(tracks) == 8

Building a Class

If you are growing weary of retyping the same commands over and over again in your Python environment, you should dump some of it into a module. A basic class for your bandcamp manipulation should do the following:

  1. Initialize a headless browser and navigate to bandcamp
  2. Keep a list of available tracks
  3. Support finding more tracks
  4. Play, pause, and skip tracks

All in one go, here’s the basic code:
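
A minimal sketch of such a class, not the author's exact listing, might look like this. It reuses the class names found earlier ("discover-results", "discover-item", "item-page", "playbutton") and the attribute names track_list and _current_track_number that the later listings rely on. Two things are assumptions of this sketch: the browser is passed in rather than created inside __init__, so the class can be exercised with a stand-in object, and pause() assumes that the play button toggles playback. The method names tracks, more_tracks, pause, and play_next are illustrative.

```python
from time import sleep

BANDCAMP_FRONTPAGE = 'https://bandcamp.com/'


class BandLeader:
    def __init__(self, browser, settle_time=1.0):
        # ASSUMPTION: `browser` is a Selenium 3 style WebDriver, such as
        # the headless Firefox created earlier, handed in by the caller.
        self.browser = browser
        self.settle_time = settle_time
        self.browser.get(BANDCAMP_FRONTPAGE)
        self._current_track_number = 1
        self.track_list = []
        self.tracks()

    def tracks(self):
        """Refresh and return the tracks visible in the discover section."""
        sleep(self.settle_time)  # Let the scrolling animation finish
        section = self.browser.find_element_by_class_name('discover-results')
        left_x = section.location['x']
        right_x = left_x + section.size['width']
        items = self.browser.find_elements_by_class_name('discover-item')
        # Keep only the items inside the container's horizontal bounds
        self.track_list = [t for t in items
                           if left_x <= t.location['x'] < right_x]
        return self.track_list

    def more_tracks(self, page='next'):
        """Click the catalogue button whose label contains `page`."""
        buttons = [e for e in
                   self.browser.find_elements_by_class_name('item-page')
                   if page in e.text.lower()]
        if buttons:
            buttons[0].click()
            self.tracks()

    def play(self, track=None):
        """Play track number `track` (1-based), or the selected track."""
        if track is None:
            self.browser.find_element_by_class_name('playbutton').click()
        elif isinstance(track, int) and 1 <= track <= len(self.track_list):
            self._current_track_number = track
            self.track_list[track - 1].click()

    def pause(self):
        # ASSUMPTION: clicking the play button again pauses playback
        self.browser.find_element_by_class_name('playbutton').click()

    def play_next(self):
        """Skip to the next visible track, paging forward at the end."""
        if self._current_track_number < len(self.track_list):
            self.play(self._current_track_number + 1)
        else:
            self.more_tracks()
            self.play(1)
```

With a real driver, usage would be along the lines of BandLeader(Firefox(options=opts)) followed by calls such as play(3) or play_next().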

Pretty neat. You can import this into your Python environment and run bandcamp programmatically! But wait, didn’t you start this whole thing because you wanted to keep track of information about your listening history?

Collecting Structured Data

Your final task is to keep track of the songs that you actually listened to. How might you do this? What does it mean to actually listen to something anyway? If you are perusing the catalogue, stopping for a few seconds on each song, do each of those songs count? Probably not. You are going to allow some ‘exploration’ time to factor into your data collection.

Your goals are now to:

  1. Collect structured information about the currently playing track
  2. Keep a “database” of tracks
  3. Save and restore that “database” to and from disk

You decide to use a namedtuple to store the information that you track. Named tuples are good for representing bundles of attributes with no functionality tied to them, a bit like a database record.

from collections import namedtuple

TrackRec = namedtuple('TrackRec', [
    'title',
    'artist',
    'artist_url',
    'album',
    'album_url',
    'timestamp'  # When you played it
])
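
To make the "database record" comparison concrete, here is a short example of creating and reading one record. The field values are made up for illustration; the field names are the ones defined above.

```python
from collections import namedtuple
from time import ctime

TrackRec = namedtuple('TrackRec', [
    'title', 'artist', 'artist_url', 'album', 'album_url', 'timestamp'
])

# Made-up example values, for illustration only
rec = TrackRec(
    title='Example Song',
    artist='Example Artist',
    artist_url='https://example.com/artist',
    album='Example Album',
    album_url='https://example.com/album',
    timestamp=ctime(),   # Human-readable time of the play
)

print(rec.title)        # Fields are readable by name ...
print(rec[1])           # ... or by position, like any tuple
print(TrackRec._fields) # The field names, in order

# Records are immutable; _replace returns a modified copy instead
renamed = rec._replace(title='Another Song')
print(renamed.title, '/', rec.title)
```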

In order to collect this information, you add a method to the BandLeader class. Checking back in with the browser’s developer tools, you find the right HTML elements and attributes to select all the information you need. Also, you only want to get information about the currently playing track if music is actually playing at the time. Luckily, the page player adds a "playing" class to the play button whenever music is playing and removes it when the music stops. With these considerations in mind, you write a couple of methods:
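
A sketch of what two such methods could look like is shown below. The "playbutton" element and its "playing" class come from the description above. Everything in SELECTORS is a placeholder invented for this sketch, not taken from the bandcamp page, so swap in whatever the developer tools actually show. The methods are grouped in a hypothetical NowPlaying class here only so the example is self-contained; in practice they would live on BandLeader and use its self.browser.

```python
from collections import namedtuple
from time import ctime

TrackRec = namedtuple('TrackRec', [
    'title', 'artist', 'artist_url', 'album', 'album_url', 'timestamp'
])

# ASSUMPTION: placeholder CSS selectors, not the page's real ones
SELECTORS = {
    'title': '.track-title',
    'artist': '.track-artist a',
    'album': '.track-album a',
}


class NowPlaying:
    """Methods that only need a Selenium 3 style `self.browser`."""

    def is_playing(self):
        """Return True if the play button carries the 'playing' class."""
        playbutton = self.browser.find_element_by_class_name('playbutton')
        return 'playing' in playbutton.get_attribute('class').split()

    def currently_playing(self):
        """Return a TrackRec for the playing track, or None if idle."""
        if not self.is_playing():
            return None
        find = self.browser.find_element_by_css_selector
        title = find(SELECTORS['title'])
        artist = find(SELECTORS['artist'])
        album = find(SELECTORS['album'])
        return TrackRec(
            title=title.text,
            artist=artist.text,
            artist_url=artist.get_attribute('href'),
            album=album.text,
            album_url=album.get_attribute('href'),
            timestamp=ctime(),   # When the record was captured
        )
```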

For good measure, you also modify the play method to keep track of the currently playing track:

    def play(self, track=None):
        '''
        Play a track. If no track number is supplied, the presently selected track
        will play.
        '''
        if track is None:
            self.browser.find_element_by_class_name('playbutton').click()
        elif type(track) is int and track <= len(self.track_list) and track >= 1:
            self._current_track_number = track
            self.track_list[self._current_track_number - 1].click()

        # Give the player a moment to update before checking its state
        sleep(0.5)
        if self.is_playing():
            self._current_track_record = self.currently_playing()

Next, you’ve got to keep a database of some kind. Though it may not scale well in the long run, you can go far with a simple list. You add self.database = [] to BandLeader’s __init__ method. Because you want to allow for time to pass before entering a TrackRec object into the database, you decide to use Python’s threading tools to run a separate thread that maintains the database in the background.

You’ll supply a _maintain() method to BandLeader instances that will run in a separate thread. The new method will periodically check the value of self._current_track_record and add it to the database if it is new.

You will start the thread when the class is instantiated by adding some code to __init__.
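
The background loop can be sketched with the standard threading module alone. The example below is not the BandLeader code itself: HistoryKeeper is a hypothetical stand-alone class, and get_current_record stands in for reading self._current_track_record. The twenty-second default interval matches the period mentioned in this tutorial; the stop() method and the use of an Event are additions of this sketch.

```python
from threading import Event, Thread


class HistoryKeeper:
    """Polls a callable for the current track and records each new one."""

    def __init__(self, get_current_record, interval=20.0):
        self.get_current_record = get_current_record
        self.interval = interval
        self.database = []
        self._stop_event = Event()
        # daemon=True lets the interpreter exit without waiting for the loop
        self.thread = Thread(target=self._maintain, daemon=True)

    def _update_db(self):
        record = self.get_current_record()
        is_new = not self.database or self.database[-1] != record
        if record is not None and is_new:
            self.database.append(record)

    def _maintain(self):
        while True:
            self._update_db()
            # wait() is an interruptible sleep: it returns True as soon
            # as stop() is called, and False once the interval elapses
            if self._stop_event.wait(self.interval):
                break

    def start(self):
        self.thread.start()

    def stop(self):
        self._stop_event.set()
        self.thread.join()
```

In BandLeader itself, the equivalent of start() would be the code added to __init__, so the loop begins as soon as the class is instantiated.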

If you’ve never worked with multithreaded programming in Python, you should read up on it! For your present purpose, you can think of a thread as a loop that runs in the background of the main Python process (the one you interact with directly). Every twenty seconds, the loop checks a few things to see if the database needs to be updated, and if it does, appends a new record. Pretty cool.

The very last step is saving the database and restoring from saved states. Using the csv package you can ensure your database resides in a highly portable format, and remains usable even if you abandon your wonderful BandLeader class 😉

The __init__ method should be yet again altered, this time to accept a file path where you’d like to save the database. You’d like to load this database if it is available, and you’d like to save it periodically, whenever it is updated. The updates look like so:

# module-level imports needed by the methods below
import csv
from os.path import isfile

    def __init__(self, csvpath=None):
        self.database_path = csvpath
        self.database = []

        # load database from disk if a path was given and the file exists
        if self.database_path is not None and isfile(self.database_path):
            with open(self.database_path, newline='') as dbfile:
                dbreader = csv.reader(dbfile)
                next(dbreader)   # to ignore the header line
                self.database = [TrackRec._make(rec) for rec in dbreader]

        # .... the rest of the __init__ method is unchanged ....


    # a new save_db method
    def save_db(self):
        with open(self.database_path, 'w', newline='') as dbfile:
            dbwriter = csv.writer(dbfile)
            dbwriter.writerow(list(TrackRec._fields))
            for entry in self.database:
                dbwriter.writerow(list(entry))


    # finally add a call to save_db to your database maintenance method
    def _update_db(self):
        try:
            check = (self._current_track_record is not None
                     and (len(self.database) == 0
                          or self.database[-1] != self._current_track_record)
                     and self.is_playing())
            if check:
                self.database.append(self._current_track_record)
                self.save_db()

        except Exception as e:
            print('error while updating the db: {}'.format(e))
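
To see the save and restore steps working on their own, here is a small round trip through a temporary file. It is a stand-alone sketch: save_db and load_db are written as plain functions rather than methods, the records are made up, and the file name history.csv is arbitrary.

```python
import csv
import os
import tempfile
from collections import namedtuple

TrackRec = namedtuple('TrackRec', [
    'title', 'artist', 'artist_url', 'album', 'album_url', 'timestamp'
])


def save_db(path, database):
    """Write a header row followed by one row per record."""
    with open(path, 'w', newline='') as dbfile:
        writer = csv.writer(dbfile)
        writer.writerow(TrackRec._fields)
        writer.writerows(database)


def load_db(path):
    """Read the rows back into TrackRec objects, skipping the header."""
    with open(path, newline='') as dbfile:
        reader = csv.reader(dbfile)
        next(reader, None)   # Skip the header row if there is one
        return [TrackRec._make(row) for row in reader]


# Made-up records; the comma in the second title checks CSV quoting
records = [
    TrackRec('Song A', 'Artist One', 'https://example.com/one',
             'Album X', 'https://example.com/one/x',
             'Mon Jan  1 12:00:00 2018'),
    TrackRec('Song B, Part 2', 'Artist Two', 'https://example.com/two',
             'Album Y', 'https://example.com/two/y',
             'Mon Jan  1 12:05:00 2018'),
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'history.csv')
    save_db(path, records)
    restored = load_db(path)

print(restored == records)
```

Because every field is stored as a string, the restored records compare equal to the originals.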

And voilà! You can listen to music and keep a record of what you hear! Amazing.

Something interesting about the above is that using a namedtuple really begins to pay off. When converting to and from CSV format, you take advantage of the ordering of the columns in each CSV row to fill in the fields of the TrackRec objects. Likewise, you can create the header row of the CSV file by referencing the TrackRec._fields attribute. This is one of the reasons using a tuple ends up making sense for columnar data.

What’s Next and What Have You Learned?

From here you could do loads more! Here are a few quick ideas that would leverage the mild superpower that is Python + Selenium:

  • You could extend the BandLeader class to navigate to album pages and play the tracks you find there
  • You might decide to create playlists based on your favorite or most frequently heard tracks
  • Perhaps you want to add an autoplay feature
  • Maybe you’d like to query songs by date or title or artist and build playlists that way
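
As a starting point for the querying idea, the saved CSV can be read back and filtered with a few lines of standard-library Python. The sketch below assumes the column layout written by save_db; the helper names read_history, tracks_by_artist, and most_played are made up, and the sample rows are invented so the example runs without a real history file.

```python
import csv
import io
from collections import Counter, namedtuple

TrackRec = namedtuple('TrackRec', [
    'title', 'artist', 'artist_url', 'album', 'album_url', 'timestamp'
])


def read_history(dbfile):
    """Turn an open CSV file (or any iterable of lines) into TrackRecs."""
    reader = csv.reader(dbfile)
    next(reader, None)   # Skip the header row
    return [TrackRec._make(row) for row in reader]


def tracks_by_artist(history, artist):
    """Return the records whose artist matches, ignoring case."""
    wanted = artist.casefold()
    return [rec for rec in history if rec.artist.casefold() == wanted]


def most_played(history, n=3):
    """Return the n most frequent (artist, title) pairs with their counts."""
    counts = Counter((rec.artist, rec.title) for rec in history)
    return counts.most_common(n)


# Invented sample data standing in for a real history file
SAMPLE_CSV = """title,artist,artist_url,album,album_url,timestamp
Song A,Artist One,https://example.com/one,Album X,https://example.com/one/x,Mon Jan  1 12:00:00 2018
Song B,Artist Two,https://example.com/two,Album Y,https://example.com/two/y,Mon Jan  1 12:05:00 2018
Song A,Artist One,https://example.com/one,Album X,https://example.com/one/x,Tue Jan  2 09:30:00 2018
"""

history = read_history(io.StringIO(SAMPLE_CSV))
print(len(tracks_by_artist(history, 'artist one')))  # 2
print(most_played(history, n=1))  # [(('Artist One', 'Song A'), 2)]
```

With a real file, you would pass open(csvpath, newline='') to read_history instead of the StringIO object.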

You have learned that Python can do everything that a web browser can do, and a bit more. You could easily write scripts to control virtual browser instances that run in the cloud, or create bots that interact with real users or that mindlessly fill out forms! Go forth, and automate!

Translated from: https://www.pybloggers.com/2018/02/modern-web-automation-with-python-and-selenium/
