

By Rocky Kev


I wanted to learn Python for a long time, but I could never find a reason to. When my company needed a bunch of daily reports generated, I realized I had an opportunity to use Python to cut out all the repetition.

This article is the result of a few weeks learning Python, playing around with the various libraries, and automating some of my tasks at work.


Now I want to share what Python is capable of.


Rather than give boring office-related examples, let’s put them in a Game of Thrones frame!

In this post, I will be implementing web automation with the Selenium library, web scraping with the BeautifulSoup library, and generating reports with the csv module, which sort of simulates the whole Pandas/Data Science side of Python.

And like I mentioned before, all of the examples will be using Game of Thrones.


Some Quick Notes:

  1. You shouldn’t need any Python experience to do this. I’ll explain the code, and you should have enough to get going.

  2. I’m not a super-expert at Python. This is roughly a few weeks of Python experience. It was just enough to automate my work and create these examples.

  3. Python is WELL DOCUMENTED. There are so many free guides to learning Python, like Automate the Boring Stuff, Python for Beginners, and the amazing Dataquest.io data science track. There are even more links in the freeCodeCamp knowledge base.

Python, the best reptile-based computer language

For those unfamiliar with programming:

Python is a general purpose programming language which is dynamically typed, interpreted, and known for its easy readability with great design principles. (Via the freeCodeCamp.com guide)

According to Stack Overflow’s 2018 Developer Survey, Python is the language most developers want to learn (and also one of the fastest growing major programming languages).

Python powers sites like Reddit, Instagram and Dropbox. It’s also a really readable language that has a lot of powerful libraries.

Python is named after Monty Python, not the reptile. BUT, in spite of that, it’s still the most popular reptile-based programming language, beating Serpent, Gecko, Cobra and Raptor! (I had to research that joke!)

If you have some background in programming (say in JavaScript), here are some things about Python:

  • Python uses indentation instead of curly brackets.

  • Python uses class-based inheritance, so it’s more like C languages, whereas JavaScript can simulate classes.

  • Python is also strongly typed. No mix-matching. For example, if you add a string and an integer together, it’ll start complaining.
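To make the indentation and typing points concrete, here is a minimal sketch. The function name describe_house and its values are made up for illustration. The indented lines form the body of the function and of the if/else branches, with no curly brackets, and adding a string to an integer raises a TypeError until one of them is converted.

```python
def describe_house(name, members):
    # The indented lines below belong to the function body.
    if members > 5:
        size = "big"
    else:
        size = "small"
    # Strong typing: str + int is an error, so convert the number first.
    return name + " is a " + size + " house with " + str(members) + " members"


print(describe_house("Stark", 6))

try:
    "Stark" + 6  # mixing a string and an integer
except TypeError as error:
    print("Python complains:", error)
```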

Let’s jump right into it!

I’ll be breaking this into 3 pieces.


  • Game of Thrones and Python #1: Web automation

  • Game of Thrones and Python #2: Web Scraping

  • Game of Thrones and Python #3: Generating reports with the CSV Module

Game of Thrones and Python 1: Web Automation

One of the coolest things you can do with Python is web automation.


For example — you can write a Python script that:


  1. Opens up a browser

  2. Automatically visits a specific website

  3. Logs you into that site

  4. Goes to another part of that website

  5. Finds the most recent blog post.

  6. Opens that blog post.

  7. Submits a comment that says, “Great writing! High five!”

  8. And finally logs you out of that website


It might not seem so hard to do. That takes what... 20 seconds?

But if you had to do that over and over again, it would drive you insane.


For example — what if you had a staging site that’s still in development with 100 blog posts, and you wanted to post a comment on every single page to test its functionality?


That’s 100 blog posts * 20 seconds = roughly 33 minutes


And what if there are MULTIPLE testing phases, and you had to repeat the test six more times?


Other use cases for web automation include:


  • You might want to automate account creations on your site.

  • You might want to run a bot from start to finish in your online course.

  • You might want to push 100 bots to submit a form on your site with a single script.


What we will be doing

For this part, we’ll be automating the process of logging into all of our favorite Game of Thrones fan sites.

Don’t you hate it when you have to waste time logging into westeros.org, the /r/freefolk subreddit, winteriscoming.net and all your other fan sites?

With this template, you can automatically log into various websites!


Now, for Game of Thrones!


The Code

You will need to install Python 3, Selenium, and the Firefox webdrivers to get started. If you want to follow along, check out my tutorial on How to automate form submissions with Python.

This one might get complicated. So I highly recommend sitting back and enjoying the ride.

## Game of Thrones easy login script
##
## Description: This code logs into all of your fan sites automatically

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

driver = webdriver.Firefox()
driver.implicitly_wait(5)
    ## implicitly_wait makes the bot wait up to 5 seconds for an element
    ## to show up, so the site content can load

# Define the functions

def login_to_westeros (username, userpass):
    ## Open the login page
    driver.get('https://asoiaf.westeros.org/index.php?/login/')

    ## Log the details
    print(username + " is logging into westeros.")

    ## Find the fields and log into the account.
    textfield_username = driver.find_element_by_id('auth')
    textfield_username.clear()
    textfield_username.send_keys(username)

    textfield_email = driver.find_element_by_id('password')
    textfield_email.clear()
    textfield_email.send_keys(userpass)

    submit_button = driver.find_element_by_id('elSignIn_submit')
    submit_button.click()

    ## Log the details
    print(username + " is logged in! -> westeros")

def login_to_reddit_freefolk (username, userpass):
    ## Open the login page
    driver.get('https://www.reddit.com/login/?dest=https%3A%2F%2Fwww.reddit.com%2Fr%2Ffreefolk')

    ## Log the details
    print(username + " is logging into /r/freefolk.")

    ## Find the fields and log into the account.
    textfield_username = driver.find_element_by_id('loginUsername')
    textfield_username.clear()
    textfield_username.send_keys(username)

    textfield_email = driver.find_element_by_id('loginPassword')
    textfield_email.clear()
    textfield_email.send_keys(userpass)

    submit_button = driver.find_element_by_class_name('AnimatedForm__submitButton')
    submit_button.click()

    ## Log the details
    print(username + " is logged in! -> /r/freefolk.")

## Define the user and email combo.
## PASSWORDHERE is a placeholder: swap in your own password as a string.
login_to_westeros("gameofthronesfan86", PASSWORDHERE)
time.sleep(2)
driver.execute_script("window.open('');")
Window_List = driver.window_handles
driver.switch_to_window(Window_List[-1])

login_to_reddit_freefolk("MyManMance", PASSWORDHERE)
time.sleep(2)
driver.execute_script("window.open('');")
Window_List = driver.window_handles
driver.switch_to_window(Window_List[-1])

## wait for 2 seconds
time.sleep(2)
print("task complete")

Breaking the code down

To start, I’m importing the Selenium library to help with the heavy lifting.

I also imported the time library, so after each action, it will wait x seconds. Adding a wait allows the page to load.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

What is Selenium?

Selenium is the Python library we use for web automation. Selenium has developed an API so third-party authors can develop webdrivers that handle the communication with browsers. That way, the Selenium team can focus on their codebase, while another team can focus on the middleware.

For example:

driver = webdriver.Firefox()

In the code above, I’m asking Selenium to do things like “Set Firefox up as the browser of choice”, and “pass this link to Firefox”, and finally “Close Firefox”. I used the geckodriver to do that.

Logging into sites

To make it easier to read, I wrote a separate function to log into each site, to show the pattern that we are making.

def login_to_westeros (username, userpass):
    ## Open the login page
    driver.get('https://asoiaf.westeros.org/index.php?/login/')

    ## Log the details
    print(username + " is logging into westeros.")

    ## Find the fields and log into the account.
    textfield_username = driver.find_element_by_id('auth')
    textfield_username.clear()
    textfield_username.send_keys(username)

    textfield_email = driver.find_element_by_id('password')
    textfield_email.clear()
    textfield_email.send_keys(userpass)

    submit_button = driver.find_element_by_id('elSignIn_submit')
    submit_button.click()

    ## Log the details
    print(username + " is logged in! -> westeros")

If we break that down even more — each function has the following elements.


I’m telling Python to:


  1. Visit a specific page.

driver.get('https://asoiaf.westeros.org/index.php?/login/')

  2. Look for the login box

    • Clear the text if there is any

    • Submit my variable

textfield_username = driver.find_element_by_id('auth')
textfield_username.clear()
textfield_username.send_keys(username)

  3. Look for the password box

    • Clear the text if there is any

    • Submit my variable

textfield_email = driver.find_element_by_id('password')
textfield_email.clear()
textfield_email.send_keys(userpass)

  4. Look for the submit button, and click it

submit_button = driver.find_element_by_id('elSignIn_submit')
submit_button.click()

As a note: each website has different ways to find the username/password and submit buttons. You’ll have to do a bit of searching for that.
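Since every login function repeats the same four steps, the pattern could be pulled into one helper. This is only a sketch under assumptions: LoginForm and log_in are names I made up, and it uses the find_element(by, value) call from newer Selenium releases (recent Selenium 4 versions removed the find_element_by_* helpers used above), where the plain string "id" is the value of By.ID.

```python
from dataclasses import dataclass


@dataclass
class LoginForm:
    """Where to find the login form on one site (hypothetical helper)."""
    url: str
    username: tuple  # (strategy, value), e.g. ("id", "auth")
    password: tuple
    submit: tuple


def log_in(driver, form, username, userpass):
    """Run the four steps: visit, fill username, fill password, submit."""
    driver.get(form.url)
    for locator, text in ((form.username, username), (form.password, userpass)):
        field = driver.find_element(*locator)
        field.clear()
        field.send_keys(text)
    driver.find_element(*form.submit).click()


# Locators copied from the westeros function above.
WESTEROS = LoginForm(
    url="https://asoiaf.westeros.org/index.php?/login/",
    username=("id", "auth"),
    password=("id", "password"),
    submit=("id", "elSignIn_submit"),
)
```

With a real browser, the call would look like log_in(driver, WESTEROS, "gameofthronesfan86", PASSWORDHERE). A Reddit entry would describe its submit button as ("class name", "AnimatedForm__submitButton"), since that button is found by class rather than by ID.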

How to find the login box and password box for any website

The Selenium Library has a bunch of handy ways to find elements on a webpage. Here are some of the ones I like to use.


  • find_element_by_id

  • find_element_by_name

  • find_element_by_xpath

  • find_element_by_class_name


For the whole list, visit the Selenium Python documentation for locating elements.

To use asoiaf.westeros.org as an example, when I inspect the elements, they all have IDs… which is GREAT! That makes my life easier.

Running the code

Enjoying the ride

With web automation, you’re playing a game of ‘how can I get Selenium to find the element’. Once you find it, you can then manipulate it.

Game of Thrones and Python 2: Web Scraping

In this piece, we will be exploring web scraping.


The big picture process is:


  1. We’ll have Python visit a webpage.

  2. We’ll then parse that webpage with BeautifulSoup.

  3. You then set up the code to grab specific data.


For example: You might want to grab all the h1 tags. Or all the links. Or in our case, all of the images on a page.


Some other use cases for Web Scraping:


  • You can grab all the links on a web page.

  • You can grab all the post titles within a forum

  • You can use it to grab the daily NASDAQ Value without ever visiting the site.

  • You can use it to download all of the links within a website that doesn’t have a ‘Download All’.


In short, web scraping allows you to automatically grab web content through Python.


Overall, a very simple process. Except when it isn’t!

The challenge of Web Scraping for images

My goal was to take my knowledge of scraping web content and use it to grab images.

While web scraping for links, body text and headers is very straightforward, web scraping for images is significantly more complex. Let me explain.

As a web developer, I know that hosting MULTIPLE full-sized images on a single webpage will slow the whole page down. Instead, developers use thumbnails and only load the full-sized image when the thumbnail is clicked on.

For example: Imagine if we had twenty 1 megabyte images on our web page. Upon landing, a visitor would have to download 20 megabytes worth of images! The more common method is to make twenty 10kb thumbnail images. Now, your payload is only 200kb, or about 1/100 of the size!
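The arithmetic behind that claim can be checked in a few lines. This is just a worked version of the numbers above, treating 1 megabyte as roughly 1,000 kb; the variable names are my own.

```python
FULL_SIZE_KB = 1000   # one full-sized image, roughly 1 megabyte
THUMBNAIL_KB = 10     # one thumbnail
IMAGE_COUNT = 20

full_payload = IMAGE_COUNT * FULL_SIZE_KB        # 20,000 kb, about 20 megabytes
thumbnail_payload = IMAGE_COUNT * THUMBNAIL_KB   # 200 kb
ratio = thumbnail_payload / full_payload         # 0.01, i.e. 1/100

print(full_payload, thumbnail_payload, ratio)
```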

So what does this have to do with web scraping images and this tutorial?


It means that it’s pretty difficult to write a generic block of code that works for every website. Websites implement all sorts of different ways to turn a thumbnail into a full-size image, which makes it a challenge to create a ‘one-size-fits-all’ model.

I’ll still teach what I learned. You’ll still gain a lot of skills from it. Just be aware that trying that code on other sites will require major modifications. Hurray for Zone of Proximal Development.

Python and Game of Thrones

The goal of this tutorial is to gather images of our favorite actors! That will allow us to do weird things like make a Teenage Crush Actor Collage that we can hang in our bedroom.

In order to gather those images, we’ll be using Python to do some web scraping. We’ll be using the BeautifulSoup library to visit a web page and grab all the image tags from it.

NOTE: Many websites’ terms and conditions prohibit any web scraping of their data. Some develop APIs to allow you to tap into their data. Others do not. Additionally, try to be mindful that you are taking up their resources. So look to doing one request at a time rather than opening lots of connections in parallel and grinding their site to a halt.

The Code

# Import the libraries needed
import requests
import time
from bs4 import BeautifulSoup

# The URL to scrape
url = 'https://www.popsugar.com/celebrity/Kit-Harington-Rose-Leslie-Cutest-Pictures-42389549?stream_view=1#photo-42389576'
#url = 'https://www.bing.com/images/search?q=jon+snow&FORM=HDRSC2'

# Connecting
response = requests.get(url)

# Grab the HTML and parse it using BeautifulSoup
soup = BeautifulSoup (response.text, 'html.parser')

#A loop code to run through each link, and download it
for i in range(len(soup.findAll('img'))):
    tag = soup.findAll('img')[i]
    link = tag['src']

    #skip it if it doesn't start with http
    if "http" in link:
        print("grabbed url: " + link)
        filename = str(i) + '.jpg'
        print("Download: " + filename)
        r = requests.get(link)
        open(filename, 'wb').write(r.content)
    else:
        print("grabbed url: " + link)
        print("skip")

    time.sleep(1)

Having Python Visit the Webpage

We start by importing the libraries needed, and then storing the webpage link in a variable.

  • The Requests library is used to do all sorts of HTTP requests.

  • The Time library is used to put a 1 second wait after each request. If we didn’t include that, the whole loop would fire off as fast as possible, which isn’t very friendly to the sites we are scraping from.

  • The BeautifulSoup library is used to make exploring the DOM tree easier.

Parse that webpage with BeautifulSoup

Next, we push the HTML we got back from that URL into BeautifulSoup.

Finding the content

Finally, we use a loop to grab the content.


It starts with a FOR loop. BeautifulSoup does some cool filtering, where my code asks BeautifulSoup to find all the ‘img’ tags, and store them in a temporary array. Then, the len function asks for the length of the array.

#A loop code to run through each link, and download it
for i in range(len(soup.findAll('img'))):

So in human words, if the array held 51 items, the code will look like for i in range(51): which counts from 0 up to 50.
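To see what range(len(...)) produces, here is a small sketch where a plain list stands in for the result of soup.findAll('img'). The file names are made up. It also shows enumerate, an alternative I’m suggesting here that hands back the index and the item together.

```python
# A plain list stands in for the result of soup.findAll('img').
images = ["jon.jpg", "arya.jpg", "tyrion.jpg"]

# range(len(...)) counts from 0 up to, but not including, the length.
indexes = list(range(len(images)))
print(indexes)

# enumerate gives the index and the item together, so the list
# only has to be built once instead of once per loop.
for i, image in enumerate(images):
    print(i, image)
```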

Next, we’ll return back to our soup object, and do the real filtering.


tag = soup.findAll('img')[i]
link = tag['src']

Remember that we are in a For loop, so [i] represents a number.

So we are telling BeautifulSoup to findAll ‘img’ tags, store them in a temp array, and reference a specific index number based on where we are in the loop.

So instead of calling an array directly like allOfTheImages[10], we’re using soup.findAll('img')[10], and then passing it to the tag variable.

The data in the tag variable will look something like:


<img src="smiley.gif" alt="Smiley face" height="42" width="42">

Which is why the next step is pulling out the ‘src’.

Downloading the Content

Finally, it’s the fun part!

We go to the final part of the loop: downloading the content.

There are a few odd design elements here that I want to point out.

  1. The IF statement is actually a hack I made for other sites I was testing. There were times when I was grabbing images that were part of the root site (like the favicon or the social media icons) that I didn’t want. So using the IF statement allowed me to ignore them.

  2. I also forced all the images to be .jpg. I could have written another chunk of IF statements to check the datatype, and then append the correct filetype. But that would have added a significant chunk of code and made this tutorial longer.

  3. I also added all the print commands. If you wanted to grab all the links of a webpage, or specific content, you can stop right here! You did it!
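The first two shortcuts above could be loosened without much code. The sketch below is one possible approach, not what the script does: absolute_url, filename_for and the EXTENSIONS table are names I invented, and only urljoin comes from Python’s standard library.

```python
from urllib.parse import urljoin

# Hypothetical lookup table: extend it with the types you run into.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}


def absolute_url(page_url, src):
    """Turn a relative src such as '/img/a.png' into a full URL."""
    return urljoin(page_url, src)


def filename_for(index, content_type):
    """Name the file after its position, with an extension that fits."""
    # Drop any parameters, e.g. 'image/png; charset=binary'.
    mime = content_type.split(";")[0].strip().lower()
    return str(index) + EXTENSIONS.get(mime, ".jpg")


page = "https://example.com/gallery/page.html"
print(absolute_url(page, "/img/a.png"))
print(filename_for(3, "image/png"))
```

Inside the loop, the content type would come from the response, for example filename_for(i, r.headers.get("Content-Type", "")). Unknown types still fall back to .jpg, which matches the original behavior.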

I also want to point out the requests.get(link) and the open(filename, 'wb').write(r.content) code.

r = requests.get(link)
open(filename, 'wb').write(r.content)

How this works:

  1. Requests gets the link.

  2. Open is a built-in Python function that opens or creates a file, gives it write and binary mode access (since images are just 1s and 0s), and writes the content of the link into that file.

    #skip it if it doesn't start with http
    if "http" in link:
        print("grabbed url: " + link)
        filename = str(i) + '.jpg'
        print("Download: " + filename)
        r = requests.get(link)
        open(filename, 'wb').write(r.content)
    else:
        print("grabbed url: " + link)
        print("skip")

    time.sleep(1)

Web Scraping has a lot of useful features.


This code won’t work right out of the box for most sites with images, but it can serve as a foundation for how to grab images on different sites.

Game of Thrones and Python 3: Generating reports and data

Gathering data is easy. Interpreting the data is difficult. Which is why there’s a huge surge of demand for data scientists who can make sense of this data. And data scientists use languages like R and Python to interpret it.

In this tutorial, we’ll be using the csv module, which will be enough to generate a report. If we were working with a huge dataset, one that’s 50,000 rows or bigger, we’d have to tap into the Pandas library.

What we will be doing is downloading a CSV, having Python interpret the data, sending a query based on what kind of question we want answered, and then having the answer printed out to us.

Python vs. basic spreadsheet functions

You might be wondering:

“Why should I use Python when I can easily just use spreadsheet functions like =SUM or =COUNT, or filter out the rows I don’t need manually?”

As with all the other automation tricks in Parts 1 and 2, you can definitely do this manually.


But imagine if you had to generate a new report every day.


For example: I build online courses. And we want a daily report of every student’s progress. How many students started today? How many students are active this week? How many students made it to Module 2? How many students submitted their Module 3 homework? How many students clicked on the completion button on mobile devices?

I can either spend 15 minutes sorting through the data to generate a report for my team, or write Python code that does it daily.

Other use cases for using code instead of default spreadsheet functions:


  • You might be working with a huge set of data (huge like 50,000 rows and 20 columns)

  • You require multiple slices of filters and segmentation to get your answers.

  • You need to run the same query on a dataset that changes repeatedly


Generating Reports with Game of Thrones

Every year, Winteriscoming.net, a Game of Thrones news site, holds its annual March Madness. Visitors vote for their favorite characters, and winners move up the bracket and compete against another character. After 6 rounds of votes, a winner is declared.

Since 2019’s votes are still happening, I grabbed all 6 rounds of 2018’s data and compiled them into a CSV file. You can see what the poll looked like on winteriscoming.net.

I’ve also added some additional background data (like where they are from), to make the reporting a bit more interesting.


Asking Questions

In order to generate a report, we have to ask some questions.

By definition: A report’s primary duty is to ANSWER questions.

So let’s make them up right now.

Based on this dataset… here are some questions.

  1. Who won the popularity vote?

  2. Who won based on averages?

  3. Who is the most popular non-Westeros person? (characters not born in Westeros)

Before answering questions, let’s set up our Python code

To make it easier, I wrote all the code, including revisions, in my new favorite online IDE, Repl.it.

import csv

# Import the data
f_csv = open('winter-is-coming-2018.csv')
headers = next(f_csv)
f_reader = csv.reader(f_csv)
file_data = list(f_reader)

# Make all blank cells into zeroes
# https://stackoverflow.com/questions/2862709/replacing-empty-csv-column-values-with-a-zero
for row in file_data:
  for i, x in enumerate(row):
    if len(x)< 1:
      x = row[i] = 0

Here’s my process with the code.


  1. I imported the csv module.

  2. I imported the csv file, and turned it into a list type called file_data.

    • The way Python reads your file is by first passing the data to an object.

    • I removed the header, since it’ll fudge the data.

    • I then pass the object to a reader, and finally a list.

    • Note: I just realized I did it via the Python 2 way. There’s a cleaner way to do it in Python 3. Oh well. Still works.

  3. In order to sum up any totals, I made all blank cells become 0.

    • This was one of those moments where I found a Stack Overflow solution that was better than my original version.

With this set up, we can now loop through the list of data, and answer questions!
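For reference, the loading steps could be written in a more Python 3 style along these lines. This is my own sketch rather than the cleaner way the note above had in mind: load_rows is a made-up name, the with block closes the file automatically, and blank cells become the string "0" so that int() still works on them later.

```python
import csv


def load_rows(path):
    """Return (headers, rows) with blank cells replaced by '0'."""
    # newline='' is what the csv module documentation recommends.
    with open(path, newline="") as f_csv:
        reader = csv.reader(f_csv)
        headers = next(reader)   # pull the header row off the top
        rows = [[cell if cell else "0" for cell in row] for row in reader]
    return headers, rows
```

Usage would be headers, file_data = load_rows('winter-is-coming-2018.csv'), after which file_data can be looped over exactly as in the questions that follow.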


Question 1: Who won the popularity vote?

The Spreadsheet method:

The easiest way would be to add up each cell, using a formula. Using row 2 as an example, in a blank column, you can write the formula:

=SUM(E2:J2)


You can then drag that formula for the other rows.


Then, sort it by total. And you have a winner!


## Include the code from above

# Push the data to a dictionary
total_score = {}

# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) +
           int(row[5]) +
           int(row[6]) +
           int(row[7]) +
           int(row[8]) +
           int(row[9]) )
  total_score[row[0]] = total

# Dictionaries aren't sortable by default, we'll have to borrow from these two classes.
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value
from operator import itemgetter
from collections import OrderedDict

sorted_score = OrderedDict(sorted(total_score.items(), key=itemgetter(1) ,reverse=True))

# We get the name of the winner and their score
winner = list(sorted_score)[0] #jon snow
winner_score = sorted_score[winner] #score

print(winner + " with " + str(winner_score))

## RESULT => Jon Snow with 12959

The steps I took are:


  1. The dataset is just one big list. By using a for loop, you can then access each row.

  2. Within that for loop, I added up each cell. (emulating the whole “=sum(E:J)” formula)

  3. Since dictionaries aren’t exactly sortable, I had to import two classes to help me sort the dictionary by their values, from high to low.

  4. Finally, I printed the winner and the winner’s score as text.
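The same four steps can also be sketched without the two extra imports, since sorted() accepts a dictionary’s items directly. The rows below are made up and only mimic the shape of file_data (name in column 0, round scores in columns 4 to 9), and total_scores is a hypothetical helper name.

```python
# Made-up rows in the same shape as file_data: name in column 0,
# rounds played in column 2, the six round scores in columns 4 to 9.
sample_data = [
    ["Jon Snow", "x", "3", "westeros", "10", "20", "30", "0", "0", "0"],
    ["Missandei", "x", "2", "other", "15", "25", "0", "0", "0", "0"],
    ["Davos Seaworth", "x", "3", "westeros", "30", "30", "15", "0", "0", "0"],
]


def total_scores(rows):
    """Map each character's name to the sum of columns 4 through 9."""
    return {row[0]: sum(int(cell) for cell in row[4:10]) for row in rows}


scores = total_scores(sample_data)

# sorted() takes the dictionary's items directly; no extra imports needed.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
winner, winner_score = ranking[0]
print(winner + " with " + str(winner_score))
```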


To help understand that loop, I drew a diagram.


Overall, this process is a bit longer compared to the spreadsheet method. But wait, it gets easier!

Question 2: Who won based on averages?

You might have noticed that whoever proceeded farther in the rankings would obviously get more votes.


For example: If Jon Snow got 500 points in Round One and 1000 points in Round Two, he already beats The Mountain, who only had 1000 points and never made it past his bracket.

So the next best thing is to sum the total, and then divide it by how many rounds they participated in.

The Spreadsheet Method:

This is easy. Column B holds how many rounds they participated in. You would divide the sum by the rounds, and presto!

# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) +
           int(row[5]) +
           int(row[6]) +
           int(row[7]) +
           int(row[8]) +
           int(row[9]) )

  # NEW LINE - divide by how many rounds
  new_total = total / int(row[2])
  total_score[row[0]] = new_total

# RESULT => Davos Seaworth with 2247.6666666666665

Noticed the change? I just added one additional line.

That’s all it took to answer this question! NEXT!

Question 3: Who is the most popular non-Westeros person?

With the first two examples, it’s pretty easy to calculate the total with the default spreadsheet functions. For this question, things are a bit more complicated.

The Spreadsheet Method:


  1. Assuming you already have the sum

  2. You now have to filter it based on if they are Westeros/Other

  3. Then sort by the sum

## OLD CODE (from Question 2)
# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) +
           int(row[5]) +
           int(row[6]) +
           int(row[7]) +
           int(row[8]) +
           int(row[9]) )

  # NEW LINE - divide by how many rounds
  new_total = total / int(row[2])
  total_score[row[0]] = new_total

## NEW CODE
# Pass each character and their final score into total_score dictionary
for row in file_data:

  # Add IF-THEN statement
  if (row[3] == 'other'):
    total = (int(row[4]) +
             int(row[5]) +
             int(row[6]) +
             int(row[7]) +
             int(row[8]) +
             int(row[9]) )
  else:
    total = 0
  total_score[row[0]] = total

# RESULT => Missandei with 4811

In Question 2, I added one line of code to answer that new question.


In Question 3, I added an IF-ELSE statement. If they are non-Westeros, then count their score. Else, give them a score of 0.
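Because each question only changes a line or two, the three variations could be folded into one small function. This is a sketch on made-up rows shaped like file_data. The name top_character and its keep and average parameters are my own invention, and unlike the code above it leaves filtered-out characters out entirely instead of scoring them 0.

```python
# Made-up rows shaped like file_data: name (0), rounds played (2),
# origin (3), and the six round scores (4 to 9).
sample_data = [
    ["Jon Snow", "x", "3", "westeros", "10", "20", "30", "0", "0", "0"],
    ["Missandei", "x", "2", "other", "15", "35", "0", "0", "0", "0"],
    ["Daario Naharis", "x", "1", "other", "12", "0", "0", "0", "0", "0"],
]


def top_character(rows, keep=lambda row: True, average=False):
    """Return (name, score) for the best-scoring row that passes keep()."""
    scores = {}
    for row in rows:
        if not keep(row):
            continue  # leave filtered-out characters out entirely
        total = sum(int(cell) for cell in row[4:10])
        scores[row[0]] = total / int(row[2]) if average else total
    return max(scores.items(), key=lambda item: item[1])


print(top_character(sample_data))                                  # Question 1
print(top_character(sample_data, average=True))                    # Question 2
print(top_character(sample_data, keep=lambda r: r[3] == "other"))  # Question 3
```

A new question then becomes a new keep function, for example keep=lambda r: r[0].startswith("L") for characters whose names start with L.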

Reviewing this:

While the spreadsheet method doesn’t seem like a lot of steps, it sure is a lot more clicks. The Python method took a lot longer to set up, but each additional query involved changing a few lines of code.

Imagine if the stakeholder asked a dozen more questions.


For example:


  1. How many points did characters whose names start with L have?

  2. Or how many points did everyone who lived in Westeros get in round 3?

  3. Or what if it was 640 GoT characters instead of just 64?

But also imagine this: you’re given a dataset that’s roughly 50 megabytes (our Game of Thrones csv file was barely 50 kilobytes, roughly 1/1000 the size). A file that large would probably take Excel a few minutes to load. Additionally, it’s not unusual for Data Scientists to use datasets that are in the 10 gigabyte range!

Overall, as the data set scales, it’ll take longer and longer to process. And that’s where the power of Python comes in.

Conclusion

In Part 1, I covered web automation with the Selenium library. In Part 2, I covered web scraping with the BeautifulSoup library. And in Part 3, I covered generating reports with the csv module.

While I covered them in pieces, there’s also a synergy between them. Imagine if you had a project where you had to figure out who dies next in Game of Thrones based on the comments by the actors on the show. You might start with web scraping all of the actors’ names off of IMDB. You might use Selenium to automatically log into various social media platforms and search for their social media name. You might then compile all the data, and interpret it with the csv module or, if it’s really huge, with the Pandas library.

We didn’t even get into Machine Learning, AI, Web Development, or the dozens of other things people use Python for.


Let this be a stepping stone into your Python journey!


Absolutely HUGE shout out to mJordan for proofing my work at the Puppies and Portfolios meetup. She is one of the most talented CSS developers I have ever met.

If you like nerding out about course building, online education and the future of education, reach out to me on my Linkedin or Twitter.

Translated from: https://www.freecodecamp.org/news/how-i-used-python-to-analyze-game-of-thrones-503a96028ce6/


