google python class exercises

I've been learning Python for a while and recently wanted some programs to practice on. I remembered the Google Python Class I used while learning, whose exercises I never actually did. I wanted to dig them up and work through them, but after a long search they didn't seem to be available online, so I renewed a month of VPN access and am now sharing the exercises here.

Basic Python Exercises

There are 3 exercises that go with the first sections of Google's Python class. They are located in the "basic" directory within the google-python-exercises directory. Download the google-python-exercises.zip if you have not already (see the Set-Up page for details).

·      string1.py -- complete the string functions in string1.py, based on the material in the Python Strings section (additional exercises available in string2.py)

·      list1.py -- complete the list functions in list1.py, based on the material in the Python Lists and Python Sorting sections (additional exercises available in list2.py)

·      wordcount.py -- this larger, summary exercise in wordcount.py combines all the basic Python material in the above sections plus Python Dicts and Files (a second exercise is available in mimic.py)

With all the exercises, you can take a look at our solution code inside the solution subdirectory.

 

Baby Names Python Exercise

The Social Security administration has this neat data by year of what names are most popular for babies born that year in the USA (see social security baby names).

The files for this exercise are in the "babynames" directory inside google-python-exercises (download the google-python-exercises.zip if you have not already, see Set Up for details). Add your code in babynames.py. The files baby1990.html baby1992.html ... contain raw html, similar to what you get visiting the above social security site. Take a look at the html and think about how you might scrape the data out of it.

Part A

In the babynames.py file, implement the extract_names(filename) function which takes the filename of a baby1990.html file and returns the data from the file as a single list -- the year string at the start of the list followed by the name-rank strings in alphabetical order. ['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895', ...]. Modify main() so it calls your extract_names() function and prints what it returns (main already has the code for the command line argument parsing). If you get stuck working out the regular expressions for the year and each name, solution regular expression patterns are shown at the end of this document. Note that for parsing webpages in general, regular expressions don't do a good job, but these webpages have a simple and consistent format.

Rather than treat the boy and girl names separately, we'll just lump them all together. In some years, a name appears more than once in the html, but we'll just use one number per name. Optional: make the algorithm smart about this case and choose whichever number is smaller.

Build the program as a series of small milestones, getting each step to run/print something before trying the next step. This is the pattern used by experienced programmers -- build a series of incremental milestones, each with some output to check, rather than building the whole program in one huge step.

Printing the data you have at the end of one milestone helps you think about how to re-structure that data for the next milestone. Python is well suited to this style of incremental development. For example, first get it to the point where it extracts and prints the year and calls sys.exit(0). Here are some suggested milestones:

·      Extract all the text from the file and print it

·      Find and extract the year and print it

·      Extract the names and rank numbers and print them

·      Get the names data into a dict and print it

·      Build the [year, 'name rank', ... ] list and print it

·      Fix main() to use the extract_names list
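As a hedged sketch of where those milestones can end up (this is one possible shape, not the official solution from the solution subdirectory; the regex patterns are the ones suggested at the end of this document):

```python
import re
import sys

def extract_names(filename):
    """Return [year, 'name rank', ...] extracted from a babyXXXX.html file."""
    with open(filename) as f:
        text = f.read()
    # Milestone: find and extract the year.
    match = re.search(r'Popularity\sin\s(\d\d\d\d)', text)
    if not match:
        sys.stderr.write('Could not find the year!\n')
        sys.exit(1)
    year = match.group(1)
    # Milestone: each table row holds (rank, boy name, girl name).
    names = {}
    for rank, boy, girl in re.findall(
            r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text):
        for name in (boy, girl):
            # Optional smartness: keep the smaller rank if a name repeats.
            if name not in names or int(rank) < int(names[name]):
                names[name] = rank
    # Milestone: build the [year, 'name rank', ...] list.
    return [year] + sorted('%s %s' % (name, rank)
                           for name, rank in names.items())
```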

Earlier we have had functions just print to standard out. It's more re-usable to have the function *return* the extracted data, so then the caller has the choice to print it or do something else with it. (You can still print directly from inside your functions for your little experiments during development.)

Have main() call extract_names() for each command line arg and print a text summary. To make the list into a reasonable looking summary text, here's a clever use of join: text = '\n'.join(mylist) + '\n'

The summary text should look like this for each file:

2006
Aaliyah 91
Aaron 57
Abagail 895
Abbey 695
Abbie 650
...

Part B

Suppose instead of printing the text to standard out, we want to write files containing the text. If the flag --summaryfile is present, do the following: for each input file 'foo.html', instead of printing to standard output, write a new file 'foo.html.summary' that contains the summary text for that file.
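The file-writing half of that flag can be a tiny helper; a minimal sketch (the helper name is my own, not from the starter code):

```python
def write_summary_file(summary_text, html_filename):
    """Write the summary text to 'foo.html.summary' next to 'foo.html'."""
    with open(html_filename + '.summary', 'w') as f:
        f.write(summary_text)
```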

Once the --summaryfile feature is working, run the program on all the files using * like this: "./babynames.py --summaryfile baby*.html". This generates all the summaries in one step. (The standard behavior of the shell is that it expands the "baby*.html" pattern into the list of matching filenames, and then the shell runs babynames.py, passing in all those filenames in the sys.argv list.)

With the data organized into summary files, you can see patterns over time with shell commands, like this:

$ grep 'Trinity ' *.summary
$ grep 'Nick ' *.summary
$ grep 'Miguel ' *.summary
$ grep 'Emily ' *.summary

Regular expression hints -- year: r'Popularity\sin\s(\d\d\d\d)' names: r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>'

Copy Special Python Exercise

The Copy Special exercise goes with the file-system and external commands material in the Python Utilities section. This exercise is in the "copyspecial" directory within google-python-exercises (download google-python-exercises.zip if you have not already, see Set Up for details). Add your code in copyspecial.py.

The copyspecial.py program takes one or more directories as its arguments. We'll say that a "special" file is one where the name contains the pattern __w__ somewhere, where the w is one or more word chars. The provided main() includes code to parse the command line arguments, but the rest is up to you. Write functions to implement the features below and modify main() to call your functions.

Suggested functions for your solution (details below):

·      get_special_paths(dir) -- returns a list of the absolute paths of the special files in the given directory

·      copy_to(paths, dir) -- given a list of paths, copies those files into the given directory

·      zip_to(paths, zippath) -- given a list of paths, zip those files up into the given zipfile
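The first of these can be sketched with os.listdir and a regex; a minimal version assuming the __w__ pattern described above (one possible shape, not the official solution):

```python
import os
import re

def get_special_paths(dirname):
    """Return absolute paths of files in dirname whose name contains __w__."""
    paths = []
    for name in os.listdir(dirname):
        # "special" means the name contains __ + word chars + __
        if re.search(r'__(\w+)__', name):
            paths.append(os.path.abspath(os.path.join(dirname, name)))
    return sorted(paths)
```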

Part A (manipulating file paths)

Gather a list of the absolute paths of the special files in all the directories. In the simplest case, just print that list (here the "." after the command is a single argument indicating the current directory). Print one absolute path per line.

$ ./copyspecial.py .
/Users/nparlante/pycourse/day2/xyz__hello__.txt
/Users/nparlante/pycourse/day2/zz__something__.jpg

We'll assume that names are not repeated across the directories (optional: check that assumption and error out if it's violated).

Part B (file copying)

If the "--todir dir" option is present at the start of the command line, do not print anything and instead copy the files to the given directory, creating it if necessary. Use the python module "shutil" for file copying.

$ ./copyspecial.py --todir /tmp/fooby .
$ ls /tmp/fooby
xyz__hello__.txt        zz__something__.jpg
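With shutil, the copy step is short; a minimal sketch assuming the copy_to signature suggested above:

```python
import os
import shutil

def copy_to(paths, to_dir):
    """Copy each file in paths into to_dir, creating the directory if needed."""
    if not os.path.exists(to_dir):
        os.makedirs(to_dir)
    for path in paths:
        shutil.copy(path, to_dir)
```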

Part C (calling an external program)

If the "--tozip zipfile" option is present at the start of the command line, run this command: "zip -j zipfile <list all the files>". This will create a zipfile containing the files. Just for fun/reassurance, also print the command line you are going to do first (as shown in lecture). (Windows note: windows does not come with a program to produce standard .zip archives by default, but you can download the free and open zip program from www.info-zip.org.)

$ ./copyspecial.py --tozip tmp.zip .
Command I'm going to do: zip -j tmp.zip /Users/nparlante/pycourse/day2/xyz__hello__.txt /Users/nparlante/pycourse/day2/zz__something__.jpg

 

If the child process exits with an error code, exit with an error code and print the command's output. Test this by trying to write a zip file to a directory that does not exist.

$ ./copyspecial.py --tozip /no/way.zip .
Command I'm going to do: zip -j /no/way.zip /Users/nparlante/pycourse/day2/xyz__hello__.txt /Users/nparlante/pycourse/day2/zz__something__.jpg
zip I/O error: No such file or directory
zip error: Could not create output file (/no/way.zip)
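One way to sketch the run-and-check-errors step is with the subprocess module. This is an adaptation, not the lecture's exact helper (the Python 2 course material used an older module for running commands), but the print/run/exit-on-failure shape is the same:

```python
import subprocess
import sys

def run_command(cmd):
    """Print and run a command (a list of args); on failure, print the
    command's output and exit with an error code."""
    print("Command I'm going to do: " + ' '.join(cmd))
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    if proc.returncode != 0:
        print(proc.stdout)
        sys.exit(proc.returncode)
    return proc.stdout

def zip_to(paths, zippath):
    # Requires the external "zip" program to be installed.
    run_command(['zip', '-j', zippath] + list(paths))
```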

Log Puzzle Python Exercise

For the Log Puzzle exercise, you'll use Python code to solve two puzzles. This exercise uses the urllib module, as shown in the Python Utilities section. The files for this exercise are in the "logpuzzle" directory inside google-python-exercises (download the google-python-exercises.zip if you have not already, see Set Up for details). Add your code to the "logpuzzle.py" file.

An image of an animal has been broken into many narrow vertical stripe images. The stripe images are on the internet somewhere, each with its own url. The urls are hidden in a web server log file. Your mission is to find the urls and download all image stripes to re-create the original image.

The slice urls are hidden inside apache log files (the open source apache web server is the most widely used server on the internet). Each log file is from some server, and the desired slice urls are hidden within the logs. The log file encodes what server it comes from like this: the log file animal_code.google.com is from the code.google.com server (formally, we'll say that the server name is whatever follows the first underbar). The animal_code.google.com log file contains the data for the "animal" puzzle image. Although the data in the log files has the syntax of a real apache web server, the data beyond what's needed for the puzzle is randomized data from a real log file.

Here is what a single line from the log file looks like (this really is what apache log files look like):

10.254.254.28 - - [06/Aug/2007:00:14:08 -0700] "GET /foo/talks/ HTTP/1.1" 200 5910 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"

The first few numbers are the address of the requesting browser. The most interesting part is the "GET path HTTP" showing the path of a web request received by the server. The path itself never contains spaces, and is separated from the GET and HTTP by spaces (regex suggestion: \S (upper case S) matches any non-space char). Find the lines in the log where the string "puzzle" appears inside the path, ignoring the many other lines in the log.

Part A - Log File To Urls

Complete the read_urls(filename) function that extracts the puzzle urls from inside a logfile. Find all the "puzzle" path urls in the logfile. Combine the path from each url with the server name from the filename to form a full url, e.g. "http://www.example.com/path/puzzle/from/inside/file". Screen out urls that appear more than once. The read_urls() function should return the list of full urls, sorted into alphabetical order and without duplicates. Taking the urls in alphabetical order will yield the image slices in the correct left-to-right order to re-create the original animal image. In the simplest case, main() should just print the urls, one per line.

$ ./logpuzzle.py animal_code.google.com
http://code.google.com/something/puzzle-animal-baaa.jpg
http://code.google.com/something/puzzle-animal-baab.jpg
...
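A hedged sketch of read_urls(), assuming the host-from-filename rule described above (one possible shape, not the official solution):

```python
import os
import re

def read_urls(filename):
    """Return the sorted, de-duplicated puzzle urls from an apache logfile."""
    # The server name is whatever follows the first underbar in the filename.
    base = os.path.basename(filename)
    host = base[base.index('_') + 1:]
    urls = set()  # a set screens out duplicate urls
    with open(filename) as f:
        for line in f:
            # The path sits between GET and HTTP; \S matches non-space chars.
            match = re.search(r'"GET (\S+) HTTP', line)
            if match and 'puzzle' in match.group(1):
                urls.add('http://' + host + match.group(1))
    return sorted(urls)
```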

Part B - Download Images Puzzle

Complete the download_images() function which takes a sorted list of urls and a directory. Download the image from each url into the given directory, creating the directory first if necessary (see the "os" module to create a directory, and "urllib.urlretrieve()" for downloading a url). Name the local image files with a simple scheme like "img0", "img1", "img2", and so on. You may wish to print a little "Retrieving..." status output line while downloading each image since it can be slow and it's nice to have some indication that the program is working. Each image is a little vertical slice from the original. How to put the slices together to re-create the original? It can be solved nicely with a little html (knowledge of HTML is not required).

The download_images() function should also create an index.html file in the directory with an *img* tag to show each local image file. The img tags should all be on one line together without separation. In this way, the browser displays all the slices together seamlessly. You do not need knowledge of HTML to do this; just create an index.html file that looks like this:

<html>
<body>
<img src="/edu/python/exercises/img0"><img src="/edu/python/exercises/img1"><img src="/edu/python/exercises/img2">...
</body>
</html>

Here's what it should look like when you can download the animal puzzle:

$ ./logpuzzle.py --todir animaldir animal_code.google.com
$ ls animaldir
img0  img1  img2  img3  img4  img5  img6  img7  img8  img9  index.html

When it's all working, opening the index.html in a browser should reveal the original animal image. What is the animal in the image?
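A sketch of download_images() under two assumptions of mine: it uses Python 3's urllib.request.urlretrieve (the course text names the Python 2 urllib.urlretrieve()), and the download call is injectable as a parameter so the rest of the logic can be exercised without network access:

```python
import os
import urllib.request

def download_images(img_urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Download each url into dest_dir as img0, img1, ... and write an
    index.html that shows the slices side by side.

    fetch(url, local_path) is injectable for testing without a network.
    """
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)
    tags = []
    for i, url in enumerate(img_urls):
        name = 'img%d' % i
        print('Retrieving...', url)
        fetch(url, os.path.join(dest_dir, name))
        tags.append('<img src="%s">' % name)
    with open(os.path.join(dest_dir, 'index.html'), 'w') as f:
        # All the img tags go on one line so the browser displays the
        # slices together seamlessly.
        f.write('<html><body>%s</body></html>' % ''.join(tags))
```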

Part C - Image Slice Descrambling

The second puzzle involves an image of a very famous place, but depends on some custom sorting. For the first puzzle, the urls can be sorted alphabetically to order the images correctly. In the sort, the whole url is used. However, we'll say that if the url ends in the pattern "-wordchars-wordchars.jpg", e.g. "http://example.com/foo/puzzle/bar-abab-baaa.jpg", then the url should be represented by the second word in the sort (e.g. "baaa"). So sorting a list of urls each ending with the word-word.jpg pattern should order the urls by the second word.

Extend your code to order such urls properly, and then you should be able to decode the second place_code.google.com puzzle which shows a famous place. What place does it show?
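The custom ordering fits naturally as a sort key function; a minimal sketch of the rule described above:

```python
import re

def url_sort_key(url):
    """Sort by the second word if the url ends '-word-word.jpg',
    otherwise by the whole url."""
    match = re.search(r'-(\w+)-(\w+)\.jpg$', url)
    if match:
        return match.group(2)
    return url
```

Then the urls can be ordered with sorted(urls, key=url_sort_key) in place of the plain alphabetical sort from the first puzzle.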

CC Attribution: the images used in this puzzle were made available by their owners under the Creative Commons Attribution 2.5 license, which generously encourages remixes of the content such as this one. The animal image is from the user zappowbang at flickr and the place image is from the user booleansplit at flickr.

