I'm building a scraper/crawler for Linux directories. In essence, the program will take the user's input for a start point (e.g. /home/user/Pictures/) and an end point (e.g. /home/user/Pictures/), as well as a file type to scrape for (which is where my question comes in).
I'm storing the acceptable file extension types in a dictionary of nested lists, like so:
file_types = {'audio': ['mp3', 'mpa'], 'images': ['png', 'jpg']}
If I store the user's input in the variable scrape_for, how can I validate that the string in scrape_for exists in the dictionary file_types?
What I have tried:
This is my current block of code, which does the following:
1. Takes user input for the start point
2. Verifies that the start point is a valid directory
3. Takes user input for the end point
4. Validates that the end point is both a valid directory and a subdirectory of the start point
5. Prints the file extension options for the user to choose from
```python
import os
import sys

ftypes = {
    'audio': ['mp3', 'mpa', 'wpi', 'wav'],
    'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'],
    'text': ['txt', 'doc', 'pdf'],
    'video': ['mp4', 'avi', '3g2', '3gp', 'mkv', 'm4v', 'mov', 'mpg', 'wmv', 'flv'],
    'executable': ['apk', 'bat', 'bin', 'exe', 'py', 'wsf', 'com', 'cgi', 'pl'],
}

def UserInput():
    # User inputs Start Point
    Spoint = input('Where to start: \n')
    # Check validity of input
    if os.path.isdir(Spoint):
        print('Scraping will begin at: ' + Spoint)
    else:
        print('Not a valid directory')
        sys.exit()
    # User input for End Point
    Epoint = input('\n\nWhat directory would you like to stop scraping at? \n')
    # Check that End Point is a valid directory AND lies inside Start Point.
    # Comparing string lengths does not prove this (/home/user/Music is longer
    # than /home/user/Pics but not inside it), so compare resolved paths instead.
    start = os.path.realpath(Spoint)
    end = os.path.realpath(Epoint)
    if os.path.isdir(end) and os.path.commonpath([start, end]) == start:
        print('\n\nScraping will end at: ' + Epoint)
    else:
        print('Error w/ End Point directory, make sure directory is formatted correctly, and is a sub directory of your Starting Point')
        sys.exit()
    # User input for filetype
    for k, v in ftypes.items():
        print(k, v)
    ScrapeType = input("Please enter the extension you'd like scraped: \n")
    # Return the choices so the caller can actually use them
    return Spoint, Epoint, ScrapeType
```
Solution
Suppose you are looking for the file type 'png':
True in ['png' in d for d in ftypes.values()]
returns True. For a non-existent file type:
True in ['mdi' in d for d in ftypes.values()]
returns False.
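If the check runs more than once, a sketch that may be simpler: flatten every list into one set up front, then test membership directly. This assumes the ftypes dictionary from the question; the name all_extensions is hypothetical.

```python
# ftypes as defined in the question
ftypes = {
    'audio': ['mp3', 'mpa', 'wpi', 'wav'],
    'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'],
    'text': ['txt', 'doc', 'pdf'],
    'video': ['mp4', 'avi', '3g2', '3gp', 'mkv', 'm4v', 'mov', 'mpg', 'wmv', 'flv'],
    'executable': ['apk', 'bat', 'bin', 'exe', 'py', 'wsf', 'com', 'cgi', 'pl'],
}

# Collect every extension from every category into a single set
all_extensions = {ext for exts in ftypes.values() for ext in exts}

print('png' in all_extensions)  # True
print('mdi' in all_extensions)  # False
```

A set lookup avoids building a temporary list of booleans on every call. For a one-off check, any('png' in d for d in ftypes.values()) is an equivalent alternative that also stops at the first match.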
To check for a key (a category name):
'audio' in ftypes
returns True.
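Wrapping the membership test in square brackets, as in ['audio' in ftypes.keys()], builds a one-element list instead of a boolean. Because any non-empty list is truthy, an if statement on that list would pass even for a missing key. A quick demonstration, using a cut-down ftypes for illustration:

```python
ftypes = {'audio': ['mp3', 'mpa'], 'images': ['png', 'jpg']}

wrapped = ['mdi' in ftypes.keys()]  # a list containing one bool
plain = 'mdi' in ftypes             # a plain bool

print(wrapped)        # [False]
print(bool(wrapped))  # True, because the list is non-empty
print(plain)          # False
```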
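To check both, test the variable that holds the user's answer (ScrapeType in the code above). Using input here would test the built-in function itself, which is never in any of the lists:
True in [ScrapeType in d for d in ftypes.values()] or ScrapeType in ftypes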
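Putting both checks together, here is a minimal sketch that turns the user's answer into the list of extensions to scrape: a category name expands to its whole list, a single known extension becomes a one-item list, and anything else yields None so the program can ask again. The function names resolve_scrape_type and prompt_scrape_type, the read parameter, and the normalisation (trimming spaces, lower-casing, and stripping leading dots so '.PNG' matches 'png') are assumptions for illustration, not part of the original code.

```python
def resolve_scrape_type(answer, ftypes):
    """Return the list of extensions selected by `answer`, or None if invalid."""
    # Hypothetical normalisation: ignore surrounding spaces, case and leading dots
    choice = answer.strip().lower().lstrip('.')
    if choice in ftypes:
        # A category name such as 'audio' selects every extension in it
        return list(ftypes[choice])
    for extensions in ftypes.values():
        if choice in extensions:
            # A single known extension such as 'png'
            return [choice]
    return None


def prompt_scrape_type(ftypes, read=input):
    """Keep asking until the user enters a known category or extension."""
    while True:
        answer = read("Please enter the extension you'd like scraped: \n")
        result = resolve_scrape_type(answer, ftypes)
        if result is not None:
            return result
        print('Unknown file type, choose one of the options listed above')


# Usage with a cut-down ftypes
ftypes = {'audio': ['mp3', 'mpa', 'wpi', 'wav'],
          'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp']}
print(resolve_scrape_type('audio', ftypes))  # ['mp3', 'mpa', 'wpi', 'wav']
print(resolve_scrape_type('.PNG', ftypes))   # ['png']
print(resolve_scrape_type('mdi', ftypes))    # None
```

The read parameter defaults to the built-in input, so a test can pass a fake reader instead of typing answers interactively.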