12 Useful Python Scripts for Developers

疯狂的码泰君

Article link: https://blog.csdn.net/qq_46264636/article/details/136252078

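As any developer knows, day-to-day work often involves creating and maintaining small utility scripts. These scripts are the glue that connects the various parts of a system or build environment. Although such Python scripts are rarely complex, maintaining them can become a tedious chore that costs you time and money.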

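One way to reduce the maintenance burden is script automation. Instead of spending time running scripts by hand, you can schedule these tasks to run at set times or trigger them in response to specific events.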

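This article presents 12 Python scripts chosen for their general usefulness, ease of use, and positive effect on your workload. They range from simple to intermediate in complexity and focus on text processing and file management. Each use case is covered in its own section below.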

Create strong random passwords

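There are many reasons to create strong random passwords, from onboarding new users and supporting password-reset workflows to generating new passwords when rotating credentials. A dependency-free Python script can automate this: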

# Generate Strong Random Passwords
# 'secrets' is used instead of 'random' because passwords need a
# cryptographically secure source of randomness
import secrets
import string

# This script will generate an 18 character password
word_length = 18

# Build the pool of allowed characters: letters, digits, and some punctuation
chars = string.ascii_letters + string.digits + "!@#$%&"

def generate_password():
    # Choose 'word_length' random characters from 'chars'
    # and return them joined into a single string
    return "".join(secrets.choice(chars) for _ in range(word_length))

# Output generated password
print(generate_password())
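
Many password policies also require at least one character from each class. The sketch below is one way to guarantee that, assuming the same character set as the script above; the SYMBOLS constant and the length parameter are names introduced here for illustration.

```python
import secrets
import string

SYMBOLS = "!@#$%&"  # assumption: same punctuation set as the script above


def generate_password(length=18):
    """Return a password with at least one letter, one digit, and one symbol."""
    if length < 3:
        raise ValueError("length must be at least 3")
    pool = string.ascii_letters + string.digits + SYMBOLS
    # Start with one guaranteed character from each class
    picked = [
        secrets.choice(string.ascii_letters),
        secrets.choice(string.digits),
        secrets.choice(SYMBOLS),
    ]
    # Fill the remaining positions from the full pool
    picked += [secrets.choice(pool) for _ in range(length - len(picked))]
    # Shuffle so the guaranteed characters do not always come first
    secrets.SystemRandom().shuffle(picked)
    return "".join(picked)


print(generate_password())
```

The shuffle uses secrets.SystemRandom so that the final ordering comes from the same secure source as the characters themselves.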

Extract text from a PDF

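Python can also extract text from PDFs with the PyPDF2 package. Pulling text out of PDF files is useful for data mining, invoice reconciliation, or report generation, and the extraction can be automated in just a few lines of code. Install the package by running pip install PyPDF2 in your terminal. Below are some examples of what you can do with PyPDF2.
Suppose you receive a multi-page PDF but only need the first page. The script below extracts the text from the first page of a PDF: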

# import module PyPDF2
import PyPDF2

# put 'example.pdf' in the working directory
# and open it in read-binary mode
with open('example.pdf', 'rb') as pdf_file:
    # PdfReader replaces PdfFileReader, which PyPDF2 3.x no longer supports
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # to print the total number of pages in the pdf
    # print(len(pdf_reader.pages))
    # pages are stored in a list-like object,
    # so index 0 is the first page
    page = pdf_reader.pages[0]
    # extract the text of the page with extract_text()
    texts = page.extract_text()

# print the extracted text
print(texts)

Or perhaps you want to combine the contents of two PDF files into a single new PDF, which PyPDF2 also supports.
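
One possible way to combine PDFs is to merge them page by page instead of re-typesetting the extracted text. The sketch below assumes PyPDF2 2.x or later (the PdfReader and PdfWriter classes); the merge_pdfs helper and the file names in the comment are illustrative, not taken from the article.

```python
def merge_pdfs(input_paths, output_path):
    """Append every page of each input PDF, in order, to a new PDF."""
    # Imported here so the function can be defined without PyPDF2 installed
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in input_paths:
        reader = PdfReader(path)
        # Copy the pages of this file onto the end of the output
        for page in reader.pages:
            writer.add_page(page)
    with open(output_path, "wb") as out_file:
        writer.write(out_file)


# Example call with hypothetical file names:
# merge_pdfs(["example1.pdf", "example2.pdf"], "merged.pdf")
```

Merging whole pages keeps the original layout, fonts, and images, which extracted text alone would lose.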

Text processing with Pandoc

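Pandoc is a full-featured command-line tool for converting markup between formats. You can use it to convert Markdown directly to docx, or MediaWiki markup to DocBook. Format conversion lets you process external content or user-submitted information without restricting the data to a single format. You can install the pandoc Python package with pip (pip install pandoc); it wraps the pandoc command-line tool, which needs to be installed as well. Below are some examples of what you can do with pandoc.
First, suppose you receive a document in Markdown and need to convert it to PDF. pandoc makes this simple: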

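import pandoc

# Read the Markdown source and parse it into a pandoc document;
# pandoc.write() expects a parsed document, not a raw string
with open("example.md", "r", encoding="utf-8") as md_file:
    doc = pandoc.read(md_file.read())

# Write the document as a PDF (pandoc needs a PDF engine such as LaTeX for this)
pandoc.write(doc, file="example.pdf", format="pdf")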

Or you may want to convert a Markdown file to a JSON object. The following script does that:

import pandoc
md_string = """
# Hello from Markdown

**This is a markdown string**
"""
input_string = pandoc.read(md_string)
pandoc.write(input_string, format="json", file="md.json")

Many more examples are available in the pandoc package documentation.

Manipulate audio with Pydub

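Pydub is a Python package for manipulating audio, including converting audio between file formats such as wav and mp3. Pydub can also slice audio into millisecond-level segments, which can be especially useful for machine learning tasks. Install it by running pip install pydub in your terminal; formats other than wav, including mp3, also need ffmpeg or libav available on the system.
Suppose you are working with audio and need every file to have a suitable volume. You can automate that task with this script, which raises the volume by 18 dB: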

from pydub import AudioSegment

audio_file = AudioSegment.from_mp3("example.mp3")
louder_audio_file = audio_file + 18
louder_audio_file.export("example_louder.mp3", format="mp3")

Pydub has many more features than this example covers. You can find more in the Pydub GitHub repository.
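
Adding a fixed 18 dB makes every file louder but does not bring files to a common level. A minimal sketch of loudness matching follows, assuming a target of -20 dBFS (an arbitrary value chosen for illustration); gain_to_target and normalize_file are hypothetical helper names, while dBFS and apply_gain are pydub's own AudioSegment members.

```python
def gain_to_target(current_dbfs, target_dbfs=-20.0):
    """Return the gain in dB needed to move a clip to the target loudness."""
    return target_dbfs - current_dbfs


def normalize_file(in_path, out_path, target_dbfs=-20.0):
    """Load an MP3, shift its average loudness to target_dbfs, and export it."""
    # Imported here so the helper above can be used without pydub installed
    from pydub import AudioSegment

    audio = AudioSegment.from_mp3(in_path)
    # AudioSegment.dBFS is the clip's average loudness relative to full scale
    normalized = audio.apply_gain(gain_to_target(audio.dBFS, target_dbfs))
    normalized.export(out_path, format="mp3")


# A clip averaging -32.5 dBFS needs +12.5 dB to reach -20 dBFS
print(gain_to_target(-32.5))
```

A completely silent clip reports a dBFS of negative infinity, so a real pipeline would probably skip such files before applying gain.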

Filter text

# Filter Text
# Import re module
import re
# Take any string data
string = """a string we are using to filter specific items.
perhaps we would like to match credit card numbers
mistakenly entered into the user input. 4444 3232 1010 8989
and perhaps another? 9191 0232 9999 1111"""

# Define the searching pattern
# raw string, so that \s reaches the regex engine unchanged
pattern = r'(([0-9](\s+)?){4}){4}'

# match the pattern with input value
found = re.search(pattern, string)
print(found)
# Print message based on the return value
if found:
  print("Found a credit card number!")
else:
  print("No credit card numbers present in input")
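
Detecting a card number is often only the first step; the usual follow-up is to mask it before the text is stored or logged. The sketch below shows one way to do that with re.sub. The pattern is a simplified assumption (four groups of four digits separated by optional spaces or hyphens), and redact_cards is a hypothetical helper name.

```python
import re

# Four groups of four digits, optionally separated by a space or hyphen
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def redact_cards(text):
    """Replace each card-like number with a mask that keeps the last four digits."""
    def mask(match):
        # Strip separators so the last four digits can be sliced off
        digits = re.sub(r"\D", "", match.group())
        return "**** **** **** " + digits[-4:]
    return CARD_PATTERN.sub(mask, text)


sample = "cards: 4444 3232 1010 8989 and 9191-0232-9999-1111"
print(redact_cards(sample))
# prints: cards: **** **** **** 8989 and **** **** **** 1111
```

Unlike re.search, which stops at the first hit, re.sub processes every match in the input.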

Locate addresses

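Looking up addresses can be useful if you handle shipping or delivery logistics, or for simple user analytics. First, install geocoder by running pip install geocoder in your terminal. The script below looks up the latitude and longitude of an address, or finds the address for a set of coordinates: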

import geocoder
address = "1600 Pennsylvania Ave NW, Washington DC USA"
geo = geocoder.arcgis(address)
print(geo.latlng)
# output: [38.89767510765125, -77.03654699820865]

# If we want to retrieve the location from a set of coordinates
# perform a reverse query.
location = geocoder.arcgis([38.89767510765125, -77.03654699820865], method="reverse")

# output: <[OK] Arcgis - Reverse [White House]>
print(location)

Convert a CSV to Excel

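You may often find yourself managing CSV output from analytics platforms or datasets. Opening a CSV file in Excel is fairly simple, but Python lets you skip that manual step by automating the conversion. It also lets you manipulate the CSV data before converting it to Excel, saving further time and effort.
Start by installing the openpyxl package with pip install openpyxl. Once it is installed, you can use the following script to convert a CSV file to an Excel spreadsheet: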

#!python3
# -*- coding: utf-8 -*-

import csv
import os
import sys

import openpyxl
from openpyxl.utils.exceptions import InvalidFileException

# inputs
print("This programme writes the data in any delimiter-separated value file (such as: .csv or .data) to an Excel file.")
print("The input and output files are looked up relative to the current directory.\n")

csv_name = input("Name of the CSV file for input (with the extension): ")
# csv.reader expects a single-character separator; default to a comma
sep = input("Separator of the CSV file: ") or ","
excel_name = input("Name of the excel file for output (with the extension): ")
sheet_name = input("Name of the excel sheet for output: ")

# opening the files
try:
    if os.path.exists(excel_name):
        # reuse an existing workbook, adding the sheet if it is missing
        wb = openpyxl.load_workbook(excel_name)
        sheet = wb[sheet_name] if sheet_name in wb.sheetnames else wb.create_sheet(sheet_name)
    else:
        # otherwise start a new workbook and name its first sheet
        wb = openpyxl.Workbook()
        sheet = wb.active
        if sheet_name:
            sheet.title = sheet_name

    file = open(csv_name, "r", encoding="utf-8", newline="")
except (OSError, InvalidFileException):
    print("File Error!")
    sys.exit(1)

# the csv module handles quoted fields and line endings;
# rows and columns are numbered from 1 in the spreadsheet
with file:
    for row, line in enumerate(csv.reader(file, delimiter=sep), start=1):
        for column, data in enumerate(line, start=1):
            # write the data to the cell
            sheet.cell(row=row, column=column).value = data

# saving the excel file (the csv file is closed by the with block)
wb.save(excel_name)
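
Because every CSV field arrives as a string, the spreadsheet above ends up with numbers stored as text. As an example of manipulating the data before conversion, the sketch below converts numeric-looking fields to int or float first. The helper names coerce, read_rows, and csv_to_xlsx are introduced here for illustration.

```python
import csv
import io


def coerce(value):
    """Convert a CSV field to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def read_rows(csv_file, delimiter=","):
    """Yield each CSV row as a list of coerced values."""
    for row in csv.reader(csv_file, delimiter=delimiter):
        yield [coerce(field) for field in row]


def csv_to_xlsx(csv_path, xlsx_path, delimiter=","):
    """Write the coerced rows of csv_path into a new workbook at xlsx_path."""
    # Imported here so the helpers above work without openpyxl installed
    import openpyxl

    workbook = openpyxl.Workbook()
    sheet = workbook.active
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in read_rows(csv_file, delimiter):
            sheet.append(row)
    workbook.save(xlsx_path)


sample = io.StringIO('name,qty,price\n"Widget, large",3,4.50\n')
print(list(read_rows(sample)))
# prints: [['name', 'qty', 'price'], ['Widget, large', 3, 4.5]]
```

Blanket coercion is a trade-off: identifiers with leading zeros, such as postal codes, would lose them, so a real script might restrict the conversion to known numeric columns.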

Pattern match with regular expressions

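Collecting data from unstructured sources can be a very tedious process. As in the filtering example above, Python supports more detailed pattern matching with regular expressions. This is useful for classifying text as part of a data-processing workflow or for searching user-submitted content for specific keywords. The built-in regular expression library is called re, and once you are comfortable with regex syntax you can automate almost any pattern-matching task.
For example, you might want to match every email address found in the text you are processing. You can use this script to do so: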

import re
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

# store matched addresses in an array called "matches"
matches = []
text = """
An example text containing an email address, such as user@example.com or something like hello@example.com
"""

# search the text and append matched addresses to the "matches" array
for groups in emailRegex.findall(text):
    matches.append(groups[0])

# matches => ['user@example.com', 'hello@example.com']
print(matches)

If you need to match phone numbers in text, you can use this script:

import re

text = """
Here is an example string containing various numbers, some 
of which are not phone numbers.

Business Address
4553-A First Street
Washington, DC 20001

202-555-6473
301-555-8118
"""

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                 # area code
    (\s|-|\.)?                             # separator
    (\d{3})                               # first 3 digits
    (\s|-|\.)                               # separator
    (\d{4})                               # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?    # extension
    )''', re.VERBOSE)

matches = []
for numbers in phoneRegex.findall(text):
  matches.append(numbers[0])

# matches => ['202-555-6473', '301-555-8118']
print(matches)

Convert images to JPG

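The .jpg format is probably the most popular image format in use today. You may need to convert images from other formats to generate project assets or to prepare input for image recognition. Python's Pillow package makes converting images to jpg straightforward: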

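# requires the Pillow module, imported as `PIL` below
import os
from PIL import Image

file = "toJPG.png"
# split off the extension; splitext copes with names that contain extra dots
base_name, _ = os.path.splitext(file)
new_name = base_name + ".jpg"
img = Image.open(file)
# JPG has no alpha channel, so convert to RGB before saving
converted_img = img.convert('RGB')
converted_img.save(new_name)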
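
Asset pipelines usually deal with whole folders instead of single files. The sketch below extends the idea to every PNG in a directory. It assumes Pillow is installed; jpg_path, convert_folder, and the sample path are illustrative names.

```python
from pathlib import Path


def jpg_path(source):
    """Return the source path with its final extension replaced by .jpg."""
    return Path(source).with_suffix(".jpg")


def convert_folder(folder, pattern="*.png"):
    """Convert every file in folder matching pattern to a JPG beside it."""
    # Imported here so jpg_path can be used without Pillow installed
    from PIL import Image

    for source in sorted(Path(folder).glob(pattern)):
        with Image.open(source) as img:
            # JPG has no alpha channel, so convert to RGB first
            img.convert("RGB").save(jpg_path(source))


# Only the final extension changes, even when the name contains other dots
print(jpg_path("assets/logo.v2.png"))
```

Using with_suffix instead of splitting on the first dot keeps a name like logo.v2.png from being truncated to logo.jpg.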

Compress images

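Sometimes you need to compress images as part of an asset-creation pipeline for a new site or a temporary landing page, and you may not want to do it by hand or send the task to an external image-processing service. With the Pillow package you can compress JPG images to reduce file size while keeping quality at a level you choose. Install Pillow with pip install pillow.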

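# the pillow package can be imported as PIL
from PIL import Image

file_path = "image_uncompressed.jpg"
img = Image.open(file_path)
# img.size is (width, height)
width, height = img.size
# Image.LANCZOS replaces Image.ANTIALIAS, which was removed in Pillow 10
compressed = img.resize((width, height), Image.LANCZOS)
# lower quality values give smaller files; 9 is very aggressive and visibly lossy
compressed.save("image_compressed.jpg", optimize=True, quality=9)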

Get content from Wikipedia

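Wikipedia offers excellent general overviews of many topics. This information can be used to add context to transactional emails, track changes to a particular set of articles, or produce training documents or reports. Fortunately, gathering it is easy with Python's wikipedia package.
You can install the package with pip install wikipedia. Once it is installed, you are ready to go.
If you already know which page you want to pull content from, you can fetch it directly: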

import wikipedia
page_content = wikipedia.page("parsec").content
# outputs the text content of the "Parsec" page on wikipedia
print(page_content)

The package also lets you search for pages that match given text:

import wikipedia
search_results = wikipedia.search("arc second")
# outputs an array of pages matching the search term
print(search_results)

Create and manage Heroku apps

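Heroku is a popular platform for deploying and hosting web applications. As a managed service, it lets developers set up, configure, maintain, and even delete applications through the Heroku API. You can also create or manage Heroku apps from Airplane runbooks, since Airplane makes it easy to access APIs and trigger events.
The examples below rely on the heroku3 package, which you can install with pip install heroku3. Note that you need a Heroku API key to access the platform.
Connect to Heroku with Python: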

import heroku3

# Be sure to update the api_key variable with your key
api_key = "12345-ABCDE-67890-FGHIJ"
client = heroku3.from_key(api_key)
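
Hard-coding the key is fine for a demonstration, but scripts kept in version control usually read it from the environment instead. The sketch below shows one way to do that; the variable name HEROKU_API_KEY and the helpers get_api_key and connect are assumptions made for this example.

```python
import os


def get_api_key(env=os.environ, name="HEROKU_API_KEY"):
    """Return the API key stored in the given environment mapping."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key


def connect():
    """Create a heroku3 client from the key in the environment."""
    # Imported here so get_api_key can be used without heroku3 installed
    import heroku3

    return heroku3.from_key(get_api_key())


# Demonstration with a stand-in mapping instead of the real environment
print(get_api_key({"HEROKU_API_KEY": "12345-ABCDE-67890-FGHIJ"}))
```

Failing early with a clear message when the variable is missing is easier to debug than an authentication error from the API later on.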

Once connected to Heroku, you can list the available applications and select one to manage directly:

import heroku3
api_key = "12345-ABCDE-67890-FGHIJ"
client = heroku3.from_key(api_key)

print(client.apps())

# the above call prints a list of available applications
# [<app 'airplanedev-heroku-example - ed544e41-601d-4d1b-a327-9a1945b743cb'>, <app 'notes-app - 5b3d6aab-cde2-4527-9ecc-62bdee08ed4a'>, …] 

# use the following command to connect to a specific application
app = client.apps()["airplanedev-heroku-example"]

# add a config variable for your application
config = app.config()
config["test_var"] = "value"

# enable or disable maintenance mode
# enable
app.enable_maintenance_mode()

# disable
app.disable_maintenance_mode()

# restarting your application is simple
app.restart()

The following script then lets you create an application as part of an Airplane runbook:

import heroku3
api_key = "12345-ABCDE-67890-FABCD"
client = heroku3.from_key(api_key)

client.create_app("app-created-with-airplane")
