Python Beginner Tutorial Notes (with a Walkthrough of Web Scraper Example Source Code)

梵心白莲

Last modified: 2024-10-02 17:16:19

First published: 2024-10-02 14:43:04

Article link: https://blog.csdn.net/ashyyyy/article/details/142681290

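This is a broad introductory Python tutorial, suitable for beginners as well as readers with some programming experience. Python is a high-level programming language known for its concise, readable syntax, and it is widely used in web development, data analysis, artificial intelligence, and many other fields. Some related references: a guide to setting up a Python development environment on Windows (with example source code and a walkthrough), a simple server/client download, and a simple python-flask database server with a Vue web client download.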

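1. Introduction

Definition: Python is an interpreted, object-oriented, dynamically typed high-level programming language.
Uses:
- Web development (for example, the Django and Flask frameworks).
- Data analysis (for example, the Pandas and NumPy libraries).
- Machine learning and artificial intelligence (for example, the TensorFlow and PyTorch libraries).
- Automation scripts.
- Scientific computing.
Features:
- Clean, concise syntax.
- A rich standard library and a large ecosystem of third-party libraries.
- Cross-platform support (Windows, Linux, macOS).
- Strong community support.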

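2. Installing Python

Installing on Windows

Download the latest Python installer from the official Python website (python.org).
Run the downloaded .exe file and follow the prompts.
Check the "Add Python to PATH" option so that the environment variable is configured automatically.

Installing on macOS

Install Python with Homebrew: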
```
brew install python
```

Installing on Linux

Install Python with your distribution's package manager. On Debian or Ubuntu, for example:

sudo apt-get update
sudo apt-get install python3

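3. Your First Python Program

Create a project directory

Create a new directory to hold your Python project, for example myproject.

Write the first program

In the myproject directory, create a file named hello.py.
Edit hello.py and add the following:
```
print("Hello, World!")
```

Run the program

Open a terminal or command prompt and navigate to the myproject folder.
Run the following command to execute the program (on some macOS and Linux systems the command is python3 rather than python):
```
python hello.py
```
You should see the output Hello, World!.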

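4. Basic Python Syntax

Comments

Single-line comments start with #.
Python has no dedicated multi-line comment syntax. A triple-quoted string (''' ... ''' or """ ... """) that is not assigned to anything is often used for that purpose.

# This is a single-line comment
"""
This is a multi-line comment
that can span several lines
"""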

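Variables

Variables do not need an explicit type declaration.
Typing is dynamic: the type belongs to the value, not to the variable name.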

a = 42
b = 3.14
c = True
d = "Hello, World!"
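
To make dynamic typing concrete, here is a minimal sketch (the name value is chosen only for illustration). The same name is rebound to objects of different types, and type() reports the type of whatever the name currently refers to.

```python
# A name is just a label; it can be rebound to objects of different types.
value = 42
first_type = type(value).__name__    # "int"

value = "forty-two"
second_type = type(value).__name__   # "str"

value = [4, 2]
third_type = type(value).__name__    # "list"

print(first_type, second_type, third_type)  # prints: int str list
```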

Data types

Basic types: int, float, bool, str.
Compound types: list, tuple, dict, set.

my_int = 42
my_float = 3.14
my_bool = True
my_str = "Hello, World!"

my_list = [1, 2, 3]
my_tuple = (1, 2, 3)
my_dict = {"one": 1, "two": 2}
my_set = {1, 2, 3}
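
As a rough illustration of how the four compound types differ in use, the sketch below performs one typical operation on each. All variable names and values are made up for the example.

```python
# list: ordered and mutable, indexed by position
scores = [90, 75, 88]
first_score = scores[0]

# tuple: ordered and immutable, handy for fixed groups of values
point = (3, 4)
x, y = point                 # unpacking into two names

# dict: maps keys to values
ages = {"Alice": 30, "Bob": 25}
ages["Carol"] = 41           # add a new key
bob_age = ages["Bob"]        # look up by key

# set: unordered collection of unique items
unique_numbers = set([1, 2, 2, 3, 3, 3])

print(first_score, x + y, bob_age, sorted(unique_numbers))
# prints: 90 7 25 [1, 2, 3]
```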

Strings

Define strings with single quotes ' ' or double quotes " ".
Multi-line strings use triple quotes (''' ... ''' or """ ... """).

s1 = 'Hello, World!'
s2 = "This is a string."
s3 = '''This is a
multi-line string.'''
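
The notes above only show how to write string literals. As a small sketch of what can then be done with them, the following uses a few common built-in string operations; the sample text is arbitrary.

```python
name = "World"
greeting = f"Hello, {name}!"     # f-string interpolation

upper = greeting.upper()          # "HELLO, WORLD!"
first_five = greeting[:5]         # slicing -> "Hello"
words = "a,b,c".split(",")        # ["a", "b", "c"]
joined = "-".join(words)          # "a-b-c"
length = len(greeting)            # 13

print(greeting, upper, first_five, joined, length)
```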

Lists and tuples

Lists are mutable.
Tuples are immutable.

# List
my_list = [1, 2, 3]
print(my_list)  # Output: [1, 2, 3]

# Tuple
my_tuple = (1, 2, 3)
print(my_tuple)  # Output: (1, 2, 3)

# Append an element to the list
my_list.append(4)
print(my_list)  # Output: [1, 2, 3, 4]
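
The practical meaning of mutable versus immutable shows up when you try to change an element. This is a minimal sketch with made-up names: assigning to a list index works, while the same assignment on a tuple raises TypeError.

```python
numbers_list = [1, 2, 3]
numbers_list[0] = 10          # fine: lists can be changed in place

numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10     # tuples do not support item assignment
    tuple_changed = True
except TypeError as error:
    tuple_changed = False
    print(f"TypeError: {error}")

print(numbers_list)   # prints: [10, 2, 3]
print(numbers_tuple)  # prints: (1, 2, 3)
```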

Control structures

Conditional statements

The if...elif...else statement

age = 18

if age >= 18:
    print("You are an adult.")
elif age >= 13:
    print("You are a teenager.")
else:
    print("You are a child.")

Loops

for loop

for i in range(5):
    print(i)  # Prints 0, 1, 2, 3, 4 (one per line)

while loop

i = 0
while i < 5:
    print(i)  # Prints 0, 1, 2, 3, 4 (one per line)
    i += 1

Iterating over a list

my_list = [1, 2, 3]
for item in my_list:
    print(item)  # Prints 1, 2, 3 (one per line)
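
Two loop tools come up constantly alongside the basic forms above: enumerate(), which pairs each item with its index, and the continue and break statements, which skip an iteration or leave the loop early. The sketch below uses invented sample data to show both.

```python
fruits = ["apple", "banana", "cherry"]

# enumerate() yields (index, item) pairs; start=1 makes the count begin at 1
labelled = []
for index, fruit in enumerate(fruits, start=1):
    labelled.append(f"{index}. {fruit}")

# continue skips the rest of this iteration; break leaves the loop entirely
evens_below_seven = []
for n in range(10):
    if n % 2 == 1:
        continue
    if n >= 7:
        break
    evens_below_seven.append(n)

print(labelled)           # prints: ['1. apple', '2. banana', '3. cherry']
print(evens_below_seven)  # prints: [0, 2, 4, 6]
```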

5. Functions

Defining a function

Use the def keyword to define a function.

def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))  # Output: Hello, Alice!

Default parameters

Functions can have default parameter values.

def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

print(greet("Alice"))            # Output: Hello, Alice!
print(greet("Bob", "Hi there"))  # Output: Hi there, Bob!

Variable-length arguments

Use *args to accept any number of positional arguments.
Use **kwargs to accept any number of keyword arguments.

# Named sum_all so that it does not shadow the built-in sum()
def sum_all(*args):
    total = 0
    for num in args:
        total += num
    return total

print(sum_all(1, 2, 3, 4))  # Output: 10

def display_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

display_info(name="Alice", age=30)
# Output:
# name: Alice
# age: 30
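
A function can combine regular parameters with both *args and **kwargs, and the same * and ** syntax unpacks existing collections at the call site. The following is only a sketch; describe and its arguments are hypothetical names used for illustration.

```python
def describe(title, *args, **kwargs):
    """Build a one-line summary from mixed arguments."""
    # args is a tuple of the extra positional arguments
    # kwargs is a dict of the extra keyword arguments
    parts = [title]
    parts.extend(str(arg) for arg in args)
    parts.extend(f"{key}={value}" for key, value in kwargs.items())
    return " | ".join(parts)

summary = describe("report", 1, 2, name="Alice", age=30)
print(summary)  # prints: report | 1 | 2 | name=Alice | age=30

# * and ** also unpack existing collections when calling a function
numbers = (1, 2)
options = {"name": "Alice", "age": 30}
same_summary = describe("report", *numbers, **options)
```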

6. Classes and Objects

Defining a class

Use the class keyword to define a class.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def say_hello(self):
        return f"Hello, my name is {self.name} and I am {self.age} years old."

p = Person("Alice", 30)
print(p.say_hello())  # Output: Hello, my name is Alice and I am 30 years old.

Inheritance

Define a subclass by naming the parent class in parentheses after the class name, and use super() to call the parent's methods.

class Student(Person):
    def __init__(self, name, age, grade):
        super().__init__(name, age)
        self.grade = grade

    def say_hello(self):
        return f"Hello, I'm a student named {self.name} and I am {self.age} years old, in grade {self.grade}."

s = Student("Bob", 20, "A")
print(s.say_hello())  # Output: Hello, I'm a student named Bob and I am 20 years old, in grade A.
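
super() is not limited to __init__: an overriding method can call the parent's version and build on its result. The sketch below uses two hypothetical classes, Animal and Dog, and also shows that an instance of a subclass counts as an instance of its parent.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an animal"


class Dog(Animal):
    def describe(self):
        # Reuse the parent's result instead of rewriting it
        base = super().describe()
        return f"{base} that barks"


pet = Dog("Rex")
description = pet.describe()
print(description)               # prints: Rex is an animal that barks
print(isinstance(pet, Animal))   # prints: True
print(isinstance(pet, Dog))      # prints: True
```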

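7. File Operations

Reading a file

Open the file with the open function and read its contents with the read method. The file must already exist; otherwise open raises FileNotFoundError.

with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)

Writing a file

Open the file with the open function in "w" mode and write to it with the write method. This mode creates the file if needed and overwrites any existing content.

with open("example.txt", "w", encoding="utf-8") as file:
    file.write("This is some text.")

Appending

Open the file in append mode ("a") and write to it with the write method.

with open("example.txt", "a", encoding="utf-8") as file:
    file.write("\nThis is additional text.")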
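
The three modes can be combined into one round trip. The sketch below writes, appends, and then reads a file line by line; it assumes a temporary directory is acceptable so that no real files are touched, and the file contents are invented.

```python
import os
import tempfile

# Work inside a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    # "w" creates the file (or truncates an existing one)
    with open(path, "w", encoding="utf-8") as file:
        file.write("first line\n")

    # "a" appends to the end instead of overwriting
    with open(path, "a", encoding="utf-8") as file:
        file.write("second line\n")

    # Iterating over a file object yields one line at a time
    with open(path, "r", encoding="utf-8") as file:
        lines = [line.rstrip("\n") for line in file]

print(lines)  # prints: ['first line', 'second line']
```

Reading line by line avoids loading a large file into memory all at once, which read() would do.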

8. Exception Handling

Catching exceptions

Use a try...except statement to catch and handle exceptions.

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero.")

Multiple exceptions

Use several except clauses to handle different exception types.

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero.")
except TypeError:
    print("Invalid data type.")

Raising exceptions

Use the raise keyword to raise an exception.

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b

try:
    result = divide(10, 0)
except ValueError as e:
    print(e)  # Output: Cannot divide by zero.
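
A try statement can also carry an else clause (runs only when nothing was raised) and a finally clause (runs in every case), and you can define your own exception types by subclassing Exception. The following is a sketch; InsufficientFundsError and withdraw are hypothetical names invented for the example.

```python
class InsufficientFundsError(Exception):
    """Hypothetical custom exception used only in this sketch."""


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(f"need {amount}, have {balance}")
    return balance - amount


log = []
for amount in (30, 500):
    try:
        remaining = withdraw(100, amount)
    except InsufficientFundsError as error:
        log.append(f"failed: {error}")
    else:
        # Runs only when the try block raised nothing
        log.append(f"ok: {remaining} left")
    finally:
        # Runs in both cases, typically used for cleanup
        log.append("done")

print(log)
# prints: ['ok: 70 left', 'done', 'failed: need 500, have 100', 'done']
```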

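9. The Standard Library

Math

Use the math module for mathematical operations.

import math

print(math.sqrt(16))  # Output: 4.0
print(math.pi)         # Output: 3.141592653589793

Dates and times

Use the datetime module to work with dates and times.

from datetime import datetime

now = datetime.now()
print(now)  # Prints the current date and time

Random numbers

Use the random module to generate random numbers.

import random

print(random.randint(1, 10))  # Prints a random integer from 1 to 10 (both ends included)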
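
Each of these modules offers more than the single call shown. As a brief sketch with arbitrary sample values: math has rounding and geometry helpers, datetime can shift and format dates, and seeding random makes a sequence repeatable.

```python
import math
import random
from datetime import datetime, timedelta

# math: rounding helpers and common functions
print(math.floor(3.7), math.ceil(3.2))   # prints: 3 4
hypotenuse = math.hypot(3, 4)             # 5.0

# datetime: build a fixed date, shift it, and format it as text
start = datetime(2024, 10, 2, 14, 43)
one_week_later = start + timedelta(days=7)
formatted = one_week_later.strftime("%Y-%m-%d %H:%M")   # "2024-10-09 14:43"

# random: the same seed reproduces the same sequence
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(3)]
random.seed(42)
second_run = [random.randint(1, 10) for _ in range(3)]
choice = random.choice(["red", "green", "blue"])

print(hypotenuse, formatted, first_run == second_run, choice)
```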

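10. Third-Party Libraries

Installing a third-party library

Use the pip tool to install third-party libraries.

pip install numpy

Using a third-party library

Import the library and use it.

import numpy as np

arr = np.array([1, 2, 3])
print(arr)  # Output: [1 2 3]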

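11. Virtual Environments

Creating a virtual environment

Use the venv module to create a virtual environment.

python -m venv myenv

Activating a virtual environment

Activate the virtual environment on Windows:
```
myenv\Scripts\activate
```
Activate the virtual environment on macOS and Linux:
```
source myenv/bin/activate
```

Leaving a virtual environment

Use the deactivate command to leave the virtual environment.

deactivate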

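12. A Python Web Scraper Example, Explained

Below is a simple Python scraper. It uses the requests library to fetch a web page and the BeautifulSoup library to parse the HTML, then prints some of the data it extracts. Suppose we want to collect the latest story titles and links from a news site such as https://news.ycombinator.com/.

12.1. Install the required libraries

First, install the requests and beautifulsoup4 libraries with pip:

pip install requests beautifulsoup4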

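12.2. Write the scraper

Create a new Python file, for example scraper.py, and add the following code:

import requests
from bs4 import BeautifulSoup

# Target URL
url = 'https://news.ycombinator.com/'

# Send the HTTP request (the timeout keeps the script from hanging indefinitely)
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all story entries. Hacker News currently wraps each title link in
    # <span class="titleline">; the class name has changed before, so check
    # the page source if nothing is found.
    news_items = soup.find_all('span', class_='titleline')

    # Print each story's title and link
    for item in news_items:
        anchor = item.find('a')
        if anchor is None:
            continue
        title = anchor.get_text()
        link = anchor['href']
        print(f'Title: {title}')
        print(f'Link: {link}\n')
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')

12.3. Code walkthrough

Import the libraries

import requests
from bs4 import BeautifulSoup

requests is used to send HTTP requests.
BeautifulSoup is used to parse HTML documents.

Set the target URL

url = 'https://news.ycombinator.com/'

This is the URL of the site we want to scrape.

Send the HTTP request

response = requests.get(url, timeout=10)

requests.get sends a GET request to the given URL and returns the response. The timeout argument makes the call give up if the server does not respond within 10 seconds instead of waiting indefinitely.

Check whether the request succeeded

if response.status_code == 200:

A status code of 200 means the request succeeded.

Parse the HTML content

soup = BeautifulSoup(response.text, 'html.parser')

BeautifulSoup parses the HTML document. response.text is the body of the response, and 'html.parser' selects Python's built-in HTML parser.

Find the story entries

news_items = soup.find_all('span', class_='titleline')

find_all returns every <span> tag with the class titleline. On Hacker News each of these currently wraps the <a> tag that holds a story's title and link.

Print the titles and links

for item in news_items:
    anchor = item.find('a')
    if anchor is None:
        continue
    title = anchor.get_text()
    link = anchor['href']
    print(f'Title: {title}')
    print(f'Link: {link}\n')

Loop over the entries that were found.
item.find('a') returns the first <a> tag inside the entry; entries without one are skipped.
anchor.get_text() returns the link text (the story title).
anchor['href'] returns the value of the href attribute (the story link).
Print the title and the link.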

12.4. Run the scraper

Run the script from a terminal or command prompt:

python scraper.py

You should see output similar to the following:

Title: Example News Title 1
Link: https://example.com/news1

Title: Example News Title 2
Link: https://example.com/news2

...

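12.5. Notes

Legality: make sure you are allowed to scrape the site's data. Check the site's robots.txt file and terms of use.
Rate limiting: do not send requests to the same site too frequently, so that you do not burden its server. You can use time.sleep to space out requests.
Error handling: add more error-handling logic to cope with network problems and other exceptional conditions.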
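
The three notes above can be sketched together using only the standard library. This is one possible arrangement, not the only one: the robots.txt content, the fetch_politely helper, and the flaky_fetch stand-in are all hypothetical, and the network call is simulated so the example runs offline. In a real script the fetch argument could be something like lambda u: requests.get(u, timeout=10).

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from
# the target site (for example with RobotFileParser.set_url() and read()).
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]
robots = RobotFileParser()
robots.parse(robots_lines)


def fetch_politely(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying on errors and pausing between attempts.

    fetch is any callable that returns the page or raises an exception.
    """
    if not robots.can_fetch("*", url):
        raise PermissionError(f"robots.txt disallows {url}")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:   # a real script would catch narrower types
            last_error = error
            print(f"attempt {attempt} failed: {error}")
            time.sleep(delay)        # wait before the next request
    raise RuntimeError(f"giving up on {url}") from last_error


# Stand-in for a network call: fails twice, then succeeds.
calls = {"count": 0}

def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network problem")
    return f"<html>content of {url}</html>"


page = fetch_politely(flaky_fetch, "https://example.com/news", delay=0.01)
print(page)
```

Passing the fetch function in as an argument keeps the retry and delay logic independent of any particular HTTP library, which also makes it easy to test without network access.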

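With this simple example as a starting point, you can begin writing your own scrapers to collect information from web pages. As you gain experience, you can move on to more complex tasks such as logging in to a site or handling pagination.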

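Summary

This tutorial has covered the essentials of getting started with Python: basic syntax, classes and objects, file operations, exception handling, the standard library, and third-party libraries. With this foundation you can start writing simple Python programs and go on to explore more advanced features.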
