HomeHarvest Project Tutorial

Latest recommended article published 2024-08-31 10:02:41

柏赢安Simona


Article link: https://blog.csdn.net/gitblog_00090/article/details/139387250


HomeHarvest: Python package for real estate scraping of MLS listing data. Project address (GitCode mirror): https://gitcode.com/gh_mirrors/ho/HomeHarvest

1. Project Directory Structure

The HomeHarvest repository is laid out as follows:

HomeHarvest/
├── .github/
│   └── workflows/
├── examples/
├── homeharvest/
├── tests/
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── poetry.lock
└── pyproject.toml

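Directory overview

.github/workflows/: GitHub Actions workflow definitions that automate the CI/CD pipeline.
homeharvest/: the main package source, containing the Python modules that scrape real estate data.
tests/: the test suite, which checks that the code behaves correctly and guards against regressions.
.gitignore: files and directories that Git should not track.
.pre-commit-config.yaml: configuration for pre-commit hooks, which run automated checks before each commit.
LICENSE: the project's open-source license; HomeHarvest is released under the MIT License.
README.md: the project documentation, covering an overview plus installation and usage instructions.
poetry.lock: pins the exact version of every dependency so that installs are identical across environments.
pyproject.toml: the project configuration file, holding metadata, dependencies, and build settings.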
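Module names inside homeharvest/ can change between releases, so it can help to check what the installed version actually contains. The following is a minimal standard-library sketch that lists a package's direct submodules; the helper name list_submodules is hypothetical and not part of HomeHarvest. The demo uses the standard-library json package so it runs without HomeHarvest installed.

```python
import importlib
import pkgutil


def list_submodules(package_name: str) -> list[str]:
    """Return the sorted names of modules and subpackages directly inside a package."""
    package = importlib.import_module(package_name)
    # __path__ exists only on packages (directories), not on single-file modules
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


if __name__ == "__main__":
    # With HomeHarvest installed, call list_submodules("homeharvest") instead.
    print(list_submodules("json"))
```

Using pkgutil.iter_modules on the package's __path__ inspects the installed files rather than the repository checkout, so the result reflects whatever version pip or Poetry actually installed.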

2. Entry Point

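HomeHarvest is a library rather than a standalone application, so you "start" it by importing it from your own Python code. Its main entry point is the scrape_property function, exposed at the top level of the homeharvest package, which wraps the core logic for scraping real estate data.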

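Entry point details

scrape_property: scrapes property listings according to the parameters you pass, such as the location, the listing type, and a time window. Call it to retrieve the listings for a given area; the results come back as a pandas DataFrame.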
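A time-window parameter such as past_days=30 combined with listing_type="sold" restricts the results to properties sold in the last 30 days. The sketch below only illustrates that semantics on local data; it is not how HomeHarvest implements the filter. The record layout, the sold_date key, the helper name within_past_days, and the inclusive boundaries are all hypothetical choices made for this example.

```python
from datetime import date, timedelta


def within_past_days(records, past_days, today, date_key="sold_date"):
    """Keep records whose date lies in [today - past_days, today], both ends inclusive.

    `records` is a list of dicts; `date_key` is a hypothetical field name.
    """
    cutoff = today - timedelta(days=past_days)
    return [r for r in records if cutoff <= r[date_key] <= today]


# Hypothetical sample records, not real HomeHarvest output
sample = [
    {"address": "A St", "sold_date": date(2024, 8, 30)},
    {"address": "B St", "sold_date": date(2024, 7, 1)},
]

# With a 30-day window ending 2024-08-31, only "A St" qualifies
print(within_past_days(sample, past_days=30, today=date(2024, 8, 31)))
```

Passing today explicitly instead of calling date.today() inside the function keeps the result reproducible.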

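from datetime import datetime

from homeharvest import scrape_property

# Build a file name from the current timestamp, e.g. HomeHarvest_20240831_100241.csv
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"HomeHarvest_{current_timestamp}.csv"

# Fetch properties in San Diego, CA that sold in the last 30 days
properties = scrape_property(
    location="San Diego, CA",
    listing_type="sold",
    past_days=30,
)

# scrape_property returns a pandas DataFrame; export it to CSV without the index column
properties.to_csv(filename, index=False)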
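The timestamp only ends up in the file name if it is interpolated with curly braces inside the f-string, which is easy to get wrong. A minimal sketch that factors the naming into a small function so it can be tested with a fixed time; timestamped_filename is a hypothetical helper, not part of HomeHarvest.

```python
from datetime import datetime
from typing import Optional


def timestamped_filename(prefix: str = "HomeHarvest", now: Optional[datetime] = None) -> str:
    """Build a CSV name such as 'HomeHarvest_20240831_100241.csv'.

    Passing `now` explicitly makes the result reproducible (e.g. in tests).
    """
    if now is None:
        now = datetime.now()
    # Curly braces interpolate the value; square brackets would be copied literally
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


print(timestamped_filename(now=datetime(2024, 8, 31, 10, 2, 41)))
# prints HomeHarvest_20240831_100241.csv
```

In a script you would write properties.to_csv(timestamped_filename(), index=False).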

3. Configuration Files

HomeHarvest's configuration lives mainly in pyproject.toml and poetry.lock.

pyproject.toml

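pyproject.toml holds the project's metadata, dependencies, and build configuration. The excerpt below is illustrative: the author field is a placeholder, and the version and dependency constraints may be out of date, so check the repository for the current values.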

[tool.poetry]
name = "homeharvest"
version = "0.1.0"
description = "Python package for scraping real estate property data"
authors = ["Your Name <your.email@example.com>"]
license = "MIT"

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.25.1"
pandas = "^1.2.4"

[tool.poetry.dev-dependencies]
pytest = "^6.2.4"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"