Reposted from https://blog.csdn.net/XuZZ94/article/details/79669313
Project directory
.
└── com
    ├── app
    │   ├── crawler02.py
    │   └── __init__.py
    ├── core
    │   ├── crawler_core.py
    │   └── __init__.py
    ├── crawler01.py
    ├── __init__.py
    └── tool
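Goal
Both crawler01 and crawler02 need to call functions defined in crawler_core (referred to as f below).
crawler01 sits in com/, the parent directory of the core/ package that contains f, so it can import f directly: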
#!/usr/bin/env python
# coding=utf-8
from core import crawler_core

if __name__ == '__main__':
    url = "url"
    html = crawler_core.getHtml(url)
    print(html)
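crawler02 sits in app/, a sibling of the core/ directory that contains f, so it has to extend the module search path before importing f: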
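#!/usr/bin/env python
# coding=utf-8
import os
import sys

# Add the parent of this script's directory (com/) to the module search path.
# abspath() is applied first so this also works when __file__ is a bare file name.
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from core import crawler_core

url = "url"
html = crawler_core.getHtml(url)
print(html)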
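The sys.path.append call above adds the absolute path of the parent of the script's directory to sys.path, the list of directories Python searches when importing modules. sys.path is initialized partly from the PYTHONPATH environment variable, but the call changes only the list inside the running process, not the environment variable itself. Once the parent directory is on the list, the core package can be found and imported.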
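The effect of the search path can be checked in isolation. The following is a minimal sketch, not the article's code: it builds a throwaway copy of the layout in a temporary directory, using the hypothetical names demo_core and demo_crawler_core as stand-ins for core and crawler_core, and shows that the import fails before the parent directory is appended and succeeds afterwards.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <tmp>/com/demo_core/demo_crawler_core.py
# (demo_core / demo_crawler_core are hypothetical stand-ins for core / crawler_core)
root = tempfile.mkdtemp()
core_dir = os.path.join(root, "com", "demo_core")
app_dir = os.path.join(root, "com", "app")
os.makedirs(core_dir)
os.makedirs(app_dir)
open(os.path.join(core_dir, "__init__.py"), "w").close()
with open(os.path.join(core_dir, "demo_crawler_core.py"), "w") as f:
    f.write("def getHtml(url):\n    return 'stub html for ' + url\n")

# Pretend this is the location of com/app/crawler02.py
fake_script = os.path.join(app_dir, "crawler02.py")

# Parent of the script's directory, i.e. <tmp>/com
parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(fake_script)))


def try_import():
    """Return the module if it can be imported, otherwise None."""
    importlib.invalidate_caches()
    try:
        return importlib.import_module("demo_core.demo_crawler_core")
    except ImportError:
        return None


before = try_import()          # parent_dir is not on sys.path yet
sys.path.append(parent_dir)    # same idea as in crawler02
after = try_import()           # now the package can be found

print("importable before append:", before is not None)
print("importable after append:", after is not None)
```

Appending puts the directory at the end of the list, so any package of the same name found earlier on sys.path would still win; sys.path.insert(0, ...) is the usual choice when the local package should take priority.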
Code for crawler_core (in Python 3.x, use urllib.request):
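#!/usr/bin/env python
# coding=utf-8
# A simple crawler implementation
import urllib.request


def getHtml(url):
    # urlopen() returns bytes; decode them instead of calling str(),
    # which would produce "b'...'" representations.
    # UTF-8 is assumed here; adjust if the page uses another encoding.
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")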
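One detail worth noting about the function above: urlopen returns bytes, and calling str() on a bytes object yields its b'...' representation rather than the page text, so the body needs to be decoded. The sketch below is one possible variant, not the original author's code: get_html_text is a hypothetical name, and falling back to UTF-8 when the response declares no charset is an assumption. It is exercised against a local file:// URL so that it runs without network access.

```python
import pathlib
import tempfile
import urllib.request


def get_html_text(url, default_charset="utf-8"):
    """Fetch url and return the body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes, not str
        # Use the charset declared in the Content-Type header, if any;
        # otherwise fall back to default_charset (an assumption).
        charset = page.headers.get_content_charset() or default_charset
    return raw.decode(charset, errors="replace")


# Try it on a local file so no network access is needed
tmp_dir = pathlib.Path(tempfile.mkdtemp())
sample = tmp_dir / "sample.html"
sample.write_text("<html><body>caf\u00e9</body></html>", encoding="utf-8")

html = get_html_text(sample.as_uri())
print(html)
```

The same function accepts an http:// or https:// URL; in that case the charset comes from the server's Content-Type header when one is sent.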