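Overview
This post records how I handle time parsing while writing crawlers. Websites display times in all sorts of formats, but the database should store them in one single format.
Below are the time formats I ran into on one forum site and how I parsed each of them.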
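Walkthrough
Every operation below converts the time to the standard format %Y-%m-%d %H:%M:%S, for example 2018-07-26 18:56:42.
In the sample code, s_time is the raw string extracted from the page that needs processing, and result_time is the processed result. The snippets are branches of a single if/elif chain and assume that re, time, datetime and timedelta have been imported.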
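To make the target format concrete, here is a minimal sketch using only the standard library. The names STANDARD_FORMAT and to_standard are made up for this example and do not appear in the crawler itself.

```python
from datetime import datetime

# Hypothetical constant name for the storage format used throughout this post.
STANDARD_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_standard(dt):
    """Render a datetime object in the standard storage format."""
    return dt.strftime(STANDARD_FORMAT)


print(to_standard(datetime(2018, 7, 26, 18, 56, 42)))  # 2018-07-26 18:56:42
```

Keeping the format string in one place means every branch formats its result the same way.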
1. s_time = "2017-06-15"
# Anchor the pattern so that only a bare date reaches strptime (%Y needs a 4-digit year).
if re.match(r'\d{4}-\d{1,2}-\d{1,2}$', s_time):
    result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(s_time, "%Y-%m-%d"))
2. s_time = "6天前" (6 days ago)
elif '天前' in s_time:
    # Subtract the extracted number of days from the current time.
    days = re.findall(r'(\d+)天前', s_time)[0]
    result_time = (datetime.now() - timedelta(days=int(days))).strftime("%Y-%m-%d %H:%M:%S")
3. s_time = "昨天 18:03" (yesterday 18:03)
elif '昨天' in s_time:
    # Take the HH:MM part from the string and the date from yesterday.
    last_time = re.findall(r'(\d{1,2}:\d{1,2})', s_time)[0]
    days_ago = datetime.now() - timedelta(days=1)
    y_m_d = days_ago.strftime("%Y-%m-%d")
    _time = y_m_d + ' ' + last_time
    result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(_time, "%Y-%m-%d %H:%M"))
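The "yesterday" branch can also be written without round-tripping through strings. The following is only a sketch: parse_yesterday and its now parameter are names I made up so the function can be tested with a fixed clock.

```python
import re
from datetime import datetime, timedelta


def parse_yesterday(s_time, now=None):
    """Parse strings like '昨天 18:03' relative to `now` (defaults to the current time)."""
    now = now or datetime.now()
    match = re.search(r'(\d{1,2}):(\d{1,2})', s_time)
    if '昨天' not in s_time or match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    # Step back one day, then overwrite the time of day with the scraped HH:MM.
    yesterday = now - timedelta(days=1)
    result = yesterday.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return result.strftime("%Y-%m-%d %H:%M:%S")


print(parse_yesterday("昨天 18:03", now=datetime(2018, 7, 26, 9, 0, 0)))
```

Passing now explicitly makes the result reproducible; timedelta takes care of month and year boundaries.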
4. s_time = "28分钟前" (28 minutes ago)
elif '分钟前' in s_time:
    # Subtract the extracted number of minutes from the current time.
    minutes = re.findall(r'(\d+)分钟前', s_time)[0]
    minutes_ago = (datetime.now() - timedelta(minutes=int(minutes))).strftime("%Y-%m-%d %H:%M:%S")
    result_time = minutes_ago
5. s_time = "06-29"
elif re.match(r'\d{1,2}-\d{1,2}$', s_time):
    # The page omits the year, so prepend the current one.
    now_year = str(datetime.now().year)
    _time = now_year + '-' + s_time
    result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(_time, "%Y-%m-%d"))
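One thing to watch with the year-less format: a post dated "12-30" that is scraped in early January would get the new year prepended and end up in the future. The sketch below rolls such dates back one year. It rests on my own assumption that the forum never shows future dates; parse_month_day and now are hypothetical names.

```python
from datetime import datetime


def parse_month_day(s_time, now=None):
    """Parse 'MM-DD', picking the most recent year that does not lie in the future."""
    now = now or datetime.now()
    month, day = (int(part) for part in s_time.split('-'))
    candidate = datetime(now.year, month, day)
    # Assumption: posts are never dated in the future, so a later date means last year.
    if candidate.date() > now.date():
        candidate = candidate.replace(year=now.year - 1)
    return candidate.strftime("%Y-%m-%d %H:%M:%S")


print(parse_month_day("06-29", now=datetime(2018, 7, 26)))  # 2018-06-29 00:00:00
print(parse_month_day("12-30", now=datetime(2018, 1, 5)))   # 2017-12-30 00:00:00
```

The sketch does not handle "02-29" when the chosen year is not a leap year; datetime raises ValueError in that case.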
6. s_time = "1小时前" (1 hour ago)
elif '小时前' in s_time:
    # Subtract the extracted number of hours from the current time.
    hours = re.findall(r'(\d+)小时前', s_time)[0]
    hours_ago = (datetime.now() - timedelta(hours=int(hours))).strftime("%Y-%m-%d %H:%M:%S")
    result_time = hours_ago
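Cases 2, 4 and 6 differ only in the unit, so they could be folded into one table-driven helper. This is a sketch of that idea rather than the code I actually ran; UNIT_TO_KWARG, parse_relative and the now parameter are names invented for the example.

```python
import re
from datetime import datetime, timedelta

# Maps the Chinese unit in "N<unit>前" to the matching timedelta keyword argument.
UNIT_TO_KWARG = {'天': 'days', '小时': 'hours', '分钟': 'minutes'}


def parse_relative(s_time, now=None):
    """Parse 'N天前', 'N小时前' or 'N分钟前' relative to `now`; return None otherwise."""
    now = now or datetime.now()
    match = re.search(r'(\d+)\s*(天|小时|分钟)前', s_time)
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    delta = timedelta(**{UNIT_TO_KWARG[unit]: amount})
    return (now - delta).strftime("%Y-%m-%d %H:%M:%S")


fixed_now = datetime(2018, 7, 26, 18, 56, 42)
print(parse_relative("6天前", now=fixed_now))     # 2018-07-20 18:56:42
print(parse_relative("28分钟前", now=fixed_now))  # 2018-07-26 18:28:42
```

Supporting another unit then only takes a new dictionary entry and a new alternative in the pattern.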
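7. s_time = "1532573387" (Unix timestamp)
# re.match returns None when nothing matches; indexing findall(...)[0] would raise IndexError instead.
elif re.match(r'\d{10,13}$', s_time):
    _t = int(s_time)
    if len(s_time) == 13:
        _t //= 1000  # a 13-digit value is assumed to be in milliseconds
    _time = time.localtime(_t)
    result_time = time.strftime("%Y-%m-%d %H:%M:%S", _time)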
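Timestamps come in seconds (10 digits) or milliseconds (13 digits) depending on the site. Here is a small sketch that accepts both, assuming that 13 digits always means milliseconds; parse_timestamp is a made-up name. The output is in the machine's local time zone, so I do not show an expected value.

```python
import re
from datetime import datetime


def parse_timestamp(s_time):
    """Convert a 10-digit (seconds) or 13-digit (milliseconds) Unix timestamp string."""
    if not re.fullmatch(r'\d{10}|\d{13}', s_time):
        return None
    seconds = int(s_time)
    if len(s_time) == 13:
        # Assumption: 13 digits means milliseconds, so drop the last three digits.
        seconds //= 1000
    # fromtimestamp converts to local time, like time.localtime does.
    return datetime.fromtimestamp(seconds).strftime("%Y-%m-%d %H:%M:%S")


print(parse_timestamp("1532573387"))
print(parse_timestamp("1532573387000"))  # same instant, given in milliseconds
```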
Those are the basic time formats on this forum. The complete code follows.
import re
import time
from datetime import datetime, timedelta

# Written as a method, hence the unused self parameter.
def parse_time(self, s_time):
    """Convert a scraped time string to "%Y-%m-%d %H:%M:%S"; return '' if nothing matches."""
    result_time = ''
    # 1. 2017-06-15
    if re.match(r'\d{4}-\d{1,2}-\d{1,2}$', s_time):
        result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(s_time, "%Y-%m-%d"))
    # 2. 6天前 (6 days ago)
    elif '天前' in s_time:
        days = re.findall(r'(\d+)天前', s_time)[0]
        result_time = (datetime.now() - timedelta(days=int(days))).strftime("%Y-%m-%d %H:%M:%S")
    # 3. 昨天 18:03 (yesterday 18:03)
    elif '昨天' in s_time:
        last_time = re.findall(r'(\d{1,2}:\d{1,2})', s_time)[0]
        days_ago = datetime.now() - timedelta(days=1)
        y_m_d = days_ago.strftime("%Y-%m-%d")
        _time = y_m_d + ' ' + last_time
        result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(_time, "%Y-%m-%d %H:%M"))
    # 4. 28分钟前 (28 minutes ago)
    elif '分钟前' in s_time:
        minutes = re.findall(r'(\d+)分钟前', s_time)[0]
        result_time = (datetime.now() - timedelta(minutes=int(minutes))).strftime("%Y-%m-%d %H:%M:%S")
    # 5. 06-29 (year omitted, so the current year is assumed)
    elif re.match(r'\d{1,2}-\d{1,2}$', s_time):
        now_year = str(datetime.now().year)
        _time = now_year + '-' + s_time
        result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(_time, "%Y-%m-%d"))
    # 6. 1小时前 (1 hour ago)
    elif '小时前' in s_time:
        hours = re.findall(r'(\d+)小时前', s_time)[0]
        result_time = (datetime.now() - timedelta(hours=int(hours))).strftime("%Y-%m-%d %H:%M:%S")
    # 7. 1532573387 (Unix timestamp; 13 digits are assumed to be milliseconds)
    elif re.match(r'\d{10,13}$', s_time):
        _t = int(s_time)
        if len(s_time) == 13:
            _t //= 1000
        result_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(_t))
    return result_time
Below are the results from one crawl.