The statistics module implements Seafile's statistics features.
Directory structure
models.py
handlers.py
counter.py
db.py
models.py: database model definitions
TotalStorageStat:
id,timestamp,total_size,org_id
FileOpsStat:
id,timestamp,op_type,number,org_id
UserTrafficStat:
email,month,block_download,file_view,
file_download,dir_download
UserActivityStat:
id,name_time_md5,timestamp,org_id
UserTraffic:
id,user,org_id,timestamp,op_type,size
SysTraffic:
id,org_id,timestamp,op_type,size
MonthlyUserTraffic:
id,user,org_id,timestamp,
web_file_upload,web_file_download,
sync_file_upload,sync_file_download,
link_file_upload,link_file_download
MonthlySysTraffic:
id,org_id,timestamp,
web_file_upload,web_file_download,
sync_file_upload,sync_file_download,
link_file_upload,link_file_download
db.py: database query functions
Structure
get_org_id
get_user_activity_stats_by_day
get_org_user_activity_stats_by_day
_get_total_storage_stats
get_total_storage_stats_by_day
get_org_storage_stats_by_day
get_user_traffic_by_day
get_org_traffic_by_day
get_system_traffic_by_day
get_all_user_traffic_by_month
get_all_orgs_traffic_by_month
get_user_traffic_by_month
The function names make the corresponding database operations clear enough that they are not analyzed in detail here.
handlers.py: event handler classes
Structure
UserLoginEventHandler
FileStatsEventHandler
register_handlers
UserLoginEventHandler(session, msg)
    elements = msg['content'].split('\t')
    if len(elements) != 4:
        logging.warning("got bad message: %s", elements)
        return
    username = elements[1]
    timestamp = elements[2]
    org_id = elements[3]
    _timestamp = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S')
    update_hash_record(session, username, _timestamp, org_id)
Extracts the login details from msg, then calls update_hash_record in counter.py.
FileStatsEventHandler(session, msg)
Like the handler above, it extracts fields from msg and then calls the corresponding function in counter.py:
    elements = msg['content'].split('\t')
    if len(elements) != 4:
        logging.warning("got bad message: %s", elements)
        return
    timestamp = datetime.utcfromtimestamp(msg['ctime'])
    oper = elements[0]
    user_name = elements[1]
    repo_id = elements[2]
    size = int(elements[3])
    save_traffic_info(session, timestamp, user_name, repo_id, oper, size)
register_handlers(handlers)
    handlers.add_handler('seahub.stats:user-login', UserLoginEventHandler)
    handlers.add_handler('seaf_server.stats:web-file-upload', FileStatsEventHandler)
    handlers.add_handler('seaf_server.stats:web-file-download', FileStatsEventHandler)
    handlers.add_handler('seaf_server.stats:link-file-upload', FileStatsEventHandler)
    handlers.add_handler('seaf_server.stats:link-file-download', FileStatsEventHandler)
    handlers.add_handler('seaf_server.stats:sync-file-upload', FileStatsEventHandler)
    handlers.add_handler('seaf_server.stats:sync-file-download', FileStatsEventHandler)
Here handlers is the MessageHandler object defined in the mq_handler module.
def add_handler(self, msg_type, func):
    if msg_type in self._handlers:
        funcs = self._handlers[msg_type]
    else:
        funcs = []
        self._handlers[msg_type] = funcs
    if func not in funcs:
        funcs.append(func)
register_handlers thus wires each statistics message type to its handler.
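Based only on the add_handler snippet above, the registry can be sketched as follows. The class name MessageHandler comes from the text; the handle_message method and its dispatch behavior are assumptions for illustration, not confirmed by the source:

```python
import logging

class MessageHandler(object):
    """Minimal registry mapping a message type to a list of handler callables."""

    def __init__(self):
        self._handlers = {}

    def add_handler(self, msg_type, func):
        # One msg_type may have several handlers; registering the same
        # handler twice stores it only once.
        funcs = self._handlers.setdefault(msg_type, [])
        if func not in funcs:
            funcs.append(func)

    def handle_message(self, session, msg):
        # Assumed dispatch: call every handler registered for msg's type.
        for func in self._handlers.get(msg.get('type'), []):
            try:
                func(session, msg)
            except Exception as e:
                logging.warning('error when handling msg: %s', e)
```

With this structure, UserLoginEventHandler and FileStatsEventHandler are just plain functions looked up by message type at dispatch time.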
counter.py:
Structure
update_hash_record
save_traffic_info
FileOpsCounter
TotalStorageCounter
TrafficInfoCounter
MonthlyTrafficCounter
UserActivityCounter
All of the Counter classes are utility classes that drive the database operations.
def update_hash_record(session, login_name, login_time, org_id)
Adds an entry to the login_records dict.
    if not appconfig.enable_statistics:
        return
Returns immediately when statistics are disabled in the configuration.
    time_str = login_time.strftime('%Y-%m-%d 00:00:00')
    time_by_day = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S')
    md5_key = hashlib.md5((login_name + time_str).encode('utf-8')).hexdigest()
Truncates the login time to the day and derives an md5 key from name + day.
    login_records[md5_key] = (login_name, time_by_day, org_id)
Stores the record in the dict under that key.
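Because the key hashes the user name plus the day string (with the time zeroed out), repeated logins by one user on one day collapse into a single entry. A quick illustration with made-up values:

```python
import hashlib
from datetime import datetime

def day_key(login_name, login_time):
    # Truncate the login time to midnight, then hash name + day string,
    # so two logins on the same day produce the same key.
    time_str = login_time.strftime('%Y-%m-%d 00:00:00')
    return hashlib.md5((login_name + time_str).encode('utf-8')).hexdigest()

k1 = day_key('alice@example.com', datetime(2023, 5, 1, 9, 30))
k2 = day_key('alice@example.com', datetime(2023, 5, 1, 22, 5))
k3 = day_key('alice@example.com', datetime(2023, 5, 2, 9, 30))
# k1 == k2 (same day), k1 != k3 (different days)
```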
def save_traffic_info(session, timestamp, user_name, repo_id, oper, size)
Similar in structure to the previous function.
    if not appconfig.enable_statistics:
        return
    org_id = get_org_id(repo_id)
    time_str = timestamp.strftime('%Y-%m-%d')
    if time_str not in traffic_info:
        traffic_info[time_str] = {}
    if (org_id, user_name, oper) not in traffic_info[time_str]:
        traffic_info[time_str][(org_id, user_name, oper)] = size
    else:
        traffic_info[time_str][(org_id, user_name, oper)] += size
The membership checks maintain a nested dict that accumulates, for each day, the total traffic size per (org_id, user, operation type) combination.
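The same accumulation can be sketched with collections.defaultdict, which removes the explicit membership checks (the real code keeps the plain-dict form shown above; the sample values are made up):

```python
from collections import defaultdict

# In-memory buffer mirroring the traffic_info structure:
# {day string: {(org_id, user, oper): accumulated size in bytes}}
traffic_info = defaultdict(lambda: defaultdict(int))

def record(day_str, org_id, user, oper, size):
    # Accumulate traffic size per day and per (org, user, operation).
    traffic_info[day_str][(org_id, user, oper)] += size

record('2023-05-01', -1, 'alice@example.com', 'web-file-download', 1024)
record('2023-05-01', -1, 'alice@example.com', 'web-file-download', 2048)
record('2023-05-01', -1, 'alice@example.com', 'sync-file-upload', 512)
```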
Reading the code of the classes that operate on the database deepens our understanding of what each database field is for.
FileOpsCounter
def start_count(): begins counting file operations.
A note on the time handling: datetime.utcnow returns the current UTC time.
    dt = datetime.utcnow()
    delta = timedelta(hours=1)
    _start = (dt - delta)
This steps back one hour from dt.
    start = _start.strftime('%Y-%m-%d %H:00:00')
    end = _start.strftime('%Y-%m-%d %H:59:59')
Builds the start and end strings of that previous hour.
    s_timestamp = datetime.strptime(start, '%Y-%m-%d %H:%M:%S')
    e_timestamp = datetime.strptime(end, '%Y-%m-%d %H:%M:%S')
Parses them back into datetime objects.
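The window computation can be packaged as a small helper to make the behavior concrete (a sketch; previous_hour_window is not a function in the source):

```python
from datetime import datetime, timedelta

def previous_hour_window(dt):
    # Step back one hour, then clamp to the first and last second of that hour
    # via strftime/strptime, exactly as start_count does.
    prev = dt - timedelta(hours=1)
    start = prev.strftime('%Y-%m-%d %H:00:00')
    end = prev.strftime('%Y-%m-%d %H:59:59')
    return (datetime.strptime(start, '%Y-%m-%d %H:%M:%S'),
            datetime.strptime(end, '%Y-%m-%d %H:%M:%S'))

s, e = previous_hour_window(datetime(2023, 5, 1, 10, 17, 42))
# s == datetime(2023, 5, 1, 9, 0, 0), e == datetime(2023, 5, 1, 9, 59, 59)
```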
    q = self.edb_session.query(FileOpsStat.timestamp).filter(
        FileOpsStat.timestamp==s_timestamp)
    if q.first():
        self.edb_session.close()
        return
Early-return condition 1.
    q = self.edb_session.query(FileUpdate.org_id, FileUpdate.timestamp, FileUpdate.file_oper).filter(
        FileUpdate.timestamp.between(
            s_timestamp, e_timestamp))
Selects all FileUpdate rows within the time range (the FileUpdate table is defined in events.models).
    rows = q.all()
    for row in rows:
        org_id = row.org_id
        if 'Added' in row.file_oper:
            total_added += 1
            if org_id not in org_added:
                org_added[org_id] = 1
            else:
                org_added[org_id] += 1
        elif 'Deleted' in row.file_oper or 'Removed' in row.file_oper:
            total_deleted += 1
            if org_id not in org_deleted:
                org_deleted[org_id] = 1
            else:
                org_deleted[org_id] += 1
        elif 'Modified' in row.file_oper:
            total_modified += 1
            if org_id not in org_modified:
                org_modified[org_id] = 1
            else:
                org_modified[org_id] += 1
Tallies each row by operation type, both in total and per org.
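The per-org tallies could equally be kept in collections.Counter objects, which default missing keys to zero (a sketch over made-up rows, not the module's actual code):

```python
from collections import Counter

# Sample (org_id, file_oper) rows mimicking the FileUpdate query results.
rows = [
    (-1, 'Added "a.txt"'),
    (-1, 'Added "b.txt"'),
    (7,  'Deleted "c.txt"'),
    (-1, 'Modified "a.txt"'),
]

org_added, org_deleted, org_modified = Counter(), Counter(), Counter()
for org_id, file_oper in rows:
    # Classify each row by the verb embedded in file_oper, as the loop above does.
    if 'Added' in file_oper:
        org_added[org_id] += 1
    elif 'Deleted' in file_oper or 'Removed' in file_oper:
        org_deleted[org_id] += 1
    elif 'Modified' in file_oper:
        org_modified[org_id] += 1
```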
    q = self.edb_session.query(FileAudit.org_id, func.count(FileAudit.eid)).filter(
        FileAudit.timestamp.between(
            s_timestamp, e_timestamp)).group_by(FileAudit.org_id)
    rows = q.all()
    for row in rows:
        org_id = row[0]
        total_visited += row[1]
        org_visited[org_id] = row[1]
Derives the "Visited" counts from FileAudit, grouped by org.
    for k, v in org_added.items():
        new_record = FileOpsStat(k, s_timestamp, 'Added', v)
        self.edb_session.add(new_record)
    for k, v in org_deleted.items():
        new_record = FileOpsStat(k, s_timestamp, 'Deleted', v)
        self.edb_session.add(new_record)
    for k, v in org_visited.items():
        new_record = FileOpsStat(k, s_timestamp, 'Visited', v)
        self.edb_session.add(new_record)
    for k, v in org_modified.items():
        new_record = FileOpsStat(k, s_timestamp, 'Modified', v)
        self.edb_session.add(new_record)
Writes the results into FileOpsStat. This also explains early-return condition 1: if a record at s_timestamp already exists, that hour has already been counted and the table updated.
FileOpsStat: id, timestamp, op_type, number, org_id
Meaning: in the window beginning at timestamp, users of org_id performed the op_type operation number times.
TotalStorageCounter
Records each organization's storage usage at a point in time.
    q = self.seafdb_session.query(func.sum(RepoSize.size).label("size"),
        OrgRepo.org_id).outerjoin(VirtualRepo,
        RepoSize.repo_id==VirtualRepo.repo_id).outerjoin(OrgRepo,
        RepoSize.repo_id==OrgRepo.repo_id).filter(
        VirtualRepo.repo_id == None).group_by(OrgRepo.org_id)
Sums repo sizes per organization, excluding virtual repos.
    for result in results:
        org_id = result.org_id
        org_size = result.size
        if not org_id:
            org_id = -1
        q = self.edb_session.query(TotalStorageStat).filter(
            TotalStorageStat.org_id==org_id,
            TotalStorageStat.timestamp==timestamp)
        r = q.first()
        if not r:
            newrecord = TotalStorageStat(org_id, timestamp, org_size)
            self.edb_session.add(newrecord)
Inserts a record when none exists yet for this org and timestamp.
TotalStorageStat: id, timestamp, total_size, org_id
Meaning: at time timestamp, org_id's total storage size is total_size.
TrafficInfoCounter
Updates per-user and system-wide traffic.
    for row in local_traffic_info[date_str]:
        trans_count += 1
        org_id = row[0]
        user = row[1]
        oper = row[2]
        size = local_traffic_info[date_str][row]
        if size == 0:
            continue
        if (org_id, oper) not in org_delta:
            org_delta[(org_id, oper)] = size
        else:
            org_delta[(org_id, oper)] += size
        try:
            q = self.edb_session.query(UserTraffic.size).filter(
                UserTraffic.timestamp==date,
                UserTraffic.user==user,
                UserTraffic.org_id==org_id,
                UserTraffic.op_type==oper)
            result = q.first()
            if result:
                size_in_db = result[0]
                self.edb_session.query(UserTraffic).filter(
                    UserTraffic.timestamp==date,
                    UserTraffic.user==user,
                    UserTraffic.org_id==org_id,
                    UserTraffic.op_type==oper).update(
                    {"size": size + size_in_db})
            else:
                new_record = UserTraffic(user, date, oper, size, org_id)
                self.edb_session.add(new_record)
            # commit every 100 items.
            if trans_count >= 100:
                self.edb_session.commit()
                trans_count = 0
        except Exception as e:
            logging.warning('Failed to update traffic info: %s.', e)
            return
    for row in org_delta:
        org_id = row[0]
        oper = row[1]
        size = org_delta[row]
        try:
            q = self.edb_session.query(SysTraffic.size).filter(
                SysTraffic.timestamp==date,
                SysTraffic.org_id==org_id,
                SysTraffic.op_type==oper)
            result = q.first()
            if result:
                size_in_db = result[0]
                self.edb_session.query(SysTraffic).filter(
                    SysTraffic.timestamp==date,
                    SysTraffic.org_id==org_id,
                    SysTraffic.op_type==oper).update(
                    {"size": size + size_in_db})
            else:
                new_record = SysTraffic(date, oper, size, org_id)
                self.edb_session.add(new_record)
        except Exception as e:
            logging.warning('Failed to update traffic info: %s.', e)
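The update-or-insert pattern applied to both UserTraffic and SysTraffic can be sketched in plain Python, modeling the table as a dict keyed the same way the SQL filters are (made-up sample values, not the module's actual code):

```python
# The "table" maps (date, user, org_id, op_type) -> accumulated size,
# mirroring the UserTraffic filter columns.
table = {}

def upsert_traffic(date, user, org_id, op_type, size):
    key = (date, user, org_id, op_type)
    if key in table:
        table[key] += size   # existing row: add the delta to size_in_db
    else:
        table[key] = size    # no row yet: insert a new record

upsert_traffic('2023-05-01', 'alice@example.com', -1, 'web-file-download', 100)
upsert_traffic('2023-05-01', 'alice@example.com', -1, 'web-file-download', 50)
```

The real counter performs the same read-add-write with two queries per key, committing every 100 items to bound transaction size.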
By analogy, MonthlyTrafficCounter aggregates traffic by month, and UserActivityCounter computes user-activity statistics.
Summary: the statistics module gathers, over given time windows, statistics on file-operation counts, storage size, traffic, and user activity.