2021SC@SDUSC 山大智云 (SDU Smart Cloud) 10. statistics

The statistics module maintains the statistics features of seafile.

Directory structure

models.py
handlers.py
counter.py
db.py
models.py: database model declarations
TotalStorageStat:
   id,timestamp,total_size,org_id
FileOpsStat:
   id,timestamp,op_type,number,org_id
UserTrafficStat:
   email,month,block_download,file_view,
   file_download,dir_download
UserActivityStat:
   id,name_time_md5,timestamp,org_id
UserTraffic:
   id,user,org_id,timestamp,op_type,size 
SysTraffic:
   id,org_id,timestamp,op_type,size
MonthlyUserTraffic:
   id,user,org_id,timestamp,
   web_file_upload,web_file_download,
   sync_file_upload,sync_file_download,
   link_file_upload,link_file_download
MonthlySysTraffic:
   id,org_id,timestamp,
   web_file_upload,web_file_download,
   sync_file_upload,sync_file_download,
   link_file_upload,link_file_download
db.py: database operations

File structure

get_org_id
get_user_activity_stats_by_day
get_org_user_activity_stats_by_day
_get_total_storage_stats
get_total_storage_stats_by_day
get_org_storage_stats_by_day
get_user_traffic_by_day
get_org_traffic_by_day
get_system_traffic_by_day
get_all_user_traffic_by_month
get_all_orgs_traffic_by_month
get_user_traffic_by_month

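The function names give a rough idea of the corresponding database operations, so they are not analyzed further here.

handlers.py: event handlers

File structure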

UserLoginEventHandler
FileStatsEventHandler
register_handlers

UserLoginEventHandler(session,msg)

elements = msg['content'].split('\t')
if len(elements) != 4:
    logging.warning("got bad message: %s", elements)
    return
username = elements[1]
timestamp = elements[2]
org_id = elements[3]
_timestamp = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S')

Extract the relevant fields from msg, then call update_hash_record from counter:

update_hash_record(session, username, _timestamp, org_id)
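The parsing step above can be tried in isolation. The following is a minimal sketch, assuming the message content carries four tab-separated fields in the order the handler reads them; the helper name parse_login_message and the sample message are hypothetical and are not part of seafevents.

```python
import logging
from datetime import datetime


def parse_login_message(msg):
    """Hypothetical helper: split a user-login message into its fields.

    Returns (username, login_time, org_id), or None for a malformed message.
    """
    elements = msg['content'].split('\t')
    if len(elements) != 4:
        logging.warning("got bad message: %s", elements)
        return None
    username, timestamp, org_id = elements[1], elements[2], elements[3]
    login_time = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S')
    return username, login_time, org_id


# The sample content below is made up for illustration.
sample = {'content': 'user-login\talice@example.com\t2021-12-01 08:30:00\t-1'}
parsed = parse_login_message(sample)
bad = parse_login_message({'content': 'too\tfew'})
```

Note that org_id is still a string at this point, exactly as it comes out of split.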

FileStatsEventHandler(session,msg)

Similar to the previous handler: first extract the fields from msg, then call the corresponding function in counter.

elements = msg['content'].split('\t')
if len(elements) != 4:
    logging.warning("got bad message: %s", elements)
    return

timestamp = datetime.utcfromtimestamp(msg['ctime'])
oper = elements[0]
user_name = elements[1]
repo_id = elements[2]
size = int(elements[3])

save_traffic_info(session, timestamp, user_name, repo_id, oper, size)

register_handlers(handlers)

handlers.add_handler('seahub.stats:user-login', UserLoginEventHandler)
handlers.add_handler('seaf_server.stats:web-file-upload', FileStatsEventHandler)
handlers.add_handler('seaf_server.stats:web-file-download', FileStatsEventHandler)
handlers.add_handler('seaf_server.stats:link-file-upload', FileStatsEventHandler)
handlers.add_handler('seaf_server.stats:link-file-download', FileStatsEventHandler)
handlers.add_handler('seaf_server.stats:sync-file-upload', FileStatsEventHandler)
handlers.add_handler('seaf_server.stats:sync-file-download', FileStatsEventHandler)

handlers is the MessageHandler object defined in the mq_handler file:

def add_handler(self, msg_type, func):
    if msg_type in self._handlers:
        funcs = self._handlers[msg_type]
    else:
        funcs = []
        self._handlers[msg_type] = funcs

    if func not in funcs:
        funcs.append(func)

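In effect, register_handlers registers a handler function for each message type, so that an incoming message can be routed to the functions registered under its type.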
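To make the registration pattern concrete, here is a minimal sketch. Only add_handler follows the code shown above; the class name MiniMessageHandler, the dispatch method, and the fake handler are assumptions made for illustration and may differ from what mq_handler actually does.

```python
class MiniMessageHandler:
    """Simplified stand-in for MessageHandler (hypothetical)."""

    def __init__(self):
        self._handlers = {}

    def add_handler(self, msg_type, func):
        # Same behavior as the original: one list per type, no duplicates.
        funcs = self._handlers.setdefault(msg_type, [])
        if func not in funcs:
            funcs.append(func)

    def dispatch(self, session, msg_type, msg):
        # Hypothetical dispatch: call every function registered for msg_type.
        for func in self._handlers.get(msg_type, []):
            func(session, msg)


calls = []


def fake_login_handler(session, msg):
    calls.append(msg['content'])


handlers = MiniMessageHandler()
handlers.add_handler('seahub.stats:user-login', fake_login_handler)
handlers.add_handler('seahub.stats:user-login', fake_login_handler)  # ignored duplicate
handlers.dispatch(None, 'seahub.stats:user-login', {'content': 'hello'})
handlers.dispatch(None, 'unknown:type', {'content': 'ignored'})
```

Registering the same function twice has no effect, and a message type without registered functions is silently skipped in this sketch.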

counter.py:

File structure

update_hash_record
save_traffic_info
FileOpsCounter
TotalStorageCounter
TrafficInfoCounter
MonthlyTrafficCounter
UserActivityCounter

As the names suggest, all the Counter classes are utility classes that invoke database operations.

def update_hash_record(session, login_name, login_time, org_id)

Adds an entry to the login_records dictionary.

if not appconfig.enable_statistics:
    return

First, return immediately if statistics are not enabled in the configuration.

time_str = login_time.strftime('%Y-%m-%d 00:00:00')
time_by_day = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S')
md5_key = hashlib.md5((login_name + time_str).encode('utf-8')).hexdigest()

Truncate the login time to the start of the day, then generate an MD5 key from the login name and that day.

login_records[md5_key] = (login_name, time_by_day, org_id)

Add the entry to the dictionary.
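Because the key is built from the login name plus the day, repeated logins by the same user on the same day overwrite one entry instead of creating new ones. A minimal sketch of that effect follows; the function name record_login and the sample values are hypothetical, and the configuration check and session argument are left out.

```python
import hashlib
from datetime import datetime

login_records = {}


def record_login(login_name, login_time, org_id):
    # Truncate to midnight so every login on one day maps to the same key.
    time_str = login_time.strftime('%Y-%m-%d 00:00:00')
    time_by_day = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S')
    md5_key = hashlib.md5((login_name + time_str).encode('utf-8')).hexdigest()
    login_records[md5_key] = (login_name, time_by_day, org_id)


record_login('alice@example.com', datetime(2021, 12, 1, 8, 30), -1)
record_login('alice@example.com', datetime(2021, 12, 1, 21, 5), -1)  # same day, same key
record_login('alice@example.com', datetime(2021, 12, 2, 9, 0), -1)   # next day, new key
```

After the three calls the dictionary holds two entries, one per (user, day) pair.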

def save_traffic_info(session, timestamp, user_name, repo_id, oper, size)

Structured similarly to the previous function:

if not appconfig.enable_statistics:
    return
org_id = get_org_id(repo_id)
time_str = timestamp.strftime('%Y-%m-%d')
if time_str not in traffic_info:
    traffic_info[time_str] = {}
if (org_id, user_name, oper) not in traffic_info[time_str]:
    traffic_info[time_str][(org_id, user_name, oper)] = size
else:
    traffic_info[time_str][(org_id, user_name, oper)] += size

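Compared with the previous function, this one adds a few existence checks on the nested dictionary. For each day it accumulates the traffic size, keyed by (org_id, user_name, oper), so each user's traffic is tracked separately per operation type.

Analyzing the code of the classes that operate on the database gives a better understanding of what the database fields mean.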
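The accumulation into the nested dictionary can be sketched on its own. This version assumes the same layout (day string, then an (org_id, user_name, oper) tuple) but uses setdefault and get in place of the explicit membership checks; the function name add_traffic and the sample values are hypothetical.

```python
traffic_info = {}


def add_traffic(day, org_id, user_name, oper, size):
    # Outer key: day string. Inner key: (org_id, user_name, oper).
    per_day = traffic_info.setdefault(day, {})
    key = (org_id, user_name, oper)
    per_day[key] = per_day.get(key, 0) + size


add_traffic('2021-12-01', -1, 'alice@example.com', 'web-file-upload', 100)
add_traffic('2021-12-01', -1, 'alice@example.com', 'web-file-upload', 50)
add_traffic('2021-12-01', -1, 'alice@example.com', 'sync-file-download', 7)
```

The two uploads are summed into one value, while the download is kept under its own key.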

FileOpsCounter

def start_count(): starts counting file operations.

A note on the time handling:

datetime.utcnow: returns the current UTC time
dt = datetime.utcnow()
delta = timedelta(hours=1)
_start = (dt - delta)
These lines compute the time one hour before dt.
start = _start.strftime('%Y-%m-%d %H:00:00')
end = _start.strftime('%Y-%m-%d %H:59:59')
Build the start and end of that hour as strings.
s_timestamp = datetime.strptime(start, '%Y-%m-%d %H:%M:%S')
e_timestamp = datetime.strptime(end, '%Y-%m-%d %H:%M:%S')
Convert the strings back into datetime objects.
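The same one-hour window can also be computed without the round trip through strings. The sketch below swaps in datetime.replace for the strftime/strptime pair; it is an alternative for illustration, not the code used in counter.py, and the function name previous_hour_window is hypothetical.

```python
from datetime import datetime, timedelta


def previous_hour_window(now):
    """Return (start, end) datetimes covering the full hour before `now`."""
    prev = now - timedelta(hours=1)
    start = prev.replace(minute=0, second=0, microsecond=0)
    end = prev.replace(minute=59, second=59, microsecond=0)
    return start, end


# A time shortly after midnight, so the window falls on the previous day.
start, end = previous_hour_window(datetime(2021, 12, 1, 0, 15, 42))
```

For 2021-12-01 00:15:42 the window is 2021-11-30 23:00:00 to 23:59:59, which shows that the subtraction handles the day boundary.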
q = self.edb_session.query(FileOpsStat.timestamp).filter(
                           FileOpsStat.timestamp==s_timestamp)
if q.first():
    self.edb_session.close()
    return

Return condition 1

q = self.edb_session.query(FileUpdate.org_id, FileUpdate.timestamp, FileUpdate.file_oper).filter(
                           FileUpdate.timestamp.between(
                           s_timestamp, e_timestamp))

Fetch all rows in FileUpdate that fall within the time range (the FileUpdate table is defined in events.models).

rows = q.all()
for row in rows:
    org_id = row.org_id
    if 'Added' in row.file_oper:
        total_added += 1
        if org_id not in org_added:
            org_added[org_id] = 1
        else:
            org_added[org_id] += 1
    elif 'Deleted' in row.file_oper or 'Removed' in row.file_oper:
        total_deleted += 1
        if org_id not in org_deleted:
            org_deleted[org_id] = 1
        else:
            org_deleted[org_id] += 1
    elif 'Modified' in row.file_oper:
        total_modified += 1
        if org_id not in org_modified:
            org_modified[org_id] = 1
        else:
            org_modified[org_id] += 1

Count the rows by operation type, both in total and per organization.
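The classification and counting above can be condensed with collections.Counter. This is a sketch under assumptions: the rows are plain (org_id, file_oper) tuples rather than query results, the file_oper strings are made up, and the helper name classify is hypothetical.

```python
from collections import Counter


def classify(file_oper):
    """Map a file_oper description to the operation type used for counting."""
    if 'Added' in file_oper:
        return 'Added'
    if 'Deleted' in file_oper or 'Removed' in file_oper:
        return 'Deleted'
    if 'Modified' in file_oper:
        return 'Modified'
    return None  # other operations are not counted


# Made-up rows standing in for the FileUpdate query result.
rows = [(-1, 'Added "a.txt"'), (-1, 'Removed "b.txt"'),
        (3, 'Added "c.txt"'), (3, 'Renamed "d.txt"')]

per_org = Counter()
for org_id, file_oper in rows:
    op = classify(file_oper)
    if op is not None:
        per_org[(org_id, op)] += 1

# Totals across all organizations, derived from the per-organization counts.
totals = Counter()
for (org_id, op), n in per_org.items():
    totals[op] += n
```

As in the original loop, 'Removed' is folded into 'Deleted', and operations that match none of the three types are ignored.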

q = self.edb_session.query(FileAudit.org_id, func.count(FileAudit.eid)).filter(
                           FileAudit.timestamp.between(
                           s_timestamp, e_timestamp)).group_by(FileAudit.org_id)
rows = q.all()
for row in rows:
    org_id = row[0]
    total_visited += row[1]
    org_visited[org_id] = row[1]

Count the "Visited" operations per organization from FileAudit.

for k, v in org_added.items():
    new_record = FileOpsStat(k, s_timestamp, 'Added', v)
    self.edb_session.add(new_record)

for k, v in org_deleted.items():
    new_record = FileOpsStat(k, s_timestamp, 'Deleted', v)
    self.edb_session.add(new_record)

for k, v in org_visited.items():
    new_record = FileOpsStat(k, s_timestamp, 'Visited', v)
    self.edb_session.add(new_record)

for k, v in org_modified.items():
    new_record = FileOpsStat(k, s_timestamp, 'Modified', v)
    self.edb_session.add(new_record)

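Add the records to FileOpsStat. This also explains return condition 1: if a record with the same start timestamp already exists, the count for that hour has already been done and written to the database.

FileOpsStat:id,timestamp,op_type,number,org_id
In the hour starting at timestamp, users of organization org_id performed the op_type operation number times.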

TotalStorageCounter

Records the storage usage of each organization at a given point in time.

q = self.seafdb_session.query(func.sum(RepoSize.size).label("size"),
                              OrgRepo.org_id).outerjoin(VirtualRepo,\
                              RepoSize.repo_id==VirtualRepo.repo_id).outerjoin(OrgRepo,\
                              RepoSize.repo_id==OrgRepo.repo_id).filter(\
                              VirtualRepo.repo_id == None).group_by(OrgRepo.org_id)

Query the total repository size per organization, excluding virtual repositories.

for result in results:
    org_id = result.org_id
    org_size = result.size
    if not org_id:
        org_id = -1

    q = self.edb_session.query(TotalStorageStat).filter(\
                               TotalStorageStat.org_id==org_id,
                               TotalStorageStat.timestamp==timestamp)
    r = q.first()
    if not r:
        newrecord = TotalStorageStat(org_id, timestamp, org_size)
        self.edb_session.add(newrecord)

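Insert a record if none exists yet for this org_id and timestamp.

TotalStorageStat:id,timestamp,total_size,org_id
At time timestamp, the total storage size of organization org_id is total_size.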

TrafficInfoCounter

Updates user traffic and system traffic.

for row in local_traffic_info[date_str]:
    trans_count += 1
    org_id = row[0]
    user = row[1]
    oper = row[2]
    size = local_traffic_info[date_str][row]
    if size == 0:
        continue
    if (org_id, oper) not in org_delta:
        org_delta[(org_id, oper)] = size
    else:
        org_delta[(org_id, oper)] += size

    try:
        q = self.edb_session.query(UserTraffic.size).filter(
                                   UserTraffic.timestamp==date,
                                   UserTraffic.user==user,
                                   UserTraffic.org_id==org_id,
                                   UserTraffic.op_type==oper)
        result = q.first()
        if result:
            size_in_db = result[0]
            self.edb_session.query(UserTraffic).filter(UserTraffic.timestamp==date,
                                                       UserTraffic.user==user,
                                                       UserTraffic.org_id==org_id,
                                                       UserTraffic.op_type==oper).update(
                                                       {"size": size + size_in_db})
        else:
            new_record = UserTraffic(user, date, oper, size, org_id)
            self.edb_session.add(new_record)

        # commit every 100 items.
        if trans_count >= 100:
            self.edb_session.commit()
            trans_count = 0
    except Exception as e:
        logging.warning('Failed to update traffic info: %s.', e)
        return
for row in org_delta:
    org_id = row[0]
    oper = row[1]
    size = org_delta[row]
    try:
        q = self.edb_session.query(SysTraffic.size).filter(
                                   SysTraffic.timestamp==date,
                                   SysTraffic.org_id==org_id,
                                   SysTraffic.op_type==oper)
        result = q.first()
        if result:
            size_in_db = result[0]
            self.edb_session.query(SysTraffic).filter(SysTraffic.timestamp==date,
                                                      SysTraffic.org_id==org_id,
                                                      SysTraffic.op_type==oper).update(
                                                      {"size": size + size_in_db})
        else:
            new_record = SysTraffic(date, oper, size, org_id)
            self.edb_session.add(new_record)

    except Exception as e:
        logging.warning('Failed to update traffic info: %s.', e)
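Both loops above follow the same "read the stored size, then update or insert" pattern. The sketch below reproduces it with the standard-library sqlite3 module, swapped in for the SQLAlchemy session so that it runs on its own; the table name user_traffic, its columns, and the function add_user_traffic are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE user_traffic '
    '(user TEXT, day TEXT, op_type TEXT, size INTEGER, org_id INTEGER)')


def add_user_traffic(conn, user, day, op_type, size, org_id):
    """Add `size` to the stored value, inserting a new row if none exists."""
    key = (user, day, op_type, org_id)
    row = conn.execute(
        'SELECT size FROM user_traffic '
        'WHERE user=? AND day=? AND op_type=? AND org_id=?', key).fetchone()
    if row:
        conn.execute(
            'UPDATE user_traffic SET size=? '
            'WHERE user=? AND day=? AND op_type=? AND org_id=?',
            (row[0] + size,) + key)
    else:
        conn.execute(
            'INSERT INTO user_traffic VALUES (?, ?, ?, ?, ?)',
            (user, day, op_type, size, org_id))
    conn.commit()


add_user_traffic(conn, 'alice@example.com', '2021-12-01', 'web-file-upload', 100, -1)
add_user_traffic(conn, 'alice@example.com', '2021-12-01', 'web-file-upload', 50, -1)
```

The sketch commits after every call for simplicity, whereas the original user-traffic loop commits once every 100 items.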

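By analogy, MonthlyTrafficCounter counts monthly traffic, and UserActivityCounter collects user activity statistics.

Summary: the statistics module collects statistics over given time periods, covering the number of user operations, storage size, traffic, and user activity.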
