2. Python Data Analysis with Pandas, LeetCode 1747. Leetflex Banned Accounts

凡梦_leo

Published 2024-09-20 21:47:37


Article link: https://blog.csdn.net/qq_55006020/article/details/142405516


Learning: the first encounter with knowledge
Review: revisiting the old to learn the new
Practice: putting knowledge to use

I. Original LeetCode problem link

LeetCode (力扣)

II. Problem statement

Table: LogInfo

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| account_id  | int      |
| ip_address  | int      |
| login       | datetime |
| logout      | datetime |
+-------------+----------+
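The table may contain duplicate rows.
The table contains the login and logout dates of Leetflex accounts. It also contains the network (IP) address from which each account logged in and out.
It is guaranteed that every logout time is after its login time.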

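Write a solution to find the account_id of every Leetflex account that should be banned. An account should be banned if, at some moment, it was logged in from two different IP addresses.

Return the result in any order.

The result format is shown in the following example.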

Example 1:

Input:
LogInfo table:
+------------+------------+---------------------+---------------------+
| account_id | ip_address | login               | logout              |
+------------+------------+---------------------+---------------------+
| 1          | 1          | 2021-02-01 09:00:00 | 2021-02-01 09:30:00 |
| 1          | 2          | 2021-02-01 08:00:00 | 2021-02-01 11:30:00 |
| 2          | 6          | 2021-02-01 20:30:00 | 2021-02-01 22:00:00 |
| 2          | 7          | 2021-02-02 20:30:00 | 2021-02-02 22:00:00 |
| 3          | 9          | 2021-02-01 16:00:00 | 2021-02-01 16:59:59 |
| 3          | 13         | 2021-02-01 17:00:00 | 2021-02-01 17:59:59 |
| 4          | 10         | 2021-02-01 16:00:00 | 2021-02-01 17:00:00 |
| 4          | 11         | 2021-02-01 17:00:00 | 2021-02-01 17:59:59 |
+------------+------------+---------------------+---------------------+
Output:
+------------+
| account_id |
+------------+
| 1          |
| 4          |
+------------+
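Explanation:
Account ID 1 --> The account was active from "2021-02-01 09:00:00" to "2021-02-01 09:30:00" on two different IP addresses (1 and 2). It should be banned.
Account ID 2 --> The account was active on two different IP addresses (6, 7), but at different times.
Account ID 3 --> The account was active on two different IP addresses (9, 13) on the same day, but the two sessions do not overlap in time.
Account ID 4 --> The account was active from "2021-02-01 17:00:00" to "2021-02-01 17:00:00" on two different IP addresses (10 and 11). It should be banned.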

III. Table creation code

import pandas as pd

data = [[1, 1, '2021-02-01 09:00:00', '2021-02-01 09:30:00'], [1, 2, '2021-02-01 08:00:00', '2021-02-01 11:30:00'], [2, 6, '2021-02-01 20:30:00', '2021-02-01 22:00:00'], [2, 7, '2021-02-02 20:30:00', '2021-02-02 22:00:00'], [3, 9, '2021-02-01 16:00:00', '2021-02-01 16:59:59'], [3, 13, '2021-02-01 17:00:00', '2021-02-01 17:59:59'], [4, 10, '2021-02-01 16:00:00', '2021-02-01 17:00:00'], [4, 11, '2021-02-01 17:00:00', '2021-02-01 17:59:59']]
log_info = pd.DataFrame(data, columns=['account_id', 'ip_address', 'login', 'logout']).astype({'account_id':'Int64', 'ip_address':'Int64', 'login':'datetime64[ns]', 'logout':'datetime64[ns]'})

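IV. Analysis

Solution approach:

Table: login information table (LogInfo)

Columns: account ID, IP address, login time, logout time

Goal: find every account that was logged in from two different IP addresses at the same moment, i.e. an account that has two sessions with different IPs whose time intervals overlap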
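The overlap condition can be checked on a single pair of sessions with a small helper. `sessions_overlap` is a hypothetical name used only for illustration. The helper treats each session as a closed interval, which is what the example's account 4 implies: one session ends at 17:00:00 and the other starts at 17:00:00, and that account is still banned.

```python
import pandas as pd

def sessions_overlap(login_a, logout_a, login_b, logout_b) -> bool:
    """Return True if the closed intervals [login_a, logout_a] and [login_b, logout_b] share at least one instant."""
    # Two closed intervals overlap exactly when each one starts no later than the other one ends.
    return login_a <= logout_b and login_b <= logout_a

ts = pd.Timestamp

# Account 4: the sessions touch at 17:00:00, so they count as overlapping
print(sessions_overlap(ts("2021-02-01 16:00:00"), ts("2021-02-01 17:00:00"),
                       ts("2021-02-01 17:00:00"), ts("2021-02-01 17:59:59")))  # True

# Account 3: 16:59:59 vs 17:00:00 leaves a one-second gap, so there is no overlap
print(sessions_overlap(ts("2021-02-01 16:00:00"), ts("2021-02-01 16:59:59"),
                       ts("2021-02-01 17:00:00"), ts("2021-02-01 17:59:59")))  # False
```

The same two comparisons reappear as column-wise filters after the self-join.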

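Step 1: inner-join the table with itself on account_id (a self-join), so that every session of an account is paired with every session of the same account

df1 = pd.merge(log_info, log_info, how='inner', on='account_id')  # self-join: pair every session with every session of the same account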
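To see what the self-join produces, here is a minimal sketch on a hypothetical two-session sample taken from account 1. Because the non-key columns exist on both sides, `merge` adds its default suffixes `_x` (left) and `_y` (right). Those suffixed names are what the later steps filter on.

```python
import pandas as pd

# Hypothetical sample: account 1's two sessions from the example input
sample = pd.DataFrame({
    "account_id": [1, 1],
    "ip_address": [1, 2],
    "login": pd.to_datetime(["2021-02-01 09:00:00", "2021-02-01 08:00:00"]),
    "logout": pd.to_datetime(["2021-02-01 09:30:00", "2021-02-01 11:30:00"]),
})

# Self-join on account_id: each session is paired with every session of the same account,
# including itself, so an account with n sessions yields n * n rows.
pairs = pd.merge(sample, sample, how="inner", on="account_id")

print(pairs.columns.tolist())
# ['account_id', 'ip_address_x', 'login_x', 'logout_x', 'ip_address_y', 'login_y', 'logout_y']
print(len(pairs))  # 4
```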

Step 2: drop pairs in which both sessions use the same IP address

df2 = df1[df1['ip_address_x'] != df1['ip_address_y']]  # keep only pairs logged in from different IPs
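The filter is ordinary boolean indexing. The sketch below reuses the hypothetical account-1 sample. Comparing the two IP columns removes every session paired with itself, and also any exact duplicate rows, since each of those pairs shares one IP. Each genuine cross-IP pair survives twice, once in each order.

```python
import pandas as pd

# Hypothetical sample: account 1's two sessions from the example input
sample = pd.DataFrame({
    "account_id": [1, 1],
    "ip_address": [1, 2],
    "login": pd.to_datetime(["2021-02-01 09:00:00", "2021-02-01 08:00:00"]),
    "logout": pd.to_datetime(["2021-02-01 09:30:00", "2021-02-01 11:30:00"]),
})
pairs = pd.merge(sample, sample, how="inner", on="account_id")

# Boolean mask: True where the paired sessions used different IP addresses
mask = pairs["ip_address_x"] != pairs["ip_address_y"]
diff_ip = pairs[mask]

print(sorted(diff_ip[["ip_address_x", "ip_address_y"]].values.tolist()))  # [[1, 2], [2, 1]]
```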

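Step 3: keep only pairs whose sessions overlap in time. Session x must log in no later than session y logs out, and session y must log in no later than session x logs out.

df3 = df2[df2['login_x'] <= df2['logout_y']]  # x logged in no later than y logged out
df4 = df3[df3['login_y'] <= df3['logout_x']]  # y logged in no later than x logged out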
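The choice of `<=` over `<` matters at the boundary. The sketch below runs steps 1 to 3 on the example rows for accounts 3 and 4, with both comparisons combined into one `&` mask. That selects the same rows as the two separate filters. It then contrasts the result with strict `<`, which is shown only for comparison. With strict comparisons, account 4 (sessions that touch at 17:00:00) would be missed, contradicting the expected output.

```python
import pandas as pd

# Rows for accounts 3 and 4 from the example input
data = [[3, 9, '2021-02-01 16:00:00', '2021-02-01 16:59:59'],
        [3, 13, '2021-02-01 17:00:00', '2021-02-01 17:59:59'],
        [4, 10, '2021-02-01 16:00:00', '2021-02-01 17:00:00'],
        [4, 11, '2021-02-01 17:00:00', '2021-02-01 17:59:59']]
df = pd.DataFrame(data, columns=['account_id', 'ip_address', 'login', 'logout']).astype(
    {'login': 'datetime64[ns]', 'logout': 'datetime64[ns]'})

pairs = pd.merge(df, df, how='inner', on='account_id')
pairs = pairs[pairs['ip_address_x'] != pairs['ip_address_y']]

# Inclusive comparisons: sessions that merely touch count as overlapping
inclusive = pairs[(pairs['login_x'] <= pairs['logout_y']) & (pairs['login_y'] <= pairs['logout_x'])]
# Strict comparisons, for contrast only: touching sessions are not counted
strict = pairs[(pairs['login_x'] < pairs['logout_y']) & (pairs['login_y'] < pairs['logout_x'])]

print(sorted(inclusive['account_id'].unique().tolist()))  # [4]
print(sorted(strict['account_id'].unique().tolist()))     # []
```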

Final step: remove duplicate account IDs, then convert the resulting Series to a DataFrame

df5 = df4['account_id'].drop_duplicates()
df6 = df5.to_frame()
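Deduplication is needed because a banned account appears once per matching pair, and every pair survives in both orders. Here is a minimal sketch on a hypothetical Series of matched IDs. `drop_duplicates` keeps the first occurrence and still returns a Series. `to_frame` then wraps it in a one-column DataFrame named after the Series.

```python
import pandas as pd

# Hypothetical matched IDs: each banned account appears once per surviving (x, y) pair
matched_ids = pd.Series([1, 1, 4, 4], name='account_id')

unique_ids = matched_ids.drop_duplicates()  # still a Series; first occurrences kept
result = unique_ids.to_frame()              # one-column DataFrame; column takes the Series name

print(type(unique_ids).__name__, type(result).__name__)  # Series DataFrame
print(result.columns.tolist())                           # ['account_id']
print(result['account_id'].tolist())                     # [1, 4]
```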

V. Pandas solution

import pandas as pd

def leetflex_banned_accnts(log_info: pd.DataFrame) -> pd.DataFrame:
    df1 = pd.merge(log_info, log_info, how='inner', on='account_id')  # self-join on account_id
    df2 = df1[df1['ip_address_x'] != df1['ip_address_y']]  # keep pairs logged in from different IPs
    df3 = df2[df2['login_x'] <= df2['logout_y']]  # x logged in no later than y logged out
    df4 = df3[df3['login_y'] <= df3['logout_x']]  # y logged in no later than x logged out
    df5 = df4['account_id'].drop_duplicates()  # report each banned account once
    df6 = df5.to_frame()  # Series -> one-column DataFrame
    return df6

VI. Verification
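A minimal check is to run the solution on the example input from section III and compare the result against the expected output (accounts 1 and 4). The function is restated so the snippet runs on its own. Its two overlap filters are merged into one `&` mask, which selects the same rows as the two-step version.

```python
import pandas as pd

def leetflex_banned_accnts(log_info: pd.DataFrame) -> pd.DataFrame:
    df = pd.merge(log_info, log_info, how='inner', on='account_id')      # self-join
    df = df[df['ip_address_x'] != df['ip_address_y']]                    # different IPs
    df = df[(df['login_x'] <= df['logout_y']) & (df['login_y'] <= df['logout_x'])]  # overlap
    return df['account_id'].drop_duplicates().to_frame()

data = [[1, 1, '2021-02-01 09:00:00', '2021-02-01 09:30:00'], [1, 2, '2021-02-01 08:00:00', '2021-02-01 11:30:00'],
        [2, 6, '2021-02-01 20:30:00', '2021-02-01 22:00:00'], [2, 7, '2021-02-02 20:30:00', '2021-02-02 22:00:00'],
        [3, 9, '2021-02-01 16:00:00', '2021-02-01 16:59:59'], [3, 13, '2021-02-01 17:00:00', '2021-02-01 17:59:59'],
        [4, 10, '2021-02-01 16:00:00', '2021-02-01 17:00:00'], [4, 11, '2021-02-01 17:00:00', '2021-02-01 17:59:59']]
log_info = pd.DataFrame(data, columns=['account_id', 'ip_address', 'login', 'logout']).astype(
    {'account_id': 'Int64', 'ip_address': 'Int64', 'login': 'datetime64[ns]', 'logout': 'datetime64[ns]'})

result = leetflex_banned_accnts(log_info)
print(sorted(result['account_id'].tolist()))  # [1, 4]
```

The result is sorted before comparison because the problem accepts the rows in any order.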

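VII. Summary of key points

Joining two tables in pandas: API merge, parameters how and on
Conditional filtering in pandas: boolean indexing, i.e. df[mask] with a boolean Series
Deduplication in pandas: API drop_duplicates
Converting a Series to a DataFrame in pandas: API to_frame
Writing Python functions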