import pandas as pd

path1 = './ml_work1/result/extract_json.csv'
path2 = './ml_work2/result/extract_json.csv'

# Load the first result file and sort it by the 'link' column.
df1 = pd.read_csv(path1)
print(df1)
df1_s = df1.sort_values(by=['link'], axis=0, kind='mergesort', ignore_index=True)
print(df1_s)  # print the sorted frame; df1 itself is left unchanged
print('******************************************************************************************')

# Do the same for the second result file.
df2 = pd.read_csv(path2)
print(df2)
df2_s = df2.sort_values(by=['link'], axis=0, kind='mergesort', ignore_index=True)
print(df2_s)

# Count the positions at which both sorted frames hold the same link.
j = 0
for i in range(min(len(df1_s), len(df2_s))):  # stop at the shorter frame to avoid a KeyError
    print(df1_s.loc[i, 'link'])
    print('*****************')
    print(df2_s.loc[i, 'link'])
    if df1_s.loc[i, 'link'] == df2_s.loc[i, 'link']:
        j += 1
print(j)
print(len(df1))
print(len(df2))
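
The position-by-position comparison in the script above can also be written without an explicit loop. The following is only a sketch: the two small frames are hypothetical stand-ins for the CSV files, and the set-based count at the end is an alternative that ignores row order, which may or may not be what the original comparison intends.

```python
import pandas as pd

# Hypothetical stand-ins for the two extract_json.csv files.
df1 = pd.DataFrame({"link": ["b", "a", "c"]})
df2 = pd.DataFrame({"link": ["c", "a", "d"]})

# Sort both link columns the same way and reset the index so rows align by position.
s1 = df1["link"].sort_values(kind="mergesort", ignore_index=True)
s2 = df2["link"].sort_values(kind="mergesort", ignore_index=True)

# Compare only the overlapping prefix so frames of different length do not raise.
n = min(len(s1), len(s2))
matches = int((s1.iloc[:n].to_numpy() == s2.iloc[:n].to_numpy()).sum())
print(matches)

# Order-independent alternative: number of distinct links present in both frames.
common = len(set(df1["link"]) & set(df2["link"]))
print(common)
```

Note that the positional count drops as soon as one frame contains a link the other lacks, because every later row is shifted by one; the set-based count does not have this problem.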
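You should be very familiar with the basic operations on Python's built-in data structures and on third-party ones such as numpy and pandas.
Conceptually, the third-party structures are just combinations of the basic ones: dictionaries plus lists.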
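
As a rough illustration of the "dictionary plus list" view, a DataFrame can be built from a dict of lists and converted back. This is a conceptual picture only: internally pandas stores columns in array-based structures, not in Python dicts and lists. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: a plain dict whose values are lists.
data = {"link": ["a", "b", "c"], "score": [1, 2, 3]}

df = pd.DataFrame(data)                 # dict of lists -> DataFrame
print(df)

round_trip = df.to_dict(orient="list")  # DataFrame -> dict of lists
print(round_trip == data)
```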
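kind='mergesort' selects a stable merge sort, so rows with equal keys keep their original relative order and the result is the same on every run.
ignore_index=True discards the index carried over from before the sort and relabels the rows 0, 1, ..., n - 1.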
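
A small sketch of both parameters, using a hypothetical frame with duplicate values in the sort column (the `tag` column exists only to make the original order visible):

```python
import pandas as pd

# Hypothetical data with duplicate values in the sort column.
df = pd.DataFrame({"link": ["b", "a", "b", "a"], "tag": ["b1", "a1", "b2", "a2"]})

# Stable sort: rows with the same link keep their original relative order.
stable = df.sort_values(by=["link"], kind="mergesort")
print(stable["tag"].tolist())    # ['a1', 'a2', 'b1', 'b2']
print(stable.index.tolist())     # [1, 3, 0, 2], the original labels travel with the rows

# ignore_index=True relabels the sorted rows from 0.
relabeled = df.sort_values(by=["link"], kind="mergesort", ignore_index=True)
print(relabeled.index.tolist())  # [0, 1, 2, 3]
```

Resetting the index matters for the script above, because it looks rows up by label with `.loc[i]`.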
def sort_values(
    self,
    axis=0,
    ascending=True,
    inplace: bool = False,
    kind: str = "quicksort",
    na_position: str = "last",
    ignore_index: bool = False,
    key: ValueKeyFunc = None,
):
    """
    Sort by the values.

    Sort a Series in ascending or descending order by some
    criterion.

    Parameters
    ----------
    axis : {0 or 'index'}, default 0
        Axis to direct sorting. The value 'index' is accepted for
        compatibility with DataFrame.sort_values.
    ascending : bool, default True
        If True, sort values in ascending order, otherwise descending.
    inplace : bool, default False
        If True, perform operation in-place.
    kind : {'quicksort', 'mergesort' or 'heapsort'}, default 'quicksort'
        Choice of sorting algorithm. See also :func:`numpy.sort` for more
        information. 'mergesort' is the only stable algorithm.
    na_position : {'first' or 'last'}, default 'last'
        Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
        the end.
    ignore_index : bool, default False
        If True, the resulting axis will be labeled 0, 1, …, n - 1.

        .. versionadded:: 1.0.0

    key : callable, optional
        If not None, apply the key function to the series values
        before sorting. This is similar to the `key` argument in the
        builtin :meth:`sorted` function, with the notable difference that
        this `key` function should be *vectorized*. It should expect a
        ``Series`` and return an array-like.

        .. versionadded:: 1.1.0
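
The `na_position` and `key` parameters described in the excerpt above can be tried on small Series. This is a sketch with hypothetical data; it assumes pandas 1.1 or later, since `key` was added in that version.

```python
import pandas as pd

# na_position: where missing values end up.
s_num = pd.Series([3.0, None, 1.0])
nan_first = s_num.sort_values(na_position="first")
print(nan_first.index.tolist())  # [1, 2, 0], the NaN row comes first

# key: a vectorized function that receives the whole Series.
s_str = pd.Series(["b", "A", "C"])
plain = s_str.sort_values()                                 # uppercase sorts before lowercase
folded = s_str.sort_values(key=lambda col: col.str.lower()) # case-insensitive order
print(plain.tolist())   # ['A', 'C', 'b']
print(folded.tolist())  # ['A', 'b', 'C']
```

Unlike the `key` of the builtin `sorted`, the function here is called once on the whole Series rather than once per element.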
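pandas offers several ways to sort; sort_values sorts by the values of a given column.
df.sort_values("xxx", inplace=True)
This sorts the DataFrame df by the xxx field. inplace defaults to False; when it is False, the order of the original df is unchanged and only the returned object is sorted. When it is True, df itself is sorted and the call returns None.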
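
A minimal sketch of the difference, using a hypothetical one-column frame whose column is named xxx to match the call above:

```python
import pandas as pd

df = pd.DataFrame({"xxx": [3, 1, 2]})

# inplace=False (the default): df keeps its order, the sorted frame is returned.
result = df.sort_values("xxx")
before = df["xxx"].tolist()
print(before)                   # [3, 1, 2]
print(result["xxx"].tolist())   # [1, 2, 3]

# inplace=True: df itself is reordered and the call returns None.
ret = df.sort_values("xxx", inplace=True)
print(ret)                      # None
print(df["xxx"].tolist())       # [1, 2, 3]
```

Because the in-place call returns None, writing `df = df.sort_values("xxx", inplace=True)` would replace df with None.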