excel重复数据编号_计算Excel中的重复编号集

excel重复数据编号

Recently, someone asked me how to count duplicate number sets. There were hundreds of rows, with 6 numbers in each row. Did any of the number sets appear in the list more than once? How would you solve this problem, and count duplicate number sets?

最近,有人问我如何计算重复的数字集。 有数百行,每行有6个数字。 是否有任何数字集多次出现在列表中? 您将如何解决此问题,并计算重复的数字集?

计算重复编号集 (Count Duplicate Number Sets)

To find a solution, I set up a small table with random numbers. There are 11 rows of numbers, and 6 columns.  The 7th column, Dups, is where I tested my formulas.

为了找到解决方案,我建立了一个带有随机数的小表。 有11行数字和6列。 我在第七列Dups中测试了公式。

Formula to Count Duplicate Number Sets

有多少个匹配项? (How Many Matches?)

In each row, the goal is to find out how many times that set of 6 numbers appeared in the table.

在每一行中,目标是找出表中这组6个数字出现了多少次。

It doesn't matter if the numbers are in a different order. For example -- 1,2,3,4,5,6 is be a duplicate of 3,4,5,1,2,3 -- all 6 numbers are the same.

数字是否以不同顺序无关紧要。 例如-1,2,3,4,5,6是3,4,5,1,2,3的重复-所有6个数字都是相同的。

How would you calculate the number of matches?

您将如何计算比赛次数?

获取样本数据 (Get the Sample Data)

There are solutions below, but before you look at them, try to find your own solution.

下面有解决方案,但是在查看它们之前,请尝试找到自己的解决方案。

Download the sample file (#4) from the Excel Count Functions page on my Contextures site.

Contextures网站上的Excel Count Functions页面下载示例文件(#4)。

The sample data is in a table on the Challenge sheet.

示例数据在“挑战”表上的表格中。

解决方案1:很复杂 (Solution 1: It's Complicated)

I'll confess to doing a Google search when this question arrived in my email. There were suggestions for macros and User-Defined Functions, but I didn't try those.

当这个问题出现在我的电子邮件中时,我将承认要进行Google搜索。 对于宏和用户定义函数有一些建议,但是我没有尝试过。

Finally, I found an array-entered formula on StackOverflow, which pointed me in the right direction. Here's the formula, which was for a list with 4 columns and 1000 rows:

最后,我在StackOverflow上找到了一个输入数组的公式 ,该公式为我指明了正确的方向。 以下是公式,该公式适用于具有4列1000行的列表:

=IF(SUM(IF(MMULT({1,1,1,1}, TRANSPOSE(COUNTIF($A1:$D1, $A$1:$D$1000)))=4,1))>1, "duplicate", "unique")

= IF(SUM(IF(MMULT({1,1,1,1},TRANSPOSE(COUNTIF($ A1:$ D1,$ A $ 1:$ D $ 1000)))= 4,1))> 1,“重复“, “独特”)

That certainly looked complicated!

那看起来确实很复杂!

调整公式 (Adjusting the Formula)

I needed to adjust the formula, so it would work with 6 columns, instead of 4.

我需要调整公式,因此它将适用于6列而不是4列。

  • MMULT({1,1,1,1,1,1}

    MMULT({1,1,1,1,1,1}

  • COUNTIF(B2:G2

    COUNTIF(B2:G2

My list has 11 rows, instead of 1000. (My data is in a named Excel table, so the range will adjust automatically if the table size changes.)

我的列表有11行,而不是1000行。(我的数据在一个命名的Excel表中,因此如果表大小更改,范围将自动调整。)

  • $B$2:$G$12

    $ B $ 2:$ G $ 12

And I wanted the number of duplicates, not text, so the first IF was removed. Here's my revised formula (array entered, with Ctrl+Shift+Enter):

我想要重复的数量,而不是文本,因此删除了第一个IF。 这是我修改过的公式(使用Ctrl + Shift + Enter输入数组):

=SUM(IF(MMULT({1,1,1,1,1,1},TRANSPOSE(COUNTIF(B2:G2,$B$2:$G$12)))=6,1))

= SUM(IF(MMULT({1,1,1,1,1,1},TRANSPOSE(COUNTIF(B2:G2,$ B $ 2:$ G $ 12)))= 6,1)))

这个怎么运作 (How It Works)

Here's a brief explanation of how the works. There are more notes in the sample file, on the Solution01 sheet.

这是工作原理的简要说明。 示例文件的Solution01工作表中还有更多说明。

  1. COUNTIF checks each table row, to see if each number has a match in the formula's row. The result is an array with 11 rows and 6 columns

    COUNTIF检查每个表行,以查看每个数字在公式行中是否匹配。 结果是一个包含11行 6列的数组

    1. Array1 is {1,1,1,1,1,1}

      Array1为{1,1,1,1,1,1}
    2. For Array2, TRANSPOSE switches the rows and columns in the COUNTIF array, to create an array with 6 rows and 11 columns.

      对于Array2, TRANSPOSE切换COUNTIF数组中的行和列,以创建具有6行11列的数组。

    MMULT will return an array with the number of matches in each row. We need that array to have 1 row, and 11 columns (1 column for each set of numbers)

    MMULT将返回一个数组,其中包含每行中匹配的数目。 我们需要该数组具有1行和11列 (每组数字1列)

  2. IF returns a 1, if the number of matches is 6 (all others = FALSE)

    如果匹配数为6,则IF返回1(所有其他值均为FALSE)

  3. SUM adds up all the 1s to give the number of duplicate sets

    SUM将所有1加起来,得出重复集的数量

工作表结果 (Worksheet Results)

The screen shot below shows what the COUNTIF calculations would look like for Row 4, with the rows and columns transposed.

下面的屏幕快照显示了第4行的COUNTIF计算结果,行和列进行了转置。

The MMULT result is 1 row, with 11 columns, and the final SUM is 2, because rows 4 and 6 have the same set of numbers.

MMULT结果为1行,包含11列,最终的SUM为2,因为第4行和第6行具有相同的数字集。

Worksheet Formula to Count Duplicate Number Sets

计算缓慢 (Slow Calculations)

This formula isn't too bad in a small set of numbers, but could take a long time in a larger dataset.

这个公式在少量数字中并不太坏,但是在较大的数据集中可能需要很长时间。

In the sample file, there's a sheet named LargeSet, which has 1500 number sets, if you want to test that.

在示例文件中,有一个名为LargeSet的工作表,如果要测试的话,它具有1500个数字集。

解决方案2:电源查询 (Solution 2: Power Query)

The second solution in the sample file is from Jonathan Cooper, who used Power Query (Get & Transform).

样本文件中的第二个解决方案来自Jonathan Cooper,他使用了Power Query(Get&Transform)。

  • Full List has all the rows, an index column, and a column with sorted numbers

    “完整列表”具有所有行,一个索引列和一个具有排序编号的列

  • Unique List only has the Index column and sorted numbers, with duplicates removed

    唯一列表仅具有“索引”列和排序编号,已删除重复项

  • Merge List combines the Full and Unique lists, and calculates the count for each unique number set

    合并列表合并完整列表和唯一列表,并计算每个唯一编号集的计数

解决方案3:小和TEXTJOIN (Solution 3: SMALL and TEXTJOIN)

Jonathan's solution had a comma-separated list of sorted numbers in one step. That led me to a different Excel formula solution, using the SMALL and TEXTJOIN functions.

乔纳森(Jonathan)的解决方案只需一步就可以用逗号分隔列表。 这导致我使用SMALL和TEXTJOIN函数找到了一个不同的Excel公式解决方案。

To show how it works, the SMALL results are done in 6 separate columns. Here's the formula in cell I4:

为了显示其工作原理,将在6个单独的列中完成小结果。 这是单元格I4中的公式:

=SMALL($B4:$G4,I$3)

=小($ B4:$ G4,I $ 3)

TEXTJOIN combines the results. Here's the formula in cell O4:

TEXTJOIN合并结果。 这是单元格O4中的公式:

=TEXTJOIN("|",,I4:N4)

= TEXTJOIN(“ |” ,, I4:N4)

And COUNTIF returns the number of instances of the comma-separated list from column O:

COUNTIF返回O列中逗号分隔列表的实例数:

=COUNTIF([Sorted],[@Sorted])

= COUNTIF([已排序],[@已排序])

多合一排序列表 (All-In-One Sorted List)

Instead of using helper columns for the SMALL function, you can use an array-entered formula to get the same results.

您可以使用输入数组的公式来获取相同的结果,而不是将辅助列用于SMALL函数。

Put this formula in cell O4 (array-entered):

将此公式放在单元格O4中(输入数组):

=TEXTJOIN("|",, SMALL(B4:G4, ROW($1:$6)))

= TEXTJOIN(“ |” ,, SMALL(B4:G4,ROW($ 1:$ 6)))

Then, delete the columns with the SMALL formulas.

然后,删除带有SMALL公式的列。

计算重复编号集样本文件 (Count Duplicate Number Sets Sample File)

To find your own solution, and to see the other solutions, download the sample file (#4) from the Excel Count Functions page on my Contextures site.

要找到自己的解决方案,并查看其他解决方案,请从我的Contextures网站上的Excel Count Functions页面下载示例文件(#4)。

The sample data is in a table on the Challenge sheet, and there is a larger dataset on the sheet name LargeSet.

样本数据位于“挑战”表上的表格中,并且表名称“ LargeSet”上有一个较大的数据集。

NOTE: The workbook does not contain macros, but you will see an alert about a data connection, because of the Power Query solution.

注意 :该工作簿不包含宏,但是由于Power Query解决方案,您将看到有关数据连接的警报。

翻译自: https://contexturesblog.com/archives/2019/12/05/count-duplicate-number-sets/

excel重复数据编号

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值