python2编码问题解决
Why Should You Even Bother With Sets?
Almost no one really liked sets while studying math back in school, as they seemed pretty useless. But if you are preparing for a Python coding screen or a whiteboard interview and you wish to nail it, you really have to learn about set operators and methods right now. Sets are a weapon you need in your arsenal: common uses include removing duplicates, computing math operations on sets (like union, intersection, difference and symmetric difference) and membership testing.
Sets were not exactly the first thing I learnt in my Python journey either. If you are a hands-on person like I am, you probably started with some real projects, meaning that initially you almost certainly used advanced packages more than built-in types. However, as I began solving algorithms to interview with big tech companies, I realized that sets kept appearing in many challenges of different levels of difficulty, and that they often offered shortcuts to solve them.
In a recent article, I presented and shared solutions for a number of Python algorithms that I have been challenged with in real interviews.
In this post I will adopt a similar approach by mixing theory with algorithmic challenges. Let’s solve them together!
1) Removing Duplicates
In Python a set can be created in two ways:
Using the built-in set() function. For example, if a string s is passed to set(), then set(s) generates a set of the distinct characters in the string:
set('algorithm')
Output: {'a', 'g', 'h', 'i', 'l', 'm', 'o', 'r', 't'} #<- unordered
Using curly braces {}. With this method, even iterable objects such as strings are placed into the set intact (note that an empty pair of braces creates a dictionary, not a set):
set1 = {'algorithm'}
print(set1)
Output: {'algorithm'}
set2 = {'algorithm', 'interview'}
print(set2)
Output: {'algorithm', 'interview'}
In more detail, the built-in set data type has the following characteristics:
It is unordered, iterable and mutable;
Its elements can be objects of different types (integer, float, tuple, string etc.), but every element has to be hashable, which in practice means immutable (lists and dictionaries are therefore excluded);
It can only include unique elements. This means that duplicate elements are automatically removed when a set is built.
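To make the element rule concrete, here is a small sketch (the variable names are purely illustrative) showing that hashable objects of mixed types are accepted, that a repeated value is stored once, and that an unhashable list is rejected:

```python
# Mixed hashable types are fine; the repeated 1 is stored only once.
mixed = {1, 2.5, 'text', (3, 4), 1}
print(len(mixed))  # 4

# A list is mutable and unhashable, so it cannot be a set element.
try:
    bad = {[1, 2], 3}
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```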
The last characteristic is pivotal, as it means we can use set() to remove duplicates from (for instance) an array.
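As a quick illustration, the sketch below removes duplicates from a sample list (the values are made up for the example). Because a set is unordered, the second variant uses dict.fromkeys(), a different technique that is not part of the set API, for cases where the original order has to be kept:

```python
nums = [3, 1, 3, 2, 1]

# set() drops duplicates but does not preserve the original order.
unique_any_order = list(set(nums))

# dict.fromkeys() keeps the first occurrence of each value, in order.
unique_in_order = list(dict.fromkeys(nums))

print(sorted(unique_any_order))  # [1, 2, 3]
print(unique_in_order)           # [3, 1, 2]
```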
PROBLEM #1: Contains Duplicates → Easy
Output:
True
This problem can be solved as a one-liner thanks to sets. When an array is passed to the set() function, all duplicate elements are removed, so the length of the set equals the length of the original array only if the array contained distinct elements in the first place.
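A minimal sketch of such a one-liner, assuming the task is to return True when any value appears at least twice in the array (the function name and the sample inputs are illustrative, not taken from the original problem statement):

```python
def contains_duplicates(nums):
    # If the set is shorter than the list, at least one value was repeated.
    return len(set(nums)) != len(nums)


print(contains_duplicates([1, 2, 3, 1]))  # True
print(contains_duplicates([1, 2, 3, 4]))  # False
```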
2) Computing Math Operations On Sets
Because sets are unordered collections (they do not record element position or insertion order), many of the operations that can be performed on other Python data types (like indexing and slicing) are not supported by sets.
Nevertheless, Python provides a large number of operators and methods that replicate the operations defined for mathematical sets. For example, suppose we have two sets (s1 and s2), both containing strings, and we want their union. We can perform the same operation with either the union operator (|) or the union method (s1.union(s2)):
# Create 2 sets including strings:
s1 = {'python', 'interview', 'practice'}
s2 = {'algorithm', 'coding', 'theory'}
# Using the operator:
print(s1 | s2)
# Using the method:
print(s1.union(s2))
Output:
{'python', 'algorithm', 'interview', 'practice', 'coding', 'theory'}
{'python', 'algorithm', 'interview', 'practice', 'coding', 'theory'}
As you can see, set operators and methods appear to behave identically and can often be used interchangeably, but there is a slight difference between them. When an operator is used, both operands must be sets, whereas a method accepts any iterable as its argument and treats it as a set when performing the operation. Moreover, a method exists for each mathematical operation on sets, but the same is not true for operators.
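The difference between operators and methods is easy to check. In the sketch below (the list named words is an illustrative value), the union method happily accepts a list, while the | operator raises a TypeError for the same operand:

```python
s1 = {'python', 'interview', 'practice'}
words = ['algorithm', 'python']  # a list, not a set

# The method accepts any iterable.
merged = s1.union(words)
print(sorted(merged))

# The operator requires both operands to be sets.
try:
    s1 | words
except TypeError:
    operator_failed = True
else:
    operator_failed = False
print(operator_failed)  # True
```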
Let’s discover some more common set operators and methods by solving the following three problems:
PROBLEM #2: Intersection Of Two Arrays → Easy
Output:
[5, 7]
[5, 7]
To return the elements common to both arrays, you can either transform both nums1 and nums2 to sets and then find their intersection with the & operator, or just convert nums1 to a set and pass nums2 as the argument to the .intersection() method (the method accepts any iterable, so nums2 does not need to be converted explicitly). The second approach (solution_two) entails some more typing, but it is safer because it does not fail on non-set arguments, and it avoids building a second set by hand, so it is the one I would prefer when solving algorithms in real interviews. A few times I opted for an operator in my solution, and the interviewer then asked me to talk about the equivalent method and rewrite the solution accordingly.
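The two approaches can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the sample arrays are made up to reproduce the [5, 7] output, and the results are sorted only because sets do not guarantee any order.

```python
def solution_one(nums1, nums2):
    # Operator version: both operands must be sets.
    return sorted(set(nums1) & set(nums2))


def solution_two(nums1, nums2):
    # Method version: nums2 can stay a plain list.
    return sorted(set(nums1).intersection(nums2))


nums1 = [4, 5, 7, 9, 5]   # hypothetical input
nums2 = [5, 7, 11, 7]     # hypothetical input
print(solution_one(nums1, nums2))  # [5, 7]
print(solution_two(nums1, nums2))  # [5, 7]
```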
PROBLEM #3: Words From Keyboard Row → Easy
Output:
['Type', 'Router', 'Dash', 'Top', 'Rower']
The solution to this problem is both concise and efficient. To check whether each word in the words array is made of the letters of a single row of the American keyboard, each word is first converted to a set using the set() function. As discussed earlier, this splits the word so that each distinct letter becomes an element of the set. For instance, if we pass the word "Type" to the set() function (after converting it to lowercase) we obtain:
set1 = set('Type'.lower())
print(set1)
Output: {'t', 'e', 'p', 'y'}
Now we can take advantage of the .difference() method to check whether the difference between {'t','e','p','y'} and each of the three sets of keyboard-row letters is an empty set. If an empty set is returned (if not), the word is made of letters belonging to a single row, so it should be appended to result.
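A minimal sketch of this approach is shown below. The function name, the row definitions and the sample words list are assumptions chosen for illustration; the input was picked so that the result matches the output listed for this problem.

```python
def keyboard_row_words(words):
    # The three letter rows of an American (QWERTY) keyboard.
    rows = [set('qwertyuiop'), set('asdfghjkl'), set('zxcvbnm')]
    result = []
    for word in words:
        letters = set(word.lower())
        for row in rows:
            # An empty difference means every letter belongs to this row.
            if not letters.difference(row):
                result.append(word)
                break
    return result


words = ['Type', 'Router', 'Dash', 'Hello', 'Top', 'Rower', 'Peace']
print(keyboard_row_words(words))
# ['Type', 'Router', 'Dash', 'Top', 'Rower']
```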
PROBLEM #4: Uncommon Words From 2 Sentences → Easy
#Output:
['enjoy', 'love'] #<-- note how "you" and "we" are also excluded
In actual interviews, you are often given two arrays of integers (or two sentences, as in this case) and asked to find all the elements that belong to either the first or the second group, but not to both. This is the kind of problem that can be solved with the .symmetric_difference() method or the equivalent ^ operator.
However, while this problem makes your life easier by providing sentences that only contain space-separated lowercase words, it also requires each returned word to be unique. This means that you need a way to count words (collections.Counter()) and verify that each one appears exactly once in its own sentence and not at all in the other.
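One possible sketch combines both ideas: the ^ operator keeps the words found in only one sentence, and a Counter filters out those that are repeated. The function name and the two sample sentences are hypothetical, chosen so that the result matches the output listed for this problem.

```python
from collections import Counter


def uncommon_words(s1, s2):
    words1, words2 = s1.split(), s2.split()
    # Total number of occurrences of each word across both sentences.
    counts = Counter(words1 + words2)
    # Words that appear in exactly one of the two sentences.
    candidates = set(words1) ^ set(words2)
    # Keep only the candidates that are not repeated.
    return sorted(word for word in candidates if counts[word] == 1)


print(uncommon_words('we really enjoy python we',
                     'you really love python you'))
# ['enjoy', 'love']
```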
3) Performing Membership Testing
Oftentimes, to solve an algorithm, you are required to determine whether two or more arrays have any elements in common, or whether one array is a subset of another. Fortunately, Python's sets offer a number of operators and methods for this kind of testing, and the ones that I used the most in real interviews are:
s1.isdisjoint(s2): this operation returns True if s1 and s2 have no elements in common. There is no operator that corresponds to this method.
s1.issubset(s2) or s1 <= s2: this operation returns True if s1 is a subset of s2. Note that because a set s1 is considered to be a subset of itself, to check that a set is a proper subset of another it is better to use s1 < s2. In this way it can be verified that every element of s1 is in s2, but the two sets are not equal.
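The behaviour of these checks, including the difference between a subset and a proper subset, can be verified with a few illustrative sets (the values below are arbitrary):

```python
s1 = {1, 2}
s2 = {1, 2, 3}
s3 = {7, 8}

print(s1.isdisjoint(s3))  # True: no elements in common
print(s1.isdisjoint(s2))  # False: 1 and 2 are shared
print(s1.issubset(s2))    # True
print(s1 <= s1)           # True: a set is a subset of itself
print(s1 < s1)            # False: but not a proper subset of itself
print(s1 < s2)            # True: proper subset
```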
In the last problem of this article, you will be able to apply one of the methods described above. Bear in mind that this algorithm is a bit more challenging than the other four (level: medium), so take your time to find the proper strategy to solve it.
PROBLEM #5: The Favorite Companies List → Medium
Output:
[0, 1, 4]
In the solution, enumerate() is used twice: the outer loop returns each sub-list included in favcomp, whereas the inner loop checks whether that sub-list is a subset of any other list in favcomp. When the set method returns True, the flag (initially set to True) switches to False, so that eventually only the indexes of lists whose flag kept a True value are appended to result. This is a brute-force solution, so feel free to find a more efficient approach and share it in the comments.
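A brute-force sketch along these lines is given below. It assumes the task is to return the indexes of the lists that are not a subset of any other list, and that no two lists are identical; the function name and the sample favcomp input are illustrative, chosen so that the result matches the output listed for this problem.

```python
def not_subset_indexes(favcomp):
    # Convert each list once so that issubset() works on sets.
    sets = [set(companies) for companies in favcomp]
    result = []
    for i, current in enumerate(sets):
        flag = True
        for j, other in enumerate(sets):
            # Skip the comparison of a list with itself.
            if i != j and current.issubset(other):
                flag = False
                break
        if flag:
            result.append(i)
    return result


favcomp = [
    ['leetcode', 'google', 'facebook'],
    ['google', 'microsoft'],
    ['google', 'facebook'],
    ['google'],
    ['amazon'],
]
print(not_subset_indexes(favcomp))  # [0, 1, 4]
```

With n lists this performs on the order of n² subset checks, which is the cost the brute-force label refers to.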
Conclusion
In this article I presented 5 Python coding challenges and solved them using set operators and methods. Set methods are very powerful tools that you can employ when an algorithm requires you to remove duplicates, perform common mathematical operations or verify membership. Rest assured that as you keep practicing for coding rounds, recurring patterns will become more evident and you will learn to apply set operations on the fly!
Note that the exercises presented in this post (together with their solutions) are slight reinterpretations of problems available on Leetcode. I am far from being an expert in the field, so the solutions I presented are just indicative ones.
Leave a comment if you enjoyed the post. I wish you good luck with your interview preparation!