My Data Science Interview with Microsoft

weixin_26746401

Original article: https://towardsdatascience.com/my-data-science-interview-with-microsoft-6b7ec840b80e

Microsoft was one of the software companies that came to my university to hire interns for summer 2021. This was the first year that Microsoft offered a Data Science internship to pre-final-year undergraduate students.

Microsoft set the following requirements:

The student must have a minimum CGPA of 8.
The student should be pursuing a Computer Science or Mathematics major.

All eligible students had to fill in the internship application form on the Microsoft Career website and attach a resume. Students who submitted the form received the test link within 1–2 days.

Online Test:

About 60–70 students took the test for the internship, which was conducted on the Mettl platform. The test lasted 1 hour and consisted of 62 multiple-choice questions that touched almost every aspect of machine learning. No information was given about the marking scheme.

The key takeaways from the online test were:

Questions covered a range of topics, including Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Underfitting and Overfitting, Bias, Variance, Bagging, Boosting, Clustering, Recommender Systems, PCA, LDA, and Neural Networks. There were also some basic questions on Probability and Statistics.
Most of the questions were conceptual, for example about the kernel function in SVMs or the central limit theorem.
There were comparatively few questions on Neural Networks, so students were expected to be well versed in traditional Machine Learning algorithms.
There were no coding questions, and no questions of the form "what is the correct sklearn code for this algorithm".

I was able to complete about 50 of the 62 questions in the 1 hour.

Since I didn't know much about Recommender Systems and LDA, I wasn't able to answer those questions, nor the 2–3 questions on convex optimization.

Microsoft didn't release the exact test results, but it released a list of 6 students shortlisted for interviews, including me!

I had about a day to prepare and had no idea what a Data Science interview would look like. I got some help from seniors and revised the concepts asked in the online test (mostly traditional machine learning algorithms) from the Stanford CS229 notes. I also reviewed everything about the projects on my resume.

Because of COVID-19, the interviews were held online on the Microsoft Teams platform, and each candidate had a total of 3 rounds of technical interviews.

Round 1:

The interviewer first asked me to introduce myself and talk about my interests, and I spoke about my interest in computer vision.

I was asked the following questions:

Explain how a convolutional layer works and design a CNN for image classification. Explain the loss function, regularization, and activation function you would use.
Explain the Decision Tree algorithm. Also explain bagging and boosting with Decision Trees, and the weighting function used in boosting.
Design a spam classification system. Also explain the feature extraction, algorithm, and metrics used for evaluation.
Explain in depth how Support Vector Machines (SVMs) work. Also explain the convex optimization, the kernel functions, and what support vectors are.

I was able to answer all the questions except the one on SVMs, where I could explain margins and kernel functions but not the convex optimization part. I explained my answers by illustrating the algorithms on a shared screen.
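For reference, the convex optimization part of that question is usually stated as the soft-margin SVM problem. The formulation below is the standard textbook one (as found in notes such as CS229), not something quoted from the interview; the symbols $w$, $b$, $\xi_i$, $C$, $\alpha_i$, and $K$ follow the usual convention and are assumptions here.

```latex
\begin{aligned}
\text{Primal:}\quad & \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i \\
& \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m \\[4pt]
\text{Dual:}\quad & \max_{\alpha}\ \sum_{i=1}^{m}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j) \\
& \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{m}\alpha_i\,y_i = 0
\end{aligned}
```

Training points with $\alpha_i > 0$ are the support vectors. The kernel $K(x_i, x_j)$ replaces the inner product $x_i^\top x_j$, which is how an SVM can separate classes that are not linearly separable in the original feature space.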

He then asked whether I had any questions, and I asked about some data science use cases at Microsoft. With that the interview was over; it took about 45 minutes in total.

Three students made it to the second round, which took place a couple of hours later.

I revised SVMs in the time between the first and second rounds.

Round 2:

This round was similar to Round 1, but the interviewer asked a significant number of NLP (Natural Language Processing) questions.

The round again started with me introducing myself and my interests.

I was asked the following questions:

What is the difference between bias and variance?
Explain multiclass classification using Logistic Regression. Also explain the softmax activation and the cross-entropy loss, and write the equations for both.
Explain how RNNs, GRUs, and LSTMs work, and the pros and cons of each type of network. Also explain why transformer-based models are better than these.
Explain the training procedure used to obtain GloVe embeddings.
Design a spam classification system. Also explain the feature extraction, algorithm, and metrics used for evaluation.
Explain in depth how Support Vector Machines (SVMs) work. Also explain the kernel functions, and how an SVM classifies when there is no linear separation between the classes.
Which algorithm should be used to extract nouns from search engine queries, and why?
Derive the equations for the forward and backward pass in Linear Regression.
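The last question in the list can be made concrete with a small sketch. It assumes a mean-squared-error loss and plain gradient descent, which is one common reading of "forward and backward pass"; the function names (`forward`, `backward`, `fit`) and the learning rate are illustrative choices, not details from the interview.

```python
def forward(X, w, b):
    """Forward pass: y_hat_i = w . x_i + b for every row x_i of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def backward(X, y, y_hat):
    """Backward pass for the MSE loss L = (1/m) * sum((y_hat_i - y_i)^2).

    dL/dw_j = (2/m) * sum((y_hat_i - y_i) * x_ij)
    dL/db   = (2/m) * sum(y_hat_i - y_i)
    """
    m = len(y)
    errors = [p - t for p, t in zip(y_hat, y)]
    grad_w = [
        2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
        for j in range(len(X[0]))
    ]
    grad_b = 2.0 / m * sum(errors)
    return grad_w, grad_b


def fit(X, y, lr=0.05, epochs=2000):
    """Gradient descent: repeat forward pass, backward pass, parameter update."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        y_hat = forward(X, w, b)
        grad_w, grad_b = backward(X, y, y_hat)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w ~ 2, b ~ 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w, b = fit(X, y)
```

The backward pass is just the chain rule applied to the loss: the error term `y_hat_i - y_i` is shared by both gradients, and only the factor it is multiplied by (the feature value or 1) differs.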

I was able to answer most of the questions in the interview, except for the mathematical equations involved in SVMs. The interviewer seemed satisfied with most of my answers. I explained my answers by illustrating the algorithms on a shared screen.
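For the multiclass Logistic Regression question in this round, the softmax and cross-entropy equations can be written compactly as follows. This is the standard formulation for $K$ classes with a one-hot label $y$; the symbols $w_k$, $b_k$, $z_k$, and $p_k$ are the usual notation and are assumptions here, not the exact form written in the interview.

```latex
\begin{aligned}
z_k &= w_k^\top x + b_k, \qquad k = 1,\dots,K \\
p_k &= \operatorname{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \\
L   &= -\sum_{k=1}^{K} y_k \log p_k \\
\frac{\partial L}{\partial z_k} &= p_k - y_k, \qquad
\frac{\partial L}{\partial w_k} = (p_k - y_k)\,x
\end{aligned}
```

With $K = 2$ this reduces to ordinary binary Logistic Regression with the sigmoid activation.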

She then asked whether I had any questions, and I asked the same question as in Round 1. The entire interview took about 45 minutes.

Round 3:

The interviewer didn't have a Data Science background, so he asked me questions on Data Structures & Algorithms. He mentioned that they wouldn't be hard, since the interview was for a data science role.

The interview started with a formal introduction, and he asked me to introduce myself as usual.

I was asked the following questions:

Given an array A = [a1, a2, a3, …, an, b1, b2, b3, …, bn], convert it into the array B = [a1, b1, a2, b2, …, an, bn] using only O(1) space.
In the previous question, given an index in array A, return the index that element would have in array B.
You have an array of 2N elements, N of them even and N of them odd. Using the minimum number of swaps, make sure that even elements are at odd indexes and odd elements are at even indexes.
In the previous question, assume you are not told that the number of even elements equals the number of odd elements. Verify this while still using the minimum number of swaps and only a single pass over the array.

I was not able to answer the first question correctly, so the interviewer modified it into the second question, which I answered correctly and coded on a shared screen. He seemed satisfied with my answer to the second question.
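The index mapping asked for in the second question can be sketched as below. It assumes 0-based indexes and an array of length 2n; the function name `interleaved_index` is hypothetical, and this is not necessarily the code written during the interview.

```python
def interleaved_index(i, n):
    """Map a 0-based index i in A = [a1..an, b1..bn] to its index in
    B = [a1, b1, a2, b2, ..., an, bn]."""
    if not 0 <= i < 2 * n:
        raise IndexError("index out of range for an array of length 2n")
    if i < n:
        # a_(i+1) moves to the even slot 2*i
        return 2 * i
    # b_(i-n+1) moves to the odd slot just after its matching a
    return 2 * (i - n) + 1


# Check the mapping by building B from A with it (this check uses O(n) extra space).
A = ["a1", "a2", "a3", "b1", "b2", "b3"]
n = len(A) // 2
B = [None] * len(A)
for i, value in enumerate(A):
    B[interleaved_index(i, n)] = value
```

This mapping is also the building block for the first question: an O(1)-space solution has to move each element to `interleaved_index(i, n)` in place, for example by following the permutation's cycles, which is the harder part.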

He then asked the third question, which I answered using the two-pointer technique; I coded the solution after explaining it to him. He seemed satisfied with the answer.
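A minimal sketch of a two-pointer solution to the third question follows. It assumes 0-based indexes and that the input really contains as many even as odd values; the function name is hypothetical and the original interview code is not available, so this is one possible version rather than the author's.

```python
def rearrange_by_parity(values):
    """Put odd values at even indexes and even values at odd indexes, in place.

    Assumes len(values) is even with equally many even and odd values.
    Returns the number of swaps performed.
    """
    n = len(values)
    i, j = 0, 1  # i walks the even indexes, j walks the odd indexes
    swaps = 0
    while i < n and j < n:
        if values[i] % 2 == 1:
            i += 2  # odd value at an even index: already in place
        elif values[j] % 2 == 0:
            j += 2  # even value at an odd index: already in place
        else:
            # values[i] is a misplaced even, values[j] a misplaced odd:
            # a single swap fixes both positions
            values[i], values[j] = values[j], values[i]
            swaps += 1
            i += 2
            j += 2
    return swaps


data = [2, 1, 4, 3]
swap_count = rearrange_by_parity(data)
```

Every swap repairs two misplaced elements at once and correctly placed elements are never touched, which is why the swap count is minimal.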

The interviewer then modified it into the fourth question, for which I changed the loop and added some if-else statements inside it. He then discussed some edge cases in which the solution would fail, and I modified the code to handle them. The interviewer seemed satisfied with the answer.
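One way the fourth question could be handled is sketched below: the same two pointers are used, and the even/odd balance is checked from what is left over when either pointer runs off the end. This is an assumed reconstruction, with a hypothetical function name and a `None` return value chosen here to signal "counts are not equal"; note that the array may already be partially rearranged when that happens.

```python
def rearrange_and_verify(values):
    """Rearrange as in the previous sketch, but without trusting the input.

    Returns the number of swaps if the array holds equally many even and
    odd values, otherwise None. Each pointer passes over its half of the
    indexes only once.
    """
    n = len(values)
    if n % 2 != 0:
        return None  # an odd-length array cannot be balanced
    i, j = 0, 1  # i walks the even indexes, j walks the odd indexes
    swaps = 0
    while i < n or j < n:
        while i < n and values[i] % 2 == 1:
            i += 2  # skip odd values already at even indexes
        while j < n and values[j] % 2 == 0:
            j += 2  # skip even values already at odd indexes
        if i < n and j < n:
            values[i], values[j] = values[j], values[i]
            swaps += 1
            i += 2
            j += 2
        elif i < n or j < n:
            # a misplaced element has no partner to swap with,
            # so the even and odd counts differ
            return None
    return swaps


balanced = [2, 4, 1, 3]
balanced_swaps = rearrange_and_verify(balanced)
unbalanced_result = rearrange_and_verify([2, 4, 6, 1])
```

The edge cases worth checking are an empty array, an odd-length array, and inputs where one pointer finishes long before the other.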

He then asked whether I had any questions, and I asked him about the work culture at Microsoft and the work he does at the company. After this, the interview was over. The whole interview took 45 minutes.

Key takeaways:

It is crucial to understand the mathematical concepts behind the algorithms rather than treating them as black boxes.
Having machine learning projects on your resume is a huge plus, since the other candidates all had to explain their projects. Review your projects thoroughly.
Get some decent practice with DSA questions, since the process might include DSA rounds. I was the only one of the six candidates to go through a DSA round.
Read about some industry use cases of machine learning, since most data science interviews include this type of question.

Conclusions:

I was very confident about my performance in the first two rounds but a little unsure about the third, since I was pretty weak in Data Structures and Algorithms.

Three days later, Microsoft announced the results for the internship position. Three students received an offer, and I was one of them!

I will now intern at one of the Microsoft India offices from May 2021 to July 2021.

Translated from: https://towardsdatascience.com/my-data-science-interview-with-microsoft-6b7ec840b80e
