“Daily Practice”
Jun. 30
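Since June 15, 2017, Data Application Lab has shared and discussed one common interview question from data science (DS) and business analytics (BA) every day.
Since October 4, 2017, we have also shared one LeetCode algorithm problem every day.
If you are actively looking for work in these fields, we hope you will follow our questions each day and think them through with us. We post the answers the next day.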
Day 802
DS Interview Question
How can you ensure that you don’t analyse something that ends up producing meaningless results?
BA Interview Question
Trips and Users
The Trips table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are both foreign keys to Users_Id in the Users table. Status is an ENUM type of ('completed', 'cancelled_by_driver', 'cancelled_by_client').
+----+-----------+-----------+---------+--------------------+----------+
| Id | Client_Id | Driver_Id | City_Id | Status             |Request_at|
+----+-----------+-----------+---------+--------------------+----------+
| 1  | 1         | 10        | 1       | completed          |2013-10-01|
| 2  | 2         | 11        | 1       | cancelled_by_driver|2013-10-01|
| 3  | 3         | 12        | 6       | completed          |2013-10-01|
| 4  | 4         | 13        | 6       | cancelled_by_client|2013-10-01|
| 5  | 1         | 10        | 1       | completed          |2013-10-02|
| 6  | 2         | 11        | 6       | completed          |2013-10-02|
| 7  | 3         | 12        | 6       | completed          |2013-10-02|
| 8  | 2         | 12        | 12      | completed          |2013-10-03|
| 9  | 3         | 10        | 12      | completed          |2013-10-03|
| 10 | 4         | 13        | 12      | cancelled_by_driver|2013-10-03|
+----+-----------+-----------+---------+--------------------+----------+
The Users table holds all users. Each user has a unique Users_Id, and Role is an ENUM type of ('client', 'driver', 'partner').
+----------+--------+--------+
| Users_Id | Banned | Role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+
Write a SQL query to find the cancellation rate of requests made by unbanned users between Oct 1, 2013 and Oct 3, 2013. For the above tables, your SQL query should return the following rows with the cancellation rate being rounded to two decimal places.
+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+
LeetCode Question
Combinations
Description:
Given two integers n and k, return all possible combinations of k numbers out of 1 … n.
Input: n = 4 and k = 2
Output: [[2,4],[3,4],[2,3],[1,2],[1,3],[1,4]]
Day 801
Answers Revealed
DS Interview Question & Answer
How do data management procedures like missing data handling make selection bias worse?
Missing value treatment is one of the primary tasks a data scientist is expected to do before starting data analysis. There are multiple methods for treating missing values, and if they are not applied properly they can introduce selection bias. Let's look at a few missing value treatments and their impact on selection:
Complete Case Treatment: Complete case treatment removes an entire row from the data if even one value is missing. This can introduce selection bias if the values are not missing at random and follow some pattern. Suppose you are conducting a survey and a few people did not specify their gender. Would you remove all of those people? Might the remaining data then tell a different story?
Available Case Analysis: Suppose you are calculating a correlation matrix, so for each correlation coefficient you drop only the rows that have missing values in the two variables involved. In this case the results will not be fully consistent, because each coefficient is computed from a different subset of the rows.
Mean Substitution: In this method missing values are replaced with the mean of the other available values. This can distort the distribution: the mean is preserved, but the standard deviation shrinks, and correlation and regression estimates that depend on the spread around the mean are affected as well.
Hence, data management procedures can introduce selection bias into your data if they are not chosen carefully.
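The effect of mean substitution on spread can be checked with a small Python sketch. The sample values are made up for illustration, and `mean_substitute` is a hypothetical helper written for this example, not a library function.

```python
import statistics


def mean_substitute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Illustrative sample (assumed data): None marks a missing answer.
sample = [2.0, 4.0, None, 6.0, None, 8.0]

observed = [v for v in sample if v is not None]
imputed = mean_substitute(sample)

# The mean is unchanged, but the sample standard deviation shrinks,
# because the filled-in values add no spread around the mean.
print("mean  observed:", statistics.mean(observed), "imputed:", statistics.mean(imputed))
print("stdev observed:", round(statistics.stdev(observed), 2),
      "imputed:", round(statistics.stdev(imputed), 2))
```

For this sample the mean stays at 5.0 while the standard deviation falls from about 2.58 to 2.0, which is the kind of distortion described under Mean Substitution.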
BA Interview Question & Answer
Rising Temperature
Given a Weather table, write a SQL query to find the Ids of all dates whose temperature is higher than that of the previous day (yesterday).
+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
| 1       | 2015-01-01       | 10               |
| 2       | 2015-01-02       | 25               |
| 3       | 2015-01-03       | 20               |
| 4       | 2015-01-04       | 30               |
+---------+------------------+------------------+
For example, return the following Ids for the above Weather table:
+----+
| Id |
+----+
| 2  |
| 4  |
+----+
Answer:
-- Self-join: pair each day (w1) with the day before it (w2)
SELECT DISTINCT w1.Id
FROM Weather AS w1
JOIN Weather AS w2
  ON w1.RecordDate = DATE_ADD(w2.RecordDate, INTERVAL 1 DAY)
WHERE w1.Temperature > w2.Temperature;
Reference:
https://leetcode.com/problems/rising-temperature/description/
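One way to sanity-check the self-join idea against the sample table is to run an equivalent query in SQLite from Python. This is only a sketch under stated assumptions: SQLite has no DATE_ADD, so `date(w2.RecordDate, '+1 day')` stands in for it, dates are stored as ISO text, and the function name `rising_temperature_ids` is hypothetical.

```python
import sqlite3


def rising_temperature_ids(rows):
    """Return Ids whose temperature exceeds that of the previous calendar day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Weather (Id INTEGER, RecordDate TEXT, Temperature INTEGER)"
    )
    conn.executemany("INSERT INTO Weather VALUES (?, ?, ?)", rows)
    # SQLite equivalent of the MySQL answer: date(..., '+1 day') replaces DATE_ADD.
    cursor = conn.execute(
        """
        SELECT DISTINCT w1.Id
        FROM Weather AS w1
        JOIN Weather AS w2
          ON w1.RecordDate = date(w2.RecordDate, '+1 day')
        WHERE w1.Temperature > w2.Temperature
        ORDER BY w1.Id
        """
    )
    ids = [row[0] for row in cursor.fetchall()]
    conn.close()
    return ids


# Rows copied from the sample Weather table.
weather_rows = [
    (1, "2015-01-01", 10),
    (2, "2015-01-02", 25),
    (3, "2015-01-03", 20),
    (4, "2015-01-04", 30),
]
print(rising_temperature_ids(weather_rows))  # [2, 4]
```

Joining on the date rather than on consecutive Ids matters because the Ids are not guaranteed to follow calendar order.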
LeetCode Question & Answer
Array DFS Subsets
Description:
Given a set of distinct integers, nums, return all possible subsets.
Input: [1,2,3]
Output: [[],[1],[1,2],[1,2,3],[1,3],[2],[2,3],[3]]
Assumptions:
The solution set must not contain duplicate subsets.
Solution:
The subsets problem is a depth-first search traversal of an implicit graph. Because the integers are distinct, no deduplication is needed.
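Time Complexity: O(n * 2 ^ n), since there are 2 ^ n subsets and copying each one takes up to O(n)
Space Complexity: O(n) for the recursion stack and the current path, excluding the output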
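A minimal Python sketch of the depth-first search described in the solution. The function and variable names are illustrative, and sorting the input is an assumption made only so that the output order matches the example.

```python
def subsets(nums):
    """Return all subsets of a list of distinct integers using DFS."""
    nums = sorted(nums)  # assumption: sort so results come out in the example's order
    result = []
    path = []  # the subset being built along the current DFS branch

    def dfs(start):
        # Every node of the implicit search tree is itself a valid subset.
        result.append(path[:])
        for i in range(start, len(nums)):
            path.append(nums[i])  # choose nums[i]
            dfs(i + 1)            # only consider later elements, so no duplicates
            path.pop()            # undo the choice (backtrack)

    dfs(0)
    return result


print(subsets([1, 2, 3]))
# [[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
```

Passing `i + 1` as the next start index is what keeps each subset from being generated more than once, so no separate deduplication step is needed for distinct inputs.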