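Collecting and reviewing questions that come up in machine learning and data mining interviews
PS: compiled from last year's Xiaomi machine learning campus recruitment interviews
First-round interview
Walk through your projects
Common feature selection methods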
https://blog.csdn.net/SecondLieutenant/article/details/80693765
Differences between bagging and boosting
https://www.cnblogs.com/earendil/p/8872001.html
Derive logistic regression by hand
https://blog.csdn.net/u014472643/article/details/80662532
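As a reminder of what the hand derivation usually covers, here is a sketch in standard notation. The symbols (features x_i, labels y_i in {0, 1}, weights w, learning rate \eta) are assumed here and are not taken from the linked post.

```latex
\begin{align}
\sigma(z) &= \frac{1}{1 + e^{-z}}, \qquad \hat{y}_i = \sigma(w^\top x_i) \\
\ell(w) &= \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \\
\sigma'(z) &= \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\nabla_w \ell(w) &= \sum_{i=1}^{N} (y_i - \hat{y}_i)\, x_i \\
w &\leftarrow w + \eta \, \nabla_w \ell(w)
\end{align}
```

The last line is gradient ascent on the log-likelihood, which is the same as gradient descent on the cross-entropy loss.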
Ways to address overfitting, and how you applied them in your project
http://www.360doc.com/content/18/0805/10/11935121_775819522.shtml
Differences between L1 and L2 regularization: why L1 yields sparse weights while L2 does not, and why L2 mitigates overfitting
https://blog.csdn.net/haidixipan/article/details/83186850
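One way to see the sparsity difference is a one-dimensional toy problem: shrink an unregularized weight a under each penalty. This is only a sketch; the function names, the sample weights, and lam = 0.5 are made up for illustration.

```python
def l1_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w): soft thresholding."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # anything with |a| <= lam is set exactly to zero


def l2_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return a / (1.0 + lam)


# Hypothetical unregularized weights and penalty strength
weights = [3.0, 0.4, -0.2, -1.5]
lam = 0.5

l1_result = [l1_shrink(a, lam) for a in weights]
l2_result = [l2_shrink(a, lam) for a in weights]
print(l1_result)
print(l2_result)
```

Under L1 the small weights become exactly zero, while L2 scales every weight toward zero without ever reaching it.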
Comparison of the GBDT and XGBoost models
https://blog.csdn.net/m0_37870649/article/details/81022040
Differences between LSTM and RNN
https://blog.csdn.net/lanmengyiyu/article/details/79941486
Ways to address vanishing gradients
https://blog.csdn.net/qq_25737169/article/details/78847691
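A toy numeric illustration of why the activation function matters. It assumes the backpropagated gradient is just the product of one activation derivative per layer; weights are ignored, which is a deliberate simplification, and depth = 20 is an arbitrary choice.

```python
import math


def sigmoid_grad(z):
    """Derivative of the sigmoid at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU at z (taken as 0 at z <= 0)."""
    return 1.0 if z > 0 else 0.0


depth = 20  # hypothetical number of stacked layers

# sigmoid'(0) = 0.25 is the largest value the sigmoid derivative can take,
# so this is the best case for a sigmoid network
sigmoid_factor = sigmoid_grad(0.0) ** depth

# an active ReLU unit passes the gradient through unchanged
relu_factor = relu_grad(1.0) ** depth

print(sigmoid_factor, relu_factor)
```

Even in the best case the sigmoid factor shrinks geometrically with depth, which is one reason ReLU-style activations are among the usual remedies.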
Live coding: reverse a linked list; maximum subarray sum
1. Given a linked list, output the reversed list
# Iterative implementation:
# -*- coding:utf-8 -*-
# class ListNode:
#     def __init__(self, x):
#         self.val = x
#         self.next = None
class Solution:
    # Returns a ListNode
    def ReverseList(self, pHead):
        if pHead is None:
            return pHead
        last = None  # points to the previous node
        while pHead:
            # Save pHead's next node in tmp first, so the rest of the
            # list is not lost once pHead.next is overwritten
            tmp = pHead.next
            # With next saved, point pHead.next back at last
            pHead.next = last
            # Advance last and pHead by one node for the next reversal step
            last = pHead
            pHead = tmp
        return last
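For comparison, a recursive version might look like the sketch below. The helper names (reverse_list_recursive, build, to_list) are made up for illustration, and ListNode is redefined so the snippet runs on its own.

```python
class ListNode:
    def __init__(self, x):
        self.val = x
        self.next = None


def reverse_list_recursive(head):
    # An empty list or a single node is already reversed
    if head is None or head.next is None:
        return head
    # Reverse everything after head; new_head is the old tail
    new_head = reverse_list_recursive(head.next)
    # head.next is now the tail of the reversed part: hook head behind it
    head.next.next = head
    head.next = None
    return new_head


def build(values):
    """Build a linked list from a Python list (illustration helper)."""
    dummy = ListNode(None)
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    """Collect node values into a Python list (illustration helper)."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


print(to_list(reverse_list_recursive(build([1, 2, 3, 4]))))
```

The recursion uses one stack frame per node, so for very long lists the iterative version is the safer choice in Python.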
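2. Maximum subarray sum (largest sum of a contiguous subsequence)
# O(n^2) solution (the brute-force approach: a double loop that keeps the best contiguous sum in maxsum and updates it at every step)
def maxSum(nums):
    maxsum = nums[0]
    for i in range(len(nums)):
        maxtmp = 0
        for j in range(i, len(nums)):
            maxtmp += nums[j]
            if maxtmp > maxsum:
                maxsum = maxtmp
    return maxsum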
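# O(n) solution (dynamic programming)
'''
Let the array be a[i]. The maximum contiguous subarray must end at some position in 0..(n-1). When the loop reaches position i, if the best sum of a run ending at i-1 is less than or equal to 0, then the best sum ending at i is just a[i]. If it is greater than 0, the best sum ending at i is b[i-1] + a[i]. Both cases are covered by b[i] = max{b[i-1] + a[i], a[i]}, where b[i] is the maximum sum of a contiguous subarray ending at position i. The answer is the largest b[i].
'''
def maxSum(list_of_nums):
    # Start from the first element rather than 0, so an all-negative
    # input returns its largest element instead of 0
    maxsum = list_of_nums[0]
    maxtmp = 0
    for i in range(len(list_of_nums)):
        if maxtmp <= 0:
            maxtmp = list_of_nums[i]
        else:
            maxtmp += list_of_nums[i]
        if maxtmp > maxsum:
            maxsum = maxtmp
    return maxsum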
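Brain teaser: horse racing. There are 25 horses and 5 tracks, and no timer. What is the minimum number of races needed to find the top three? The answer is 7.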
http://www.cnblogs.com/vincently/p/4802592.html
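The usual 7-race strategy can be checked with a small simulation. This sketch only shows that 7 races are enough, not that fewer is impossible; the function names and the hidden speeds list are made up for illustration, and speeds are used only to order the horses within a race, never compared across races.

```python
import random


def top3_in_7_races(speeds):
    """speeds: 25 distinct numbers, higher is faster (hidden from the solver).

    Returns (indices of the top three horses, fastest first; races used).
    """
    log = []  # one entry per race, to count how many were run

    def race(horses):
        # A race only reveals the finishing order of at most 5 horses
        assert len(horses) <= 5
        log.append(list(horses))
        return sorted(horses, key=lambda h: speeds[h], reverse=True)

    # Races 1-5: five heats of five horses each
    groups = [race(list(range(g * 5, g * 5 + 5))) for g in range(5)]

    # Race 6: the five heat winners against each other
    winners = race([g[0] for g in groups])
    by_winner = dict((g[0], g) for g in groups)
    a, b, c = [by_winner[w] for w in winners[:3]]

    # a[0] is the overall fastest. Only five horses can still be 2nd or 3rd:
    # every other horse has at least three horses known to be faster.
    final = race([a[1], a[2], b[0], b[1], c[0]])  # race 7

    return [a[0], final[0], final[1]], len(log)


random.seed(42)
speeds = random.sample(range(1000), 25)  # distinct hidden speeds
top_three, races_used = top3_in_7_races(speeds)
print(top_three, races_used)
```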
Second-round interview
Derive GBDT by hand
https://blog.csdn.net/blank_tj/article/details/82262431
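For orientation, the generic gradient boosting recursion is sketched below. The notation (loss L, shrinkage \nu, leaf regions R_{jm}) is assumed here and may differ from the linked post.

```latex
\begin{align}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}},
  \qquad i = 1, \dots, N \\
\{R_{jm}\}_{j=1}^{J_m} &: \text{leaf regions of a regression tree fit to } \{(x_i, r_{im})\} \\
\gamma_{jm} &= \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\bigl(y_i, F_{m-1}(x_i) + \gamma\bigr) \\
F_m(x) &= F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm}\, \mathbb{1}[x \in R_{jm}]
\end{align}
```

With squared-error loss the pseudo-residual r_{im} reduces to the ordinary residual y_i - F_{m-1}(x_i).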
Derive XGBoost by hand
https://blog.csdn.net/u014472643/article/details/80658009
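The core of the derivation is the second-order Taylor expansion of the objective at round t. The sketch below uses assumed notation (g_i and h_i for first and second derivatives of the loss, T leaves with weights w_j, I_j the set of samples in leaf j) and drops terms that do not depend on the new tree f_t.

```latex
\begin{align}
g_i &= \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^2} \\
\mathrm{Obj}^{(t)} &\approx \sum_{i=1}^{N} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right]
  + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
&= \sum_{j=1}^{T} \left[ G_j w_j + \tfrac{1}{2} (H_j + \lambda)\, w_j^2 \right] + \gamma T,
\qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda},
\qquad
\mathrm{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \\
\mathrm{Gain} &= \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
\end{align}
```

The use of h_i and the explicit regularization terms are also the points most often cited when contrasting XGBoost with plain GBDT.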
Live coding: find the median of two sorted arrays, then improve the time complexity
# Solution using sorted (time complexity: worst case O(n log n), best case O(n), average O(n log n), where n is the combined length)
class Solution:
    def findMedianSortedArrays(self, nums1, nums2):
        """
        :type nums1: List[int]
        :type nums2: List[int]
        :rtype: float
        """
        nums = sorted(nums1 + nums2)
        mid = len(nums) // 2
        if len(nums) % 2 == 0:
            # Divide by 2.0 so the result is not truncated under Python 2
            answer = (nums[mid - 1] + nums[mid]) / 2.0
        else:
            answer = nums[mid]
        return answer
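An intermediate step between sorting and binary search is a two-pointer merge that stops at the middle, which takes O(m + n) time and O(1) extra space. The sketch below is one way to write it; the function name median_by_merge is made up for illustration.

```python
def median_by_merge(nums1, nums2):
    """Median of two sorted lists by walking both from the left."""
    total = len(nums1) + len(nums2)
    if total == 0:
        raise ValueError("both arrays are empty")

    i = j = 0
    prev = curr = None
    # After total // 2 + 1 steps, curr is the element at index total // 2
    # of the merged order and prev is the element just before it
    for _ in range(total // 2 + 1):
        prev = curr
        take_first = j >= len(nums2) or (i < len(nums1) and nums1[i] <= nums2[j])
        if take_first:
            curr = nums1[i]
            i += 1
        else:
            curr = nums2[j]
            j += 1

    if total % 2 == 1:
        return curr
    return (prev + curr) / 2.0


print(median_by_merge([1, 3], [2]))
print(median_by_merge([1, 2], [3, 4]))
```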
Improve the time complexity with binary search
# Binary search over the shorter array, time complexity O(log(min(m, n)))
class Solution:
    def findMedianSortedArrays(self, nums1, nums2):
        """
        :type nums1: List[int]
        :type nums2: List[int]
        :rtype: float
        """
        m, n = len(nums1), len(nums2)
        if m > n:
            nums1, nums2, m, n = nums2, nums1, n, m
        if n == 0:
            raise ValueError
        imin, imax, half_len = 0, m, (m + n + 1) // 2
        while imin <= imax:
            i = (imin + imax) // 2
            j = half_len - i
            if i < m and nums2[j-1] > nums1[i]:
                # i is too small, must increase it
                imin = i + 1
            elif i > 0 and nums1[i-1] > nums2[j]:
                # i is too big, must decrease it
                imax = i - 1
            else:
                # i is perfect
                if i == 0: max_of_left = nums2[j-1]
                elif j == 0: max_of_left = nums1[i-1]
                else: max_of_left = max(nums1[i-1], nums2[j-1])
                if (m + n) % 2 == 1:
                    return max_of_left
                if i == m: min_of_right = nums2[j]
                elif j == n: min_of_right = nums1[i]
                else: min_of_right = min(nums1[i], nums2[j])
                return (max_of_left + min_of_right) / 2.0
Differences between LR and SVM
https://blog.csdn.net/u013385362/article/details/80219531
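Scenario question: how would you analyze the timeliness of news articles, and how would you mine features for it?
LSTM: can you still write out the formula for each gate?
1. Understanding the gates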
https://blog.csdn.net/shenxiaoming77/article/details/76795376
2. Deriving the formulas
https://www.sohu.com/a/159470197_99916544
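For quick recall, the gate equations of the standard LSTM cell are listed below. The notation is assumed here and may differ from the linked articles: [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input, \sigma is the sigmoid, and \odot is element-wise multiplication.

```latex
\begin{align}
f_t &= \sigma\bigl(W_f [h_{t-1}, x_t] + b_f\bigr) && \text{forget gate} \\
i_t &= \sigma\bigl(W_i [h_{t-1}, x_t] + b_i\bigr) && \text{input gate} \\
\tilde{c}_t &= \tanh\bigl(W_c [h_{t-1}, x_t] + b_c\bigr) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
o_t &= \sigma\bigl(W_o [h_{t-1}, x_t] + b_o\bigr) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{align}
```

The additive form of the cell state update is what lets gradients flow across many time steps more easily than in a plain RNN.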