ZJ_Frank - CSDN Blog

[Original] Measure Theory (4): Carathéodory theorem

In the last lecture, we showed that there is a unique way of extending $\mu$, a $\sigma$-additive function defined on a semi-algebra $\mathscr{S}$, to $\nu$, a $\sigma$-additive function defined on the algebra $\mathcal{A}(\mathscr{S})$ generated by $\mathscr{S}$. In this…

2021-08-04 20:30:52 439

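[Original] The number of primes less than n: the sieve of Eratosthenes

Problem: given an integer n, find the number of primes less than or equal to it. Several solutions of different complexity are given below, in increasing order of difficulty. The highlight is the third algorithm, proposed by the Greek mathematician Eratosthenes and known as the sieve of Eratosthenes, which solves the problem in nearly linear time.

Brute force, $O(n^2)$:

    def primeNumbers(n):
        cnt = 0
        for i in range(1, n+1):
            if isPrime(i):
                cnt += 1
        return cnt

    def is…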

2021-08-02 19:54:52 414
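The excerpt above is cut off before the sieve itself appears, so the following is only a minimal sketch of the sieve of Eratosthenes, not the post's own code. The function name `count_primes` is an assumption, and it counts primes less than or equal to n, following the problem statement in the excerpt.

```python
def count_primes(n: int) -> int:
    """Count the primes <= n with the sieve of Eratosthenes."""
    if n < 2:
        return 0
    # is_prime[i] stays True until i is crossed out as a multiple of a smaller prime.
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Multiples below p*p were already crossed out by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return sum(is_prime)
```

Starting the inner loop at `p * p` is what brings the total work down to roughly O(n log log n).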

[Original] Dijkstra algorithm - implemented using priority_queue

A classic graph algorithm for computing the shortest distance between two vertices. It can also compute the shortest distance from one vertex (the source) to all other vertices.

Pseudocode:

    Dijkstra(graph, n, src)
        initialize dist := [...inf...] with dist[src] = 0
        initialize priority queue pq; add (0, src) to pq
        while pq is not empty
            u = pq.top()[1]
            pq.pop()
            for neighbor, wei…

2021-08-02 10:34:34 347
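The pseudocode in the excerpt above is truncated, so here is a minimal Python sketch of the same idea using `heapq` as the priority queue. The graph representation (a dict mapping every vertex to a list of `(neighbor, weight)` pairs, with non-negative weights) and the function name are assumptions made for illustration.

```python
import heapq
import math


def dijkstra(graph, src):
    """Return the shortest distance from src to every vertex in graph.

    graph: dict mapping each vertex to a list of (neighbor, weight) pairs.
    Every vertex must appear as a key; weights must be non-negative.
    """
    dist = {node: math.inf for node in graph}
    dist[src] = 0
    pq = [(0, src)]  # entries are (distance so far, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph[u]:
            new_dist = d + w
            if new_dist < dist[v]:
                dist[v] = new_dist
                heapq.heappush(pq, (new_dist, v))
    return dist
```

Because `heapq` has no decrease-key operation, the sketch pushes duplicate entries and skips the stale ones when they are popped.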

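[Original] Fisher-Yates Shuffle Algorithm

Today we introduce a shuffle algorithm which guarantees that, after shuffling an array, every element is equally likely to end up at every position. The algorithm runs in O(n) time.

Algorithm and implementation: let A be the array to shuffle, with length n. Suppose we have a function rand(0, n) that uniformly generates a random integer in [0, …, n]. The Fisher-Yates shuffle algorithm then gives this solution: let k = n-1 (A is 0-indexed); start from node k, swa…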

2021-07-28 22:25:24 311
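The description above breaks off at the swap step. A minimal sketch of the standard Fisher-Yates shuffle follows; it assumes the excerpt's `rand(0, k)` corresponds to Python's `random.randint(0, k)`, which is also inclusive on both ends. The function name and the optional `rng` parameter are illustrative additions.

```python
import random


def fisher_yates_shuffle(a, rng=random):
    """Shuffle list a in place so every permutation is equally likely."""
    # Walk k from the last index down to 1; position k receives a uniformly
    # chosen element from the not-yet-fixed prefix a[0..k].
    for k in range(len(a) - 1, 0, -1):
        j = rng.randint(0, k)  # inclusive on both ends
        a[k], a[j] = a[j], a[k]
    return a
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.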

[Original] Measure Theory (3): set functions

Definitions: $\mu:\mathcal{C}\to\mathbb{R}_+\cup\{\infty\}$. DEF: $\mu$ is continuous from below at $E$ if $\forall \{E_i\}_{i\ge1}$, $E_i \in\mathcal{C}$, $E_n\uparrow E$, we have $\mu(E_n)\to\mu(E)$…

2021-07-27 23:49:40 169

[Original] Measure Theory (2): semi-algebra, algebra, sigma-algebra

Definitions: in this post, we define the semi-algebra, algebra, and sigma-algebra.

Semi-algebra: consider $\Omega$ as the whole set (for example, $\Omega=\mathbb{R}$); $\mathcal{S}(\Omega)$ is the collection of subsets of $\Omega$. DEF: Semi-algebra S…

2021-07-25 00:13:50 790

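[Original] Measure Theory (1): Why measure theory? The motivation

This column is a plan to work through the Lecture Notes of Measure Theory by Claudio Landim. This post is the note for the first video, which uses a counterexample to show why measure theory is needed.

Introduction: first consider the following question: how do we measure the length of a subset of $\mathbb{R}$? Intuitively, if the subset is $(a, b]$, then defining the length of this interval as $b-a$ seems a very reasonable idea. Going further, we expect the following of this "measuring function": its values should be greater than or equal to 0; in particular, for the interval $(a, b]$…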

2021-07-23 22:23:05 433

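[Original] On quick sort: two implementations (extra space and in-place)

Quicksort is a classic sorting algorithm. Although its worst-case complexity is $O(n^2)$, its average complexity is $O(n\log n)$, and in practice it is often the fastest sorting algorithm. It can be implemented in two versions: one that uses extra space and one that works in place. The quicksort we usually talk about is the in-place algorithm, which is also harder to implement.

In place:

    def inPlaceQuickSort(nums, start, end):
        if end - start <= 1:
            return
        p = p…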

2021-05-30 23:11:49 937 1
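The in-place code in the excerpt above stops at the partition step. The sketch below fills the same role under stated assumptions: `[start, end)` is treated as a half-open range (consistent with the excerpt's `end - start <= 1` base case), and the partition scheme is Lomuto's with the last element as pivot, which may differ from the post's own choice.

```python
def in_place_quick_sort(nums, start=0, end=None):
    """Sort nums[start:end] in place (end is exclusive)."""
    if end is None:
        end = len(nums)
    if end - start <= 1:
        return
    pivot = nums[end - 1]
    boundary = start  # nums[start:boundary] holds the elements smaller than pivot
    for j in range(start, end - 1):
        if nums[j] < pivot:
            nums[boundary], nums[j] = nums[j], nums[boundary]
            boundary += 1
    # Put the pivot between the smaller and the not-smaller parts.
    nums[boundary], nums[end - 1] = nums[end - 1], nums[boundary]
    in_place_quick_sort(nums, start, boundary)
    in_place_quick_sort(nums, boundary + 1, end)


def extra_space_quick_sort(nums):
    """Return a new sorted list; simpler, but allocates new lists at each level."""
    if len(nums) <= 1:
        return list(nums)
    pivot, rest = nums[0], nums[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return extra_space_quick_sort(smaller) + [pivot] + extra_space_quick_sort(larger)
```

The extra-space version reads almost like the definition of the algorithm, while the in-place version has to maintain the `boundary` invariant by hand.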

[Original] The distribution of the sum of two uniformly distributed random variables

Given two independent random variables X, Y ~ uniform(0,1), find the distribution of Z = X+Y.

The conclusion first: Z takes values between 0 and 2, and its cdf is:

$$F(Z\le t) = \frac{t^2}{2}, \quad 0\le t\le 1$$

$$F(Z\le t) = -\frac{t^2}{2}+2t-1, \quad 1\le t\le 2$$

Derivation: when we are learning…

2021-05-30 16:19:46 19050 2
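As a sanity check on the cdf stated above (not the post's derivation, which is cut off), the sketch below evaluates the closed form and compares it with a Monte Carlo estimate. The function names, sample size, and seed are illustrative assumptions.

```python
import random


def sum_of_uniforms_cdf(t: float) -> float:
    """Closed-form P(X + Y <= t) for independent X, Y ~ uniform(0, 1)."""
    if t <= 0:
        return 0.0
    if t <= 1:
        return t * t / 2
    if t <= 2:
        return -t * t / 2 + 2 * t - 1
    return 1.0


def empirical_cdf(t: float, samples: int = 200_000, seed: int = 0) -> float:
    """Estimate P(X + Y <= t) by simulation."""
    rng = random.Random(seed)
    hits = sum(rng.random() + rng.random() <= t for _ in range(samples))
    return hits / samples
```

The two branches agree at t = 1, where both give 1/2, so the cdf is continuous there.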

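[Original] [Interview notes] Kuaishou search department algorithm intern, second round

Still building up good karma… The second-round interviewer was also quite nice and asked me to start by introducing the projects on my résumé. I rambled about deep learning, blah blah, and that got him interested (hahahaha), so he grilled me with a pile of related questions: overfitting, the sigmoid function, back propagation, error rate vs. training number… After that he said, let's write some code. The question was to find the sum of the nodes on the last level of a binary tree. I described an approach that traverses the tree twice, and he asked whether it could be done with a single traversal. I thought about it, said it would probably need extra space, wrote it out, and it passed. (It looks like hand-written coding is a very common step. Here the hand-written coding was presumably A…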

2021-03-25 14:57:48 801
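The single-traversal idea mentioned in the notes above can be sketched with a level-order (BFS) walk that uses a queue as the extra space. This is not the author's interview answer, which is not shown; it also assumes the question asks for the sum of the node values on the deepest level, and `TreeNode` is a hypothetical helper class.

```python
from collections import deque


class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def last_level_sum(root):
    """Sum the values on the deepest level, visiting each node once."""
    if root is None:
        return 0
    queue = deque([root])
    level_sum = 0
    while queue:
        level_sum = 0  # restart the sum for each new level
        for _ in range(len(queue)):  # exactly the nodes of the current level
            node = queue.popleft()
            level_sum += node.val
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
    # The loop ends after the deepest level, so level_sum holds its total.
    return level_sum
```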

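[Original] [Interview notes] Kuaishou search department algorithm intern, first round

Writing up interview notes to build up some good karma~ The first round was fairly easy overall. It started with what exactly I did in the projects and internships on my résumé; the questions were quite detailed and branched out into many follow-ups. (For example, when deep learning came up, the interviewer asked about many related concepts: why it is called deep learning, how it differs from traditional machine learning methods, how to tell whether a model is overfitting, and what remedies there are for overfitting.) In short, it was mostly casual talk, and the interviewer was gentle. Then came the ever-popular hand-written coding. The interviewer asked two linked-list questions. The first was deleting the n-th node from the end, where the point is probably how to finish in a single traversal. The second was reversing a linked list, which everyone should know how to do. In any case, do not just bury your head and write when coding; start with your own approach…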

2021-03-24 20:42:56 959 1
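The two linked-list questions mentioned above have well-known solutions, sketched below. These are not the author's interview answers, which are not shown; `ListNode` is a hypothetical helper class, and `remove_nth_from_end` assumes 1 <= n <= length of the list.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def remove_nth_from_end(head, n):
    """Remove the n-th node from the end in a single pass (two pointers)."""
    dummy = ListNode(None, head)  # sentinel, so removing the head needs no special case
    fast = slow = dummy
    for _ in range(n):  # put fast n nodes ahead of slow
        fast = fast.next
    while fast.next is not None:  # advance both until fast is the last node
        fast = fast.next
        slow = slow.next
    slow.next = slow.next.next  # slow now sits just before the node to remove
    return dummy.next


def reverse_list(head):
    """Reverse a singly linked list iteratively and return the new head."""
    prev = None
    while head is not None:
        nxt = head.next
        head.next = prev
        prev = head
        head = nxt
    return prev


def from_list(values):
    """Build a linked list from a Python list (helper for trying things out)."""
    head = None
    for value in reversed(values):
        head = ListNode(value, head)
    return head


def to_list(head):
    """Collect the values of a linked list into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out
```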

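[Original] Finding the fourth Wednesday of a given month and year

Background: the delivery date of etf50 options is the fourth Wednesday of the expiration month. calendar is a very useful package; without further ado, here is the code.

    import calendar
    c = calendar.Calendar(firstweekday=calendar.SUNDAY)
    year = 2020; month = 2
    monthcal = c.monthdatescalendar(year,month)
    fourth_wednesday = [day for week in monthcal for day in week i…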

2021-03-16 10:42:45 581
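The list comprehension in the excerpt above is cut off at its filter condition. One way to complete the idea with the same `calendar` calls is sketched below; the filter (keep Wednesdays that belong to the requested month) and the function name are assumptions, since the original condition is not visible.

```python
import calendar
import datetime


def fourth_wednesday(year: int, month: int) -> datetime.date:
    """Return the date of the fourth Wednesday of the given month."""
    cal = calendar.Calendar(firstweekday=calendar.SUNDAY)
    wednesdays = [
        day
        for week in cal.monthdatescalendar(year, month)
        for day in week
        # monthdatescalendar pads weeks with days from neighbouring months,
        # so the month check is needed as well as the weekday check.
        if day.weekday() == calendar.WEDNESDAY and day.month == month
    ]
    return wednesdays[3]
```

For example, `fourth_wednesday(2020, 2)` returns `datetime.date(2020, 2, 26)`.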

[Original] ML(1) Linear Regression

Introduction: Linear regression is perhaps the most fundamental algorithm in machine learning. In this setting, given a dataset $D=\{(x^i,y^i)\mid x^i\in \mathbb{R}^n, y^i\in\mathbb{R} \}_{i=1}^m$ (x is featur…

2021-03-02 16:49:28 201

[Repost] PyMongo Tutorial

https://cloud.tencent.com/developer/article/1005552?from=article.detail.1151814
https://cloud.tencent.com/developer/article/1151814

2021-02-05 16:56:41 117

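[Original] The Boyer-Moore majority vote algorithm

Problem description: given an array of length n, find the element that appears more than n/2 times (assume one always exists).

Solving this problem is not hard in itself; at worst, build a dictionary. The tricky part is how to solve it in a single pass using O(1) space. This leads to the Boyer-Moore majority vote algorithm.

Pseudocode of the algorithm:

    Given A of length n;
    function findMajorElement(A: array of length n): int
        cnt = 0, major = -1;
        for i in A:
            if cnt == 0:
                major = i…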

2021-01-14 10:05:36 155
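The pseudocode above is truncated after the first branch. A minimal Python sketch of the standard majority vote algorithm follows; the function name is illustrative, and the result is only meaningful under the excerpt's assumption that a majority element exists.

```python
def find_majority_element(a):
    """Return the element occurring more than len(a)/2 times (assumed to exist)."""
    count = 0
    candidate = None
    for x in a:
        if count == 0:
            candidate = x  # no surviving candidate: adopt the current element
        # A matching element votes for the candidate, any other votes against it.
        count += 1 if x == candidate else -1
    return candidate
```

Each non-majority element can cancel at most one majority element, so the majority element is the one left standing after a single pass with O(1) extra space.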

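[Original] Getting past a website's anti-crawler measures with Selenium

Today I was blocked while crawling with Selenium and Google Chrome. After looking things up, I arrived at the following solutions.

Method 1: switch to Firefox

    from selenium import webdriver
    url = "SOME URL YOU WANT TO SCRAPE"
    user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605…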

2020-11-24 23:25:54 1824

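[Original] Scraping Sina Finance 7x24 with Python (1): downloading the data

Preface: I have recently been working on a project that needs a lot of financial data, so I wondered whether a crawler could solve this (an attempt to get it for free). Many websites provide financial data; among them, the more reliable and more promptly updated one is probably the Sina 7x24 live feed: http://finance.sina.com.cn/7x24/?tag=0

This problem has a certain difficulty, because the page is loaded dynamically. Only part of the data is loaded each time, and new data loads only when you scroll to the bottom. So what we need to do is: download the data; save it to a database. This post covers how to download the data.

Code: enough small talk, back to the point. The implementation is as follows: import reque…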

2020-11-20 11:23:14 2340

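[Original] Python: converting Chinese characters to pinyin

Here is a very useful package to share: pypinyin.

Installation: pip3 install pypinyin, or install from a mirror site: https://blog.csdn.net/ZJ_11701/article/details/109378174

Usage:

    import pypinyin
    # By default, fairly detailed pronunciation marks are given
    In[14]: pypinyin.pinyin("我可真能耐")
    Out[14]: [['wǒ'], ['kě'], ['zhēn'], ['néng'], ['nài']]
    # If you do not want the tone marks, just say so
    In[…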

2020-11-18 21:47:01 179

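[Original] bs4: retrieving information by tag type and by class

The official BeautifulSoup documentation provides two methods, find and find_all. To retrieve information by tag type, the syntax is:

    soup.find_all('a') # get everything of the form <a> ... </a>

What if we want to retrieve by class? Say a piece of information sits inside a div container whose class is listBlk. For example:

    <div class="listBlk">
      <table cell…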

2020-11-16 23:27:28 3832

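[Original] python: a fix for format errors when scraping a website with requests

The history of character encodings will not be repeated here; straight to the solution:

    res = requests.get(url)
    res.encoding = res.apparent_encoding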

2020-11-16 21:29:50 494

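[Original] pandas: One-Hot Encoding

For variables that represent categories (that is, categorical variables), we should not assign numbers, since the numbers would be meaningless. Instead, we should use one-hot encoding. (I do not know what other, more reasonable approaches exist.) Straight to the example:

    >>> import statsmodels.api as sm
    >>> import pandas as pd
    >>> import numpy as np
    >>> np.random.seed(444)
    >>> data = {…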

2020-11-15 10:14:46 695

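[Original] git: pulling a remote branch to local

Steps:
1. Create a new empty folder.
2. Initialize: git init
3. Connect to origin master yourself (the underlined part is the remote repository URL): git remote add origin git@github.com:XXXX/nothing2.git
4. Fetch the remote branch to local: git fetch origin dev # (dev is the branch name in the remote repository)
5. Create a local branch dev and switch to it: git checkout -b dev origin/dev (the first dev is the local branch name, origin/dev is the remote branch name)
6. Pull everything on a given branch to the local…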

2020-11-15 10:08:26 169

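[Original] Installing the pandas package through a pip mirror

The mirror URL is as follows:

    pip install pandas -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com/pypi/simple

Note: here https://mirrors.aliyun.com/pypi/simple/ is the Aliyun URL, and --trusted-host=mirrors.aliyun.com/pypi/simple means that this address is trusted.

Note: for a more detailed introduction to installing third-party packages from a mirror, please see my previous post “pyth…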

2020-10-30 08:57:35 5883

[Original] A detailed derivation for the Bias Variance tradeoff Decomposition

Introduction: both ESL and ISLR discuss bias and variance and state the following conclusion:

$$Err(X) = Var(\hat{f}(X)) + Bias(\hat{f}(X))^2 + Var(\epsilon)$$

However, while consulting references, I found that detailed derivations of this conclusion are rare. So here I bring the material together, add my own understanding, and give a fairly detailed derivation…

2020-08-05 20:11:03 206
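The post's own derivation is cut off in the excerpt above. The standard argument can be sketched as follows, under assumptions that are not visible in the excerpt: the data follow $Y = f(X) + \epsilon$ with $E[\epsilon] = 0$, the noise $\epsilon$ is independent of the fitted $\hat{f}$, the expectation is taken at a fixed $X$, and $Bias(\hat{f}(X)) = E[\hat{f}(X)] - f(X)$.

```latex
\begin{aligned}
Err(X) &= E\big[(Y - \hat{f}(X))^2\big]
        = E\big[(f(X) + \epsilon - \hat{f}(X))^2\big] \\
       &= E\big[(f(X) - \hat{f}(X))^2\big]
          + 2\,E[\epsilon]\,E\big[f(X) - \hat{f}(X)\big]
          + E[\epsilon^2] \\
       &= E\big[(f(X) - \hat{f}(X))^2\big] + Var(\epsilon), \\[4pt]
E\big[(f(X) - \hat{f}(X))^2\big]
       &= \big(f(X) - E[\hat{f}(X)]\big)^2
          + E\big[(\hat{f}(X) - E[\hat{f}(X)])^2\big] \\
       &= Bias(\hat{f}(X))^2 + Var(\hat{f}(X)).
\end{aligned}
```

The first cross term vanishes because $E[\epsilon] = 0$ and $\epsilon$ is independent of $\hat{f}$. In the second expansion, obtained by adding and subtracting $E[\hat{f}(X)]$, the cross term $2\,(f(X) - E[\hat{f}(X)])\,E\big[E[\hat{f}(X)] - \hat{f}(X)\big]$ is also zero. Substituting the second identity into the first gives the decomposition stated in the post.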

[Original] Stochastic Process: the News Vendor Problem

Introduction: Many scenarios can be described as stochastic processes, such as Go, DiDi, inventory, patient wards, and a portfolio of stocks. This semester, we are going to talk about Markov chains. To begin with, let’s look at a simple yet interesting pro…

2020-08-03 15:47:52 346

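[Original] Syntax ramblings: begin vs. cbegin (C++)

iterator: when we need to declare an iterator (taking vector as an example), there are two ways to do it:

    it = vec.begin();
    it = vec.cbegin();

What is the difference between them? In short, begin returns, depending on the situation, either a const_iterator (the value it points to cannot be modified) or an iterator (the value it points to can be modified), whereas cbegin always returns a const_iterator. For example:

    std::vector<int> vec;
    const std::…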

2020-07-27 11:28:47 657

[Original] Data Structure Lecture Note (Week 7, Lecture 20)

(Finally) The end of the course!

(2, 4) tree. Multi-way search tree: an ordered tree such that: each internal node has at least two children and stores d-1 key-element items $k_i, o_i$, where $d$ is the number of children; for a node with children…

2020-07-27 11:01:58 133

[Original] Data Structure Lecture Note (Week 7, Lecture 19)

Advanced ADT: BBST: AVL, red-black, B tree, B+ tree. Hashing: unordered dictionary. "In an interview, always ask: CAN I USE HASH?" In C++, a hash table is implemented as std::unordered_map. In Python, … is dict().

How to implement: Keys: an abstract object, we…

2020-07-20 14:02:21 130

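[Original] LC97: Interleaving String

Problem description: given three strings s1, s2, s3, verify whether s3 is formed by interleaving s1 and s2.

Example 1: input: s1 = "aabcc", s2 = "dbbca", s3 = "aadbbcbcac"; output: true

Example 2: input: s1 = "aabcc", s2 = "dbbca", s3 = "aadbbbaccc"; output: false

Approach: this problem can be solved with dynamic programming, with an idea similar to LCS (longest common subsequence). Let dp[i][j]…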

2020-07-18 09:48:42 395
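The definition of `dp[i][j]` is cut off in the excerpt above. The sketch below uses the common formulation, stated here as an assumption: `dp[i][j]` is true when the first `i + j` characters of `s3` can be formed by interleaving the first `i` characters of `s1` with the first `j` characters of `s2`.

```python
def is_interleave(s1: str, s2: str, s3: str) -> bool:
    """Return True if s3 is an interleaving of s1 and s2."""
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return False
    # dp[i][j]: can s3[:i+j] be formed from s1[:i] and s2[:j]?
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    for i in range(m + 1):
        for j in range(n + 1):
            # The last character of s3[:i+j] comes from s1 ...
            if i > 0 and dp[i - 1][j] and s1[i - 1] == s3[i + j - 1]:
                dp[i][j] = True
            # ... or from s2.
            if j > 0 and dp[i][j - 1] and s2[j - 1] == s3[i + j - 1]:
                dp[i][j] = True
    return dp[m][n]
```

On the two examples from the problem statement, the function returns `True` and `False` respectively.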

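[Original] Syntax ramblings: speeding up your input and output (C++)

Problem description: when reading and writing data line by line in C++, there are two options: cin & cout, or scanf & printf. When the problem size is small, there is not much difference between them. With a large amount of input and output, however, speed may be affected. Here is an answer from an expert on Stack Overflow: as can be seen, the scanf & printf combination is faster. He also mentions that cin and cout are slower because of some synchronization that has to be done, which amounts to quite a lot of wasted work. Although studying data input and output feels like a bit of an unorthodox pursuit (…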

2020-07-17 15:24:12 495

[Original] Data Structure Lecture Note (Week 6, Lecture 18)

Building blocks problem, cont'd. You are given N wooden blocks; each block i has a weight Wi and a strength Si. Please find a way to stack the blocks such that the strength of each block is larger than or equal to the sum of the weights above this bl…

2020-07-16 14:22:40 174

[Original] Data Structure Lecture Note (Week 6, Lecture 17)

Some DP problems

2020-07-14 14:42:45 153

[Original] Data Structure Lecture Note (Week 6, Lecture 16)

Reduction: vertex cover to hitting set. Hitting set problem: given a set O of objects and a collection C of subsets of O, decide whether there is a set of K objects from O such that for each c in C, at least one of the K objects is in c. E.g. O = {1,2,3,4,5}, C = {{1,2},…

2020-07-13 12:35:34 240
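The reduction named in the note above can be sketched as follows: the vertices become the objects O, and each edge becomes a two-element subset in C, so a hitting set of size K is exactly a vertex cover of size K. The function names are illustrative, and the brute-force checker is only there to try the reduction on tiny inputs; it is not part of the reduction and takes exponential time.

```python
from itertools import combinations


def vertex_cover_to_hitting_set(vertices, edges):
    """Map a vertex cover instance to a hitting set instance (O, C)."""
    objects = set(vertices)
    collection = [frozenset(edge) for edge in edges]  # each edge is one subset
    return objects, collection


def has_hitting_set(objects, collection, k):
    """Brute force: is there a set of k objects meeting every subset?"""
    for candidate in combinations(sorted(objects), k):
        chosen = set(candidate)
        if all(chosen & subset for subset in collection):
            return True
    return False
```

For the path graph 1-2-3-4, the reduced instance has no hitting set of size 1 but has one of size 2, namely {2, 3}, matching its minimum vertex cover.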
