Coding Habits and Optimization Intuition

gentledongyanchao

Article link: https://blog.csdn.net/gentledongyanchao/article/details/56835048

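It is like watching Olympic athletes play table tennis:

"top-level play" is a reference, not something to copy.

Some say you must follow a model player's strokes exactly, so that their shadow shows up in your own game.

I would rather borrow the other player's strengths to adjust my own strategy, and find my own shadow in theirs.

As a programmer who still has some ambition, I have formed a few opinions from day-to-day coding. They come down to two things, "coding habits" and "optimization intuition", and this article has one limitation, stated at the end.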

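I. The more variables you define, the easier the program is to write

A variable is essentially a region of memory that holds a value. In practice, using the right variable in the right place to store the right intermediate result has many benefits: it avoids repeated computation, makes the control flow clearer and easier to express, and is sometimes a hard requirement of the algorithm.

1. Take isPrime as an example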

#include <cmath>

bool isPrime(const int &x) {
	if (x < 2) return false;
	if (x == 2) return true;
	// Compute the loop bound once instead of calling sqrt on every iteration.
	int bound = (int)sqrt(1.0*x) + 1;
	for (int i(2); i <= bound/*(int)sqrt(1.0*x) + 1*/; ++i) if (x%i == 0) return false;
	return true;
}//isPrime

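Computing sqrt(1.0*x)+1 on every iteration of the for loop wastes time. Storing the value once in bound avoids the repeated computation.

Looking at the problem from another angle lets us avoid sqrt altogether: the test i * i <= x works as well. (For x close to INT_MAX the product i * i can overflow an int, so a wider type is safer there.)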

bool isPrime(const int &x) {

	......
	for (int i(2); i * i <= x; ++i)if (x%i == 0)return false;
	......
}//isPrime
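
The elided version above can be filled in roughly as follows. This is a minimal sketch, not the author's exact code: the name isPrimeNoSqrt is made up for illustration, and the loop counter is deliberately a 64-bit integer so that i * i cannot overflow for any 32-bit x.

```cpp
#include <cstdint>

// Trial division that tests i * i <= x instead of calling sqrt.
// isPrimeNoSqrt is a hypothetical name; the 64-bit counter is an assumption
// added here so the product i * i never overflows for a 32-bit x.
bool isPrimeNoSqrt(int x) {
    if (x < 2) return false;
    for (std::int64_t i = 2; i * i <= x; ++i) {
        if (x % i == 0) return false;  // found a divisor, so x is composite
    }
    return true;
}
```

With this loop condition, x == 2 needs no special case, because 2 * 2 <= 2 is already false and the loop body never runs.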

Continuing in this direction, i * i can be replaced by (int)pow(1.0*i, 2.0), and isPrime still works.

bool isPrime(const int &x) {
	......
	for (int i(2); (int)pow(1.0*i, 2.0) <= x; ++i)if (x%i == 0)return false;
	......
}//isPrime

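Finally, replace the library function pow with our own myPow. Here is general-purpose code for exponentiation by squaring (recursive version).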

long long BPRecur(long long base, const int &exp) {
	if (exp == 0)return 1;
	// Compute the half power once and reuse it.
	long long tmp(BPRecur(base, exp >> 1));
	tmp *= tmp;
	return (exp & 1) ? base*tmp : tmp;
}//BPRecur
long long myPow(const int &a, const int &b) {
	long long base(a);
	int exp(b);
	// Assume a and b are valid (b >= 0) and not both 0
	if (base == 0 && exp == 0)return -1;// crude handling of 0^0
	if (base == 0 || base == 1 || exp == 1)return base;
	long long tmp(BPRecur(base, exp >> 1));
	tmp *= tmp;
	return (exp & 1) ? base*tmp : tmp;
}//myPow
bool isPrime(const int &x) {
	if (x<2)return false;
	if (x == 2)return true;
	// myPow(i, 2) is i squared; compare as long long so the square cannot overflow int
	for (int i(2); myPow(i, 2) <= x; ++i) if (x%i == 0) return false;
	return true;
}//isPrime

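The core function BPRecur stores BPRecur(base, exp >> 1) in tmp and then multiplies tmp by itself. The two factors are identical, so there is no need to compute them twice.

2. Take LeetCode 78 as an example: finding all subsets of a set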

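//I have seen this approach many times.
//Like putting on clothes layer by layer; the single variable retVecs is enough.
//Caching sizeOfRetVecs is essential, because retVecs grows inside the inner loop.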
class Solution {
public:
	vector<vector<int>> subsets(const vector<int> &nums) {
		int sizeOfNums = (int)nums.size();
		vector<vector<int>> retVecs{ {} };
		for (int i(0); i < sizeOfNums; ++i) {
			int sizeOfRetVecs = (int)retVecs.size();
			for (int j(0); j < sizeOfRetVecs; ++j) {
				auto tmpVec(retVecs[j]);
				tmpVec.push_back(nums[i]);
				retVecs.push_back(tmpVec);
			}//for j
		}//for i
		return retVecs;
	}//subsets
};

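We give up a small piece of memory to store intermediate results and save computations that might otherwise be repeated, so I call this "trading land for peace".

Extension: in a DP algorithm, solving the current problem depends on several earlier subproblems whose results are already stored in an array, which avoids solving those subproblems again and again.

The hard part of DP is choosing the states and the transition equation; the coding itself is fairly simple. Most solutions allocate an array for intermediate results and gain a large speedup.

(The property exploited here is called overlapping subproblems. DP also relies on the no-aftereffect property and the principle of optimality. url: not written yet, placeholder.)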
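
To make the "store subproblem results in an array" idea concrete, here is a small sketch. The stair-climbing problem and the name countWays are illustrative choices, not something taken from the text above: each entry of the table is computed once from two earlier entries that are already stored.

```cpp
#include <vector>

// Counts the ways to climb n steps when each move covers 1 or 2 steps.
// countWays is a hypothetical example; ways[i] depends only on ways[i - 1]
// and ways[i - 2], both of which are already in the table when needed.
long long countWays(int n) {
    if (n < 0) return 0;
    std::vector<long long> ways(n + 2, 0);  // n + 2 so that ways[1] exists even for n == 0
    ways[0] = 1;
    ways[1] = 1;
    for (int i = 2; i <= n; ++i) {
        ways[i] = ways[i - 1] + ways[i - 2];  // transition equation
    }
    return ways[n];
}
```

A plain recursive version would recompute the same subproblems exponentially many times; the table brings the cost down to one addition per entry.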

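II. Seek no credit, only to avoid mistakes

Programming is a dangerous business. What matters most is that the logic is correct and the functionality actually works; one moment of carelessness and a bug appears. We would all like to start typing and, with consistent naming, a clear flow, and every case considered, finish in one go from start to end, the way Wang Bo wrote the "Preface to the Pavilion of Prince Teng" (《滕王阁序》): broad in vision, brimming with talent, the brush flying, the whole room astonished. Every programmer dreams of that state, but reality is harsh. Writing programs is like walking on thin ice: you must be cautious and advance step by step, and beyond the right mindset you also need solid fundamentals and plenty of debugging experience.

1. For example, one of the core APIs of a disjoint set (union-find)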

inline int findRoot(int x) {
		return parents[x] == x ? x : (parents[x] = findRoot(parents[x]));
}//findRoot

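It finds the root of the set containing x, that is, the index of the root of the rooted tree, and returns it. It also applies the "path compression" optimization.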

inline int findRoot(int x) {
		if (parents[x] == x)return x;// base case of the recursion
		int ret(-1);// root of the rooted tree
		ret = findRoot(parents[x]);// find the root recursively
		parents[x] = ret;// path compression
		return ret;// return the root
}//findRoot

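Expanded like this, the logic is clear, and the code is easy to understand, debug, and comment.

The conditional operator does make code shorter, though it rarely makes it faster, since compilers typically generate much the same code for both forms. Plain as it is, if else remains the simplest and most effective way to express branching.

2. Another example is LeetCode 100: deciding whether two binary trees are the same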

bool isSameTree(const TreeNode * const p,const TreeNode * const q){
	return (p==NULL&&q==NULL)||(p!=NULL&&q!=NULL&&p->val==q->val&&isSameTree(p->left,q->left)&&isSameTree(p->right,q->right));
}//isSameTree

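For recursive functions that return bool, it is tempting to build the whole body out of logical operators, especially in binary-tree or string problems.

For a newcomer, though, the following version is more expressive.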

bool isSameTree(TreeNode* p, TreeNode* q) {
	if (p == NULL&&q == NULL)return true;
	if (p == NULL&&q != NULL || p != NULL&&q == NULL)return false;
	if (p->val != q->val)return false;
	return isSameTree(p->left, q->left) && isSameTree(p->right, q->right);
}//isSameTree

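3. Prefer explicit casts

Many languages support implicit type conversion, for example in expression evaluation, when passing arguments to parameters, and in function return values.

These conversions are performed by the system and bypass the programmer. Since programming is a dangerous business, let the programmer take full responsibility for every conversion: use explicit casts instead of implicit conversions to avoid bizarre errors.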

for (unsigned i(0); i >= 0; --i) {
	printf("i=%d\n", i);
	//cout << "i=" << i << endl;
	system("pause");
}//for i

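The i at the top is unsigned, so i >= 0 is always true and --i at 0 wraps around to the largest unsigned value; the loop never ends.

If the type of i is left unchanged, change i >= 0 to (int)i >= 0 and observe how the program behaves.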

unsigned k(0);
long long times(0);
for (int i(0); i < k - 1; ++i) {
	//printf("i=%u\n", i);
	++times;
}//for i
//printf("times=%lld\n",times);

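This example comes from the problem "the k-th node from the end of a linked list" in 《剑指offer》.

i is an int, but in i < k - 1 it is implicitly converted to unsigned, and with k equal to 0 the right-hand side k - 1 is exactly the largest unsigned value.

So times comes out as 2^32-1 (on a typical platform with 32-bit int and unsigned).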
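
One way to take the conversion into your own hands is sketched below. The function name countSteps and the choice of a 64-bit signed type are assumptions made for this illustration; the point is only that the unsigned value is converted explicitly before the subtraction, so k == 0 yields -1 instead of the largest unsigned value.

```cpp
// Counts how many times a loop of the form `for (i = 0; i < k - 1; ++i)` runs.
// countSteps is a hypothetical helper; k is cast to a signed 64-bit value
// before subtracting, so the bound for k == 0 is -1 rather than UINT_MAX.
long long countSteps(unsigned k) {
    long long limit = static_cast<long long>(k) - 1;
    long long times = 0;
    for (long long i = 0; i < limit; ++i) {
        ++times;
    }
    return times;
}
```

With the explicit cast, both sides of the comparison are signed, and no implicit conversion takes place inside the loop condition.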

4. Parentheses are a safety measure

int tmpI(4);
if (tmpI & 1 == 0) puts("a blessing, not a curse");
else puts("a curse cannot be dodged");

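tmpI & 1 uses a bit operation in place of tmpI % 2 to test whether a number is odd or even.

In tmpI & 1 == 0, the comparison 1 == 0 is evaluated first because == binds more tightly than &. It yields false, which is 0 in C/C++, and tmpI & 0 is always 0, so the else branch always runs.

When precedence is unclear, group the expression with your own parentheses. Typing a few extra pairs of parentheses is hardly a foolish thing to do.

Macros with parameters are a similar case.

#define ISLEAP(x) ((((x)%100!=0)&&((x)%4==0))||((x)%400==0))

We cannot trust what will be passed in, so to be safe, parenthesize everything. It may look clumsy, but at least it will not produce errors.

Extension: defensive programming is a matter of "do not use those you doubt, do not doubt those you use". A program should have some fault tolerance, keeping bad input out and letting trusted input in. The data is never wrong; we are.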
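
The two cases in this subsection can be put side by side in a short sketch. The names isEven, IS_LEAP, and isLeapYear are chosen here for illustration and are not from the original code; the macro wraps every use of its argument and the whole expansion in parentheses.

```cpp
// Parenthesized parity test: (value & 1) is evaluated before the comparison.
// isEven is a hypothetical helper name.
bool isEven(int value) {
    return (value & 1) == 0;
}

// Every use of x and the whole expansion are parenthesized, so the macro
// stays correct for arguments such as `year + 1` and inside `!IS_LEAP(...)`.
#define IS_LEAP(x) ((((x) % 4 == 0) && ((x) % 100 != 0)) || ((x) % 400 == 0))

// Thin function wrapper around the macro (also a hypothetical name).
bool isLeapYear(int year) {
    return IS_LEAP(year);
}
```

Without the inner parentheses, an argument like `1999 + 1` would expand to `1999 + 1 % 4`, which the compiler reads as `1999 + (1 % 4)`.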

5. Functions, parameters, variables: lock down everything that can be locked down.

bool operator<(const struct node &x)const;

void print(const vector<int> &nums);

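The first overloads the less-than operator, returns bool, and is a const member function.

Print functions are generally read-only and should not affect the data passed in.

If a function is read-only, or a variable should not be modified, leave no room for anyone to imagine otherwise.

Such refinements may well be optional, and adding them is perhaps just for peace of mind. But shouldn't programming aim for stability first?

Seeking no credit, only to avoid mistakes, is also a philosophy of life.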
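
As a small illustration of what the const qualifiers buy, consider the sketch below. Node and sumOf are hypothetical stand-ins for the struct node and print declarations above; the compiler rejects any attempt to modify the operands of operator< or the vector passed to sumOf.

```cpp
#include <vector>

// Hypothetical stand-in for `struct node` in the declarations above.
struct Node {
    int key;
    // const member function: comparing two nodes must not modify either one.
    bool operator<(const Node &other) const {
        return key < other.key;
    }
};

// Read-only parameter: the function can inspect nums but cannot change it.
// sumOf is an illustrative replacement for a print-style function.
long long sumOf(const std::vector<int> &nums) {
    long long total = 0;
    for (int value : nums) total += value;
    return total;
}
```

Because operator< is a const member, it can be called on const objects too, which is what standard containers and algorithms generally expect.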

Extension: the design of a Java abstract class

package cn.edu.zju.ccnt.PizzaTestDrive;

public abstract class PizzaStore {
	public final Pizza orderPizza(String type){
		Pizza pizza = null;
		pizza = createPizza(type);
		pizza.prepare();
		pizza.bake();
		pizza.cut();
		pizza.box();
		return pizza;
	}//orderPizza
	protected abstract Pizza createPizza(String type);
}//PizzaStore

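The PizzaStore class comes from the factory pattern chapter of 《HeadFirst》; it is both ingenious and safe.

PizzaStore exists only to be inherited, so it is marked abstract, which guarantees it cannot be instantiated.

orderPizza is the method that "rules them all"; the final modifier forbids subclasses from overriding it.

createPizza is the heart of the factory method pattern and must be left for subclasses to override, so it is declared abstract.

Designed this way, every avoidable risk is avoided and every useful hint is given.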
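
For readers coming from C++, roughly the same design can be sketched as follows. This is an analogy under stated assumptions, not a translation of the book's code: the Pizza struct that records its steps and the SimplePizzaStore subclass are invented here so the example is self-contained.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical Pizza type that simply records the steps applied to it.
struct Pizza {
    std::string type;
    std::vector<std::string> steps;
};

class PizzaStore {
public:
    virtual ~PizzaStore() = default;
    // Non-virtual, so derived classes cannot override the fixed workflow
    // (the role played by `final` in the Java version).
    std::unique_ptr<Pizza> orderPizza(const std::string &type) {
        std::unique_ptr<Pizza> pizza = createPizza(type);
        for (const char *step : {"prepare", "bake", "cut", "box"}) {
            pizza->steps.push_back(step);
        }
        return pizza;
    }
protected:
    // Pure virtual factory method: every concrete store must supply it,
    // and PizzaStore itself cannot be instantiated.
    virtual std::unique_ptr<Pizza> createPizza(const std::string &type) = 0;
};

// Hypothetical concrete store used only to exercise the base class.
class SimplePizzaStore : public PizzaStore {
protected:
    std::unique_ptr<Pizza> createPizza(const std::string &type) override {
        std::unique_ptr<Pizza> pizza(new Pizza());
        pizza->type = type;
        return pizza;
    }
};
```

The pure virtual function plays the part of Java's abstract method, and keeping orderPizza non-virtual fixes the order of the steps for every subclass.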

III. Adding a little fun to boring programming

1. Naming conventions: camel case, underscore-separated, Hungarian notation, Pascal case.

tmpVec,tmpI,tmpStr, dummyHead

retVec,retVecs,retStr,retStrs

routine_backup, node_lamb

minCost,allCost,marks,parents

isPrime,myPow

smallYellowCar,bigBeautifulGirl

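I used Hungarian notation when writing MFC. Pascal case I rarely use for function names and mostly for class names, especially Java classes.

The inspiration comes from English grammar: adjectives go in front as modifiers, adverbs and adverbials follow as modifiers, and nouns are chained into a short string.

Widely accepted abbreviations: temp->tmp, count->cnt, number->num, increment->inc.

2. Small habits etched into the bones.

Never miss an optimizable statement.

The iterative version of exponentiation by squaring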

long long myPow(const int &a, const int &b) {
	long long ret(1);
	long long base(a);
	int exp(b);
	// Inputs are expected to be valid; no defensive checks are done here
	if (base == 0 || base == 1 || exp == 1)return base;
	while (exp) {
		ret *= (exp & 1) ? base : 1;
		base *= base;
		exp >>= 1;
	}//while
	return ret;
}//myPow

Optimize whatever can be optimized:

exp%2==1--->(exp&1)==1

exp = exp/2--->exp>>=1

base = base*base--->base *=base

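This is not algorithmic optimization intuition; it is a coding habit to keep in mind with every keystroke.

Never miss an initialization.

int tmp(-1); ListNode* head(NULL);

Although static variables in C++ and member variables of Java classes have default values, explicit initialization never causes misunderstanding, and it shows a rigorous attitude.

Never miss a return.

void print(){
	//do something
	return;
}//print

Even a void function deserves an explicit return.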

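3. Think more, vary more, encapsulate more, optimize more

The classic code for level-order traversal of a binary tree uses a queue as the auxiliary data structure. For someone who grew up in a C environment, however, levelOrder looks like this.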

void levelOrder(TreeNode *root) {
	TreeNode *myQueue[100000];
	int front(0), back(0);
	if (root)myQueue[front] = root, ++back;
	//int nextLine(0);
	while (front < back) {
		TreeNode *cur = myQueue[front];
		printf("%d ", cur->val);
		//if (front == nextLine)printf("\n");
		++front;
		if (cur->lch)myQueue[back++] = cur->lch;
		if (cur->rch)myQueue[back++] = cur->rch;
		//if (front - 1 == nextLine)nextLine = back - 1;
	}//while
	return;
}//levelOrder

Later, with the queue from the STL, levelOrder became this:

void levelOrder(TreeNode *root) {
	queue<pair<TreeNode*, int>> que;
	if (root)que.push(make_pair(root, 0));
	//int preLevel(0);
	while (que.empty() == false) {
		auto cur = que.front();
		que.pop();
		//if (cur.second > preLevel) preLevel = cur.second, printf("\n");
		printf("%d ",cur.first->val);
		if (cur.first->left)que.push(make_pair(cur.first->left, cur.second + 1));
		if (cur.first->right)que.push(make_pair(cur.first->right, cur.second + 1));
	}//while
	return;
}//levelOrder

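The former exposes every queue operation through front and back.

The latter encapsulates those operations inside queue, which is surely a step forward.

An encapsulated queue is simpler to operate, clearer in intent, and easier to maintain, and it lets the main business logic stand out.

For example, compare the commented-out code in the two versions; both implement the line break between levels.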
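
Another common way to separate the levels, shown here as a sketch rather than the author's approach, is to cache the queue size at the start of each level, much like sizeOfRetVecs in the subsets example. The TreeNode layout is assumed to be the LeetCode-style one, and levelOrderByLevel is a name invented for this example.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Assumed node layout (LeetCode style); not defined in the original text.
struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    explicit TreeNode(int v) : val(v), left(NULL), right(NULL) {}
};

// Returns the node values grouped by level, top to bottom.
std::vector<std::vector<int>> levelOrderByLevel(TreeNode *root) {
    std::vector<std::vector<int>> levels;
    std::queue<TreeNode *> que;
    if (root) que.push(root);
    while (!que.empty()) {
        // Cache the size: the queue grows while this level is being drained.
        std::size_t width = que.size();
        std::vector<int> level;
        for (std::size_t k = 0; k < width; ++k) {
            TreeNode *cur = que.front();
            que.pop();
            level.push_back(cur->val);
            if (cur->left) que.push(cur->left);
            if (cur->right) que.push(cur->right);
        }
        levels.push_back(level);
    }
    return levels;
}
```

Each pass of the outer loop handles exactly one level, so no per-node level number has to be stored in the queue.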

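Think a bit further: is there a recursive version of level-order traversal?

//levelOrder done with recursion,
//without a queue.
//
//The three statements in buildVec that follow the size check can be in any order and each node still lands in the right level.
//As arranged now the order is DLR.
//It could also be DRL, RDL, and so on (visiting right before left reverses the order within each level).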
class Solution {
public:
	vector<vector<int>> levelOrder(TreeNode *root) {
		vector<vector<int>> vecs;
		buildVec(root, 0, vecs);
		return vecs;
	}//levelOrder
private:
	void buildVec(TreeNode *root, int level, vector<vector<int>> &vecs) {
		if (root == NULL)return;
		if ((int)vecs.size() <= level)vecs.push_back(vector<int>{});
		vecs[level].push_back(root->val);
		buildVec(root->left, level + 1, vecs);
		buildVec(root->right, level + 1, vecs);
		return;
	}//buildVec
};

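Think again: if the auxiliary data structure queue is replaced by stack, what kind of traversal results?

Answer: a DLR (preorder) traversal. See the LeetCode 111 setting below.

//Traversal based on levelOrder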
class Solution {
public:
	int minDepth(TreeNode *root) {
		queue<pair<TreeNode*, int>> que;
		if (root)que.push(make_pair(root, 0));
		int ret(-1);
		while (que.empty() == false) {
			auto cur = que.front();
			que.pop();
			if (cur.first->left == NULL&&cur.first->right == NULL) if (ret == -1 || cur.second < ret)ret = cur.second;
			if (cur.first->left)que.push(make_pair(cur.first->left, cur.second + 1));
			if (cur.first->right)que.push(make_pair(cur.first->right, cur.second + 1));
		}//while
		return ret + 1;
	}//minDepth
};

//Swap queue for stack: a depth-first traversal.
class Solution {
public:
	int minDepth(TreeNode *root) {
		stack<pair<TreeNode*, int>> stk;
		if (root)stk.push(make_pair(root, 0));
		int ret(-1);
		while (stk.empty() == false) {
			auto cur = stk.top();
			stk.pop();
			if (cur.first->left == NULL&&cur.first->right == NULL) if (ret == -1 || cur.second < ret)ret = cur.second;
			if (cur.first->right)stk.push(make_pair(cur.first->right, cur.second + 1));

			if (cur.first->left)stk.push(make_pair(cur.first->left, cur.second + 1));
		}//while
		return ret + 1;
	}//minDepth
};

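The inspiration comes from the recursive and non-recursive implementations of DLR, LDR, and LRD on binary trees, url (not written yet, another placeholder).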
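
To see that the stack version really visits nodes in DLR order, the visiting sequence can be recorded directly. The sketch below assumes the LeetCode-style TreeNode layout; preorderWithStack is a name made up for this illustration.

```cpp
#include <cstddef>
#include <stack>
#include <vector>

// Assumed node layout (LeetCode style); not defined in the original text.
struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    explicit TreeNode(int v) : val(v), left(NULL), right(NULL) {}
};

// Records the order in which a stack-driven traversal visits the nodes.
std::vector<int> preorderWithStack(TreeNode *root) {
    std::vector<int> order;
    std::stack<TreeNode *> stk;
    if (root) stk.push(root);
    while (!stk.empty()) {
        TreeNode *cur = stk.top();
        stk.pop();
        order.push_back(cur->val);              // D: visit the node itself
        if (cur->right) stk.push(cur->right);   // pushed first, popped last
        if (cur->left) stk.push(cur->left);     // L comes off the stack before R
    }
    return order;
}
```

Pushing the right child before the left child is what makes the left subtree come off the stack first.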

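4. Optimization intuition is mainly intuition about time and space.

As a pursuit it takes patience, experience, and accumulation; it is also a way to add a little fun to boring programming when you have nothing better to do.

The background is problem top1001; for details see http://blog.csdn.net/gentledongyanchao/article/details/56047650

The quick-find version of disjointSet

//quick-find
//findRoot is O(1); it returns the set id
//isOneSet checks whether xRoot and yRoot are the same set.
//unionSet is O(n); it merges xRoot into yRoot
//getDates returns the number of elements in the set, which is why everything is initialized to 1. dates has other uses too.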
class disjointSet {
private:
	vector<int> parents, dates;
	int cnt, size;
public:
	disjointSet(int size) {
		this->size = size;
		cnt = size;
		parents.resize(size);
		for (int i(0); i < size; ++i)parents[i] = i;
		dates.assign(size, 1);
		return;
	}//disjointSet
	inline int findRoot(int x) {
		return parents[x];
	}//findRoot
	inline int getCount() {
		return cnt;
	}//getCount
	inline bool isOneSet(int xRoot, int yRoot) {
		return xRoot == yRoot;
	}//inOneSet
	void unionSet(int xRoot, int yRoot) {
		for (int i(0); i < this->size; ++i) {
			if (parents[i] != xRoot)continue;
			parents[i] = yRoot;
		}//for i
		dates[yRoot] += dates[xRoot];
		--cnt;
		return;
	}//unionSet
	inline int getDates(int x) {
		return dates[x];
	}//getDates
};

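Adjusting the strategies of findRoot and unionSet gives a version that leans toward quick-union and applies more widely.

//quick-union
//findRoot is amortized nearly O(1); it uses path compression, returns the set id, and is implemented recursively
//isOneSet checks whether xRoot and yRoot are the same set.
//unionSet is O(1), optimized with ranks; it merges the two roots xRoot and yRoot
//getDates returns the number of elements in the set, which is why everything is initialized to 1. dates has other uses too.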
class disjointSet {
private:
	vector<int> parents, ranks, dates;
	int cnt;
public:
	disjointSet(int size) {
		cnt = size;
		parents.resize(size);
		for (int i(0); i < size; ++i)parents[i] = i;
		ranks.assign(size, 0);
		dates.assign(size, 1);
		return;
	}//disjointSet
	inline int findRoot(int x) {
		return parents[x] == x ? x : (parents[x] = findRoot(parents[x]));
	}//findRoot
	inline int getCount() {
		return cnt;
	}//getCount
	inline bool isOneSet(int xRoot, int yRoot) {
		return xRoot == yRoot;
	}//inOneSet
	void unionSet(int xRoot, int yRoot) {
		if (ranks[xRoot] == ranks[yRoot])++ranks[yRoot];
		if (ranks[xRoot] < ranks[yRoot])parents[xRoot] = yRoot, dates[yRoot] += dates[xRoot];
		else parents[yRoot] = xRoot, dates[xRoot] += dates[yRoot];
		--cnt;
		return;
	}//unionSet
	inline int getDates(int x) {
		return dates[x];
	}//getDates
};

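The findRoot in quick-union is recursive, which may overflow the call stack.

It can also be implemented with while.

The version below is the most intuitive and easiest to understand.
tmpVec stores the nodes along the path, and then every node in tmpVec is pointed at the root x.

But findRoot is called very frequently, so tmpVec is created over and over. The vector object itself sits on the stack, but its elements are allocated on the heap, so the allocation cost is real.

int findRoot(int x) {// after submission, the last test case timed out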
	vector<int> tmpVec;
	while (x != parents[x]) {
		tmpVec.push_back(x);
		x = parents[x];
	}//while
	for (auto tmp : tmpVec)parents[tmp] = x;
	return x;
}//findRoot

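So I switched to the strategy of setting the parent of node x to its grandparent.
This compresses less aggressively, but overall it works quite well.

int findRoot(int x) {// this fixed the timeout
	while (x != parents[x]) {
		// This line is also a form of path compression: it sets the parent of x to its grandparent.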
		parents[x] = parents[parents[x]];
		x = parents[x];
	}//while
	return x;
}//findRoot

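The ranks optimization is less effective than the path compression in findRoot.
There are many similar optimizations. For example, with dates initialized to all 1 so that each value is the number of elements in the set, one can use
if(dates[xRoot]<=dates[yRoot])parents[xRoot]=yRoot;
else parents[yRoot]=xRoot;
This also balances the tree, making the "small tree" attach to the "big tree" wherever possible.

As for what goes into dates: initialized to all 1 here, dates means the number of elements in each set.
Put each city's population in it and define the relation as "belongs to the same province", and dates means each province's total population.
Put each city's oil reserves in it and define the relation as "belongs to the same province", and dates means each province's total oil.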
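
The ideas in this part (grandparent path compression, attaching the small tree to the big one, and letting the per-set data carry something like a population) can be combined in one small class. This is a sketch under assumptions, not the author's submitted code: the class name WeightedDisjointSet and the members sizes and totals are invented here, with totals playing the role that dates plays above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical disjoint set that tracks both element counts and a summed weight.
class WeightedDisjointSet {
public:
    // weights[i] is the value carried by element i, e.g. a city's population.
    explicit WeightedDisjointSet(const std::vector<long long> &weights)
        : parents(weights.size()), sizes(weights.size(), 1), totals(weights) {
        for (std::size_t i = 0; i < parents.size(); ++i) {
            parents[i] = static_cast<int>(i);
        }
    }
    int findRoot(int x) {
        while (x != parents[x]) {
            parents[x] = parents[parents[x]];  // point x at its grandparent
            x = parents[x];
        }
        return x;
    }
    void unionSet(int x, int y) {
        int xRoot = findRoot(x), yRoot = findRoot(y);
        if (xRoot == yRoot) return;  // already in the same set
        // Make xRoot the smaller tree so it is attached under the larger one.
        if (sizes[xRoot] > sizes[yRoot]) { int t = xRoot; xRoot = yRoot; yRoot = t; }
        parents[xRoot] = yRoot;
        sizes[yRoot] += sizes[xRoot];
        totals[yRoot] += totals[xRoot];
    }
    long long totalOf(int x) { return totals[findRoot(x)]; }
    int sizeOf(int x) { return sizes[findRoot(x)]; }
private:
    std::vector<int> parents;
    std::vector<int> sizes;
    std::vector<long long> totals;
};
```

Unlike the classes above, unionSet here accepts any two elements and looks up their roots itself, which removes the need for the caller to check isOneSet first.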

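Just because I am not a world champion does not mean my way of playing table tennis is worthless.

Limitation: all of this practice comes from strongly typed, statically compiled languages such as C/C++ and Java.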

Slowly but surely, we’ll become something else, something better.

Gentle Dong, Fourth Version,20170226