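All of the source code has been pushed to my GitHub.
Main architecture
Building the cache object
Why use a cache object?
For small amounts of reads and writes, leaving out the cache does not really hurt efficiency. The extreme case is different: when the same table is inserted into, updated, and queried many times in a row, a cache improves efficiency enormously.
Deletion should have been handled the same way, but because erasing from a vector is expensive, we ended up implementing deletion with flag bits. The consequence is that once some elements have been deleted, the element count no longer matches the number of elements in the vector. Left unhandled, that mismatch puts the cache out of step with the file, and reading eventually ends in an endless loop on the input buffer and a crash.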
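To make the count mismatch concrete, here is a minimal sketch of flag-based deletion on a single column. The type `TombstoneColumn` and its members are hypothetical names for illustration and are not part of the project's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single-column store; names are illustrative only.
struct TombstoneColumn {
    std::vector<std::string> values;  // physical slots; never shrinks on delete
    std::vector<bool> deleted;        // deleted[i] marks values[i] as removed
    int live_count = 0;               // logical row count (what a file header would record)

    void Add(const std::string& v) {
        values.push_back(v);
        deleted.push_back(false);     // keep both vectors the same length
        ++live_count;
    }

    void Erase(std::size_t id) {
        // Ignore out-of-range ids and repeated deletes of the same slot.
        if (id >= deleted.size() || deleted[id]) return;
        deleted[id] = true;
        --live_count;
    }
};
```

After one successful `Erase`, `live_count` is one less than `values.size()`, so any loop that uses one number as the bound for the other has to account for the gap.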
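Column parameter table: ordered pairs and virtual functions
We already used this column parameter table object earlier, to pass along the parameters produced when a new table is created:
Its main member is a vector of pairs that stores each column's name and the length of the column's values.
A length of 0 marks a numeric column, whose maximum length already defaults to 10.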
class ColInfo
{
protected:
int col_num_;
public:
//ordered pairs: first is the column name; second is the column length (0 indicates a numeric column)
std::vector<std::pair<string, int>> col_info_;
ColInfo() : col_num_(), col_info_() { }
void AddCol(string, int, bool mode = 1);
int col_num() const { return col_num_; }
int col_len(int col) const { return col_info_[col].second; }
int FindCol(const string&); //O(n)
virtual void SetColNum(int col_num) { col_num_ = col_num; col_info_.resize(col_num);}
virtual void PrintInfo() const;
};
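When I first used this class it did not have this many member functions; I kept adding them as I went, which may have burdened table creation with some wasted efficiency.
Still, although this growth is wasteful for the earlier use, it made later work much more convenient. Perhaps parameter passing deserves a separate struct of its own.
Cache object: inheritance and friend functions
Although this lab leans toward data structures, I still used inheritance:
The cache struct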
class Table : public ColInfo
{
private:
int elem_num_;
std::vector<bool> isDelete;
std::vector<std::vector<std::pair<string,int>>> elem_info_; // used to store the data temporarily. the ordered 'map' guarantee the finding cost is O(n): a traversal cost. first dimension is column, second is id within vector, key is elem, value is index.
public:
Table() : ColInfo(), elem_num_(0) { }
int elem_num() const { return elem_num_; }
inline void AddElem(int, int, string);
std::pair<string, int> GetElem(int col, int row)const { return elem_info_[col][row]; }
void SetElem(int col, int row, string newval) { elem_info_[col][row].first = newval; }
void SetElemNum(const int elem_num);
void SetColNum(const int col_num );
void EraseElem(const int& id);
void ClearError(int, int);
void InitRead();
void PrintInfo() const;
friend void Insert(string, std::vector<string>);
friend void Select(string, std::vector<string>, Clause);
friend void Update(string table_name, string col_name, string newvalue, Clause);
friend void Delete(string table_name, Clause);
#ifdef _TEST5_
friend void WATCH();
#endif
};
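Friend functions
- The friend functions here were purely an emergency measure during debugging. The whole program uses only one cache object, so the result is almost the same with or without friends, which is why I never went back to tighten the encapsulation.
- To some extent, this struct exists only to group a family of functions that act on the same object, which simply looks tidier.
Function implementations
Column parameter structure ColInfo
Three functions are still unimplemented:
Adding a column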
void ColInfo::AddCol(string s, int i, bool mode)
{
#ifdef _LOC_
std::clog << "a col was added" << endl;
#endif
col_info_.push_back(make_pair(s, i));
if (mode)
col_num_++;
}
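This function uses the bool variable mode to distinguish an initial addition from use as a helper while reading a file. As noted earlier, it may be worth spending a little more code to build a separate struct for the column parameters in sql_itp.
The function that looks up a column name:
On failure it returns -1 rather than an end position. That goes against STL convention, but it makes the checks at the call sites easier to write (smirk).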
int ColInfo::FindCol(const string & name)
{
// scan the inherited col_info_ directly; the stray iterator over the global cache was unused
for (auto it = col_info_.begin(); it != col_info_.end(); it++)
{
if ((*it).first == name)
return it - col_info_.begin();
}
error("col not found");
return -1;
}
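The same lookup can also be written with a standard algorithm. The sketch below is a free function over a plain column list, not the member function above; `ColumnList` and `FindColumnIndex` are hypothetical names, and it keeps the "return -1 when missing" convention.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using ColumnList = std::vector<std::pair<std::string, int>>;  // (name, length)

// Returns the index of the column called `name`, or -1 if there is none.
int FindColumnIndex(const ColumnList& cols, const std::string& name) {
    auto it = std::find_if(cols.begin(), cols.end(),
                           [&name](const std::pair<std::string, int>& c) {
                               return c.first == name;
                           });
    if (it == cols.end()) return -1;           // translate "end" into the -1 convention
    return static_cast<int>(it - cols.begin());
}
```

The cost is still O(n) in the number of columns; only the loop bookkeeping moves into `std::find_if`.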
Printing the column parameters
Nothing much to say here.
void ColInfo::PrintInfo() const
{
#ifdef _LOC_
std::clog << "ColInfo::PrintInfo() called. column info was put once" << endl;
#endif
cout << col_num_ << ' ' << 0 << endl; // the second 0 indicates the num of elem (when table is created)
for (auto it : col_info_)
cout << it.first << ' ' << it.second << endl;
}
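The cache table object Table
What remains is the function that adds an element:
- The logging here is a model for later debugging. It shows exactly what happened where and when, as discussed earlier, and makes each change visible, which matters a great deal for insert, delete, update, and select.
- A vector can go out of bounds at any moment, so add elements with push_back, and make sure the flag vector isDelete, which was added later, grows by the same number of slots.
This out-of-bounds error is thoroughly nasty, although by now I know that whenever execution jumps to that particular line, this is the cause. I will add that line here when I have time.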
void Table::AddElem(int col_no, int index_num, string elem)
{
#ifdef _LOC_
std::clog << "vector[" << col_no << "] was added with elem. now elem_num is " << index_num + 1 << endl;
#endif
isDelete.push_back(false);
elem_info_[col_no].push_back(make_pair(elem, index_num));
#ifdef _TEST5_
std::clog << "vector[" << col_no << "] was added with " << elem << ". now elem_num is " << index_num + 1 << endl;
#endif
}
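Setting the number of elements.
- This should have been simple, but with a two-dimensional vector, out-of-bounds errors make you jumpy about everything. It turns out that a resize is all that is needed.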
void Table::SetElemNum(const int elem_num)
{
elem_num_ = elem_num;
for (auto& it : elem_info_) // take a reference: iterating by value would only resize a copy
it.reserve(elem_num*2), it.resize(elem_num);
}
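One pitfall with nested vectors is worth a small illustration: a range-based for loop declared with plain `auto` copies each inner vector, so resizing inside the loop changes only the copy. The sketch below contrasts the two forms; `Grid`, `ResizeByValue`, and `ResizeByReference` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Grid = std::vector<std::vector<std::string>>;

// Resizes copies only: each `col` is a fresh copy of an inner vector,
// so the grid itself is left unchanged.
void ResizeByValue(Grid& g, std::size_t n) {
    for (auto col : g) col.resize(n);
}

// Resizes the inner vectors themselves, because `col` is a reference.
void ResizeByReference(Grid& g, std::size_t n) {
    for (auto& col : g) col.resize(n);
}
```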
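Setting the number of columns works the same way. Besides calling the column parameter table's function (which resizes the inherited column-name structure), remember to resize the two-dimensional array as well.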
void Table::SetColNum(const int col_num)
{
#ifdef _LOC_
std::clog << "column num was set as "<< col_num << endl;
#endif
ColInfo::SetColNum(col_num);
elem_info_.resize(col_num);
}
Deleting an element
Guard against deleting the same element more than once (although in principle this can never happen).
void Table::EraseElem(const int& id)
{
#ifdef _TEST5_
std::clog << "the "<< id << "th elem was erased" << endl;
#endif
if (isDelete[id]) return;
elem_num_--;
isDelete[id] = true;
}
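Next comes the more complicated initial read function.
Break is a function that flags incomplete input. The program has no mechanism for repairing a damaged file.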
void Table::InitRead()
{
// for (auto it: elem_info_)
// while (!it.empty())
// it.pop_back();
#ifdef _LOC_
std::clog << "the table was initialized" << endl;
#endif
cin >> col_num_ >> elem_num_;
Break();
string tmp_name; int tmp_length; string tmp_elem;
SetColNum(col_num_);
isDelete.resize(elem_num_ * 2);
for (int i = 0; i < col_num_; i++)
{
cin >> tmp_name >> tmp_length;
#ifdef _TEST5_
std::clog << cin.eof() << " " << cin.bad() << " " << cin.fail() << std::endl;
#endif
Break();
col_info_[i].first = tmp_name;
col_info_[i].second = tmp_length;
elem_info_[i].resize(elem_num());
for (int j = 0; j < elem_num_; j++)
{
cin >> tmp_elem;
Break();
elem_info_[i][j].first = tmp_elem;
elem_info_[i][j].second = j;
}
}
}
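As an alternative to reading through a redirected cin, the same file layout can be parsed from any input stream, committing the result only if the whole file is intact. This is a sketch under assumptions: `ParsedTable` and `ReadTable` are hypothetical names, and the layout is taken from the output functions in this post ("col_num elem_num", then one "name length v1 v2 ..." group per column).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one table file.
struct ParsedTable {
    std::vector<std::string> names;
    std::vector<int> lengths;
    std::vector<std::vector<std::string>> cells;  // cells[col][row]
};

// Returns false, leaving `out` untouched, if the input ends early or is malformed.
bool ReadTable(std::istream& in, ParsedTable& out) {
    int col_num = 0, elem_num = 0;
    if (!(in >> col_num >> elem_num) || col_num < 0 || elem_num < 0) return false;

    ParsedTable tmp;
    for (int c = 0; c < col_num; ++c) {
        std::string name;
        int length = 0;
        if (!(in >> name >> length)) return false;
        std::vector<std::string> column(elem_num);
        for (int r = 0; r < elem_num; ++r)
            if (!(in >> column[r])) return false;
        tmp.names.push_back(name);
        tmp.lengths.push_back(length);
        tmp.cells.push_back(column);
    }
    out = tmp;  // commit only after the whole input parsed cleanly
    return true;
}
```

Because nothing is written into `out` until the end, a truncated file needs no separate cleanup step. A caller could pass a `std::ifstream` for a real file or a `std::istringstream` in a test.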
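If the data turns out to be corrupted during this initial read, the following function is also called to clear whatever was read before the error, so the program can get back on track.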
void Table::ClearError(int row_num, int err_col)
{
for (int i = 0; i < err_col; i++)
elem_info_[i].erase(elem_info_[i].begin()+row_num);
}
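Then the output function, whose reliability ensures that changes in the cache reach the file promptly.
The conditional-compilation macros, vector bounds, and proper logging involved have all been covered already.
The only thing to consider here is output after a deletion: because the object's element count and the number of elements in the vector differ, we have to think about the loop's upper bound.
After weighing many cases along the way, the simplest answer was a branch that adjusts the output count dynamically.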
void Table::PrintInfo() const
{
#ifdef _LOC_
std::clog << "table print once" << endl;
#endif
cout << col_num_ << ' ' << elem_num_ << endl;
for (int i = 0; i < col_num_; i++)
{
cout << col_info_[i].first << ' ' << col_info_[i].second << ' ';
#ifdef _LOC_
std::clog << "elem num is " << elem_num_ << " and before deleting: "<< elem_info_[0].size() << endl;
#endif
// int tmp = elem_num() < elem_info_[0].size() ? elem_num() : elem_info_[0].size();
// after deleting, use the elem_info[0], if insert is haulting, use elem_num; vector size is quite dangerous, use elem_num is somehow
int tmp = elem_num();
#ifdef _TEST5_//the range is very strange, so we spent lots of time on it.
std::clog << "print info: " << tmp << endl;
for (int k = 0; k < isDelete.size(); k++)
if (isDelete[k])
std::clog << "isDelete[" << k << "] is true; " << endl;
#endif
for (int j = 0; j < tmp; j++)
if (!isDelete[j])
{
cout << elem_info_[i][j].first << ' ';
#ifdef _TEST5_
std::clog << "print info: " << tmp << endl;
#endif
}
// else if (j == 0) tmp++;
else tmp++;
cout << endl;
}
}
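A different way to handle the upper bound, shown here only as a sketch and not as what the program does, is to loop over the physical size of the column and skip flagged slots, so the bound never has to move. `WriteLiveValues` is a hypothetical helper that writes to any output stream.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one column's surviving values, each followed by a space, then a newline.
// The loop bound is the physical size, so deleted slots are simply skipped.
void WriteLiveValues(std::ostream& out,
                     const std::vector<std::string>& values,
                     const std::vector<bool>& deleted) {
    for (std::size_t j = 0; j < values.size(); ++j) {
        // A slot without a flag entry is treated as not deleted.
        bool gone = j < deleted.size() && deleted[j];
        if (!gone) out << values[j] << ' ';
    }
    out << '\n';
}
```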
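Also worth noting is a function that exists only under conditional compilation.
It gives us both things we want from debugging: visible state to watch, and full running speed (because there are no breakpoints).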
#ifdef _TEST5_
void WATCH()
{
for (int i = 0; i < cache.elem_info_.size(); i++)
{
for(int j = 0; j < cache.elem_info_[i].size(); j++)
std::clog << cache.elem_info_[i][j].first << ' ';
std::clog << std::endl; // keep the whole dump on clog; cout may be redirected to a table file
}
}
#endif
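Create functions
- String operations let us pin down the file name reliably; we will use this technique again later.
- On disk, a folder serves as a database and a file as a table. A check for duplicate table names was added in a later version.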
void CreateDatabase(const string& name)
{
string mkdir = "mkdir " + name;
if (!system(mkdir.c_str()))
{
cout << "Query OK, 1 row affected ";
time_cnt::end();
cout << endl;
}
}
void CreateTable(const string& name, const ColInfo& column_info)
{
if (cur_db.size() == 0)
{ error("No database selected"); return; }
string file_path = cur_db + "\\" + name;
if (exist(file_path))
{ error("file already exists"); return;}
freopen(file_path.c_str(), "w", stdout);
column_info.PrintInfo();// if we want to remain the const of table_info, we should define the print() as const.
freopen("CON", "w", stdout);
cout << "Query OK, 0 rows affected ";
time_cnt::end();
cout << endl;
}
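Insert, delete, update, select
Next, a closer look at the individual operations. All of them work on top of the cache.
Insert
My main regret about insertion is that, to shave the constant factor, I combined adding the element with reading the file. Appending to a vector is amortized O(1) anyway (っ °Д °;)っ
In the end, error handling cost a great deal of effort: I had to write the ClearError function and then spend a long time debugging here as well.
Writing this, I found something I regret even more: not starting this blog post right after debugging ended!!! I honestly cannot remember any of it!!!!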
void Insert(string table_name, vector<string> col_name)
{
if (!cur_db.size())
{ error("no database used"); return;}
string file_path = cur_db + "\\" + table_name;//default
if (!exist(file_path))
{ error("table ", table_name, " doesn't exists"); return; }
if (cur_tb == table_name) // to use cache to just insert into cache
{
cache.SetColNum(cache.col_num());
if (cache.col_num() != col_name.size())
{ error("in insert: param num not fit"); cur_tb = ""; return;}
for (int n_col = 0; n_col < cache.col_num(); n_col++)
{
if (cache.col_len(n_col) && !trim(col_name[n_col]))
{
error("insert syntax wrong: varchar to be inserted illegal");
cur_tb = ""; cache.ClearError(cache.elem_num()+1, n_col);
return;
}//if returns in these two interfaces, we must clear the polluted data.
if (cache.col_len(n_col) && col_name[n_col].size() > cache.col_len(n_col))
{
error("Elem to be inserted illegal: too long");
cur_tb = ""; cache.ClearError(cache.elem_num()+1, n_col);
return;
}//if returns here, the data which has been inserted was deprecated. for the elem_num_ remain unchanged.
cache.AddElem(n_col, cache.elem_num(), col_name[n_col]);
}
}
else {
freopen(file_path.c_str(), "r", stdin);
int tmp_num;
cin >> tmp_num; cache.SetColNum (tmp_num);
cin >> tmp_num; cache.SetElemNum(tmp_num);
string tmp_name; int tmp_length; string tmp_elem;
for (int n_col = 0; n_col < cache.col_num(); n_col++)
{
cin >> tmp_name >> tmp_length;
cache.col_info_[n_col].first = tmp_name;//don't try to use add col: just here, for the cache needs to be cleared.
cache.col_info_[n_col].second = tmp_length;
for (int j = 0; j < cache.elem_num(); j++)
{
cin >> tmp_elem;
cache.SetElem(n_col, j, tmp_elem);
}
string c = col_name[n_col];
if (cache.col_len(n_col))// if this column's type is varchar
if (!trim(c))
{
error("insert syntax wrong: varchar to be inserted illegal");
freopen("CON", "r", stdin);
cache.ClearError(cache.elem_num()+1, n_col);
return;
}
if (tmp_length && col_name[n_col].size() > tmp_length)
{
error("Elem to be inserted illegal: too long");
freopen("CON", "r", stdin); cur_tb = "";
cache.ClearError(cache.elem_num()+1, n_col);
return;
}//if returns here, the data which has been inserted was deprecated. for the elem_num_ remain unchanged.
cache.elem_info_[n_col].resize(cache.elem_num()+1);
cache.elem_info_[n_col][cache.elem_num()].first = c;
}
cur_tb = table_name;// renew only when everything is right.
}
freopen("CON", "r", stdin);
cache.SetElemNum(cache.elem_num()+1);
cache.isDelete.resize(cache.elem_num_ * 2);
#ifdef _TEST5_
WATCH();
#endif
freopen(file_path.c_str(), "w", stdout);
cache.PrintInfo();
freopen("CON", "w", stdout);
cout << "Query OK, 1 row affected ";
time_cnt::end(); cout << endl;
}
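To make sure inserted values are legal, meaning a varchar value has valid quotation marks and a suitable length, we added a trim function here. It strips the quotes from the string and reports whether the string was legal. I think this counts as a highlight:
It is also a good example of the find algorithms and the string container, and would make a decent exercise in branching.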
bool trim(string & str)
{
int pos1 = str.find_first_of("\""), pos2 = str.find_first_of("\'");
if (pos1 == string::npos && pos2 == string::npos)
return false;
else if (pos1 != string::npos && pos2 != string::npos)
return false;
else if (pos1 != string::npos && pos1 == 0)
{
//erase the first and last "
str.erase(pos1, 1);
pos1 = str.find_first_of("\"");
if (pos1 != string::npos && pos1 == str.size()-1)
str.erase(pos1, 1);
else return false;
//to see if there other non-alnum chars
for (auto it : str)
if (!isalnum(it))
return false;
}
else if (pos2 != string ::npos && pos2 == 0)
{
//erase the first and last '
str.erase(pos2, 1);
pos2 = str.find_last_of("\'");
if (pos2 != string::npos && pos2 == str.size()-1)
str.erase(pos2, 1);
else return false;
//to see if there other non-alnum chars
for (auto it : str)
if (!isalnum(it))
return false;
}
return true;
}
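The check can also be expressed more compactly by looking only at the first and last characters. The sketch below is a hypothetical `StripQuotes`, not a drop-in replacement for trim: unlike trim, it leaves the string untouched when validation fails.

```cpp
#include <cctype>
#include <string>

// Returns true and strips the quotes if `s` is a quoted, purely alphanumeric
// literal such as "abc" or 'abc'. Otherwise returns false and leaves `s` alone.
bool StripQuotes(std::string& s) {
    if (s.size() < 2) return false;
    char q = s.front();
    // The opening quote must be " or ', and the closing quote must match it.
    if ((q != '"' && q != '\'') || s.back() != q) return false;
    std::string inner = s.substr(1, s.size() - 2);
    for (unsigned char ch : inner)
        if (!std::isalnum(ch)) return false;   // also rejects stray quotes inside
    s = inner;
    return true;
}
```

Leaving the input unmodified on failure means the caller never sees a half-stripped value in an error path.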
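Delete
Deletion works mainly through flag bits. Adding these flags put the loop bound of our output function in serious jeopardy, and the insertion function also has to account for the out-of-bounds cases the flags introduce.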
void Delete(string table_name, Clause where)
{
if (!cur_db.size())
{ error("1046 (3D000): No database selected"); return; }
string file_path = cur_db + "\\" + table_name;
if (!exist(file_path))
{ error("table ", table_name, " doesn't exists"); return; }
#ifdef _TEST4_
cout << "in delete: " << "table_name: " << table_name << " cur_tb: " << cur_tb << endl;
#endif
if (cur_tb != table_name)
{
freopen(file_path.c_str(), "r", stdin);
cache.InitRead();
freopen("CON", "r", stdin);
// cur_tb = table_name;//unnecessary daylight
}
int bc = cache.FindCol(where.name);
if (bc == -1) { error("in Delete(): col to judge (delete) invalid "); return; }
if (cache.col_len(bc) && !trim(where.value))
{ error("in Delete()'s where: varchar value wrong"); return; }
bool (*p)(const string&, const string&);
if (where.op == "=")
p = equal_to;
else if (where.op == ">")
p = greater;
else if (where.op == ">=")
p = greater_equal;
else if (where.op == "<")
p = less;
else if (where.op == "<=")
p = less_equal;
else if (where.op == "!=")
p = not_equal_to;
else
{ error("unknown where operator type when Delete()"); return; } // return so the unset pointer p is never called
int row_cnt = 0;
for (auto it : cache.elem_info_[bc])
if (p(it.first,where.value))
cache.EraseElem(it.second), row_cnt++;
freopen(file_path.c_str(), "w", stdout);
cache.PrintInfo();
freopen("CON", "w", stdout);
cur_tb = "";//because the elem in cache is quite different from that in the file, we'd better do this.
for (auto it : cache.isDelete)
it = false;
cout << "Query OK, " << row_cnt << " rows affected ";
time_cnt::end(); cout << endl;
}
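Update
The operations are all much alike, and this is arguably the simplest of them. The one thing to ensure is that an element that satisfies the condition but already holds newvalue is left unchanged.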
void Update(string table_name, string col_name, string newvalue, Clause where)
{
if (!cur_db.size())
{ error("1046 (3D000): No database selected"); return; }
string file_path = cur_db + "\\" + table_name;
if (!exist(file_path))
{ error("in Update(): Table ", table_name, " doesn't exists"); return; }
if (cur_tb != table_name)
{
freopen(file_path.c_str(), "r", stdin);
cache.InitRead();
freopen("CON", "r", stdin);
cur_tb = table_name;
}
#ifdef _TEST4_
cout << "in update: " << "table_name: " << table_name << "cur_tb: " << cur_tb << endl;
#endif
int nc = cache.FindCol(col_name);
if (nc == -1)
{ error("in Update(): col to update invalid"); return;}
if (cache.col_len(nc) && !trim(newvalue))
{ error("#ERROR in Update(): varchar newvalue wrong"); return; }
int bc = cache.FindCol(where.name);
if (bc == -1) { error("#ERROR: col to judge (update) invalid "); return; }
if (cache.col_len(bc) && !trim(where.value))
{ error("in Update()'s where clause: varchar value wrong"); return; }
bool (*p)(const string&, const string&);
if (where.op == "=")
p = equal_to;
else if (where.op == ">")
p = greater;
else if (where.op == ">=")
p = greater_equal;
else if (where.op == "<")
p = less;
else if (where.op == "<=")
p = less_equal;
else if (where.op == "!=")
p = not_equal_to;
else
{ error("unknown where operator type when Update()"); return; } // return so the unset pointer p is never called
int row_cnt = 0;
for (auto it : cache.elem_info_[bc])
if (p(it.first, where.value))
if (cache.GetElem(nc, it.second).first != newvalue)
cache.SetElem(nc, it.second, newvalue), row_cnt++;
freopen(file_path.c_str(), "w", stdout);
cache.PrintInfo();
freopen("CON", "w", stdout);
cout << "Query OK, " << row_cnt << " rows affected ";
time_cnt::end();
//TimeCount();
cout << endl;
cout << "Rows matched: " << row_cnt << " Changed: 0 Warnings: 0 ";
cout << endl << endl;
}
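Select
Select best shows off the multi-dimensional vector design: one O(n) scan to find the rows, then (number of columns - 1) O(1) lookups to line up the other columns. It is also the only function that needs a two-step lookup: it has to find the criterion column in the where clause and also work out which columns were selected.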
void Select(string table_name, vector<string> item, Clause where)
{
if (!cur_db.size())
{ error("1046 (3D000): No database selected"); return; }
string file_path = cur_db + "\\" + table_name;
if (!exist(file_path))
{ error("table ", table_name, " doesn't exists"); return; }
#ifdef _TEST4_
std::clog << "current table name is " << cur_tb << endl;
#endif
if (cur_tb != table_name)
{
freopen(file_path.c_str(), "r", stdin);
cache.InitRead();
freopen("CON", "r", stdin); //here we didn't get the stdin back, which caused a bad loops
cur_tb = table_name;//everything has been read in, so renew now (somewhat different from Insert(), possibly because we need to output once at a time)
}
#ifdef _LOC_
std::clog << "current table name is " << cur_tb << endl;
#endif
//select columns to be put
vector<int> item_col(cache.col_num()); int tmp, sel_cnt = 0;
#ifdef _LOC_
std::clog << item[0] << endl;
#endif
if (item[0] == "*")
for (int i = 0; i < cache.col_num(); i++) item_col[i] = i, sel_cnt++;
else for (int i = 0; i < item.size(); i++)
{
tmp = cache.FindCol(item[i]);
if (tmp != -1)
item_col[i] = tmp, sel_cnt++;
}
if (sel_cnt == 0)
{
cout << "empty set" << endl;
return;
}
#ifdef _LOC_
if (item[0] == "*")
for (int i = 0; i < item_col.size(); i++)
cout << "item_col[" << i << "] = " << item_col[i]<< endl;
std::clog << sel_cnt << " col(s) selected" << endl;
#endif
vector<int> col_len(sel_cnt);
for (int i = 0; i < sel_cnt; i++)
{
if (!cache.col_info_[item_col[i]].second)
col_len[i] = 10;
else col_len[i] = (cache.col_info_[item_col[i]].first.size() > cache.col_info_[item_col[i]].second
? cache.col_info_[item_col[i]].first.size() : cache.col_info_[item_col[i]].second) + 1;
}
//select rows to be put
if (!where.op.size()) // no more rules
{
PrintHead(col_len, item_col);
for (int i = 0; i < cache.elem_num(); i++)
PrintLine(col_len, item_col, i);
PrintTail(col_len);
if (cache.elem_num())
cout << cache.elem_num() << " rows in set ";
else cout << "empty set ";
time_cnt::end();
cout << endl;
}
else if (where.op.size())// where branch
{
int col_base, hasnt = 1;
col_base = cache.FindCol(where.name);
if (col_base == -1) { error("Invalid name in where clause."); return; } //assignment and judge......
if (cache.col_len(col_base) && !trim(where.value))
{ error("in Select(): where varchar value wrong"); return; }
for (auto it : where.value)
if (!isalnum(it))
{ error("Invalid value in where clause."); return; }
bool (*p)(const string&, const string&);//choose the right mode to act as the cmp function
if (where.op == "=")
p = equal_to;
else if (where.op == ">")
p = greater;
else if (where.op == ">=")
p = greater_equal;
else if (where.op == "<")
p = less;
else if (where.op == "<=")
p = less_equal;
else if (where.op == "!=")
p = not_equal_to;
else
{ error("unknown where operator type when Select()"); return; } // return so the unset pointer p is never called
int row_cnt = 0;
for (auto it : cache.elem_info_[col_base])
if (p(it.first, where.value))
{
if (hasnt)
PrintHead(col_len, item_col);
PrintLine(col_len, item_col, it.second);
row_cnt++;
hasnt = 0;
}
if (hasnt)
cout << "Empty set ";
else
{
PrintTail(col_len);
cout << row_cnt << " row(s) in set ";
}
time_cnt::end(); cout << endl;
}
else error("select wrong in both modes");
#ifdef _TEST5_
WATCH();
#endif
}
For completeness, here are the output functions called above, unremarkable as they are.
/**
* @author: fhn
* @date: 5/4
* @description: print the upper chart line of a table and the gauge outfit.
* @version:
*/
void PrintHead(vector<int> col_len, vector<int> item_col)
{
cout << " ┌";
for (int i = 0; i < col_len.size(); i++)
{
for (int j = 0; j < col_len[i]; j++) cout << "─";
if (i < col_len.size() - 1)
cout << "┬";
else cout << "┐";
}
cout << endl;
cout << " │";
for (int i = 0; i < col_len.size(); i++)
{
int tmp = col_len[i]-cache.col_info_[item_col[i]].first.size();
for (int j = 0; j < tmp/2; j++) cout << ' ';
if (tmp%2) cout << ' ';
cout << cache.col_info_[item_col[i]].first;
for (int j = 0; j < tmp/2; j++) cout << ' ';
cout << "│";
}
cout << endl;
}
/**
* @author: fhn
* @date: 5/4
* @description: in select(), print the body info of table. upper chart line and a row of info
* @version: v2.0 :
* v2.1: add the quotation mark of varchar
*/
void PrintLine(vector<int> col_len, vector<int> item_col, int row_index)
{
cout << " ├";
for (int i = 0; i < col_len.size(); i++)
{
for (int j = 0; j < col_len[i]; j++) cout << "─";
if (i < col_len.size() - 1)
cout << "┼";
else cout << "┤";
}
cout << endl;
cout << " │";
for (int i = 0; i < col_len.size(); i++)
{
int tmp = col_len[i]-(cache.GetElem(item_col[i], row_index)).first.size();
if (cache.col_len(item_col[i]))// if it's varchar
tmp -= 2;
for (int j = 0; j < tmp/2; j++) cout << ' ';
if (tmp%2) cout << ' ';
if (cache.col_len(item_col[i]))// if it's varchar
cout << '\"' << cache.GetElem(item_col[i], row_index).first << '\"';
else cout << cache.GetElem(item_col[i], row_index).first;
for (int j = 0; j < tmp/2; j++) cout << ' ';
cout << "│";
}
cout << endl;
}
void PrintTail(vector<int> col_len)
{
cout << " └";
for (int i = 0; i < col_len.size(); i++)
{
for (int j = 0; j < col_len[i]; j++) cout << "─";
if (i < col_len.size() - 1)
cout << "┴";
else cout << "┘";
}
cout << endl;
}
Efficiency analysis
Let the number of columns be $m$, the number of rows be $n$, and the number of elements an operation touches be $k_i$.
Operation | Cost when cached | Cost when not yet read |
---|---|---|
Insert | $O(m)$ | $O(m*n)$, with a small constant |
Delete | $O(n)+kO(1)$ | $O(n)+kO(1)$ |
Update | $O(n)+kO(1)$ | $O(n)+kO(1)$ |
Select | $O(n)+k_1O(k_2)$ | $O(n)+k_1O(k_2)$ |
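Overall, query efficiency falls short of data structures such as B+ trees, because a traversal is always required. But this stable structure has its benefits:
- When an operation involves several columns, once the scan has finished the other columns can be matched up in constant time.
- Overall complexity stays at O(n), usually with a small constant. This is most noticeable when the table is already cached and many columns are queried.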
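The "scan one column, then index the others" pattern can be sketched in a few lines. This is a simplified illustration with equality as the only condition; `Columns` and `SelectRows` are hypothetical names, and all columns are assumed to have the same length.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Columns = std::vector<std::vector<std::string>>;  // cols[col][row]

// Scans column `where_col` once (O(n)), then reads each requested column
// by row index (O(1) per cell). Assumes every column has the same length.
std::vector<std::vector<std::string>> SelectRows(
        const Columns& cols, std::size_t where_col, const std::string& wanted,
        const std::vector<std::size_t>& picked) {
    std::vector<std::vector<std::string>> result;
    for (std::size_t row = 0; row < cols[where_col].size(); ++row) {
        if (cols[where_col][row] != wanted) continue;
        std::vector<std::string> line;
        for (std::size_t c : picked) line.push_back(cols[c][row]);
        result.push_back(line);
    }
    return result;
}
```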
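Debugging and other thoughts
Function pointers and code optimization
Without function pointers, each of the three functions that take a where clause would need a large set of branches wherever a comparison is made to apply the right rule. With function pointers, we only add the comparison functions to a header file, and each function needs branches only to choose the pointer.
For example: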
if (where.op == "=")
p = equal_to;
else if (where.op == ">")
p = greater;
else if (where.op == ">=")
p = greater_equal;
else if (where.op == "<")
p = less;
else if (where.op == "<=")
p = less_equal;
else if (where.op == "!=")
p = not_equal_to;
else
error("unknown where operator type when Select()");
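Since the same chain appears in three functions, one option is to move it into a single helper that returns the pointer, with a null result for an unknown operator. The sketch below uses hypothetical names (`Compare`, `PickCompare`, and the `Is...` comparators); the project's real comparators live in a header not shown here and may compare numerically rather than lexicographically.

```cpp
#include <string>

using Compare = bool (*)(const std::string&, const std::string&);

// Hypothetical comparators: plain lexicographic string comparison.
bool IsEqual(const std::string& a, const std::string& b)        { return a == b; }
bool IsNotEqual(const std::string& a, const std::string& b)     { return a != b; }
bool IsLess(const std::string& a, const std::string& b)         { return a < b; }
bool IsLessEqual(const std::string& a, const std::string& b)    { return a <= b; }
bool IsGreater(const std::string& a, const std::string& b)      { return a > b; }
bool IsGreaterEqual(const std::string& a, const std::string& b) { return a >= b; }

// Maps an operator string to a comparator, or nullptr if it is unknown.
Compare PickCompare(const std::string& op) {
    if (op == "=")  return IsEqual;
    if (op == "!=") return IsNotEqual;
    if (op == "<")  return IsLess;
    if (op == "<=") return IsLessEqual;
    if (op == ">")  return IsGreater;
    if (op == ">=") return IsGreaterEqual;
    return nullptr;  // the caller must check this before calling the result
}
```

Returning nullptr forces each caller to handle the unknown-operator case explicitly instead of calling through an uninitialized pointer.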
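Special cases for early exit
This problem is especially prominent in insert, as mentioned earlier. Perhaps we should sacrifice the constant factor for code that is more harmonious and better encapsulated. Things like freopen and vector size are not to be trifled with.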
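One way to trade a little constant factor for cleaner code is to validate the whole row first and append only afterwards, so an early exit never leaves partial data behind. This is a sketch under assumptions, not the program's Insert: `MiniTable` and `InsertRow` are hypothetical, and only the length rule is checked.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table: one length limit per column (0 = no limit)
// and column-major cells.
struct MiniTable {
    std::vector<std::size_t> max_len;
    std::vector<std::vector<std::string>> cells;  // cells[col][row]
};

// Checks the whole row first and only then appends it, so a rejected row
// never leaves some columns longer than others.
bool InsertRow(MiniTable& t, const std::vector<std::string>& row) {
    if (row.size() != t.max_len.size()) return false;
    for (std::size_t c = 0; c < row.size(); ++c)
        if (t.max_len[c] != 0 && row[c].size() > t.max_len[c]) return false;

    t.cells.resize(t.max_len.size());             // make sure every column exists
    for (std::size_t c = 0; c < row.size(); ++c)
        t.cells[c].push_back(row[c]);
    return true;
}
```

The row is walked twice, but no rollback function is needed because nothing is modified until every check has passed.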
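Multiple vectors and flag bits
Multiple vectors have to be kept consistent with one another and guarded strictly against out-of-bounds access; this kind of bug is truly hateful.
Functions and encapsulation
For a small program, it may be fine to write without encapsulation first for the sake of flow, and encapsulate at the end. The difficulty of a large program may be precisely that the relationships among its functions have to be thought out in advance.
- Use void functions wherever possible; this reduces compatibility problems.
- Parameters should be thought out in advance, but perhaps it is exactly because that is hard that big assignments are hard to write.
Rethinking file streams
Changes to file streams, like variable declarations, should probably be made as late as possible.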
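One way to keep stream changes late and local, offered as a sketch and not as what the program does, is to have output functions take a `std::ostream&` so that stdout is never redirected. `WriteHeader` is a hypothetical function that emits the table header layout used in this post.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes "col_num elem_num" followed by one "name length" line per column.
// A freshly created table has zero rows.
void WriteHeader(std::ostream& out,
                 const std::vector<std::pair<std::string, int>>& cols) {
    out << cols.size() << ' ' << 0 << '\n';
    for (const auto& c : cols)
        out << c.first << ' ' << c.second << '\n';
}
```

A caller would open a `std::ofstream` right before the call and let it close at the end of the scope; the same function can be pointed at a `std::ostringstream` for testing.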
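Exceptions
- When first building the program, note down where problems might occur without testing rigorously yet, keeping design, coding, and testing separate.
- These notes matter: without them, it is hard to think of so many tricky edge cases when formal boundary testing begins.
If exceptions can be avoided, use them sparingly. Our understanding of exceptions is still very limited, and a single exception flow can drive a person to despair.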