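A Small File Search Tool in C++
Quick file locator: supports searching by 1) file name (all or part of the Chinese characters), 2) the pinyin of the file name, 3) the pinyin initials of the file name, and 4) mixed input that combines Chinese characters with pinyin or with initials. Function: enter the file to search for, and the terminal lists each matching file name together with its path.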
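I. Project Background
Background: Linux provides the find command, which makes locating files convenient and efficient. On Windows, the default search box in the file explorer performs a brute-force scan at query time, which is very slow, and it cannot search globally. Finding a file that has not been used for a long time and whose name is only half remembered is therefore difficult.
For this reason, the quick file locator in this article is modeled on the search tool Everything and adds some of the search features found in tools such as QQ and Baidu.
Survey: Everything first indexes document information into a database and then answers queries from that database, which is much faster than scanning at query time (such an index is typically backed by a red-black tree or a hash table). It supports keyword search on file names.
Search in applications such as QQ additionally supports searching by Chinese characters, pinyin, and initials, with the matched keyword highlighted.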
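Implementation:
Quick file locator: on Windows, search the files under a specified path by Chinese characters, by pinyin, or by pinyin initials, and highlight the matched keyword in the results. That is, after a keyword is entered, the terminal prints every file name that contains the keyword together with its path, with the keyword highlighted.
Chinese-character search: 第二次课
Pinyin search: diercike
Initials search: DECK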
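II. Project Framework
1. Data scanning module (scan + monitor)
1) Scan: collect all documents under the specified path
Scan out the name and path of every document under the given path and store them in a container for later processing.
Files are enumerated with _findfirst and _findnext from the C header <io.h>. For their usage, parameters, and return values, see this post: https://blog.csdn.net/damant/article/details/50833845
2) Monitor: compare the local file system with the database
2. Data persistence
1) Using the database
The lightweight database SQLite 3 is used. Introduction: https://www.runoob.com/sqlite/sqlite-intro.html
Official site: https://www.sqlite.org/index.html
Opening and closing the database, creating tables, executing SQL statements
A single table for searching is enough: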
create table if not exists %s (id INTEGER PRIMARY KEY, path text, name text,
name_pinyin text, name_initials text);
2) Data management
Database initialization, table creation, inserting and deleting data
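As a rough illustration of how the %s placeholder in the table-creation statement above might be filled in, the sketch below formats the statement for a given table name. BuildCreateTableSql and the table name tb_doc are hypothetical names chosen for this example, and executing the resulting string against SQLite is left out.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: fills the table name into the CREATE TABLE template.
// The table name is assumed to be a trusted constant, not user input.
std::string BuildCreateTableSql(const std::string& table)
{
    char sql[256] = { '\0' };
    std::snprintf(sql, sizeof(sql),
        "create table if not exists %s (id INTEGER PRIMARY KEY, path text, name text, "
        "name_pinyin text, name_initials text);",
        table.c_str());
    return sql;
}
```

Calling BuildCreateTableSql("tb_doc") yields the full statement with tb_doc substituted for %s; snprintf truncates rather than overflowing if the name is unexpectedly long.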
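3. Search
Search by Chinese characters, pinyin, and initials, implemented as fuzzy matching with the SQL LIKE operator.
Highlighting, using a custom highlighting algorithm.
4. Other reused modules
Trace log module (records errors, locates the failure point, returns the error code)
Highlighted printing of a string
Chinese characters to full pinyin (GBK input)
Chinese characters to pinyin initials (GBK input)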
III. Data Scanning Module
1. Scanning the files under a specified path (_findfirst / _findnext)
static void DirectoryList(const string& path, vector<string>& dirs, vector<string>& files)
{
    _finddata_t file; // structure that receives information about each entry
    const string path_ = path + "\\*.*"; // wildcard pattern that matches every entry under path
    intptr_t handle = _findfirst(path_.c_str(), &file); // returns the search handle
    if (handle == -1)
    {
        cout << "_findfirst: " << path << " error" << endl;
        return;
    }
    do
    {
        // Attribute flags: _A_ARCH (archive), _A_HIDDEN (hidden), _A_NORMAL (normal),
        // _A_RDONLY (read-only), _A_SUBDIR (directory), _A_SYSTEM (system)
        if (file.attrib & _A_SUBDIR) // the entry is a directory that can be recursed into
        {
            if ((strcmp(file.name, ".") != 0)
                && (strcmp(file.name, "..") != 0))
            { // record subdirectories, skipping "." and ".."
                dirs.push_back(file.name);
            }
        }
        else
        { // record regular files
            files.push_back(file.name);
        }
    } while (_findnext(handle, &file) == 0);
    _findclose(handle); // release the search handle
}
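2. Scan module management
// ScanManager is a singleton: the whole program has exactly one scan module
// Scanning runs on a separate thread
class ScanManager
{
public:
    void Scan(const string& path); // compare the file system with the database
    void StartScan()
    {
        while (1)
        {
            Scan("F:\\bite学习"); // directory to scan
            std::this_thread::sleep_for(std::chrono::seconds(5)); // rescan every 5 s
        }
    }
    static ScanManager* CreateIntance()
    {
        static ScanManager scanmgr;
        // A pointer to a member function must be qualified with the class name
        static std::thread thd(&ScanManager::StartScan, &scanmgr);
        if (thd.joinable()) // detach only once; detaching a second time throws std::system_error
        {
            thd.detach();
        }
        return &scanmgr;
    }
private:
    ScanManager() // private constructor
    {
    }
    ScanManager(const ScanManager&);
    //vector<string> entrys; // several directories to scan with several threads
};
The scan module is a singleton, so the whole project has exactly one scanner, and its thread keeps rescanning the specified directory.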
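A detached thread running an infinite loop cannot be shut down cleanly. One possible variation, shown here only as a sketch, keeps the singleton but lets the worker be stopped and joined. PeriodicScanner, Start, Stop, and Rounds are hypothetical names, and the counter stands in for a real Scan(path) call.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical singleton whose worker thread repeats a task until stopped.
class PeriodicScanner
{
public:
    static PeriodicScanner& Instance()
    {
        static PeriodicScanner scanner; // constructed once (thread-safe since C++11)
        return scanner;
    }
    // Starts the worker on the first call; calls while it is running do nothing.
    void Start(std::chrono::milliseconds interval)
    {
        std::lock_guard<std::mutex> lock(_mtx);
        if (_worker.joinable()) return;
        _stop = false;
        _worker = std::thread([this, interval] { Loop(interval); });
    }
    // Wakes the worker, asks it to finish, and waits for it.
    void Stop()
    {
        {
            std::lock_guard<std::mutex> lock(_mtx);
            _stop = true;
        }
        _cv.notify_all();
        if (_worker.joinable()) _worker.join();
    }
    int Rounds() const { return _rounds.load(); }
    ~PeriodicScanner() { Stop(); }
    PeriodicScanner(const PeriodicScanner&) = delete;
    PeriodicScanner& operator=(const PeriodicScanner&) = delete;
private:
    PeriodicScanner() = default;
    void Loop(std::chrono::milliseconds interval)
    {
        std::unique_lock<std::mutex> lock(_mtx);
        while (!_stop)
        {
            lock.unlock();
            ++_rounds; // placeholder for one scan pass
            lock.lock();
            // Sleep for the interval, but wake immediately when Stop() is called
            _cv.wait_for(lock, interval, [this] { return _stop; });
        }
    }
    std::mutex _mtx;
    std::condition_variable _cv;
    bool _stop = false;
    std::atomic<int> _rounds{ 0 };
    std::thread _worker;
};
```

A caller would use PeriodicScanner::Instance().Start(std::chrono::seconds(5)) at startup and Stop() before exit. The condition variable replaces sleep_for so that shutdown does not have to wait for a full interval.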
Implementation of the scan module:
void ScanManager::Scan(const string& path)
{
    // Compare the file system with the database
    vector<string> localdirs;  // directories
    vector<string> localfiles; // files
    DirectoryList(path, localdirs, localfiles); // list everything directly under path
    std::set<string> localset;
    localset.insert(localfiles.begin(), localfiles.end());
    localset.insert(localdirs.begin(), localdirs.end());
    // Both sides are stored in a set, which keeps them sorted for the comparison
    std::set<string> dbset;
    DataManager::GetInstance()->GetDoc(path, dbset); // entries recorded in the database
    auto localit = localset.begin();
    auto dbit = dbset.begin();
    while (localit != localset.end() && dbit != dbset.end())
    {
        if (*localit < *dbit) // present locally, missing in the database -> insert
        {
            DataManager::GetInstance()->InsertDoc(path, *localit);
            ++localit;
        }
        else if (*localit > *dbit) // missing locally, present in the database -> delete
        {
            DataManager::GetInstance()->DeleteDoc(path, *dbit);
            ++dbit;
        }
        else // present on both sides
        {
            ++localit;
            ++dbit;
        }
    }
    while (localit != localset.end())
    {
        // remaining local entries are new: insert them
        DataManager::GetInstance()->InsertDoc(path, *localit);
        ++localit;
    }
    while (dbit != dbset.end())
    {
        // remaining database entries no longer exist locally: delete them
        DataManager::GetInstance()->DeleteDoc(path, *dbit);
        ++dbit;
    }
    // Recurse into the subdirectories
    for (const auto& subdirs : localdirs)
    {
        string subpath = path;
        subpath += '\\';
        subpath += subdirs; // absolute path of the subdirectory under path
        Scan(subpath);      // compare everything inside the subdirectory
    }
}
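The local file information and the database file information are each placed in a set (which sorts automatically), and the two sets are walked with a pair of iterators. An entry that exists locally but not in the database is inserted into the database; an entry that exists in the database but not locally is deleted from it. The comparison then recurses into every subdirectory.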
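The two-iterator walk described above is the classic sorted-range difference, so the same result can be obtained with std::set_difference. The sketch below only computes the two lists; SyncPlan and DiffListings are hypothetical names, and applying the plan through InsertDoc and DeleteDoc is omitted.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: what must change in the database to match the disk.
struct SyncPlan
{
    std::vector<std::string> toInsert; // present locally, missing in the database
    std::vector<std::string> toDelete; // present in the database, missing locally
};

// Both inputs are std::set, so they are already sorted as set_difference requires.
SyncPlan DiffListings(const std::set<std::string>& local, const std::set<std::string>& db)
{
    SyncPlan plan;
    std::set_difference(local.begin(), local.end(), db.begin(), db.end(),
                        std::back_inserter(plan.toInsert));
    std::set_difference(db.begin(), db.end(), local.begin(), local.end(),
                        std::back_inserter(plan.toDelete));
    return plan;
}
```

With local = {a.txt, b.txt, d.txt} and db = {b.txt, c.txt}, the plan inserts a.txt and d.txt and deletes c.txt. This version walks the sets twice, whereas the hand-written loop does it in one pass; the trade-off is shorter code.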
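IV. Data Persistence
The data is stored in a database (this project uses the lightweight SQLite 3), and all further processing works on the database.
1. Using the database
1) Adding the external library (SQLite 3)
Use the sqlite3.h header from the downloaded SQLite source package:
#include "./sqlite-amalgamation-3280000/sqlite3.h" // SQLite header
First, copy the library files into the project directory. Then, in the VS 2013 Solution Explorer, go to project name -> Properties -> Linker / General -> Additional Library Directories (add the path of the external library) -> Linker / Input -> Additional Dependencies (add the SQLite library file, for example sqlite3.lib).
Database operations: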
class SqliteManager {
public:
SqliteManager()
:_db(nullptr)
{}
~SqliteManager()
{
Close();
}
void Open(const string& path);
void Close();
void ExecuteSql(const string& sql);
// Fetches a query result table; the AutoGetTable RAII wrapper below frees it automatically
void GetTable(const string& sql, int& row, int& col, char**& ppRet);
SqliteManager(const SqliteManager&) = delete;
SqliteManager& operator=(const SqliteManager&) = delete;
private:
sqlite3* _db; // database handle
};
// RAII wrapper
// frees the result table automatically, replacing a manual sqlite3_free_table call
class AutoGetTable
{
public:
AutoGetTable(SqliteManager& sm, const string& sql, int& row, int& col, char**& ppRet)
{
sm.GetTable(sql, row, col, ppRet);
_ppRet = ppRet;
}
~AutoGetTable()
{
sqlite3_free_table(_ppRet);
}
AutoGetTable(const AutoGetTable&) = delete; // prevents two objects from freeing the same table
AutoGetTable& operator=(const AutoGetTable&) = delete; // non-copyable, mimicking C++11 std::unique_ptr
private:
char** _ppRet;
};
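These classes implement opening the database, executing SQL, and closing the database. RAII is used to obtain the query result table and free it automatically, which avoids memory leaks.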
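The same RAII idea can also be expressed with std::unique_ptr and a custom deleter instead of a hand-written wrapper class. The sketch below uses a stand-in resource so that it compiles without SQLite; FakeTable, AcquireTable, ReleaseTable, and CountRows are hypothetical, and in the real project the deleter would presumably call sqlite3_free_table.

```cpp
#include <memory>

// Stand-in for a C-style resource such as a query result table.
struct FakeTable { int rows; };
static int g_liveTables = 0; // number of tables currently allocated

FakeTable* AcquireTable(int rows) { ++g_liveTables; return new FakeTable{ rows }; }
void ReleaseTable(FakeTable* t) { if (t) { --g_liveTables; delete t; } }

// unique_ptr with a custom deleter calls ReleaseTable automatically at scope exit
// and is non-copyable by construction.
using TablePtr = std::unique_ptr<FakeTable, void (*)(FakeTable*)>;

TablePtr MakeTable(int rows) { return TablePtr(AcquireTable(rows), &ReleaseTable); }

int CountRows(int rows)
{
    TablePtr table = MakeTable(rows);
    return table->rows; // the table is released here, even on an early return or exception
}
```

Because unique_ptr already deletes its copy operations, the explicit "= delete" lines of AutoGetTable are not needed in this form.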
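2. Managing the data in the database
Data management covers database initialization (opening), reading the stored data, inserting and deleting data, keyword search, and keyword highlighting.
// Designed as a singleton so that locking is straightforward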
class DataManager
{
public:
static DataManager* GetInstance()
{
static DataManager datamgr;
datamgr.Init();
return &datamgr;
}
void Init(); // open the database and create the table
void GetDoc(const string& path, std::set<string>& dbset); // fetch all documents recorded under path
void InsertDoc(const string& path, const string& name);
void DeleteDoc(const string& path, const string& name);
void Search(const string& key, vector<std::pair<string, string>>& docinfos);
void SetHightLight(const string& str, const string& key, string& prefix,
string& highlight, string& suffix);
private:
DataManager() // private constructor
{}
SqliteManager _dbmgr;
std::mutex _mtx;
};
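Making the class a singleton with a single mutex ensures that only one database operation runs at a time, which prevents lost or duplicated data.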
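V. Keyword Search and Highlighting
1. Keyword search
Covers search by Chinese characters, by full pinyin, and by pinyin initials.
Search uses the fuzzy matching of the SQL LIKE operator. The build environment and Windows use the GBK Chinese encoding, whereas SQLite stores text as UTF-8, so Chinese characters written to the database come out garbled. Because converting between the encodings was considered too complicated, every Chinese keyword is first converted to full pinyin or to pinyin initials, and the LIKE match runs on those columns.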
// LIKE match on full pinyin
select name, path from %s where name_pinyin like '%%%s%%';
// LIKE match on pinyin initials
select name, path from %s where name_initials like '%%%s%%';
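When the two queries above are combined into one statement, each column needs its own LIKE pattern joined by OR. The sketch below builds such a statement with std::string, which avoids a fixed-size buffer, and doubles single quotes in the keyword. BuildSearchSql, EscapeSqlText, and the table name tb_doc are hypothetical.

```cpp
#include <string>

// Doubles single quotes so the keyword cannot end the SQL string literal early.
std::string EscapeSqlText(const std::string& text)
{
    std::string out;
    for (char c : text)
    {
        out += c;
        if (c == '\'') out += '\'';
    }
    return out;
}

// Hypothetical helper: one LIKE pattern per column, joined with OR.
std::string BuildSearchSql(const std::string& table,
                           const std::string& pinyin,
                           const std::string& initials)
{
    return "select name, path from " + table +
           " where name_pinyin like '%" + EscapeSqlText(pinyin) + "%'" +
           " or name_initials like '%" + EscapeSqlText(initials) + "%';";
}
```

The characters % and _ inside the keyword still act as LIKE wildcards in this sketch. Binding the patterns as parameters of a prepared statement (sqlite3_prepare_v2 with sqlite3_bind_text) would be the sturdier choice if user input is untrusted.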
The search is implemented as follows:
// Search entry point
void DataManager::Search(const string& key, vector<std::pair<string, string>>& docinfos)
{
    char sql[256] = { '\0' };
    // Convert the keyword to full pinyin and to initials
    string pinyin = ChineseConvertPinYinAllSpell(key);
    string initials = ChineseConvertPinYinInitials(key);
    // Each column needs its own LIKE pattern; a very long keyword can overflow this fixed buffer
    sprintf(sql, "select name, path from %s where name_pinyin like '%%%s%%' or name_initials like '%%%s%%'",
        TB_NAME, pinyin.c_str(), initials.c_str());
    int row = 0, col = 0;
    char** ppRet = nullptr;
    AutoGetTable agt(_dbmgr, sql, row, col, ppRet);
    for (int i = 1; i <= row; i++) // row 0 of the result holds the column names
    {
        docinfos.push_back(std::make_pair(ppRet[i * col], ppRet[i * col + 1]));
    }
}
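2. Keyword highlighting
To highlight a keyword inside a string, the string is split into the part before the keyword, the keyword itself, and the part after it; the highlight printing function then prints the middle part in colour.
void DataManager::SetHightLight(const string& str, const string& key, string& prefix,
    string& highlight, string& suffix)
{
1) The keyword is given in Chinese characters
// 1. key is a substring of the original string
{   // locate the keyword in the original string
    size_t ht_start = str.find(key);
    if (ht_start != string::npos)
    {
        prefix = str.substr(0, ht_start); // text before the keyword
        highlight = key;                  // the keyword to highlight
        suffix = str.substr(ht_start + key.size(), string::npos); // text after the keyword
        return;
    }
}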
2) The keyword is given as full pinyin
// 2. key matches the full pinyin of part of the string
{
    string key_ap = ChineseConvertPinYinAllSpell(key);
    string str_ap = ChineseConvertPinYinAllSpell(str); // ap stands for "all spell"
    size_t ht_index = 0; // byte index into the original string
    size_t ap_index = 0; // index into the pinyin string
    size_t ht_start = 0, ht_len = 0;
    size_t ap_start = str_ap.find(key_ap); // start of the match in the pinyin string
    if (ap_start != string::npos)
    {
        size_t ap_end = ap_start + key_ap.size(); // end of the match in the pinyin string
        while (ap_index < ap_end)
        {
            if (ap_index == ap_start) // only hit when the match starts on a character boundary
            {
                ht_start = ht_index;
            }
            if (str[ht_index] >= 0 && str[ht_index] <= 127) // ASCII: one byte on both sides
            {
                ++ht_index;
                ++ap_index;
            }
            else
            {
                char chinese[3] = { '\0' };
                chinese[0] = str[ht_index];
                chinese[1] = str[ht_index + 1];
                ht_index += 2; // skip one Chinese character (two bytes in GBK)
                string ap_str = ChineseConvertPinYinAllSpell(chinese);
                ap_index += ap_str.size(); // skip the pinyin of that character
            }
        }
        ht_len = ht_index - ht_start;
        prefix = str.substr(0, ht_start);
        highlight = str.substr(ht_start, ht_len);
        suffix = str.substr(ht_start + ht_len, string::npos);
        return;
    }
}
3) The keyword is given as pinyin initials
// 3. key matches the pinyin initials of part of the string
{
    string init_str = ChineseConvertPinYinInitials(str);
    string init_key = ChineseConvertPinYinInitials(key);
    size_t init_start = init_str.find(init_key);
    if (init_start != string::npos)
    {
        size_t init_end = init_start + init_key.size();
        size_t init_index = 0, ht_index = 0;
        size_t ht_start = 0, ht_len = 0;
        while (init_index < init_end)
        {
            if (init_index == init_start)
            {
                ht_start = ht_index;
            }
            // ASCII character: one byte, one initial
            if (str[ht_index] >= 0 && str[ht_index] <= 127)
            {
                ++ht_index;
                ++init_index;
            }
            else // Chinese character: two bytes, one initial
            {
                ht_index += 2;
                ++init_index;
            }
        }
        ht_len = ht_index - ht_start;
        prefix = str.substr(0, ht_start);
        highlight = str.substr(ht_start, ht_len);
        suffix = str.substr(ht_start + ht_len, string::npos);
        return;
    }
}
// No match at all: report it and return the whole string as the prefix
ERROE_LOG("split highlight no match. str:%s, key:%s\n", str.c_str(), key.c_str());
prefix = str;
}
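The index mapping used by the pinyin and initials cases can be illustrated independently of any encoding by treating the file name as a list of tokens, each holding one original character and its converted form. This is a simplified sketch, not the project's algorithm: Token, Split, and SplitByConverted are hypothetical names, the tokens are supplied by hand rather than produced by the conversion functions, and a match that starts in the middle of a syllable is widened to whole characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One token is one character of the file name plus its converted form
// (full pinyin, or a single initial for initials search).
struct Token
{
    std::string text;      // original bytes, e.g. one Chinese character
    std::string converted; // e.g. its pinyin
};

struct Split { std::string prefix, highlight, suffix; };

// Finds key inside the concatenated converted text and maps the match back
// to whole tokens. Returns false when there is no match.
bool SplitByConverted(const std::vector<Token>& tokens, const std::string& key, Split& out)
{
    std::string all;
    for (const Token& t : tokens) all += t.converted;
    const std::size_t start = all.find(key);
    if (key.empty() || start == std::string::npos) return false;
    const std::size_t end = start + key.size();

    out = Split();
    std::size_t pos = 0; // offset of the current token inside `all`
    for (const Token& t : tokens)
    {
        const std::size_t next = pos + t.converted.size();
        if (next <= start)   out.prefix += t.text;    // entirely before the match
        else if (pos >= end) out.suffix += t.text;    // entirely after the match
        else                 out.highlight += t.text; // overlaps the match
        pos = next;
    }
    return true;
}
```

For the tokens 第/di, 二/er, 次/ci, 课/ke and the key erci, the prefix is 第, the highlight is 二次, and the suffix is 课. Supplying single-letter initials as the converted form gives the initials case with the same function.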
VI. Reused Modules
1. Trace log:
// Trace log: locates the failure point and reports the error code
static std::string GetFileName(const std::string& path)
{
char ch = '/';
#ifdef _WIN32
ch = '\\';
#endif
size_t pos = path.rfind(ch);
if (pos == std::string::npos)
return path;
else
return path.substr(pos + 1);
}
// Trace log used for debugging
inline static void __TraceDebug(const char* filename, int line, const char* function, const char* format, ...)
{
#ifdef __TRACE__
// print where the call came from
fprintf(stdout, "[TRACE][%s:%d:%s]:", GetFileName(filename).c_str(), line, function);
// print the user's trace message
va_list args;
va_start(args, format);
vfprintf(stdout, format, args);
va_end(args);
fprintf(stdout, "\n");
#endif
}
inline static void __ErrorDebug(const char* filename, int line, const char* function, const char* format, ...)
{
#ifdef __DEBUG__
// print where the call came from
fprintf(stdout, "[ERROR][%s:%d:%s]:", GetFileName(filename).c_str(), line, function);
// print the user's error message
va_list args;
va_start(args, format);
vfprintf(stdout, format, args);
va_end(args);
fprintf(stdout, " errmsg:%s, errno:%d\n", strerror(errno), errno);
#endif
}
#define TRACE_LOG(...) \
__TraceDebug(__FILE__, __LINE__, __FUNCTION__, __VA_ARGS__);
#define ERROE_LOG(...) \
__ErrorDebug(__FILE__, __LINE__, __FUNCTION__, __VA_ARGS__);
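A variation that may be easier to test formats the log line into a string instead of writing straight to stdout. The following is a sketch under that assumption; FormatLog and SKETCH_TRACE are hypothetical names and are not part of the original module.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "[LEVEL][file:line:function]: message" as a string,
// so the result can be printed, stored, or checked in a test.
std::string FormatLog(const char* level, const char* file, int line,
                      const char* function, const char* format, ...)
{
    char message[512] = { '\0' };
    va_list args;
    va_start(args, format);
    std::vsnprintf(message, sizeof(message), format, args); // truncates instead of overflowing
    va_end(args);

    char header[256] = { '\0' };
    std::snprintf(header, sizeof(header), "[%s][%s:%d:%s]: ", level, file, line, function);
    return std::string(header) + message;
}

// A macro name without leading double underscores stays clear of identifiers
// reserved for the implementation.
#define SKETCH_TRACE(...) FormatLog("TRACE", __FILE__, __LINE__, __func__, __VA_ARGS__)
```

A call such as SKETCH_TRACE("found %d files", 3) returns the formatted line, which the caller can pass to fputs or any other sink.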
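2. Printing a string str with highlighting
static void ColourPrintf(const char* str)
{
// 0-black 1-blue 2-green 3-aqua 4-red 5-purple 6-yellow 7-white 8-gray 9-light blue 10-light green
// 11-light aqua 12-light red 13-light purple 14-light yellow 15-bright white
// colour = foreground + background * 0x10
// e.g. red text on a bright white background: 4 + 15 * 0x10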
WORD color = 5 + 6 * 0x10;
WORD colorOld;
HANDLE handle = ::GetStdHandle(STD_OUTPUT_HANDLE);
CONSOLE_SCREEN_BUFFER_INFO csbi;
GetConsoleScreenBufferInfo(handle, &csbi);
colorOld = csbi.wAttributes;
SetConsoleTextAttribute(handle, color);
printf("%s", str);
SetConsoleTextAttribute(handle, colorOld);
}
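ColourPrintf depends on the Windows console API. On terminals that interpret ANSI escape sequences, a comparable effect can be sketched by wrapping the text in colour codes. AnsiHighlight is a hypothetical helper, and whether the sequences are rendered depends on the terminal.

```cpp
#include <string>

// Wraps text in ANSI escape sequences: ESC[<fg>;<bg>m switches the colours on
// and ESC[0m resets them. 35 is magenta foreground and 43 is yellow background,
// chosen to resemble the purple-on-yellow used by ColourPrintf.
std::string AnsiHighlight(const std::string& text, int foreground = 35, int background = 43)
{
    return "\x1b[" + std::to_string(foreground) + ";" + std::to_string(background) + "m"
           + text + "\x1b[0m";
}
```

A search result could then be printed as prefix, followed by AnsiHighlight(highlight), followed by suffix, all through an ordinary output stream.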
3. Chinese characters to full pinyin
/*
* CSDN:http://blog.csdn.net/csnd_ayo
*/
static string ChineseConvertPinYinAllSpell(const std::string& dest_chinese)
{
static const int spell_value[] = { -20319, -20317, -20304, -20295, -20292, -20283,
-20265, -20257, -20242, -20230, -20051, -20036, -20032, -20026,
-20002, -19990, -19986, -19982, -19976, -19805, -19784, -19775, -19774, -19763,
-19756, -19751, -19746, -19741, -19739, -19728,
-19725, -19715, -19540, -19531, -19525, -19515, -19500, -19484, -19479, -19467,
-19289, -19288, -19281, -19275, -19270, -19263,
-19261, -19249, -19243, -19242, -19238, -19235, -19227, -19224, -19218, -19212,
-19038, -19023, -19018, -19006, -19003, -18996,
-18977, -18961, -18952, -18783, -18774, -18773, -18763, -18756, -18741, -18735,
-18731, -18722, -18710, -18697, -18696, -18526,
-18518, -18501, -18490, -18478, -18463, -18448, -18447, -18446, -18239, -18237,
-18231, -18220, -18211, -18201, -18184, -18183,
-18181, -18012, -17997, -17988, -17970, -17964, -17961, -17950, -17947, -17931,
-17928, -17922, -17759, -17752, -17733, -17730,
-17721, -17703, -17701, -17697, -17692, -17683, -17676, -17496, -17487, -17482,
-17468, -17454, -17433, -17427, -17417, -17202,
-17185, -16983, -16970, -16942, -16915, -16733, -16708, -16706, -16689, -16664,
-16657, -16647, -16474, -16470, -16465, -16459,
-16452, -16448, -16433, -16429, -16427, -16423, -16419, -16412, -16407, -16403,
-16401, -16393, -16220, -16216, -16212, -16205,
-16202, -16187, -16180, -16171, -16169, -16158, -16155, -15959, -15958, -15944,
-15933, -15920, -15915, -15903, -15889, -15878,
-15707, -15701, -15681, -15667, -15661, -15659, -15652, -15640, -15631, -15625,
-15454, -15448, -15436, -15435, -15419, -15416,
-15408, -15394, -15385, -15377, -15375, -15369, -15363, -15362, -15183, -15180,
-15165, -15158, -15153, -15150, -15149, -15144,
-15143, -15141, -15140, -15139, -15128, -15121, -15119, -15117, -15110, -15109,
-14941, -14937, -14933, -14930, -14929, -14928,
-14926, -14922, -14921, -14914, -14908, -14902, -14894, -14889, -14882, -14873,
-14871, -14857, -14678, -14674, -14670, -14668,
-14663, -14654, -14645, -14630, -14594, -14429, -14407, -14399, -14384, -14379,
-14368, -14355, -14353, -14345, -14170, -14159,
-14151, -14149, -14145, -14140, -14137, -14135, -14125, -14123, -14122, -14112,
-14109, -14099, -14097, -14094, -14092, -14090,
-14087, -14083, -13917, -13914, -13910, -13907, -13906, -13905, -13896, -13894,
-13878, -13870, -13859, -13847, -13831, -13658,
-13611, -13601, -13406, -13404, -13400, -13398, -13395, -13391, -13387, -13383,
-13367, -13359, -13356, -13343, -13340, -13329,
-13326, -13318, -13147, -13138, -13120, -13107, -13096, -13095, -13091, -13076,
-13068, -13063, -13060, -12888, -12875, -12871,
-12860, -12858, -12852, -12849, -12838, -12831, -12829, -12812, -12802, -12607,
-12597, -12594, -12585, -12556, -12359, -12346,
-12320, -12300, -12120, -12099, -12089, -12074, -12067, -12058, -12039, -11867,
-11861, -11847, -11831, -11798, -11781, -11604,
-11589, -11536, -11358, -11340, -11339, -11324, -11303, -11097, -11077, -11067,
-11055, -11052, -11045, -11041, -11038, -11024,
-11020, -11019, -11018, -11014, -10838, -10832, -10815, -10800, -10790, -10780,
-10764, -10587, -10544, -10533, -10519, -10331,
-10329, -10328, -10322, -10315, -10309, -10307, -10296, -10281, -10274, -10270,
-10262, -10260, -10256, -10254
};
// pinyin syllables (array sized for 396 entries), each at most 6 characters long
static const char spell_dict[396][7] = { "a", "ai", "an", "ang", "ao", "ba", "bai",
"ban", "bang", "bao", "bei", "ben", "beng", "bi", "bian", "biao",
"bie", "bin", "bing", "bo", "bu", "ca", "cai", "can", "cang", "cao", "ce", "ceng",
"cha", "chai", "chan", "chang", "chao", "che", "chen",
"cheng", "chi", "chong", "chou", "chu", "chuai", "chuan", "chuang", "chui", "chun",
"chuo", "ci", "cong", "cou", "cu", "cuan", "cui",
"cun", "cuo", "da", "dai", "dan", "dang", "dao", "de", "deng", "di", "dian",
"diao", "die", "ding", "diu", "dong", "dou", "du", "duan",
"dui", "dun", "duo", "e", "en", "er", "fa", "fan", "fang", "fei", "fen", "feng",
"fo", "fou", "fu", "ga", "gai", "gan", "gang", "gao",
"ge", "gei", "gen", "geng", "gong", "gou", "gu", "gua", "guai", "guan", "guang",
"gui", "gun", "guo", "ha", "hai", "han", "hang",
"hao", "he", "hei", "hen", "heng", "hong", "hou", "hu", "hua", "huai", "huan",
"huang", "hui", "hun", "huo", "ji", "jia", "jian",
"jiang", "jiao", "jie", "jin", "jing", "jiong", "jiu", "ju", "juan", "jue", "jun",
"ka", "kai", "kan", "kang", "kao", "ke", "ken",
"keng", "kong", "kou", "ku", "kua", "kuai", "kuan", "kuang", "kui", "kun", "kuo",
"la", "lai", "lan", "lang", "lao", "le", "lei",
"leng", "li", "lia", "lian", "liang", "liao", "lie", "lin", "ling", "liu", "long",
"lou", "lu", "lv", "luan", "lue", "lun", "luo",
"ma", "mai", "man", "mang", "mao", "me", "mei", "men", "meng", "mi", "mian",
"miao", "mie", "min", "ming", "miu", "mo", "mou", "mu",
"na", "nai", "nan", "nang", "nao", "ne", "nei", "nen", "neng", "ni", "nian",
"niang", "niao", "nie", "nin", "ning", "niu", "nong",
"nu", "nv", "nuan", "nue", "nuo", "o", "ou", "pa", "pai", "pan", "pang", "pao",
"pei", "pen", "peng", "pi", "pian", "piao", "pie",
"pin", "ping", "po", "pu", "qi", "qia", "qian", "qiang", "qiao", "qie", "qin",
"qing", "qiong", "qiu", "qu", "quan", "que", "qun",
"ran", "rang", "rao", "re", "ren", "reng", "ri", "rong", "rou", "ru", "ruan",
"rui", "run", "ruo", "sa", "sai", "san", "sang",
"sao", "se", "sen", "seng", "sha", "shai", "shan", "shang", "shao", "she", "shen",
"sheng", "shi", "shou", "shu", "shua",
"shuai", "shuan", "shuang", "shui", "shun", "shuo", "si", "song", "sou", "su",
"suan", "sui", "sun", "suo", "ta", "tai",
"tan", "tang", "tao", "te", "teng", "ti", "tian", "tiao", "tie", "ting", "tong",
"tou", "tu", "tuan", "tui", "tun", "tuo",
"wa", "wai", "wan", "wang", "wei", "wen", "weng", "wo", "wu", "xi", "xia", "xian",
"xiang", "xiao", "xie", "xin", "xing",
"xiong", "xiu", "xu", "xuan", "xue", "xun", "ya", "yan", "yang", "yao", "ye", "yi",
"yin", "ying", "yo", "yong", "you",
"yu", "yuan", "yue", "yun", "za", "zai", "zan", "zang", "zao", "ze", "zei", "zen",
"zeng", "zha", "zhai", "zhan", "zhang",
"zhao", "zhe", "zhen", "zheng", "zhi", "zhong", "zhou", "zhu", "zhua", "zhuai",
"zhuan", "zhuang", "zhui", "zhun", "zhuo",
"zi", "zong", "zou", "zu", "zuan", "zui", "zun", "zuo"
};
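    std::string pinyin;
    // Process the byte array in a loop
    const int length = dest_chinese.length();
    for (int j = 0, chrasc = 0; j < length;) {
        // Non-Chinese (ASCII) byte
        if (dest_chinese.at(j) >= 0 && dest_chinese.at(j) < 128) {
            pinyin += dest_chinese[j];
            // advance the index
            j++;
            continue;
        }
        // Chinese character: combine the two GBK bytes into one code
        // (negative here, because char is signed on this platform)
        chrasc = dest_chinese[j] * 256 + dest_chinese[j + 1] + 256;
        if (chrasc > 0 && chrasc < 160) {
            // not a Chinese character
            pinyin += dest_chinese.at(j);
            // advance the index
            j++;
        }
        else {
            // Chinese character
            for (int i = (sizeof(spell_value) / sizeof(spell_value[0]) - 1); i >= 0; --i) {
                // dictionary lookup: the last threshold that does not exceed the code
                if (spell_value[i] <= chrasc) {
                    pinyin += spell_dict[i];
                    break;
                }
            }
            // advance the index (a Chinese character takes two bytes)
            j += 2;
        }
    } // for end
    return pinyin;
}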
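To make the lookup above easier to follow, the sketch below isolates its two steps: turning two GBK bytes into the negative code used by the table, and finding the last threshold that does not exceed that code. It uses only the first six table entries as a toy subset and a binary search instead of the backward linear scan; GbkCode and LookupPinyinSubset are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Combines the two bytes of a GBK character into the code used by the table:
// the 16-bit value minus 65536. For the bytes 0xB0 0xA1 this gives -20319.
int GbkCode(unsigned char high, unsigned char low)
{
    return ((high << 8) | low) - 65536;
}

// Toy subset of the threshold table (first six entries only). Each syllable
// covers the codes from its threshold up to just below the next threshold.
std::string LookupPinyinSubset(int code)
{
    static const int thresholds[] = { -20319, -20317, -20304, -20295, -20292, -20283 };
    static const char* const syllables[] = { "a", "ai", "an", "ang", "ao", "ba" };
    const int* first = std::begin(thresholds);
    const int* last = std::end(thresholds);
    // upper_bound finds the first threshold greater than code; the entry before it matches
    const int* it = std::upper_bound(first, last, code);
    if (it == first) return ""; // below the first threshold: not in the table
    return syllables[(it - first) - 1];
}
```

Since the subset stops at "ba", any code at or above -20283 maps to "ba" here; the full table is needed for real conversion.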
4. Chinese characters to pinyin initials
static std::string ChineseConvertPinYinInitials(const std::string& name)
{
// Generates only the pinyin initials
static int secPosValue[] = {
1601, 1637, 1833, 2078, 2274, 2302, 2433, 2594, 2787, 3106, 3212,
3472, 3635, 3722, 3730, 3858, 4027, 4086, 4390, 4558, 4684, 4925, 5249
};
static const char* initials[] = {
"a", "b", "c", "d", "e", "f", "g", "h", "j", "k", "l", "m", "n", "o",
"p", "q", "r", "s", "t", "w", "x", "y", "z"
};
static const char* secondSecTable =
"CJWGNSPGCGNE[Y[BTYYZDXYKYGT[JNNJQMBSGZSCYJSYY[PGKBZGY[YWJKGKLJYWKPJQHY[W[DZLSGMRYPYWWCCKZNKYYGTTNJJNYKKZYTCJNMCYLQLYPYQFQRPZSLWBTGKJFYXJWZLTBNCXJJJJTXDTTSQZYCDXXHGCK[PHFFSS[YBGXLPPBYLL[HLXS[ZM[JHSOJNGHDZQYKLGJHSGQZHXQGKEZZWYSCSCJXYEYXADZPMDSSMZJZQJYZC[J[WQJBYZPXGZNZCPWHKXHQKMWFBPBYDTJZZKQHY"
"LYGXFPTYJYYZPSZLFCHMQSHGMXXSXJ[[DCSBBQBEFSJYHXWGZKPYLQBGLDLCCTNMAYDDKSSNGYCSGXLYZAYBNPTSDKDYLHGYMYLCXPY[JNDQJWXQXFYYFJLEJPZRXCCQWQQSBNKYMGPLBMJRQCFLNYMYQMSQYRBCJTHZTQFRXQHXMJJCJLXQGJMSHZKBSWYEMYLTXFSYDSWLYCJQXSJNQBSCTYHBFTDCYZDJWYGHQFRXWCKQKXEBPTLPXJZSRMEBWHJLBJSLYYSMDXLCLQKXLHXJRZJMFQHXHWY"
"WSBHTRXXGLHQHFNM[YKLDYXZPYLGG[MTCFPAJJZYLJTYANJGBJPLQGDZYQYAXBKYSECJSZNSLYZHSXLZCGHPXZHZNYTDSBCJKDLZAYFMYDLEBBGQYZKXGLDNDNYSKJSHDLYXBCGHXYPKDJMMZNGMMCLGWZSZXZJFZNMLZZTHCSYDBDLLSCDDNLKJYKJSYCJLKWHQASDKNHCSGANHDAASHTCPLCPQYBSDMPJLPZJOQLCDHJJYSPRCHN[NNLHLYYQYHWZPTCZGWWMZFFJQQQQYXACLBHKDJXDGMMY"
"DJXZLLSYGXGKJRYWZWYCLZMSSJZLDBYD[FCXYHLXCHYZJQ[[QAGMNYXPFRKSSBJLYXYSYGLNSCMHZWWMNZJJLXXHCHSY[[TTXRYCYXBYHCSMXJSZNPWGPXXTAYBGAJCXLY[DCCWZOCWKCCSBNHCPDYZNFCYYTYCKXKYBSQKKYTQQXFCWCHCYKELZQBSQYJQCCLMTHSYWHMKTLKJLYCXWHEQQHTQH[PQ[QSCFYMNDMGBWHWLGSLLYSDLMLXPTHMJHWLJZYHZJXHTXJLHXRSWLWZJCBXMHZQXSDZP"
"MGFCSGLSXYMJSHXPJXWMYQKSMYPLRTHBXFTPMHYXLCHLHLZYLXGSSSSTCLSLDCLRPBHZHXYYFHB[GDMYCNQQWLQHJJ[YWJZYEJJDHPBLQXTQKWHLCHQXAGTLXLJXMSL[HTZKZJECXJCJNMFBY[SFYWYBJZGNYSDZSQYRSLJPCLPWXSDWEJBJCBCNAYTWGMPAPCLYQPCLZXSBNMSGGFNZJJBZSFZYNDXHPLQKZCZWALSBCCJX[YZGWKYPSGXFZFCDKHJGXDLQFSGDSLQWZKXTMHSBGZMJZRGLYJB"
"PMLMSXLZJQQHZYJCZYDJWBMYKLDDPMJEGXYHYLXHLQYQHKYCWCJMYYXNATJHYCCXZPCQLBZWWYTWBQCMLPMYRJCCCXFPZNZZLJPLXXYZTZLGDLDCKLYRZZGQTGJHHGJLJAXFGFJZSLCFDQZLCLGJDJCSNZLLJPJQDCCLCJXMYZFTSXGCGSBRZXJQQCTZHGYQTJQQLZXJYLYLBCYAMCSTYLPDJBYREGKLZYZHLYSZQLZNWCZCLLWJQJJJKDGJZOLBBZPPGLGHTGZXYGHZMYCNQSYCYHBHGXKAMTX"
"YXNBSKYZZGJZLQJDFCJXDYGJQJJPMGWGJJJPKQSBGBMMCJSSCLPQPDXCDYYKY[CJDDYYGYWRHJRTGZNYQLDKLJSZZGZQZJGDYKSHPZMTLCPWNJAFYZDJCNMWESCYGLBTZCGMSSLLYXQSXSBSJSBBSGGHFJLYPMZJNLYYWDQSHZXTYYWHMZYHYWDBXBTLMSYYYFSXJC[DXXLHJHF[SXZQHFZMZCZTQCXZXRTTDJHNNYZQQMNQDMMG[YDXMJGDHCDYZBFFALLZTDLTFXMXQZDNGWQDBDCZJDXBZGS"
"QQDDJCMBKZFFXMKDMDSYYSZCMLJDSYNSBRSKMKMPCKLGDBQTFZSWTFGGLYPLLJZHGJ[GYPZLTCSMCNBTJBQFKTHBYZGKPBBYMTDSSXTBNPDKLEYCJNYDDYKZDDHQHSDZSCTARLLTKZLGECLLKJLQJAQNBDKKGHPJTZQKSECSHALQFMMGJNLYJBBTMLYZXDCJPLDLPCQDHZYCBZSCZBZMSLJFLKRZJSNFRGJHXPDHYJYBZGDLQCSEZGXLBLGYXTWMABCHECMWYJYZLLJJYHLG[DJLSLYGKDZPZXJ"
"YYZLWCXSZFGWYYDLYHCLJSCMBJHBLYZLYCBLYDPDQYSXQZBYTDKYXJY[CNRJMPDJGKLCLJBCTBJDDBBLBLCZQRPPXJCJLZCSHLTOLJNMDDDLNGKAQHQHJGYKHEZNMSHRP[QQJCHGMFPRXHJGDYCHGHLYRZQLCYQJNZSQTKQJYMSZSWLCFQQQXYFGGYPTQWLMCRNFKKFSYYLQBMQAMMMYXCTPSHCPTXXZZSMPHPSHMCLMLDQFYQXSZYYDYJZZHQPDSZGLSTJBCKBXYQZJSGPSXQZQZRQTBDKYXZK" "HHGFLBCSMDLDGDZDBLZYYCXNNCSYBZBFGLZZXSWMSCCMQNJQSBDQSJTXXMBLTXZCLZSHZCXRQJGJYLXZFJPHYMZQQYDFQJJLZZNZJCDGZYGCTXMZYSCTLKPHTXHTLBJXJLXSCDQXCBBTJFQZFSLTJBTKQBXXJJLJCHCZDBZJDCZJDCPRNPQCJPFCZLCLZXZDMXMPHJSGZGSZZQLYLWTJPFSYASMCJBTZKYCWMYTCSJJLJCQLWZMALBXYFBPNLSFHTGJWEJJXXGLLJSTGSHJQLZFKCGNNNSZFDEQ" "FHBSAQTGYLBXMMYGSZLDYDQMJJRGBJTKGDHGKBLQKBDMBYLXWCXYTTYBKMRTJZXQJBHLMHMJJZMQASLDCYXYQDLQCAFYWYXQHZ";
const char* cName = name.c_str();
std::string result;
int H = 0, L = 0, W = 0, j = 0;
size_t stringlen = ::strlen(cName);
for (int i = 0; i < stringlen; ++i) {
H = (unsigned char)(cName[i + 0]);
L = (unsigned char)(cName[i + 1]);
if (H < 0xA1 || L < 0xA1) {
result += cName[i];
continue;
}
W = (H - 160) * 100 + L - 160;
if (W > 1600 && W < 5590) {
bool has = false;
for (j = 22; j >= 0; j--) {
if (W >= secPosValue[j]) {
result += initials[j];
i++;
has = true;
break;
}
}
continue;
}
i++;
W = (H - 160 - 56) * 94 + L - 161;
if (W >= 0 && W <= 3007)
result += secondSecTable[W];
else {
result += (unsigned char)H;
result += (unsigned char)L;
}
}
return result;
}
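The first-level lookup in the function above can be tried in isolation. The sketch below copies the section boundaries from secPosValue and returns the initial for a single two-byte character; Gb2312Initial is a hypothetical name, and second-level characters (handled by secondSecTable in the original) are simply reported as not found.

```cpp
// Returns the pinyin initial for a first-level GB2312 character given its two
// GBK bytes, or '\0' when the bytes fall outside that range.
char Gb2312Initial(unsigned char high, unsigned char low)
{
    static const int sectionStart[] = {
        1601, 1637, 1833, 2078, 2274, 2302, 2433, 2594, 2787, 3106, 3212,
        3472, 3635, 3722, 3730, 3858, 4027, 4086, 4390, 4558, 4684, 4925, 5249
    };
    static const char initials[] = "abcdefghjklmnopqrstwxyz"; // 23 letters, no i, u, v
    if (high < 0xA1 || low < 0xA1) return '\0';
    const int code = (high - 160) * 100 + (low - 160); // zone * 100 + position
    if (code < 1601 || code >= 5590) return '\0';
    for (int j = 22; j >= 0; --j)
    {
        if (code >= sectionStart[j]) return initials[j]; // last section start not above code
    }
    return '\0';
}
```

For the bytes 0xB0 0xA1 the code is 1601, which falls in the first section and yields 'a'; for 0xD6 0xD0 the code is 5448, which lies past 5249 and yields 'z'.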
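VII. Project Summary and Future Improvements
1. Development tool: VS2013
Test platform: Windows 10
2. Function: given an input keyword, find every file under the specified directory whose name contains the keyword, and print the file names and paths to the terminal. Search by Chinese characters, full pinyin, and pinyin initials is supported, and the keyword is highlighted in each matching file name.
3. Modules: scanning (file scan and monitoring), database management, search, highlighting algorithm, Chinese-to-pinyin and Chinese-to-initials conversion, trace log, and so on.
4. Highlights: use of sqlite3, RAII for database access, the singleton pattern, the highlight matching algorithm, and C++11 thread/mutex.
5. Shortcomings: 1) Scanning a large directory is slow, so the search data can lag behind the actual data. 2) Because of the encoding problem, search relies on fuzzy pinyin matching, which has small bugs and can return incorrect results.
6. Improvements: scan large directories in shards with multiple threads and add a monitoring module; solve the encoding problem; test more thoroughly to make the search results more accurate.
VIII. Complete Project Code
The GitHub link to the code is attached here for anyone who would like to refer to it.
Corrections and criticism are welcome. Thank you!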