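Secret scanning research, a source-code study of the open-source tool detect-secrets, a Go implementation of the scanner, and related technical notes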

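These non-human privileged credentials are commonly called "secrets". A secret is a private piece of information that acts as a key for unlocking protected resources or sensitive information in tools, applications, containers, DevOps and cloud-native environments.

Some of the most common types of secrets are:
Privileged account credentials
Passwords
Certificates
SSH keys
API keys
Encryption keys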

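SecretRadar is built in three layers. The first layer uses traditional sensitive-information detection: a rich rule set keeps the baseline capability of the model stable and reliable and makes it easy to extend, which later supports user-defined rules. This approach depends heavily on fixed lengths, prefixes, variable names and similar features, so it is prone to false negatives.

For cases that fixed rules can hardly capture, the second layer applies an information-entropy algorithm. Entropy measures how much information a data set carries, that is, how uncertain it is: the higher the entropy, the more disordered the data. Computing entropy therefore identifies randomly generated secret strings effectively, which improves recall and compensates for the misses of the rule-based layer. Entropy has its own limitation: the gain in recall comes with a higher false-positive rate.

The third layer therefore filters the results with template clustering. It aggregates the entropy hits, extracts common keywords, and combines them with context analysis to perform a second round of filtering. In addition, the fix status of reported findings is used to build a binary-classification data set for tuning the algorithm, moving detection from lexical recognition toward semantic recognition.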
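The entropy layer described above can be sketched in Python as follows. This is a minimal illustration and not SecretRadar's implementation: the function names `shannon_entropy` and `looks_random` and the threshold of 4.0 bits per character are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(s: str, threshold: float = 4.0) -> bool:
    """Flag strings whose entropy exceeds the (assumed) threshold."""
    return shannon_entropy(s) > threshold
```

A string made of one repeated character scores 0, while a string of 32 distinct characters scores 5 bits per character. A real scanner would tune the threshold per character set, which is what produces the recall and false-positive trade-off mentioned above.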

Common regular expressions for secrets

access token         [1-9][0-9]+-[0-9a-zA-Z]{40}
                     EAACEdEose0cBA[0-9A-Za-z]+
API key              AIza[0-9A-Za-z\-_]{35}
OAuth ID             [0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent.com
API key              sk_live_[0-9a-z]{32}
Standard API Key     sk_live_[0-9a-zA-Z]{24}
Restricted API Key   rk_live_[0-9a-zA-Z]{24}
Access Token         sq0atp-[0-9a-zA-Z\-_]{22}
OAuth Secret         sq0csp-[0-9a-zA-Z\-_]{43}  
Access Token         access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}
Auth Token           amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
API Key              SK[0-9a-fA-F]{32}
API Key              key-[0-9a-zA-Z]{32}
API Key              [0-9a-f]{32}-us[0-9]{1,2}
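A rule table like the one above is typically applied line by line. The sketch below shows one way to do that in Python. The rule labels (`rule_aiza` and so on) and the function name `scan_text` are made up for the example, and only three of the patterns are copied from the table.

```python
import re

# A few patterns copied from the table above; the labels are illustrative only.
RULES = {
    "rule_aiza": re.compile(r"AIza[0-9A-Za-z\-_]{35}"),
    "rule_sk_live_24": re.compile(r"sk_live_[0-9a-zA-Z]{24}"),
    "rule_hex32_us": re.compile(r"[0-9a-f]{32}-us[0-9]{1,2}"),
}


def scan_text(text):
    """Return (line_number, rule_name, match) for every rule hit."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            # None of the patterns has a capture group, so findall
            # returns the full matched strings.
            for match in pattern.findall(line):
                findings.append((line_number, name, match))
    return findings
```

Compiling the patterns once and reusing them for every line keeps the scan cheap. Line numbers are reported 1-based so that they match what an editor shows.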

Symmetric encryption algorithms:
AES (Advanced Encryption Standard)
DES (Data Encryption Standard)
3DES (Triple Data Encryption Standard)
Blowfish
RC4 (Rivest Cipher 4)
RC5 (Rivest Cipher 5)
Asymmetric encryption algorithms:
RSA (Rivest, Shamir, Adleman)
DSA (Digital Signature Algorithm)
ECC (Elliptic Curve Cryptography)
ElGamal
Hash functions:
MD5 (Message Digest Algorithm 5)
SHA-1 (Secure Hash Algorithm 1)
SHA-256, SHA-384, SHA-512 (Secure Hash Algorithm 2)
HMAC (Hash-based Message Authentication Code; a keyed construction built on a hash function rather than a hash function itself)
Key exchange algorithms:
Diffie-Hellman Key Exchange
ECDH (Elliptic Curve Diffie-Hellman)
Digital signature algorithms:
RSA
DSA
ECC
ECDSA (Elliptic Curve Digital Signature Algorithm)

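Scanning for hashed values

import re

# `text` is the content to scan; it is assumed to be defined by the caller.
encrypted_values = []  # collected candidate hash strings

# The hex patterns match on length only, so every hit is just a candidate.

# Scan for MD5 (32 hex characters)
md5_matches = re.findall(r"\b[A-Fa-f0-9]{32}\b", text)
if md5_matches:
    encrypted_values.extend(md5_matches)

# Scan for SHA-1 (40 hex characters)
sha1_matches = re.findall(r"\b[A-Fa-f0-9]{40}\b", text)
if sha1_matches:
    encrypted_values.extend(sha1_matches)

# Scan for SHA-256 (64 hex characters)
sha256_matches = re.findall(r"\b[A-Fa-f0-9]{64}\b", text)
if sha256_matches:
    encrypted_values.extend(sha256_matches)

# Scan for SHA-512 (128 hex characters)
sha512_matches = re.findall(r"\b[A-Fa-f0-9]{128}\b", text)
if sha512_matches:
    encrypted_values.extend(sha512_matches)

# Scan for MD5Crypt ($1$<salt>$<22-character hash>).
# "$" is not a word character, so a leading \b would only match right after a
# letter, digit or underscore. The trailing boundary is a lookahead because the
# hash itself may end in "/" or ".".
md5crypt_matches = re.findall(r"\$1\$[^$\s]{0,8}\$[A-Za-z0-9/.]{22}(?![A-Za-z0-9/.])", text)
if md5crypt_matches:
    encrypted_values.extend(md5crypt_matches)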

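Example: scanning for sensitive information with machine learning

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Sensitive samples (a toy data set, far too small for real use)
sensitive_samples = [
    "This is a piece of confidential information.",
    "Please ensure confidentiality.",
    "Accessing this information requires authorization.",
    "Confidential file, do not disclose."
]

# Non-sensitive samples
non_sensitive_samples = [
    "This is a piece of public information.",
    "This is an ordinary email.",
    "Public file, free to share.",
    "This is a publicly published article."
]
# Build the TF-IDF feature vectors
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sensitive_samples + non_sensitive_samples)
# Build the labels
y = ["sensitive"] * len(sensitive_samples) + ["non-sensitive"] * len(non_sensitive_samples)
# Train the classifier
classifier = MultinomialNB()
classifier.fit(X, y)
# Predict the label of a new sample
def predict_text(text):
    X_test = vectorizer.transform([text])
    prediction = classifier.predict(X_test)
    return prediction[0]

# Example text to classify
sample_text = "This is confidential information, please keep it safe."
prediction = predict_text(sample_text)

# Print the result
if prediction == "sensitive":
    print("Sensitive information found.")
else:
    print("No sensitive information found.")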

File types covered by the scan:

Category        Examples
Config files    yaml, config.txt, .xml, .php
Text files      txt
Backup files    *.rar *.zip *.7z *.tar.gz *.bak *.swp *.txt *.html
Documents       pdf
Images          JPEG PNG BMP

Study of the open-source tool detect-secrets

Related rules

Platform          Secret type              Regular expression
Amazon AWS        API key                  AKIA[0-9A-Z]{16}
Google            API key                  AIza[0-9A-Za-z\-_]{35}
Azure             API key                  AccountKey=[a-zA-Z0-9+/=]{88}
Google            OAuth ID                 [0-9]+-[0-9A-Za-z_]{32}\.apps\.google
PayPal            Access token             access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}
Facebook          Access token             [1-9][0-9]+-[0-9a-zA-Z]{40}
RSA private key   Asymmetric private key   -----BEGIN RSA PRIVATE KEY-----[\r\n]+(?:\w+:.+)*[\s]*(?:[0-9a-zA-Z+/=]{64,76}[\r\n]+)+[0-9a-zA-Z+/=]+[\r\n]+-----END RSA PRIVATE KEY----

Regular expressions collected from detect-secrets

Square OAuth Secret:
sq0csp-[0-9A-Za-z\\\-_]{43}

Stripe Access Key:
(?:r|s)k_live_[0-9a-zA-Z]{24}

Twilio API Key:
AC[a-z0-9]{32}
SK[a-z0-9]{32}

Cloudant Credentials:
(?:cloudant|cl|clou)(?:_|-|)(?:api|)(?:key|pwd|pw|password|pass|token)(?:"|'|)(?:\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:"|'|)([0-9a-f]{64})(?:"|'|)
(?:cloudant|cl|clou)(?:_|-|)(?:api|)(?:key|pwd|pw|password|pass|token)(?:"|'|)(?:\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:"|'|)([a-z]{24})(?:"|'|)
(?:https?\:\/\/)[\w\-]+\:([0-9a-f]{64})\@[\w\-]+\.cloudant\.com
(?:https?\:\/\/)[\w\-]+\:([a-z]{24})\@[\w\-]+\.cloudant\.com

SoftLayer Credentials:
(?i)(?:softlayer|sl)(?:_|-|)(?:api|)(?:_|-|)(?:key|pwd|password|pass|token)(?:"|'|)(?:\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:"|'|)([a-z0-9]{64})(?:"|'|)
(?:http|https)://api.softlayer.com/soap/(?:v3|v3.1)/([a-z0-9]{64})

Slack Token:
xox(?:a|b|p|o|s|r)-(?:\d+-)+[a-z0-9]+ 
https://hooks\.slack\.com/services/T[a-zA-Z0-9_]+/B[a-zA-Z0-9_]+/[a-zA-Z0-9_]+

Private Key:
BEGIN DSA PRIVATE KEY 
BEGIN EC PRIVATE KEY
BEGIN OPENSSH PRIVATE KEY
BEGIN PGP PRIVATE KEY BLOCK
BEGIN PRIVATE KEY
BEGIN RSA PRIVATE KEY
BEGIN SSH2 ENCRYPTED PRIVATE KEY
PuTTY-User-Key-File-2

SendGrid API Key
SG\.[a-zA-Z0-9_-]{22}\.[a-zA-Z0-9_-]{43}

Facebook Token
[1-9][0-9]+-[0-9a-zA-Z]{40}

PayPal Token
access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}

NPM tokens
\/\/.+\/:_authToken=\s*((npm_.+)|([A-Fa-f0-9-]{36})).*

IBM Cloud
(?:"|\'|)(?:ibm(?:_|-|)cloud(?:_|-|)iam|cloud(?:_|-|)iam|ibm(?:_|-|)cloud|ibm(?:_|-|)iam|ibm|iam|cloud|)(?:_|-|)(?:api|)(?:_|-|)(?:key|pwd|password|pass|token)(?:"|\'|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:"|'|)(?: *)([a-zA-Z0-9_\-]{44})(?:"|'|)
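Several of the rules above wrap the secret itself in a capture group, so that the surrounding key name or URL is not reported as part of the secret. The following sketch shows how a scanner might handle both kinds of rule. The function name `extract_secrets` is an assumption for the example, and the two patterns are copied from the Cloudant and Slack entries above.

```python
import re

# Rule with a capture group: only the password part of the URL is the secret.
CLOUDANT_URL = re.compile(
    r"(?:https?\:\/\/)[\w\-]+\:([0-9a-f]{64})\@[\w\-]+\.cloudant\.com"
)
# Rule without a capture group: the whole match is the secret.
SLACK_TOKEN = re.compile(r"xox(?:a|b|p|o|s|r)-(?:\d+-)+[a-z0-9]+")


def extract_secrets(line):
    """Return group 1 for rules that capture, else the whole match."""
    secrets = []
    for pattern in (CLOUDANT_URL, SLACK_TOKEN):
        for m in pattern.finditer(line):
            # pattern.groups is the number of capturing groups in the rule.
            secrets.append(m.group(1) if pattern.groups else m.group(0))
    return secrets
```

Reporting only the captured part keeps the findings short and makes it easier to deduplicate the same secret when it appears under different key names.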

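Related code notes
1. @lru_cache
@lru_cache, from Python's built-in functools module, is a decorator that adds caching to a function. It caches the results for up to maxsize distinct argument sets and, on the next call with the same arguments, returns the previous result directly. It saves time on expensive or I/O-bound function calls.
2. isinstance()
isinstance() checks whether an object is an instance of a known type. It is similar to type(), but it also accepts instances of subclasses.
3. @staticmethod
@staticmethod is a Python decorator that marks a method as static.
A static method is defined inside a class but is independent of any instance, so it can be called without creating one. Unlike a regular method it has no self parameter, so it cannot access instance attributes or methods.
4. os.path.realpath()
Returns the canonical path of the given file name by resolving any symbolic links encountered in the path.
5. Union type annotation: my_list: ContainerType[Union[ElementType1, ElementType2, ..., ElementTypeN]]
Use case: a variable or parameter that must accept several data types.
6. os.walk(path)
Walks the directory tree rooted at path and yields a 3-tuple (root, dirs, files) for each directory visited: the path of the current directory (root), the list of subdirectories in it (dirs), and the list of files in it (files).
7. subprocess.check_output
The check_output function of the subprocess module runs a command (through the shell only when shell=True) and returns its output.
Compared with Popen, check_output focuses on obtaining the output of the command, so it suits commands that respond quickly: it blocks the program until the command finishes. A timeout parameter guards against commands that run too long. The return value is of type bytes; call decode() to obtain a str.
8. Parallelism in Python
with mp.Pool(processes=num_processes) as pool:
9. String formatting
Python offers a concise and elegant approach that is generally preferred over %-style formatting: str.format().
print("User:{} has completed Action:{} at Time:{}".format(user_name, action_name, current_time))
str.format() handles both simple cases and complex string substitution without tedious string concatenation.
10. The three relationships in a use case diagram: extend, include (dependency), generalization (parent and child)
Actor -- system boundary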
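Several of the notes above can be combined in one short Python sketch. The names `canonical_path`, `FileLister` and `run_short_command` are made up for the illustration; only the standard-library calls come from the notes.

```python
import os
import subprocess
import sys
from functools import lru_cache
from typing import List, Union


@lru_cache(maxsize=128)
def canonical_path(path: str) -> str:
    """Resolve symlinks once per distinct path; repeated calls hit the cache."""
    return os.path.realpath(path)


class FileLister:
    @staticmethod
    def list_files(root: Union[str, os.PathLike]) -> List[str]:
        """Collect every file path under root using os.walk."""
        if not isinstance(root, (str, os.PathLike)):
            raise TypeError("root must be a path string or os.PathLike")
        found = []
        for current_dir, _dirs, files in os.walk(root):
            for name in files:
                found.append(os.path.join(current_dir, name))
        return sorted(found)


def run_short_command() -> str:
    """Run a quick command, wait at most 10 seconds, and decode the bytes."""
    output = subprocess.check_output(
        [sys.executable, "-c", "print('ok')"], timeout=10
    )
    return output.decode().strip()
```

`FileLister.list_files` is called on the class itself, without creating an instance, and `canonical_path.cache_info()` reports how many calls were served from the cache.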

Implementing secret scanning in Go

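package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"path/filepath"
	"regexp"

	"gopkg.in/yaml.v3"
)

var (
	GlobalPluginMap map[string]Plugin
)

type Plugin interface {
	matchMethod(line string) []string // returns the list of matches
	verifyMethod()                    // verification hook (returns nothing yet)
}

// Global configuration
type ScanPolicy struct {
	DetectPlugin []RegexpPlugin `yaml:"detect_plugin"`
	// other policy options
}

// RegexpPlugin is not defined in the original snippet. This definition is an
// assumption that follows the regular expression plugin shown later, except
// that this first version keeps the patterns as strings.
type RegexpPlugin struct {
	SecretType string   `yaml:"secret_type"`
	DenyList   []string `yaml:"deny_list"`
}

func (rp RegexpPlugin) matchMethod(line string) []string {
	matchLis := []string{}
	for _, matchRegex := range rp.DenyList {
		matchSecretRegexp, err := regexp.Compile(matchRegex)
		if err != nil {
			// Skip an invalid pattern instead of dereferencing a nil *Regexp.
			log.Printf("compile regexp %q error: %v\n", matchRegex, err)
			continue
		}
		matchLis = append(matchLis, matchSecretRegexp.FindAllString(line, -1)...)
	}
	return matchLis
}

func (s RegexpPlugin) verifyMethod() {
	return
}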

func getAllFilenames(directory string) ([]string, error) {
	var filenames []string
	files, err := ioutil.ReadDir(directory)
	if err != nil {
		return nil, err
	}
	for _, file := range files {
		if file.IsDir() {
			subdirectory := filepath.Join(directory, file.Name())
			subFilenames, err := getAllFilenames(subdirectory)
			if err != nil {
				return nil, err
			}
			filenames = append(filenames, subFilenames...)
		} else {
			filename := filepath.Join(directory, file.Name())
			filenames = append(filenames, filename)
		}
	}
	return filenames, nil
}


func readScanPolicy() *ScanPolicy {
	s := ScanPolicy{}
	file, err := ioutil.ReadFile("src/plugin-config.yml")
	if err != nil {
		log.Fatalf("plugin-config.yml read error: %v\n", err)
	}
	err = yaml.Unmarshal(file, &s)
	if err != nil {
		log.Fatalf("plugin-config.yml Unmarshal error: %v\n", err)
	}
	return &s
}

func initPluginMap(currentPolicy *ScanPolicy) {
	GlobalPluginMap = make(map[string]Plugin)
	for _, detectPlugin := range currentPolicy.DetectPlugin {
		newRegexpPlugin := RegexpPlugin{detectPlugin.SecretType, detectPlugin.DenyList}
		GlobalPluginMap[detectPlugin.SecretType] = newRegexpPlugin
	}
}

// scanTask scans every file under directoryName and prints the matches.
func scanTask(directoryName string) {
	fileLis, err := getAllFilenames(directoryName)
	if err != nil {
		log.Printf("list files under %s error: %v\n", directoryName, err)
		return
	}
	for _, file := range fileLis {
		// ReadFileLines is defined in the section on reading lines by file type.
		lines, err := ReadFileLines(file)
		if err != nil {
			log.Printf("read %s error: %v\n", file, err)
			continue
		}
		scanLines(lines)
	}
}

// scanLines runs every registered plugin against each line and prints the
// 1-based line number together with the matches.
func scanLines(lines []string) {
	for lineNumber, line := range lines {
		for _, plugin := range GlobalPluginMap {
			if matches := plugin.matchMethod(line); len(matches) != 0 {
				fmt.Println(lineNumber+1, matches)
			}
		}
	}
}

func main() {
	currentPolicy := readScanPolicy()
	initPluginMap(currentPolicy)
	scanTask("D:\\testScanSecrets")
	//a := RegexpPlugin{"测试A", []string{"((2(5[0-5]|[0-4]\\d))|[0-1]?\\d{1,2})(\\.((2(5[0-5]|[0-4]\\d))|[0-1]?\\d{1,2})){3}", "(?:\\s|=|:|\"|^)AKC[a-zA-Z0-9]{10,}(?:\\s|\"|$)"}}
}

Plugin structure:

Keyword plugin

package plugin

import (
	"path/filepath"
	"regexp"
	"strings"
)

var (
	DENYLIST                                                                         = []string{"api_?key", "auth_?key", "service_?key", "account_?key", "db_?key", "database_?key", "priv_?key", "private_?key", "client_?key", "db_?pass", "database_?pass", "key_?pass", "password", "passwd", "pwd", "secret", "contraseña", "contrasena"}
	DENYLIST_REGEX_WITH_PREFIX                                                       string
	FOLLOWED_BY_COLON_EQUAL_SIGNS_REGEX                                              *regexp.Regexp
	FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX                                           *regexp.Regexp
	FOLLOWED_BY_EQUAL_SIGNS_REGEX                                                    *regexp.Regexp
	PRECEDED_BY_EQUAL_COMPARISON_SIGNS_QUOTES_REQUIRED_REGEX                         *regexp.Regexp
	FOLLOWED_BY_COLON_QUOTES_REQUIRED_REGEX                                          *regexp.Regexp
	FOLLOWED_BY_EQUAL_SIGNS_QUOTES_REQUIRED_REGEX                                    *regexp.Regexp
	FOLLOWED_BY_ARROW_FUNCTION_SIGN_QUOTES_REQUIRED_REGEX                            *regexp.Regexp
	FOLLOWED_BY_COLON_REGEX                                                          *regexp.Regexp
	FOLLOWED_BY_OPTIONAL_ASSIGN_QUOTES_REQUIRED_REGEX                                *regexp.Regexp
	FOLLOWED_BY_EQUAL_SIGNS_OPTIONAL_BRACKETS_OPTIONAL_AT_SIGN_QUOTES_REQUIRED_REGEX *regexp.Regexp
	GOLANG_DENYLIST_REGEX_TO_GROUP                                                   []*regexp.Regexp
	QUOTES_REQUIRED_DENYLIST_REGEX_TO_GROUP                                          []*regexp.Regexp
	CONFIG_DENYLIST_REGEX_TO_GROUP                                                   []*regexp.Regexp
	C_PLUS_PLUS_REGEX_TO_GROUP                                                       []*regexp.Regexp
	COMMON_C_DENYLIST_REGEX_TO_GROUP                                                 []*regexp.Regexp
)

const (
	CLOSING                 = "[]'\"]{0,2}"
	AFFIX_REGEX             = "\\w*"
	OPTIONAL_WHITESPACE     = "\\s*"
	OPTIONAL_NON_WHITESPACE = "[^\\s]{0,50}?"
	QUOTE                   = "['\"`]"
	SQUARE_BRACKETS         = "(\\[[0-9]*\\])"
	SECRET                  = "([^\\v'\"]*)(\\w+)[^\\v'\"]*[^\\v,'\"`]"
)

// KeywordPlugin flags lines in which a denylisted keyword is assigned a value
type KeywordPlugin struct {
	SecretType  string   `yaml:"secret_type"`
	KeywordList []string `yaml:"keyword_list"`
}

func NewKeywordPlugin(secretType string) *KeywordPlugin {
	GenerateMulitRegex()
	return &KeywordPlugin{
		SecretType:  secretType,
		KeywordList: DENYLIST,
	}
}

func (kp KeywordPlugin) MatchSecrets(line string, fileName string) []string {
	matchRegexGroup := GetRegexByFileType(fileName)
	matchList := []string{}
	for _, matchRegex := range matchRegexGroup {
		match := matchRegex.MatchString(line)
		if match {
			matchList = append(matchList, line)
			break
		}
	}
	return matchList
}

func (kp KeywordPlugin) VerifySecrets() {
	return
}

func GenerateMulitRegex() {
	denyListStr := strings.Join(DENYLIST, "|")
	DENYLIST_REGEX := "(" + denyListStr + ")" + AFFIX_REGEX
	DENYLIST_REGEX_WITH_PREFIX = AFFIX_REGEX + "(" + denyListStr + ")"

	// Individual matching rules
	FOLLOWED_BY_COLON_EQUAL_SIGNS_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + CLOSING + ")?" + OPTIONAL_WHITESPACE + ":=" + OPTIONAL_WHITESPACE + "(" + QUOTE + "?)" + "(" + SECRET + ")")
	PRECEDED_BY_EQUAL_COMPARISON_SIGNS_QUOTES_REQUIRED_REGEX, _ = regexp.Compile("(" + QUOTE + ")" + "(" + SECRET + ")" + "(" + QUOTE + ")" + OPTIONAL_WHITESPACE + "[!=]{2,3}" + OPTIONAL_WHITESPACE + DENYLIST_REGEX_WITH_PREFIX)
	FOLLOWED_BY_EQUAL_SIGNS_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + CLOSING + ")?" + OPTIONAL_WHITESPACE + "(={1,3}|!==?)" + OPTIONAL_WHITESPACE + "(" + QUOTE + "?)" + "(" + SECRET + ")")
	FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX, _ = regexp.Compile(DENYLIST_REGEX + OPTIONAL_NON_WHITESPACE + OPTIONAL_WHITESPACE + "(" + QUOTE + ")" + "(" + SECRET + ")" + ";")
	FOLLOWED_BY_COLON_QUOTES_REQUIRED_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + CLOSING + ")?:(" + OPTIONAL_WHITESPACE + ")" + "(" + QUOTE + ")" + "(" + SECRET + ")")
	FOLLOWED_BY_EQUAL_SIGNS_QUOTES_REQUIRED_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + CLOSING + ")?" + OPTIONAL_WHITESPACE + "(={1,3}|!==?)" + OPTIONAL_WHITESPACE + "(" + QUOTE + ")" + "(" + SECRET + ")")
	FOLLOWED_BY_ARROW_FUNCTION_SIGN_QUOTES_REQUIRED_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + CLOSING + ")?" + OPTIONAL_WHITESPACE + "=>? " + OPTIONAL_WHITESPACE + "(" + QUOTE + ")" + "(" + SECRET + ")")
	FOLLOWED_BY_COLON_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + CLOSING + ")?:" + OPTIONAL_WHITESPACE + "(" + QUOTE + "?)" + "(" + SECRET + ")")
	FOLLOWED_BY_OPTIONAL_ASSIGN_QUOTES_REQUIRED_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(.assign)?\\((\")(" + SECRET + ")")
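	// Use "=" rather than ":=" so that the package-level variable is assigned instead of being shadowed by a local one.
	FOLLOWED_BY_EQUAL_SIGNS_OPTIONAL_BRACKETS_OPTIONAL_AT_SIGN_QUOTES_REQUIRED_REGEX, _ = regexp.Compile(DENYLIST_REGEX + "(" + SQUARE_BRACKETS + ")?" + OPTIONAL_WHITESPACE + "[!=]{1,2}" + OPTIONAL_WHITESPACE + "(@)?\"" + "(" + SECRET + ")")
	// Rule groups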
	GOLANG_DENYLIST_REGEX_TO_GROUP = []*regexp.Regexp{FOLLOWED_BY_COLON_EQUAL_SIGNS_REGEX, PRECEDED_BY_EQUAL_COMPARISON_SIGNS_QUOTES_REQUIRED_REGEX, FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX, FOLLOWED_BY_EQUAL_SIGNS_REGEX}
	QUOTES_REQUIRED_DENYLIST_REGEX_TO_GROUP = []*regexp.Regexp{PRECEDED_BY_EQUAL_COMPARISON_SIGNS_QUOTES_REQUIRED_REGEX, FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX, FOLLOWED_BY_COLON_QUOTES_REQUIRED_REGEX, FOLLOWED_BY_ARROW_FUNCTION_SIGN_QUOTES_REQUIRED_REGEX, FOLLOWED_BY_EQUAL_SIGNS_QUOTES_REQUIRED_REGEX}
	CONFIG_DENYLIST_REGEX_TO_GROUP = []*regexp.Regexp{FOLLOWED_BY_COLON_REGEX, PRECEDED_BY_EQUAL_COMPARISON_SIGNS_QUOTES_REQUIRED_REGEX, FOLLOWED_BY_EQUAL_SIGNS_REGEX, FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX}
	C_PLUS_PLUS_REGEX_TO_GROUP = []*regexp.Regexp{FOLLOWED_BY_OPTIONAL_ASSIGN_QUOTES_REQUIRED_REGEX, FOLLOWED_BY_EQUAL_SIGNS_QUOTES_REQUIRED_REGEX}
	COMMON_C_DENYLIST_REGEX_TO_GROUP = []*regexp.Regexp{FOLLOWED_BY_EQUAL_SIGNS_OPTIONAL_BRACKETS_OPTIONAL_AT_SIGN_QUOTES_REQUIRED_REGEX}
}

func GetRegexByFileType(fileName string) []*regexp.Regexp {
	fileExtension := filepath.Ext(fileName)
	switch fileExtension {
	case ".go":
		return GOLANG_DENYLIST_REGEX_TO_GROUP
	case ".cls", ".java", ".js", ".swift", ".tf", ".py", ".pyi":
		return QUOTES_REQUIRED_DENYLIST_REGEX_TO_GROUP
	case ".yml", ".yaml", ".eyaml", ".cnf", ".conf", ".cfg", ".cf", ".ini", ".properties", ".toml", ".xml":
		return CONFIG_DENYLIST_REGEX_TO_GROUP
	case ".cpp":
		return C_PLUS_PLUS_REGEX_TO_GROUP
	case ".m", ".c", ".cs":
		return COMMON_C_DENYLIST_REGEX_TO_GROUP
	case ".sh", ".sql", ".txt":
		return []*regexp.Regexp{}
	}
	return nil
}

Entropy plugin

package plugin

import (
	"regexp"
	"secret-scan/src/infrastructure/logger"
	"secret-scan/src/infrastructure/util"
)

// EntropyPlugin flags quoted strings whose Shannon entropy exceeds Limit
type EntropyPlugin struct {
	SecretType string         `yaml:"secret_type"`
	Charset    string         `yaml:"charset"`
	Regex      *regexp.Regexp `yaml:"regex"`
	Limit      float64        `yaml:"limit"`
}

func NewEntropyPlugin(secretType string, charset string, limit float64) *EntropyPlugin {
	if limit < 0 || limit > 8 {
		logger.Logger.Error("Limit parameter value error")
		return nil
	}
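	regex, err := regexp.Compile(`([\'"])([` + charset + `]+)([\'"])`)
	if err != nil {
		// An invalid charset would otherwise leave a nil regex that panics in MatchSecrets.
		logger.Logger.Error("compile entropy regexp error", err)
		return nil
	}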
	return &EntropyPlugin{
		SecretType: secretType,
		Charset:    charset,
		Regex:      regex,
		Limit:      limit,
	}
}

func (ep EntropyPlugin) MatchSecrets(line string, fileName string) []string {
	matchList := []string{}
	match := ep.Regex.FindStringSubmatch(line)
	if len(match) > 0 && match[1] == match[3] {
		shannonentropy := util.CalculateShannonEntropy(match[2])
		if shannonentropy > ep.Limit {
			matchList = append(matchList, match[2])
		}
	}
	return matchList
}

func (ep EntropyPlugin) VerifySecrets() {
	return
}

Regular expression plugin

package plugin

import (
	"regexp"
	"secret-scan/src/infrastructure/logger"
)

// RegexpPlugin matches lines against a list of compiled regular expressions
type RegexpPlugin struct {
	SecretType string           `yaml:"secret_type"`
	DenyList   []*regexp.Regexp `yaml:"deny_list"`
}

func NewRegexpPlugin(secretType string, ruleList []string) *RegexpPlugin {
	DENYLIST := []*regexp.Regexp{}
	for _, rule := range ruleList {
		denyRegexp, err := regexp.Compile(rule)
		if err != nil {
			logger.Logger.Error("compile regexp err", err)
			// Skip the invalid rule; appending a nil *Regexp would panic in MatchSecrets.
			continue
		}
		DENYLIST = append(DENYLIST, denyRegexp)
	}
	return &RegexpPlugin{
		SecretType: secretType,
		DenyList:   DENYLIST,
	}
}

func (rp RegexpPlugin) MatchSecrets(line string, fileName string) []string {
	matchList := []string{}
	for _, matchSecretRegexp := range rp.DenyList {
		secretList := matchSecretRegexp.FindAllString(line, -1)
		for _, submatch := range secretList {
			matchList = append(matchList, submatch)
		}
	}
	return matchList
}

func (rp RegexpPlugin) VerifySecrets() {
	return
}

Secret collection type:

package potentialSecret

import (
	"encoding/csv"
	"fmt"
	"os"
	"secret-scan/src/infrastructure/logger"
	"strconv"
	"time"
)

type PotentialSecretCollection struct {
	Target string
	Result []PotentialSecret
}

func (psc PotentialSecretCollection) SaveResultToCSV(outputDir string) {
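	// Create the output directory if needed and build the CSV file path
	timestampStr := fmt.Sprintf("%d", time.Now().Unix())
	if outputDir == "" {
		// MkdirAll does not fail when the directory already exists
		err := os.MkdirAll("./output", 0700)
		if err != nil {
			logger.Logger.Error("Output dir create error.", err)
		}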
		outputDir = "./output/scan_result_" + timestampStr + ".csv"
	} else {
		outputDir = outputDir + "/scan_result_" + timestampStr + ".csv"
	}
	file, err := os.Create(outputDir)
	if err != nil {
		logger.Logger.Error("File create error", err)
		return
	}
	defer file.Close()
	writer := csv.NewWriter(file)
	err = writer.Write([]string{"File path", "Secret type", "Secret value", "Line number"})
	if err != nil {
		logger.Logger.Error("File write error", err)
		return
	}
	for _, secret := range psc.Result {
		err = writer.Write([]string{secret.FilePath, secret.SecretType, secret.Secret, strconv.Itoa(secret.LineNumber)})
		if err != nil {
			logger.Logger.Error("File write error", err)
			return
		}
	}
	writer.Flush()
	if err := writer.Error(); err != nil {
		logger.Logger.Error("File flush error", err)
		return
	}
	logger.Logger.Info("CSV file is saved!")
}

Potential secret type:

package potentialSecret

// PotentialSecret is a candidate secret found in a file
type PotentialSecret struct {
	SecretType string
	FilePath   string
	Secret     string
	LineNumber int
}

Go: walking a directory recursively

err := filepath.WalkDir("targetDict", func(path string, info os.DirEntry, err error) error {
	if err != nil {
		fmt.Printf("error visiting path %s: %v\n", path, err)
		return nil
	}
	if info.IsDir() {
		fmt.Printf("directory: %s\n", path)
	} else {
		fmt.Printf("file: %s\n", path)
	}
	return nil
})
if err != nil {
	fmt.Printf("error walking directory %s: %v\n", "targetDict", err)
}

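Go: computing the Shannon entropy of a string

// CalculateShannonEntropy returns the entropy of input in bits per character.
// It requires the "math" package.
func CalculateShannonEntropy(input string) float64 {
	frequencies := make(map[rune]int)
	// Count runes rather than bytes: len(input) is the byte length, which
	// would skew the probabilities for non-ASCII input.
	totalChars := 0
	for _, char := range input {
		frequencies[char]++
		totalChars++
	}
	entropy := 0.0
	for _, count := range frequencies {
		probability := float64(count) / float64(totalChars)
		entropy -= probability * math.Log2(probability)
	}
	return entropy
}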

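Building the command line with cobra

import "github.com/spf13/cobra"

// In main.go:
if err := cmd.RootCmd.Execute(); err != nil {
	fmt.Println(err)
	os.Exit(1)
}

// Create the root command
var RootCmd = &cobra.Command{
	Use:   "command_name",
	Short: "short des",
	Long:  "long des",
	Run: func(cmd *cobra.Command, args []string) {
		fmt.Println("Hello, World!")
	},
}

var ssss string

// subCmd is a placeholder subcommand; it is not defined in the original notes.
var subCmd = &cobra.Command{
	Use:   "sub_command_name",
	Short: "short des",
	Run: func(cmd *cobra.Command, args []string) {
		fmt.Println(ssss)
	},
}

// Register the flag and the subcommand
func init() {
	// Bind the --ssss flag to the variable (a flag needs a non-empty name)
	subCmd.Flags().StringVar(&ssss, "ssss", "", "Used to s")
	// Attach the subcommand to the root command
	RootCmd.AddCommand(subCmd)
}

Usage: go run main.go sub_command_name --ssss="value"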

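Go: running multiple goroutines while capping their total number

resultChan := make(chan string) // unbuffered, so something must read from it while the scans run
maxConcurrency := 5             // maximum number of concurrent goroutines
var wg sync.WaitGroup
semaphore := make(chan struct{}, maxConcurrency)

// For each file path to scan:
wg.Add(1)
semaphore <- struct{}{} // acquire a slot; blocks while maxConcurrency goroutines are running
go func() {
	scanFile(path, &wg, resultChan) // scanFile receives &wg, presumably to call wg.Done() when it finishes
	<-semaphore                     // release the slot so that another goroutine can start
}()

wg.Wait()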

Reading lines from different file types

// ReadFileLines returns the lines of a file, choosing the reader by file type
func ReadFileLines(fileName string) ([]string, error) {
	extension := filepath.Ext(fileName)
	switch extension {
	case ".zip":
		zipFileContent, err := ReadZipFileLines(fileName)
		return zipFileContent, err
	case ".gz":
		tarGZFileContent, err := ReadTarGZFileLines(fileName)
		return tarGZFileContent, err
	default:
		commonFileContent, err := ReadCommonFileLines(fileName)
		return commonFileContent, err
	}
}

// ReadCommonFileLines reads the lines of an ordinary (uncompressed) file
func ReadCommonFileLines(filename string) ([]string, error) {
	file, err := os.Open(filename)
	if err != nil {
		logger.Logger.Error("can not open file error: %s", err)
		return nil, err
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	scanner.Buffer(make([]byte, MaxTokenSize), MaxTokenSize)
	var lines []string
	for scanner.Scan() {
		line := scanner.Text()
		lines = append(lines, line)
	}
	if err := scanner.Err(); err != nil {
		logger.Logger.Error("read file error: %s", err)
		return nil, err
	}
	return lines, nil
}

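// ReadZipFileLines reads the lines of every file inside a ZIP archive
func ReadZipFileLines(filePath string) ([]string, error) {
	zipFile, err := zip.OpenReader(filePath)
	if err != nil {
		// Return the error instead of calling log.Fatal, which would abort the whole scan.
		logger.Logger.Error("open zip file error: %s", err)
		return nil, err
	}
	defer zipFile.Close()

	var lines []string
	// Iterate over the files in the archive
	for _, zipFileInfo := range zipFile.Reader.File {
		srcFile, err := zipFileInfo.Open()
		if err != nil {
			logger.Logger.Error("open zip entry error: %s", err)
			return nil, err
		}
		// Read the entry line by line
		scanner := bufio.NewScanner(srcFile)
		scanner.Buffer(make([]byte, MaxTokenSize), MaxTokenSize)
		for scanner.Scan() {
			lines = append(lines, scanner.Text())
		}
		scanErr := scanner.Err()
		srcFile.Close() // close each entry before opening the next one
		if scanErr != nil {
			logger.Logger.Error("read file error: %s", scanErr)
			return nil, scanErr
		}
	}
	return lines, nil
}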

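// ReadTarGZFileLines reads the lines of every regular file inside a tar.gz archive
func ReadTarGZFileLines(filePath string) ([]string, error) {
	// Open the tar.gz file
	file, err := os.Open(filePath)
	if err != nil {
		logger.Logger.Error("can not open file error: %s", err)
		return nil, err
	}
	defer file.Close()

	// Create the gzip reader
	gzipReader, err := gzip.NewReader(file)
	if err != nil {
		logger.Logger.Error("create gzip reader error:", err)
		return nil, err
	}
	defer gzipReader.Close()

	// Create the tar reader
	tarReader := tar.NewReader(gzipReader)

	// Read the entries of the tar archive one by one
	var lines []string
	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			// header is nil after an error, so stop instead of dereferencing it
			logger.Logger.Error("read tar header error:", err)
			return nil, err
		}
		if header.Typeflag != tar.TypeReg {
			continue
		}
		// tarReader yields only the current entry, so scan it line by line;
		// a single Read call is not guaranteed to fill a buffer of header.Size bytes
		scanner := bufio.NewScanner(tarReader)
		scanner.Buffer(make([]byte, MaxTokenSize), MaxTokenSize)
		for scanner.Scan() {
			lines = append(lines, scanner.Text())
		}
		if err := scanner.Err(); err != nil {
			logger.Logger.Error("read file error:", err)
			return nil, err
		}
	}
	return lines, nil
}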

Rule configuration code

var (
	GlobalRegexpPluginDenylist map[string][]string
	GlobalEntropyPluginParm    map[string][2]string
	GlobalPluginMap            map[string]plugin.Plugin
)

// InitPluginConfig initializes the plugin configuration
func InitPluginConfig() {
	initGlobalRegexpPluginDenylist()
	initGlobalEntropyPluginParam()
	registerGlobalPlugin()
}

func initGlobalRegexpPluginDenylist() {
	GlobalRegexpPluginDenylist = map[string][]string{
		"Artifactory Credentials":          {"(?:\\s|=|:|\"|^)AKC[a-zA-Z0-9]{10,}(?:\\s|\"|$)", "(?:\\s|=|:|\"|^)AP[\\dABCDEF][a-zA-Z0-9]{8,}(?:\\s|\"|$)"},
		"AWS Access Key":                   {"AKIA[0-9A-Z]{16}", "aws.{0,20}?(?:key|pwd|pw|password|pass|token).{0,20}?['\"]([0-9a-zA-Z/+]{40})['\"]"},
		"Azure Storage Account access key": {"AccountKey=[a-zA-Z0-9+\\/=]{88}"},
		"Discord Bot Token":                {"[MNO][a-zA-Z\\d_-]{23,25}\\.[a-zA-Z\\d_-]{6}\\.[a-zA-Z\\d_-]{27}"},
		"GitHub Token":                     {"(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}"},
		"JSON Web Token":                   {"eyJ[A-Za-z0-9-_=]+\\.[A-Za-z0-9-_=]+\\.?[A-Za-z0-9-_.+/=]*?"},
		"Mailchimp Access Key":             {"[0-9a-z]{32}-us[0-9]{1,2}"},
		"NPM tokens":                       {"\\/\\/.+\\/:_authToken=\\s*((npm_.+)|([A-Fa-f0-9-]{36})).*"},
		// "$" must be escaped; unescaped it is the end-of-text anchor and the rule can never match
		"PayPal Token":                     {"access_token\\$production\\$[0-9a-z]{16}\\$[0-9a-f]{32}"},
		"Facebook Token":                   {"[1-9][0-9]+-[0-9a-zA-Z]{40}"},
		"SendGrid API Key":                 {"SG\\.[a-zA-Z0-9_-]{22}\\.[a-zA-Z0-9_-]{43}"},
		"Slack Token":                      {"xox(?:a|b|p|o|s|r)-(?:\\d+-)+[a-z0-9]+", "https://hooks\\.slack\\.com/services/T[a-zA-Z0-9_]+/B[a-zA-Z0-9_]+/[a-zA-Z0-9_]+"},
		"Private Key":                      {"BEGIN DSA PRIVATE KEY", "BEGIN EC PRIVATE KEY", "BEGIN OPENSSH PRIVATE KEY", "BEGIN PGP PRIVATE KEY BLOCK", "BEGIN PRIVATE KEY", "BEGIN RSA PRIVATE KEY", "BEGIN SSH2 ENCRYPTED PRIVATE KEY", "PuTTY-User-Key-File-2"},
		"Basic Auth Credentials":           {"://[^:/\\?\\#\\[\\]@!\\$\\&'\\(\\)\\*\\+,;=\\s]+:([^:/\\?\\#\\[\\]@!\\$\\&'\\(\\)\\*\\+,;=\\s]+)@"},
		"Square OAuth Secret":              {"sq0csp-[0-9A-Za-z\\\\\\-_]{43}"},
		"Stripe Access Key":                {"(?:r|s)k_live_[0-9a-zA-Z]{24}"},
		"Twilio API Key":                   {"AC[a-z0-9]{32}", "SK[a-z0-9]{32}"},
		"IBM Cloud IAM Key":                {"(?i)(?:\"|'|)(?:ibm(?:_|-|)cloud(?:_|-|)iam|cloud(?:_|-|)iam|ibm(?:_|-|)cloud|ibm(?:_|-|)iam|ibm|iam|cloud|)(?:_|-|)(?:api|)(?:_|-|)(?:key|pwd|password|pass|token)(?:\"|'|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:\"|'|)(?: *)([a-zA-Z0-9_\\-]{44})(?:\"|'|)"},
		"IBM COS HMAC Credentials":         {"(?i)(?:secret[-_]?(?:access)?[-_]?key)(?:\"|\\'|)(?:\\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:\"|\\'|)([a-f0-9]{48})(?:\"|\\'|)"},
		"SoftLayer Credentials":            {"(?i)(?:softlayer|sl)(?:_|-|)(?:api|)(?:_|-|)(?:key|pwd|password|pass|token)(?:\"|\\'|)(?:\\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:\"|\\'|)([a-z0-9]{64})(?:\"|\\'|)", "(?:http|https)://api.softlayer.com/soap/(?:v3|v3.1)/([a-z0-9]{64})"},
		"Cloudant Credentials":             {"(?:cloudant|cl|clou)(?:_|-|)(?:api|)(?:key|pwd|pw|password|pass|token)(?:\"|\\'|)(?:\\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:\"|\\'|)([0-9a-f]{64})(?:\"|\\'|)", "(?:cloudant|cl|clou)(?:_|-|)(?:api|)(?:key|pwd|pw|password|pass|token)(?:\"|\\'|)(?:\\]|)(?: *)(?:=|:|:=|=>| +|::)(?: *)(?:\"|\\'|)([a-z]{24})(?:\"|\\'|)", "(?:https?\\:\\/\\/)[\\w\\-]+\\:([0-9a-f]{64})\\@[\\w\\-]+\\.cloudant\\.com", "(?:https?\\:\\/\\/)[\\w\\-]+\\:([a-z]{24})\\@[\\w\\-]+\\.cloudant\\.com"},
		"Personal Information":             {"^[1-9]\\d{5}(19|20)\\d{2}((0[1-9])|(1[0-2]))(([0-2][1-9])|10|20|30|31)\\d{3}[Xx\\d]$", "^[A-Za-z0-9\u4e00-\u9fa5.]+@[a-zA-Z0-9_-]+(.[a-zA-Z0-9_-]+)+$"},
		"IP Address":                       {"((2(5[0-5]|[0-4]\\d))|[0-1]?\\d{1,2})(\\.((2(5[0-5]|[0-4]\\d))|[0-1]?\\d{1,2})){3}"},
	}
}

func initGlobalEntropyPluginParam() {
	GlobalEntropyPluginParm = make(map[string][2]string)
	GlobalEntropyPluginParm = map[string][2]string{
		"Base64 High Entropy String": {"\\w-=+", "4.5"},
		"Hex High Entropy String":    {"0123456789abcdefABCDEF", "3.0"},
	}
}

func registerGlobalPlugin() {
	// Register the regular expression plugins
	GlobalPluginMap = make(map[string]plugin.Plugin)
	for secretType, denyList := range GlobalRegexpPluginDenylist {
		newRegexpPlugin := plugin.NewRegexpPlugin(secretType, denyList)
		GlobalPluginMap[secretType] = newRegexpPlugin
	}
	// Register the entropy plugins
	for secretType, param := range GlobalEntropyPluginParm {
		limit, err := strconv.ParseFloat(param[1], 64)
		if err != nil {
			// Skip a plugin whose limit is not a valid number
			continue
		}
		GlobalPluginMap[secretType] = plugin.NewEntropyPlugin(secretType, param[0], limit)
	}
	// Register the keyword plugin
	GlobalPluginMap["Keyword"] = plugin.NewKeywordPlugin("Keyword")
}

func GetEnablePlugins() map[string]plugin.Plugin {
	return GlobalPluginMap
}

Parameter configuration code

package policy

import (
	"secret-scan/src/infrastructure/util"
	"strings"
	"sync"

	"golang.org/x/sync/semaphore"
)

var GlobalScanPolicy ScanPolicy
var once sync.Once

// ScanPolicy holds the filter policy and the scan parameters
type ScanPolicy struct {
	FileFilter      FileFilterPolicy   `yaml:"file_filter"`
	SecretFilter    SecretFilterPolicy `yaml:"secret_filter"`
	ScanConcurrency int                `yaml:"scan_concurrency"`
	FilingData      map[string]bool
	ScanAllFiles    bool
}

func InitScanPolicy() ScanPolicy {
	once.Do(func() {
		FilingDataMap := make(map[string]bool)
		GlobalScanPolicy = ScanPolicy{FilingData: FilingDataMap}
	})
	return GlobalScanPolicy
}

type FileFilterPolicy struct {
	DisableFileName []string `yaml:"disable_file_name"`
}

type SecretFilterPolicy struct {
	Ignore []string `yaml:"ignore"`
}

// The setters use pointer receivers so that the changes are kept. With value
// receivers they would only modify a copy of the policy, so they should be
// called on policy.GlobalScanPolicy itself or on a pointer to it.
func (sp *ScanPolicy) SetFileFilters(filterFiles string) {
	sp.FileFilter.DisableFileName = append(sp.FileFilter.DisableFileName, strings.Split(filterFiles, ",")...)
}

func (sp *ScanPolicy) SetSecretFilters(filterSecrets string) {
	sp.SecretFilter.Ignore = append(sp.SecretFilter.Ignore, strings.Split(filterSecrets, ",")...)
}

func (sp *ScanPolicy) SetScanConcurrency(scanConcurrency int) {
	sp.ScanConcurrency = scanConcurrency
}

// SetFilingFile loads previously filed findings from a CSV file. Each row is
// concatenated and hashed, which matches the key built in isSecretFilterOut.
func (sp *ScanPolicy) SetFilingFile(filePath string) {
	if filePath != "" {
		records, _ := util.ReadCSVFile(filePath)
		for _, row := range records {
			tempString := ""
			for _, col := range row {
				tempString += col
			}
			secretSha256 := util.GetContentHashSHA256(tempString)
			sp.FilingData[secretSha256] = true
		}
	}
}

func (sp *ScanPolicy) SetScanAllFiles(scanAllFiles bool) {
	sp.ScanAllFiles = scanAllFiles
}

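func (sp ScanPolicy) GetScanConcurrency() *semaphore.Weighted {
	// A semaphore of size 0 would make every Acquire block forever, so fall back to 1.
	if sp.ScanConcurrency < 1 {
		return semaphore.NewWeighted(1)
	}
	return semaphore.NewWeighted(int64(sp.ScanConcurrency))
}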

Main scan flow

package application

import (
	"context"
	"os"
	"path/filepath"
	"secret-scan/src/domain/plugin"
	"secret-scan/src/domain/policy"
	"secret-scan/src/domain/potentialSecret"
	"secret-scan/src/infrastructure/conf"
	"secret-scan/src/infrastructure/logger"
	"secret-scan/src/infrastructure/util"
	"strconv"
	"sync"
)

type ScanService struct {
}

func (ss ScanService) ScanDirectory(directoryName string) potentialSecret.PotentialSecretCollection {
	scanResults := []potentialSecret.PotentialSecret{}
	enablePlugins := conf.GetEnablePlugins()
	var wg sync.WaitGroup
	var mu sync.Mutex // guards scanResults, which is shared by all goroutines
	sem := policy.GlobalScanPolicy.GetScanConcurrency()
	err := filepath.WalkDir(directoryName, func(path string, info os.DirEntry, err error) error {
		if err != nil {
			logger.Logger.Error("visit dir %s error:%v\n", path, err)
			return nil
		}
		if !info.IsDir() {
			wg.Add(1)
			go func() {
				defer wg.Done()
				if err := sem.Acquire(context.Background(), 1); err != nil {
					return
				}
				defer sem.Release(1)
				scanFileResults := ss.ScanFile(path, enablePlugins)
				// Appending from several goroutines without a lock is a data race
				mu.Lock()
				scanResults = append(scanResults, scanFileResults...)
				mu.Unlock()
			}()
		}
		return nil
	})
	if err != nil {
		logger.Logger.Error("walk dir %s error:%v\n", directoryName, err)
	}
	wg.Wait()
	potentialSecretCollection := potentialSecret.PotentialSecretCollection{directoryName, scanResults}
	return potentialSecretCollection
}

func (ss ScanService) ScanFile(filePath string, enablePlugins map[string]plugin.Plugin) []potentialSecret.PotentialSecret {
	if ss.isFileFilterOut(filePath) {
		return nil
	}
	// Assumes the second return value of util.ReadFileLines is an error.
	lines, err := util.ReadFileLines(filePath)
	if err != nil {
		logger.Logger.Errorf("read file %s error: %v", filePath, err)
		return nil
	}
	return ss.ScanLines(lines, filePath, enablePlugins)
}

func (ss ScanService) ScanLines(lines []string, fileName string, enablePlugins map[string]plugin.Plugin) []potentialSecret.PotentialSecret {
	potentialSecretLis := []potentialSecret.PotentialSecret{}
	for lineIndex, line := range lines {
		lineNumber := lineIndex + 1 // report 1-based line numbers
		// The loop variable is named p so that it does not shadow the plugin package.
		for name, p := range enablePlugins {
			for _, secret := range p.MatchSecrets(line, fileName) {
				if ss.isSecretFilterOut(fileName, name, secret, lineNumber) {
					continue
				}
				//logger.Logger.Info("type:", name, "matched content:", secret, "line:", lineNumber, "file:", fileName)
				newPotentialSecret := potentialSecret.PotentialSecret{name, fileName, secret, lineNumber}
				potentialSecretLis = append(potentialSecretLis, newPotentialSecret)
			}
		}
	}
	return potentialSecretLis
}

// isFileFilterOut reports whether the file should be skipped.
func (ss ScanService) isFileFilterOut(filePath string) bool {
	fileName := filepath.Base(filePath)
	if !util.IsDefaultCodeFile(fileName) && !policy.GlobalScanPolicy.ScanAllFiles {
		return true
	}
	for _, disableFileName := range policy.GlobalScanPolicy.FileFilter.DisableFileName {
		if fileName == disableFileName {
			return true
		}
	}
	return false
}

// isSecretFilterOut reports whether the secret was already filed or is on the ignore list.
func (ss ScanService) isSecretFilterOut(fileName string, secretType string, secret string, lineNumber int) bool {
	secretSha256 := util.GetContentHashSHA256(fileName + secretType + secret + strconv.Itoa(lineNumber))
	if policy.GlobalScanPolicy.FilingData[secretSha256] {
		return true
	}
	for _, ignoreSecret := range policy.GlobalScanPolicy.SecretFilter.Ignore {
		if secret == ignoreSecret {
			return true
		}
	}
	return false
}
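
The concurrency pattern in ScanDirectory (bounded parallelism plus a shared result slice) can be shown in isolation. This is a minimal sketch using only the standard library: a buffered channel stands in for semaphore.Weighted, and scanAll and its scan callback are hypothetical names rather than part of the tool.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scanAll runs scan on every path with at most limit goroutines in flight
// and collects all results under a mutex.
func scanAll(paths []string, limit int, scan func(string) []string) []string {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []string
	)
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	for _, p := range paths {
		sem <- struct{}{} // blocks while limit scans are already running
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this scan finishes
			found := scan(path)
			mu.Lock() // append is not safe for concurrent use
			results = append(results, found...)
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	sort.Strings(results) // goroutines finish in no fixed order
	return results
}

func main() {
	paths := []string{"a.go", "b.go", "c.go"}
	out := scanAll(paths, 2, func(path string) []string {
		return []string{path + ": finding"}
	})
	fmt.Println(out)
}
```

Acquiring the slot before starting the goroutine, as done here, also bounds the number of goroutines that exist at once; acquiring inside the goroutine bounds only the number of scans that run at once.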

Test cases guarding the detection feature

import (
	"secret-scan/src/domain/plugin"
	"testing"
)

func TestAWSPlugin(t *testing.T) {
	testData := map[string]bool{"AKIAZZZZZZZZZZZZZZZZ": true,
		"akiazzzzzzzzzzzzzzzz": false,
		"AKIAZZZ":              false,
		"aws_access_key = \"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\"":  true,
		"aws_access_key = \"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEYa\"": false,
		"aws_access_key = \"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKE\"":   false,
	}
	newRegexpPlugin := plugin.NewRegexpPlugin("AWS Access Key", []string{"AKIA[0-9A-Z]{16}", "aws.{0,20}?(?:key|pwd|pw|password|pass|token).{0,20}?['\"]([0-9a-zA-Z/+]{40})['\"]"})
	passedNum := GetPluginPassedNum(testData, newRegexpPlugin)
	if len(testData) != passedNum {
		t.Errorf("Not all test cases passed, all: %d, passed: %d", len(testData), passedNum)
	}
}

func GetPluginPassedNum(testData map[string]bool, newPlugin plugin.Plugin) int {
	passedItem := 0
	for value, expected := range testData {
		result := newPlugin.MatchSecrets(value, "")
		if (len(result) > 0 && expected) || (len(result) == 0 && !expected) {
			passedItem += 1
		}
	}
	return passedItem
}
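
To see what the two AWS patterns in this test accept without the plugin package, the sketch below applies them with the standard regexp package. The internals of NewRegexpPlugin and MatchSecrets are not shown in this post, so matchAWS is a hypothetical stand-in: it returns capture group 1 when a pattern defines one and the whole match otherwise.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns are copied from the test above.
var awsPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),
	regexp.MustCompile(`aws.{0,20}?(?:key|pwd|pw|password|pass|token).{0,20}?['"]([0-9a-zA-Z/+]{40})['"]`),
}

// matchAWS returns every candidate secret found in line.
func matchAWS(line string) []string {
	var found []string
	for _, re := range awsPatterns {
		for _, m := range re.FindAllStringSubmatch(line, -1) {
			if len(m) > 1 {
				found = append(found, m[1]) // the quoted 40-character value
			} else {
				found = append(found, m[0]) // the whole access key ID
			}
		}
	}
	return found
}

func main() {
	fmt.Println(matchAWS(`aws_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"`))
	fmt.Println(len(matchAWS("akiazzzzzzzzzzzzzzzz")))
}
```

The second pattern requires exactly 40 characters between the quotes, which is why the 39- and 41-character values in the test data are expected not to match.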

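Summary:
1 In a YAML file, a string that begins with [ must be quoted as a whole, e.g. [1 2 3] --> '[1 2 3]', because an unquoted [ starts a YAML flow sequence;
otherwise an error is reported.
2 Error when iterating over the lines of a file, and how to handle it
Error message: bufio.Scanner: token too long
The default maximum token size of bufio.Scanner is 64 KB (bufio.MaxScanTokenSize), so any longer line triggers this error. Raising the limit fixes it: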

// Increase the buffer size of the scanner
const maxTokenSize = 1024 * 1024 // 1 MB
scanner := bufio.NewScanner(file)
scanner.Buffer(make([]byte, maxTokenSize), maxTokenSize)
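
A complete, runnable version of this fix might look as follows. It is a sketch using only the standard library; readLines and linesOf are hypothetical helper names, and the 1 MB limit is the value from the snippet above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines reads all lines from r, accepting lines up to maxTokenSize bytes.
func readLines(r io.Reader, maxTokenSize int) ([]string, error) {
	scanner := bufio.NewScanner(r)
	// Start with a 64 KB buffer and let the scanner grow it up to maxTokenSize.
	scanner.Buffer(make([]byte, 0, 64*1024), maxTokenSize)
	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	// Err reports bufio.ErrTooLong if a line exceeded the limit.
	return lines, scanner.Err()
}

// linesOf is a convenience wrapper for in-memory input.
func linesOf(s string, maxTokenSize int) ([]string, error) {
	return readLines(strings.NewReader(s), maxTokenSize)
}

func main() {
	long := strings.Repeat("x", 200000) // longer than the 64 KB default limit
	lines, err := linesOf(long+"\nshort\n", 1024*1024)
	fmt.Println(len(lines), err)
}
```

Checking scanner.Err() after the loop matters: when a line is too long, Scan simply returns false, and without the check the rest of the file is skipped silently.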

Design:
[Figure: design diagram; image not available]
Implementation:
[Figure: implementation diagram; image not available]

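Tool features:

1 Features:
    1. Directory scanning
    2. Configurable number of concurrent scanning goroutines
    3. Filtering by file name and by secret value
    4. Detection with the built-in rules of the open-source tool detect-secrets, including:
        Regex plugins (21 types, covering authentication keys of major international vendor platforms, among others)
        Entropy plugins (2 types: base64 and hexadecimal)
        Keyword plugins (detection for 5 broad categories of files, such as configuration files, Go, and C-family languages)
        Other plugins (extensible, e.g. recognition of personal credential information)
    5. Logging
    6. Custom output path for the scan result file
    7. Scanning of zip and tar.gz archives
    8. A suite of over a thousand test cases guarding the functionality

2 Advantages:
    1. Faster scanning
    2. Broader coverage (open-source tool rules plus additional ones)
    3. More readable results (scan results presented as a CSV table)
    4. Lower false-positive rate (filing feature + a self-learning machine-learning model)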
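
The entropy plugins listed among the features rely on Shannon entropy to flag strings that look randomly generated. The sketch below is a simplified illustration, not the detect-secrets implementation: it measures entropy over the characters actually present in the string, and the threshold of 3.0 bits per character is an assumption chosen only for the example.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	counts := map[rune]int{}
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// looksRandom reports whether s exceeds the given entropy threshold.
// The caller chooses the threshold; no particular value is prescribed here.
func looksRandom(s string, threshold float64) bool {
	return shannonEntropy(s) > threshold
}

func main() {
	for _, s := range []string{"aaaaaaaaaaaaaaaa", "password", "9f86d081884c7d65"} {
		fmt.Printf("%-18s %.2f %v\n", s, shannonEntropy(s), looksRandom(s, 3.0))
	}
}
```

A lower threshold catches more generated secrets but also flags more ordinary identifiers, which is the recall versus false-positive trade-off described earlier in this post.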

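Usage:

  1. To scan the directory file_dir, run the compiled binary followed by any combination of the options below
./main scan file_dir
  2. Options (may be combined)
Option 1:  filing_file                           (path to the filing file)
Option 2:  filter_files                          (file names to skip)
Option 3:  filter_secrets                        (secret values to ignore)
Option 4:  scan_concurrency                      (number of concurrent scans when scanning a directory)
Option 5:  output_dir                            (output location for the scan results, ending in .csv)
Option 6:  scan_all_files                        (whether to scan all files; by default only code-related files are scanned)

./main scan file_dir --filter_files fileA, fileB
./main scan file_dir --filter_secrets xxa, aab
  3. Configuration file
Configuration file path:  config/scan-config.yml
Configuration file format:
file_filter:
  disable_file_name:
    - xxx.txt
secret_filter:
  ignore:
    - xxx
scan_concurrency: 5
  4. Output files
Log file                 output path: output/secret-scan.log
Scan result CSV file     default output path: output/scan_result.csv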
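
The CSV result file can be produced with the standard encoding/csv package. The post does not show the tool's actual column layout, so the finding type, the column names, and the helpers writeFindings and findingsCSV below are all hypothetical, modelled on the four fields of PotentialSecret.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// finding mirrors the four fields stored per potential secret (hypothetical names).
type finding struct {
	Type   string
	File   string
	Secret string
	Line   int
}

// writeFindings writes a header row followed by one row per finding.
func writeFindings(w io.Writer, findings []finding) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"type", "file", "secret", "line"}); err != nil {
		return err
	}
	for _, f := range findings {
		row := []string{f.Type, f.File, f.Secret, strconv.Itoa(f.Line)}
		if err := cw.Write(row); err != nil {
			return err
		}
	}
	cw.Flush() // csv.Writer buffers; Flush pushes the rows to w
	return cw.Error()
}

// findingsCSV renders the findings to a string, which is convenient for tests.
func findingsCSV(findings []finding) (string, error) {
	var sb strings.Builder
	err := writeFindings(&sb, findings)
	return sb.String(), err
}

func main() {
	out, err := findingsCSV([]finding{{"AWS Access Key", "app.go", "example-secret", 3}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Letting csv.Writer do the quoting keeps the file valid when a matched secret itself contains commas or double quotes, which is common for values taken from source lines.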

Logging in Go

package logger

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/sirupsen/logrus"
	"gopkg.in/natefinch/lumberjack.v2"
)

const logFile = "..\\output\\secret-scan.log"

var Logger *logrus.Logger

func InitLogger(level logrus.Level) {
	Logger = logrus.New()

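	// Write every log entry both to stdout and to a rotating log file.
	writer := io.MultiWriter(os.Stdout, &lumberjack.Logger{
		Filename:   logFile,
		Compress:   true, // gzip rotated log files
		MaxBackups: 2,    // keep at most 2 rotated files
		MaxSize:    5,    // rotate once the file reaches 5 MB
		MaxAge:     7,    // delete rotated files older than 7 days
	})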
	Logger.SetOutput(writer)

	Logger.SetLevel(level)
	Logger.SetReportCaller(true)
	Logger.SetFormatter(&logFormatter{})
}

type logFormatter struct{}

func (m *logFormatter) Format(entry *logrus.Entry) ([]byte, error) {
	var b *bytes.Buffer
	if entry.Buffer != nil {
		b = entry.Buffer
	} else {
		b = &bytes.Buffer{}
	}

	timestamp := entry.Time.Format("2006-01-02 15:04:05.999")
	var newLog string

	if entry.HasCaller() {
		newLog = fmt.Sprintf("%-8s %s   :%d     %s\n",
			"["+strings.ToUpper(entry.Level.String())+"]", timestamp, entry.Caller.Line, entry.Message)
	} else {
		newLog = fmt.Sprintf("%-8s %s    %s\n", "["+strings.ToUpper(entry.Level.String())+"]", timestamp, entry.Message)
	}

	b.WriteString(newLog)
	return b.Bytes(), nil
}