Using regular expressions in C# to filter HTML characters

Latest recommended article published on 2021-06-07 13:13:10

码农2007

Categories: C# Asp.net Ajax  Tags: regular expressions, html, c#, string, regex, asp.net

Article link: https://blog.csdn.net/dream_like/article/details/2468994

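In C#, you can use regular expressions to filter HTML characters. For example, when validating user input, HTML characters need to be filtered out for security.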

using System.Text.RegularExpressions;

// Regex.Replace returns a new string, so the result must be kept
string text = Regex.Replace(htmlcode, "<[^>]+>", "");

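Explanation: < means the match starts with "<".

[^>]: the form [^...] matches any single character except the characters listed after the ^. This means that if "<>" appears in the string, it is not filtered out, because it is not an HTML tag.

Next comes the + sign, which means the preceding item must match at least once.

Finally there is >, which means the HTML tag ends with ">".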
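The pattern explained above can be wrapped in a small helper. This is only a minimal sketch: the class name HtmlTagStripper and the method name StripTags are made up for illustration, and returning an empty string for null input is an assumption, not something the original post specifies.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for illustration only).
public static class HtmlTagStripper
{
    // "<", then one or more characters that are not ">", then ">".
    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    // Returns the input with every tag-like span removed.
    // Assumption: null or empty input yields an empty string.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagPattern.Replace(html, "");
    }
}
```

For instance, HtmlTagStripper.StripTags("<p>Hello <b>world</b></p>") gives "Hello world", while a bare "<>" is left untouched because + requires at least one character between the brackets.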

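When the submitted input contains HTML, ASP.NET request validation may report the following error:

A potentially dangerous Request.Form value was detected from the client (Control_Message_SendBox1:dgrdSendBox:_ctl3:_ctl1="<div id="de" onclick...").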

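Solution: turn off request validation for the page in its @ Page directive:

<%@ Page ValidateRequest="false" %>

It can also be turned off in web.config by adding the following inside <system.web>:

<pages validateRequest="false"/>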

Code for embedding a page:
<iframe frameborder="no" scrolling="no" width="100%" height="25" src="a.htm"
tabIndex="0">
</iframe>

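Replacement: in HTML, several consecutive ordinary spaces are rendered as a single space, so they are replaced in code with entities. See the code below: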

string Context = Content.Text;
Context = Context.Replace("<", "&lt;");   // escape HTML code
Context = Context.Replace(">", "&gt;");
Context = Context.Replace("\r", "<BR>");  // carriage return
Context = Context.Replace(" ", "&nbsp;"); // space
Context = Context.Replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"); // horizontal tab
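The same replacements can be collected into one reusable method. The sketch below is an assumption-laden illustration, not the author's code: the names DisplayTextEncoder and ToDisplayHtml are hypothetical, and it additionally escapes "&" first (so that already-present entities are not misread) and treats "\r\n" and "\n" as line breaks too.

```csharp
using System.Text;

// Hypothetical helper (name chosen for illustration only).
public static class DisplayTextEncoder
{
    // Converts plain text into HTML that keeps its line breaks and spacing.
    public static string ToDisplayHtml(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        StringBuilder sb = new StringBuilder(text);
        sb.Replace("&", "&amp;");   // must run first, before entities are introduced
        sb.Replace("<", "&lt;");
        sb.Replace(">", "&gt;");
        sb.Replace("\r\n", "<BR>"); // Windows line break, handled as one unit
        sb.Replace("\r", "<BR>");
        sb.Replace("\n", "<BR>");
        sb.Replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"); // tab as 8 spaces
        sb.Replace(" ", "&nbsp;");
        return sb.ToString();
    }
}
```

The order matters: "<" and ">" are escaped before any <BR> is inserted, so the inserted line breaks are not escaped themselves.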

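I wrote a class for checking user input in ASP.NET.
It is fairly rough. At the moment the check results are handed to the caller as return values. Throwing exceptions would actually be a better way to report problems: the caller would not have to test the return value of every function one by one, and could instead put all the check calls inside a single try{} and handle failures in one catch.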

code

using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Web;

namespace InputSecurityCheck
{
/// <summary>
/// UserInputCheck is a class for checking the validity of user input
/// </summary>
public class UserInputCheck
{
public UserInputCheck()
{
   //
   // TODO: add constructor logic here
   //
}

/// <summary>
/// Matches a string against a regular expression
/// </summary>
/// <param name="uncheckedString">The string to check</param>
/// <param name="pattern">The regular expression</param>
/// <returns>
/// true if the string matches
/// false if it does not
/// </returns>
public static bool CheckString(string uncheckedString,string pattern)
{
   // RegexOptions.Multiline is deliberately left out: with it, ^ and $ match at
   // every line break, so multi-line input could pass with only one valid line.
   Regex regex = new Regex(pattern,RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
   return regex.IsMatch(uncheckedString);
}
/// <summary>
/// Checks whether the string consists of digits only
/// </summary>
/// <param name="strUnChecked">The string to check</param>
/// <returns>
/// true if it does
/// false if it does not
/// </returns>
public static bool IsNumeric(string strUnChecked)
{
   return CheckString(strUnChecked,@"^\d+$");
}
/// <summary>
/// Checks whether the string contains characters that could cause SQL injection problems, namely ' " ; -
/// </summary>
/// <param name="strUnChecked">The string to check</param>
/// <returns>
/// true if illegal characters are present
/// false if none are present
/// </returns>
public static bool IsNonlicetChar(string strUnChecked)
{
   return !CheckString(strUnChecked,@"^[^""';-]+$");
}
/// <summary>
/// Checks whether the string consists of English letters only
/// </summary>
/// <param name="strUnChecked">The string to check</param>
/// <returns>
/// true if every character is an English letter
/// false if any other character is present
/// </returns>
public static bool IsEnglishChar(string strUnChecked)
{
   return CheckString(strUnChecked,@"^[A-Za-z]+$");
}
/// <summary>
/// Checks whether the string is a valid IP address
/// </summary>
/// <param name="strUnChecked">The string to check</param>
/// <returns>
/// true if valid
/// false if invalid
/// </returns>
public static bool IsIpAdderessFormat(string strUnChecked)
{
   return CheckString(strUnChecked,@"^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$");
}
/// <summary>
/// Checks whether the string contains only English letters, digits and underscores
/// </summary>
/// <param name="strUnChecked">The string to check</param>
/// <returns>
/// true if it does
/// false if it does not
/// </returns>
public static bool IsCharNumberAndUnderLine(string strUnChecked)
{
   return CheckString(strUnChecked,@"^[A-Za-z0-9_]+$");
}
}
}
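The exception-based design described before the class might look roughly like this. It is only a sketch under stated assumptions: InputValidationException, UserInputGuard, RequireNumeric, RequireEnglishLetters and ValidateForm are all hypothetical names that do not appear in the original class, and the two checks simply reuse the digit and letter patterns from above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical exception type, introduced only for this illustration.
public class InputValidationException : Exception
{
    public InputValidationException(string message) : base(message) { }
}

// Hypothetical counterpart of UserInputCheck that throws instead of returning bool.
public static class UserInputGuard
{
    public static void RequireNumeric(string value, string fieldName)
    {
        // \A and \z anchor to the very start and end of the input.
        if (value == null || !Regex.IsMatch(value, @"\A\d+\z"))
        {
            throw new InputValidationException(fieldName + " must contain digits only.");
        }
    }

    public static void RequireEnglishLetters(string value, string fieldName)
    {
        if (value == null || !Regex.IsMatch(value, @"\A[A-Za-z]+\z"))
        {
            throw new InputValidationException(fieldName + " must contain English letters only.");
        }
    }

    // One try block wraps every check and a single catch reports the first failure.
    // Returns null when all checks pass, otherwise the error message.
    public static string ValidateForm(string age, string name)
    {
        try
        {
            RequireNumeric(age, "age");
            RequireEnglishLetters(name, "name");
            return null;
        }
        catch (InputValidationException ex)
        {
            return ex.Message;
        }
    }
}
```

With this shape the caller no longer inspects each return value: ValidateForm("42", "Bob") returns null, while ValidateForm("4x", "Bob") returns the message for the age field.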

// VBScript

1. Remove JavaScript code from the content

Function ClearJSCode(originCode)

  Dim reg

  Set reg = New RegExp

  ' Matches a <SCRIPT>...</SCRIPT> block whose body contains no "<"
  reg.Pattern = "<SCRIPT[^<]*</SCRIPT>"
  reg.IgnoreCase = True
  reg.Global = True

  ClearJSCode = reg.Replace(originCode, "")

End Function

2. Remove HTML code from the content

Function ClearHTMLCode(originCode)

    Dim reg
    Set reg = New RegExp

    ' Matches anything that looks like a tag: "<", any characters except ">", then ">"
    reg.Pattern = "<[^>]*>"
    reg.IgnoreCase = True
    reg.Global = True

    ClearHTMLCode = reg.Replace(originCode, "")

End Function
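For comparison, the two VBScript functions can be approximated in C#. This is a sketch, not a port of the author's code: the class name MarkupCleaner and both method names are hypothetical, and the script pattern is a swapped-in variant that uses a lazy match with RegexOptions.Singleline so that script bodies containing "<" are removed as well.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (names chosen for illustration only).
public static class MarkupCleaner
{
    // Opening script tag, the shortest possible body, then the closing tag.
    // Singleline lets "." match line breaks inside the script body.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script\b[^>]*>.*?</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Same tag pattern as the VBScript ClearHTMLCode function.
    private static readonly Regex AnyTag = new Regex("<[^>]*>");

    // Removes script blocks together with their contents.
    public static string ClearJsCode(string originCode)
    {
        return ScriptBlock.Replace(originCode ?? string.Empty, "");
    }

    // Removes tags but keeps the text between them.
    public static string ClearHtmlCode(string originCode)
    {
        return AnyTag.Replace(originCode ?? string.Empty, "");
    }
}
```

Calling ClearJsCode before ClearHtmlCode matters: stripping tags first would leave the script source behind as plain text.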
