Matching and Replacing URLs with Regular Expressions

Latest recommended article published 2023-06-17 22:13:18

hjjk888

Categories: Problem-Solving Notes, Java Basics

Article link: https://blog.csdn.net/hjjk888/article/details/84009714

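Here is the requirement:

For every src attribute in a web page, such as src="/web/inde.jsp", the URL inside the quotes must be prefixed so that it becomes "www.baidu.com/web/inde.jsp".
The href attribute of <link> tags gets the same treatment.
URLs that already begin with http:, https:, or www. must not be replaced.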

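The regular expression, written as a Java string literal, is
\\s*(<LINK\\s+.+?href=|src=)\\s*['|\"]\\s*((?!http:|https:|www\\.).+?)['|\"]
The key points are the lazy (non-greedy) quantifier .+?, which stops the URL at the first closing quote instead of the last one on the line, and the negative lookahead (?!http:|https:|www\\.), which skips URLs that already start with one of those prefixes.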
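
To make the two key points concrete, here is a minimal sketch that runs the pattern above against short test lines. The class name LazyLookaheadDemo, the helper urls, and the GREEDY variant are hypothetical and exist only for comparison; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyLookaheadDemo {
    // The pattern from the article: group 2 (the URL) uses the lazy ".+?".
    static final Pattern LAZY = Pattern.compile(
        "\\s*(<LINK\\s+.+?href=|src=)\\s*['|\"]\\s*((?!http:|https:|www\\.).+?)['|\"]");

    // Hypothetical variant for comparison only: group 2 uses the greedy ".+".
    static final Pattern GREEDY = Pattern.compile(
        "\\s*(<LINK\\s+.+?href=|src=)\\s*['|\"]\\s*((?!http:|https:|www\\.).+)['|\"]");

    // Collects the text captured by group 2 for every match in the line.
    static List<String> urls(Pattern p, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<IMG src=\"/a.gif\" width=\"59\">";
        // Lazy: the URL ends at the first closing quote.
        System.out.println(urls(LAZY, line));   // [/a.gif]
        // Greedy: the URL runs on to the last quote in the line.
        System.out.println(urls(GREEDY, line)); // [/a.gif" width="59]
        // Negative lookahead: a URL starting with http: is not matched at all.
        System.out.println(urls(LAZY, "<IMG src=\"http://x/a.gif\">")); // []
    }
}
```

With the greedy variant the captured "URL" swallows the following attributes, which is why the lazy quantifier matters here.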

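[quote]
Test string for every row: windows 98 windows 2000 windows 2003

Plain expression
  Regex:   windows 98|2000|2003
  Matches: "windows 98", "2000", "2003"
Backreference, capturing match
  Regex:   windows (98|2000|2003)
  Matches: "windows 98", "windows 2000", "windows 2003"
  $1:      98, 2000, 2003
Non-capturing match
  Regex:   windows (?:98|2000|2003)
  Matches: "windows 98", "windows 2000", "windows 2003"
Positive lookahead, non-capturing match
  Regex:   windows (?=98|2000)
  Matches: the "windows " in front of 98 and in front of 2000
Negative lookahead, non-capturing match
  Regex:   windows (?!98|2000)
  Matches: the "windows " in front of 2003
[/quote]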
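
The group kinds quoted above can be checked directly with java.util.regex. The following is a small sketch; the class name GroupKindsDemo and the helpers matches and firstGroups are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupKindsDemo {
    static final String TEXT = "windows 98 windows 2000 windows 2003";

    // Returns the full text of every match of regex in TEXT.
    static List<String> matches(String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Returns group 1 of every match; only valid if the regex has a capturing group.
    static List<String> firstGroups(String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Alternation binds loosest, so this is "windows 98" OR "2000" OR "2003".
        System.out.println(matches("windows 98|2000|2003"));       // [windows 98, 2000, 2003]
        // Capturing group: the year is also available as group 1.
        System.out.println(matches("windows (98|2000|2003)"));     // [windows 98, windows 2000, windows 2003]
        System.out.println(firstGroups("windows (98|2000|2003)")); // [98, 2000, 2003]
        // Non-capturing group: same matches, but no group 1 is recorded.
        System.out.println(matches("windows (?:98|2000|2003)"));
        // Lookaheads consume nothing, so each match is just "windows " (with the space).
        System.out.println(matches("windows (?=98|2000)").size()); // 2
        System.out.println(matches("windows (?!98|2000)").size()); // 1
    }
}
```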


<LINK rel="stylesheet" href="style_new.css" type="text/css">
 <TD width="59" height="45"><IMG src="www.aaa/top-images/internaluse_sticker.gif" width="59" height="45" border="0"></TD>
  <TD width="170" height="45"><A href="/"><IMG src="http:\\aaa/top-images/title.gif" border="0"></A></TD>
 <TD><A href="/fscripts/link.asp?url=/global/"><IMG src="/top-images/eng_site.gif" width="112" height="22" border="0" alt="English"></A></TD>

The input above is changed to:


<LINK rel="stylesheet" href="www.baidu.comstyle_new.css" type="text/css">
<TD width="59" height="45"><IMG src="www.aaa/top-images/internaluse_sticker.gif" width="59" height="45" border="0"></TD>
  <TD width="170" height="45"><A href="/"><IMG src="http:\\aaa/top-images/title.gif" border="0"></A></TD>
<TD><A href="/fscripts/link.asp?url=/global/"><IMG src="www.baidu.com/top-images/eng_site.gif" width="112" height="22" border="0" alt="English"></A></TD>
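
The before/after sample can be reproduced in memory, without any files. This is a sketch under stated assumptions: the class PrefixSampleDemo and the method addPrefix are hypothetical names, and the rewriting is done with Matcher.appendReplacement, which is one possible way to insert the prefix at the position of group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixSampleDemo {
    // The pattern from the article; group 2 is the URL to be prefixed.
    static final Pattern URL_ATTR = Pattern.compile(
        "\\s*(<LINK\\s+.+?href=|src=)\\s*['|\"]\\s*((?!http:|https:|www\\.).+?)['|\"]");

    // Inserts prefix directly in front of every URL captured by group 2.
    static String addPrefix(String line, String prefix) {
        Matcher m = URL_ATTR.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String before = line.substring(m.start(), m.start(2)); // attribute name and opening quote
            String rest = line.substring(m.start(2), m.end());     // URL and closing quote
            // quoteReplacement keeps any '$' or '\' in the text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(before + prefix + rest));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] sample = {
            "<LINK rel=\"stylesheet\" href=\"style_new.css\" type=\"text/css\">",
            "<IMG src=\"www.aaa/top-images/internaluse_sticker.gif\" width=\"59\">",
            "<A href=\"/fscripts/link.asp?url=/global/\"><IMG src=\"/top-images/eng_site.gif\" width=\"112\"></A>"
        };
        for (String line : sample) {
            System.out.println(addPrefix(line, "www.baidu.com"));
        }
    }
}
```

Only the href inside <LINK> and the relative src are prefixed; the src that starts with www. and the href of the <A> tag are left alone, as in the sample output.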

The code is as follows:


import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name mainClass is taken from the call in main() below.
public class mainClass {

	// Matches src="..." anywhere and href="..." inside <LINK ...>.
	// Group 2 is the URL; the negative lookahead skips http:, https: and www. URLs.
	// Compiled once instead of once per input line.
	private static final Pattern URL_ATTR = Pattern.compile(
			"\\s*(<LINK\\s+.+?href=|src=)\\s*['|\"]\\s*((?!http:|https:|www\\.).+?)['|\"]");

	public String parseTool(String file, String replaceText) {
		String resultTxt = "e:/result.txt"; // output file path
		// try-with-resources closes both streams even if an exception is thrown
		try (Scanner scanner = new Scanner(new FileInputStream(file));
				BufferedWriter rf = new BufferedWriter(new FileWriter(resultTxt))) {
			while (scanner.hasNextLine()) {
				String str = scanner.nextLine();
				Matcher matcher = URL_ATTR.matcher(str);
				StringBuilder out = new StringBuilder();
				int last = 0;
				while (matcher.find()) {
					// Insert the prefix exactly at the start of the matched URL.
					// The earlier str.replaceAll(matcher.group(2), ...) treated the URL
					// as a regex and also changed every other occurrence in the line.
					out.append(str, last, matcher.start(2)).append(replaceText);
					last = matcher.start(2);
				} // while-find
				out.append(str, last, str.length()); // rest of the line, unchanged
				rf.write(out.toString());
				rf.write("\r\n");
			} // while-hasnext
		} catch (FileNotFoundException e) {
			e.printStackTrace();
			return "FileNotFound.";
		} catch (IOException e) {
			e.printStackTrace();
			return "IOError.";
		}
		return "ok.";
	}

	public static void main(String[] args) {
		String result = new mainClass().parseTool("e:/to.txt", "www.baidu.com");
		System.out.println("--" + result);
	}
}