Stripping HTML Tags in Java to Extract Plain Text from a Web Page (a Workaround When Code on a Page Cannot Be Copied Directly)
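1. Press F12 to open the browser's developer console and select the parent tag of the code block you want.
2. Use jQuery to get the element.
3. Copy the element's HTML from the console output into IDEA, where a class that strips HTML tags has been written in advance, as shown below: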
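public class JSDemo {
    private String text;

    public void setText(String text) {
        this.text = text;
    }

    public String getText() {
        // Use isEmpty() to test the content; != "" only compares object references.
        if (this.text == null || this.text.isEmpty()) {
            return "";
        }
        // Remove HTML tags and character entities such as &nbsp;.
        // Note: entities are deleted, not decoded, so &lt; and &gt; disappear from the result.
        String msg = this.text.replaceAll("<.+?>|&.+?;", "");
        // The original also called replaceAll with the JavaScript-style patterns
        // "\\s/g" and "/[\r\n]/g". Java treats the slashes and the g flag as literal
        // characters, so those calls matched nothing in this input and are omitted.
        // Spaces and line breaks are kept so that the extracted code stays usable.
        return msg;
    }
}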
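One detail worth checking here: `String.replaceAll` in Java takes a bare regular expression and always replaces every match, so JavaScript-style delimiters and flags such as `/.../g` are read as literal characters. The sketch below compares the two forms on a small string. The class and method names are hypothetical and not part of the original post, and the explicit `\u3000` for the full-width (Chinese) space is an assumption about which characters should count as whitespace, since Java's default `\s` does not cover it. Removing all whitespace is shown only for illustration; for extracted source code, spaces and line breaks usually need to stay.

```java
// Hypothetical demo class; the names are illustrative only.
class WhitespaceRegexDemo {

    // JavaScript-style pattern: in Java this means
    // "one whitespace character followed by the literal text /g".
    static String jsStyle(String s) {
        return s.replaceAll("\\s/g", "");
    }

    // Java form: replaceAll is already global, so the bare pattern is enough.
    // \u3000 (full-width space) is added explicitly because the default \s
    // only covers ASCII whitespace.
    static String javaStyle(String s) {
        return s.replaceAll("[\\s\\u3000]+", "");
    }

    public static void main(String[] args) {
        String sample = "a b\u3000c\nd";
        // The JavaScript-style pattern leaves the sample unchanged.
        System.out.println(jsStyle(sample).equals(sample));
        // The Java pattern removes every space and line break.
        System.out.println(javaStyle(sample));
    }
}
```

If only line breaks should go, the same idea applies with the pattern `"[\\r\\n]+"`.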
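4. Write a test class, as shown below. Note that the test class must be declared public; otherwise, after the @Test annotation is imported, no run button appears in the left gutter.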
import org.junit.Test; // assumes JUnit 4; with JUnit 5 import org.junit.jupiter.api.Test instead

public class JSDemoTest {
    @Test
    public void test01() {
        // Sample HTML copied from the page, kept verbatim.
        String text = "<code class=\"language-bat\"><span class=\"c1\">rem 执行后你可能需要把固定在任务栏上的图标取消固定,关闭程序再打开后才会看到效果,重建图标缓存需要一些时间,耐心等待</span>\n" +
                "<span class=\"c1\">rem 强制杀死 Windows 资源管理器</span>\n" +
                "taskkill /f /im explorer.exe\n" +
                "<span class=\"c1\">rem 清理系统图标缓存数据库</span>\n" +
                "attrib -h -s -r <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\IconCache.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\IconCache.db\"</span>\n" +
                "attrib /s /d -h -s -r <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\*\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_32.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_96.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_102.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_256.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_1024.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_idx.db\"</span>\n" +
                "<span class=\"k\">del</span> /f <span class=\"s2\">\"</span><span class=\"nv\">%userprofile%</span><span class=\"s2\">\\AppData\\Local\\Microsoft\\Windows\\Explorer\\thumbcache_sr.db\"</span>\n" +
                "<span class=\"c1\">rem 清理 系统托盘记忆的图标</span>\n" +
                "<span class=\"k\">echo</span> y<span class=\"p\">|</span>reg delete <span class=\"s2\">\"HKEY_CLASSES_ROOT\\Local Settings\\Software\\Microsoft\\Windows\\CurrentVersion\\TrayNotify\"</span> /v IconStreams\n" +
                "<span class=\"k\">echo</span> y<span class=\"p\">|</span>reg delete <span class=\"s2\">\"HKEY_CLASSES_ROOT\\Local Settings\\Software\\Microsoft\\Windows\\CurrentVersion\\TrayNotify\"</span> /v PastIconsStream\n" +
                "<span class=\"c1\">rem 启动 Windows 资源管理器</span>\n" +
                "<span class=\"k\">start</span> explorer</code>";
        JSDemo jsDemo = new JSDemo();
        jsDemo.setText(text);
        // Print the text with all tags stripped.
        System.out.println(jsDemo.getText());
    }
}
5. The console prints the original source code, which can then be copied directly.
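One limitation of deleting entities with `&.+?;` is that characters belonging to the code are lost as well, for example `&gt;` in an output redirection or `&amp;&amp;` between two commands. A possible variant, sketched below, strips the tags first and then decodes a few common entities. The class `HtmlTextExtractor`, its `extract` method, and the list of entities are assumptions made for illustration and are not part of the original post; pages with more varied markup are better handled by a real HTML parser.

```java
// Hypothetical helper; the name and the entity list are illustrative only.
class HtmlTextExtractor {

    static String extract(String html) {
        if (html == null || html.isEmpty()) {
            return "";
        }
        // Strip tags first, so that a decoded "<" is never mistaken for markup.
        String text = html.replaceAll("<[^>]+>", "");
        // Decode a small set of common entities; &amp; goes last to avoid double decoding.
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&nbsp;", " ")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String html = "<span class=\"k\">echo</span> a &gt; b &amp;&amp; c";
        System.out.println(extract(html));
    }
}
```

Spaces and line breaks are left untouched here, which keeps multi-line scripts like the one in the test class runnable after copying.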