Headless scraping of login-protected pages with Java Selenium: CAPTCHA cropping and CAPTCHA recognition with Python TensorFlow

张少飞

Article link: https://blog.csdn.net/zsf5201314z/article/details/78771115

1. Use PhantomJSDriver as the headless browser. First navigate to the page that shows the CAPTCHA, then take a screenshot of it as follows:

// Take a screenshot with Selenium
File screenshotAs = ((TakesScreenshot) phantomJSDriver).getScreenshotAs(OutputType.FILE);

2. Locate the CAPTCHA image element on the page so that it can be cropped out of the screenshot
List<WebElement> findElement = phantomJSDriver.findElements(By.xpath(imageHtmlTagXpath));
WebElement webElement = findElement.get(0);
int x = webElement.getLocation().getX();
int y = webElement.getLocation().getY();
int width = webElement.getSize().getWidth();
int height = webElement.getSize().getHeight();
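
The element's rectangle is later passed to BufferedImage.getSubimage, which throws a RasterFormatException when the rectangle reaches outside the image. The following is a minimal sketch of one way to guard against that, assuming the element coordinates and the screenshot use the same pixel scale. The class name CropBounds and the method clampToImage are hypothetical helpers, not part of the original code.

```java
import java.awt.Rectangle;

class CropBounds {
    /**
     * Intersects the element rectangle with the screenshot bounds.
     * Returns an empty rectangle if the element lies entirely outside.
     */
    static Rectangle clampToImage(int x, int y, int width, int height,
                                  int imageWidth, int imageHeight) {
        Rectangle element = new Rectangle(x, y, width, height);
        Rectangle image = new Rectangle(0, 0, imageWidth, imageHeight);
        Rectangle clipped = element.intersection(image);
        // intersection() can yield non-positive sizes when there is no overlap
        if (clipped.isEmpty()) {
            return new Rectangle(0, 0, 0, 0);
        }
        return clipped;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 60x30 element near the corner of a 400x300 screenshot
        Rectangle r = clampToImage(380, 290, 60, 30, 400, 300);
        System.out.println(r.x + "," + r.y + "," + r.width + "," + r.height);
    }
}
```

If the clamped rectangle comes back empty, the element was probably not rendered inside the captured area, and cropping should be skipped.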

// Crop the CAPTCHA out of the screenshot and recognize it with Python TensorFlow
String imageAndRecognize = ImageDownLoadTool.cut(screenshotAs.getPath(), x, y, width, height);

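/**
* Crops the image, saves the cropped region as a new image,
* and passes that image to the Python recognizer.
*
* @return the recognized CAPTCHA text, or the path of the cropped image
*         if the recognizer returned nothing
*/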
public static String cut(String fisName, int x, int y, int width, int height)
throws IOException {
String uuidImage = UUID.randomUUID().toString();
String imagePath = "D:\\images\\instanceImageCode\\" + uuidImage
+ ".jpeg";
// Read the screenshot file
BufferedImage image = ImageIO.read(new File(fisName));
if (image == null) {
throw new IOException("Cannot read image: " + fisName);
}
BufferedImage region = image.getSubimage(x, y, width, height);
// Redraw the region onto an RGB image: the JDK's JPEG writer does not
// reliably handle an alpha channel, which PNG screenshots often have
BufferedImage out = new BufferedImage(width, height,
BufferedImage.TYPE_INT_RGB);
Graphics graphics = out.getGraphics();
graphics.drawImage(region, 0, 0, null);
graphics.dispose();
ImageIO.write(out, "jpeg", new File(imagePath));

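// Path of the Python recognition script
String pyFilePath = "D:\\PythonWorkPlace\\captcha_recognize-master\\java_recognize.py";
Object result = JavaInvokePython.getInstance()
.invokePythonScriptByStream(pyFilePath, imagePath);
// An empty string means the script produced no output
if (result != null && !((String) result).isEmpty()) {
String str = (String) result;
// Strip the quotes and brackets of the printed Python list
str = str.replaceAll("[\'\\[\\]]", "");
return str;
}
return imagePath;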

}
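
The cropping step can be tried in isolation, without a browser or files on disk. The sketch below builds a blank in-memory image as a stand-in for the screenshot, crops a region, and converts it to RGB before encoding it as JPEG. The class name CropDemo, the method cropToRgb, and all sizes are assumptions made for illustration.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class CropDemo {
    /** Crops the given region and copies it into a new RGB image (no alpha channel). */
    static BufferedImage cropToRgb(BufferedImage source, int x, int y,
                                   int width, int height) {
        BufferedImage region = source.getSubimage(x, y, width, height);
        BufferedImage rgb = new BufferedImage(width, height,
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(region, 0, 0, null);
        g.dispose();
        return rgb;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the screenshot: a 200x100 image with an alpha channel
        BufferedImage screenshot = new BufferedImage(200, 100,
                BufferedImage.TYPE_INT_ARGB);
        screenshot.setRGB(50, 40, 0xFFFF0000); // one opaque red pixel

        BufferedImage captcha = cropToRgb(screenshot, 40, 30, 60, 20);

        // Encode to JPEG in memory instead of writing to a fixed path
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        boolean written = ImageIO.write(captcha, "jpeg", bytes);
        System.out.println(captcha.getWidth() + "x" + captcha.getHeight()
                + " written=" + written);
    }
}
```

ImageIO.write returns false when no writer accepts the image, so checking its return value is a cheap way to notice a failed save.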

There are two ways to call a Python script from Java.

Method 1: ProcessBuilder (can be called from multiple threads)

public Object invokePythonScriptByStream(String pythonFilePath,
String... filePath) {
String result = "";
// No trailing space after "python": each array element is passed as-is,
// so the space would become part of the program name
String[] arg1 = new String[] { "python", pythonFilePath };
String[] addAll = ArrayUtils.addAll(arg1, filePath);
try {
ProcessBuilder pb = new ProcessBuilder(addAll);
Process process = pb.start();
// Read the first line the script prints before waiting for it to exit
InputStreamReader ir = new InputStreamReader(
process.getInputStream());
LineNumberReader input = new LineNumberReader(ir);
result = input.readLine();
input.close();
ir.close();
process.waitFor();
System.out.println("Python call succeeded: " + result);
} catch (Exception e) {
System.out.println("Python call failed: " + e.getMessage());
}
return result;
}
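
Since each call starts its own Python process, several images can be submitted from a thread pool. The following is a sketch of that idea under assumptions: ParallelRecognizer and recognizeAll are hypothetical names, and the recognizer is passed in as a function so that a stub can stand in for invokePythonScriptByStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelRecognizer {
    /**
     * Runs the recognizer on every image path using a fixed-size pool and
     * returns path -> result in the order the paths were given.
     */
    static Map<String, String> recognizeAll(List<String> imagePaths,
                                            Function<String, String> recognizer,
                                            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : imagePaths) {
                futures.add(pool.submit(() -> recognizer.apply(path)));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < imagePaths.size(); i++) {
                results.put(imagePaths.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Recognition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stub recognizer for illustration; a real one would wrap
        // JavaInvokePython.getInstance().invokePythonScriptByStream(...)
        Function<String, String> stub = path -> path.toUpperCase();
        System.out.println(
                recognizeAll(Arrays.asList("a.jpeg", "b.jpeg"), stub, 2));
    }
}
```

Keeping the pool small is a reasonable default here, because every task launches a separate Python interpreter that loads the model.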

Method 2: Runtime.exec (not suitable for calling from multiple threads)

public Object invokePythonScriptByStream(String pythonFilePath,
String... filePath) {
String result = "";
// No trailing space after "python": each array element is passed as-is,
// so the space would become part of the program name
String[] arg1 = new String[] { "python", pythonFilePath };
String[] addAll = ArrayUtils.addAll(arg1, filePath);
try {
Process process = Runtime.getRuntime().exec(addAll);
// Read the first line the script prints before waiting for it to exit
InputStreamReader ir = new InputStreamReader(
process.getInputStream());
LineNumberReader input = new LineNumberReader(ir);
result = input.readLine();
input.close();
ir.close();
process.waitFor();
System.out.println("Python call succeeded: " + result);
} catch (Exception e) {
System.out.println("Python call failed: " + e.getMessage());
}
return result;
}
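
Both variants read only the first line of standard output and never consume standard error. A script that writes many warnings to standard error could then block on a full pipe. The sketch below shows one possible, more defensive variant under assumptions: PythonRunner, buildCommand, firstLine, and run are hypothetical names, and the script and image paths in main are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PythonRunner {
    /** Builds the argument list: interpreter, script, then the script's arguments. */
    static List<String> buildCommand(String interpreter, String script,
                                     String... args) {
        List<String> command = new ArrayList<>();
        command.add(interpreter.trim()); // stray whitespace would break the program lookup
        command.add(script);
        for (String arg : args) {
            command.add(arg);
        }
        return command;
    }

    /** Returns the first line of the stream ("" if it is empty) after draining the rest. */
    static String firstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String first = reader.readLine();
            // Keep reading so the child never blocks on a full stdout pipe
            while (reader.readLine() != null) {
                // discard remaining lines
            }
            return first == null ? "" : first;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts the command, returns its first stdout line, and fails on a non-zero exit code. */
    static String run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Forward the child's stderr to this JVM's stderr so it is never left unread
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();
        String result = firstLine(process.getInputStream());
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder paths; run(command) is not called here because it
        // needs a local Python installation and the recognition script
        List<String> command =
                buildCommand("python ", "java_recognize.py", "captcha.jpeg");
        System.out.println(command);
    }
}
```

Throwing on a non-zero exit code keeps a crashed script from being mistaken for an empty recognition result.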

Usage example:

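public static void main1(String[] args) {
String pyFilePath = "D:\\PythonWorkPlace\\captcha_recognize-master\\java_recognize.py";
int count = 0;
int matched = 0;
File fl = new File(
"D:\\PythonWorkPlace\\captcha_recognize-master\\data\\test_data\\");
String[] files = fl.list();
if (files == null) {
System.out.println("Test data directory not found: " + fl);
return;
}
for (String file : files) {
String filename = new File(fl, file).getAbsolutePath();
System.out.println(filename);
count++;
Object result = JavaInvokePython.getInstance()
.invokePythonScriptByStream(pyFilePath, filename);
if (result != null) {
String str = (String) result;
str = str.replaceAll("[\'\\[\\]]", "");
// The expected text is part of the file name; skip empty results,
// which would otherwise match every name
if (!str.isEmpty() && filename.indexOf(str) != -1) {
matched++;
System.out.println("Match succeeded");
}
}
}
System.out.println(matched + " of " + count + " matched");
}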
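
The matching rule in the usage example can be checked without running Python. The sketch below applies the same cleanup regex and the same "file name contains the prediction" test to a few made-up outputs. CaptchaAccuracy, clean, and accuracy are hypothetical names, and the assumption that the script prints a Python list such as ['8k2p'] is inferred from the characters the regex strips.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CaptchaAccuracy {
    /** Strips the quotes and brackets of a printed Python list such as ['8k2p']. */
    static String clean(String raw) {
        return raw == null ? "" : raw.replaceAll("['\\[\\]]", "").trim();
    }

    /** Fraction of files whose name contains the cleaned, non-empty prediction. */
    static double accuracy(Map<String, String> rawPredictionByFileName) {
        if (rawPredictionByFileName.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (Map.Entry<String, String> entry : rawPredictionByFileName.entrySet()) {
            String predicted = clean(entry.getValue());
            if (!predicted.isEmpty() && entry.getKey().contains(predicted)) {
                matched++;
            }
        }
        return (double) matched / rawPredictionByFileName.size();
    }

    public static void main(String[] args) {
        // Made-up file names and outputs, for illustration only
        Map<String, String> predictions = new LinkedHashMap<>();
        predictions.put("8k2p_01.jpeg", "['8k2p']"); // correct
        predictions.put("x7qd_02.jpeg", "['x7qb']"); // wrong last character
        System.out.println(accuracy(predictions));   // prints 0.5
    }
}
```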
