[Image Recognition] Recognizing image CAPTCHAs without tess4j

This is another follow-up to the web crawler topic.

I'll use the data I crawled from http://www.digifilm.com.cn/index.php/index/index.html as the example.

Downloading files requires logging in, and logging in requires a CAPTCHA.

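First you need to understand how this login works. The server randomly generates an image of 4 digits with interference lines and stores the corresponding digits in the session. At verification time it compares the digits you typed with the code stored in the session; if they match, the CAPTCHA is accepted.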

So, the key point: you need the session.

Getting a session from a crawler is pretty basic. Use Jsoup to request the page:

Response resultImageResponse = Jsoup.connect("http://www.digifilm.com.cn/index.php/public/checklogin.html").ignoreContentType(true).execute();
Map<String, String> cookies = resultImageResponse.cookies();

Then remember to send these cookies with every later request; that is what ties your requests to the session.
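To make "sending the cookies" concrete: at the HTTP level it just means every later request carries a Cookie header built from that map. Below is a minimal JDK-only sketch of how such a header value is assembled. The cookie name PHPSESSID and its value are assumptions for illustration (the real name is whatever the site returns), and toCookieHeader is a hypothetical helper; Jsoup's cookies(...) call handles this for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CookieHeaderSketch {
    // Joins name=value pairs into the value of a "Cookie" request header.
    public static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("PHPSESSID", "abc123"); // hypothetical name and value
        System.out.println("Cookie: " + toCookieHeader(cookies));
    }
}
```

If the header is missing from a request, the server has no way to tell which stored CAPTCHA that request belongs to.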

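Next step: to recognize an image you first need the image. Be sure to send the cookies from above when downloading it, otherwise the image is not associated with your session and the code will never match. It is like opening the page in Firefox, copying the image URL, pasting it into Chrome, refreshing there to get a new CAPTCHA, and then typing that one into Firefox: it fails with a CAPTCHA error 100% of the time.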

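Here is download code you can use as-is. You could also change it to take the download path as a parameter; either way is fine, it doesn't matter much.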

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public static String downloadImg(String url, Map<String, String> cookies) throws IOException
{
	Connection connect = Jsoup.connect(url);
	connect.cookies(cookies); // send the session cookies with the image request
	connect.timeout(50 * 1000); // 50-second timeout, in milliseconds
	Connection.Response response = connect.ignoreContentType(true).execute();
	byte[] img = response.bodyAsBytes();
	// directory where the CAPTCHA image is stored
	String directory = "f://test1//";
	savaImage(img, directory, "yzm.png");
	return directory + "yzm.png";
}

public static void savaImage(byte[] img, String filePath, String fileName)
{
	File dir = new File(filePath);
	// create the directory (and any missing parents) if it does not exist
	if (!dir.exists())
	{
		dir.mkdirs();
	}
	File file = new File(dir, fileName);
	// try-with-resources closes the stream even if the write fails
	try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file)))
	{
		bos.write(img);
		// System.out.println("CAPTCHA downloaded to: " + filePath);
	} catch (IOException e)
	{
		e.printStackTrace();
	}
}

Of course, you can skip all that and write the file straight from the Jsoup response.

Response resultImageResponse = Jsoup.connect(url).cookies(cookies).ignoreContentType(true).execute();
// try-with-resources closes the stream even if the write fails
try (FileOutputStream out = new FileOutputStream(new java.io.File("f://test//yzm.png")))
{
	out.write(resultImageResponse.bodyAsBytes());
}
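If you would rather lean on the standard library's NIO helpers, the save step can be shortened further. This is only a sketch, not the code I actually ran: saveBytes is a hypothetical helper, the byte array stands in for the Jsoup response body, and the temp-directory path is just for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveBytesSketch {
    // Writes data to dir/fileName, creating any missing directories first.
    public static Path saveBytes(byte[] data, Path dir, String fileName) {
        try {
            Files.createDirectories(dir);
            return Files.write(dir.resolve(fileName), data);
        } catch (IOException e) {
            // rethrow unchecked so callers do not need their own try/catch
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] fakeImage = {1, 2, 3}; // stand-in for the downloaded image bytes
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "captcha-demo");
        System.out.println("saved to " + saveBytes(fakeImage, dir, "yzm.png"));
    }
}
```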

With the image in hand, we get to the main point: image recognition.

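I had never worked on image recognition before, so I started by searching Baidu. Most of the results were about Tess4j, with a few about OpenCV. Both are simple to use. If you are on Windows, just look those up and use them: you add a ready-made jar and recognition works right away, very reliably.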

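If you are on Linux, though, your next search will be how to install tess4j on Linux... emmmm. Here is a link I think is reliable, even though I never managed to get it installed myself: http://www.cnblogs.com/dajianshi/p/4932882.html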

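When the pile of things tess4j needs won't install on Linux, it gets awkward. If you don't want your boss to remember you as the useless person who spent a week only to say it can't be done, you have to look into how image recognition actually works.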

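The first step in image recognition is reading the image and turning it into a 2D matrix, in other words a black-and-white image: the part to be recognized becomes black and the background white. Then you split the image into characters. How to split depends on the image; printing a few matrices and studying them is enough. If the characters sit at irregular positions, you have to write your own segmentation algorithm.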
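As a rough illustration of that first step, the sketch below builds a tiny image in memory, turns it into a boolean matrix (true = dark pixel) and prints it. The fixed brightness cutoff of 128 and the names toMatrix and render are assumptions made for this example only; the full code later in this post chooses the threshold automatically instead of hard-coding it.

```java
import java.awt.image.BufferedImage;

public class BinarizeSketch {
    // matrix[y][x] is true where the pixel's average brightness is below the cutoff.
    public static boolean[][] toMatrix(BufferedImage img, int cutoff) {
        boolean[][] m = new boolean[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                m[y][x] = (r + g + b) / 3 < cutoff;
            }
        }
        return m;
    }

    // Draws dark cells as '*' and light cells as ' ', one text line per row.
    public static String render(boolean[][] m) {
        StringBuilder sb = new StringBuilder();
        for (boolean[] row : m) {
            for (boolean dark : row) {
                sb.append(dark ? '*' : ' ');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 3; y++) {
            for (int x = 0; x < 4; x++) {
                img.setRGB(x, y, 0xFFFFFF); // white background
            }
        }
        img.setRGB(1, 1, 0x000000); // two dark pixels in the middle row
        img.setRGB(2, 1, 0x000000);
        System.out.print(render(toMatrix(img, 128)));
    }
}
```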

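Again using my site as the example: the downloaded CAPTCHA is 22*50 (height 22, width 50). Ignore the height for now; the width of 50 breaks down as 5+8+2+8+2+8+2+8+7. The leading 5 and trailing 7 are background margin and can be ignored, the 2s are the gaps between digits and can be ignored, and each 8 is the actual width of one digit matrix.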
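The starting column of each digit follows directly from that breakdown. A small sketch (LayoutSketch and digitStarts are hypothetical names) that walks the 5, 8, 2, 8, 2, 8, 2, 8, 7 layout and records where each 8-wide digit begins:

```java
import java.util.Arrays;

public class LayoutSketch {
    // Segment widths, left to right: margin, digit, gap, digit, gap, digit, gap, digit, margin.
    static final int[] WIDTHS = {5, 8, 2, 8, 2, 8, 2, 8, 7};

    // Returns the x offset of every digit segment (the segments at odd positions).
    public static int[] digitStarts(int[] widths) {
        int[] starts = new int[widths.length / 2];
        int x = 0;
        for (int i = 0; i < widths.length; i++) {
            if (i % 2 == 1) {
                starts[i / 2] = x;
            }
            x += widths[i];
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(digitStarts(WIDTHS))); // prints [5, 15, 25, 35]
        System.out.println(Arrays.stream(WIDTHS).sum());          // prints 50
    }
}
```

So the four digits occupy columns 5-12, 15-22, 25-32 and 35-42.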

Printing the digit matrices and comparing each of 0-9, you find the digits aren't even slanted. Ho ho.

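The next idea is to crop each of the four 22*8 digit blocks down to 10*8 and compare it with the standard digit matrices; the most similar one is the recognition result. There are many ways to measure similarity; I saw something with "cos" in the name online (presumably cosine similarity). I used a simpler approach: compare the candidate matrix with each standard matrix cell by cell, add 1 to a counter for every matching cell, divide by 80 at the end, and take the digit with the highest score as the result.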
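The cell-counting idea can be sketched on a toy scale. The 3*3 templates below are made up purely for illustration (the real ones are 10*8, hence the division by 80), and matchRatio and bestMatch are hypothetical names:

```java
public class MatchSketch {
    // Fraction of cells in which the candidate and the template agree.
    public static double matchRatio(String[] candidate, String[] template) {
        int same = 0;
        int total = 0;
        for (int row = 0; row < template.length; row++) {
            for (int col = 0; col < template[row].length(); col++) {
                if (candidate[row].charAt(col) == template[row].charAt(col)) {
                    same++;
                }
                total++;
            }
        }
        return (double) same / total;
    }

    // Index of the template with the highest ratio; the first one wins ties.
    public static int bestMatch(String[] candidate, String[][] templates) {
        int best = 0;
        double bestRatio = -1;
        for (int i = 0; i < templates.length; i++) {
            double ratio = matchRatio(candidate, templates[i]);
            if (ratio > bestRatio) {
                bestRatio = ratio;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[][] templates = {
            {"***", "* *", "***"}, // toy "0"
            {" * ", " * ", " * "}  // toy "1"
        };
        String[] noisyZero = {"***", "* *", "** "}; // one cell differs from the toy "0"
        System.out.println(bestMatch(noisyZero, templates)); // prints 0
    }
}
```

Here the noisy candidate agrees with the toy "0" in 8 of 9 cells and with the toy "1" in only 3 of 9, so one wrong pixel does not change the answer.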

The cleaned-up code is below:

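Standard matrices (the comparison code further down refers to them as Numbers.nums, so they belong in a class named Numbers)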

public static final String[][] ziro = new String[][] 
	{
		{" "," "," ","*","*"," "," "," "},
		{" "," ","*","*","*","*"," "," "},
		{" ","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*"," "},
		{" "," ","*","*","*","*"," "," "},
		{" "," "," ","*","*"," "," "," "},
	};
	public static final String[][] one = new String[][] 
	{
		{" "," "," ","*","*"," "," "," "},
		{" "," ","*","*","*"," "," "," "},
		{" ","*","*","*","*"," "," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" ","*","*","*","*","*","*"," "},
	};
	public static final String[][] two = new String[][] 
	{
		{" "," ","*","*","*","*"," "," "},
		{"*","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*","*"},
		{" "," "," "," "," "," ","*","*"},
		{" "," "," "," "," ","*","*"," "},
		{" "," "," "," ","*","*"," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," ","*","*"," "," "," "," "},
		{" ","*","*"," "," "," "," "," "},
		{"*","*","*","*","*","*","*","*"},
	};
	public static final String[][] three = new String[][] 
	{
		{" ","*","*","*","*","*"," "," "},
		{"*","*"," "," "," ","*","*"," "},
		{" "," "," "," "," "," ","*","*"},
		{" "," "," "," "," ","*","*"," "},
		{" "," "," ","*","*","*"," "," "},
		{" "," "," "," "," ","*","*"," "},
		{" "," "," "," "," "," ","*","*"},
		{" "," "," "," "," "," ","*","*"},
		{"*","*"," "," "," ","*","*"," "},
		{" ","*","*","*","*","*"," "," "},
	};
	public static final String[][] four = new String[][] 
	{
		{" "," "," "," "," ","*","*"," "},
		{" "," "," "," ","*","*","*"," "},
		{" "," "," ","*","*","*","*"," "},
		{" "," ","*","*"," ","*","*"," "},
		{" ","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," ","*","*"," "},
		{"*","*","*","*","*","*","*","*"},
		{" "," "," "," "," ","*","*"," "},
		{" "," "," "," "," ","*","*"," "},
		{" "," "," "," "," ","*","*"," "},
	};
	public static final String[][] five = new String[][] 
	{
		{"*","*","*","*","*","*","*"," "},
		{"*","*"," "," "," "," "," "," "},
		{"*","*"," "," "," "," "," "," "},
		{"*","*"," ","*","*","*"," "," "},
		{"*","*","*"," "," ","*","*"," "},
		{" "," "," "," "," "," ","*","*"},
		{" "," "," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*"," "},
		{" "," ","*","*","*","*"," "," "},
	};
	public static final String[][] six = new String[][] 
	{
		{" "," ","*","*","*","*"," "," "},
		{" ","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*"," "},
		{"*","*"," "," "," "," "," "," "},
		{"*","*"," ","*","*","*"," "," "},
		{"*","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*"," "},
		{" "," ","*","*","*","*"," "," "},
	};
	public static final String[][] seven = new String[][] 
	{
		{"*","*","*","*","*","*","*","*"},
		{" "," "," "," "," "," ","*","*"},
		{" "," "," "," "," "," ","*","*"},
		{" "," "," "," "," ","*","*"," "},
		{" "," "," "," ","*","*"," "," "},
		{" "," "," ","*","*"," "," "," "},
		{" "," ","*","*"," "," "," "," "},
		{" ","*","*"," "," "," "," "," "},
		{"*","*"," "," "," "," "," "," "},
		{"*","*"," "," "," "," "," "," "},
	};	
	public static final String[][] eight = new String[][] 
	{
		{" "," ","*","*","*","*"," "," "},
		{" ","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*"," "},
		{" "," ","*","*","*","*"," "," "},
		{" ","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*"," "},
		{" "," ","*","*","*","*"," "," "},
	};
	public static final String[][] nine = new String[][] 
	{
		{" "," ","*","*","*","*"," "," "},
		{" ","*","*"," "," ","*","*"," "},
		{"*","*"," "," "," "," ","*","*"},
		{"*","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*","*"},
		{" "," ","*","*","*"," ","*","*"},
		{" "," "," "," "," "," ","*","*"},
		{" ","*"," "," "," "," ","*","*"},
		{" ","*","*"," "," ","*","*"," "},
		{" "," ","*","*","*","*"," "," "},
	};
	public static final String[][][] nums = new String[][][] {ziro,one,two,three,four,five,six,seven,eight,nine};

Converting the image into a 2D matrix

public static String cleanImage(File sfile)throws IOException
	{
		BufferedImage bufferedImage = ImageIO.read(sfile);
		int h = bufferedImage.getHeight();
		int w = bufferedImage.getWidth();
	
		// convert to grayscale
		int[][] gray = new int[w][h];
		for (int x = 0; x < w; x++)
		{
			for (int y = 0; y < h; y++)
			{
				int argb = bufferedImage.getRGB(x, y);
				// brighten the image (adjusting the brightness gives a very high recognition rate)
				int r = (int) (((argb >> 16) & 0xFF) * 1.1 + 30);
				int g = (int) (((argb >> 8) & 0xFF) * 1.1 + 30);
				int b = (int) (((argb >> 0) & 0xFF) * 1.1 + 30);
				if (r >= 255)
				{
					r = 255;
				}
				if (g >= 255)
				{
					g = 255;
				}
				if (b >= 255)
				{
					b = 255;
				}
				gray[x][y] = (int) Math.pow((Math.pow(r, 2.2) * 0.2973 + Math.pow(g, 2.2)* 0.6274 + Math.pow(b, 2.2) * 0.0753), 1 / 2.2);
			}
		}
		// binarize
		int threshold = ostu(gray, w, h);
		BufferedImage binaryBufferedImage = new BufferedImage(w, h,BufferedImage.TYPE_BYTE_BINARY);
		for (int x = 0; x < w; x++)
		{
			for (int y = 0; y < h; y++)
			{
				if (gray[x][y] > threshold)
				{
					gray[x][y] |= 0x00FFFF;
				} else
				{
					gray[x][y] &= 0xFF0000;
				}
				binaryBufferedImage.setRGB(x, y, gray[x][y]);
			}
		}
		// print the matrix
		for (int y = 0; y < h; y++)
		{
			for (int x = 0; x < w; x++)
			{
				if (isBlack(binaryBufferedImage.getRGB(x, y)))
				{
					System.out.print("*");
				} else
				{
					System.out.print(" ");
				}
			}
			System.out.println("");
		}
		return getNum(binaryBufferedImage,h,w);
	}
	
	
	public static boolean isBlack(int colorInt)
	{
		Color color = new Color(colorInt);
		if (color.getRed() + color.getGreen() + color.getBlue() <= 300)
		{
			return true;
		}
		return false;
	}
 
	public static int ostu(int[][] gray, int w, int h)
	{
		int[] histData = new int[256]; // one bin per gray level 0-255 (w * h would overflow on images under 256 pixels)
		for (int x = 0; x < w; x++)
		{
			for (int y = 0; y < h; y++)
			{
				int red = 0xFF & gray[x][y];
				histData[red]++;
			}
		}
		int total = w * h;
		float sum = 0;
		for (int t = 0; t < 256; t++) 
		{
			sum += t * histData[t];
		}
		float sumB = 0;
		int wB = 0;
		int wF = 0;
	
		float varMax = 0;
		int threshold = 0;
	
		for (int t = 0; t < 256; t++)
		{
			wB += histData[t]; // Weight Background
			if (wB == 0)
				continue;
			wF = total - wB; // Weight Foreground
			if (wF == 0)
				break;
	
			sumB += (float) (t * histData[t]);
	
			float mB = sumB / wB; // Mean Background
			float mF = (sum - sumB) / wF; // Mean Foreground
	
			// Calculate Between Class Variance
			float varBetween = (float) wB * (float) wF * (mB - mF) * (mB - mF);
	
			// Check if new maximum found
			if (varBetween > varMax)
			{
				varMax = varBetween;
				threshold = t;
			}
		}
		return threshold;
	}
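A quick way to convince yourself the thresholding behaves sensibly is to run the same between-class-variance search on a histogram with two obvious clusters. The block below is a condensed sketch of the technique (Otsu's method) that takes a 256-bin histogram directly, not a copy of the method above; the class name and the pixel counts are made up for the demo.

```java
public class OtsuSketch {
    // Returns the gray level t that maximizes the between-class variance
    // when levels <= t form one class and levels > t form the other.
    public static int otsu(int[] hist) {
        long total = 0;
        double sum = 0;
        for (int t = 0; t < hist.length; t++) {
            total += hist[t];
            sum += (double) t * hist[t];
        }
        double sumB = 0;
        long wB = 0;
        double varMax = 0;
        int threshold = 0;
        for (int t = 0; t < hist.length; t++) {
            wB += hist[t];            // weight of the darker class
            if (wB == 0) continue;
            long wF = total - wB;     // weight of the brighter class
            if (wF == 0) break;
            sumB += (double) t * hist[t];
            double mB = sumB / wB;
            double mF = (sum - sumB) / wF;
            double varBetween = (double) wB * wF * (mB - mF) * (mB - mF);
            if (varBetween > varMax) { // strict '>' keeps the first maximum
                varMax = varBetween;
                threshold = t;
            }
        }
        return threshold;
    }

    public static void main(String[] args) {
        int[] hist = new int[256];
        hist[10] = 50;  // 50 dark pixels at gray level 10
        hist[200] = 50; // 50 bright pixels at gray level 200
        System.out.println(otsu(hist)); // prints 10
    }
}
```

Every level from 10 to 199 separates the two clusters equally well, and the strict comparison keeps the first of them. Since the binarization step treats only pixels strictly brighter than the threshold as background, the cluster at 200 ends up white and the one at 10 black.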

Splitting into 4 digits and comparing each one

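/**
	 * Recognizes the digits from the matrix.
	 * The columns are split as 5 8 2 8 2 8 2 8 7:
	 * the leading 5 are unused, each 8 is a digit, each 2 is a gap, and the trailing 7 are blank.
	 * @param binaryBufferedImage the binarized CAPTCHA image
	 * @param h image height in pixels
	 * @param w image width in pixels
	 */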
	private static String getNum(BufferedImage binaryBufferedImage, int h, int w)
	{
		String result = "";
		// first digit
		String[][] toCompare = new String[h][8];
		for (int y = 0; y < h; y++)
		{
			for (int x = 5; x < 13; x++)
			{
				if (isBlack(binaryBufferedImage.getRGB(x, y)))
				{
					toCompare[y][x-5] = "*";
				} else
				{
					toCompare[y][x-5] = " ";
				}
			}
		}
		// compare this digit against the 0-9 arrays
		result += compare(toCompare);
		for (int y = 0; y < h; y++)
		{
			for (int x = 15; x < 23; x++)
			{
				if (isBlack(binaryBufferedImage.getRGB(x, y)))
				{
					toCompare[y][x-15] = "*";
				} else
				{
					toCompare[y][x-15] = " ";
				}
			}
		}
		result += compare(toCompare);
		for (int y = 0; y < h; y++)
		{
			for (int x = 25; x < 33; x++)
			{
				if (isBlack(binaryBufferedImage.getRGB(x, y)))
				{
					toCompare[y][x-25] = "*";
				} else
				{
					toCompare[y][x-25] = " ";
				}
			}
		}
		result += compare(toCompare);
		for (int y = 0; y < h; y++)
		{
			for (int x = 35; x < 43; x++)
			{
				if (isBlack(binaryBufferedImage.getRGB(x, y)))
				{
					toCompare[y][x-35] = "*";
				} else
				{
					toCompare[y][x-35] = " ";
				}
			}
		}
		result += compare(toCompare);
		return result;
	}
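	/**
	 * Crops one digit to 10 rows and compares it with the 0-9 arrays.
	 * @param original the full-height (h rows by 8 columns) matrix of a single digit
	 */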
	private static String compare(String[][] original) 
	{
		String[][] toCompare = new String[10][8];
		// find the row where the digit starts
		int st =0;
		for(int y=1 ;y<original.length-1;y++) 
		{
			if((original[y][3].equals("*") && original[y][4].equals("*"))||(original[y][5].equals("*") && original[y][6].equals("*"))) 
			{
				st = y;
				break;
			}
		}
		for(int y=st;y<st+10;y++) 
		{
			for(int x=0;x<original[y].length;x++) 
			{
				toCompare[y-st][x] = original[y][x];
			}
		}
		String result = getCompareWithNumbers(toCompare);
		return result;
	}
	private static  String  getCompareWithNumbers(String[][] toCompare)
	{
		double similar = 0.0;
		int res = 0;
		for(int numIndex = 0;numIndex<Numbers.nums.length;numIndex++) 
		{
			int count = 0;
			for(int x=0;x<10;x++) 
			{
				for(int y=0;y<8;y++) 
				{
					if(toCompare[x][y].equals(Numbers.nums[numIndex][x][y])) 
					{
						count ++;
					}
				}
			}
			double thisSimilar = count/80.0;
			if(thisSimilar > similar) 
			{
				similar = thisSimilar;
				res = numIndex;
			}
		}
//		System.out.println("Recognition result: "+res);
		return res+"";
	}

 
