Java: overlaying one image on another to cover parts of it

RanGe*

Article link: https://blog.csdn.net/weixin_41601114/article/details/114383356

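Summary (auto-generated by CSDN): This article describes a requirement: take the OCR-recognized text from a PDF, add it to a Word document, and set the processed image as the Word document's background. The hOCR file is converted to HTML, the HTML is parsed with the Java Jsoup library to obtain the tag information, and the text is finally placed into text boxes created in the Word document. Java Swing GUIs and working with Word documents are also mentioned.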

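Requirement: process a PDF file so that the text recognized by OCR is added to a Word document while the page image serves as the document's background image. Together, this amounts to a PDF-to-Word conversion.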

import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.imageio.ImageIO;

public class WaterPic {

    public static void main(String[] args) {
        // Sample data: each map marks the position of one text block, which is also
        // the region of the image that has to be covered (blanked out)
        Map<String , Integer> map0 = new HashMap<String, Integer>();
        map0.put("width",869);
        map0.put("height", 254);
        map0.put("horizontal", 77);
        map0.put("vertical", 424);
        Map<String , Integer> map1 = new HashMap<String, Integer>();
        map1.put("width",786);
        map1.put("height", 100);
        map1.put("horizontal", 159);
        map1.put("vertical", 703);
        Map<String , Integer> map2 = new HashMap<String, Integer>();
        map2.put("width",686);
        map2.put("height", 149);
        map2.put("horizontal", 260);
        map2.put("vertical", 826);
        Map<String , Integer> map3 = new HashMap<String, Integer>();
        map3.put("width",797);
        map3.put("height", 129);
        map3.put("horizontal", 148);
        map3.put("vertical", 998);
        Map<String , Integer> map4 = new HashMap<String, Integer>();
        map4.put("width",870);
        map4.put("height", 99);
        map4.put("horizontal", 73);
        map4.put("vertical", 1128);
        List<Map<String,Integer>> list = new ArrayList<Map<String, Integer>>();
        list.add(map0);
        list.add(map1);
        list.add(map2);
        list.add(map3);
        list.add(map4);
        // Cover each region in turn
        WaterPic w = new WaterPic();
        for (Map<String, Integer> m : list) {
            w.watermark("C:/Users/Administrator/Desktop/fileSource/elang.png", m.get("horizontal"), m.get("vertical"), m.get("width"), m.get("height"), 1f);
        }
    }

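    /**
     * Overlays an image (here a blank white picture) onto a rectangular region of the
     * source image and writes the result back to the source file, overwriting it.
     *
     * @param sourceFilePath path of the source (background) image; it is overwritten
     * @param x      X offset of the region from the top-left corner, in pixels
     * @param y      Y offset of the region from the top-left corner, in pixels
     * @param width  width of the region in pixels (the overlay is scaled to fit)
     * @param height height of the region in pixels (the overlay is scaled to fit)
     * @param alpha  opacity of the overlay, from 0.0 (fully transparent) to 1.0 (fully opaque)
     */
    public void watermark(String sourceFilePath, int x, int y, int width, int height, float alpha) {
        File file = new File(sourceFilePath);
        // block.png is a plain, all-white image
        File waterFile = new File("C:/Users/Administrator/Desktop/fileSource/block.png");
        try {
            // Read the background image
            BufferedImage buffImg = ImageIO.read(file);
            // Read the overlay image
            BufferedImage waterImg = ImageIO.read(waterFile);
            // ImageIO.read returns null (instead of throwing) when no reader supports the format
            if (buffImg == null || waterImg == null) {
                throw new IOException("Unsupported image format");
            }
            // Create a Graphics2D object for drawing onto the background image
            Graphics2D g2d = buffImg.createGraphics();
            // SRC_ATOP: the overlay is blended in only where the background is not transparent
            g2d.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_ATOP, alpha));
            // Draw the overlay at (x, y), scaled to width x height
            g2d.drawImage(waterImg, x, y, width, height, null);
            g2d.dispose(); // release the system resources used by the graphics context
            // Save the image back in its original format (taken from the file extension)
            int temp = sourceFilePath.lastIndexOf(".") + 1;
            ImageIO.write(buffImg, sourceFilePath.substring(temp), new File(sourceFilePath));
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }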
}
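The code above reads `block.png` and re-reads and re-writes `elang.png` once per region. A lighter variant, sketched here under the assumption that the covered regions should simply become opaque white, paints every region onto one in-memory `BufferedImage` with `Graphics2D.fillRect`, so no blank overlay file is needed. The class `RegionCover` and method `coverRegions` are hypothetical names; each `int[]` holds `{x, y, width, height}`, mirroring the `horizontal`/`vertical`/`width`/`height` keys used in `main`.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

public class RegionCover {

    /**
     * Hypothetical helper: fills every region with white, in place, and returns the image.
     * Each region is {x, y, width, height} in pixels, origin at the top-left corner.
     */
    public static BufferedImage coverRegions(BufferedImage img, List<int[]> regions) {
        Graphics2D g2d = img.createGraphics();
        try {
            g2d.setColor(Color.WHITE);
            for (int[] r : regions) {
                g2d.fillRect(r[0], r[1], r[2], r[3]);
            }
        } finally {
            g2d.dispose(); // release the graphics context even if drawing fails
        }
        return img;
    }

    public static void main(String[] args) {
        // A new TYPE_INT_RGB image starts out black
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        coverRegions(img, List.of(new int[] {10, 20, 30, 40}));
        System.out.println(Integer.toHexString(img.getRGB(15, 25))); // a pixel inside the region
        System.out.println(Integer.toHexString(img.getRGB(5, 5)));   // a pixel outside the region
    }
}
```

To apply it to a file, one would read the image once with `ImageIO.read`, call `coverRegions` with all regions, and write it once with `ImageIO.write`, as in `watermark`. Filling directly avoids both the extra file and the alpha compositing; if a semi-transparent cover is actually wanted, the original `drawImage` approach remains the better fit.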

Before the text regions are cut out, the image looks like this:
(Image: the original page, before the cutout)
And here it is after the cutout:

(Image: the page after the text regions have been covered with white)
Next, this image is used as the Word document's background, and each paragraph of text is placed into the document as a text box.
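Placing a text box over its original spot requires converting the pixel coordinates from OCR into page units. The sketch below assumes the background image is stretched to fill an entire A4 page (595.28 x 841.89 pt) with no margins; the class and method names are hypothetical. It also converts points to EMUs, the unit DrawingML shapes in `.docx` use (914,400 EMU per inch, hence 12,700 EMU per point).

```java
public class HocrToPage {

    // Assumed page size: A4 (210 mm x 297 mm) expressed in points (1 pt = 1/72 inch)
    public static final double PAGE_WIDTH_PT = 595.28;
    public static final double PAGE_HEIGHT_PT = 841.89;

    // DrawingML measures in EMUs: 914400 per inch, i.e. 12700 per point
    public static final long EMU_PER_POINT = 12700;

    /**
     * Maps a pixel bbox (x0, y0, x1, y1) from an imgW x imgH image onto the page,
     * assuming the image is stretched to fill the whole page.
     * Returns {x, y, width, height} in points, origin at the page's top-left corner.
     */
    public static double[] toPagePoints(int x0, int y0, int x1, int y1, int imgW, int imgH) {
        double sx = PAGE_WIDTH_PT / imgW;  // horizontal points per pixel
        double sy = PAGE_HEIGHT_PT / imgH; // vertical points per pixel
        return new double[] {x0 * sx, y0 * sy, (x1 - x0) * sx, (y1 - y0) * sy};
    }

    /** Converts points to EMUs, rounded to the nearest whole EMU. */
    public static long pointsToEmu(double pt) {
        return Math.round(pt * EMU_PER_POINT);
    }

    public static void main(String[] args) {
        // block_1_2 of the sample hOCR, on the 1002 x 1417 px page image
        double[] box = toPagePoints(77, 424, 946, 678, 1002, 1417);
        System.out.printf("x=%.1f pt, y=%.1f pt, width=%.1f pt, height=%.1f pt%n",
                box[0], box[1], box[2], box[3]);
        System.out.println("width in EMU: " + pointsToEmu(box[2]));
    }
}
```

Horizontal and vertical scale factors are computed separately so that a page image whose aspect ratio differs slightly from the page still lines up. If the background is placed differently, for example inside the page margins, the scale factors and offsets would have to be derived from that area instead.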

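First, take the hOCR file produced by running OCR on the image and turn it into an HTML file; simply renaming the file extension is enough.
The HTML file looks like this: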
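The hard-coded maps in `main` can be derived from this file. Each `ocr_carea` element's `title` carries `bbox x0 y0 x1 y1`; for `block_1_2` in the sample (`bbox 77 424 946 678`), `horizontal = 77`, `vertical = 424`, `width = 946 - 77 = 869` and `height = 678 - 424 = 254`, which are exactly `map0`'s values. The summary says the article parses the HTML with Jsoup; this sketch swaps in a plain regular expression to stay dependency-free, which assumes Tesseract-style output where `class` comes before `title` inside each tag. `HocrBlocks` and `blockRegions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HocrBlocks {

    // An element with class 'ocr_carea'; captures the four bbox numbers from its title.
    // Assumes class appears before title within the same tag.
    private static final Pattern CAREA = Pattern.compile(
            "class=['\"]ocr_carea['\"][^>]*?title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)");

    /** Converts every ocr_carea bbox (x0 y0 x1 y1) into the map layout used by WaterPic.main. */
    public static List<Map<String, Integer>> blockRegions(String hocr) {
        List<Map<String, Integer>> regions = new ArrayList<>();
        Matcher m = CAREA.matcher(hocr);
        while (m.find()) {
            int x0 = Integer.parseInt(m.group(1));
            int y0 = Integer.parseInt(m.group(2));
            int x1 = Integer.parseInt(m.group(3));
            int y1 = Integer.parseInt(m.group(4));
            Map<String, Integer> region = new HashMap<>();
            region.put("horizontal", x0);
            region.put("vertical", y0);
            region.put("width", x1 - x0);
            region.put("height", y1 - y0);
            regions.add(region);
        }
        return regions;
    }

    public static void main(String[] args) {
        String sample = "<div class='ocr_carea' id='block_1_2' title=\"bbox 77 424 946 678\">";
        System.out.println(blockRegions(sample));
    }
}
```

The file content can be loaded into a `String` with `Files.readString` (Java 11+). Note that this returns every block, including `block_1_1`, whose only word is blank, so empty blocks may need to be filtered out before covering.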

  <div class='ocr_page' id='page_1' title='image "/data/translate/mupdf/fileSource/elang.png"; bbox 0 0 1002 1417; ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 72 0 956 406">
    <p class='ocr_par' id='par_1_1' lang='rus' title="bbox 72 0 956 406">
     <span class='ocr_line' id='line_1_1' title="bbox 72 0 956 406; baseline 0 1011; x_size 169.33334; x_descenders 42.333336; x_ascenders 42.333332"><span class='ocrx_word' id='word_1_1' title='bbox 72 0 956 406; x_wconf 95'> </span> 
     </span>
    </p>
   </div>
   <div class='ocr_carea' id='block_1_2' title="bbox 77 424 946 678">
    <p class='ocr_par' id='par_1_2' lang='rus' title="bbox 77 424 946 678">
     <span class='ocr_line' id='line_1_2' title="bbox 77 424 946 448; baseline 0 -7; x_size 21; x_descenders 3; x_ascenders 5">
		 <span class='ocrx_word' id='word_1_2' title='bbox 77 425 246 445; x_wconf 96'>Дисциплины:</span> 
		 <span class='ocrx_word' id='word_1_3' title='bbox 257 430 295 446; x_wconf 95'>два</span> 
		 <span class='ocrx_word' id='word_1_4' title='bbox 309 430 450 448; x_wconf 95'>иностранных</span> 
		 <span class='ocrx_word' id='word_1_5' title='bbox 461 430 522 444; x_wconf 96'>языка</span> 
		 <span class='ocrx_word' id='word_1_6' title='bbox 534 424 667 446; x_wconf 95'>(английский,</span> 
		 <span class='ocrx_word' id='word_1_7' title='bbox 680 424 789 445; x_wconf 94'>немецкий,</span> 
		 <span class='ocrx_word' id='word_1_8' title='bbox 802 425 946 448; x_wconf 96'>французский,</span> 
     </span>
     <span class='ocr_line' id='line_1_3' title="bbox 79 450 946 473; baseline 0.003 -6; x_size 21; x_descenders 4; x_ascenders 5">
		 <span class='ocrx_word' id='word_1_9' title='bbox 79 451 202 470; x_wconf 95'>испанский,</span> 
		 <span class='ocrx_word' id='word_1_10' title='bbox 212 451 306 472; x_wconf 95'>датский,</span> 
		 <span class='ocrx_word' id='word_1_11' title='bbox 317 452 449 473; x_wconf 95'>норвежский,</span> 
		 <span class='ocrx_word' id='word_1_12' title='bbox 459 451 568 473; x_wconf 94'>шведский,</span> 
		 <span class='ocrx_word' id='word_1_13' title='bbox 578 450 692 469; x_wconf 95'>китайский,</span> 
		 <span class='ocrx_word' id='word_1_14' title='bbox 701 450 806 472; x_wconf 95'>турецкий,</span> 
		 <span class='ocrx_word' id='word_1_15' title='bbox 815 456 878 468; x_wconf 96'>языки</span> 
		 <span class='ocrx_word' id='word_1_16' title='bbox 888 456 946 473; x_wconf 96'>стран</span> 
     </span>
     <span class='ocr_line' id='line_1_4' title="bbox 78 477 946 500; baseline 0.001 -7; x_size 23; x_descenders 6; x_ascenders 4">
		 <span class='ocrx_word' id='word_1_17' title='bbox 78 481 250 494; x_wconf 96'>постсоветского</span> 
		 <span class='ocrx_word' id='word_1_18' title='bbox 259 477 373 500; x_wconf 96'>зарубежья</span> 
		 <span class='ocrx_word' id='word_1_19' title='bbox 382 483 393 495; x_wconf 92'>и</span> 
		 <span class='ocrx_word' id='word_1_20' title='bbox 401 479 442 500; x_wconf 92'>др.),</span> 
		 <span class='ocrx_word' id='word_1_21' title='bbox 452 479 539 500; x_wconf 96'>История</span> 
		 <span class='ocrx_word' id='word_1_22' title='bbox 548 480 723 498; x_wconf 96'>международных</span> 
		 <span class='ocrx_word' id='word_1_23' title='bbox 731 477 857 497; x_wconf 93'>отношений,</span> 
		 <span class='ocrx_word' id='word_1_24' title='bbox 866 478 946 494; x_wconf 91'>Консти-</span> 
     </span>
     <span class='ocr_line' id='line_1_5' title="bbox 77 504 946 526; baseline 0.002 -8; x_size 23.95467; x_descenders 5.6516395; x_ascenders 5.5"><span class='ocrx_word' id='word_1_25' title='bbox 77 506 200 524; x_wconf 92'>туционное</span> <span class='ocrx_word' id='word_1_26' title='bbox 213 507 279 524; x_wconf 96'>право</span> <span class='ocrx_word' id='word_1_27' title='bbox 292 504 424 525; x_wconf 96'>зарубежных</span> <span class='ocrx_word' id='word_1_28' title='bbox 436 509 500 526; x_wconf 91'>стран,</span> <span class='ocrx_word' id='word_1_29' title='bbox 513 504 607 525; x_wconf 96'>Мировая</span> <span class='ocrx_word' id='word_1_30' title='bbox 619 506 733 519; x_wconf 96'>экономика</span> <span class='ocrx_word' id='word_1_31' title='bbox 746 506 757 518; x_wconf 96'>и</span> <span class='ocrx_word' id='word_1_32' title='bbox 770 507 946 525; x_wconf 96'>международные</span> 
     </span>
     <span class='ocr_line' id='line_1_6' title="bbox 77 528 946 551; baseline 0.001 -7; x_size 23.95467; x_descenders 5.6516395; x_ascenders 5.5"><span class='ocrx_word' id='word_1_33' title='bbox 77 532 251 544; x_wconf 94'>экономические</span> <span class='ocrx_word' id='word_1_34' title='bbox 264 533 391 548; x_wconf 92'>отношения,</span> <span class='ocrx_word' id='word_1_35' title='bbox 403 531 568 546; x_wconf 96'>Экономическая</span> <span class='ocrx_word' id='word_1_36' title='bbox 580 532 714 548; x_wconf 94'>дипломатия,