PDF position coordinates in Java: using iText to get the coordinates of a selected area from a PDF

I'm trying to remove text from a particular section of a PDF. If I know the X,Y coordinates of the area, I'm able to remove the text, but I'm unable to get the coordinates of the selected area from the PDF. Kindly help me.

Solution

In this question, you ask about removing content from a specific area. Now you are asking how to determine this specific area, but your question is incomplete: you are not telling us any of the criteria to select the area.

It seems that you are trying to do something that is called redaction. This is explained in the StackOverflow question: How to create and apply redactions?

In the answer to that question, I explain how to create redaction annotations programmatically. However, redaction is usually done manually, using Adobe Acrobat:

[Image: Sksda.png, screenshot of the Adobe Acrobat redaction tools]

The arrow shows the functionality you need: Tools > Protection > Mark for Redaction

If you only need the coordinates and no redaction annotation, you could introduce another annotation that allows you to mark a rectangle manually and then use iText to extract the coordinates. For instance: if the rectangle is a form field, then it's really easy to get the coordinates. If the content you want to remove is a value of the form field, it's even easier to remove that content: you just remove the field.

If there is no way to retrieve these coordinates manually, then you may be facing something that is impossible. For instance, if you don't know anything about the content of the area you want to remove, how on earth are you going to teach a program what it needs to remove?

If you do know what content you're looking for, you have to parse for that content. That question has been asked and answered before: Get the exact Stringposition in PDF

Update:

In the comments, you explain that you convert the PDF page to an image, that you render the image in a Java Swing application so that a user can select a rectangle. This rectangle is stored as a java.awt.Image.

This leads to the following potential problems due to the fact that the coordinate system in Java is different from the coordinate system in PDF.

The Y-axis is different: In PDF, the size of the page is described in rectangles that we call page boundaries. The most important page boundaries are the MediaBox (mandatory) and the CropBox (optional). The MediaBox contains the coordinates of the lower-left corner and the upper-right corner of the rectangle that defines your page. In the coordinate system, the Y-axis points upwards. The Y coordinate of the lower-left corner is lower than the Y coordinate of the upper-right corner. In Java, it's the other way around: the Y coordinate at the top of an object is 0 and the Y-axis points downwards: the higher the Y value, the lower the object at this Y value.

There may be an offset: In most cases, the lower-left corner of the MediaBox has the coordinate X = 0, Y = 0. This isn't always the case. It may be necessary to take into account an offset.

The resolution can be different: The default user unit corresponds with a point. For instance: an A4 page measures 595 by 842 user units. There are 72 points in every inch. When you create an image, you don't necessarily measure in points. Maybe you measure in pixels. Maybe you create an image with 300 pixels per inch (300 dpi).

All these reasons can cause the rectangle you get from your Swing app to be different from the coordinates you need to use in PDF. You need to take all of this into account; otherwise, you'll keep on facing your "it doesn't work" problem. This is not an iText problem, this is a math problem.
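The three differences listed above (flipped Y-axis, box offset, resolution) can be combined into a single conversion. The following is only a minimal sketch, not part of iText: the class `SelectionToPdf` and the method `toPdf` are hypothetical names, and it assumes the page is not rotated, the user unit is the default 1/72 inch, and the image shows exactly the page box that you pass in (MediaBox or CropBox), rendered at a uniform dpi.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;

// Hypothetical helper: maps a rectangle selected on a rendered page image
// (pixels, origin top-left, Y pointing down) to PDF user space
// (points, origin given by the page box, Y pointing up).
final class SelectionToPdf {

    /**
     * @param selectionPx rectangle selected in the Swing app, in image pixels
     * @param pageBox     page box in user units: x/y hold the lower-left
     *                    corner, width/height hold the size of the box
     * @param dpi         resolution used when rendering the page to an image
     * @return rectangle in PDF user space; x/y are its lower-left corner
     */
    static Rectangle2D.Double toPdf(Rectangle selectionPx,
                                    Rectangle2D pageBox,
                                    double dpi) {
        // Resolution: 72 points per inch, dpi pixels per inch.
        double width = selectionPx.width * 72.0 / dpi;
        double height = selectionPx.height * 72.0 / dpi;

        // Offset: the lower-left corner of the box is not always (0, 0).
        double x = pageBox.getX() + selectionPx.x * 72.0 / dpi;

        // Y-axis: measure from the top of the box down to the bottom edge
        // of the selection, which becomes the lower edge in PDF space.
        double top = pageBox.getY() + pageBox.getHeight();
        double y = top - (selectionPx.y + selectionPx.height) * 72.0 / dpi;

        return new Rectangle2D.Double(x, y, width, height);
    }

    public static void main(String[] args) {
        // Assumed example: A4 page with MediaBox [0 0 595 842] rendered at 300 dpi.
        Rectangle2D pageBox = new Rectangle2D.Double(0, 0, 595, 842);
        Rectangle selection = new Rectangle(300, 600, 600, 150);
        System.out.println(toPdf(selection, pageBox, 300));
    }
}
```

The lower-left and upper-right corners you would hand to iText are then (x, y) and (x + width, y + height) of the returned rectangle. A rotated page or a non-default user unit would need additional handling that this sketch leaves out.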
