I am using Apache PDFBox with Java to parse PDFs and extract all of their content. Text extraction works fine for English only. For other languages I get nothing but special characters. For example, extracting the Arabic character ش gives the String :"? when printed. It works fine when I change the "Region and language" setting of my computer from English to Arabic, so I think getting the Unicode values of the characters would solve this problem. Please help me get the Unicode values of the characters from the PDF, or suggest some other solution.
Solution
The private String escape(String chars) method converts characters to their Unicode escapes.
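To illustrate the idea, here is a minimal stand-alone sketch. It is not the escape method mentioned above; the class UnicodeEscapeDemo and the helper toUnicodeEscapes are hypothetical names made up for this example, and the extracted text is replaced by a hard-coded string so the code runs without PDFBox. It shows the Unicode value of each character, and it also prints the text through a UTF-8 PrintStream. That second step assumes the garbled output comes from the console's default encoding, which the "Region and language" observation suggests but does not prove.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class UnicodeEscapeDemo {

    // Hypothetical helper: renders every UTF-16 code unit of the input
    // as a backslash-u escape with four uppercase hex digits.
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Stand-in for the text returned by the PDF extractor (Arabic letter sheen).
        String extracted = "\u0634";

        // ASCII-only output, so it survives any console encoding.
        System.out.println(toUnicodeEscapes(extracted));

        // Assumption: the console can display UTF-8. Wrapping System.out
        // avoids depending on the platform default charset.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(extracted);
    }
}
```

If the escapes printed by the first call are the expected Arabic code points, the extraction itself is fine and only the output encoding needs fixing.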