Chinese PoS Segmentation Technical Notes

1. Running a compiled Java program

Command: java XXXX

Do not append the ".class" suffix to the class name when invoking the java command. If the suffix is included, the launcher fails with an error because it cannot find or load the main class.


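2. Converting character encodings on Linux

Step 1: Check the file's encoding:

file --mime-encoding filename

Step 2: List the encodings available on the system:

iconv -l

Step 3: Convert:

iconv -f old_encoding -t new_encoding filename

iconv writes the converted text to standard output, so redirect it (or use the -o option) to save the result to a new file.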


3. Java IO BufferedWriter

Text written through a BufferedWriter is held in an in-memory buffer and is not guaranteed to reach the target file until the buffer is flushed. Closing the writer flushes it, so calling close() (or at least flush()) on a BufferedWriter object is a must.
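The following is a minimal sketch of the point above, not a definitive recipe. It assumes a short string written to a temporary file; the class name BufferedWriterCloseDemo and the method sizesAroundClose are hypothetical names introduced only for illustration. It compares the file size before and after close() to show that the text stays in the buffer until the writer is closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
class BufferedWriterCloseDemo {

    // Returns {file size before close(), file size after close()} in bytes.
    static long[] sizesAroundClose(String text) {
        try {
            Path file = Files.createTempFile("bw-demo", ".txt");
            long before;
            // try-with-resources calls close() on exit, which flushes the buffer.
            try (BufferedWriter writer =
                     Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                writer.write(text);
                // A short text is still held in the buffer at this point.
                before = Files.size(file);
            }
            long after = Files.size(file);
            Files.delete(file);
            return new long[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = sizesAroundClose("hello");
        System.out.println("before close: " + sizes[0] + " bytes");
        System.out.println("after close:  " + sizes[1] + " bytes");
    }
}
```

Using try-with-resources is one way to make sure close() is always called, even when an exception is thrown while writing.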


4. Java IO: reading and writing UTF-8 text files

The OutputStreamWriter constructor accepts an encoding parameter, so it can wrap a FileOutputStream and write text in a chosen encoding such as UTF-8. InputStreamReader does the same for reading when it wraps a FileInputStream.
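As a rough sketch of the wrapping described above, the example below writes a string to a temporary file as UTF-8 and reads it back. The class name Utf8FileDemo and its methods are hypothetical, and the Chinese sample text is written with Unicode escapes so that the source file's own encoding does not matter.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class Utf8FileDemo {

    // Writes text as UTF-8 by wrapping FileOutputStream in OutputStreamWriter.
    static void writeUtf8(File file, String text) {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file as UTF-8 by wrapping FileInputStream in InputStreamReader.
    static String readUtf8(File file) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Writes text to a temporary file and returns what is read back.
    static String roundTrip(String text) {
        try {
            File file = File.createTempFile("utf8-demo", ".txt");
            try {
                writeUtf8(file, text);
                return readUtf8(file);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u6587 PoS"; // two Chinese characters plus ASCII
        System.out.println(sample.equals(roundTrip(sample)));
    }
}
```

Passing the charset explicitly on both sides avoids depending on the platform default encoding, which may differ between machines.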


5. ICTCLAS2015 user-defined dictionary paths

Data/FieldDict.pdat, Data/FieldDict.pos


6. An ICTCLAS2015 user dictionary must be saved in ANSI encoding (the system's legacy code page) to be imported correctly into the Data/FieldDict.pdat and Data/FieldDict.pos files.


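7. Python ctypes.c_char_p()

c_char_p represents a C char* that points to a NUL-terminated string. Its constructor accepts a bytes object (in Python 3, a str must be encoded to bytes first), None, or an integer memory address. Note that in CPython, id() returns the address of the Python object itself, not of its character data, so it is generally not the address to pass here.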


8. HanLP's segment.seg() function removes, by default, all "\n" characters from the input text. How to change this behavior is unknown.


9. Java: the unnamed package

If a source file has no package statement, its classes, interfaces, enumerations, and annotation types are placed in the unnamed package.


10. Python Requests module timeout

requests.get("XXX", timeout=1)

The timeout value is given in seconds.
