===
> tdm <- TermDocumentMatrix(doc.corpus)
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :'i, j, v' different lengths
In addition: Warning messages:
1: In parallel::mclapply(x, termFreq, control) :
all scheduled cores encountered errors in user code
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
NAs introduced by coercion
原来是读入数据时的解码类型不对。
msg <- readLines(file, encoding = "latin1")
=== tm 中的stemming不准确