我想用PHP从word文档中提取文本内容.
我在Microsoft Word for Mac 2011中创建了一个新的单词文档.
编辑:也通过在Windows 7中的Microsoft Word中创建相同的文档进行测试.
文件的内容是
The quick brown fox jumps over the lazy dog
我把它保存到磁盘作为Word 97-2004文档(.doc).
$source = "word.doc";
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source, 'MsDoc');
$text = '';
$sections = $phpWord->getSections();
foreach ($sections as $s) {
$els = $s->getElements();
foreach ($els as $e) {
if (get_class($e) === 'PhpOffice\PhpWord\Element\Text') {
$text .= $e->getText();
} elseif (get_class($e) === 'PhpOffice\PhpWord\Section\TextBreak') {
$text .= " \n";
} else {
throw new Exception('Unknown class type ' . get_class($e));
}
}
}
print $text;
此代码的输出只是文本的一部分:
The quick brown fox j
代码有问题,还是某种兼容性问题?
编辑:
如果我添加一个var_dump($els);之前($els为$e){输出是这样的:
array(1) {
[0]=>
object(PhpOffice\PhpWord\Element\Text)#1265 (14) {
["text":protected]=>
string(21) "The quick brown fox j"
["fontStyle":protected]=>
object(PhpOffice\PhpWord\Style\Font)#1267 (25) {
["aliases":protected]=>
array(1) {
["line-height"]=>
string(10) "lineHeight"
}
["type":"PhpOffice\PhpWord\Style\Font":private]=>
string(4) "text"
["name":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["hint":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["size":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["color":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["bold":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["italic":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["underline":"PhpOffice\PhpWord\Style\Font":private]=>
string(4) "none"
["superScript":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["subScript":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["strikethrough":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["doubleStrikethrough":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["smallCaps":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["allCaps":"PhpOffice\PhpWord\Style\Font":private]=>
bool(false)
["fgColor":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["scale":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["spacing":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["kerning":"PhpOffice\PhpWord\Style\Font":private]=>
NULL
["paragraph":"PhpOffice\PhpWord\Style\Font":private]=>
object(PhpOffice\PhpWord\Style\Paragraph)#1266 (26) {
["aliases":protected]=>
array(1) {
["line-height"]=>
string(10) "lineHeight"
}
["basedOn":"PhpOffice\PhpWord\Style\Paragraph":private]=>
string(6) "Normal"
["next":"PhpOffice\