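Maybe my question was poorly worded. I have a table that I need to scrape from a website. I need the information in the table, but some parts of it, as mentioned earlier, have to be cleaned up. This is the solution I ended up with, and it works. It still involves some manual replacements, but that is because of the silly " they use for inches. ;-)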
Solution:
// collect the cleaned rows here
$specs = [];
// find the table in the source code
foreach ($techdata->find('table') as $table) {
    // filter out the rows
    foreach ($table->find('tr') as $row) {
        // take the innertext using simplehtmldom
        $tech_specs = $row->innertext;
        // strip some 'garbage'
        $tech_specs = str_replace(" \t\t\t\t\t\t\t\t\t\t\t
", "", $tech_specs);
        // find the first word of the string so I can use it
        $spec1 = explode('
', $tech_specs)[0];
        // use the found string to strip down the rest of the table
        $tech_specs = str_replace("
", ":", $tech_specs);
        // manual correction because of the " used
        $tech_specs = str_replace("
", ":", $tech_specs);
        // manual correction because of the " used
        $tech_specs = str_replace("
", ":", $tech_specs);
        // strip some 'garbage'
        $tech_specs = str_replace("\t\t\t\t\t\t\t\t\t\t", "\n", $tech_specs);
        $tech_specs = str_replace("
", "", $tech_specs);
        $tech_specs = str_replace(" ", "", $tech_specs);
        // put the clean row in an array ready for usage
        $specs[] = $tech_specs;
    }
}
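
The chain of str_replace calls above depends on the exact number of tabs in the page source. As a rough sketch of a less brittle variant (not what I actually used), the whitespace could be collapsed with one regular expression and the inch mark decoded from its HTML entity. The helper names clean_cell and clean_row are made up for this example, and the sample strings only imitate what a cell might contain. Unlike the code above, this keeps single spaces inside a value.

```php
<?php
// Hypothetical helper: normalise the text of a single table cell.
function clean_cell(string $text): string
{
    // Turn entities such as &quot; (the inch mark) back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Treat non-breaking spaces like ordinary spaces.
    $text = str_replace("\u{00A0}", ' ', $text);
    // Collapse every run of tabs, newlines and spaces into one space.
    $text = preg_replace('/\s+/u', ' ', $text);
    return trim($text);
}

// Hypothetical helper: join the cleaned cells of one row with ':'.
function clean_row(array $cells): string
{
    return implode(':', array_map('clean_cell', $cells));
}

// Made-up sample data standing in for the text of two cells in a row.
$sample = ["\t\t\tScreen size\n", "\t\t 15.6&quot; \t\n"];
echo clean_row($sample), "\n"; // prints: Screen size:15.6"
```

With simple_html_dom, the input array could presumably be built from $cell->plaintext for each $cell in $row->find('td'), which would avoid matching on the raw markup at all.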