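I am trying to convert an HTML file to XML. For the most part it works. The problem I'm having is with links: right now the converter appears to ignore the links in my test file completely.
Here is the conversion code: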
<?php
ini_set('display_errors', 1);
ini_set('log_errors', 1);
ini_set('error_log', dirname(__FILE__) . '/error_log.txt');
error_reporting(E_ALL);
function convertToXML()
{
$titleLength = 35;
$output = "";
$date = date("D, j M Y G:i:s T");
$fi = fopen( "../newsTEST.htm", "r" );
$fo = fopen( "../newsfeed.xml", "w" );
//This is the first part of the XML
$output .= "<?xml version=\"1.0\"?>\n";
$output .= "<rss version=\"2.0\">\n";
$output .= "<channel>\n";
$output .= "\t<title>Wiggle 100 News</title>\n";
$output .= "\t<link>http://www.wiggle100.com/news.php</link>\n";
$output .= "\t<description>Wiggle 100 Daily News</description>\n";
$output .= "\t<language>en-us</language>\n";
$output .= "\t<pubDate>". $date ."</pubDate>\n";
$output .= "\t<managingEditor>wiggle100@gmail.com</managingEditor>\n";
$output .= "\t<webMaster>josh@jacurren.com</webMaster>\n";
$article = "";
$skip = true; //if false will continue to put lines into output until
$newArticle = false;
while( !feof($fi) )
{
$line = fgets($fi);
$link = "";
if( strpos( $line, "<p" ) !== false )
{
$pos = strpos( $line, "<p" );
$line = substr( $line, $pos );
$pos = strpos( $line, ">" );
$line = substr( $line, $pos + 1 );
$skip = false;
}
if( strpos( $line, "</p>" ) !== false ){
$pos = strpos( $line, "</p>" );
$line = substr( $line, 0, $pos - 1 );
$newArticle = true;
}
//This adds the line to the article
if( !$skip )
{
$article .= $line;
}
//This mixes the article, title, link, and date with
// XML and puts it into the output
if( $newArticle )
{
//This if is to get rid of stuff like
if( (strlen($article) > 10) )
{
$link = findLink( $article );
//$article = strip_tags($article);
$title = substr( $article, 0, $titleLength ) . "...";
$output .= "\t<item>\n";
$output .= "\t\t<title>". $title ."</title>\n";
$output .= "\t\t<link>". $link ."</link>\n";
$output .= "\t\t<description>". $article . "</description>\n";
$output .= "\t\t<pubDate>". $date . "</pubDate>\n";
$output .= "\t</item>\n\n";
}
$article = "";
$line = "";
$skip = true;
}
}
$output .= "</channel>\n";
$output .= "</rss>\n";
fwrite( $fo, $output );
fclose($fi);
fclose($fo);
echo "News converted to XML";
}
//*****************************************************************************
//*****************************************************************************
//Find and return a link in the input.
//Else use the default
function findLink( $input )
{
$link = "http://www.wiggle100.com/news.php";
if( strpos( $input, "<a" ) !== false )
{
$startpos = strpos( $input, "href" );
$link = substr( $input, $startpos + 5 );
$endpos = strpos( $link, ">" );
$link = substr( $link, 0, $endpos - 2 );
}
return $link;
}
?>
This is the HTML test code:
Test Page
This is an article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.
This is another article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.
This is the 3rd article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.
This is the news for today. Blah Blah Blah!
http://www.thedailyreview.com/news/
This is the XML output:
Wiggle 100 News
http://www.wiggle100.com/news.php
Wiggle 100 Daily News
en-us
Fri, 23 Oct 2009 23:49:04 EDT
wiggle100@gmail.com
josh@jacurren.com
This is an article. Blah. Blah. Bla...
http://www.wiggle100.com/news.php
This is an article. Blah. Blah. Blah. Blah. Blah. Blah. Blah
Fri, 23 Oct 2009 23:49:04 EDT
This is another article. Blah. Blah...
http://www.wiggle100.com/news.php
This is another article. Blah. Blah. Blah. Blah. Blah. Blah. Blah
Fri, 23 Oct 2009 23:49:04 EDT
This is the 3rd article. Blah. Blah...
http://www.wiggle100.com/news.php
This is the 3rd article. Blah. Blah. Blah. Blah. Blah. Blah. Blah
Fri, 23 Oct 2009 23:49:04 EDT
This is the news for...
http://www.wiggle100.com/news.php
This is the news for today. Blah Blah Blah!
Fri, 23 Oct 2009 23:49:04 EDT
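Every item in this output carries the default link, including the last one, whose article text is followed by a URL in the test file. For comparison, here is a minimal sketch of pulling the first href out of an HTML fragment with preg_match() rather than manual offset arithmetic. The function name findFirstHref and the sample markup are illustrative assumptions, not part of my code or test file.

```php
<?php
// Hypothetical helper (not from the code above): returns the first href
// value found in $html, or $default when no anchor tag is present.
function findFirstHref(string $html, string $default): string
{
    // Match href="..." or href='...' inside an <a ...> tag, case-insensitively.
    // Group 1 captures the quote character, group 2 the URL itself.
    if (preg_match('/<a\b[^>]*\bhref\s*=\s*(["\'])(.*?)\1/i', $html, $m) === 1) {
        return $m[2];
    }
    return $default;
}

// Assumed sample fragment modelled on the last article of the test page.
$sample = 'This is the news for today. Blah Blah Blah! '
        . '<a href="http://www.thedailyreview.com/news/">http://www.thedailyreview.com/news/</a>';

echo findFirstHref($sample, 'http://www.wiggle100.com/news.php'), "\n";
// prints http://www.thedailyreview.com/news/
```

A fragment with no anchor tag returns the default unchanged, so a helper like this only finds a link if the anchor is still inside the text that was collected for the article.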
When strip_tags() is uncommented, the font tags disappear.
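To illustrate what strip_tags() does to links, here is a small sketch. The $article string is an assumed sample, since the real markup of my test page is not reproduced here. Called with one argument, strip_tags() removes every tag, anchors included, so any href has to be read before stripping. The optional second argument is an allow-list of tags to keep.

```php
<?php
// Assumed sample markup, not the actual contents of the test file.
$article = '<font face="Arial">Read <a href="http://www.thedailyreview.com/news/">the review</a> today.</font>';

// With one argument every tag is removed, so the href attribute is lost.
$plain = strip_tags($article);
echo $plain, "\n";                  // prints: Read the review today.
var_dump(strpos($plain, 'href'));   // bool(false)

// With an allow-list, <a> survives while <font> is still dropped.
$withLinks = strip_tags($article, '<a>');
echo $withLinks, "\n";
// prints: Read <a href="http://www.thedailyreview.com/news/">the review</a> today.
```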