Trouble with links when trying to convert HTML to XML

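I'm trying to convert an HTML file to XML. It mostly works. The problem I'm having is with links: right now the converter seems to completely ignore the links in my test file.

Here is the conversion code: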

<?php
ini_set('display_errors', 1);
ini_set('log_errors', 1);
ini_set('error_log', dirname(__FILE__) . '/error_log.txt');
error_reporting(E_ALL);

// NOTE: the markup inside the string literals below was stripped when this
// page was extracted. The tag names ("<p", "</p>", "<a", and the RSS 2.0
// element names) are reconstructed from context and are assumptions, not
// necessarily the poster's exact strings.

function convertToXML()
{
    $titleLength = 35;
    $output = "";
    $date = date("D, j M Y G:i:s T");

    $fi = fopen( "../newsTEST.htm", "r" );
    $fo = fopen( "../newsfeed.xml", "w" );

    //This is the first part of the XML
    $output .= "<?xml version=\"1.0\"?>\n";
    $output .= "<rss version=\"2.0\">\n";
    $output .= "<channel>\n";
    $output .= "\t<title>Wiggle 100 News</title>\n";
    $output .= "\t<link>http://www.wiggle100.com/news.php</link>\n";
    $output .= "\t<description>Wiggle 100 Daily News</description>\n";
    $output .= "\t<language>en-us</language>\n";
    $output .= "\t<pubDate>". $date ."</pubDate>\n";
    $output .= "\t<managingEditor>wiggle100@gmail.com</managingEditor>\n";
    $output .= "\t<webMaster>josh@jacurren.com</webMaster>\n";

    $article = "";
    $skip = true; //if false, lines keep being added to the article until </p>
    $newArticle = false;

    while( !feof($fi) )
    {
        $line = fgets($fi);
        $link = "";

        //Start of an article: drop everything up to and including the opening tag
        if( strpos( $line, "<p" ) !== false )
        {
            $pos = strpos( $line, "<p" );
            $line = substr( $line, $pos );
            $pos = strpos( $line, ">" );
            $line = substr( $line, $pos + 1 );
            $skip = false;
        }

        //End of an article: cut the line just before the closing tag
        if( strpos( $line, "</p>" ) !== false )
        {
            $pos = strpos( $line, "</p>" );
            $line = substr( $line, 0, $pos - 1 );
            $newArticle = true;
        }

        //This adds the line to the article
        if( !$skip )
        {
            $article .= $line;
        }

        //This mixes the article, title, link, and date with
        // XML and puts it into the output
        if( $newArticle )
        {
            //This if is to get rid of stuff like [markup lost in extraction]
            if( (strlen($article) > 10) )
            {
                $link = findLink( $article );
                //$article = strip_tags($article);
                $title = substr( $article, 0, $titleLength ) . "...";

                $output .= "\t<item>\n";
                $output .= "\t\t<title>". $title ."</title>\n";
                $output .= "\t\t<link>". $link ."</link>\n";
                $output .= "\t\t<description>". $article . "</description>\n";
                $output .= "\t\t<pubDate>". $date . "</pubDate>\n";
                $output .= "\t</item>\n\n";
            }
            $article = "";
            $line = "";
            $skip = true;
        }
    }

    $output .= "</channel>\n";
    $output .= "</rss>\n";

    fwrite( $fo, $output );
    fclose($fi);
    fclose($fo);

    echo "News converted to XML";
}

//*****************************************************************************
//*****************************************************************************
//Find and return a link in the input.
//Else use the default
function findLink( $input )
{
    $link = "http://www.wiggle100.com/news.php";

    if( strpos( $input, "<a" ) !== false )
    {
        //Take everything after href= and cut it off before the closing ">"
        $startpos = strpos( $input, "href" );
        $link = substr( $input, $startpos + 5 );
        $endpos = strpos( $link, ">" );
        $link = substr( $link, 0, $endpos - 2 );
    }
    return $link;
}
?>

Here is the HTML test code:

Test Page

This is an article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.

This is another article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.

This is the 3rd article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.

This is the news for today. Blah Blah Blah!

http://www.thedailyreview.com/news/

Here is the XML output:

Wiggle 100 News

http://www.wiggle100.com/news.php

Wiggle 100 Daily News

en-us

Fri, 23 Oct 2009 23:49:04 EDT

wiggle100@gmail.com

josh@jacurren.com

This is an article. Blah. Blah. Bla...

http://www.wiggle100.com/news.php

This is an article. Blah. Blah. Blah. Blah. Blah. Blah. Blah

Fri, 23 Oct 2009 23:49:04 EDT

This is another article. Blah. Blah...

http://www.wiggle100.com/news.php

This is another article. Blah. Blah. Blah. Blah. Blah. Blah. Blah

Fri, 23 Oct 2009 23:49:04 EDT

This is the 3rd article. Blah. Blah...

http://www.wiggle100.com/news.php

This is the 3rd article. Blah. Blah. Blah. Blah. Blah. Blah. Blah

Fri, 23 Oct 2009 23:49:04 EDT

This is the news for...

http://www.wiggle100.com/news.php

This is the news for today. Blah Blah Blah!

Fri, 23 Oct 2009 23:49:04 EDT

When I uncomment strip_tags(), the font tags disappear.
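For comparison, one way to pull a link out of an article fragment is to let PHP's bundled DOM extension parse the markup instead of doing offset arithmetic with strpos() and substr(). The following is only a minimal sketch under stated assumptions: the function name findFirstLink is hypothetical, the sample fragment is invented for illustration (the real test file's markup is not visible here), and the default URL is simply the one findLink() already falls back to.

```php
<?php
// Hypothetical helper (not from the original post): return the href of the
// first <a> element found in an HTML fragment, or a default when none exists.
function findFirstLink(string $html, string $default = "http://www.wiggle100.com/news.php"): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them;
    // real-world news markup is often not well-formed.
    $previous = libxml_use_internal_errors(true);
    // Wrapping in a <div> guarantees loadHTML() never receives an empty string.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            return $href;
        }
    }
    return $default;
}

// Assumed sample input; the markup of the real test page is unknown.
$article = 'This is the news for today. Blah Blah Blah! '
         . '<a href="http://www.thedailyreview.com/news/">http://www.thedailyreview.com/news/</a>';

$link = findFirstLink($article);   // read the link while the tags still exist
$text = strip_tags($article);      // only then remove the markup

echo $link, "\n";
echo $text, "\n";
```

The order of the last two steps matters in this sketch: strip_tags() removes the anchor element along with the font tags, so the link has to be read before the markup is stripped.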
