php img alt: How to extract img src, title and alt from HTML using PHP?

Latest recommended article published 2023-03-13 16:26:36

weixin_39897749

Tags: php img alt

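EDIT: now that I know better

Using regexp to solve this kind of problem is a bad idea and will likely lead to unmaintainable and unreliable code. Better to use an HTML parser.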
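To make the parser recommendation concrete, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). The sample markup and the function name extract_images are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical sample markup; any HTML string would do.
$html = '<p><img src="/a.png" alt="first" title="A">'
      . "<img src='/b.png' alt=second></p>";

// Hypothetical helper: returns one array per <img> with its src, alt and title.
function extract_images(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid: collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $node) {
        $images[] = [
            // getAttribute() returns an empty string when the attribute is absent.
            'src'   => $node->getAttribute('src'),
            'alt'   => $node->getAttribute('alt'),
            'title' => $node->getAttribute('title'),
        ];
    }
    return $images;
}

$images = extract_images($html);
print_r($images);
```

Unlike the regexps in the rest of this answer, the parser copes with single-quoted and unquoted attribute values without any extra work.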

Solution with regexp

In this case, it is better to split the process into two parts:

1. get all the img tags
2. extract their metadata

I will assume your document is not strict xHTML, so you can't use an XML parser. For example, with the source code of this web page:

    /* preg_match_all matches the regexp against the whole $html string and outputs
       everything as an array in $result. The "i" option makes it case insensitive. */
    preg_match_all('/<img[^>]+>/i', $html, $result);

    print_r($result);

    Array
    (
        [0] => Array
            (
                [0] => <img [...]>
                [1] => <img [...] (click again to undo)" />
                [2] => <img [...] (click again to undo)" />
                [3] => <img [...] width=32 alt="gravatar image" />
                [4] => <img [...] (click again to undo)" />
                [...]
            )
    )

Then we get all the img tag attributes with a loop:

    $img = array();
    // $result[0] holds the full matches, i.e. the img tags themselves
    foreach ($result[0] as $img_tag) {
        preg_match_all('/(alt|title|src)=("[^"]*")/i', $img_tag, $img[$img_tag]);
    }

    print_r($img);

    Array
    (
        [<img [...]>] => Array
            (
                [0] => Array
                    (
                        [0] => src="/Content/Img/stackoverflow-logo-250.png"
                        [1] => alt="logo link to homepage"
                    )
                [1] => Array
                    (
                        [0] => src
                        [1] => alt
                    )
                [2] => Array
                    (
                        [0] => "/Content/Img/stackoverflow-logo-250.png"
                        [1] => "logo link to homepage"
                    )
            )
        [<img [...]>] => Array
            (
                [0] => Array
                    (
                        [0] => src="/content/img/vote-arrow-up.png"
                        [1] => alt="vote up"
                        [2] => title="This was helpful (click again to undo)"
                    )
                [1] => Array
                    (
                        [0] => src
                        [1] => alt
                        [2] => title
                    )
                [2] => Array
                    (
                        [0] => "/content/img/vote-arrow-up.png"
                        [1] => "vote up"
                        [2] => "This was helpful (click again to undo)"
                    )
            )
        [<img [...]>] => Array
            (
                [0] => Array
                    (
                        [0] => src="/content/img/vote-arrow-down.png"
                        [1] => alt="vote down"
                        [2] => title="This was not helpful (click again to undo)"
                    )
                [1] => Array
                    (
                        [0] => src
                        [1] => alt
                        [2] => title
                    )
                [2] => Array
                    (
                        [0] => "/content/img/vote-arrow-down.png"
                        [1] => "vote down"
                        [2] => "This was not helpful (click again to undo)"
                    )
            )
        [<img [...] alt="gravatar image" />] => Array
            (
                [0] => Array
                    (
                        [0] => src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"
                        [1] => alt="gravatar image"
                    )
                [1] => Array
                    (
                        [0] => src
                        [1] => alt
                    )
                [2] => Array
                    (
                        [0] => "http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"
                        [1] => "gravatar image"
                    )
            )
        [..]
    )
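For readers who want to run the two steps end to end, the following is a self-contained sketch. The sample $html string and the function name extract_img_attributes are assumptions for illustration. It also uses PREG_SET_ORDER and moves the quotes outside the capture group so each image ends up as a small name => value array; that is a convenience choice, not what the answer itself does.

```php
<?php
// Hypothetical sample input standing in for the page source.
$html = '<p><IMG src="/content/img/vote-arrow-up.png" alt="vote up" '
      . 'title="This was helpful (click again to undo)" />'
      . '<img src="/Content/Img/stackoverflow-logo-250.png" alt="logo link to homepage" /></p>';

// Hypothetical helper combining both steps.
function extract_img_attributes(string $html): array
{
    $images = [];

    // Step 1: collect every <img ...> tag (case insensitive).
    preg_match_all('/<img[^>]+>/i', $html, $tags);

    foreach ($tags[0] as $tag) {
        // Step 2: pull the double-quoted alt, title and src attributes out of the tag.
        preg_match_all('/(alt|title|src)="([^"]*)"/i', $tag, $attrs, PREG_SET_ORDER);

        $image = [];
        foreach ($attrs as $attr) {
            // $attr[1] is the attribute name, $attr[2] its value without the quotes.
            $image[strtolower($attr[1])] = $attr[2];
        }
        $images[] = $image;
    }
    return $images;
}

$images = extract_img_attributes($html);
print_r($images);
```

Note that the pattern inherits the limits of the regexp approach: it only sees double-quoted values, and it would also pick up an attribute whose name merely ends in src, such as data-src.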

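Regexps are CPU intensive, so you may want to cache this page. If you have no cache system, you can build your own by using ob_start and loading/saving from a text file.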
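The home-made cache mentioned above could look roughly like this. It is only a sketch: the helper name cached_output, the cache file location and the 300-second lifetime are all assumptions chosen for the example.

```php
<?php
// Hypothetical helper: run $render once, store what it printed in $cacheFile,
// and serve the stored copy until it is older than $ttl seconds.
function cached_output(string $cacheFile, int $ttl, callable $render): string
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    ob_start();               // start capturing everything that gets printed
    $render();
    $output = ob_get_clean(); // stop capturing and fetch the buffer

    file_put_contents($cacheFile, $output, LOCK_EX);
    return $output;
}

// Example use with a throwaway cache file in the system temp directory.
$cacheFile = sys_get_temp_dir() . '/img_report_' . uniqid() . '.txt';
$calls = 0;
$render = function () use (&$calls) {
    $calls++; // counts how often the expensive work really runs
    preg_match_all('/<img[^>]+>/i', '<img src="/a.png" alt="a">', $result);
    print_r($result[0]);
};

$first  = cached_output($cacheFile, 300, $render); // runs the regexp
$second = cached_output($cacheFile, 300, $render); // served from the file
echo $second;

unlink($cacheFile); // clean up the example file
```

In a real page the cache file would of course be kept between requests rather than deleted.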

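How does this stuff work?

First, we use preg_match_all, a function that gets every string matching the pattern and outputs it in its third parameter.

The regexps:

    <img[^>]+>

We apply it to the whole HTML page. It can be read as every string that starts with "<img", contains one or more characters that are not ">", and ends with a ">".

    (alt|title|src)=("[^"]*")

We apply it successively to each img tag. It can be read as every string starting with "alt", "title" or "src", then a "=", then a double quote, a bunch of stuff that is not a double quote, and ending with a double quote.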
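To see how preg_match_all fills its third parameter, here is a small sketch run against a single hypothetical tag. With the default flags, index 0 holds the full matches and indexes 1 and 2 hold the two parenthesised groups, which is the layout visible in the print_r output. The variable names and the array_combine step are additions for illustration.

```php
<?php
// Hypothetical tag used only to show the shape of the result.
$tag = '<img src="/content/img/vote-arrow-up.png" alt="vote up" />';

$count = preg_match_all('/(alt|title|src)=("[^"]*")/i', $tag, $m);

// $m[0]: full matches, e.g. src="/content/img/vote-arrow-up.png"
// $m[1]: first group, the attribute names
// $m[2]: second group, the values, still wrapped in their double quotes
$values = array_map(function ($value) {
    return trim($value, '"'); // strip the surrounding quotes
}, $m[2]);

// Pair each name with its unquoted value.
$pairs = array_combine($m[1], $values);
print_r($pairs);
```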

Finally, every time you want to deal with regexps, it is handy to have good tools to quickly test them. Check out this online regexp tester.

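EDIT: answer to the first comment.

It's true that I did not think about the (hopefully few) people using single quotes.

Well, if you use only ', just replace all the " by '.

If you mix both: first you should slap yourself :-), then try to use ("|') in place of " and a character class that excludes both quote characters in place of [^"].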
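One possible way to accept either quote style is sketched below. It is a variation on the suggestion, not the answer's own pattern: it captures the opening quote and uses a backreference (\2) to require the same character as the closing quote, so a value may safely contain the other kind of quote. The sample tag is hypothetical.

```php
<?php
// Hypothetical tag mixing both quote styles.
$tag = "<img src='/a.png' alt=\"it's a logo\" title='say \"hi\"'>";

// Group 2 captures the opening quote; \2 demands the same quote at the end.
// (.*?) is lazy, so the value stops at the first matching closing quote.
preg_match_all(
    '/(alt|title|src)\s*=\s*(["\'])(.*?)\2/is',
    $tag,
    $matches,
    PREG_SET_ORDER
);

$attrs = [];
foreach ($matches as $match) {
    // $match[1] is the attribute name, $match[3] the value without quotes.
    $attrs[strtolower($match[1])] = $match[3];
}
print_r($attrs);
```

Unquoted values such as width=32 are still not covered, which is one more reason to prefer an HTML parser.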
