Extracting img from HTML with PHP: how do I extract the img src, title and alt from HTML using PHP?

Latest recommended article published 2021-08-19 17:21:41

张昕宇梁红


Tags: php, extract img from HTML

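Edit: now that I know better

Using regexp to solve this kind of problem is a bad idea and will likely lead to unmaintainable and unreliable code. It is better to use an HTML parser.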
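To illustrate the parser route, here is a minimal sketch using PHP's bundled DOM extension. The sample markup and the variable names are assumptions made for the example, and it presumes the DOM extension is enabled (it usually is by default).

```php
<?php
// Hypothetical sample markup; any HTML string would do.
$html = '<p><img src="/a.png" alt="first" title="A">'
      . "<img src='/b.png' alt='second'></p>";

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$images = [];
foreach ($doc->getElementsByTagName('img') as $node) {
    // getAttribute() returns an empty string when the attribute is absent.
    $images[] = [
        'src'   => $node->getAttribute('src'),
        'alt'   => $node->getAttribute('alt'),
        'title' => $node->getAttribute('title'),
    ];
}

print_r($images);
```

The parser copes with single quotes, unquoted values and attribute order without any extra pattern work, which is the main reason to prefer it.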

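Solution with regexp

In that case it's better to split the process into two parts:

- get all the img tags

- extract their metadata

I will assume your document is not strict XHTML, so you can't use an XML parser. For example, with this web page's source code: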

/* preg_match_all matches the regexp against the whole $html string and
   outputs everything as an array in $result. The "i" option makes the
   match case-insensitive. */
preg_match_all('/<img[^>]+>/i', $html, $result);

print_r($result);

Output (each numbered entry holds one complete img tag; the tag text is omitted here):

Array
(
    [0] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [...]
        )
)

Then we get all the img tag attributes with a loop:

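$img = array();

// $result[0] holds the full matches, i.e. one complete img tag per entry.
foreach ($result[0] as $img_tag)
{
    // Store the attribute matches under the tag they came from.
    preg_match_all('/(alt|title|src)=("[^"]*")/i', $img_tag, $img[$img_tag]);
}

print_r($img);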

Output (each top-level key is the full img tag the attributes were extracted from; the keys are omitted here):

Array
(
    [] => Array
        (
            [0] => Array
                (
                    [0] => src="/Content/Img/stackoverflow-logo-250.png"
                    [1] => alt="logo link to homepage"
                )
            [1] => Array
                (
                    [0] => src
                    [1] => alt
                )
            [2] => Array
                (
                    [0] => "/Content/Img/stackoverflow-logo-250.png"
                    [1] => "logo link to homepage"
                )
        )
    [] => Array
        (
            [0] => Array
                (
                    [0] => src="/content/img/vote-arrow-up.png"
                    [1] => alt="vote up"
                    [2] => title="This was helpful (click again to undo)"
                )
            [1] => Array
                (
                    [0] => src
                    [1] => alt
                    [2] => title
                )
            [2] => Array
                (
                    [0] => "/content/img/vote-arrow-up.png"
                    [1] => "vote up"
                    [2] => "This was helpful (click again to undo)"
                )
        )
    [] => Array
        (
            [0] => Array
                (
                    [0] => src="/content/img/vote-arrow-down.png"
                    [1] => alt="vote down"
                    [2] => title="This was not helpful (click again to undo)"
                )
            [1] => Array
                (
                    [0] => src
                    [1] => alt
                    [2] => title
                )
            [2] => Array
                (
                    [0] => "/content/img/vote-arrow-down.png"
                    [1] => "vote down"
                    [2] => "This was not helpful (click again to undo)"
                )
        )
    [] => Array
        (
            [0] => Array
                (
                    [0] => src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"
                    [1] => alt="gravatar image"
                )
            [1] => Array
                (
                    [0] => src
                    [1] => alt
                )
            [2] => Array
                (
                    [0] => "http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"
                    [1] => "gravatar image"
                )
        )
    [..]
)

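Regexps are CPU intensive, so you may want to cache this page. If you have no cache system, you can build your own using ob_start and loading/saving from a text file.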
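A home-made cache along those lines might look like the sketch below. The function name cached_output, its parameters, the 300-second lifetime and the temporary file path are all hypothetical choices for illustration, not part of any library.

```php
<?php
/**
 * Return the cached output if a fresh cache file exists; otherwise run
 * $render, capture what it prints with output buffering, and save it.
 * (Hypothetical helper; name and parameters are assumptions.)
 */
function cached_output(string $cacheFile, int $ttl, callable $render): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile);
    }
    ob_start();               // start capturing everything $render echoes
    $render();
    $output = ob_get_clean(); // stop capturing and fetch the buffer
    file_put_contents($cacheFile, $output, LOCK_EX);
    return $output;
}

// Usage: the regexp work runs only on a cache miss.
$cacheFile = sys_get_temp_dir() . '/img_tags_' . uniqid() . '.cache';
$runs = 0;
$render = function () use (&$runs) {
    $runs++;
    preg_match_all('/<img[^>]+>/i', '<img src="/a.png" alt="a">', $m);
    echo implode("\n", $m[0]);
};

$first  = cached_output($cacheFile, 300, $render);
$second = cached_output($cacheFile, 300, $render); // served from the file
unlink($cacheFile);
```

Invalidation here is purely time based; a real page would probably also need to drop the file whenever the underlying HTML changes.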

How does this stuff work?

First, we use preg_match_all, a function that finds every string matching the pattern and outputs the matches in its third parameter.
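The shape of that third parameter is easy to get wrong, so here is a small sketch on a made-up input string (the sample HTML is an assumption). With the default flag, index 0 of the result holds the full matches, which is why a loop over the tags has to iterate over $result[0] rather than $result.

```php
<?php
// Hypothetical sample input.
$html = '<p><img src="/a.png" alt="a"> text <IMG src="/b.png"></p>';

// With the default PREG_PATTERN_ORDER flag, $result[0] holds the full matches
// and $result[n] would hold what the n-th parenthesised group captured.
// The return value is the number of full matches found.
$count = preg_match_all('/<img[^>]+>/i', $html, $result);

print_r($result[0]);
```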

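The regexps:

<img[^>]+>

We apply it to the whole HTML page. It can be read as: every string that starts with "<img", contains one or more characters other than ">", and ends with ">".

(alt|title|src)=("[^"]*")

We apply it to each img tag in turn. It can be read as: every string starting with "alt", "title" or "src", then "=", then a double quote, a run of characters that are not a double quote, and a closing double quote. The parentheses isolate the sub-strings: group 1 is the attribute name and group 2 is the quoted value.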
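As a sketch of how the two captured groups can be turned into something more convenient, the snippet below builds a name => value map for a single tag. The sample tag and the $attributes variable are assumptions for the example.

```php
<?php
// Hypothetical single tag, as the first regexp would produce it.
$img_tag = '<img src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful">';

preg_match_all('/(alt|title|src)=("[^"]*")/i', $img_tag, $m);

// $m[1] holds the attribute names, $m[2] the values including their quotes.
$attributes = [];
foreach ($m[1] as $i => $name) {
    $attributes[strtolower($name)] = trim($m[2][$i], '"');
}

print_r($attributes);
```

Note that the pattern is not anchored to a word boundary, so an attribute such as data-src would also match on its "src" suffix.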

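Finally, every time you deal with regexps, it's handy to have good tools to test them quickly. An online regexp tester is a good place to start.

Edit: answer to the first comment.

It's true that I did not think about the (hopefully few) people using single quotes.

Well, if you use only ', just replace every " with '.

If you mix both, first you should slap yourself :-), then try using ("|') in place of " and [^ø] in place of [^"].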
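One possible way to handle mixed quoting is sketched below. It is a variation rather than the exact substitution described above: it captures the opening quote and uses a backreference so the value must end with the same quote character. The sample tag is hypothetical.

```php
<?php
// Hypothetical tag mixing both quote styles.
$img_tag = "<img src='/a.png' alt=\"it's a logo\" title='say \"hi\"'>";

// Group 2 captures the opening quote; \2 requires the same character to
// close the value, and the lazy (.*?) stops at the first such character.
preg_match_all('/(alt|title|src)=(["\'])(.*?)\2/is', $img_tag, $m, PREG_SET_ORDER);

$attributes = [];
foreach ($m as $match) {
    // $match[1] is the attribute name, $match[3] the value without quotes.
    $attributes[strtolower($match[1])] = $match[3];
}

print_r($attributes);
```

Unquoted attribute values are still missed by this pattern, which is one more argument for the HTML parser.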
