Fetching page HTML with curl: how to extract values from an HTML page stored as a string using the curl functions

Latest recommended article published on 2021-06-10 04:28:36

柯一颗

Tags: curl, fetching page HTML

I am using PHP/cURL to fetch a page's HTML into a string. I then need to extract the data below from that string and plot a chart from it.
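For reference, the fetch step described above might look roughly like the following. This is only a sketch: the function name `fetchHtml` and the timeout value are assumptions made for illustration, and the caller supplies the URL.

```php
<?php
/**
 * Fetch a URL and return the response body as a string.
 * fetchHtml is a hypothetical helper name; the URL is supplied by the caller.
 */
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // assumed limit: give up after 15 seconds
    ]);

    $content = curl_exec($ch);
    if ($content === false) {
        // curl_exec returns false on failure; report the reason
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    return $content;
}
```

The returned string is what the rest of the question refers to as `$content`.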

The data I want is as follows:

Income
Operating income	22,922.00	21,507.30	17,492.60	13,683.90	10,227.12
Expenses
Material consumed	4,029.40	3,442.60	2,952.30	1,889.00	1,367.67
Manufacturing expenses	2,213.20	1,841.80	299.80	120.50	1,020.70
Personnel expenses	9,062.80	9,249.80	7,409.10	5,768.20	4,279.03
Selling expenses	378.10	308.40	532.10	-	171.05
Administrative expenses	1,737.00	1,906.00	2,583.70	2,651.70	904.78
Expenses capitalised	-	-	-	-	-
Cost of sales	17,420.50	16,748.60	13,777.00	10,429.40	7,743.22
Operating profit	5,501.50	4,758.70	3,715.60	3,254.50	2,483.90
Other recurring income	434.20	468.20	326.90	288.70	113.59
Adjusted PBDIT	5,935.70	5,226.90	4,042.50	3,543.20	2,597.49
Financial expenses	108.40	196.80	116.80	7.20	3.13
Depreciation	579.60	533.60	456.00	359.80	292.26
Other write offs	-	-	-	-	-
Adjusted PBT	5,247.70	4,496.50	3,469.70	3,176.20	2,302.10
Tax charges	790.80	574.10	406.40	334.10	286.10
Adjusted PAT	4,456.90	3,922.40	3,063.30	2,842.10	2,016.00
Non recurring items	441.10	-948.60	-	-	38.33
Other non cash adjustments	-	-	-	-	-33.85
Reported net profit	4,898.00	2,973.80	3,063.30	2,842.10	2,020.48
Earnings before appropriation	4,898.00	2,973.80	3,063.30	2,842.10	2,020.48
Equity dividend	880.90	586.00	876.50	873.70	712.88
Preference dividend	-	-	-	-	-
Dividend tax	128.30	99.60	148.90	126.80	99.98
Retained earnings	3,888.80	2,288.20	2,037.90	1,841.60	1,207.62

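I want to extract every value, for example the "Manufacturing expenses" row together with the figures for each year listed in that row. How should I go about it?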

I found something like this:

preg_match('# (.*) price#', $content, $match);

but it does not match the values I want.
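One possible alternative to a regular expression is to parse the string with PHP's built-in `DOMDocument` and `DOMXPath` classes. The sketch below is not based on the real page: its markup is not shown in the question, so the sample assumes each row is a `<tr>` whose first `<td>` holds the label and whose remaining `<td>` cells hold the yearly figures. The function names `extractRows` and `parseAmount` are made up for illustration.

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$content = <<<HTML
<table>
  <tr><td>Expenses</td></tr>
  <tr><td>Manufacturing expenses</td><td>2,213.20</td><td>1,841.80</td><td>299.80</td><td>120.50</td><td>1,020.70</td></tr>
  <tr><td>Selling expenses</td><td>378.10</td><td>308.40</td><td>532.10</td><td>-</td><td>171.05</td></tr>
</table>
HTML;

/** Convert "2,213.20" to 2213.2; a bare "-" (no figure reported) becomes null. */
function parseAmount(string $text): ?float
{
    if ($text === '' || $text === '-') {
        return null;
    }
    return (float) str_replace(',', '', $text);
}

/** Return an array mapping each row label to its list of numeric values. */
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Scraped HTML is often not well-formed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if (count($cells) < 2) {
            continue; // section headings such as "Expenses" carry no values
        }
        $label = array_shift($cells);
        $rows[$label] = array_map('parseAmount', $cells);
    }
    return $rows;
}

$rows = extractRows($content);
print_r($rows['Manufacturing expenses']);
```

With the rows keyed by label, the values for any line item can be looked up directly and passed to a charting library. If the real page uses different tags or nests the table differently, the XPath expressions would need to be adjusted.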
