Ruby on Rails - How to parse an HTML table with Nokogiri? - Stack Overflow

Your desired output is nonsense:

['Raw name 1', 2,094, 0,017, 0,098, 0,113, 0,452]

# ~> -:1: Invalid octal digit

# ~> ['Raw name 1', 2,094, 0,017, 0,098, 0,113, 0,452]

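Ruby splits 2,094 at the comma into 2 and 094, and a leading zero marks an octal literal, in which 9 is not a valid digit. I'll assume you want quoted numbers.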
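If you want to see the octal rule without triggering a syntax error, here's a rough sketch. Kernel#Integer applies the same prefix rules to strings that the parser applies to literals; strict_int is just a name I made up for the example.

```ruby
# Kernel#Integer treats a leading 0 as an octal prefix, like an integer literal,
# so the digits 8 and 9 are rejected after it.
# strict_int is a hypothetical helper, not part of Ruby or Nokogiri.
def strict_int(text)
  Integer(text)
rescue ArgumentError
  nil
end

strict_int('017') # => 15, octal 17
strict_int('094') # => nil, because 9 is not an octal digit
strict_int('94')  # => 94, no leading zero so it is read as decimal
```

Quoting the values, as in ['Raw name 1', '2,094', '0,017'], sidesteps the problem because the comma and the leading zero are then just characters in a String.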

After stripping the parts that kept the code from working and reducing the HTML to a more manageable example, I ran this:

require 'nokogiri'

html = <<EOT
<table class="open">
  <tr><th>Table name</th><th>Column name 1</th><th>Column name 2</th></tr>
  <tr><th>Raw name 1</th><td>2,094</td><td>0,017</td></tr>
  <tr><th>Raw name 5</th><td>2,094</td><td>0,017</td></tr>
</table>
EOT

doc = Nokogiri::HTML(html)
tables = doc.css('table.open')
tables_data = []

tables.each do |table|
  title = table.css('tr[1] > th').text # !> assigned but unused variable - title
  cell_data = table.css('tr > td').text
  raw_name = table.css('tr > th').text
  tables_data << [cell_data, raw_name]
end

Which results in:

tables_data
# => [["2,0940,0172,0940,017",
#      "Table nameColumn name 1Column name 2Raw name 1Raw name 5"]]

The first thing to notice is you're not using title though you assign to it. Possibly that happened when you were cleaning up your code as an example.

css, like search and xpath, returns a NodeSet, which is akin to an array of Nodes. When you use text or inner_text on a NodeSet it returns the text of each node concatenated into a single string:

Get the inner text of all contained Node objects.

This is its behavior:

require 'nokogiri'

doc = Nokogiri::HTML('
<p>foo</p>
<p>bar</p>
')

doc.css('p').text # => "foobar"

Instead, you should iterate over each node found, and extract its text individually. This is covered many times here on SO:

doc.css('p').map{ |node| node.text } # => ["foo", "bar"]

That can be reduced to:

doc.css('p').map(&:text) # => ["foo", "bar"]

The docs say this about content, text and inner_text when used with a Node:

Returns the content for this Node.

Instead, you need to go after the individual node's text:

require 'nokogiri'

html = <<EOT
<table class="open">
  <tr><th>Table name</th><th>Column name 1</th><th>Column name 2</th><th>Column name 3</th><th>Column name 4</th><th>Column name 5</th></tr>
  <tr><th>Raw name 1</th><td>2,094</td><td>0,017</td><td>0,098</td><td>0,113</td><td>0,452</td></tr>
  <tr><th>Raw name 5</th><td>2,094</td><td>0,017</td><td>0,098</td><td>0,113</td><td>0,452</td></tr>
</table>
EOT

tables_data = []
doc = Nokogiri::HTML(html)

doc.css('table.open').each do |table|
  # find all rows in the current table, then iterate from the second row through the last one...
  table.css('tr')[1..-1].each do |tr|
    # collect the raw name and the cell data from the current row...
    raw_name = tr.at('th').text
    cell_data = tr.css('td').map(&:text)
    # aggregate it...
    tables_data += [raw_name, cell_data]
  end
end

Which now results in:

tables_data
# => ["Raw name 1",
#     ["2,094", "0,017", "0,098", "0,113", "0,452"],
#     "Raw name 5",
#     ["2,094", "0,017", "0,098", "0,113", "0,452"]]
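Note that the result is one flat array that alternates names and cell arrays, because += concatenates the two-element array onto tables_data. If you'd rather have one [name, cells] pair per row, << appends the pair as a single element. Here's a small sketch of the difference in plain Ruby, with no Nokogiri involved; collect_flat, collect_pairs and the rows sample are made up for illustration.

```ruby
# Hypothetical helpers showing how the aggregation operator shapes the result.

# Array#+ concatenates, so each row contributes two top-level elements.
def collect_flat(rows)
  out = []
  rows.each { |name, cells| out += [name, cells] }
  out
end

# Array#<< appends its argument as one element, so each row stays a pair.
def collect_pairs(rows)
  out = []
  rows.each { |name, cells| out << [name, cells] }
  out
end

# Sample data standing in for the values extracted from each <tr>.
rows = [
  ['Raw name 1', ['2,094', '0,017']],
  ['Raw name 5', ['2,094', '0,017']]
]

collect_flat(rows).size  # => 4
collect_pairs(rows).size # => 2
```

Which shape is better depends on what you do next; pairs are usually easier to turn into a Hash or to iterate row by row.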

You can figure out how to coerce the quoted numbers into decimals acceptable to Ruby, or manipulate the inner arrays however you want.
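As a starting point, here's one possible sketch of that coercion. It assumes the comma is always a decimal separator and never a thousands separator, and that Float precision is good enough; use BigDecimal if it isn't. comma_decimal_to_f and rows_to_hash are names I invented for the example.

```ruby
# Hypothetical helper: convert a comma-decimal string such as "2,094" to a Float.
# Kernel#Float raises ArgumentError on malformed input instead of returning 0.0.
def comma_decimal_to_f(text)
  Float(text.tr(',', '.'))
end

# Hypothetical helper: turn the flat [name, cells, name, cells, ...] array
# into a Hash of name => array of Floats.
def rows_to_hash(flat)
  flat.each_slice(2).map do |name, cells|
    [name, cells.map { |cell| comma_decimal_to_f(cell) }]
  end.to_h
end

comma_decimal_to_f('0,017') # => 0.017

rows_to_hash(['Raw name 1', ['2,094', '0,017']])
# => {"Raw name 1"=>[2.094, 0.017]}
```

I used Float() rather than String#to_f on purpose: to_f quietly returns 0.0 for text it can't parse, which would hide a bad cell.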
