Oracle解析html中的字段
前提 将html以文本形式保存道数据库中
待解析html
<table class="dataintable">
<tr>
<th style="width:20%">产品代码</th>
<th style="width:20%">描述</th>
<th style="width:30%">单位净值</th>
<th style="width:30%">资金余额</th>
</tr>
<tr>
<td>SJ1265</td>
<td>私募基金</td>
<td>3,589.06</td>
<td>68,975,124.06</td>
</tr>
</table>
解析sql
with data_temp as (
select replace(a.t_context,' ','') as content from regex_tt a where a.t_id = 2
) --select * from data_temp
,data_temp1 as ( --获取第二行 tr
select substr(content,instr(content,'<tr',1,2),instr(content,'</tr',1,2)-instr(content,'<tr',1,2)) as data_info from data_temp
) --select substr(data_info,instr(data_info,'<td',1,1),instr(data_info,'</td',1,1)-instr(data_info,'<td',1,1)) as a from data_temp1;
,data_temp2 as ( --解析获取第二行的具体列
select REGEXP_REPLACE(substr(data_info,instr(data_info,'<td',1,1),instr(data_info,'</td',1,1)-instr(data_info,'<td',1,1)),'<td>','') as con_a
,REGEXP_REPLACE(substr(data_info,instr(data_info,'<td',1,2),instr(data_info,'</td',1,2)-instr(data_info,'<td',1,2)),'<td>','') as con_b
,replace(REGEXP_REPLACE(substr(data_info,instr(data_info,'<td',1,3),instr(data_info,'</td',1,3)-instr(data_info,'<td',1,3)),'<td>',''),',','') as con_c
,replace(REGEXP_REPLACE(substr(data_info,instr(data_info,'<td',1,4),instr(data_info,'</td',1,4)-instr(data_info,'<td',1,4)),'<td>',''),',','') as con_d from data_temp1
) select * from data_temp2
用到的函数
- replace(stringg,‘gg’,‘g’) 执行后就是string
- instr() 方法 参数说明及使用
–instr(str1,str2[,num1,num2]) “[,num1,num2]” 为可选参数
–str1 :被搜索的目标
–str2 :希望搜索的目标
–num1 :str1 的检索开始位置 ,默认为 1 ,(从左往右检索),若为 负数,则从右往左检索
–num2 :str2 在 str1 出现的第 num2 次的位置,默认为 1 - substr函数格式 (俗称:字符截取函数)
格式1: substr(string string, int a, int b);
格式2:substr(string string, int a) ;
解析:
格式1:
1、string 需要截取的字符串
2、a 截取字符串的开始位置(注:当a等于0或1时,都是从第一位开始截取)
3、b 要截取的字符串的长度 - REGEXP_REPLACE() 正则匹配替换
–将字符串中的字母删除
SELECT regexp_replace(‘01234abcde56789’,’[a-z]’,’…’) AS new_str FROM dual;
–等价于SELECT replace(‘01234abcde56789’,‘abcde’,’…’) AS new_str FROM dual;
————————————————————————————
- 解析HTML中的文本道理同上 可以使用正则表达式进行精准匹配从而获取到需要的字段