Need help on a fuzzy match problem

西门的贱

于 2014-05-19 16:03:35 发布

阅读量469

点赞数

分类专栏：转载分享文章标签： match

转载分享专栏收录该内容

10 篇文章 0 订阅

订阅专栏

A fuzzy match is request to perform a provider match while loading extract file from clients, the code is easy but we met an efficiency problem.

On the extract file, each the provider has a not null account_no as their identity; also, we have a column pr_account_no ( with index ) in our production.

The problem is, both of them are varchar2 data type and have both leading '0's, plus, the extract file may have trailing '0's too. so all the following account_no in extract file should be matched  with the provider '00012345' in our system:

'001234500', '12345', '000012345', '1234500000', etc.

The current code is like this:

with
  providers as (
   select 1 id, '0012345' pr_account_no from dual union all
select 2,    '0023450' from dual
),
clients as (
   select '001234500' client_account_no from dual union all   -- will match to provider 1
  select '000012345'    from dual union all -- will match to provider 1
  select '1234500000' from dual union all -- will match to provider 1
  select '00234500' from dual union all -- will match to provider 2
  select '23450'    from dual union all   -- will match to provider 2
  select '000023450'    from dual union all -- will match to provider 2
  select '00002345' from dual union all -- will NOT match to provider 2
  select '12345'    from dual union all -- will match to provider 1
  select '2345'       from dual           --  will NOT match to provider 2
)
s ele ct
providers.id, providers.pr_account_no , clients.client_account_no
from   providers, clients
where  regexp_like
   ( clients.client_account_no,
      '0*' || trim ( leading '0' from providers.pr_account_no )|| '0*'
   );

As of the efficiency issue,  we have  574 K rows in provider table, and 74K rows in extract file on a daily basis which is processed row by row.

About 5-7 minutes only on this matching procedure in 11g 2, which is not acceptable according to the business request. Any one would have some hints to improve it? Appreciated in advance.

西门的贱

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Need help on a fuzzy match problem

A fuzzy match is request to perform a provider match while loading extract file from clients, the code is easy but we met an efficiency problem.On the extract file, each the provider has a not nul
复制链接

扫一扫