A fuzzy match is request to perform a provider match while loading extract file from clients, the code is easy but we met an efficiency problem.
On the extract file, each the provider has a not null account_no as their identity; also, we have a column pr_account_no ( with index ) in our production.
The problem is, both of them are varchar2 data type and have both leading '0's, plus, the extract file may have trailing '0's too. so all the following account_no in extract file should be matched with the provider '00012345' in our system:
'001234500', '12345', '000012345', '1234500000', etc.
The current code is like this:
with
providers as (
select 1 id, '0012345' pr_account_no from dual union all
select 2, '0023450' from dual
),
clients as (
select '001234500' client_account_no from dual union all -- will match to provider 1
select '000012345' from dual union all -- will match to provider 1
select '1234500000' from dual union all -- will match to provider 1
select '00234500' from dual union all -- will match to provider 2
select '23450' from dual union all -- will match to provider 2
select '000023450' from dual union all -- will match to provider 2
select '00002345' from dual union all -- will NOT match to provider 2
select '12345' from dual union all -- will match to provider 1
select '2345' from dual -- will NOT match to provider 2
)
s ele ct
providers.id, providers.pr_account_no , clients.client_account_no
from providers, clients
where regexp_like
( clients.client_account_no,
'0*' || trim ( leading '0' from providers.pr_account_no )|| '0*'
);
As of the efficiency issue, we have 574 K rows in provider table, and 74K rows in extract file on a daily basis which is processed row by row.
About 5-7 minutes only on this matching procedure in 11g 2, which is not acceptable according to the business request. Any one would have some hints to improve it? Appreciated in advance.
On the extract file, each the provider has a not null account_no as their identity; also, we have a column pr_account_no ( with index ) in our production.
The problem is, both of them are varchar2 data type and have both leading '0's, plus, the extract file may have trailing '0's too. so all the following account_no in extract file should be matched with the provider '00012345' in our system:
'001234500', '12345', '000012345', '1234500000', etc.
The current code is like this:
with
providers as (
select 1 id, '0012345' pr_account_no from dual union all
select 2, '0023450' from dual
),
clients as (
select '001234500' client_account_no from dual union all -- will match to provider 1
select '000012345' from dual union all -- will match to provider 1
select '1234500000' from dual union all -- will match to provider 1
select '00234500' from dual union all -- will match to provider 2
select '23450' from dual union all -- will match to provider 2
select '000023450' from dual union all -- will match to provider 2
select '00002345' from dual union all -- will NOT match to provider 2
select '12345' from dual union all -- will match to provider 1
select '2345' from dual -- will NOT match to provider 2
)
s ele ct
providers.id, providers.pr_account_no , clients.client_account_no
from providers, clients
where regexp_like
( clients.client_account_no,
'0*' || trim ( leading '0' from providers.pr_account_no )|| '0*'
);
As of the efficiency issue, we have 574 K rows in provider table, and 74K rows in extract file on a daily basis which is processed row by row.
About 5-7 minutes only on this matching procedure in 11g 2, which is not acceptable according to the business request. Any one would have some hints to improve it? Appreciated in advance.