I am working on a simple script to read from one database (oracle) and
write to another (postgresql). I retrieve the data from oracle in
chunks and drop the data to postgresql continuously. The author of one
of the python database clients mentioned that using one thread to
retrieve the data from the oracle database and another to insert the
data into postgresql with something like a pipe between the two threads
might make sense, keeping both IO streams busy. Any hints on how to
get started?
Thanks,
Sean
解决方案Sean Davis wrote:
The author of one of the python database clients mentioned that
using one thread to retrieve the data from the oracle database and
another to insert the data into postgresql with something like a
pipe between the two threads might make sense, keeping both IO
streams busy.
IMHO he''s wrong. Network interaction is quite slow compared with CPU
performance, so there''s no gain (maybe even overhead due to thread
management and locking stuff). That''s true even on multiprocessor
machines, not only because there''s almost nothing to compute but
only IO traffic. CMIIW.
Using multiplexing, you''ll get good results with simple code without
the danger of deadlocks. Have a look at asyncore (standard library)
or the Twisted framework -- personally, I prefer the latter.
Regards,
Bj?rn
--
BOFH excuse #194:
We only support a 1200 bps connection.
Bjoern Schliessmann wrote:
Sean Davis wrote:
>The author of one of the python database clients mentioned that
using one thread to retrieve the data from the oracle database and
another to insert the data into postgresql with something like a
pipe between the two threads might make sense, keeping both IO
streams busy.
IMHO he''s wrong. Network interaction is quite slow compared with CPU
performance, so there''s no gain (maybe even overhead due to thread
management and locking stuff). That''s true even on multiprocessor
machines, not only because there''s almost nothing to compute but
only IO traffic. CMIIW.
Using multiplexing, you''ll get good results with simple code without
the danger of deadlocks. Have a look at asyncore (standard library)
or the Twisted framework -- personally, I prefer the latter.
Regards,
Bj??rn
Sean you can''t win - everyone has a different idea! You need to explain
that oracle has millions of records and it''s possible to a pipe open to
feed the Postgres side.
One thing I didn''t get - is this a one time transfer or something that is
going to happen often.
One time transfer live to the time issue.
Johnf
Bjoern Schliessmann a écrit :
Sean Davis wrote:
>The author of one of the python database clients mentioned that
using one thread to retrieve the data from the oracle database and
another to insert the data into postgresql with something like a
pipe between the two threads might make sense, keeping both IO
streams busy.
IMHO he''s wrong. Network interaction is quite slow compared with CPU
performance, so there''s no gain (maybe even overhead due to thread
management and locking stuff). That''s true even on multiprocessor
machines, not only because there''s almost nothing to compute but
only IO traffic. CMIIW.
Not so sure, there is low CPU in the Python script, but there may be
CPU+disk activity on the database sides [with cache management and other
optimizations on disk access].
So, with a reader thread and a writer thread, he can have a select on a
database performed in parallel with an insert on the other database.
After, he must know if the two databases use same disks, same
controller, same host... or not.
But, if its only a do-once job, maybe the optimization is net really
necessary.