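Recently I have been using Python for web scraping. Because I wanted the HTML as it looks after the JavaScript has run, I tried Selenium, Windmill, HtmlUnit and other web testing frameworks in turn. Since I only need the resulting HTML and no rendered UI, I settled on HtmlUnit. HtmlUnit only has a Java implementation, so I decided to connect Python and Java through RPC.
I first tried ICE. The Java side worked without problems, but when writing the Python client I found that ICE did not yet support Python 2.7, so I dropped it and looked at Thrift instead.
Download: http://thrift.apache.org/download/
First, write an IDL interface definition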
demo.thrift
namespace java service.demo

service Hello {
    string helloString(1:string word)
}
Then generate the Java and Python files
thrift --gen java demo.thrift
thrift --gen py demo.thrift
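If the IDL changes often, the two commands above can be wrapped in a small script. The following is only a sketch: the helper names `thrift_gen_commands` and `generate` are made up for illustration, and it assumes the `thrift` compiler is on the PATH. By default the compiler writes its output to `gen-java` and `gen-py` in the current directory.

```python
import subprocess


def thrift_gen_commands(idl_file, languages):
    """Build one `thrift --gen <lang> <idl>` command per target language."""
    return [["thrift", "--gen", lang, idl_file] for lang in languages]


def generate(idl_file, languages):
    """Run the thrift compiler for each language (assumes `thrift` is on the PATH)."""
    for cmd in thrift_gen_commands(idl_file, languages):
        # check_call raises CalledProcessError if the compiler reports an error
        subprocess.check_call(cmd)
```

Calling `generate("demo.thrift", ["java", "py"])` would run the same two commands as above.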
Write the Java server
Interface method implementation
package service.demo;

import org.apache.thrift.TException;
import service.demo.Hello.Iface;

public class HelloImpl implements Iface {
    @Override
    public String helloString(String word) throws TException {
        System.out.println("get " + word);
        return "hello " + word;
    }
}
Server implementation
package service.demo;

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TBinaryProtocol.Factory;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.server.TThreadPoolServer.Args;
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TTransportException;
import service.demo.Hello.Processor;

public class Server {
    public void startServer() {
        try {
            // Listen on TCP port 1234
            TServerSocket serverTransport = new TServerSocket(1234);
            // Route incoming calls to our implementation
            Processor<HelloImpl> processor = new Processor<HelloImpl>(new HelloImpl());
            // Binary protocol with strict read and strict write enabled
            Factory protocolFactory = new TBinaryProtocol.Factory(true, true);
            Args args = new Args(serverTransport);
            args.processor(processor);
            args.protocolFactory(protocolFactory);
            TServer server = new TThreadPoolServer(args);
            // Blocks here and serves requests until the process is stopped
            server.serve();
        } catch (TTransportException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.startServer();
    }
}
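To get a feel for what the binary protocol puts on the wire, here is a rough sketch that hand-encodes a `helloString` call with Python's `struct` module. It reflects my understanding of the strict binary encoding (a version word combined with the message type, the method name, a sequence id, then the argument struct) and is for illustration only. The function names `encode_string` and `encode_call` are hypothetical; in real code the generated classes and `TBinaryProtocol` do this work.

```python
import struct

VERSION_1 = 0x80010000  # version marker written in strict mode
CALL = 1                # message type for a normal request
T_STOP = 0              # marks the end of a struct
T_STRING = 11           # field type id for string


def encode_string(value):
    """A string is a big-endian 32-bit length followed by the bytes."""
    data = value.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def encode_call(method, word, seqid=0):
    """Encode a call to a method taking one string argument with field id 1."""
    # Message header: version | type, method name, sequence id
    header = struct.pack(">I", VERSION_1 | CALL)
    header += encode_string(method)
    header += struct.pack(">i", seqid)
    # Argument struct: field header (type, id), the value, then a stop byte
    args = struct.pack(">bh", T_STRING, 1)
    args += encode_string(word)
    args += struct.pack(">b", T_STOP)
    return header + args
```

For example, `encode_call("helloString", "python")` produces a 37-byte message whose first four bytes are `80 01 00 01`.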
Java client implementation
package service.demo;

import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

public class Client {
    public void startClient() {
        TTransport transport;
        try {
            // Connect to the server started above
            transport = new TSocket("localhost", 1234);
            TProtocol protocol = new TBinaryProtocol(transport);
            Hello.Client client = new Hello.Client(protocol);
            transport.open();
            // Remote call; prints the string returned by the server
            System.out.println(client.helloString("panguso"));
            transport.close();
        } catch (TTransportException e) {
            e.printStackTrace();
        } catch (TException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Client client = new Client();
        client.startClient();
    }
}
Write the Python client
First install Thrift's Python support by running python setup.py install under thrift-0.9.0\lib\py. Note that if you write the code in Eclipse, you need to add C:\Python27\Lib\site-packages\thrift-0.9.0-py2.7.egg under PyDev -> Interpreter - Python -> System PYTHONPATH.
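Besides the Thrift runtime, the generated Python code also has to be importable. A minimal sketch, assuming the compiler was run in the project directory and wrote its output to the default `gen-py` folder; the helper name `add_gen_py_to_path` is hypothetical:

```python
import os
import sys


def add_gen_py_to_path(base_dir="."):
    """Put <base_dir>/gen-py at the front of sys.path and return its absolute path."""
    gen_py = os.path.abspath(os.path.join(base_dir, "gen-py"))
    if gen_py not in sys.path:
        sys.path.insert(0, gen_py)
    return gen_py
```

Call it once at the top of the client script, before importing the generated module.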
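Python client implementation
# thrift --gen py writes the generated package to gen-py. With no "namespace py"
# in demo.thrift, the package should be named after the IDL file (demo).
import sys
sys.path.append('gen-py')
from demo import Hello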
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket
# Talk to a server via TCP sockets, using a binary protocol
transport = TSocket.TSocket("localhost", 1234)
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)
# Use the service we already defined
client = Hello.Client(protocol)
# Call the remote method and print the reply
print(client.helloString("python"))
# Close the connection when done
transport.close()
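When the Java server and the Python client are started together, the client can fail simply because the server is not listening yet. One possible workaround is to poll the port before opening the transport. This is only a sketch using the standard library; `wait_for_port` is a hypothetical helper and is not part of Thrift.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=1.0)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            # Not reachable yet: give up at the deadline, otherwise retry
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

With the server above, the client would call `wait_for_port("localhost", 1234)` and only call `transport.open()` when it returns True.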