OncoKB official site: OncoKB™ - MSK's Precision Oncology Knowledge Base
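I. Approach:
1. First, get the list of genes.
2. Then, for each gene in the list, fetch its mutation information and drug information.
Example: EGFR
Gene details
Mutation information
Drug information for specific mutations
FDA-recognized data
These categories are the key information we want to collect.
Next, let's walk through the actual steps: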
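Step one: get OncoKB's current gene list
Go to the OncoKB home page and click "genes" (905 genes at the time of writing).
This opens the gene list page. Option 1: download the list directly as cancerGeneList.tsv.
Option 2: open the F12 developer tools and read the data from the API endpoint.
Code for calling the API endpoint: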
String url = "https://www.oncokb.org/api/v1/utils/cancerGeneList";
// List of gene symbols (hugoSymbol), reused in the later steps
List<String> geneList = new ArrayList<>();
JsonNode geneNode = getHttpJsonNode(url);
for (JsonNode node : geneNode) {
    String hugoSymbol = node.get("hugoSymbol").asText();
    geneList.add(hugoSymbol);
}
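For option 1, the downloaded cancerGeneList.tsv can be turned into the same gene list without calling the API. The sketch below uses only the standard library. The class name `CancerGeneListTsv` and the column header "Hugo Symbol" are assumptions, so check the header row of the file you actually downloaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pulls one named column out of the lines of a TSV file.
final class CancerGeneListTsv {

    static List<String> readColumn(List<String> lines, String header) {
        List<String> values = new ArrayList<>();
        if (lines.isEmpty()) {
            return values;
        }
        // Locate the requested column in the header row.
        String[] headers = lines.get(0).split("\t", -1);
        int column = -1;
        for (int i = 0; i < headers.length; i++) {
            if (headers[i].trim().equals(header)) {
                column = i;
                break;
            }
        }
        if (column < 0) {
            throw new IllegalArgumentException("Column not found: " + header);
        }
        // Collect the non-empty cells of that column from the data rows.
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split("\t", -1);
            if (cells.length > column && !cells[column].trim().isEmpty()) {
                values.add(cells[column].trim());
            }
        }
        return values;
    }
}
```

A call such as `CancerGeneListTsv.readColumn(Files.readAllLines(Paths.get("cancerGeneList.tsv")), "Hugo Symbol")` would then give a list of gene symbols to loop over.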
This is a general-purpose HTTP helper that is reused in many places below.
// Imports for Apache HttpClient 4.x and Jackson
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.IOException;

public static JsonNode getHttpJsonNode(String url) throws IOException {
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        HttpGet httpGet = new HttpGet(url);
        httpGet.setHeader("Content-Type", "application/json");
        httpGet.setHeader("Authorization", "Basic " +
        httpGet.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7");
        // Brotli ("br") is left out because the default client may not decode it
        httpGet.setHeader("Accept-Encoding", "gzip, deflate");
        httpGet.setHeader("Accept-Language", "zh-CN,zh;q=0.9");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36");
        HttpResponse response = httpClient.execute(httpGet);
        HttpEntity entity = response.getEntity();
        String responseBody = EntityUtils.toString(entity);
        // Parse the JSON response body into a JsonNode tree
        ObjectMapper objectMapper = new ObjectMapper();
        return objectMapper.readTree(responseBody);
    }
}
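Because this helper gets called once per gene, a single failed request would otherwise abort the whole run. One possible way to soften that is a small retry wrapper with a pause between attempts. This is only a sketch: the class `RetryingFetch`, the method `callWithRetry`, and the attempt and pause values are all hypothetical and not part of the original code.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper: retries a call a fixed number of times with a pause in between.
final class RetryingFetch {

    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("All " + maxAttempts + " attempts failed", last);
    }
}
```

With this in place, a request could be written as `RetryingFetch.callWithRetry(() -> getHttpJsonNode(url), 3, 1000L)`. The pause also keeps the crawler from hitting the site too quickly.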
Get the gene detail data
Code (the shared HTTP request helper is shown above):
String url1 = "https://www.oncokb.org/api/private/search/variants/biological?hugoSymbol=#{gene}";
// geneList is the list of gene symbols built in step one
for (String gene : geneList) {
    String url11 = url1.replace("#{gene}", gene);
    JsonNode annotNode = getHttpJsonNode(url11);
    for (JsonNode jsonNode : annotNode) {
        // One entry per annotated alteration of this gene
        JsonNode variant = jsonNode.get("variant");
        String alteration = variant.get("alteration").asText();
        String oncogenic = jsonNode.get("oncogenic").asText();
        String mutationEffect = jsonNode.get("mutationEffect").asText();
        String mutationEffectDescription = jsonNode.get("mutationEffectDescription").asText();
    }
}
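The URL is built by substituting the gene symbol into the `#{gene}` placeholder. If a symbol ever contains a character that is not safe in a query string, the request would break, so it may be worth encoding the value first. A minimal sketch, assuming a hypothetical helper class `OncoKbUrls`:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: fills the #{gene} placeholder with a URL-encoded gene symbol.
final class OncoKbUrls {

    static final String GENE_PLACEHOLDER = "#{gene}";

    static String forGene(String template, String gene) {
        try {
            // URLEncoder applies form encoding, which is suitable for query parameter values.
            String encoded = URLEncoder.encode(gene, "UTF-8");
            return template.replace(GENE_PLACEHOLDER, encoded);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The `url1.replace("#{gene}", gene)` call in the loop could then become `OncoKbUrls.forGene(url1, gene)`.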
Get the drug data for specific mutations
Code:
String url1 = "https://www.oncokb.org/api/private/search/variants/clinical?hugoSymbol=#{gene}";
// geneList is the list of gene symbols built in step one
for (String gene : geneList) {
    String url11 = url1.replace("#{gene}", gene);
    JsonNode therapeuticNode = getHttpJsonNode(url11);
    for (JsonNode jsonNode : therapeuticNode) {
        String level = jsonNode.get("level").asText();
        JsonNode variant = jsonNode.get("variant");
        String alteration = variant.get("name").asText();
        // Cancer type associated with this evidence level (first entry only)
        String levelAssociaCancerType = jsonNode.get("cancerTypes").get(0).get("subtype").asText();
        // Join the drug names of a combination therapy with " + "
        List<String> drugNames = new ArrayList<>();
        for (JsonNode drugNode : jsonNode.get("drug")) {
            drugNames.add(drugNode.asText());
        }
        String drugs = String.join(" + ", drugNames);
        String drugDescription = jsonNode.get("drugDescription").asText();
    }
}
The FDA data is included in this same API response; the fdaLevel field distinguishes it.
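One way to separate the FDA entries is to keep only the rows whose fdaLevel is actually set. The sketch below assumes the value has already been read as a string; with Jackson, calling `asText()` on a JSON null yields the text "null", so that case is treated as missing too. The class `FdaFilter` and its methods are hypothetical, and the real value format of fdaLevel should be checked against the API response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: keeps only the rows that carry an fdaLevel value.
final class FdaFilter {

    // Treats null, blank, and the literal text "null" as "no FDA level".
    static boolean hasFdaLevel(String fdaLevel) {
        if (fdaLevel == null) {
            return false;
        }
        String value = fdaLevel.trim();
        return !value.isEmpty() && !value.equalsIgnoreCase("null");
    }

    // fdaLevelOf extracts the fdaLevel string from a row of any type.
    static <T> List<T> keepWithFdaLevel(List<T> rows, Function<T, String> fdaLevelOf) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (hasFdaLevel(fdaLevelOf.apply(row))) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```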
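Deeper levels of the data can be fetched in the same way.
That concludes the detailed walkthrough of the crawler!
Result screenshot attached:
You can organize the data according to this table structure.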
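To get the extracted fields into a table, one simple option is to write each record as a tab-separated line. This is only a sketch: the class `TsvRows` is hypothetical, and the column order in the usage note is an assumption based on the fields read in the code above, not the exact layout in the screenshot.

```java
// Hypothetical helper: turns field values into one tab-separated line.
final class TsvRows {

    // Replaces tabs and line breaks inside a field so they cannot break the row layout.
    static String clean(String field) {
        if (field == null) {
            return "";
        }
        return field.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    static String toLine(String... fields) {
        String[] cleaned = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            cleaned[i] = clean(fields[i]);
        }
        return String.join("\t", cleaned);
    }
}
```

Inside the drug loop, a row could be produced with `TsvRows.toLine(gene, alteration, level, levelAssociaCancerType, drugs, drugDescription)` and appended to a file or a list.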
Today is the Lantern Festival. Happy holiday, everyone!