Free Medical Data




Mar 14, 2016


Much has been made of “free” software: what it is, how it differs from “open source” software, the merits of copyleft vs. non-copyleft free software, and so on.


One issue that has come to my attention recently is that many medical data sets are proprietary, and this leads to worse patient treatment options.


Here’s an example. Let’s say you have some sort of cancer, and there are several treatment options available (e.g. radiation therapy, chemotherapy, surgery). There is something called a “nomogram”, where doctors take a set of historical (anonymized) cases with their pre-treatment data points, the treatment option chosen, and the outcome. Based on these numbers they give you an answer like “option X has A% chance of curing you”, “option Y has B% chance of curing you”, etc. To make this concrete, let’s say you have prostate cancer, which has been detected by measuring your blood PSA levels and confirmed by a prostate biopsy. Based on these factors, and any other relevant factors (age, weight, etc.), they’re able to create a nomogram. For your specific numbers, the nomogram gives the estimated chance that you’ll be fully cured of prostate cancer (measured after 5 years) in different situations, e.g. if you choose radical prostatectomy as your treatment option instead of radiation therapy.


I’m not sure of the math behind this, but I believe they use some sort of clustering algorithm to find similar patients and calculate a score based on their treatment results and how similar you were to those patients.
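To make that guess concrete, here is a minimal sketch of a similarity-weighted estimate in Python. It follows the "find similar patients" idea described above and is not how any particular hospital computes its nomograms; published nomograms are usually built from regression models rather than from clustering. The `Case` fields, the `scales` values, and the toy records are all hypothetical and synthetic, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    """One anonymized historical case (all fields are hypothetical)."""
    psa: float       # pre-treatment blood PSA level
    age: float       # age at diagnosis, in years
    treatment: str   # treatment option that was chosen
    cured: bool      # outcome measured at the five-year mark


def similarity(a, b, scales):
    """Gaussian kernel on scaled feature differences: 1.0 when identical, near 0 when far apart."""
    dist_sq = sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales))
    return math.exp(-0.5 * dist_sq)


def estimate_cure_rates(patient, cases, scales=(5.0, 10.0)):
    """Return a similarity-weighted cure rate for each treatment option.

    `patient` is a (psa, age) tuple; `scales` says how large a difference
    in each feature counts as "far" (assumed values, not clinically derived).
    """
    weight_sum = {}
    cured_sum = {}
    for case in cases:
        w = similarity(patient, (case.psa, case.age), scales)
        weight_sum[case.treatment] = weight_sum.get(case.treatment, 0.0) + w
        if case.cured:
            cured_sum[case.treatment] = cured_sum.get(case.treatment, 0.0) + w
    return {
        t: cured_sum.get(t, 0.0) / total
        for t, total in weight_sum.items()
        if total > 0
    }


# Synthetic toy records, invented purely to exercise the code.
cases = [
    Case(6.0, 58, "surgery", True),
    Case(7.5, 61, "surgery", True),
    Case(18.0, 72, "surgery", False),
    Case(17.0, 70, "radiation", True),
    Case(19.5, 74, "radiation", True),
    Case(6.5, 57, "radiation", False),
]

rates = estimate_cure_rates((18.5, 73), cases)
for treatment, rate in sorted(rates.items()):
    print(f"{treatment}: {rate:.0%} estimated chance of cure")
```

With only six records the numbers mean nothing, which is the point of the rest of this post: an estimate like this is only as good as the number of comparable cases behind it.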


This is really cool, and it lets doctors choose the best treatment option for patients based on statistics from thousands of previous patients. In many cases there is some treatment option that is usually best, but under various special circumstances an alternative is better; this system lets the doctor really choose the best option. For instance, in my father’s case a radical prostatectomy would normally be the treatment option for prostate cancer, but his nomogram showed that radiation therapy had a much better predicted success rate.


Unfortunately, basically all of these nomogram databases are proprietary. The way it works is that a hospital collects these numbers internally and may share the data with other hospitals (I’m not sure under what IP terms). Then, as a hospital, you have to choose which nomogram database to use. Typically you’d be paying for such access, and the quality of the nomogram data depends on how many data points are in that nomogram.


Fox Chase Cancer Center has a large, free online nomogram database for various cancers. In addition to their own data, which is significant, Fox Chase has a way for other hospitals to submit their own nomogram data, which increases the total information and helps doctors make more accurate predictions. I don’t know what the data licensing terms are; presumably you cannot directly download the Fox Chase cancer nomogram data. But at least you can use their online nomogram tools for free.


There are a bunch of studies of techniques like this that you can find at the National Center for Biotechnology Information, which is part of the NIH. For instance, there is a study on intramedullary rods vs. plate and screw fixation for repairing humerus fractures.


However, there are a number of problems with this:


  • While there may have been a few studies on the matter (there are a dozen articles or so on the plate fixation vs. intramedullary rod technique), the data isn’t tagged or aggregated openly in a free way that would allow one to build a nomogram based on as much information as possible
  • The research articles that do exist tend to have small sample sizes because they only include information from a single hospital or group of related hospitals
  • The health records that hospitals collect have some amount of intellectual property value, and there’s no monetary reason for them to share that data for free
  • Most of these research articles are published through a for-profit publisher like Elsevier, which means that as a normal citizen, I cannot read the results of the study without paying the publisher a large fee for access to the article.


The Department of Health has a lot of issues on its hands, but this is one that I think they should focus on seriously. Consider the following class of medical conditions:


  • There is some way to collect numerical pre-treatment data
  • There are multiple treatment options
  • Efficacy of the treatment can be evaluated somehow

In every such case I believe the NIH should build an open (anonymized) database of the pre-treatment data, the treatment option chosen, and the efficacy of the treatment. In some cases (say, for very rare conditions) it may not be possible to do this while respecting privacy concerns, but surely we can find common ground where we take common medical problems (many forms of cancer, bone fractures, etc.) and then use these databases to make medical treatment decisions.
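As a rough illustration of what one entry in such a database might hold, here is a small Python sketch. The `TreatmentRecord` fields and the `min_group_size` threshold are my own assumptions, not an existing NIH schema or policy; the threshold is just one simple way to avoid publishing figures for groups so small that individual patients could be singled out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentRecord:
    """One anonymized entry (hypothetical schema)."""
    condition: str       # e.g. "humerus fracture"
    pre_treatment: dict  # numerical pre-treatment data points
    treatment: str       # treatment option chosen
    effective: bool      # whether the treatment was judged effective


def summarize(records, min_group_size=5):
    """Efficacy rate per (condition, treatment), omitting groups that are too small.

    Returns {(condition, treatment): (efficacy_rate, number_of_cases)}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [effective count, total count]
    for record in records:
        entry = counts[(record.condition, record.treatment)]
        entry[0] += int(record.effective)
        entry[1] += 1
    return {
        key: (effective / total, total)
        for key, (effective, total) in counts.items()
        if total >= min_group_size
    }
```

Given such records, calling `summarize(records)` would report an efficacy rate for each common condition and treatment pair, while leaving rare combinations out of the published summary.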


Hospitals can be made to submit such data to the NIH as a result of these treatments (in fact, I wouldn’t be surprised if they already do). The NIH can enforce this by making its funding to hospitals contingent on this type of data sharing.


I sincerely hope that an effort like this happens in the future. It could save millions of lives and spare people unnecessary pain, and I think it frames the current hot-button debate over “intellectual property” in a good and reasonable way.




