

Bioinformaticians provide irreplaceable expertise across the field of biology. It’s no surprise that universities offer more and more opportunities for students to learn bioinformatics techniques. While university classes and online courses provide much-needed background, they might not give you enough practice. After all, if you’re looking to establish yourself as a bioinformatician, experience with real datasets is invaluable.


As technology has advanced, the cost of collecting biological data has declined. Consequently, scientists have become inundated with gigabytes of it. Everything from EEG brain signals to genomic information needs to be processed and analyzed, and there is a clear need for experts who can understand and process this data. If you’re starting on your journey into bioinformatics, this guide points you to a few resources with large datasets that you can use to develop your skillset.


If you’re looking for tutorials that teach the basic skills needed to work with this data, I recommend these resources:


  1. Processing Brain Signal Data: Makoto’s Preprocessing Pipeline
  2. RNA-Sequencing Tutorial
  3. Microbiome 16S Tutorial
  4. Microbiome WGS Resources
  5. Online Server for Processing Large Datasets: Galaxy


OpenNeuro: Brain Recordings

Remember Elon Musk’s recent update on his brain-computer interface company, Neuralink? Researchers in many sub-fields of neuroscience use similar, albeit lower-throughput, techniques to measure brain activity. These measurements are especially important for studies that seek to understand how the brain responds to different stimuli, and for applications such as seizure prediction.


If you’re interested in learning how to decode these signals and develop predictive models, you will find plenty of datasets on OpenNeuro. They include electroencephalography (EEG), magnetoencephalography (MEG), electrocorticography (ECoG) and intracranial EEG (iEEG) recordings.


These datasets are commonly processed in MATLAB or Python.
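As a small warm-up before working with real recordings, here is a minimal Python sketch of one basic signal-processing step: finding the dominant frequency of a signal. It uses a synthetic signal and a naive discrete Fourier transform from the standard library only. The function name, sampling rate and signal are made up for illustration; real EEG work would normally rely on a dedicated toolbox rather than this hand-rolled approach.

```python
import cmath
import math


def dominant_frequency(signal, sampling_rate_hz):
    """Return the frequency (Hz) with the largest DFT power, ignoring the DC term."""
    n = len(signal)
    best_freq, best_power = 0.0, -1.0
    # For a real-valued signal, bins 1..n//2 cover all distinct frequencies.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = k * sampling_rate_hz / n, power
    return best_freq


# Synthetic one-second "recording" (an assumption, not real data):
# a 10 Hz oscillation plus a weaker 40 Hz component.
SAMPLING_RATE_HZ = 128
synthetic_eeg = [
    math.sin(2 * math.pi * 10 * t / SAMPLING_RATE_HZ)
    + 0.3 * math.sin(2 * math.pi * 40 * t / SAMPLING_RATE_HZ)
    for t in range(SAMPLING_RATE_HZ)
]

peak_hz = dominant_frequency(synthetic_eeg, SAMPLING_RATE_HZ)
print(f"Dominant frequency: {peak_hz} Hz")
```

The naive transform is quadratic in the signal length, which is fine for a toy example but too slow for full recordings, where a fast Fourier transform would be the usual choice.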


Screenshot taken by author

RNA-Sequencing Datasets

The cost of high-throughput RNA sequencing has dropped drastically as the technology has progressed. Consequently, there is a cornucopia of publicly accessible datasets across many different sequencing platforms. I recommend focusing on the Illumina platform, as it’s the most common choice among researchers today. You’ve likely come across different Python and R packages for processing RNA-seq data. Now it’s time to put those skills to the test and add them to your portfolio.


If you’re interested specifically in the brain, the Allen Brain Map initiative has several datasets available. Otherwise, you will find thousands of datasets across different organisms through the Gene Expression Omnibus DataSets repository.
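Raw read counts from repositories like these are usually normalized for sequencing depth before samples are compared. The sketch below shows one simple scheme, counts per million (CPM), in plain Python. The gene names, sample names and counts are invented for illustration and do not come from any real dataset; dedicated RNA-seq packages offer more sophisticated normalization methods.

```python
def counts_per_million(counts_by_sample):
    """Scale each sample's raw read counts so that they sum to one million."""
    normalized = {}
    for sample, gene_counts in counts_by_sample.items():
        library_size = sum(gene_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {
            gene: count * 1_000_000 / library_size
            for gene, count in gene_counts.items()
        }
    return normalized


# Made-up counts for illustration only.
toy_counts = {
    "sample_A": {"gene1": 500, "gene2": 1500, "gene3": 8000},
    "sample_B": {"gene1": 100, "gene2": 300, "gene3": 1600},
}

cpm = counts_per_million(toy_counts)
for sample, values in cpm.items():
    print(sample, values)
```

In this toy example, sample_A was sequenced five times deeper than sample_B, yet both end up with the same CPM values because the genes make up the same proportions in each.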


Screenshot by author | All this data is just waiting to be perused by you!

Microbiome Sequence Datasets

With the emergence of RNA-seq technology came growing interest in the microbiome. Many datasets in the Gene Expression Omnibus measure the gastrointestinal, faecal, salivary or environmental microbiomes. There are also plenty of R packages that provide open datasets to practice on. Check out these R packages:


  1. curatedMetagenomicData
  2. HMP16SData: Human Microbiome Project 16S Data
  3. microbiome
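
Once you have a table of taxon counts from a source like these, a common first exercise is to compute relative abundances and a diversity measure. Below is a minimal Python sketch of the Shannon diversity index. The taxon counts are made up for illustration, and the function names are my own rather than part of any of the packages listed above.

```python
import math


def relative_abundance(taxon_counts):
    """Convert raw taxon counts into proportions that sum to 1."""
    total = sum(taxon_counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: count / total for taxon, count in taxon_counts.items()}


def shannon_index(taxon_counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    proportions = relative_abundance(taxon_counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Made-up counts for a single sample, for illustration only.
toy_sample = {"Bacteroides": 50, "Prevotella": 25, "Faecalibacterium": 25}

shannon = shannon_index(toy_sample)
print(f"Shannon index: {shannon:.4f}")
```

A sample dominated by one taxon scores close to zero, while a sample with evenly spread taxa scores higher.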


Single-Cell RNA-Sequencing Datasets

One of the most remarkable innovations in molecular transcriptomics is single-cell RNA sequencing. It lets us assess which genes are active in individual cells, allowing us to characterize and group them. Since this technique is relatively young, there are fewer datasets available for practice. Nonetheless, combined with available tutorials, they are sufficient for practicing data processing, analysis and visualization. These datasets are found here. The scRNAseq package in R also allows you to access this data.
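
Before cells are characterized and grouped, a typical first step is to discard low-quality cells. The Python sketch below shows one simple filter: keeping only cells in which a minimum number of genes were detected. The cell names, gene names, counts and threshold are all invented for illustration, and real pipelines apply additional quality checks.

```python
def detected_genes(cell_counts):
    """Count the genes with at least one read in a single cell."""
    return sum(1 for count in cell_counts.values() if count > 0)


def filter_cells(counts_by_cell, min_genes):
    """Keep only the cells in which at least `min_genes` genes were detected."""
    return {
        cell: counts
        for cell, counts in counts_by_cell.items()
        if detected_genes(counts) >= min_genes
    }


# Made-up counts for illustration only.
toy_cells = {
    "cell_1": {"GeneA": 3, "GeneB": 0, "GeneC": 7},
    "cell_2": {"GeneA": 0, "GeneB": 0, "GeneC": 1},
    "cell_3": {"GeneA": 2, "GeneB": 5, "GeneC": 4},
}

# The threshold of 2 is arbitrary and chosen only to suit the toy data.
kept = filter_cells(toy_cells, min_genes=2)
print(sorted(kept))
```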


What are you waiting for? There are many free courses and tutorials available online to pair with these datasets, so you can start today!


Happy data mining!


Original article: https://towardsdatascience.com/finding-biological-datasets-to-inspire-your-next-bioinformatics-project-5c6c6c17b6d2








