Multilevel networks

References

Lifesaver link: Machine Learning & Deep Learning resources

Finally found it after much searching: 《A novel vSLAM framework with unsupervised semantic segmentation based on adversarial transfer learning》
Baidu NLP processing mind map
Deep Learning Applications in EHR
What statistical method are multilevel regression models?
Multilevel Models (1)
《Enhanced bag of words using multilevel k-means for human activity recognition》
The road to machine learning: a minimal neural network in TensorFlow
word2vec on the fetch_20newsgroups dataset
An "unserious" brief overview of deep learning
Deep Learning Applications in NLP v0.76
word2vec online course
Clustering algorithms – exercises

Multi-task learning

Multitask learning
This area mainly concerns supervised models that span domains: a single model has a main task and auxiliary tasks, where an auxiliary task may come from a different domain than the main task, but latent relationships between them can help improve predictive performance on both.
From: Deep Learning Applications in EHR

Multilevel statistical models

Multilevel statistical models is a general term for the class of models used to handle multilevel data. Depending on the outcome type or research aims, different books and papers give them different names, such as random-effects models, random-coefficient models, mixed-effects models, and generalized linear mixed-effects models. Below we introduce multilevel statistical models through an example.

Example: a province surveys rural residents' use of health services. Thirty townships are randomly sampled from the province, 2 administrative villages from each township, and a number of households from each village; every resident aged 15 or older in each household is interviewed. In total, 30 townships, 60 villages, 832 households, and 2,369 residents are surveyed. The questionnaire covers general household information (e.g. drinking-water sanitation) and individual socio-demographic characteristics (sex, age, etc.).

(township → village → household → individual)

The aim is to explore the factors affecting rural residents' demand for health services, with each individual's two-week illness status (binary) as the outcome. (Example taken from 《医学和公共卫生研究常用多水平统计模型》, edited by Yang Min and Li Xiaosong.)

Several key questions

1. How should we understand "levels" and "multilevel data"?

In the example above, each individual belongs to a household, each household to a village, and each village to a township, nested layer by layer. We can treat the individual as one level, the lowest, defined as level 1; the household, one step above the individual, as level 2; and so on, with the village as level 3 and the township as level 4. Larger numbers indicate higher levels.

"Multilevel", as the name suggests, simply means two or more levels. The structural hallmark of multilevel data is that lower-level units are nested within higher-level units.

Multilevel structures are widespread in medical and public health research, so the data we collect naturally exhibit multiple levels. Put simply, multilevel data are data with a multilevel hierarchical structure.
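To make the nesting concrete, here is a minimal sketch (with hypothetical IDs, pandas assumed) of the survey's four-level structure laid out as one row per individual:

```python
# One row per individual (level 1); higher-level memberships are columns.
import pandas as pd

data = pd.DataFrame({
    "township":   ["T1", "T1", "T1", "T2", "T2", "T2"],
    "village":    ["V1", "V1", "V2", "V3", "V3", "V4"],
    "household":  ["H1", "H2", "H3", "H4", "H4", "H5"],
    "individual": ["P1", "P2", "P3", "P4", "P5", "P6"],
})

# The nesting shows up as a strictly decreasing number of unique units
# as we move up the hierarchy.
counts = {col: data[col].nunique() for col in data.columns}
print(counts)  # {'township': 2, 'village': 4, 'household': 5, 'individual': 6}
```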

Note that when judging whether data are multilevel, and how many levels there are, we should consider not only the hierarchical structure of the data but also the research aims and subject-matter knowledge. For example, simple repeated-measures data can be viewed as two-level data: each measurement occasion is nested within an individual, with occasions as level 1 and individuals as the higher level 2. Conversely, the example above runs township → village → household → individual, seemingly four levels; but if the research aims do not concern the township and village levels, and no data were collected at those levels, they can be ignored and the data treated as two-level.

2. Variables in multilevel data

In multilevel data analysis, the outcome (dependent variable) must be measured at the lowest level, while explanatory variables may be measured at any of the lower and higher levels of interest.

For example, if we treat the example above as two-level data, with households as level 2 and individuals as level 1 (ignoring the village and township levels), then the outcome, two-week illness, is necessarily measured at the lowest level (individual, level 1), while explanatory variables include both the level-2 household drinking-water sanitation and level-1 individual characteristics such as age and sex.

Variables at higher levels are often called contextual or group variables. Note that in practice, some variables collected at a lower level can also be converted into higher-level contextual variables: for example, averaging the level-1 variable age within each household, or expressing sex as the proportion of males (or females) in the household, turns household mean age and household sex composition into contextual variables.
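As a sketch, the aggregation just described looks like this (hypothetical household data, pandas assumed):

```python
# Deriving level-2 contextual variables from level-1 variables.
import pandas as pd

df = pd.DataFrame({
    "household": ["H1", "H1", "H1", "H2", "H2"],
    "age":       [34,    60,   8,    45,   41],   # individual-level (level 1)
    "sex":       ["M",  "F",  "F",  "M",  "M"],   # individual-level (level 1)
})

# Household mean age and male proportion become contextual (level-2) variables.
context = df.groupby("household").agg(
    mean_age=("age", "mean"),
    prop_male=("sex", lambda s: (s == "M").mean()),
)
print(context.loc["H1", "mean_age"])   # 34.0
print(context.loc["H2", "prop_male"])  # 1.0
```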

3. Main features of multilevel data

The defining feature of multilevel data is that the response is not independently distributed across individuals: there is similarity within areas or particular clusters, known as clustering. In the example above, drinking-water sanitation varies across the 832 households. If a household's drinking-water sanitation is poor, all of its members may be more prone to illness; if it is good, all of its members may be less prone to illness. Whether individuals in the same household fall ill is therefore correlated; in other words, the illness outcomes of the 2,369 individuals are not fully independent.

4. Consequences of ignoring multilevel structure

Every statistical method rests on certain assumptions. In the example above, the outcome is two-week illness (binary); the simplest way to analyze its determinants would be to enter the household and individual variables together as covariates in a logistic regression. But the estimation of traditional regression models (e.g. multiple linear regression and logistic regression) assumes that individual observations are mutually independent, and as discussed above, this example violates that assumption. Blindly applying ordinary logistic regression would bias the standard-error estimates and the resulting statistical inference, potentially leading to wrong conclusions. We therefore need to account for the within-group correlation by using a multilevel model for binary data.

5. What are the advantages of multilevel models?

A. They do not rest on the assumption of independent individuals, and correct the bias in standard-error estimates caused by non-independent observations;

B. They can analyze the effects of lower-level and higher-level explanatory variables on the outcome simultaneously;

C. They can also model random slopes, cross-level interactions, and more.

6. What software can analyze multilevel data?

MLwiN is dedicated software for multilevel data analysis: fully point-and-click, no programming required, easy to operate, and able to handle all types of multilevel data.

In SAS, PROC MIXED fits linear models with multilevel structure, while PROC GLIMMIX and PROC NLMIXED fit multilevel nonlinear models (such as multilevel logistic regression and multilevel Poisson models).

SPSS, R, Stata, Mplus, and other packages also support multilevel data analysis to varying degrees.

7. Steps of a multilevel analysis

Step 1: fit the null model (also called the empty or intercept-only model), i.e. a model with no explanatory variables. This is the foundation of a multilevel analysis, used to judge whether the multilevel structure of the data needs to be modeled. Only if the null model shows significant within-group correlation, so that the multilevel structure cannot be ignored, is it worth continuing with a multilevel analysis; otherwise conventional multivariable methods suffice.
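A minimal sketch of this first step on simulated two-level data with statsmodels (MixedLM fits continuous outcomes; a binary outcome such as two-week illness would need a multilevel logistic model instead):

```python
# Fit a null (intercept-only) two-level model on simulated data:
# 30 groups of 20 members each, with known variance components.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per = 30, 20
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0.0, 1.0, n_groups)          # group random intercepts (var 1)
e = rng.normal(0.0, 2.0, n_groups * n_per)  # individual residuals (var 4)
y = 5.0 + u[group] + e

df = pd.DataFrame({"y": y, "group": group})

# Null model: fixed intercept only, plus a random intercept per group.
null_fit = smf.mixedlm("y ~ 1", df, groups=df["group"]).fit()

var_between = null_fit.cov_re.iloc[0, 0]  # between-group variance
var_within = null_fit.scale               # within-group residual variance
icc = var_between / (var_between + var_within)
print(f"ICC = {icc:.3f}")  # true value in this simulation is 1/(1+4) = 0.2
```

A clearly non-zero ICC here is the signal that the multilevel structure should not be ignored.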

Step 2: select explanatory variables. Logically, one would first add the higher-level variables to the null model and then the lower-level ones, but in practice variables from all levels are often entered together. The model should also be refined iteratively through model comparison.

Step 3: a simple multilevel analysis ends at step 2; further work can consider random slopes, cross-level interactions, and so on. (This part is relatively complex; interested readers can consult 《多层统计分析模型》, edited by Wang Jichuan et al.)

Closing remarks

Multilevel statistical models are a broad class of models for handling multilevel data, and multilevel data are a broad class of data with multilevel structure. Whether a multilevel structure exists, and whether it can be ignored, should be judged flexibly from subject-matter knowledge, the research aims, the null model, and so on. If no multilevel structure exists, or it can be ignored, traditional multivariable methods can be used; if it cannot be ignored, an appropriate multilevel model should be chosen according to the outcome type and research aims.
From: What statistical method are multilevel regression models?

Part 2: Multilevel Models (1)

Steps for building a multilevel statistical model:

STEP 1: obtain sample data (multilevel data) through stratified sampling;

STEP 2: compute the intraclass correlation coefficient (ICC) to check whether the sample data exhibit a group effect (i.e. whether a multilevel structure exists), and hence whether a multilevel model is needed; if there is no group effect, the model can be simplified to a fixed-effects model;
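For a two-level random-intercept model, the ICC in STEP 2 is the share of total variance attributable to the group level:

```latex
\mathrm{ICC} = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}}
```

where \(\sigma^2_{u0}\) is the between-group (level-2) variance and \(\sigma^2_{e}\) the within-group (level-1) residual variance; an ICC near zero suggests the grouping can be ignored and a single-level model suffices.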

STEP 3: fit the null model (also called the empty or intercept-only model), i.e. a model with no explanatory variables. This is the foundation of a multilevel analysis, used to judge whether the multilevel structure needs to be modeled; only if the null model shows significant within-group correlation, so that the multilevel structure cannot be ignored, should the multilevel analysis continue; otherwise conventional multivariable methods suffice;

STEP 4: select explanatory variables. Logically, higher-level variables are added to the null model first and lower-level ones second, but in practice variables from all levels are often entered together; the model should also be refined through model comparison;

STEP 5: a simple multilevel analysis ends at STEP 4; further work can consider random slopes, cross-level interactions, and so on. (This part is relatively complex; interested readers can consult 《多层统计分析模型》, edited by Wang Jichuan et al.)
From: Multilevel Models (1)

Unsupervised learning

We may need word2vec to build word vectors for the partitioning.
From: Deep Learning Applications in NLP v0.76
Resource: word2vec online course

Paper list

《Multi-level cluster-based satellite-terrestrial integrated communication in Internet of vehicles》2020
《Multi-level Clustering for Extracting Process-Related Information from Email Logs》
《A Two-Level Topic Model Towards Knowledge Discovery from Citation Networks》
《Knowledge Discovery from Citation Networks》

------ The following may not necessarily be useful ------
《INTEGRATIVE SPARSE K-MEANS WITH OVERLAPPING GROUP LASSO IN GENOMIC APPLICATIONS FOR DISEASE SUBTYPE DISCOVERY》
《Scalable parallel computing on clouds using Twister4Azure iterative MapReduce》
《The differential impact of a classroom-based, alcohol harm reduction intervention, on adolescents with different alcohol use experiences: A multi-level growth modelling analysis》2014 — worth studying how the multilevel growth model here is constructed
《EVALUATION OF MULTI-LEVEL CONTEXT-DEPENDENT ACOUSTIC MODEL FOR LARGE VOCABULARY SPEAKER ADAPTATION TASKS》2012 — involves speech-specific knowledge; not relevant to our topic


《Enhanced bag of words using multilevel k-means for human activity recognition》 2016

From: 《Enhanced bag of words using multilevel k-means for human activity recognition》
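The paper's own pipeline is not reproduced here; this is only a generic sketch of the multilevel k-means idea with scikit-learn (hypothetical feature vectors): run k-means coarsely, then run a second k-means inside each coarse cluster.

```python
# Two-level k-means: coarse clusters first, then refinement within each.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))  # hypothetical feature vectors

# Level 1: coarse clusters.
coarse = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Level 2: refine each coarse cluster separately.
labels = [None] * len(X)
for c in range(3):
    idx = np.where(coarse.labels_ == c)[0]
    fine = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[idx])
    for i, lab in zip(idx, fine.labels_):
        labels[i] = (c, lab)  # the (coarse, fine) pair is the leaf label

print(len(set(labels)))  # up to 3 * 2 = 6 leaf clusters
```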

《A novel vSLAM framework with unsupervised semantic segmentation based on adversarial transfer learning》2020

Abstract
Significant progress has been made in the field of visual Simultaneous Localization and Mapping (vSLAM) systems. However, the localization accuracy of vSLAM can be significantly reduced in dynamic applications with mobile robots or passengers. In this paper, a novel semantic SLAM framework in dynamic environments is proposed to improve the localization accuracy. We incorporate a semantic segmentation model into the Oriented FAST and Rotated BRIEF-SLAM2 (ORB-SLAM2) system to filter out dynamic feature points, but we encounter one main challenge, i.e. the performance of a segmentation network well-trained with labeled datasets may decrease seriously in a real application without any labeled data due to the inconsistency between the source domain and the target domain. Therefore, we proposed an unsupervised semantic segmentation model with a Residual Neural Network (ResNet) structure, which is trained by the adversarial transfer learning method in the multi-level feature spaces. This work may be the first to perform multi-level feature space adversarial transfer learning for the semantic SLAM task in dynamic environments. In order to evaluate our method, images of indoor scenes from three datasets are used as the source domain, and the dynamic sequences of the TUM dataset are used as the target domain. The extensive experimental results show favorable performance against the state-of-the-art methods in terms of the absolute trajectory accuracy and image semantic segmentation quality. © 2020 Elsevier B.V. All rights reserved.

《Multi-level Clustering for Extracting Process-Related Information from Email Logs》2017

Abstract
Emails represent a valuable source of information that can be harvested for understanding undocumented business processes of institutions. Towards this aim, a few researchers investigated the problem of extracting process oriented information from email logs to make benefit of the many available process mining techniques. In this work, we go further in this direction, by proposing a new method for mining process models from email logs that leverages unsupervised machine learning techniques. Moreover, our method allows to label emails with activity names, that can be used for activity recognition in new incoming emails. A use case illustrates the usefulness of the proposed solution.

《Multi-level Topical Text Categorization with Wikipedia》2016

**FTTC!!**

Abstract
This paper introduces an automatic categorical-marking model for text categorization. Traditional classification algorithms are generally applying labeled training set and call for a lot of manual work to tag classifications beforehand. Also due to the ambiguity and fuzziness of texts, the results of traditional text categorization algorithms may not be clear enough and abundant in content. **This paper presents an unsupervised, training-set-free and hierarchical categorization model called Folk-Topical Text Categorization (FTTC)**. FTTC applies topic model to abstract documents to topical words and make use of Wikipedia’s crowd-sourcing and collective control to extend hierarchical classifications. The results are not restricted to predefined categories but contain categories abstracted to deeper semantic levels and greatly facilitate traditional text categorization applications. For a document, its topical words are obtained using a popular topic model called Latent Dirichlet Allocation (LDA). Afterwards, the topical words are used to build and trace through the category-trees of Wikipedia. Based on the filtered results, the final classifications comprehensively reflect the diversified and content-rich information of the text, and fully cover different aspects of the text. Experimental results on different kinds of datasets show that our model advances in classification accuracy, flexibility and intelligibility, as compared with traditional models.

《A Two-Level Topic Model Towards Knowledge Discovery from Citation Networks》2014

Abstract
Knowledge discovery from scientific articles has received increasing attention recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: document itself and a citation of other documents. In the existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper, we propose a Bernoulli process topic (BPT) model which considers the corpus at two levels: document level and citation level. In the BPT model, each document has two different representations in the latent topic space associated with its roles. Moreover, the multi-level hierarchical structure of citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. An efficient computation algorithm is proposed to overcome the difficulty of matrix inverse operation. In addition to conducting the experimental evaluations on the document modeling and document clustering tasks, we also apply the BPT model to well known corpora to discover the latent topics, recommend important citations, detect the trends of various research areas in computer science between 1991 and 1998, and to investigate the interactions among the research areas. The comparisons against state-of-the-art methods demonstrate a very promising performance. The implementations and the data sets are available online [1].

《Timeline Generation: Tracking individuals on Twitter》2014

Abstract
In this paper, we preliminarily learn the problem of reconstructing users’ life history based on the their Twitter stream and proposed an unsupervised framework that create a chronological list for personal important events (PIE) of individuals. By analyzing individual tweet collections, we find that what are suitable for inclusion in the personal timeline should be tweets talking about personal (as opposed to public) and time-specific (as opposed to time-general) topics. To further extract these types of topics, we introduce a non-parametric multi-level Dirichlet Process model to recognize four types of tweets: personal time-specific (PersonTS), personal time-general (PersonTG), public time-specific (PublicTS) and public time-general (PublicTG) topics, which, in turn, are used for further personal event extraction and timeline generation. To the best of our knowledge, this is the first work focused on the generation of timeline for individuals from Twitter data. For evaluation, we have built gold standard timelines that contain PIE related events from 20 ordinary twitter users and 20 celebrities. Experimental results demonstrate that it is feasible to automatically extract chronological timelines for Twitter users from their tweet collection(1).

《Semi-supervised Abstraction-Augmented String Kernel for Multi-level Bio-Relation Extraction》2010

Abstract
Bio-relation extraction (bRE), an important goal in bio-text mining, involves subtasks identifying relationships between bio-entities in text at multiple levels, e.g., at the article, sentence or relation level. A key limitation of current bRE systems is that they are restricted by the availability of annotated corpora. In this work we introduce a semi-supervised approach that can tackle multi-level bRE via string comparisons with mismatches in the string kernel framework. Our string kernel implements an abstraction step, which groups similar words to generate more abstract entities, which can be learnt with unlabeled data. Specifically, two unsupervised models are proposed to capture contextual (local or global) semantic similarities between words from a large unannotated corpus. This Abstraction-augmented String Kernel (ASK) allows for better generalization of patterns learned from annotated data and provides a unified framework for solving bRE with multiple degrees of detail. ASK shows effective improvements over classic string kernels on four datasets and achieves state-of-the-art bRE performance without the need for complex linguistic features.

《Knowledge Discovery from Citation Networks》2010

Abstract
Knowledge discovery from scientific articles has received increasing attentions recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: document itself and a citation of other documents. In the existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper we propose a Bernoulli Process Topic (BPT) model which models the corpus at two levels: document level and citation level. In the BPT model, each document has two different representations in the latent topic space associated with its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. In addition to conducting the experimental evaluations on the document modeling task, we also apply the BPT model to a well known scientific corpus to discover the latent topics. The comparisons against state-of-the-art methods demonstrate a very promising performance.

  • Review the basic concepts of multilevel models
  • May need to study network fundamentals; look into multilevel networks
  • Try classifying the data by person (start with the first 50,000)