SDTMIG: Study Data Tabulation Model Implementation Guide: Human Clinical Trials, Version 3.4

Fundamentals of the SDTM

Observations and variables

  • The SDTM is built around observations and variables. Each observation can be described by a series of variables, and each variable can be classified according to the type of information it conveys about each distinct observation.
  • Variables can be classified into 5 major roles:
    • Identifier variables, which identify the study, the subject, the domain, and the sequence number of the record (e.g., STUDYID, USUBJID).
    • Topic variables, which specify the focus of the observation (e.g., the name of a lab test).
    • Timing variables, which describe the timing of the observation (e.g., start date and end date).
    • Qualifier variables, which include additional illustrative text or numeric values that describe the results or additional traits of the observation (e.g., units, descriptive adjectives).
    • Rule variables, which describe the conditions to start, end, branch, or loop in the Trial Design Model.
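  • Qualifier variables can be further categorized into 5 subclasses:
    • Grouping qualifiers, which group together a collection of observations within the same domain.
    • Result qualifiers, which describe the specific results associated with the topic variable in a Findings dataset; they answer the question raised by the topic variable.
    • Synonym qualifiers, which specify an alternative name for a particular variable in an observation.
    • Record qualifiers, which define additional attributes of the observation record as a whole.
    • Variable qualifiers, which further modify or describe a specific variable within an observation and are meaningful only in the context of the variable they qualify (for example, a units variable that qualifies a result qualifier).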
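
The roles above can be made concrete with a small lookup table. The sketch below tags a handful of Laboratory (LB) variables with their roles and groups them by role; the selection of variables and the helper name `group_by_role` are illustrative assumptions, not part of the guide.

```python
from collections import defaultdict

# Roles for a few Laboratory (LB) variables. The selection is illustrative;
# the authoritative role of each variable is listed in the domain model.
LB_VARIABLE_ROLES = {
    "STUDYID": "Identifier",
    "DOMAIN": "Identifier",
    "USUBJID": "Identifier",
    "LBSEQ": "Identifier",
    "LBTESTCD": "Topic",
    "LBTEST": "Synonym Qualifier",
    "LBCAT": "Grouping Qualifier",
    "LBORRES": "Result Qualifier",
    "LBORRESU": "Variable Qualifier",
    "LBDTC": "Timing",
}


def group_by_role(roles):
    """Return a dict mapping each role to the variables that carry it."""
    grouped = defaultdict(list)
    for variable, role in roles.items():
        grouped[role].append(variable)
    return dict(grouped)


if __name__ == "__main__":
    for role, variables in group_by_role(LB_VARIABLE_ROLES).items():
        print(f"{role}: {', '.join(variables)}")
```

Reading the table row by row: LBTESTCD is the topic of the record, LBORRES answers the question the topic raises, and LBORRESU only makes sense next to the LBORRES value it qualifies.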

Datasets and domains

  • Domain
    • A domain is defined as a collection of logically related observations with a common topic.
    • Each domain is represented by a single dataset.
    • Each domain dataset is distinguished by a unique 2-character code.

The general observation classes

  • Most subject-level observations collected during a study are represented in 1 of the 3 SDTM general observation classes:
    • Interventions class
      • investigational treatments
      • therapeutic treatments
      • other treatments
    • Events class
      • protocol milestones
    • Findings class
      • observations resulting from planned evaluations to address specific tests or questions

Datasets other than general observation class domains

  • Special-purpose domain datasets, which hold subject-level data that do not conform to 1 of the 3 general observation classes
  • Trial Design Model (TDM) datasets
  • Relationship datasets
  • Study reference datasets, which include Device Identifiers (DI) and Non-host Organism Identifiers (OI)

The SDTM standard domain models

  • General rules apply when determining which variables to include in a domain:
    • The identifier variables are required in all domains based on the general observation classes.
    • Any timing variable is permissible for use in any submission dataset based on a general observation class.
    • Any additional qualifier variable from the same general observation class may be added to a domain model, except where restricted by specific domain assumptions.
    • Sponsors may not add any variables other than those described in the preceding 3 bullets.
    • Standard variables must not be renamed or modified for novel usage.
    • A permissible variable should be used in an SDTM dataset wherever appropriate.

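Creating a new domain

  • A new (custom) domain must be based on 1 of the 3 SDTM general observation classes.
    • A custom domain may be created only when the data are different in nature and do not fit into an existing published domain.
      • Establish a domain of a common topic.
      • Do not define a domain on the basis of time.
      • Do not create a custom domain on the basis of how the collected data will be used.
      • Data collected on separate CRF modules or pages may still fit into a single existing domain.
      • Where necessary, represent relationships between data that are hierarchical in nature.
    • Check the SDTM Draft Domains area of the CDISC wiki for proposed domains developed since the last published version of the SDTMIG.
    • Look for an existing, relevant domain model to serve as a prototype.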

Submitting data in standard format

Standard metadata for dataset contents and attributes

Metadata attributes should be included in a Define-XML document; they are the standard descriptions of the most commonly used data domains. In addition, the CDISC domain models include 2 shaded columns that are not sent to the FDA but that assist sponsors in preparing their datasets.

Using the CDISC domain models in regulatory submissions: dataset metadata

  • The Define-XML document that accompanies a submission should also describe each dataset included in the submission and the natural key structure of each dataset.

Dataset-level metadata

Dataset-level metadata include the dataset name, description, class, structure, purpose, keys, and location.
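
As a rough illustration, the 7 attributes can be held in one small record per dataset. The sketch below fills it in for Demographics (DM); the class name `DatasetMetadata` and its field names are hypothetical, and in a real submission these attributes live in the Define-XML document rather than in Python.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DatasetMetadata:
    """One row of dataset-level metadata (field names are illustrative)."""
    dataset: str
    description: str
    observation_class: str
    structure: str
    purpose: str
    keys: tuple
    location: str


# Example values for the Demographics domain.
dm = DatasetMetadata(
    dataset="DM",
    description="Demographics",
    observation_class="Special-Purpose",
    structure="One record per subject",
    purpose="Tabulation",
    keys=("STUDYID", "USUBJID"),
    location="dm.xpt",
)

if __name__ == "__main__":
    for attribute, value in asdict(dm).items():
        print(f"{attribute}: {value}")
```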

Primary keys

The Keys column of the dataset-level metadata table shows which variables a sponsor might submit as the primary key of each SDTM dataset.

  • A natural key is a set of data (1 or more columns) that uniquely identifies an entity and distinguishes it from any other row in the table.
  • A surrogate key is a single-part, artificially established identifier for a record. It is assigned rather than collected, so it is a form of derived data.
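
The difference between the two kinds of key can be sketched in a few lines. The example below checks whether a candidate natural key is unique across records and then assigns a surrogate sequence number within each subject; the function names and the Vital Signs sample records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def is_unique_key(rows, key_vars):
    """Return True if the combination of key_vars values is unique across rows."""
    counts = Counter(tuple(row[var] for var in key_vars) for row in rows)
    return all(count == 1 for count in counts.values())


def assign_seq(rows, seq_var):
    """Return copies of rows with a surrogate key that restarts at 1 per USUBJID."""
    next_seq = defaultdict(int)
    numbered = []
    for row in rows:
        next_seq[row["USUBJID"]] += 1
        numbered.append({**row, seq_var: next_seq[row["USUBJID"]]})
    return numbered


# Hypothetical Vital Signs records.
vs_rows = [
    {"USUBJID": "S1-001", "VSTESTCD": "SYSBP", "VSDTC": "2024-01-10"},
    {"USUBJID": "S1-001", "VSTESTCD": "SYSBP", "VSDTC": "2024-01-17"},
    {"USUBJID": "S1-002", "VSTESTCD": "SYSBP", "VSDTC": "2024-01-10"},
]

if __name__ == "__main__":
    print(is_unique_key(vs_rows, ["USUBJID", "VSTESTCD"]))           # test code alone is not enough
    print(is_unique_key(vs_rows, ["USUBJID", "VSTESTCD", "VSDTC"]))  # adding the date makes it unique
    for row in assign_seq(vs_rows, "VSSEQ"):
        print(row)
```

The natural key is built from values that already exist in the data, while VSSEQ carries no meaning of its own and exists only to make each record addressable.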

CDISC submission value-level metadata


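Conformance

  • Follow the complete metadata structure for data domains.
  • Follow the SDTMIG domain models wherever applicable.
  • Use SDTM-specified standard domain names and prefixes wherever applicable.
  • Use SDTM-specified standard variable names.
  • Use SDTM-specified variable labels in all standard domains.
  • Use SDTM-specified data types for all variables.
  • Follow SDTM-specified controlled terminology and format guidelines for variables.
  • Include all Required and Expected variables in all standard domains, and ensure that Required variables are never null.
  • Ensure that each record in a dataset includes the appropriate identifier variables, timing variables, and a topic variable.
  • Conform to the rules given in the CDISC Notes column.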

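Assumptions for domain models

General domain assumptions

SDTM core designations

Three categories are specified in the Core column of the domain models:

  • Required variable: basic to the identification of a data record, or otherwise necessary; it must be present in the dataset and must not be null.
  • Expected variable: needed to make a record useful in the context of a specific domain; the column is expected to be present in that domain.
  • Permissible variable: should be used as appropriate when the corresponding information has been collected or derived.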
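
A minimal sketch of how the core designations translate into checks, assuming each variable is tagged `Req`, `Exp`, or `Perm`. The function name `check_core` and the small Adverse Events mapping are illustrative; the actual designations should be taken from the Core column of the relevant domain model.

```python
def check_core(rows, core):
    """Check records against core designations.

    core maps variable name -> "Req", "Exp", or "Perm".
    Returns a list of human-readable issues (empty if none are found).
    """
    issues = []
    # Every column that appears in at least one record.
    columns = set().union(*(row.keys() for row in rows))
    for variable, designation in core.items():
        if designation in ("Req", "Exp") and variable not in columns:
            issues.append(f"{variable}: {designation} variable missing from dataset")
        elif designation == "Req":
            for index, row in enumerate(rows):
                if row.get(variable) in (None, ""):
                    issues.append(f"{variable}: Req variable is null in record {index}")
    return issues


# Illustrative designations for a few Adverse Events (AE) variables.
AE_CORE = {"AETERM": "Req", "AEDECOD": "Req", "AESTDTC": "Exp", "AESEV": "Perm"}

if __name__ == "__main__":
    ae_rows = [{"AETERM": "HEADACHE", "AEDECOD": "", "AESEV": "MILD"}]
    for issue in check_core(ae_rows, AE_CORE):
        print(issue)
```

Permissible variables never raise an issue here: their absence is acceptable, which is exactly what separates them from the other two categories.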

Splitting domains

A sponsor may choose to split a domain of topically related information into physically separate datasets.
A domain can be split in 2 ways:

  1. A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
  2. The Findings About (FA) domain may alternatively be split based on the domain of the value in --OBJ.

The following rules must be adhered to when splitting a domain into separate datasets, so that the datasets can be appended back into 1 domain dataset:

  • The value of DOMAIN must be consistent across the split datasets, exactly as it would have been had the domain not been split.
  • All variables that require a domain prefix must use the value of DOMAIN as the prefix.
  • --SEQ must be unique within USUBJID across all the split datasets. For example, if there are 1000 records for a USUBJID across the separate datasets, all 1000 records need unique values for --SEQ.
  • When relationship datasets relate back to split parent domains, IDVAR is generally --SEQ. When IDVAR is a variable other than --SEQ, make sure the parent records across the split datasets have unique values for that variable, so that child records join back to the correct parent records.
  • Permissible variables included in one split dataset need not be included in all of them.
  • Split dataset names can be up to 4 characters long.
  • Supplemental Qualifier (SUPP--) datasets can also be split.
  • In RELREC, if a dataset-level relationship is defined for a Findings About domain that has been split, RDOMAIN may contain the 4-character dataset name.
  • Follow the guidance on splitting domains referenced in the SDTMIG.
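
Several of these rules are mechanical enough to check in code. The sketch below splits a Questionnaires (QS) domain on QSCAT and then verifies three of the rules: a single DOMAIN value, dataset names of at most 4 characters, and --SEQ unique within USUBJID across all split datasets. The function names, the sample records, and the category-to-dataset-name mapping are hypothetical.

```python
from collections import defaultdict


def split_by_cat(rows, cat_var, name_for_cat):
    """Split rows into datasets keyed by dataset name, based on the --CAT value."""
    datasets = defaultdict(list)
    for row in rows:
        category = row.get(cat_var)
        if not category:
            raise ValueError(f"{cat_var} must not be null when splitting on it")
        datasets[name_for_cat[category]].append(row)
    return dict(datasets)


def check_split(datasets, seq_var):
    """Return a list of rule violations found in the split datasets."""
    issues = []
    domains = {row["DOMAIN"] for rows in datasets.values() for row in rows}
    if len(domains) != 1:
        issues.append(f"DOMAIN is not consistent: {sorted(domains)}")
    for name in datasets:
        if len(name) > 4:
            issues.append(f"dataset name {name} is longer than 4 characters")
    seen = set()
    for name, rows in datasets.items():
        for row in rows:
            key = (row["USUBJID"], row[seq_var])
            if key in seen:
                issues.append(f"{seq_var}={key[1]} repeated for {key[0]} (in {name})")
            seen.add(key)
    return issues


# Hypothetical QS records and category-to-dataset-name mapping.
qs_rows = [
    {"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 1, "QSCAT": "SF-36"},
    {"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 2, "QSCAT": "EQ-5D"},
    {"DOMAIN": "QS", "USUBJID": "S1-002", "QSSEQ": 1, "QSCAT": "SF-36"},
]
QS_DATASET_NAMES = {"SF-36": "QS36", "EQ-5D": "QSEQ"}

if __name__ == "__main__":
    split = split_by_cat(qs_rows, "QSCAT", QS_DATASET_NAMES)
    print({name: len(rows) for name, rows in split.items()})
    print(check_split(split, "QSSEQ"))
```

Because --SEQ is assigned before the split and left untouched, concatenating the split datasets reproduces the original domain without renumbering.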

Assigning natural keys in the metadata

How to define natural keys

  • Musculoskeletal System Findings (MK)
  • 1
  • 7
