Study Notes: SDTM (3). Assumptions for Domain Models

1. General Domain Assumptions

1.1 Order of the Variables

The order of variables in the Define-XML document must reflect the order of variables in the dataset. The order of variables in CDISC domain models has been chosen to facilitate the review of the models and application of the models.

  • Variables for the 3 general observation classes must be ordered with Identifiers first, followed by Topic, Qualifier, and Timing variables.
  • Within each role, variables must follow the order specified in the SDTM.
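The role ordering above can be checked mechanically. The following is a minimal sketch, assuming each variable is tagged with one of four simplified role names (the SDTM itself distinguishes several kinds of Qualifier); `roles_in_order` and the sample AE variable list are hypothetical and only for illustration.

```python
# Role order required for the general observation classes (simplified role names).
ROLE_ORDER = ["Identifier", "Topic", "Qualifier", "Timing"]

def roles_in_order(variables):
    """Return True if (name, role) pairs appear in Identifier, Topic, Qualifier, Timing order."""
    ranks = [ROLE_ORDER.index(role) for _, role in variables]
    return ranks == sorted(ranks)

# Hypothetical excerpt of an AE dataset's variables with simplified roles.
ae_vars = [
    ("STUDYID", "Identifier"),
    ("DOMAIN", "Identifier"),
    ("USUBJID", "Identifier"),
    ("AESEQ", "Identifier"),
    ("AETERM", "Topic"),
    ("AESEV", "Qualifier"),
    ("AESTDTC", "Timing"),
]

print(roles_in_order(ae_vars))
```

This only checks the order of roles; the order of variables within a role would need the full SDTM variable tables.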

1.2 SDTM Core Designations

Three categories are specified in the Core column in the domain models:

  • A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables must always be included in the dataset and cannot be null for any record. (Must be present and non-null.)
  • An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Expected variables may contain some null values, but in most cases will not contain null values for every record. When the study does not include the data item for an expected variable, however, a null column must still be included in the dataset, and a comment must be included in the Define-XML document to state that the study does not include the data item. (Must be present; may be null.)
  • A Permissible variable should be used in an SDTM dataset wherever appropriate. Although domain specification tables list only some of the identifier, timing, and general observation class variables listed in the SDTM, all are permissible unless specifically restricted. (May be included unless specifically restricted.)
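The three Core designations translate into simple dataset checks. Below is a rough sketch, assuming a dataset is held as a list of row dictionaries and that nulls appear as `None` or an empty string; the function name `check_core` and the abbreviations "Req", "Exp", and "Perm" used as dictionary values are choices made for this example.

```python
def check_core(records, core):
    """Return a list of problems for Required/Expected variables.

    records: list of dicts, one per dataset row (None or "" means null).
    core: dict mapping variable name to "Req", "Exp", or "Perm".
    """
    problems = []
    columns = set().union(*(r.keys() for r in records)) if records else set()
    for var, designation in core.items():
        if designation in ("Req", "Exp") and var not in columns:
            # Required and Expected variables must both exist as columns.
            problems.append(f"{var}: {designation} variable missing from dataset")
        elif designation == "Req":
            # Only Required variables must be non-null on every record.
            if any(r.get(var) in (None, "") for r in records):
                problems.append(f"{var}: Required variable has null values")
    return problems

# Illustrative rows and designations, not a complete domain specification.
rows = [
    {"USUBJID": "S1-001", "AETERM": "HEADACHE", "AESEV": ""},
    {"USUBJID": "S1-002", "AETERM": "", "AESEV": ""},
]
core = {"USUBJID": "Req", "AETERM": "Req", "AESTDTC": "Exp", "AESEV": "Perm"}
for problem in check_core(rows, core):
    print(problem)
```

Permissible variables are never flagged here, since they may be absent or null.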

1.3 Additional Guidance on Dataset Naming

SDTM datasets are normally named to be consistent with the domain code.

e.g., the Demographics dataset (DM) is named dm.xpt.

Domain codes beginning with the letters X, Y, and Z have been reserved for the creation of custom domains. Using these codes is optional; custom domains are not required to use them.

1.4 Splitting Domains

Sponsors may choose to split a domain of topically related information into physically separate datasets.

  • A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
  • The Findings About (FA) domain may alternatively be split based on the domain of the value in --OBJ.

The following rules must be adhered to when splitting a domain into separate datasets to ensure they can be appended back into 1 domain dataset:

  1. The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not been split (e.g., QS, FA). (Same domain code in every split dataset.)
  2. All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the prefix value (e.g., QS, FA). (Use the value of DOMAIN as the prefix.)
  3. --SEQ must be unique within USUBJID for all records across all the split datasets. (Unique --SEQ.)
  4. When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID), care should be used to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related children records do not accidentally join back to incorrect parent records.
  5. Permissible variables included in one split dataset need not be included in all split datasets.
  6. For domains with 2-letter domain codes (i.e., other than SUPPxx and RELREC), split dataset names can be up to 4 characters in length. The 4-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.

     • If splitting by --CAT, dataset names would be the domain name plus up to 2 additional characters (e.g., QS36 for SF-36).
     • If splitting Findings About by parent domain, the dataset name would be the domain code, “FA”, plus the 2-character domain code of the parent domain (e.g., “FACM”).

  7. Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the additional 1 to 2 characters used to identify the split dataset (e.g., SUPPQS36, SUPPFACM). The value of RDOMAIN in the SUPP-- datasets would be the 2-character domain code (e.g., QS, FA).
  8. In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN may contain the 4-character dataset name, rather than the domain name “FA”.
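Several of the splitting rules above can be verified before the split datasets are appended back together. The sketch below covers only three of them (consistent DOMAIN, --SEQ unique within USUBJID, and dataset names of at most 4 characters); it assumes each split dataset is a list of row dictionaries, and `check_split` is a hypothetical helper name.

```python
from collections import Counter

def check_split(domain, datasets):
    """Check DOMAIN consistency, --SEQ uniqueness, and dataset-name length.

    domain: the 2-character domain code, e.g. "QS".
    datasets: dict mapping split dataset name to a list of row dicts.
    """
    problems = []
    seq_var = domain + "SEQ"
    seen = Counter()
    for name, rows in datasets.items():
        if not name.startswith(domain) or len(name) > 4:
            problems.append(f"{name}: name must be {domain} plus up to 2 characters")
        for row in rows:
            if row["DOMAIN"] != domain:
                problems.append(f"{name}: DOMAIN is {row['DOMAIN']}, expected {domain}")
            # Count each (subject, sequence) pair across all split datasets.
            seen[(row["USUBJID"], row[seq_var])] += 1
    for (usubjid, seq), count in seen.items():
        if count > 1:
            problems.append(f"{seq_var}={seq} repeated for {usubjid}")
    return problems

# Illustrative split of QS into two datasets.
split_qs = {
    "QS36": [{"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 1}],
    "QSEQ": [{"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 2}],
}
print(check_split("QS", split_qs))
```

The IDVAR uniqueness concern for relationship datasets is not covered; it would need the same kind of counting on the variable named in IDVAR.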

1.5 Origin Metadata

1.5.1 Origin Metadata for Variables

The origin element in the Define-XML document is used to indicate where the data originated. Its purpose is to unambiguously communicate to the reviewer the origin of the data source. Data can be collected (on the CRF, from a vendor, or from a device), derived, or assigned.

  • CRF data should be traceable to an annotated CRF
  • Derived data should be traceable to some derivation algorithm

1.5.2 Origin Metadata for Records

A derived origin means that all values for that variable were derived; likewise, an origin of collected on the CRF means that all values were collected on the CRF. (The origin should be consistent within a column.)
In some cases, both collected and derived values may be reported in the same field. For example, some records in a Findings dataset such as Questionnaires (QS) contain values collected from the CRF; other records may contain derived values, such as a total score. When both derived and collected values are reported in a variable, the origin is to be described using value-level metadata in the Define-XML document. (If the origin is not consistent within a column, describe it in Define-XML.)

1.6 Assigning Natural Keys in the Metadata

A sponsor should include in the metadata the variables that contribute to the natural key for a domain. In a case where a dataset includes a mix of records with different natural keys, the natural key that provides the most granularity is the one that should be provided.

2. General Variable Assumptions

2.1 Variable-naming Conventions

SDTM variables are named according to a set of conventions, using fragment names.
Variables with names ending in “CD” are “short” versions of associated variables that do not include the “CD” suffix.

e.g., --TESTCD is the short version of --TEST.

Values of --TESTCD must be limited to 8 characters and cannot start with a number, nor can they contain characters other than letters, numbers, or underscores.

Because QNAM serves the same purpose as --TESTCD within supplemental qualifier datasets, values of QNAM are subject to the same restrictions as values of --TESTCD.
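The --TESTCD and QNAM restrictions map directly onto a regular expression. This is a minimal sketch of such a check; the name `is_valid_testcd` is made up for the example, and the pattern accepts lowercase letters because the rule as stated here does not mention case.

```python
import re

# Up to 8 characters; letters, digits, and underscores only; must not start with a digit.
TESTCD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")

def is_valid_testcd(value):
    """Return True if value satisfies the --TESTCD / QNAM restrictions."""
    return TESTCD_PATTERN.fullmatch(value) is not None

for candidate in ["SYSBP", "1SYSBP", "DIABP_STD", "HR-1"]:
    print(candidate, is_valid_testcd(candidate))
```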

Values of other “CD” variables are not subject to the same restrictions as --TESTCD:

  • ETCD (the companion to ELEMENT) and TSPARMCD (the companion to TSPARM) are limited to 8 characters and do not have the character restrictions that apply to --TESTCD.
  • ARMCD/ACTARMCD is limited to 20 characters and does not have the character restrictions that apply to --TESTCD.

Variable descriptive names (labels), up to 40 characters, should be provided as data variable labels for all variables.

2.2 Two-character Domain Identifier

In order to minimize the risk of difficulty when merging/joining domains for reporting purposes, the 2-character domain identifier is used as a prefix in most variable names.

Exceptions:

  • Required Identifiers (STUDYID, DOMAIN, USUBJID)
  • Commonly used grouping and merge keys (e.g., VISIT, VISITNUM, VISITDY)
  • All Demographics (DM) domain variables other than DMDTC and DMDY
  • All variables in RELREC and SUPPQUAL, and some variables in the Comments and Trial Design datasets

Required identifiers are not prefixed because they are usually used as keys when merging/joining observations. The --SEQ and the optional Identifiers --GRPID and --REFID are prefixed because they may be used as keys when relating observations across domains.
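As a small illustration of the prefixing convention, the sketch below expands a generic "--" variable name into a domain-specific one and leaves unprefixed names alone. The function `expand_variable` is hypothetical and simply treats any name without a leading "--" as one of the exceptions.

```python
def expand_variable(domain, name):
    """Replace a leading "--" with the 2-character domain code; leave other names unchanged."""
    if len(domain) != 2:
        raise ValueError("domain code must be 2 characters")
    if name.startswith("--"):
        return domain + name[2:]
    return name

# Generic names become domain-specific; required identifiers and merge keys stay as-is.
for generic in ["--SEQ", "--TESTCD", "USUBJID", "VISITNUM"]:
    print(expand_variable("LB", generic))
```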

2.3 Use of “Subject” and USUBJID

"Subject" is used to generically refer to both patients and healthy volunteers in order to be consistent with the recommendation in FDA guidance.
To identify a subject uniquely across all studies for all applications or submissions involving the product, a unique identifier (USUBJID) should be assigned and included in all datasets.

  • The unique subject identifier (USUBJID) is required in all datasets containing subject-level data.
  • USUBJID values must be unique for each trial participant (subject) across all trials in the submission.
  • The same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials.

2.4 Variable Lengths

The maximum SAS v5 transport file character variable length of 200 characters should not be used unless necessary.
Sponsors should consider the nature of the data and apply reasonable, appropriate lengths to variables.

For example:

  • The length of flags will always be 1.
  • --TESTCD and IDVAR will never be more than 8, so the length can always be set to 8.
  • The length for variables that use controlled terminology can be set to the length of the longest term.
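One way to choose a reasonable length is to measure the data. The sketch below returns the length of the longest non-null value and rejects anything over the 200-character limit; it assumes single-byte characters (so character count equals stored length), and `suggested_length` is a name invented for this example.

```python
MAX_XPT_LENGTH = 200  # SAS v5 transport limit for character variables

def suggested_length(values, minimum=1):
    """Return the length of the longest non-null value, at least `minimum`.

    Assumes single-byte characters; multi-byte encodings would need byte lengths.
    """
    longest = max((len(v) for v in values if v), default=0)
    if longest > MAX_XPT_LENGTH:
        raise ValueError(f"value of length {longest} exceeds {MAX_XPT_LENGTH}")
    return max(longest, minimum)

# Illustrative values for a severity variable and a flag.
print(suggested_length(["MILD", "MODERATE", "SEVERE"]))
print(suggested_length(["Y", None, ""]))
```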

3. Coding and Controlled Terminology Assumptions

4. Actual and Relative Time Assumptions

4.1 Formats for Date/Time Variables

Date and time values are represented in ISO 8601 format. Spaces are not allowed in any ISO 8601 representation.

  • dates: YYYY-MM-DD.
  • times: hh:mm:ss(.n+)?(((+|-)hh:mm)|Z)?.
  • date & time: YYYY-MM-DDThh:mm:ss (e.g. 2001-12-26T00:00:01).
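The three patterns above can be turned into a shape check with a regular expression. This is only a sketch for the complete forms listed here (a full date, optionally followed by "T" and a time); it does not validate calendar ranges such as month 13, and `is_complete_dtc` is a hypothetical helper name.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"
# hh:mm:ss, optional fractional seconds, optional UTC offset or "Z".
TIME = r"\d{2}:\d{2}:\d{2}(\.\d+)?(([+-]\d{2}:\d{2})|Z)?"
DTC_PATTERN = re.compile(rf"{DATE}(T{TIME})?")

def is_complete_dtc(value):
    """Return True if value looks like YYYY-MM-DD or YYYY-MM-DDThh:mm:ss (shape only)."""
    return DTC_PATTERN.fullmatch(value) is not None

for candidate in ["2001-12-26", "2001-12-26T00:00:01", "2001-12-26 00:00:01"]:
    print(candidate, is_complete_dtc(candidate))
```

Because `fullmatch` is used, a value containing a space instead of "T" is rejected, in line with the no-spaces rule.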

5. Other Assumptions
