Study Notes: SDTM (3). Assumptions for Domain Models

Mandy_Wang98

Last modified: 2022-03-22 11:26:47

Category: SDTM. Tags: study notes

First published: 2022-02-16 14:52:09

Original source: https://www.cdisc.org/standards/foundational/sdtmig/sdtmig-v3-4

Contents

1. General Domain Assumptions
2. General Variable Assumptions
3. Coding and Controlled Terminology Assumptions
4. Actual and Relative Time Assumptions
- 4.1 Formats for Date/Time Variables:
- - ISO 8601
5. Other Assumptions

1. General Domain Assumptions

1.1 Order of the Variables

The order of variables in the Define-XML document must reflect the order of variables in the dataset. The order of variables in CDISC domain models has been chosen to facilitate the review of the models and application of the models.

Variables for the 3 general observation classes must be ordered with Identifiers first, followed by the Topic, Qualifier, and Timing variables.
Within each role, variables must follow the order in which they are listed in the SDTM.
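
The role ordering above can be checked mechanically. The following is a minimal sketch, not part of the SDTMIG: the function names and the (name, role) input format are assumptions made for illustration, and it only checks the order of roles, not the order of variables within a role.

```python
# Rank of each role in the required order: Identifiers, Topic, Qualifiers, Timing.
ROLE_ORDER = {"Identifier": 0, "Topic": 1, "Qualifier": 2, "Timing": 3}


def role_rank(role):
    """Map a role name to its rank; any '... Qualifier' sub-role counts as Qualifier."""
    key = "Qualifier" if role.endswith("Qualifier") else role
    return ROLE_ORDER[key]


def out_of_order(variables):
    """Return names of variables that appear after a variable of a later role.

    `variables` is a list of (name, role) tuples in dataset order
    (a hypothetical input format chosen for this sketch).
    """
    misplaced = []
    highest = 0
    for name, role in variables:
        rank = role_rank(role)
        if rank < highest:
            misplaced.append(name)
        else:
            highest = rank
    return misplaced


ae_variables = [
    ("STUDYID", "Identifier"),
    ("USUBJID", "Identifier"),
    ("AETERM", "Topic"),
    ("AESTDTC", "Timing"),
    ("AESEV", "Record Qualifier"),  # a Qualifier placed after a Timing variable
]
print(out_of_order(ae_variables))
```

Here AESEV is reported because a Qualifier follows a Timing variable.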

1.2 SDTM Core Designations

Three categories are specified in the Core column in the domain models:

A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables must always be included in the dataset and cannot be null for any record. (Must be present and non-null.)
An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Expected variables may contain some null values, but in most cases will not contain null values for every record. When the study does not include the data item for an expected variable, however, a null column must still be included in the dataset, and a comment must be included in the Define-XML document to state that the study does not include the data item. (Must be present; may be null.)
A Permissible variable should be used in an SDTM dataset wherever appropriate. Although domain specification tables list only some of the identifier, timing, and general observation class variables listed in the SDTM, all are permissible unless specifically restricted. (May be included unless specifically restricted.)
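
The three Core designations translate into simple dataset checks. The sketch below is one possible way to express them, assuming a dataset held as a list of column names plus a list of row dictionaries. The function name and the Core mapping in the example are illustrative; the real designations come from the domain specification tables.

```python
def check_core(columns, rows, core):
    """Return a list of problems found for Required and Expected variables.

    columns: variable names present in the dataset
    rows:    list of dicts, one per record
    core:    mapping of variable name -> "Req", "Exp", or "Perm"
    """
    problems = []
    for var, designation in core.items():
        if var not in columns:
            # Required and Expected variables must exist as columns;
            # Permissible variables may be absent.
            if designation in ("Req", "Exp"):
                problems.append(f"{var}: {designation} variable is missing")
            continue
        if designation == "Req":
            # Required variables cannot be null for any record.
            for i, row in enumerate(rows, start=1):
                if row.get(var) in (None, ""):
                    problems.append(f"{var}: null in record {i}")
    return problems


columns = ["STUDYID", "USUBJID", "AETERM", "AESEV"]
rows = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "AETERM": "HEADACHE", "AESEV": ""},
    {"STUDYID": "S1", "USUBJID": "S1-002", "AETERM": "", "AESEV": "MILD"},
]
# Illustrative Core mapping only; consult the domain specification for real values.
core = {"STUDYID": "Req", "USUBJID": "Req", "AETERM": "Req", "AESER": "Exp", "AESEV": "Perm"}
print(check_core(columns, rows, core))
```

An Expected column that exists but is entirely null is not flagged here, because the rule allows it provided the Define-XML document carries the explanatory comment.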

1.3 Additional Guidance on Dataset Naming

SDTM datasets are normally named to be consistent with the domain code.

e.g., the Demographics dataset (DM) is named dm.xpt.

Domain codes beginning with the letters X, Y, and Z are reserved for custom domains. Using one of these prefixes is optional; custom domains are not required to use them.

1.4 Splitting Domains

Sponsors may choose to split a domain of topically related information into physically separate datasets.

A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
The Findings About (FA) domain may alternatively be split based on the domain of the value in --OBJ.

The following rules must be adhered to when splitting a domain into separate datasets to ensure they can be appended back into 1 domain dataset:

The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not been split (e.g., QS, FA). (The DOMAIN value stays the same.)
All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the prefix value (e.g., QS, FA). (Use the DOMAIN value as the prefix.)
--SEQ must be unique within USUBJID for all records across all the split datasets. (--SEQ stays unique.)
When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID), care should be used to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related children records do not accidentally join back to incorrect parent records.
Permissible variables included in one split dataset need not be included in all split datasets.
For domains with 2-letter domain codes (i.e., other than SUPPxx and RELREC), split dataset names can be up to 4 characters in length. The 4-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.

If splitting by --CAT, dataset names would be the domain name plus up to 2 additional characters (e.g., QS36 for SF-36).
If splitting Findings About by parent domain, then the dataset name would be the domain code, “FA”, plus the 2-character domain code for parent domain code (e.g., “FACM”).

Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the additional 1 to 2 characters used to identify the split dataset (e.g., SUPPQS36, SUPPFACM). The value of RDOMAIN in the SUPP-- datasets would be the 2-character domain code (e.g., QS, FA).
In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN may contain the 4-character dataset name, rather than the domain name “FA”.
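
A few of the splitting rules above lend themselves to a short check. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and split datasets are represented as a dictionary of dataset name to row dictionaries.

```python
from collections import Counter


def split_dataset_name(domain, suffix):
    """Build a split dataset name: 2-character domain code plus 1 to 2 characters."""
    if len(domain) != 2:
        raise ValueError("expected a 2-character domain code")
    if not 1 <= len(suffix) <= 2:
        raise ValueError("suffix must be 1 or 2 characters")
    return (domain + suffix).upper()


def supp_dataset_name(split_name):
    """Name of the Supplemental Qualifier dataset for a split dataset."""
    return "SUPP" + split_name


def duplicate_seq_keys(split_datasets, seq_var):
    """Return (USUBJID, --SEQ) pairs that occur more than once across all splits."""
    counts = Counter(
        (row["USUBJID"], row[seq_var])
        for rows in split_datasets.values()
        for row in rows
    )
    return sorted(key for key, n in counts.items() if n > 1)


splits = {
    "QS36": [{"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 1}],
    "QSPH": [{"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 1}],  # reuses QSSEQ 1
}
print(split_dataset_name("QS", "36"), supp_dataset_name("FACM"))
print(duplicate_seq_keys(splits, "QSSEQ"))
```

The second dataset name (QSPH) is made up for the example. An empty result from `duplicate_seq_keys` suggests the splits can be appended back into one domain dataset without --SEQ collisions.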

1.5 Origin Metadata

1.5.1 Origin Metadata for Variables

The origin element in the Define-XML document is used to indicate where the data originated. Its purpose is to communicate the source of the data unambiguously to the reviewer. Data may be collected (on the CRF, from a vendor, or from a device), derived, or assigned.

CRF data should be traceable to an annotated CRF.
Derived data should be traceable to some derivation algorithm.

1.5.2 Origin Metadata for Records

A derived origin means that all values for that variable were derived, and an origin of collected on the CRF likewise applies to all values of that variable. (The origin should be the same for every value in a column.)
In some cases, both collected and derived values may be reported in the same field. For example, some records in a Findings dataset such as Questionnaires (QS) contain values collected from the CRF; other records may contain derived values, such as a total score. When both derived and collected values are reported in a variable, the origin is to be described using value-level metadata in the Define-XML document. (If the origin is mixed, describe it in the Define-XML document.)

1.6 Assigning Natural Keys in the Metadata

A sponsor should include in the metadata the variables that contribute to the natural key for a domain. In a case where a dataset includes a mix of records with different natural keys, the natural key that provides the most granularity is the one that should be provided.

2. General Variable Assumptions

2.1 Variable-naming Conventions

SDTM variables are named according to a set of conventions, using fragment names.
Variables with names ending in “CD” are “short” versions of associated variables that do not include the “CD” suffix.

e.g., --TESTCD is the short version of --TEST.

Values of --TESTCD must be limited to 8 characters and cannot start with a number, nor can they contain characters other than letters, numbers, or underscores.

Because QNAM serves the same purpose as --TESTCD within supplemental qualifier datasets, values of QNAM are subject to the same restrictions as values of --TESTCD.

Values of other “CD” variables are not subject to the same restrictions as --TESTCD:

ETCD (the companion to ELEMENT) and TSPARMCD (the companion to TSPARM) are limited to 8 characters and do not have the character restrictions that apply to --TESTCD.
ARMCD/ACTARMCD is limited to 20 characters and does not have the character restrictions that apply to --TESTCD.
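
The --TESTCD and QNAM restrictions can be written as a single regular expression, and the other “CD” variables only need a length check. The sketch below is illustrative: the names are hypothetical, and accepting lowercase letters is an assumption, since the rule as quoted here does not mention case.

```python
import re

# Up to 8 characters; letters, digits, or underscores only; no leading digit.
TESTCD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")

# Length-only limits for the other "CD" variables named above.
MAX_LENGTH = {"ETCD": 8, "TSPARMCD": 8, "ARMCD": 20, "ACTARMCD": 20}


def is_valid_testcd(value):
    """True if value satisfies the --TESTCD (and QNAM) restrictions."""
    return TESTCD_PATTERN.fullmatch(value) is not None


def is_valid_code_length(variable, value):
    """True if value is non-empty and within the length limit for this variable."""
    return 0 < len(value) <= MAX_LENGTH[variable]


for candidate in ["SYSBP", "1TEST", "TOOLONGCD", "A-B"]:
    print(candidate, is_valid_testcd(candidate))
```

Of the four candidates, only SYSBP passes: the others start with a digit, exceed 8 characters, or contain a hyphen.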

Variable descriptive names (labels), up to 40 characters, should be provided as data variable labels for all variables.

2.2 Two-character Domain Identifier

In order to minimize the risk of difficulty when merging/joining domains for reporting purposes, the 2-character domain identifier is used as a prefix in most variable names.

Exceptions:

Required Identifiers (STUDYID, DOMAIN, USUBJID)
Commonly used grouping and merge keys (e.g., VISIT, VISITNUM, VISITDY)
All Demographics (DM) domain variables other than DMDTC and DMDY
All variables in RELREC and SUPPQUAL, and some variables in the Comments and Trial Design datasets

Required identifiers are not prefixed because they are usually used as keys when merging/joining observations. The --SEQ and the optional Identifiers --GRPID and --REFID are prefixed because they may be used as keys when relating observations across domains.
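
As a small illustration of the prefix convention, the sketch below expands a generic "--" variable name for a given domain and leaves unprefixed names such as USUBJID alone. The function name is hypothetical.

```python
def expand_variable(template, domain):
    """Replace a leading '--' with the 2-character domain code."""
    if len(domain) != 2:
        raise ValueError("expected a 2-character domain code")
    if template.startswith("--"):
        return domain.upper() + template[2:]
    # Names without the '--' placeholder (e.g., STUDYID, VISITNUM) are unprefixed.
    return template


print(expand_variable("--SEQ", "AE"))
print(expand_variable("--TESTCD", "QS"))
print(expand_variable("USUBJID", "AE"))
```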

2.3 Use of “Subject” and USUBJID

"Subject" is used to generically refer to both patients and healthy volunteers in order to be consistent with the recommendation in FDA guidance.
To identify a subject uniquely across all studies for all applications or submissions involving the product, a unique identifier (USUBJID) should be assigned and included in all datasets.

The unique subject identifier (USUBJID) is required in all datasets containing subject-level data.
USUBJID values must be unique for each trial participant (subject) across all trials in the submission.
The same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials.

2.4 Variable Lengths

The maximum SAS v5 transport file character variable length of 200 characters should not be used unless necessary.
Sponsors should consider the nature of the data and apply reasonable, appropriate lengths to variables.

For example:

The length of flags will always be 1.
--TESTCD and IDVAR will never be more than 8, so the length can always be set to 8.
The length for variables that use controlled terminology can be set to the length of the longest term.
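
One way to pick a reasonable length is to measure the longest value actually present, or the longest term in the codelist. The following is a minimal sketch with a hypothetical function name. It measures length in bytes under an assumed UTF-8 encoding and caps the result at the 200-character transport limit mentioned above.

```python
def suggested_length(values, cap=200, encoding="utf-8"):
    """Suggest a character variable length from the longest value, in bytes.

    Returns at least 1 and at most `cap`. A result equal to `cap` may mean
    some values are longer than the limit and need separate handling.
    """
    longest = max((len(v.encode(encoding)) for v in values), default=0)
    return max(1, min(longest, cap))


print(suggested_length(["Y", "N", ""]))                   # flag variable
print(suggested_length(["MILD", "MODERATE", "SEVERE"]))   # codelist terms
```

The severity terms in the example are illustrative; in practice the input would be the study's actual values or the codelist in use.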