This ADaMIG (v1.2) is taken from the following link on the CDISC website:
https://www.cdisc.org/standards/foundational/adam/adam-implementation-guide-v1-2-release-package
Chapter 4 (Part 2)
4.5 Identification of Records Used for Analysis
4.5 识别用于分析的记录
This section addresses how to identify the records of an ADaM dataset that are used for analysis. The four specific issues addressed include: (1) identification of the records used in an LOCF analysis; (2) identification of the record containing the baseline value; (3) identification of post-baseline conceptual timepoint records, such as endpoint, minimum, maximum, or average; and (4) identification of specific records used in an analysis.
本节介绍如何识别用于分析的ADaM数据集的记录。解决的四个具体问题包括:(1)确定LOCF分析中使用的记录;(2)识别包含基线值的记录;(3)识别基线后的概念性时间点记录,例如端点,最小值,最大值或平均值;(4)确定分析中使用的特定记录。
4.5.1 Identification of Records Used in a Timepoint Imputation Analysis
4.5.1识别时间点归因分析中使用的记录
This section considers the issue of how to identify records used in a timepoint-related imputation analysis as well as how to represent data imputed for missing timepoints in an ADaM dataset. LOCF is one of the most commonly used timepoint-related imputation analyses, and is therefore specifically mentioned. However, the methodology is general and is not restricted to LOCF analysis. WOCF analysis is also mentioned to emphasize the generalizability.
本节考虑如何识别与时间点相关的插补分析中使用的记录,以及如何表示ADaM数据集中缺少时间点的插补数据的问题。LOCF是最常用的与时间点相关的归因分析之一,因此专门提及。但是,该方法是通用的,并不限于LOCF分析。还提到了WOCF分析以强调可概括性。
ADaM Methodology
ADaM方法论
When an analysis timepoint is missing, the ADaM methodology is to create a new record in the ADaM dataset to represent the missing timepoint and identify these imputed records by populating the derivation type variable DTYPE.
当缺少分析时间点时,ADaM方法将在ADaM数据集中创建一个新记录来表示缺少的时间点,并通过填充派生类型变量DTYPE来识别这些估算记录。
For example, when an LOCF/WOCF analysis is being performed, create LOCF/WOCF records when the LOCF/WOCF analysis timepoints are missing, and identify these imputed records by populating the derivation type variable DTYPE with values LOCF or WOCF. All of the original records would have null values in DTYPE. It would be very simple to select the appropriate records for analysis by selecting DTYPE=null for Data as Observed (DAO) analysis, DTYPE=null or LOCF for LOCF analysis, and DTYPE=null or WOCF for WOCF analysis. This approach would require understanding and communicating that if the DTYPE flag were not referenced correctly, the analysis would default to using all records, including the DAO records, plus the records derived by LOCF and WOCF. To perform a correct DAO analysis, one would need to explicitly select DTYPE=null.
例如,当执行LOCF / WOCF分析时,在缺少LOCF / WOCF分析时间点时创建LOCF / WOCF记录,并通过使用值LOCF或WOCF填充派生类型变量DTYPE来识别这些估算记录。所有原始记录在DTYPE中都将具有空值。选择适当的记录进行分析非常简单,方法是选择DTYPE = null进行“按观测的数据”(DAO)分析,选择DTYPE = null或LOCF进行LOCF分析,而选择DTYPE = null或WOCF进行WOCF分析。这种方法需要理解和传达,如果未正确引用DTYPE标志,则分析将默认使用所有记录,包括DAO记录以及LOCF和WOCF派生的记录。为了执行正确的DAO分析,需要明确选择DTYPE = null。
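The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the ADaMIG: the helper name `select_records`, the analysis labels used as dictionary keys, and the use of `None` to stand for a null DTYPE are all assumptions made for this example.

```python
# Hypothetical helper (not defined by the ADaMIG): pick the records that
# belong to an analysis based on DTYPE. None represents a null DTYPE,
# i.e. an observed (non-imputed) record.
ALLOWED_DTYPES = {
    "DAO": {None},            # Data as Observed: original records only
    "LOCF": {None, "LOCF"},   # observed records plus LOCF-imputed records
    "WOCF": {None, "WOCF"},   # observed records plus WOCF-imputed records
}

def select_records(records, analysis):
    allowed = ALLOWED_DTYPES[analysis]
    return [r for r in records if r.get("DTYPE") in allowed]

# Made-up records for a single subject and parameter.
records = [
    {"AVISIT": "Week 3", "AVAL": 135, "DTYPE": None},
    {"AVISIT": "Week 4", "AVAL": 135, "DTYPE": "LOCF"},
    {"AVISIT": "Week 4", "AVAL": 138, "DTYPE": "WOCF"},
]

dao = select_records(records, "DAO")
locf = select_records(records, "LOCF")
wocf = select_records(records, "WOCF")
```

Note that skipping the filter entirely would mix the observed, LOCF, and WOCF records, which is the pitfall the paragraph above warns about.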
Example 1
例子1
Identification of rows used in a LOCF analysis.
标识LOCF分析中使用的行。
In Table 4.5.1.1, some subjects have complete data and others have rows imputed by one method (LOCF). Subjects with no missing data have the same number of rows as in the input dataset, with all DTYPE values blank. Subject 1001 has complete data. DTYPE is blank for all rows indicating they are not imputed. AVISIT matches VISIT (from SDTM) in this example. AVISIT does not always match VISIT from SDTM even in scenarios where there is no missing data. Subject 1002 is missing the Week 2 assessment. Week 2 is imputed using the LOCF method.
在表4.5.1.1中,一些主题具有完整的数据,而另一些主题具有通过一种方法(LOCF)估算的行。没有缺失数据的主题的行数与输入数据集中的行数相同,所有DTYPE值均为空白。主题1001具有完整的数据。对于所有行,DTYPE均为空白,表示未估算。在此示例中,AVISIT与VISIT(来自SDTM)匹配。即使在没有丢失数据的情况下,AVISIT也不总是与SDTM的VISIT匹配。受试者1002缺少第2周评估。第2周使用LOCF方法估算。
AVISIT="Week 2" but VISIT="Week 1", so one can see where the imputed value came from in the original data.
AVISIT =第2周,但VISIT =第1周,因此可以看到估算值在原始数据中的来源。
Subject 1003 is missing Week 2 and 3 data. A Data as Observed (DAO) analysis can be performed by selecting only those rows where DTYPE is null. For a LOCF analysis, all rows (DTYPE=null or DTYPE="LOCF") should be used.
对象1003缺少第2周和第3周数据。可以通过仅选择DTYPE为空的那些行来执行观测数据(DAO)分析。对于LOCF分析,应使用所有行(DTYPE = null或DTYPE =“ LOCF”)。
Table 4.5.1.1 Example 1: ADaM Dataset with Identification of Rows Used in a LOCF Analysis
表4.5.1.1示例1:具有LOCF分析中使用的行标识的ADaM数据集
Row | USUBJID | VISIT | AVISIT | ADY | PARAM | AVAL | DTYPE | VSSEQ
1 | 1001 | Baseline | Baseline | -4 | SUPINE SYSBP (mm Hg) | 145 | | 171
2 | 1001 | Week 1 | Week 1 | 3 | SUPINE SYSBP (mm Hg) | 130 | | 191
3 | 1001 | Week 2 | Week 2 | 9 | SUPINE SYSBP (mm Hg) | 133 | | 201
4 | 1001 | Week 3 | Week 3 | 20 | SUPINE SYSBP (mm Hg) | 125 | | 211
5 | 1002 | Baseline | Baseline | -1 | SUPINE SYSBP (mm Hg) | 145 | | 50
6 | 1002 | Week 1 | Week 1 | 7 | SUPINE SYSBP (mm Hg) | 130 | | 60
7 | 1002 | Week 1 | Week 2 | 7 | SUPINE SYSBP (mm Hg) | 130 | LOCF | 60
8 | 1002 | Week 3 | Week 3 | 22 | SUPINE SYSBP (mm Hg) | 135 | | 70
9 | 1003 | Baseline | Baseline | 1 | SUPINE SYSBP (mm Hg) | 150 | | 203
10 | 1003 | Week 1 | Week 1 | 8 | SUPINE SYSBP (mm Hg) | 140 | | 213
11 | 1003 | Week 1 | Week 2 | 8 | SUPINE SYSBP (mm Hg) | 140 | LOCF | 213
12 | 1003 | Week 1 | Week 3 | 8 | SUPINE SYSBP (mm Hg) | 140 | LOCF | 213
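As a rough illustration of how the LOCF rows for Subject 1003 in Table 4.5.1.1 could be produced, the following Python sketch adds one imputed record per missing analysis visit. The function name `add_locf`, the fixed `VISITS` schedule, and the choice to carry forward any earlier observed record (including baseline) are assumptions for this example; an actual analysis plan may define LOCF differently.

```python
# Assumed analysis visit schedule for this example.
VISITS = ["Baseline", "Week 1", "Week 2", "Week 3"]

def add_locf(observed):
    """Return observed rows plus LOCF rows for missing analysis visits.

    observed: one subject's rows for one parameter, one row per AVISIT.
    """
    by_visit = {r["AVISIT"]: r for r in observed}
    out, last = [], None
    for visit in VISITS:
        if visit in by_visit:
            last = by_visit[visit]
            out.append(last)
        elif last is not None:
            # Copy the last observed row; VISIT, ADY, AVAL and VSSEQ keep
            # pointing at the source record, only AVISIT and DTYPE change.
            out.append({**last, "AVISIT": visit, "DTYPE": "LOCF"})
    return out

# Observed rows for Subject 1003 (rows 9 and 10 of Table 4.5.1.1).
subject_1003 = [
    {"USUBJID": "1003", "VISIT": "Baseline", "AVISIT": "Baseline",
     "ADY": 1, "AVAL": 150, "DTYPE": None, "VSSEQ": 203},
    {"USUBJID": "1003", "VISIT": "Week 1", "AVISIT": "Week 1",
     "ADY": 8, "AVAL": 140, "DTYPE": None, "VSSEQ": 213},
]

rows = add_locf(subject_1003)
```

Because the imputed rows retain VISIT, ADY, and VSSEQ from the source record, the origin of each carried-forward value remains traceable.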
Example 2
例子2
Identification of rows used in both LOCF and WOCF analyses.
标识在LOCF和WOCF分析中使用的行。
Table 4.5.1.2 shows a situation where there is more than one imputation method used. In this case, additional rows are generated for each type of imputation. A DAO analysis can be performed by selecting only those rows where DTYPE is null. For LOCF analysis, all rows with DTYPE=null or DTYPE="LOCF" should be used. For WOCF analysis, all rows with DTYPE=null or DTYPE="WOCF" should be used.
表4.5.1.2显示了使用不止一种插补方法的情况。在这种情况下,将为每种插补类型生成其他行。通过仅选择DTYPE为空的那些行可以执行DAO分析。对于LOCF分析,应使用DTYPE = null或DTYPE =“ LOCF”的所有行。对于WOCF分析,应使用DTYPE = null或DTYPE =“ WOCF”的所有行。
Table 4.5.1.2 Example 2: ADaM Dataset with Identification of Rows Used in Both LOCF and WOCF Analyses
表4.5.1.2示例2:具有在LOCF和WOCF分析中使用的行标识的ADaM数据集
Row | USUBJID | VISIT | AVISIT | ADY | PARAM | AVAL | DTYPE | VSSEQ
1 | 1002 | Baseline | Baseline | -4 | SUPINE SYSBP (mm Hg) | 145 | | 77
2 | 1002 | Week 1 | Week 1 | 3 | SUPINE SYSBP (mm Hg) | 130 | | 78
3 | 1002 | Week 2 | Week 2 | 9 | SUPINE SYSBP (mm Hg) | 138 | | 79
4 | 1002 | Week 3 | Week 3 | 18 | SUPINE SYSBP (mm Hg) | 135 | | 80
5 | 1002 | Week 3 | Week 4 | 18 | SUPINE SYSBP (mm Hg) | 135 | LOCF | 80
6 | 1002 | Week 2 | Week 4 | 9 | SUPINE SYSBP (mm Hg) | 138 | WOCF | 79
7 | 1002 | Week 5 | Week 5 | 33 | SUPINE SYSBP (mm Hg) | 130 | | 81
8 | 1003 | Baseline | Baseline | -1 | SUPINE SYSBP (mm Hg) | 145 | | 122
9 | 1003 | Week 1 | Week 1 | 7 | SUPINE SYSBP (mm Hg) | 140 | | 123
10 | 1003 | Week 2 | Week 2 | 15 | SUPINE SYSBP (mm Hg) | 138 | | 124
11 | 1003 | Week 2 | Week 3 | 15 | SUPINE SYSBP (mm Hg) | 138 | LOCF | 124
12 | 1003 | Week 2 | Week 4 | 15 | SUPINE SYSBP (mm Hg) | 138 | LOCF | 124
13 | 1003 | Week 2 | Week 5 | 15 | SUPINE SYSBP (mm Hg) | 138 | LOCF | 124
14 | 1003 | Week 1 | Week 3 | 7 | SUPINE SYSBP (mm Hg) | 140 | WOCF | 123
15 | 1003 | Week 1 | Week 4 | 7 | SUPINE SYSBP (mm Hg) | 140 | WOCF | 123
16 | 1003 | Week 1 | Week 5 | 7 | SUPINE SYSBP (mm Hg) | 140 | WOCF | 123
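The WOCF rows in Table 4.5.1.2 can be reproduced with a sketch along the following lines. The text does not define "worst"; this example assumes it means the highest post-baseline AVAL observed at an earlier analysis visit, which happens to match the values in the table. The function name `wocf_rows` and the `VISITS` schedule are likewise assumptions.

```python
# Assumed analysis visit schedule for this example.
VISITS = ["Baseline", "Week 1", "Week 2", "Week 3", "Week 4", "Week 5"]

def wocf_rows(observed):
    """Return WOCF-imputed rows for analysis visits with no observed record.

    Assumption: the 'worst' observation is the highest post-baseline AVAL
    among earlier observed visits (baseline itself is never carried forward).
    """
    by_visit = {r["AVISIT"]: r for r in observed}
    imputed, earlier = [], []
    for visit in VISITS:
        row = by_visit.get(visit)
        if row is not None:
            if visit != "Baseline":
                earlier.append(row)
        elif earlier:
            worst = max(earlier, key=lambda r: r["AVAL"])
            imputed.append({**worst, "AVISIT": visit, "DTYPE": "WOCF"})
    return imputed

# Observed rows for Subject 1002 (rows 1-4 and 7 of Table 4.5.1.2).
subject_1002 = [
    {"VISIT": "Baseline", "AVISIT": "Baseline", "ADY": -4, "AVAL": 145, "DTYPE": None, "VSSEQ": 77},
    {"VISIT": "Week 1", "AVISIT": "Week 1", "ADY": 3, "AVAL": 130, "DTYPE": None, "VSSEQ": 78},
    {"VISIT": "Week 2", "AVISIT": "Week 2", "ADY": 9, "AVAL": 138, "DTYPE": None, "VSSEQ": 79},
    {"VISIT": "Week 3", "AVISIT": "Week 3", "ADY": 18, "AVAL": 135, "DTYPE": None, "VSSEQ": 80},
    {"VISIT": "Week 5", "AVISIT": "Week 5", "ADY": 33, "AVAL": 130, "DTYPE": None, "VSSEQ": 81},
]

wocf = wocf_rows(subject_1002)
```

For Subject 1002 this yields a single Week 4 record copied from the Week 2 observation, corresponding to row 6 of the table.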
Approaches Considered and Not Adopted
考虑和不采用的方法
Another approach considered is to create a complete separate set of records for each analysis type (or a separate dataset), indicating the various analysis types by assigning unique values of the analysis timepoint description AVISIT, for example, "Week 4," "Week 4 (LOCF)," and "Week 4 (WOCF)". This approach would make it more foolproof to perform the DAO, LOCF, and WOCF analysis in one step by referencing only AVISIT. However, because so many records would be duplicated, a very large dataset is one of the major disadvantages for this approach. In addition, this approach might be less tool-friendly, in that one might need to parse AVISIT searching for a key substring such as "(LOCF)." This approach should not be used.
另一种考虑的方法是为每个分析类型(或一个单独的数据集)创建一个完整的单独的记录集,通过分配分析时间点描述AVISIT的唯一值来指示各种分析类型,例如,“第4周”、“第4周(LOCF)”和“第4周(WOCF)”。通过只引用AVISIT,这种方法可以更加简单地一步执行DAO、LOCF和WOCF分析。
但是,由于要复制的记录太多,所以非常大的数据集是这种方法的主要缺点之一。
此外,这种方法可能对工具不太友好,因为可能需要解析AVISIT搜索关键子字符串,如“(LOCF)”。
不应该使用这种方法。
A third approach considered is to create a flag (LOCFFL/LOCFFN) to indicate when a record is created by virtue of last observation carried forward, and similarly for WOCF. This is similar to the specified ADaM methodology, except that a separate flag is created for each derivation type, rather than indicating row derivation type in one column DTYPE. This approach might result in fewer records than the recommended approach (e.g., if the WOCF record is the same as the LOCF record). In other respects, this approach shares the advantages and disadvantages of the recommended approach. This approach of creating separate flags for each derivation type is not recommended.
考虑的第三种方法是创建一个标志(LOCFFL / LOCFFN),以指示何时根据结转的最后观察创建记录,对于WOCF也是类似的。这与指定的ADaM方法相似,除了为每种派生类型创建一个单独的标志,而不是在一个列DTYPE中指示行派生类型。与推荐的方法相比,此方法可能会导致记录减少(例如,如果WOCF记录与LOCF记录相同)。在其他方面,此方法具有推荐方法的优点和缺点。不建议这种为每种派生类型创建单独标志的方法。
4.5.2 Identification of Baseline Records
4.5.2 基准记录的识别
Many statistical analyses require the identification of a baseline value. This section describes how a record used as a baseline is identified.
许多统计分析都需要确定基线值。本节介绍如何识别用作基线的记录。
ADaM Methodology
ADaM方法论
The ADaM methodology is to create a baseline flag column to indicate the record used as baseline (the record whose value of AVAL is used to populate the BASE variable). This method does not require duplication of records in the event that the baseline record is not derived.
ADaM方法是创建一个基线标志列以指示用作基线的记录(该记录的AVAL值用于填充BASE变量)。如果不导出基线记录,则此方法不需要重复记录。
Although a baseline record flag variable ABLFL is created and used to identify the record that is the baseline record, this does not prohibit also providing a record with a unique value of AVISIT (e.g., "Baseline"), designating the baseline record used for analysis, even if redundant with another record. For more complicated baseline definitions (functions of multiple records), a derived baseline record would have to be created as described in 4.2.1.3, Rule 3: A function of one or more rows within the same parameter for the purpose of creating an analysis timepoint should be added as a new row for the same parameter. This methodology requires that clear metadata be provided for the baseline record variable so that the value can be reproduced accurately.
尽管创建了基线记录标志变量ABLFL并将其用于标识作为基线记录的记录,但这并不禁止还提供具有AVISIT唯一值的记录(例如“ Baseline”),指定用于分析的基线记录,即使与另一个记录重复。对于更复杂的基线定义(多个记录的功能),必须按照4.2.1.3第3条规则中的描述创建派生基线记录: 为了创建分析时间点,在同一参数内具有一个或多个行的功能应该添加为同一参数的新行。此方法要求为基线记录变量提供清晰的元数据,以便可以准确地重现该值。
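The relationship between ABLFL and BASE can be illustrated with a short Python sketch. It assumes the rows belong to one subject and one parameter, that exactly one row carries ABLFL="Y", and that BASE is written to every row; the function name `populate_base` is made up for this example.

```python
def populate_base(rows):
    """Copy AVAL from the ABLFL='Y' record into BASE on every row.

    rows: one subject's records for a single parameter.
    """
    flagged = [r for r in rows if r.get("ABLFL") == "Y"]
    if len(flagged) != 1:
        # The sketch insists on a single, unambiguous baseline record.
        raise ValueError("expected exactly one record with ABLFL='Y'")
    base = flagged[0]["AVAL"]
    return [{**r, "BASE": base} for r in rows]

# Made-up records for illustration.
rows = [
    {"AVISIT": "Baseline", "ABLFL": "Y", "AVAL": 145},
    {"AVISIT": "Week 1", "ABLFL": None, "AVAL": 130},
    {"AVISIT": "Week 2", "ABLFL": None, "AVAL": 133},
]

with_base = populate_base(rows)
```

Raising an error when zero or several rows are flagged is a design choice of the sketch; it makes an ambiguous baseline definition visible instead of silently picking one.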
Example 1
例子1
Identification of baseline rows – using screening visit to impute a baseline row.
识别基线行–使用筛选访问来估算基线行。
This example (Table 4.5.2.1) illustrates the use of a baseline flag variable ABLFL. It also illustrates the inclusion of an additional row for a baseline analysis timepoint (row 6). In this example, a unique value of AVISIT has been defined for the baseline record used for analysis. Subject 1001 had complete data. There was no record that qualified as a baseline value for Subject 1002 in the source data. A derived baseline record (AVISIT="Baseline") is added with DTYPE="LVPD" (Last Value Prior to Dosing) to indicate that the record is imputed to be used as baseline.
此示例(表4.5.2.1)说明了基准标志变量ABLFL的用法。它还说明了基线分析时间点(第6行)的附加行。在此示例中,已为用于分析的基线记录定义了AVISIT的唯一值。受试者1001具有完整的数据。在源数据中没有记录可以作为主题1002的基线值。派生的基线记录(AVISIT =“ Baseline”)与DTYPE =“ LVPD”(加料前的最后值)一起添加,以指示该记录被推算为用作基线。
Table 4.5.2.1 Example 1: ADaM Dataset with Identification of Baseline Rows when Imputation Is Used
表4.5.2.1示例1:使用插补时具有基线行标识的ADaM数据集
Row | USUBJID | VISIT | AVISIT | ADY | ABLFL | PARAM | AVAL | BASE | DTYPE | VSSEQ
1 | 1001 | Screening | Screening | -12 | | SUPINE SYSBP (mm Hg) | 144 | | | 1
2 | 1001 | Baseline | Baseline | 1 | Y | SUPINE SYSBP (mm Hg) | 145 | | | 2
3 | 1001 | Week 1 | Week 1 | 6 | | SUPINE SYSBP (mm Hg) | 130 | 145 | | 3
4 | 1001 | Week 2 | Week 2 | 12 | | SUPINE SYSBP (mm Hg) | 133 | 145 | | 4
5 | 1002 | Screening | Screening | -14 | | SUPINE SYSBP (mm Hg) | 144 | | | 1
6 | 1002 | Screening | Baseline | -14 | Y | SUPINE SYSBP (mm Hg) | 144 | | LVPD | 1
7 | 1002 | Week 1 | Week 1 | 8 | | SUPINE SYSBP (mm Hg) | 130 | 144 | | 2
8 | 1002 | Week 2 | Week 2 | 14 | | SUPINE SYSBP (mm Hg) | 133 | 144 | | 3
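The derived baseline row for Subject 1002 in Table 4.5.2.1 could be produced by a sketch like the one below. It assumes that rows are sorted by ADY, that "prior to dosing" can be approximated by ADY < 1, and that imputation is only needed when no row has AVISIT="Baseline"; the function name `add_lvpd_baseline` is hypothetical.

```python
def add_lvpd_baseline(rows):
    """Add an imputed baseline row (DTYPE='LVPD') if none was observed.

    rows: one subject's records for one parameter, sorted by ADY.
    """
    if any(r["AVISIT"] == "Baseline" for r in rows):
        return rows  # an observed baseline exists; nothing to impute
    pre_dose = [r for r in rows if r["ADY"] < 1]  # assumption: ADY < 1 is pre-dose
    if not pre_dose:
        return rows  # no candidate value to carry into baseline
    source = pre_dose[-1]  # last value prior to dosing
    derived = {**source, "AVISIT": "Baseline", "ABLFL": "Y", "DTYPE": "LVPD"}
    idx = rows.index(source)
    return rows[:idx + 1] + [derived] + rows[idx + 1:]

# Observed rows for Subject 1002 (rows 5, 7 and 8 of Table 4.5.2.1).
subject_1002 = [
    {"VISIT": "Screening", "AVISIT": "Screening", "ADY": -14, "ABLFL": None,
     "AVAL": 144, "DTYPE": None, "VSSEQ": 1},
    {"VISIT": "Week 1", "AVISIT": "Week 1", "ADY": 8, "ABLFL": None,
     "AVAL": 130, "DTYPE": None, "VSSEQ": 2},
    {"VISIT": "Week 2", "AVISIT": "Week 2", "ADY": 14, "ABLFL": None,
     "AVAL": 133, "DTYPE": None, "VSSEQ": 3},
]

result = add_lvpd_baseline(subject_1002)
```

The derived row keeps VISIT="Screening" and VSSEQ=1, so the source of the imputed baseline stays visible, as in row 6 of the table.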
Example 2
例子2
Identification of baseline rows – using an average of multiple visits to derive a baseline row.
识别基准行–使用多次访问的平均值得出基准行。
This example (Table 4.5.2.2) illustrates the use of a baseline flag variable ABLFL to identify the record used as baseline for analysis in a scenario where the baseline value is based on the average of the non-missing values collected prior to dosing. Row 3 is a derived "Baseline" record using the average of the values of row 1 and row 2. DTYPE="AVERAGE" to indicate that row 3 is derived. The Baseline flag (ABLFL="Y") indicates that AVAL from row 3 is used to populate the BASE (Baseline) column. VISIT (from SDTM) is left blank on row 3 since AVAL on that record is not merely a copy of AVAL on another record.
此示例(表4.5.2.2)说明了使用基线标志变量ABLFL来识别用作基线的记录的情况,其中基线值基于给药前收集的非缺失值的平均值。第3行是派生的“基线”记录,使用第1行和第2行的值的平均值。DTYPE =“ AVERAGE”表示派生了第3行。基准标志(ABLFL =“ Y”)表示第3行中的AVAL用于填充BASE(基准)列。(来自SDTM的)VISIT在第3行上保留为空白,因为该记录上的AVAL不仅是另一条记录上的AVAL的副本。
Table 4.5.2.2 Example 2: ADaM Dataset with Identification of Baseline Rows when Baseline Is an Average
表4.5.2.2示例2:基线为平均值时具有基线行标识的ADaM数据集
Row | USUBJID | VISIT | AVISIT | ADY | ABLFL | PARAM | AVAL | BASE | DTYPE
1 | 1001 | Screening | Screening | -12 | | SUPINE SYSBP (mm Hg) | 144 | 144.5 |
2 | 1001 | Baseline | Baseline | 1 | | SUPINE SYSBP (mm Hg) | 145 | 144.5 |
3 | 1001 | | Baseline | | Y | SUPINE SYSBP (mm Hg) | 144.5 | 144.5 | AVERAGE
4 | 1001 | Week 1 | Week 1 | 12 | | SUPINE SYSBP (mm Hg) | 130 | 144.5 |
5 | 1001 | Week 2 | Week 2 | -14 | | SUPINE SYSBP (mm Hg) | 133 | 144.5 |
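The averaged baseline in Table 4.5.2.2 can be sketched as follows. The example assumes that the pre-dose records are those with AVISIT "Screening" or "Baseline" (the text only says "prior to dosing"), and that VISIT and ADY are left null on the derived row; the function name `add_average_baseline` is an assumption as well.

```python
PRE_DOSE_VISITS = ("Screening", "Baseline")  # assumed definition of pre-dose

def add_average_baseline(rows):
    """Insert a derived baseline row (DTYPE='AVERAGE') and populate BASE.

    rows: one subject's observed records for one parameter, in visit order.
    """
    pre_idx = [i for i, r in enumerate(rows)
               if r["AVISIT"] in PRE_DOSE_VISITS and r["AVAL"] is not None]
    if not pre_idx:
        raise ValueError("no non-missing pre-dose values to average")
    avg = sum(rows[i]["AVAL"] for i in pre_idx) / len(pre_idx)
    derived = {
        "USUBJID": rows[0]["USUBJID"], "PARAM": rows[0]["PARAM"],
        "VISIT": None, "AVISIT": "Baseline", "ADY": None,  # not a copy of one record
        "ABLFL": "Y", "AVAL": avg, "DTYPE": "AVERAGE",
    }
    cut = pre_idx[-1] + 1  # place the derived row after the last pre-dose row
    out = rows[:cut] + [derived] + rows[cut:]
    return [{**r, "BASE": avg} for r in out]

def obs(visit, aval):
    """Build an observed record for Subject 1001 (helper for this example)."""
    return {"USUBJID": "1001", "PARAM": "SUPINE SYSBP (mm Hg)", "VISIT": visit,
            "AVISIT": visit, "ABLFL": None, "AVAL": aval, "DTYPE": None}

observed = [obs("Screening", 144), obs("Baseline", 145),
            obs("Week 1", 130), obs("Week 2", 133)]

result = add_average_baseline(observed)
```

With the Screening and Baseline values of 144 and 145, the derived row gets AVAL = 144.5, which is then written to BASE on every row.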
Example 3
例子3
Identification of baseline rows – using an average of multiple visits to derive a baseline row.
识别基准行–使用多次访问的平均值得出基准行。
This example (Table 4.5.2.3) is the same as Example 2 except that the analysis timepoint description "Screening/Baseline Combination" helps differentiate the derived average baseline record from an existing observed record whose timepoint description is "Baseline." This was helpful in analysis and reporting because it was desired to summarize all scheduled visits in addition to the average baseline visit. The analysis was straightforward using the distinct descriptions of AVISIT. The choice of AVISIT values is up to the producer.
此示例(表4.5.2.3)与示例2相同,不同之处在于分析时间点描述“筛选/