CDISC Controlled Terminology (CT) errors in SDTM and ADaM datasets are systematically underaddressed until the final submission package review — not because they are hard to find once you look, but because the standard workflow for SDTM mapping creates a structural opportunity for errors to accumulate undetected. Understanding the structural cause explains both why the errors persist and what operational change fixes them.
How CT Errors Enter the Dataset
CDISC Controlled Terminology specifies the permissible values for categorical variables in SDTM domains. For a variable like AESEV (adverse event severity), the required terms are "MILD", "MODERATE", and "SEVERE" — not "mild", "Moderate", or "moderate/severe". For route of administration (ROUTE), permitted values come from the NCI Thesaurus CDISC Route of Administration codelist, which contains specific terms like "ORAL" and "INTRAVENOUS" but does not contain abbreviations or alternate forms commonly used in clinical practice.
CT errors enter datasets in three ways:
Verbatim-to-codelist translation at mapping time. When the SDTM programmer maps a field value from the eCRF to the SDTM codelist, they look up the correct CT term once. The lookup is accurate at the time of mapping. The CDISC CT package is updated quarterly, and terms occasionally change between major releases — a term that was valid under CT 2023-03-31 may be retired or renamed in CT 2024-09-27. A study mapped against the 2023 CT package and submitted under the 2024 package will have errors on any terms that changed.
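The version-drift problem above reduces to a set comparison: which of the study's locked terms no longer appear in the current release? A minimal sketch, assuming the term sets have already been loaded from the two published CT packages (the codelist contents below are illustrative, not the real CT files):

```python
# Detect CT terms that were valid at mapping time but are absent from a
# later CT release. In practice both sets would be loaded from the
# published CT package files; these literals are illustrative only.

def terms_needing_review(locked_terms: set[str], current_terms: set[str]) -> set[str]:
    """Terms the study uses that no longer exist in the current CT release."""
    return locked_terms - current_terms

locked = {"ORAL", "INTRAVENOUS", "SUBCUTANEOUS"}   # terms the study mapped against
current = {"ORAL", "INTRAVENOUS", "INTRADERMAL"}   # hypothetical newer release
print(terms_needing_review(locked, current))       # {'SUBCUTANEOUS'}
```

Any term in the result set needs a mapping review before submission under the newer package.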
Free-text entry at the eCRF level. Many eCRFs accept free-text input for fields for which SDTM requires CT-coded values. Site coordinators enter "oral" or "PO" rather than "ORAL"; "moderate-severe" rather than "MODERATE" or "SEVERE". The mapping converts these to the correct SDTM term at the transformation layer, but the conversion is a static lookup table that does not automatically catch new free-text variants. Any verbatim value not in the lookup table produces a blank or incorrect SDTM term that passes validation unless an explicit check is written for missing or invalid CT values.
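The failure mode here is the silent fall-through: a verbatim value outside the lookup table produces a blank term that nothing downstream notices. One defensive pattern is to make the lookup fail loudly instead. A sketch, with an illustrative (and deliberately incomplete) ROUTE table:

```python
# Verbatim-to-CT lookup that raises on unmapped variants instead of
# silently emitting a blank SDTM term. Table entries are illustrative,
# not a complete ROUTE mapping.

ROUTE_LOOKUP = {
    "oral": "ORAL",
    "po": "ORAL",
    "by mouth": "ORAL",
    "iv": "INTRAVENOUS",
}

def map_route(verbatim: str) -> str:
    key = verbatim.strip().lower()
    if key not in ROUTE_LOOKUP:
        # Surface new free-text variants as explicit mapping work
        # rather than letting a blank term pass downstream QC.
        raise KeyError(f"Unmapped verbatim ROUTE value: {verbatim!r}")
    return ROUTE_LOOKUP[key]
```

Each raised error becomes a mapping decision to review, which is exactly the new-variant catch the static table lacks on its own.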
Incorrect term selection from ambiguous codelists. Some CDISC codelists have terms that are similar enough to create genuine ambiguity. ROUTE has both "SUBCUTANEOUS" and "INTRADERMAL" — clinically distinct, but site coordinators occasionally confuse them, and the SDTM mapper translates what was entered without flagging the clinical plausibility question. The CT conformance check passes because "INTRADERMAL" is a valid CT term; only a clinical review catches that the route was misrecorded.
Why These Errors Are Not Caught in Routine QC
The structural reason CT errors persist is that they require a check against an external reference — the current CDISC CT package — that most internal QC processes do not perform automatically.
Typical SDTM dataset QC checks verify that: variable values are in the expected format, required fields are populated, ranges are within protocol-defined limits, and cross-domain consistency holds for shared variables. These checks are coded against the dataset structure and data expectations, not against the CDISC CT package. A QC check that verifies AESEV has no missing values will pass even if AESEV contains "moderate/severe" — a value that is not a valid CT term.
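The missing check is membership in the codelist itself, not just non-missingness. A minimal sketch of that check for AESEV (the record values are illustrative; the three severity terms are the CDISC CT values named above):

```python
# A QC check coded against the codelist rather than against missingness.
# A non-missing but invalid value like "moderate/severe" is flagged.

AESEV_CODELIST = {"MILD", "MODERATE", "SEVERE"}

def invalid_ct_values(values, codelist):
    """Return (index, value) pairs whose value is not in the codelist."""
    return [(i, v) for i, v in enumerate(values) if v not in codelist]

records = ["MILD", "moderate/severe", "SEVERE"]
print(invalid_ct_values(records, AESEV_CODELIST))  # [(1, 'moderate/severe')]
```

The same function, parameterized by codelist, covers every CT-constrained variable in a domain.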
Pinnacle 21 Community checks do validate against CDISC CT, but Pinnacle 21 is typically run at two points: an early check during programming and a final check before packaging. Early checks catch many CT errors but are run against preliminary data. Final checks are run against finalized data, but at that point the submission timeline is tight and the standard response to a CT error is to try to correct it quickly — which sometimes introduces new errors in the correction process.
The fix is to run CT validation continuously — at every dataset refresh cycle — rather than only at designated QC milestones. This changes CT conformance from a final-stage quality gate (catch errors before submission) to a continuous monitoring activity (catch errors as they occur).
The CT Version Management Problem
A separate but related issue is CT version management across long-running studies. A Phase III study from first patient enrolled to NDA submission spans 4–6 years. The CDISC CT package has 16–24 quarterly releases in that window. The submission will reference one specific CT version, and every term in every dataset must be valid under that version.
The challenge: statistical programmers set up the SDTM mapping early in the study and typically lock the CT version at that point. The submission, years later, needs to reference a more current CT version. The programmer must do one of the following:
- Accept that the submission will reference an old CT package (acceptable if no terms changed in the interim)
- Re-verify all CT terms against the current CT package (time-consuming but thorough)
- Implement CT term change tracking through the life of the study (requires infrastructure but is the correct long-term approach)
CT term change tracking means storing the CT package version in use at the time of each mapping decision and maintaining a comparison between the study's locked CT terms and any terms that changed in subsequent CT releases. When a term changes, the programmer receives a notification and decides whether the change affects the study's data.
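A minimal sketch of that tracking structure: each mapping decision records the CT version in effect when it was made, and each new CT release is diffed against the study's locked terms. The version labels and term sets below are illustrative:

```python
# CT term change tracking: record the CT version per mapping decision,
# then flag decisions whose term is missing from a later release.
# Version labels, terms, and codelists are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class MappingDecision:
    variable: str
    ct_term: str
    ct_version: str  # CT package in effect when the term was chosen

def changed_terms(decisions, release_terms):
    """Decisions whose term is absent from a later release's term set."""
    return [d for d in decisions if d.ct_term not in release_terms]

decisions = [
    MappingDecision("ROUTE", "ORAL", "2023-03-31"),
    MappingDecision("ROUTE", "RETIRED-TERM", "2023-03-31"),
]
later_release = {"ORAL", "INTRAVENOUS"}
for d in changed_terms(decisions, later_release):
    print(f"Review {d.variable}: {d.ct_term!r} is not in the current release")
```

Run against each quarterly release, this turns version drift into a small, reviewable list instead of a submission-time surprise.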
This is not complicated to implement, but it requires making CT version management an explicit workflow activity rather than a one-time setup step. Most CDM teams treat it as the latter and discover version mismatches during submission prep as a result.
Practical CT Validation Rules
Beyond Pinnacle 21, several additional CT validation rules are worth incorporating into a CDM pipeline:
Extensible versus non-extensible codelists. CDISC designates codelists as either extensible (sponsor may add values not in the standard codelist, with documentation in define.xml) or non-extensible (only the standard codelist values are permitted). Non-extensible codelists require strict enforcement; a validation rule that permits any value not in the codelist is incorrect for non-extensible codelists.
Case sensitivity. CDISC CT values are case-sensitive. "ORAL" and "oral" are different values, and only "ORAL" is valid. A validation rule that performs case-insensitive comparison will pass incorrect values. Validation rules should be case-sensitive for all CT-constrained variables.
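The two rules above combine naturally into a single check: exact, case-sensitive membership first, then a sponsor-extension path only for extensible codelists. A sketch, assuming a small in-memory codelist registry (the term sets and extensibility flags are illustrative):

```python
# Case-sensitive CT check that also enforces extensibility: non-extensible
# codelists reject everything outside the standard terms, while extensible
# ones may accept documented sponsor additions. Registry contents are
# illustrative, not the real CDISC codelist definitions.

CODELISTS = {
    "AESEV": {"terms": {"MILD", "MODERATE", "SEVERE"}, "extensible": False},
    "ROUTE": {"terms": {"ORAL", "INTRAVENOUS"}, "extensible": True},
}

def check_value(variable: str, value: str, sponsor_terms=frozenset()):
    cl = CODELISTS[variable]
    if value in cl["terms"]:            # exact, case-sensitive match only
        return "valid"
    if cl["extensible"] and value in sponsor_terms:
        return "sponsor extension"      # must be documented in define.xml
    return "invalid"

print(check_value("AESEV", "Mild"))    # invalid: comparison is case-sensitive
print(check_value("ROUTE", "TOPICAL", sponsor_terms={"TOPICAL"}))  # sponsor extension
```

Note that a sponsor extension on a non-extensible codelist still returns "invalid" — the common validation-rule bug is writing one permissive branch that applies to both kinds of codelist.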
CT values in SUPPQUAL datasets. Non-standard variables in SUPPQUAL datasets may reference CT values in their QVAL fields. These values are checked less often than parent-domain variables, but they can generate conformance errors, and invalid values in the linking fields can break the join between SUPPQUAL records and the parent domain.
Cross-domain CT consistency. A variable that appears in multiple domains (e.g., ROUTE appears in EX, CM, and potentially in SUPPEX or SUPPCM) should have consistent CT values for the same physical data across all appearances. A cross-domain CT consistency check catches cases where the same medication administration route is coded differently in different domains due to separate mapping decisions.
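A sketch of the cross-domain consistency check: group route values by a shared key and flag keys that carry more than one distinct term across domains. Keying by (subject, medication) is a simplification for illustration; a real check would use the study's actual linking variables:

```python
# Cross-domain CT consistency: flag (subject, medication) pairs whose
# route is coded differently across domains. Records are illustrative.

from collections import defaultdict

def route_conflicts(records):
    """records: iterable of (domain, subject, medication, route) tuples.
    Returns keys that carry more than one distinct route value."""
    seen = defaultdict(set)
    for domain, subj, med, route in records:
        seen[(subj, med)].add(route)
    return {key: routes for key, routes in seen.items() if len(routes) > 1}

data = [
    ("EX", "001", "DRUG A", "ORAL"),
    ("CM", "001", "DRUG A", "PO"),   # same administration, coded differently
]
for key, routes in route_conflicts(data).items():
    print(key, sorted(routes))       # ('001', 'DRUG A') ['ORAL', 'PO']
```

The conflict here is exactly the separate-mapping-decisions case: one domain's mapping table normalized the verbatim value and the other's did not.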
What Automated CT Monitoring Looks Like
The operational implementation of continuous CT monitoring involves running Pinnacle 21 Community checks — or an equivalent CT validation engine — on every SDTM dataset export and routing the results into the CDM monitoring workflow alongside other quality indicators. CT conformance errors are categorized by severity and age (how long has this error been present in the dataset?) and reviewed in the weekly CDM status meeting alongside open query counts and protocol deviation tallies.
This requires that CT conformance checks run automatically — not manually initiated — and that their output is stored for trend analysis. The trend matters: a conformance error that first appeared two months ago and has not been resolved is a different risk than one that appeared in the current week's extract.
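The age calculation falls out of the stored history: record which error keys each run observed, then report the days since each currently open error first appeared. A sketch, with illustrative error keys and run dates:

```python
# Derive error age from stored conformance-check history: each run's
# findings are keyed and timestamped, and age is measured from first
# appearance. Error keys and dates are illustrative.

from datetime import date

def error_ages(history, today):
    """history: dict mapping run date -> set of error keys seen that run.
    Returns each currently open error with days since it first appeared."""
    current = history[max(history)]          # findings from the latest run
    first_seen = {}
    for run_date in sorted(history):
        for err in history[run_date]:
            first_seen.setdefault(err, run_date)
    return {err: (today - first_seen[err]).days for err in current}

history = {
    date(2024, 5, 1): {"AESEV:moderate/severe"},
    date(2024, 7, 1): {"AESEV:moderate/severe", "ROUTE:PO"},
}
for err, age in sorted(error_ages(history, date(2024, 7, 8)).items()):
    print(f"{err}: open for {age} days")
# AESEV:moderate/severe: open for 68 days
# ROUTE:PO: open for 7 days
```

The 68-day error and the 7-day error get different treatment in the weekly review, which is the point of storing the output rather than only the latest pass/fail.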
MLPipeKit runs Pinnacle 21 Community conformance checks as part of every SDTM export cycle. CT errors are tracked over time, with age and recurrence information available in the study monitoring dashboard. The system also maintains CT version history for each study, flagging terms that changed in a CT package update since the study's mapping was configured.
Integrating CT conformance checks into your regular data management cycle? Talk to the MLPipeKit team about how automated monitoring works in practice.