Language Validation

Language validation, also known as linguistic validation, is the process of confirming that translated content, such as a translated language assessment tool or instrument, accurately captures the intended meaning and construct of the original source material. The goal is for the translated version to be understood in the same way as the original. Language validation preserves conceptual and cultural equivalence across languages. It is a crucial step in developing and using translated content properly across different languages and cultures.

Take, for example, idiomatic and cultural consistency. Idioms often cannot be translated literally from one language to another because the cultural references behind them do not carry over. The same holds for culture-specific references: if the source text mentions bocce ball and the game is not culturally relevant to the target audience, the translator may have to substitute something that is. Bocce is not cross-cultural, but the movements a player makes are similar to those in bowling, so bowling could be substituted.

Language validation typically involves multiple steps: forward translation by multiple translators, reconciliation into one version, back-translation into the source language, expert review to check accuracy and equivalence, and cognitive debriefing with target language speakers.

Collecting different types of validity evidence (construct, content, and criterion-related) through analyses and qualitative feedback is part of the validation process. Validation is an ongoing process, not a one-time exercise, because the validity of interpretations can change over time and context. Target language speakers often take part in cognitive debriefing to verify that the translated items are understood as intended. Contemporary validation models suggest involving multiple stakeholders, such as language users and teachers, to incorporate their perspectives on the language domain being tested.

Large language models (LLMs) and natural language processing (NLP) techniques could be incorporated into the five steps of the language validation process:

Forward Translation

The Current Process: Engage two professional translators who are native speakers of the target language and have subject matter expertise in the relevant domain (healthcare or technical writing, for example). Have each translator independently translate the source content into the target language, creating two forward translations.

An LLM could be used for automated translation from the source language to the target languages, potentially providing multiple translation options.

NLP techniques such as machine translation and transfer learning on parallel corpora could be leveraged.
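As a minimal sketch of how independent forward translations might be collected programmatically, the Python below runs each source item through several interchangeable translators. The `Translator` type, the function names, and the two stand-in translators with their sample Spanish renderings are assumptions made for illustration; in practice each translator would wrap a machine translation system or an LLM call, and the outputs would still go to human reviewers.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping (source_text, target_language)
# to a translated string. A real one would wrap an MT system or an LLM.
Translator = Callable[[str, str], str]


def forward_translate(
    source_items: List[str],
    target_lang: str,
    translators: Dict[str, Translator],
) -> Dict[str, List[str]]:
    """Run every translator independently over every source item."""
    return {
        name: [translate(item, target_lang) for item in source_items]
        for name, translate in translators.items()
    }


# Stand-in translators for demonstration only (illustrative sample data).
def translator_a(text: str, target_lang: str) -> str:
    samples = {"How often do you feel tired?": "¿Con qué frecuencia se siente cansado?"}
    return samples.get(text, text)


def translator_b(text: str, target_lang: str) -> str:
    samples = {"How often do you feel tired?": "¿Qué tan seguido se siente cansado?"}
    return samples.get(text, text)


drafts = forward_translate(
    ["How often do you feel tired?"],
    "es",
    {"A": translator_a, "B": translator_b},
)
for name, items in drafts.items():
    print(name, items)
```

Keeping the translators behind a single callable interface means the same loop works whether the drafts come from people, a translation engine, or an LLM.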

Reconciliation

The Current Process: Bring in a third linguist, also a native speaker of the target language, to review and reconcile the two forward translations into a single reconciled version. This reconciler should select the best renderings from each translation.

An LLM could analyze and compare the different forward translations to identify the best portions and generate a reconciled version.

NLP tasks such as text summarization, sentence fusion, and conditional language generation could potentially aid this step.
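One simple way to support the reconciler, sketched below, is to flag the items where the two forward translations diverge most so that attention goes there first. This uses character-level similarity from Python's standard `difflib` module as a rough proxy; the function name `flag_disagreements` and the 0.8 threshold are assumptions for illustration, not an established standard.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def flag_disagreements(
    translation_a: List[str],
    translation_b: List[str],
    threshold: float = 0.8,  # assumed cut-off; tune for the language pair
) -> List[Tuple[int, float]]:
    """Return (item index, similarity) for item pairs below the threshold."""
    flagged = []
    for index, (a, b) in enumerate(zip(translation_a, translation_b)):
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio < threshold:
            flagged.append((index, round(ratio, 2)))
    return flagged


version_a = ["Me siento cansado.", "Duermo bien por la noche."]
version_b = ["Me siento cansado.", "Descanso sin problemas."]
for index, ratio in flag_disagreements(version_a, version_b):
    print(f"Item {index} needs reconciliation (similarity {ratio})")
```

Surface similarity does not capture meaning, so a low score only signals that the wording differs, not that either rendering is wrong.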

Back Translation

The Current Process: Have two new translators, native speakers of the source language, independently back-translate the reconciled version into the source language without seeing the original text. Review the back-translations against the original source to check for conceptual equivalence and identify any meaning that may have been lost or changed.

An LLM's generation capabilities could be used to back-translate the reconciled target version into the source language. This back-translation could then be compared to the original source text using NLP methods such as semantic similarity and textual entailment to check for conceptual equivalence.
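The comparison between the source and its back-translation can be sketched as follows. Token overlap (Jaccard similarity) stands in here for a real semantic similarity or entailment model, which is a deliberate simplification; the function names and the 0.6 threshold are assumptions for illustration.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens; a crude stand-in for real linguistic analysis."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all tokens across both texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def review_back_translations(
    source_items: List[str],
    back_translations: List[str],
    threshold: float = 0.6,  # assumed cut-off for human review
) -> List[Dict]:
    """Score each item and mark those that fall below the threshold."""
    report = []
    for index, (src, back) in enumerate(zip(source_items, back_translations)):
        score = jaccard(src, back)
        report.append({"item": index, "score": score, "needs_review": score < threshold})
    return report


source = ["I feel tired", "I sleep well at night"]
back = ["I feel sleepy", "I sleep well at night"]
for row in review_back_translations(source, back):
    print(row)
```

A word-overlap score will miss valid paraphrases and can pass subtle meaning shifts, so flagged and unflagged items alike would still need a human reviewer in a real validation.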

Cognitive Debriefing

The Current Process: Conduct cognitive debriefing interviews with a small sample (5-8) of target language speakers from the intended user population. Have them complete the translated text/survey and then probe their understanding of each item. Based on the cognitive debrief feedback, make any necessary modifications to the reconciled translation to improve clarity and cultural appropriateness.

Large language models excel at open-ended question answering and dialogue, which could be leveraged to conduct virtual cognitive interviews probing the understanding of translated texts.

NLP techniques for query understanding, reading comprehension, and response generation could simulate the cognitive debriefing process.
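Whether the interviews are run by people or simulated, the outcomes have to be recorded and summarized per item. The sketch below shows one possible structure for that; the `DebriefResponse` fields, the `summarize_debrief` function, and the 0.8 agreement threshold are hypothetical choices made for this example, and the sample responses are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DebriefResponse:
    participant_id: str
    item_id: str
    understood_as_intended: bool  # interviewer's judgment after probing
    comment: str = ""


def summarize_debrief(
    responses: List[DebriefResponse],
    min_agreement: float = 0.8,  # assumed share of participants who must understand
) -> Dict[str, dict]:
    """Group responses by item and flag items with low comprehension."""
    by_item = defaultdict(list)
    for response in responses:
        by_item[response.item_id].append(response)

    summary = {}
    for item_id, item_responses in by_item.items():
        understood = sum(r.understood_as_intended for r in item_responses)
        rate = understood / len(item_responses)
        summary[item_id] = {
            "comprehension_rate": rate,
            "needs_revision": rate < min_agreement,
            "comments": [r.comment for r in item_responses if r.comment],
        }
    return summary


responses = [
    DebriefResponse("P1", "Q1", True),
    DebriefResponse("P2", "Q1", True),
    DebriefResponse("P1", "Q2", False, "Unsure what 'seguido' means here"),
    DebriefResponse("P2", "Q2", True),
]
for item_id, result in summarize_debrief(responses).items():
    print(item_id, result)
```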

Finalization & Harmonization

The Current Process: Have the reconciler review and finalize the translation based on all the evidence collected in the previous steps. If possible, the final translation should be reviewed by a subject matter expert familiar with the target language population.

After back-translation comparison, the LLM could suggest revisions to the target translations to improve conceptual alignment with the source. This could use capabilities like text editing, rephrasing, and constrained language generation.
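To make the revision step more concrete, the sketch below assembles the evidence for one item into a prompt that could be sent to an LLM. The template wording and the `build_revision_prompt` function are assumptions for illustration; no particular model or API is implied, and any suggestion returned would still need review by the reconciler.

```python
from typing import List


def build_revision_prompt(
    source: str,
    target: str,
    back_translation: str,
    debrief_comments: List[str],
    target_lang: str,
) -> str:
    """Combine the evidence for one item into a single revision request."""
    comment_lines = "\n".join(f"- {c}" for c in debrief_comments) or "- (none)"
    return (
        f"Source item:\n{source}\n\n"
        f"Current {target_lang} translation:\n{target}\n\n"
        f"Back-translation of the current translation:\n{back_translation}\n\n"
        f"Cognitive debriefing comments:\n{comment_lines}\n\n"
        f"Suggest a revised {target_lang} translation that keeps the meaning "
        "of the source item and addresses the comments. Change as little as possible."
    )


prompt = build_revision_prompt(
    source="How often do you feel tired?",
    target="¿Qué tan seguido se siente cansado?",
    back_translation="How frequently do you feel tired?",
    debrief_comments=["Unsure what 'seguido' means here"],
    target_lang="Spanish",
)
print(prompt)
```

Asking for minimal changes is one way to approximate constrained generation through prompting alone; stricter constraints would need model-side support.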

The multi-step linguistic validation process of forward translation, reconciliation, back-translation, cognitive debriefing, and expert review helps confirm the conceptual equivalence and cultural appropriateness of translations. It follows best practice guidelines from ISPOR and regulatory bodies such as the FDA and EMA.

These are some potential ways LLMs and NLP techniques could programmatically assist and augment the linguistic validation workflow. However, the sources listed below do not provide clear implementations or benchmarks showing how effective current LLMs or NLP methods would be for these tasks compared to the standard human-driven processes outlined above. Close human supervision and quality checks would likely still be required, especially in high-stakes domains such as clinical trials. Developing customized models and pipelines for this specific use case could be an area of further research and application.

Sources:

“The Validation of Language Tests”; Johann L. van der Walt and H.S. Steyn (Jr.); Stellenbosch Papers in Linguistics, Vol. 38, 2008, 191-204 doi: 10.5774/38-0-29; https://www.ajol.info/index.php/spl/article/download/116343/105877

“What is Linguistic Validation?”; RWS Life Sciences Blog; Beatrice Tedeshi; September 6, 2022; https://www.rws.com/industries/life-sciences/blog/what-is-linguistic-validation/

“Linguistic Validation Explained”; Win and Winnow Language Services; May 6, 2022; https://winandwinnow.com/site/linguistic-validation-explained/

“Linguistic Validation”; Wikipedia; November 27, 2023; https://en.wikipedia.org/wiki/Linguistic_validation
