AI-powered clinical translators

Introduction

immediate, real-time language translation can be critical to good patient outcomes, particularly in resuscitation scenarios when a human interpreter cannot be accessed in time
errors in medical translation can lead to severe consequences, including misdiagnosis and inappropriate treatment - in a resuscitation situation, however, the risk of such errors may be seen as acceptable compared with having no translation at all

General process

speech to text
- OpenAI Whisper API
  - can transcribe 99 languages and translate them all into English
  - https://github.com/openai/whisper
- Mozilla's Project DeepSpeech
  - open source; its model follows the Baidu Deep Speech research paper, making it end-to-end trainable and capable of transcribing audio in several languages. It is trained and implemented using Google’s TensorFlow.
optionally, text sentence splitting
optionally, text normalisation
translation via a translation large language model:
- set the model tokenizer's source language
- tokenize the source text
- generate tokens for the target language
- decode the target language tokens
- see example at https://huggingface.co/facebook/m2m100_418M - covers 100 languages, trained with a many-to-many approach
  - how the m2m_100 model was developed: https://arxiv.org/pdf/2010.11125
- another model: https://huggingface.co/facebook/nllb-200-distilled-600M - 200 languages
- a fine-tuned medical version is available at https://huggingface.co/Tippawan/medical_translation_v.1
optionally, text de-normalisation
optionally, text sentence joining
text output
text to speech
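
The text stages of the process above (sentence splitting, the four tokenizer/model steps, and sentence joining) can be sketched as follows. This is a minimal sketch only, assuming the Hugging Face transformers library and the facebook/m2m100_418M model linked above; the function names split_sentences, translate_text and load_m2m100 are hypothetical, and the regex splitter is deliberately naive (it will mis-split abbreviations such as "Dr. Smith").

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_text(text, src_lang, tgt_lang, tokenizer, model):
    """Translate text sentence by sentence with an M2M100-style tokenizer and model."""
    tokenizer.src_lang = src_lang  # 1. set the tokenizer's source language
    translated = []
    for sentence in split_sentences(text):
        # 2. tokenize the source sentence
        encoded = tokenizer(sentence, return_tensors="pt")
        # 3. generate target tokens, forcing the target language as first token
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        )
        # 4. decode the target tokens back to text
        decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
        translated.append(decoded[0])
    return " ".join(translated)  # sentence joining


def load_m2m100(model_name="facebook/m2m100_418M"):
    """Load the tokenizer and model (assumes transformers, sentencepiece and torch are installed)."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model
```

A possible use would be tokenizer, model = load_m2m100() followed by translate_text("Où avez-vous mal ? Depuis quand ?", "fr", "en", tokenizer, model). The model is downloaded on first use, so for confidential clinical text it would need to be cached locally beforehand.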

Challenges of AI-powered translation

privacy and patient confidentiality
- apps which send data to the cloud for translation are likely to raise privacy issues
- worse still, many of these apps include a clause openly stating that any information entered may be communicated, published, reproduced, shared, and used to train their engine, or to develop future technologies or products - so confidentiality is at risk even if the service is never hacked or misused by a bad actor
accuracy
- usually measured by:
  - BLEU scores (Bilingual Evaluation Understudy) for text translation
- machine learning translation has become very good and can produce what appear to be convincing medical translations - however, there can still be major accuracy issues due to:
  - nuance and context issues
    - eg. the French word “intoxication” means “poisoning”, whereas the same word “intoxication” in English usually refers to a state of inebriation from consuming too much alcohol
  - localisation of clinical terminology and new terminologies
    - most current apps are US-trained
    - translating localised medical expressions, acronyms, abbreviations, emerging medical technologies and processes, and even definitions of certain clinical protocols can pose challenges
  - cultural competence
  - speech to text accuracy
    - voice-based translators must first convert speech audio to text before it can be translated, and this step is still error-prone, although errors could be detected if the transcribed text is displayed to the user
    - this will be magnified in patients with speech impediments
    - usually measured by:
      - word error rate (WER) on speech benchmarks such as FLEURS (Few-shot Learning Evaluation of Universal Representations of Speech)
  - inability to interpret non-verbal communication
    - this may change with new visual apps as demonstrated by GPT-4o's ability to assess emotive states from both visual and speech data
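
To make the two accuracy measures above more concrete, the following is a simplified sketch of a sentence-level BLEU score (for text translation) and of word error rate (WER), the figure typically reported for speech to text on benchmarks such as FLEURS. The functions bleu and wer are illustrative only and assume whitespace tokenisation, a single reference and no smoothing; published results normally use corpus-level scoring with standardised tooling such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, wer("the patient has chest pain", "the patient has chess pain") counts one substituted word out of five. Note that neither metric knows that "chess" for "chest" is clinically far more serious than most other single-word errors, which is one reason human review remains necessary.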
broad range of languages
- many multicultural regions, such as Melbourne, Australia, have patient cohorts speaking over 100 languages
cost
accessibility
- is it available on a smartphone?
lack of accountability for errors
quality control
- generally requires “human-in-the-loop” review by subject matter experts
- checking a translation simply by searching for the word on Google or other search tools is likely to give errors of context and nuance
has it been trained via translation vs transcreation ¹⁾:
- transcreation
  - seeks to reproduce the aim and effect of the original message in a new way deemed completely natural in the target language and culture
  - the goal is to keep the same intent, style, tone and emotion of the source material in the target language; it is the best method for preserving intent and thereby the accuracy of medical translations
  - it is not enough to be bilingual; to maximize accuracy, transcreation should be performed by people who were born into and think in the target language
  - the translator should be a medical expert from the same country who therefore understands local nuances

Current options

general translators

Google Translate

https://translate.google.com/
Google Translate app
- this is great for casual use such as international travel but can have difficulty with the complexities of clinical information
- has an offline mode
your own app accessing Google API
- https://firebase.google.com/docs/ml-kit/translation for offline use on iOS or Android devices
- https://github.com/matheuss/google-translate-api

specialized clinical translators

OneMeta AI
- VerbumOS offers AI-powered translation in over 150 languages
Vital
- announced in Aug 2023, Vital's HIPAA-compliant Doctor-to-Patient Translator creates 5th-grade reading-level explanations of lab and imaging results, doctor notes, discharge summaries and patient instructions, and other important medical information ²⁾
- only translates to plain English!

creating your own fine tuned translation models

based on Google translation models

https://cloud.google.com/translate/docs/advanced/automl-quickstart - uses Cloud and Google APIs

using other translation models

https://pypi.org/project/dl-translate/ - a Python library
- can use models: m2m100, mBART-50 Large, nllb-200
- unlike the Google Translate or Microsoft Translator APIs, this library can be used fully offline. However, you first need to download the packages and models and move them to your offline environment, where they are installed and loaded inside a venv on a computer which can run Python and which preferably has a powerful GPU
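
A minimal sketch of how dl-translate might be used is shown below, assuming the package's documented TranslationModel class and its translate(text, source=..., target=...) method. The helper names load_translation_model and translate_phrases are hypothetical, and the local-folder route assumes the model files have already been copied to the offline machine.

```python
def load_translation_model(model_or_path="m2m100", model_family=None):
    """Load a dl-translate model by name, or from a local folder when offline.

    Assumes `pip install dl-translate` has been run in the active venv.
    When loading from a local folder, model_family (eg. "m2m100") tells
    the library which model type the folder contains.
    """
    import dl_translate as dlt

    if model_family is None:
        return dlt.TranslationModel(model_or_path)
    return dlt.TranslationModel(model_or_path, model_family=model_family)


def translate_phrases(phrases, source, target, mt):
    """Translate each phrase and return a {source phrase: translation} dict.

    mt is a loaded translation model exposing translate(text, source, target).
    """
    return {p: mt.translate(p, source=source, target=target) for p in phrases}
```

Typical use would be mt = load_translation_model() and then translate_phrases(["Où avez-vous mal ?"], "French", "English", mt). Keeping model loading separate from translation means the large model is loaded once and reused for each phrase.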

using OpenAI Whisper

import whisper
import sounddevice as sd
import scipy.io.wavfile as wav

# Load the Whisper model; on first use it is downloaded automatically
# to the .cache\whisper folder in the user's home directory
model = whisper.load_model("base")

# Function to record audio from the microphone
def record_audio(duration, fs):
    print("Recording...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
    sd.wait()  # Wait until the recording is finished
    return recording

# Function to save the recorded audio to a WAV file
def save_audio(filename, recording, fs):
    wav.write(filename, fs, recording)

# Function to transcribe audio using Whisper
# (reading an audio file requires ffmpeg to be installed)
def transcribe_audio(filename):
    result = model.transcribe(filename)
    return result['text']

if __name__ == "__main__":
    duration = 10  # Duration of the recording in seconds
    fs = 16000  # Sample rate

    # Record audio
    recording = record_audio(duration, fs)

    # Save the recorded audio to a file
    audio_filename = "recorded_audio.wav"
    save_audio(audio_filename, recording, fs)

    # Transcribe the audio file
    transcription = transcribe_audio(audio_filename)
    print("Transcription: ", transcription)

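Record Audio: The sounddevice library is used to record audio from the microphone. The recording duration and sample rate are specified.
Save Audio: The recorded audio is saved to a WAV file using the scipy.io.wavfile module.
Transcribe Audio: The saved file is passed to model.transcribe, which returns a dictionary whose 'text' entry holds the transcription.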
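
The script above only transcribes in the spoken language. Whisper can also translate speech directly into English by passing task="translate" to transcribe. The following is a small sketch of that option, not a tested clinical tool; the function names speech_to_english and load_whisper are hypothetical, and the model object is passed in so that it is only loaded once.

```python
def speech_to_english(audio_path, model, source_language=None):
    """Transcribe speech and translate it into English in a single Whisper call.

    model is a loaded Whisper model; source_language is an optional
    language code such as "fr" which skips automatic language detection.
    """
    options = {"task": "translate"}
    if source_language is not None:
        options["language"] = source_language
    result = model.transcribe(audio_path, **options)
    return result["text"].strip()


def load_whisper(size="base"):
    """Load a Whisper model (assumes the openai-whisper package and ffmpeg are installed)."""
    import whisper

    return whisper.load_model(size)
```

For example, model = load_whisper() followed by speech_to_english("recorded_audio.wav", model) would return English text for the file saved by the script above. Larger model sizes are generally more accurate but slower, which matters in a resuscitation setting.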

hand held translation devices

offline capability

Google Translate app on smartphones
- this has an offline capability but you need to download each specific language module
TimeKettle T1 mini and the
- mainly designed for online use via its own mobile data connection, using a built-in global data card
- offline use only allows 13 language pairs
iFlyTek Smart Translator
- mainly designed for online use via its own mobile data connection, using a built-in global data card
- offline use only allows 18 language pairs

online only capability

there are many of these devices coming out
most access the internet via a Bluetooth connection to your smartphone
- eg. Enence

¹⁾ https://www.languagescientific.com/making-medical-translation-a-differentiating-factor-for-successful-clinical-trials/

²⁾ https://www.businesswire.com/news/home/20230808050943/en/Vital-Releases-Doctor-to-Patient-Translator-That-Uses-AI-and-LLMs-to-Transform-Medical-Jargon-Into-Simple-Accurate-Content-for-Patients

OzEMedicine - Wiki for Australian Emergency Medicine Doctors
