AI powered clinical translators
see also:
-
- conversion of audio communications into formatted and often summarised text outputs which could potentially be used to create clinical notes for EMR systems
Introduction
- immediate real time language translation can be critically important in ensuring best patient care outcomes, particularly in resuscitation scenarios when timely access to human interpreters is not available
- errors in medical translation can lead to severe consequences, including misdiagnosis and inappropriate treatment - in a resuscitation situation, the risk of such errors may be judged acceptable compared with having no translation at all.
General process
- speech to text
- OpenAI Whisper
- can transcribe 99 languages and translate them all into English
- Mozilla DeepSpeech
- open source, its model follows the Baidu Deep Speech research paper, making it end-to-end trainable and capable of transcribing audio in several languages. It is also trained and implemented using Google’s TensorFlow.
- optionally, text sentence splitting
- optionally, text normalisation
- translation via translation large language model:
- set model tokenizer source language
- get tokens for source
- generate tokens for target language
- decode target language tokens
- see example at https://huggingface.co/facebook/m2m100_418M - 100 languages, trained with a many-to-many approach
- how the m2m_100 model was developed: https://arxiv.org/pdf/2010.11125
- another model: https://huggingface.co/facebook/nllb-200-distilled-600M - 200 languages
- a fine tuned medical version is available at https://huggingface.co/Tippawan/medical_translation_v.1
- optionally, text de-normalisation
- optionally, text sentence joining
- text output
- text to speech
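The optional splitting and normalisation steps and the four translation steps above could be strung together roughly as follows. This is only a minimal sketch: the helper names (`normalise`, `split_sentences`, `make_m2m100_translator`, `translate_text`) are made up for illustration, the regex sentence splitter is a crude assumption that only suits languages using `.`, `!` and `?`, and the M2M100 calls follow the usage shown on the facebook/m2m100_418M model card (this needs the `transformers`, `torch` and `sentencepiece` packages and a model download on first use).

```python
import re
from typing import Callable, List


def normalise(text: str) -> str:
    """Very light text normalisation: collapse runs of whitespace."""
    return " ".join(text.split())


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption: '.', '!' or '?' ends a sentence)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def make_m2m100_translator(src_lang: str, tgt_lang: str,
                           model_name: str = "facebook/m2m100_418M") -> Callable[[str], str]:
    """Return a function that translates one sentence with M2M100."""
    # Imported here so the rest of the sketch works without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    def translate(sentence: str) -> str:
        tokenizer.src_lang = src_lang                        # set tokenizer source language
        encoded = tokenizer(sentence, return_tensors="pt")   # get tokens for source
        generated = model.generate(                          # generate tokens for target language
            **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
        )
        # decode target language tokens
        return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

    return translate


def translate_text(text: str, translate_sentence: Callable[[str], str]) -> str:
    """Normalise, split, translate each sentence, then join the results."""
    sentences = split_sentences(normalise(text))
    return " ".join(translate_sentence(s) for s in sentences)
```

For example, `translate_text(text, make_m2m100_translator("fr", "en"))` would translate French text to English. Any other callable that maps one sentence to its translation (eg. a wrapper around an NLLB-200 model) could be passed in its place.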
Challenges of AI powered translation
- privacy and patient confidentiality
- apps which send data to the cloud for translation are likely to raise privacy issues
- worse still, the terms of use of many of these apps openly state that any information entered may be communicated, published, reproduced, shared, and used to train their engine or to develop future technologies or products - so confidentiality may be lost even if they are never hacked or misused by a bad actor
- accuracy
- usually measured by:
- BLEU scores (Bilingual Evaluation Understudy) for text translation
- machine learning translation has become very good and can produce medical translations that appear convincing - however, there can still be major accuracy issues due to:
- nuance and context issues
- eg. the French word “l'intoxication” means “poisoning”, whereas the English word “intoxication” usually refers to a state of inebriation from consuming too much alcohol.
- localisation of clinical terminology and new terminologies
- most current apps are US-trained
- translating localized medical expressions, acronyms, abbreviations, emerging medical technologies and processes, and even definitions of certain clinical protocols can pose challenges
- cultural competence
- speech to text accuracy
- voice based translators must first convert speech audio to text before it can be translated, and this step is still error-prone, although errors could be detected if the transcribed text is displayed to the user
- this will be magnified in patients with speech impediments
- usually measured by:
- word error rate (WER) on benchmarks such as FLEURS (Few-shot Learning Evaluation of Universal Representations of Speech) for speech to text
- inability to interpret non-verbal communication
- this may change with new visual apps as demonstrated by GPT-4o's ability to assess emotive states from both visual and speech data
- broad range of languages
- many multicultural regions, such as Melbourne, Australia, have patient cohorts speaking over 100 languages
- cost
- accessibility
- is it available on a smart phone?
- lack of accountability for errors
- quality control
- generally requires “human-in-the-loop” review by subject matter experts for quality control
- just checking translation by searching the word on Google or other search tools is likely to give errors of context and nuance
- has it been trained via translation vs transcreation 1):
- transcreation
- seeks to reproduce the aim and effect of the original message in a new way deemed completely natural in the target language and culture
- goal is to keep the same intent, style, tone and emotion of the source material in the target language, and is the best methodology to utilize to maximize the intent and thereby accuracy of medical translations
- it is not enough to be bilingual; to maximize accuracy, transcreation should be performed by people who were born into and think in the target language.
- the translator should be a medical expert from the same country who thus understands local nuances
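The automatic measures mentioned in the list above can be illustrated with a short sketch. BLEU compares the n-grams of a machine translation with a reference translation, and speech to text quality on benchmarks such as FLEURS is commonly reported as word error rate. The code below is a simplified illustration only: it assumes whitespace tokenisation, a single reference and no smoothing, and the function names are made up for this example. Real evaluations would normally use an established scoring tool on a whole test set rather than single sentences.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = _ngrams(cand, n), _ngrams(ref, n)
        # Clipped overlap: a candidate n-gram only counts as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Note that a single misheard word such as “chess” for “chest” gives a low word error rate yet could still change the clinical meaning, which is one reason these scores cannot replace human review.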
Current options
general translators
Google translate
- Google translator app
- this is great for casual use such as on international travel but can have difficulties with the complexities of clinical information
- has an offline mode
- your own app accessing Google API
- https://firebase.google.com/docs/ml-kit/translation for offline use on iOS or Android devices
specialized clinical translators
-
- VerbumOS's AI-powered features offer translations in over 150 languages
-
- announced in Aug 2023, Vital's HIPAA-compliant Doctor-to-Patient Translator creates 5th-grade reading-level explanations of lab and imaging results, doctor notes, discharge summaries and patient instructions, and other important medical information 2)
- only translates to plain English!
creating your own fine tuned translation models
based on Google translation models
- https://cloud.google.com/translate/docs/advanced/automl-quickstart - uses Cloud and Google APIs
using other translation models
- https://pypi.org/project/dl-translate/ python code
- can use models: m2m100, mBART-50 Large, nllb-200
- unlike the Google Translate or Microsoft Translator APIs, this library can be used fully offline. However, you will first need to download the packages and models, then move them to your offline environment, where they are installed and loaded inside a venv on a computer which can run Python and which preferably has a powerful GPU
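A minimal sketch of the offline use described above, assuming the model files have already been copied into a local folder. The helper names `load_offline_model` and `translate_phrases` and the folder path are placeholders for illustration; the `TranslationModel(path, model_family=...)` and `translate(text, source=..., target=...)` calls follow the dl-translate README, so check them against the version you install.

```python
from typing import Iterable, List


def load_offline_model(model_dir: str, model_family: str = "m2m100"):
    """Load a translation model from a local folder instead of downloading it."""
    # Imported here so the helper below can be tried without dl-translate installed.
    import dl_translate as dlt

    # model_family tells dl-translate which architecture the local files belong to
    # (eg. "m2m100", "mbart50" or "nllb200").
    return dlt.TranslationModel(model_dir, model_family=model_family)


def translate_phrases(mt, phrases: Iterable[str], source: str, target: str) -> List[str]:
    """Translate each phrase with any object exposing translate(text, source=, target=)."""
    return [mt.translate(phrase, source=source, target=target) for phrase in phrases]
```

Typical use would be along the lines of `mt = load_offline_model("./models/m2m100")` followed by `translate_phrases(mt, ["Where is the pain?"], "English", "French")`.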
using OpenAI whisper
import whisper
import sounddevice as sd
import scipy.io.wavfile as wav

# Load the Whisper model
# (this will automatically download the model to user\.cache\whisper if not already downloaded)
model = whisper.load_model("base")


# Function to record audio from the microphone
def record_audio(duration, fs):
    print("Recording...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
    sd.wait()  # Wait until the recording is finished
    return recording


# Function to save the recorded audio to a WAV file
def save_audio(filename, recording, fs):
    wav.write(filename, fs, recording)


# Function to transcribe audio using Whisper
def transcribe_audio(filename):
    result = model.transcribe(filename)
    return result['text']


if __name__ == "__main__":
    duration = 10  # Duration of the recording in seconds
    fs = 16000     # Sample rate

    # Record audio
    recording = record_audio(duration, fs)

    # Save the recorded audio to a file
    audio_filename = "recorded_audio.wav"
    save_audio(audio_filename, recording, fs)

    # Transcribe the audio file
    transcription = transcribe_audio(audio_filename)
    print("Transcription: ", transcription)
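- Record Audio: The sounddevice library is used to record audio from the microphone. The recording duration and sample rate are specified.
- Save Audio: The recorded audio is saved to a WAV file using the scipy.io.wavfile module.
- Transcribe Audio: The saved WAV file is passed to the Whisper model's transcribe method and the resulting text is printed.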
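Whisper can also translate non-English speech directly into English text, as mentioned under the general process. A minimal sketch, assuming a Whisper model has already been loaded as in the script above: `transcribe` accepts `task="translate"`, and the returned dictionary includes the detected `language` as well as the `text`. The helper name `speech_to_english` is made up for this example.

```python
def speech_to_english(model, audio_path):
    """Return (detected language code, English translation) for an audio file."""
    # task="translate" asks Whisper to output English text instead of a
    # transcription in the spoken language.
    result = model.transcribe(audio_path, task="translate")
    return result["language"], result["text"].strip()
```

It could be called as `language, english = speech_to_english(model, "recorded_audio.wav")`. The larger Whisper models generally translate better than “base”, at the cost of speed and memory.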
hand held translation devices
offline capability
- Google Translator app on smartphones
- this has an offline capability but you need to download each specific language module
- TimeKettle T1 mini and the
- mainly designed for online use via its own mobile data connection, using a built-in global data card
- offline use only allows 13 language pairs
-
- mainly designed for online use via its own mobile data connection, using a built-in global data card
- offline use only allows 18 language pairs
online only capability
- there are many of these devices coming out
- most access internet via Bluetooth connection to your smartphone
- eg. Enence
it/ai_translation.txt · Last modified: 2024/08/13 02:42 by gary1