Book translation#
Mar, 2024
Machine translation
Background#
A few years ago, I wrote a book in Spanish about my personal experience walking the Camino de Santiago. At the time, I considered translating it into English to reach more readers on Amazon, but I discarded the idea because of the effort and costs involved.
However, with the advancement of Artificial Intelligence during these last few years, I thought it would be interesting to experiment with some model to automatically translate the book for me. Perhaps it would turn out so well that it wouldn’t need further manual adjustments by humans.
The historical evolution of Machine Translation can be summarized as follows:
1950 - 2000: Rule-based linguistic approach.
2000 - 2010: Statistical approach. With access to large amounts of bilingual text, statistical models were built instead of hand-crafted rules.
2010 - 202x: Neural Machine Translation (NMT). Advanced Deep Learning models, more data available for training, and more computing power.
In this latest phase, it is worth highlighting the emergence of Generative Artificial Intelligence, with large language models (such as OpenAI’s GPT, an LLM based on complex Deep Learning architectures) trained for text generation, which can also be used for translation.
So, I decided to make a foray into this world to explore the possibilities before carrying out the full translation of my book.
Comparing models#
The data#
To compare translation results across different models I needed a metric, and to compute it I first needed a dataset of sentences in Spanish together with their correct English translations. For this, I relied on Tatoeba, a collaborative website dedicated to collecting sentences and their translations into multiple languages.
I downloaded a file containing over a quarter of a million translated sentences, sorted the entries by length, and kept the thousand longest Spanish sentences (with their corresponding translations). For translating a book like mine, I figured it would be more telling to compare the models on longer sentences rather than shorter ones.
import time
import matplotlib.pyplot as plt
import my_functions as my
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import clear_output
# Google Translator
from deep_translator import GoogleTranslator
# Hugging Face
from transformers import pipeline
from evaluate import load
# OpenAI
from openai import OpenAI
import tiktoken
# # https://tatoeba.org/es/downloads, es --> en
# tatoeba = pd.read_csv("data/tatoeba-es-en-2024-03-13.tsv",
# sep="\t",
# header=None,
# usecols=[1, 3],
# names=["es", "en"])
#
# # Sort by length of the sentence
# tatoeba = tatoeba.sort_values(by="es",
# key=lambda x: x.str.len(),
# ascending=False)
#
# # Subset n longest sentences
# n = 1000
# tatoeba_n = tatoeba.iloc[:n, :].reset_index(drop=True)
#
# # Save as csv file
# tatoeba_n.to_csv("data/tatoeba_n.csv", index=False)
tatoeba_n = pd.read_csv("data/tatoeba_n.csv")
Translators#
For my comparison, it was clear that I wanted to include Google Translate and OpenAI’s GPT text-generation models. Since both are proprietary, I thought it would be interesting to also include a free open-source model that I could run locally on my computer.
A quick search on Hugging Face convinced me to use the Helsinki-NLP/opus-mt-es-en model, an open-source model ready to translate text from Spanish to English. It comes from a project led by the University of Helsinki, co-funded by the EU, which trains models for specific language pairs with the open-source Marian NMT framework on OPUS, a large collection of translated texts gathered from the web.
For this project, I instantiated each of these models as follows:
For the Helsinki-NLP model I used a Hugging Face pipeline, which downloads the model locally to my computer before running it.
For Google Translate I used the deep_translator library as a wrapper around the Google Translate API (no costs involved).
For GPT I used the OpenAI API, with my key for access and billing. As the model I selected gpt-3.5-turbo, which is not as advanced as GPT-4 but should be enough for this purpose (and cheaper!).
# Instantiate Helsinki model
helsinkitranslator = pipeline(task="translation", model="Helsinki-NLP/opus-mt-es-en")
# Instantiate Google Translator
googletranslator = GoogleTranslator(source="es", target="en")
# Instantiate OpenAI-GPT
api_key = "xxx"
gptranslator = OpenAI(api_key=api_key)
# Function to translate with Helsinki model
def translate_helsinki(client, text):
    trans_hel = client(text)[0]["translation_text"]
    return trans_hel
# Function to translate with Google model
def translate_google(client, text):
    max_attempts = 5
    retry_gap = 3.0  # Initial gap between retries in seconds
    for attempt in range(max_attempts):
        try:
            trans_goo = client.translate(text)
            break  # Stop retrying on success
        except Exception as e:
            print(f"Request failed on attempt {attempt + 1}. Error: {str(e)}")
            if attempt < max_attempts - 1:
                retry_gap *= 1.5  # Increase the retry gap exponentially
                time.sleep(retry_gap)
            else:
                raise  # Give up: all attempts failed
    return trans_goo
# Function to translate with GPT model
def translate_gpt(client, text):
    # Parameters
    gpt_model = "gpt-3.5-turbo"
    model_parameters = {"model": gpt_model, "temperature": 0.0}
    system_prompt = f"""You are a translation tool.
You receive a string written in Spanish, and solely return the same string in English without losing any of the original formatting.
Your translations should be accurate, aiming not to deviate from the original structure, content, writing style and tone."""
    max_attempts = 5
    retry_gap = 3.0  # Initial gap between retries in seconds
    for attempt in range(max_attempts):
        try:
            messages = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ]
            completion = client.chat.completions.create(
                **model_parameters, messages=messages
            )
            trans_gpt = completion.choices[0].message.content
            tokens_in = completion.usage.prompt_tokens
            tokens_out = completion.usage.completion_tokens
            break  # Stop retrying on success
        except Exception as e:
            print(f"Request failed on attempt {attempt + 1}. Error: {str(e)}")
            if attempt < max_attempts - 1:
                retry_gap *= 1.5  # Increase the retry gap exponentially
                time.sleep(retry_gap)
            else:
                raise  # Give up: all attempts failed
    return trans_gpt, tokens_in, tokens_out
# Function to calculate cost for "gpt-3.5-turbo"
def cost_gpt(tokens_in, tokens_out):
    price_in = 1.50   # $ / 1M input tokens
    price_out = 2.00  # $ / 1M output tokens
    return (tokens_in * price_in / 1000000) + (tokens_out * price_out / 1000000)
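Before launching the full comparison, a quick sanity check of the three helper functions on a single sentence could look like the sketch below; the sample sentence is arbitrary, and the GPT call assumes a valid API key was provided above.

```python
# Illustrative sanity check on one arbitrary sentence (not part of the experiment).
# Assumes the three clients above were instantiated successfully and, for GPT,
# that a valid OpenAI API key was set.
sample = "El peregrino llegó a Santiago al amanecer."

print(translate_helsinki(helsinkitranslator, sample))
print(translate_google(googletranslator, sample))

gpt_text, t_in, t_out = translate_gpt(gptranslator, sample)
print(gpt_text, f"({t_in} tokens in, {t_out} tokens out)")
```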
SacreBLEU metric#
As the comparison metric I used SacreBLEU from the Hugging Face evaluate library, which produces a BLEU-style score between 0 and 100, where higher values mean a closer match to the reference translation.
I had the three models successively translate the 1000 sentences extracted from the Tatoeba database and compared each translation against the Tatoeba reference, obtaining 1000 SacreBLEU scores for each of the three models.
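To get a feel for the metric before the full run, here is a minimal illustrative call on a made-up prediction/reference pair; the sentences are invented for the example, and a partial word overlap yields a score somewhere between 0 (no overlap) and 100 (an exact match).

```python
from evaluate import load

# Illustrative only: SacreBLEU on a single invented prediction/reference pair.
sacrebleu_demo = load("sacrebleu", trust_remote_code=True)
toy = sacrebleu_demo.compute(
    predictions=["The pilgrim arrived in Santiago at dawn."],
    references=[["The pilgrim reached Santiago at dawn."]],
)
print(round(toy["score"], 1))
```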
# Translate
goo_lst = []
hel_lst = []
gpt_lst = []
tokens_in_consumed = 0
tokens_out_consumed = 0
counter = 0
n_chunks = len(tatoeba_n)
for text in tatoeba_n["es"]:
    # Google Translator
    goo_lst.append(translate_google(googletranslator, text))
    # Helsinki-NLP
    hel_lst.append(translate_helsinki(helsinkitranslator, text))
    # OpenAI-GPT
    translation, tokens_in, tokens_out = translate_gpt(gptranslator, text)
    tokens_in_consumed += tokens_in
    tokens_out_consumed += tokens_out
    gpt_lst.append(translation)
    counter += 1
    clear_output()
    print(f"...\nCompleted: {counter} / {n_chunks}")
print(f"\nCost GPT: {cost_gpt(tokens_in_consumed, tokens_out_consumed):.2f} $")
...
Completed: 1000 / 1000
Cost GPT: 0.30 $
# Instantiate evaluation metric
sacrebleu = load("sacrebleu", trust_remote_code=True)
# Evaluate
goo_scores = []
hel_scores = []
gpt_scores = []
for text, goo, hel, gpt in zip(tatoeba_n["en"], goo_lst, hel_lst, gpt_lst):
    references = [[text]]
    # Google Translator
    predictions = [goo]
    goo_scores.append(
        sacrebleu.compute(predictions=predictions, references=references)["score"]
    )
    # Helsinki-NLP
    predictions = [hel]
    hel_scores.append(
        sacrebleu.compute(predictions=predictions, references=references)["score"]
    )
    # OpenAI-GPT
    predictions = [gpt]
    gpt_scores.append(
        sacrebleu.compute(predictions=predictions, references=references)["score"]
    )
# Mean values
goo_scores_mean = np.mean(goo_scores)
hel_scores_mean = np.mean(hel_scores)
gpt_scores_mean = np.mean(gpt_scores)
# Convert to df to plot
scores = pd.DataFrame(
[goo_scores, hel_scores, gpt_scores],
index=["Google Translate", "Helsinki-NLP", "OpenAI GPT"],
).transpose()
# Get new column with length
scores["len_es"] = tatoeba_n["es"].apply(len)
# Long format to plot in Seaborn
scores = scores.melt(id_vars=["len_es"]).rename(
columns={"variable": "model", "value": "score"}
)
# Plot
fig, ax = plt.subplots(2, 1, figsize=(6, 8))
sns.boxplot(x="model", y="score", data=scores, ax=ax[0], hue="model")
sns.barplot(x="model", y="score", data=scores, ax=ax[1], hue="model")
for i in range(2):
    ax[i].grid(axis="y")
    ax[i].set_axisbelow(True)
    ax[i].tick_params(axis="y", which="major", labelsize=11)
    ax[i].tick_params(axis="x", which="major", labelsize=11)
    ax[i].set_title("", size=13)
    ax[i].set_xlabel("", size=12)
    ax[i].set_ylabel("SacreBLEU", size=12)
    ax[i].set_ylim(0, 100)
sns.despine()
plt.show()

There doesn’t seem to be much difference between the three models, although Google and GPT achieve slightly better results than Helsinki in terms of SacreBLEU means and medians. How much better? The confidence intervals in the bar chart above are not distinct enough to tell. I decided to bootstrap the distribution of the mean scores to see it more clearly.
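The bootstrap helper my.draw_bs_reps used below comes from the small utilities module imported as my at the top, whose code is not shown in this post; a minimal sketch of an equivalent helper, assuming it simply resamples the scores with replacement and applies the statistic to each resample, would be:

```python
import numpy as np

# Minimal sketch of a bootstrap-replicates helper (equivalent in spirit to
# my.draw_bs_reps): resample the data with replacement `size` times and apply
# the statistic (e.g. np.mean) to each resample.
def draw_bs_reps_sketch(data, func, size=1):
    reps = np.empty(size)
    for i in range(size):
        bs_sample = np.random.choice(data, size=len(data), replace=True)
        reps[i] = func(bs_sample)
    return reps
```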
# Establish random numbers' seed for reproducibility
np.random.seed(111)
# Draw bootstrap replicates
bs_reps_goo_scores = my.draw_bs_reps(np.array(goo_scores), np.mean, size=10000)
bs_reps_hel_scores = my.draw_bs_reps(np.array(hel_scores), np.mean, size=10000)
bs_reps_gpt_scores = my.draw_bs_reps(np.array(gpt_scores), np.mean, size=10000)
# Plot
fig, ax = plt.subplots(figsize=(6, 4))
sns.histplot(
bs_reps_goo_scores,
ax=ax,
kde=True,
stat="density",
element="step",
label="Google Translator",
)
sns.histplot(
bs_reps_hel_scores,
ax=ax,
kde=True,
stat="density",
element="step",
label="Helsinki-NLP",
)
sns.histplot(
bs_reps_gpt_scores,
ax=ax,
kde=True,
stat="density",
element="step",
label="OpenAI GPT",
)
ax.set_title("Mean-values distributions", size=12)
ax.tick_params(axis="x", labelsize=11, rotation=0)
ax.tick_params(axis="y", labelsize=11)
ax.set_xlabel("SacreBLEU score", size=12)
ax.legend(bbox_to_anchor=(1.0, 0.5), loc="center left", fontsize=10)
sns.despine()
plt.show()

Here it is clear that, in the tests on this sample of 1000 long sentences, the difference between Google Translate and OpenAI GPT on the one hand and Helsinki-NLP on the other is significant.
Translating the book#
I carried out the translation of my book with each of these three systems. The quality of the translations will need to be checked later by me (or by a professional translator or native speaker), but for now I also wanted to compare the time required and the cost.
The book is titled De Camino Tu Santiago and consists of approximately 65,000 words in Spanish.
To begin with, I loaded the book from the text file where I stored it and split it into chunks separated by line breaks to proceed with its translation paragraph by paragraph.
# Open book file
with open("data/dcts.txt", "r") as file:
file_text = file.read()
# Split file
split_string = "\n"
split_text = file_text.split(split_string)
# Number of chunks
n_chunks = len(split_text)
print(f"Number of paragraphs --> {n_chunks}")
Number of paragraphs --> 1290
OpenAI GPT#
A book of 65,000 words translates to approximately the following number of tokens for GPT.
# Load the token encoding used by the gpt-3.5-turbo model
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
# Encode each text in documents and calculate the total tokens
book_tokens = len(enc.encode(file_text))
print(f"Book input tokens -> {book_tokens}")
Book input tokens -> 105909
So at least twice that many tokens will be consumed, since the translated output also counts towards usage.
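As a rough back-of-the-envelope check before launching the run, assuming the English output takes about as many tokens as the Spanish input and ignoring the small per-paragraph system-prompt overhead, the pricing function defined earlier gives an estimate of well under a dollar:

```python
# Rough a-priori cost estimate: assume the output is about as long (in tokens)
# as the input; the per-paragraph system prompt adds a little on top of this.
estimated_cost = cost_gpt(book_tokens, book_tokens)
print(f"Estimated cost -> {estimated_cost:.2f} $")
```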
Let’s proceed with the translation using OpenAI’s GPT:
st = time.time()
trans = ""
doc = ""
counter = 0
tokens_in_consumed = 0
tokens_out_consumed = 0
for text in split_text:
    if not text.strip():
        trans = ""
    else:
        trans, tokens_in, tokens_out = translate_gpt(gptranslator, text)
        tokens_in_consumed += tokens_in
        tokens_out_consumed += tokens_out
    counter += 1
    clear_output()
    print(f"...\nCompleted: {counter} / {n_chunks}")
    doc += trans + "\n"
# Get elapsed time
et = time.time()
elapsed_time_gpt = et - st
# Get tokens and cost
total_tokens = tokens_in_consumed + tokens_out_consumed
total_cost_gpt = cost_gpt(tokens_in_consumed, tokens_out_consumed)
print(f"\nExecution time: {elapsed_time_gpt:.0f} s")
print(f"\nTotal tokens: {total_tokens}")
print(f"\nCost GPT: {cost_gpt:.2f} $")
...
Completed: 1290 / 1290
Execution time: 2008 s
Total tokens: 262180
Cost GPT: 0.43 $
I save the translation in a new file named dcts_en_gpt.txt.
with open("data/dcts_en_gpt.txt", "w", encoding="utf-8") as file:
file.write(doc)
Google Translate#
Let’s proceed with the translation using Google Translate:
st = time.time()
trans = ""
doc = ""
counter = 0
for text in split_text:
    if not text.strip():
        trans = ""
    else:
        trans = translate_google(googletranslator, text)
    counter += 1
    clear_output()
    print(f"...\nCompleted: {counter} / {n_chunks}")
    doc += trans + "\n"
# Get elapsed time
et = time.time()
elapsed_time_goo = et - st
print(f"\nExecution time: {elapsed_time_goo:.0f} s")
...
Completed: 1290 / 1290
Execution time: 2650 s
I save the translation in a new file named dcts_en_goo.txt.
with open("data/dcts_en_goo.txt", "w", encoding="utf-8") as file:
file.write(doc)
Helsinki-NLP#
Let’s proceed with the translation using the open-source model by Helsinki-NLP:
st = time.time()
trans = ""
doc = ""
counter = 0
for text in split_text:
    if not text.strip():
        trans = ""
    else:
        trans = translate_helsinki(helsinkitranslator, text)
    counter += 1
    clear_output()
    print(f"...\nCompleted: {counter} / {n_chunks}")
    doc += trans + "\n"
# Get elapsed time
et = time.time()
elapsed_time_hel = et - st
print(f"\nExecution time: {elapsed_time_hel:.0f} s")
...
Completed: 1290 / 1290
Execution time: 3747 s
I save the translation in a new file named dcts_en_hel.txt.
with open("data/dcts_en_hel.txt", "w", encoding="utf-8") as file:
file.write(doc)
Conclusions#
I gather the results into a table to compare them.
pd.DataFrame(
{
"Score": [goo_scores_mean, hel_scores_mean, gpt_scores_mean],
"Time": [
time.strftime("%H:%M:%S", time.gmtime(elapsed_time_goo)),
time.strftime("%H:%M:%S", time.gmtime(elapsed_time_hel)),
time.strftime("%H:%M:%S", time.gmtime(elapsed_time_gpt)),
],
"Cost": [0, 0, cost_gpt],
},
index=["Google Translate", "Helsinki-NLP", "OpenAI GPT"],
).round({"Score": 1, "Cost": 2}).transpose()
|       | Google Translate | Helsinki-NLP | OpenAI GPT |
|-------|------------------|--------------|------------|
| Score | 53.8             | 48.4         | 53.4       |
| Time  | 00:44:10         | 01:02:27     | 00:33:27   |
| Cost  | 0.0              | 0.0          | 0.43       |
Regarding the quality of the translations, I would need to look into it more closely, although a quick glance suggests that both Google and GPT do indeed perform slightly better than Helsinki. In any case, the files containing these translations are available on GitHub for anyone to judge.
I was surprised that GPT was the fastest at translating the book, in just over half an hour, and that Helsinki took about an hour running on my own computer. I suspect, however, that these times will vary depending on the time of day at which the OpenAI and Google API services are queried, as well as on the processing power of each PC in the case of open-source models run locally.
As for the cost, Google and Helsinki are free, while the translation by OpenAI using GPT-3.5 Turbo cost me 0.43 $ (0.39 € at today’s exchange rate). Inexpensive. Obviously the results are not comparable to those of a human translator, especially when it comes to translating a work of fiction. However, machine translation can help introduce the text to someone who doesn’t know the original language, or serve as a first draft to work into a more polished version later.
In any case, the results have not satisfied me for a narrative work, and I believe my book will remain unpublished in English.