
Chapter 3. Theoretical framework

3.4 Literature review

3.4.2 Experimental studies of Machine Translation-assisted Translation Memory translation

3.4.2.1 Editing speed

As stated by Federico et al., drawing on O’Brien (2011), “improving processing speed is indeed the primary interest of [sic] translation industry as this figure can be directly related to the cost of the translation” (2012, p.2). Hence, most studies of MT-assisted TM translation are interested in the speed with which translators edit TM and MT matches.

One of these studies is O’Brien (2007), which investigated the cognitive effort required from translators for TM and MT matches. She measured this effort on the basis of editing speed and increase in pupil size in an experiment with four professional translators. In the experiment, two translators translated an English source text of 235 words about an anti-virus program into their native language German, and two translators translated the same source text into their native language French, using the CAT tool SDL Translator’s Workbench. O’Brien’s design differed slightly from the typical MT-assisted TM setup in that, in addition to MT matches, 100% TM matches and different types of fuzzy matches (74-75%, 80%, 90% and 99% matches), she also included no matches in her design. O’Brien applied eye-tracking and cued retrospective interviews, i.e. both an online and an offline data collection method and both a method for observing translator behaviour and a method for obtaining verbal-report data (Krings 2005). Although it is not explicitly stated, the experiment was most likely conducted in a laboratory setting.

The time spent on each segment was measured by means of the eye-tracking analysis software, apparently by noting starting and ending times for the editing of every segment.

Afterwards, editing speeds for the different match types were calculated by dividing the number of words in a segment by the seconds spent editing that segment. Results showed that 100% matches were processed at a faster speed than the other match types and that no matches were processed at the lowest speed, thus suggesting that no matches required the most cognitive effort from translators and 100% matches the least. Also, O’Brien found that the editing of an MT match took as much time and cognitive effort as the editing of an 80-90% fuzzy match. O’Brien also found that editing speeds decreased with fuzzy match value, although the speed was slightly higher for 74% than for 80% TM matches. O’Brien attributed this to a limited data set.
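The speed calculation described above can be illustrated with a minimal sketch. The segment values are invented for illustration and are not O’Brien’s data, and averaging the per-segment speeds within each match type is my assumption, since the aggregation step is not specified in the description above. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def editing_speed(words: int, seconds: float) -> float:
    """Editing speed of one segment in words per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words / seconds


def mean_speed_by_match_type(segments):
    """Average the per-segment speeds within each match type.

    segments: iterable of (match_type, words, seconds) tuples.
    Averaging per match type is an assumption of this sketch.
    """
    speeds = defaultdict(list)
    for match_type, words, seconds in segments:
        speeds[match_type].append(editing_speed(words, seconds))
    return {match_type: mean(values) for match_type, values in speeds.items()}


# Invented example values (match type, words, seconds spent editing).
segments = [
    ("100%", 12, 6.0),
    ("100%", 10, 5.0),
    ("no match", 12, 24.0),
    ("no match", 9, 18.0),
]

speeds = mean_speed_by_match_type(segments)
```

With these invented values, the 100% matches come out at 2.0 words per second and the no matches at 0.5 words per second, which mirrors the direction of the reported finding but not its magnitude.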

Based on O’Brien’s (2007) findings, one of the hypotheses formulated and investigated by Guerberof Arenas (2009) was that “the time invested in post-editing one string of machine translated text will correspond to the same time invested in editing a fuzzy matched string located in the 80-90 percent range” (Guerberof Arenas 2009, p.12). The study, a pilot study published ahead of Guerberof Arenas’ doctoral thesis (Guerberof Arenas 2012, see below), was conducted as an experiment with 8 professional translators. The translators were asked to translate an English source text of 791 words from the localization domain into Spanish using an online post-editing tool unfamiliar to the translators. Like O’Brien’s (2007) study, Guerberof Arenas’ design differed slightly from the typical MT-assisted TM setup in that translators were presented with TM matches, MT matches and no matches. Unlike O’Brien’s setup, the translators did not know the provenance of the segments, i.e. whether they came from a TM or an MT engine, and Guerberof Arenas only included TM matches from the 80-90% range. When translating the text, the translators could only see one segment at a time and were not allowed to go back to previous segments. In terms of methods, Guerberof Arenas used the post-editing tool to collect data on the time spent on each segment, which, in Krings’ (2005) terms, can be classified as an online method for observing behaviour and, more specifically, as a type of keystroke logging restricted to the collection of data on time.

Guerberof Arenas also applied offline methods, namely analysis of the translation product (to obtain data on quality) and a combination of a retrospective and a generalized questionnaire where translators were asked about the experiment and their general experience.

The study showed that translators edited MT matches at higher speeds than they edited 80-90% TM matches or translated from scratch. Thus, Guerberof Arenas’ hypothesis was not validated, and the findings were not in line with O’Brien’s (2007) results. Furthermore, Guerberof Arenas’ study showed that translators edited 80-90% TM matches faster than they translated segments from scratch. Thus, the findings suggest that translators experience a gain in productivity if they use a translation aid and that this gain is higher if they edit MT matches than if they edit TM matches (with a gain in productivity of 25% and 11%, respectively). Guerberof Arenas’ study also indicated that faster translators experienced a smaller productivity gain than slower translators when using a translation aid compared to when translating from scratch. The author was also interested in whether technical experience was associated with higher editing speed. She defined technical experience as “a combination of experience in localisation, in knowledge of tools, in subject matter (in this case supply chain), and in post-editing of machine translated output” (Guerberof Arenas 2009, p.18). Guerberof Arenas’ findings showed that experience has a clear effect on processing speed, with experienced translators being faster than less experienced ones.

In terms of ecological validity, it appears problematic that the translators were asked to translate using an unfamiliar tool, and that they could only see one segment at a time and were not allowed to return to previous segments. However, since the source text was not a coherent text, i.e. did not consist of consecutive segments, it might have been less relevant in Guerberof Arenas’ setup to allow translators to return to previous segments.

Tatsumi (2010) conducted an experiment with 9 professional translators working in an MT-assisted TM environment. Tatsumi’s main interest was MT; however, she included a small proportion of TM matches since she wished to study the differences between editing MT and editing TM matches (Tatsumi 2010, p.71). In the following, after describing her research design, I will concentrate on the analyses where Tatsumi focused on MT-assisted TM.

Tatsumi had native speakers of Japanese translate 5,029 words each from English into Japanese. The 5,029 words were distributed over a number of source texts, all of which had been extracted from a user manual of a data storage product developed by the Symantec corporation. The experiment was conducted in the field, enabling the translators to work in their familiar environment. The CAT tool applied was SDL Trados Translator’s Workbench, a tool with which the translators were familiar. Tatsumi included TM matches with match values from 75 to 99%. The translators were told that the target text did not have to be stylistically sophisticated. In terms of methods, Tatsumi applied screen recording and in order to measure the time spent on each segment, she drew on a function in SDL Trados Translator’s Workbench in combination with a macro devised for the experiment.

Furthermore, she asked translators to complete a retrospective questionnaire asking for facts about themselves and their opinions about post-editing and MT. Thus, she applied both online observation methods and generated offline verbal-report data (Krings 2005).

In relation to obtaining data relating to time, SDL Trados Translator’s Workbench was able to record the time when a translation segment had last been saved. However, if a segment was opened and closed more than once, the CAT tool would overwrite the previously recorded time with the new time. The mentioned macro was devised to force the CAT tool to save the data every time the translator entered a specific segment. However, this required the translators to use a certain keyboard shortcut (and not, for example, the mouse) to close and open segments, and Tatsumi thus instructed translators to work in this manner.

However, Tatsumi still only had the time when a segment was closed and not the time it was opened, and thus had to assume that the time when a translator finished editing a segment was the same as the time the translator started editing the next segment. Tatsumi noted that this does not reflect the exact time spent on each segment, since translators are expected to spend time sipping coffee, stretching shoulders and so on, but added that she was able to consult the screen recording data when she needed to clarify why certain segments had taken translators excessive time (Tatsumi 2010, p.67).
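The assumption described above, that the closing time of one segment equals the opening time of the next, can be sketched as follows. The timestamps are invented, the function name is hypothetical, and the use of a known session start time for the first segment is my assumption; how the first segment was handled is not stated in the account above.

```python
from datetime import datetime


def durations_from_close_times(session_start, close_times):
    """Seconds attributed to each segment when only closing times are known.

    Each segment is assumed to start at the moment the previous one was
    closed, so any pause between segments is attributed to the next segment.
    """
    durations = []
    previous = session_start
    for closed in close_times:
        if closed < previous:
            raise ValueError("timestamps must be in chronological order")
        durations.append((closed - previous).total_seconds())
        previous = closed
    return durations


# Invented timestamps for three consecutive segments.
session_start = datetime(2010, 1, 1, 9, 0, 0)
close_times = [
    datetime(2010, 1, 1, 9, 0, 40),
    datetime(2010, 1, 1, 9, 2, 10),
    datetime(2010, 1, 1, 9, 2, 55),
]

durations = durations_from_close_times(session_start, close_times)
```

In this sketch, the second segment is attributed 90 seconds regardless of whether the translator was editing or taking a break during that interval, which is the limitation Tatsumi acknowledged.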

Tatsumi primarily studied the difference between the editing of MT matches and TM matches in terms of temporal and technical effort (for the latter, see Section 3.4.2.2). She operationalized temporal effort as editing speed, which she measured in words per minute.

Tatsumi’s results showed that, for all translators, the average editing speed for MT matches was faster than the average editing speed for at least the 75-79% TM matches, and she thus concluded that MT editing speed was not substantially lower than TM editing speed.

Tatsumi’s results are highly interesting since the data set is quite large, compared to other studies, and since the experiment was conducted in workplace settings. However, the results might have been affected by the fact that translators were instructed to work in a certain way (using shortcuts for opening and closing segments), that only data on the closing of segments were obtained and that the translators were told that the target text did not have to be stylistically sophisticated. The last of these might have had a relatively larger impact on the editing speed for MT matches than on the editing speed for TM matches, since one might expect it to require more time from translators to turn MT matches into something which is stylistically sophisticated than to do the same with TM matches, which have been translated by a human being.

Teixeira (2011) studied MT-assisted TM translation in an experiment with two translators.

The study was a pilot study published ahead of his 2014 doctoral thesis (Teixeira 2014b, see below). Teixeira was interested in whether and how translators’ behaviour was influenced by the availability of provenance information, i.e. information about whether segments come from MT or TM and, in the latter case, at which match percentage. Teixeira suggested that differences in the presence of provenance information might be one of the reasons for the different findings in e.g. O’Brien (2007) and Guerberof Arenas (2009) (Teixeira 2011, p.108). Teixeira compared two environments: one where the translators did not know the provenance of translation suggestions (blind) and another where they did (visible). Teixeira had two professional translators each translate two similar source texts (approx. 500 words per text), one in each environment. The texts were taken from the same technical text about composite materials in car manufacturing and were translated from English into Spanish, the translators’ native language. The CAT tool applied was SDL Trados Studio 2009 Freelance.

Both translators used their own laptop computers during the experiment which meant that they could keep their preferred configuration in terms of, for example, keyboard, browser favourites, dictionaries etc. The translators also had access to the Internet. Data were collected using screen recording, keystroke logging and retrospective interviews. Thus, both online and offline methods were applied, and Teixeira observed translator behaviour as well as generated verbal-report data (Krings 2005).

Teixeira hypothesized that the editing speed (measured in words per hour) was higher when provenance information was available to the translator. In order to measure the time spent on each segment, Teixeira manually noted down start and end times for the editing of each segment while watching the screen recordings. Time was counted when translators were typing, thinking, hesitating, or looking at the source text, but not when translators “switched to another window to look up terminology, tried to find a specific function in the tool, or spoke with the researcher” (Teixeira 2011, p.111). Also, time was not counted when the translator “started moving the mouse to go to another application (usually a web browser) outside of the translation environment” or when the translator “moved to the source segment to copy text to be pasted in the browser” (Teixeira 2011, p.111). Time was again counted when the translator returned to the CAT tool. Time spent on searches within the CAT tool (mainly with the concordance function) was included in the time spent on the particular segment.
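The counting rule described above can be sketched as subtracting the excluded intervals (time spent outside the translation environment) from the interval between the start and end of a segment. This is only an illustration of the rule as I have summarized it, not Teixeira’s actual procedure, which was carried out manually. All values and function names are hypothetical, and the sketch assumes that the excluded intervals do not overlap each other.

```python
def counted_seconds(start, end, excluded):
    """Time counted for one segment, in seconds.

    start, end: offsets in seconds into the screen recording.
    excluded: list of (start, end) intervals spent outside the CAT tool,
    assumed not to overlap each other. Intervals are clipped to the
    segment, so only the part inside [start, end] is subtracted.
    """
    if end < start:
        raise ValueError("end must not precede start")
    total = end - start
    for ex_start, ex_end in excluded:
        overlap = min(end, ex_end) - max(start, ex_start)
        if overlap > 0:
            total -= overlap
    return total


def words_per_hour(words, seconds):
    """Editing speed in words per hour."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words * 3600 / seconds


# Invented example: a 20-word segment edited between 100 s and 220 s,
# with a browser lookup from 130 s to 160 s and another starting at 210 s.
seconds = counted_seconds(100.0, 220.0, [(130.0, 160.0), (210.0, 240.0)])
speed = words_per_hour(20, seconds)
```

In this invented example, 80 of the 120 seconds are counted, giving 900 words per hour; counting the full 120 seconds would give 600 words per hour, which shows how strongly the exclusion rule can affect the measured speed.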

Teixeira’s results are inconclusive with regard to translation speed as one translator was slightly faster in the visible environment and the other slightly faster in the blind environment. However, it is interesting that with both translators, Teixeira saw a dramatic reduction in speed for 100% matches when provenance information was not available. Hence, it seems that the indication that a match is a 100% match affects the translation process considerably in relation to speed.

The fact that the translators used their own computers during the experiment contributed to a high degree of ecological validity in Teixeira’s study, as they did not have to familiarize themselves with a new computer with different settings. However, the fact that Teixeira omitted time spent by translators on, for example, looking up terminology outside the CAT tool and using the Internet is questionable. An argument for this approach may be that, due to a potentially uneven distribution of terms in the source text, including this time might influence the data in an inappropriate way. However, such activities are an important part of the translation process and therefore time spent on these should be included, in my opinion.

Also, if the translator, for example, encountered a terminological translation problem which was solved by means of the concordance function, Teixeira included the time spent on the concordance search(es), whereas he did not include the time if the translator used the Internet to solve the problem. On the face of it, this seems contradictory. Another plausible explanation might be that Teixeira wished to include only the time spent in the CAT tool itself. However, again, I would argue that activities undertaken outside the CAT tool should also be taken into account. Finally, as noted by Teixeira (2011, p.117), in the checking phase, it was especially difficult to identify which segment the translators were focusing on.

Teixeira did not explain how he dealt with cases of doubt, but noted that he was considering eye-tracking as a means of solving this issue.

Skadiņš et al. (2011) conducted an experiment on MT-assisted TM translation as part of the LetsMT! project. They were specifically interested in the potential value of integrating MT and TM within the localization domain and for the language pair English-Latvian, since Latvian is a highly inflected language which might pose difficulties in relation to MT. In the experiment, Skadiņš et al. integrated MT into the CAT tool SDL Trados 2009 and compared two scenarios: one where only TM was employed and one where MT was used in combination with TM, where the aim was to measure the impact of adding MT to the process. The impact was measured in translation performance (number of words translated per hour) and quality (cf. Section 3.4.2.3). The authors did not explicitly state how they measured the time spent by each translator on each task, i.e. whether this was, for example, measured automatically by means of the CAT tool, registered manually by the researchers or self-reported by the translators.

Five professional translators with different levels of experience participated in the experiment which was conducted within a professional localization company, the usual workplace of the translators. The source texts came from the IT domain and were selected from the incoming work pipeline provided they contained between 950 and 1,050 words.

Each document was split in half, with the first half translated in the TM scenario by one translator and the second half in the MT scenario by another translator. Skadiņš et al. did not explain how the tasks were assigned to the different translators, and whether all translators translated an equal number of texts. During the experiment, provenance information was visible to the translators and they were allowed to use whatever external resources they wanted. In the MT scenario, translators were provided with both a TM and an MT match for every source segment for which a 100% match was not found in the TM, and it seems that translators were free to choose which match to work with. Results were analyzed for 46 texts, 23 in each scenario.

Skadiņš et al.’s findings showed that the implementation of MT increased translation speed from 550 to 731 words per hour, on average, corresponding to an increase of 32.9%. The MT system used in the experiment was trained on TMs from a specific client and the experiment contained texts from both this and another client. Results showed that a higher increase in speed was seen for the texts from the client whose TMs were included in the training than for the texts from the other client, with increases in speed of 37% and 24%, respectively.
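The reported overall increase can be checked from the two average speeds given above. The following is a minimal sketch using only those two figures; the function name is hypothetical.

```python
def percent_increase(baseline: float, new: float) -> float:
    """Relative increase from baseline to new, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Average speeds in words per hour: TM only versus TM combined with MT.
increase = percent_increase(550, 731)
print(f"{increase:.1f}%")  # prints 32.9%
```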

The study of Skadiņš et al. has high ecological validity, as it was conducted at the workplace and allowed translators to use their usual resources when translating. Also, it is highly interesting to see findings for smaller languages such as Latvian. However, since using MT was new to this company, it can be assumed that the translators’ experience with MT was low, which may have affected the results. Further, it is unclear whether the match values of the TM matches provided in the different texts were comparable. Finally, when arguing for providing the translators with provenance information for each match, the authors stated that this allowed translators to pay more attention to MT matches. They argued that this was necessary since MT output may be inaccurate, ungrammatical and contain wrong terminology, whereas “[t]ranslators are not double-checking terminology, spelling and the grammar of TM suggestions, because the TM contains good quality data” (Skadiņš et al. 2011, p.37). This is, however, not supported by Guerberof Arenas’ study (2009) (cf. Section 3.4.2.3), in which a greater number of errors was found in TM matches than in MT matches, supposedly because the translators did not question the TM matches since they flowed more naturally.

Guerberof Arenas (2012) conducted an experiment with 24 professional translators and 3 reviewers. The 24 translators translated a text from English into Spanish, their native language. The source text came from the localization domain and comprised 2,124 words (618 words of TM segments from the 85-94% range, 757 words of MT segments and 749 words in no match segments). As such, Guerberof Arenas’ study differs from the typical MT-assisted TM environment in the same way as O’Brien (2007) and Guerberof Arenas (2009), i.e. in including no matches. The translators used a web-based tool to translate the text, and the MT engine employed was trained on TM material and three glossaries. As in Guerberof Arenas (2009), the translators could only see one segment at a time and did not know the origin of each segment. It was not possible for translators to return to previous segments or check their translations, and the glossary provided to the translators was not integrated in the CAT tool. During the assignment, translators worked from their home or office, i.e. in their typical environment. In terms of methods applied, the web-based tool measured the time spent by each translator on each segment and can thus be classified as a type of limited
