
Chapter 3. Theoretical framework

3.4 Literature review

3.4.2 Experimental studies of Machine Translation-assisted Translation Memory translation

3.4.2.2 Amount of editing

Another aspect of translators’ interaction with MT-assisted TM which has been investigated and which is relevant to the current thesis is the amount of editing performed by translators in different types of matches. The amount of editing performed in a match is a manifestation of the technical effort involved in editing a match (Krings 2001, cf. Section 2.2) and has been approached in different ways: by measuring the number of keystrokes performed by translators and by measuring the difference between the provided matches and the target text using automatic evaluation metrics such as BLEU, GTM and TER (Koponen 2012).
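As an illustration of how such metrics operate in this context, the sketch below computes a similarity score between a provided match and its edited version. It uses a character-level Levenshtein distance as a simple stand-in for the word-level algorithms underlying GTM and TER; the function names and the normalization by the length of the longer string are illustrative assumptions, not the actual implementations of these metrics.

```python
# Minimal sketch: edit-distance-based similarity between a provided
# match and the translator's edited version. Character-level Levenshtein
# is used as a stand-in for word-level metrics such as GTM or TER; the
# names and normalization are illustrative assumptions.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def similarity(match: str, edited: str) -> float:
    """Similarity in percent: 100 means the match was left unchanged."""
    if not match and not edited:
        return 100.0
    dist = levenshtein(match, edited)
    return 100.0 * (1 - dist / max(len(match), len(edited)))

# Example: a match lightly edited by the translator.
print(similarity("The printer is out of paper.",
                 "The printer has run out of paper."))
```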

Tatsumi (2010), in the part of her study where she compared editing of MT matches to editing of TM matches, measured the amount of editing performed on MT and TM matches by calculating the textual similarity between the match presented to the translator and the edited version by means of the GTM metric. The results showed that the amount of editing necessary for MT matches tended to be larger than the amount of editing needed for TM matches with match values above 75%. When comparing this to her results on speed, Tatsumi concluded that, although the amount of editing was larger for MT matches than for TM matches above 75%, the time taken to implement these changes was shorter for MT matches than for TM matches above 75%. This is highly interesting since there does not seem to be a direct connection between the time spent on matches and the amount of editing performed.

Guerberof Arenas (2012) also investigated the amount of editing implemented by the translators in the different segments, applying the so-called TER metric. Results showed that significantly more changes were made in the 85-94% TM matches than in the MT matches. Thus, although TM and MT matches were edited at similar speeds (cf. Section 3.4.2.1), the translators implemented significantly more changes in TM matches than in MT matches.
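For reference, TER (Translation Edit Rate) is standardly defined as the minimum number of edits (insertions, deletions, substitutions and shifts of word sequences) needed to turn the system output into the reference, normalized by the number of reference words:

\[ \mathrm{TER} = \frac{\text{insertions} + \text{deletions} + \text{substitutions} + \text{shifts}}{\text{number of reference words}} \]

For example, if the edited version of a 20-word match required 4 edits, TER would be 4/20 = 0.20, i.e. the equivalent of 20% of the words were changed. (These figures are illustrative and are not taken from Guerberof Arenas’ data.)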

Guerberof Arenas’ findings contradict Tatsumi’s findings, which showed that translators edited more in MT matches than in TM matches above 75%. Guerberof Arenas (2012, p.105 ff.) argued that her findings reflected the fact that a number of MT segments were of such quality that they could be accepted without changes, whereas fuzzy matches almost always required changes. She also stated that the findings indicated that the quality of the MT output used in the experiment was high.

Federico et al. (2012) were also concerned with the amount of editing performed by translators. To calculate this, the authors used an edit-distance function that compared the match provided by the TM or the MT engine with the final segment submitted by the translator, yielding a score that reflected the similarity between the two. The authors interpreted this so-called “similarity match” as an indication of the quality of the TM and MT matches. By subtracting the similarity match from 100%, the authors obtained a measure of the effort involved in editing the match. The results showed that the required effort decreased when translators were provided not only with TM matches but also with MT matches, with decreases of 54.6% (English-German) and 78.5% (English-Italian) for the legal domain and 55.5% (English-German) and 74.2% (English-Italian) for the information technology domain.
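Expressed as a worked equation (the exact edit-distance function used by Federico et al. is not reported, so the figures here are purely illustrative):

\[ \text{editing effort} = 100\% - \text{similarity match} \]

so that, for instance, a match whose edited version is 85% similar to the original suggestion corresponds to an editing effort of 15%.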

The authors noted that a decrease in effort is a natural consequence of doubling the sources of the matches, but at the same time stated that the extent of the decrease proved the effectiveness of the MT engine used in the test. They did not, however, explain how this effectiveness was measured.

Teixeira (2014b) was interested in the impact of the presence of metadata on typing effort. Drawing on his keystroke logging data, Teixeira measured typing effort as the percent ratio between the number of keystrokes performed by the translator while editing a particular segment and the total number of characters in the resulting segment. Teixeira’s study showed that the presence of metadata led to an overall increase in typing effort, i.e. the translators made more changes in the Visual than in the Blind mode. This result is interesting in relation to the difference in presentation mode between the Visual and the Blind task mentioned above (cf. Section 3.4.2.1), which might also intuitively be expected to lead to a higher typing effort in the Visual mode: here, the translators needed to actively insert the translation suggestion in the editing area and edit it, type the translation from scratch, or type on top of the source text, whereas in the Blind task, the suggestions had been pre-inserted and were ready to be edited. When related to suggestion types, the study showed that metadata reduced the typing effort for 100% matches, increased it for 85-99% and 70-84% matches, and had no significant effect for MT matches.
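Teixeira’s measure can be summarized as follows (the figures in the example are illustrative, not taken from his data):

\[ \text{typing effort} = \frac{\text{number of keystrokes in segment}}{\text{number of characters in final segment}} \times 100\% \]

For instance, 150 keystrokes producing a final segment of 120 characters yield a typing effort of 125%; values above 100% reflect deletions and retyping, while values far below 100% indicate that a suggestion was largely accepted as provided.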

3.4.2.3 Quality

Several studies have been concerned with measuring the quality of segments and/or texts translated by means of MT-assisted TM. Quality is an important aspect of translators’ interaction with MT-assisted TM since, for example, an increase in editing speed is not preferable per se if it is obtained to the detriment of translation quality. In this thesis, I do not measure the quality of translations produced by means of MT-assisted TM directly; however, the quality aspect is inherent in many of the research questions. For example, RQ1 approaches quality when investigating the extent to which translators accept the matches they are presented with, and RQ1a takes an interactional perspective on quality when it investigates the characteristics of the translators’ interaction with the tool. Translation quality is also relevant in terms of, for example, RQ5, which looks into the amount of editing performed by translators in different matches, since the amount of editing is often taken to be inversely correlated with the quality of the provided matches. Although it does not attempt to measure the quality of the produced translations, RQ6 is also related to quality since it is concerned with the changes implemented during review of the translations. Therefore, it is relevant to look into what other studies have discovered about quality in relation to MT-assisted TM translation.

Guerberof Arenas (2009) hypothesized that the final quality of the target segments translated using MT was not different from the final quality of the target segments translated using 80-90% TM matches and the segments translated from scratch. She investigated this through a quality evaluation of the target segments, which were checked for errors following the LISA standard. Results showed that errors were present in all translators’ texts, in TM matches and MT matches as well as in segments translated from scratch. However, more than half of the total number of errors, 52%, were found in the TM segments, 27% in MT segments and 21% in segments translated from scratch. Guerberof Arenas suggested that the reason for the high number of errors in TM matches was that these matches flowed more naturally and that translators, therefore, did not question the text’s correctness, whereas errors in MT matches were typically rather obvious. Thus, the hypothesis was not supported since the quality of edited MT matches was notably different from the quality of edited TM matches and also different from, although closer to, the quality of segments translated from scratch. Finally, Guerberof Arenas found that translators’ technical experience did not seem to have an impact on translation quality.

Teixeira (2011) investigated the impact of the presence of provenance information on translation quality. Teixeira hypothesized that there was no significant difference in quality when provenance information was available compared to when it was not. In his study, quality was measured as a score given by two reviewers who assessed the four translations using a quality assessment grid and an error-count system. They were also asked to score the translations holistically on a scale from 0 to 10. Teixeira’s data showed that the quality of the texts was at a comparable level, but he also noted that the quality assessment had probably not been carried out properly.

Skadiņš et al. (2011), in addition to investigating the impact of integrating MT into a TM environment on editing speed, also investigated the impact on translation quality. To investigate the latter, a professional editor evaluated the quality of each translation according to the standard internal quality assessment procedure of the localization company in which the experiment was conducted. This resulted in an error score for each translation based on a weighting of errors. When evaluating the translations, the editor did not know which texts had been translated using the baseline scenario and which using the MT scenario. Results showed that, while editing speed increased, the error score also increased for all translators. However, the authors concluded that, in spite of the increase in error scores, the translations still remained at an acceptable level of quality. Thus, they concluded that integrating MT into a TM environment could increase productivity within the domain of localization without a critical reduction in quality.
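Skadiņš et al. do not report the exact weighting scheme, but error scores of this kind typically take the form of a weighted sum over error categories; a purely hypothetical illustration:

\[ \text{error score} = \sum_i w_i n_i \]

where \(n_i\) is the number of errors of type \(i\) and \(w_i\) the weight assigned to that type. With assumed weights of 1 for minor, 3 for major and 5 for critical errors, a translation containing 4 minor, 2 major and 1 critical error would score \(4 \cdot 1 + 2 \cdot 3 + 1 \cdot 5 = 15\).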

Guerberof Arenas (2012) also addressed the question of quality in translations produced using MT-assisted TM. She hypothesized that the final quality of the edited MT matches was higher than the final quality of the edited 85-94% TM matches and lower than that of the segments translated from scratch, which would be in line with the findings of her pilot study. She also hypothesized that translators with higher overall editing speeds would make fewer errors than those with lower speeds. Quality was assessed by three professional reviewers according to the LISA QA Model. The first hypothesis was not supported since the number of errors in the no match category was significantly higher than in the TM and MT categories and there were no significant differences between TM and MT matches.10 Thus, the 2012 study contradicted the results of the 2009 pilot study in this regard. The second hypothesis was not supported either, as no statistically significant differences were found between fast and slow translators with regard to errors. Thus, Guerberof Arenas concluded, “it is not clear that spending more time on a translation might give better quality results, although this could be the case for certain translators” (Guerberof Arenas 2012, p.245). Finally, based on results from the pilot study, Guerberof Arenas also posed the hypothesis that the translators’ experience would not have an impact on quality.11 This hypothesis was not supported since the data showed that translators with more experience made significantly fewer mistakes than those with less experience. Thus, this result contradicted the findings of the pilot study.

10 These results are also reported in Guerberof Arenas (2014a).

11 The investigation of this hypothesis is also reported in Guerberof Arenas (2014b).

Läubli et al. (2013), in addition to comparing translation times in a TM-Only and a Post-Edit condition, also evaluated the quality of the target texts produced in the two conditions by student translators. Two independent experts, who did not know the origin of the translations or the translation condition, evaluated all translations as well as a reference translation produced by a professional translator. When performing the quality evaluation, the evaluators were asked to score each translation on five parameters (target language expression, target language grammar, target language syntax, semantic accuracy, and translation strategy) on a scale from 1 to 4. Results showed that the quality of the texts translated in the Post-Edit condition was consistent with or, in some cases, better than that of texts translated in the TM-Only condition. Higher quality of texts produced in the Post-Edit condition than in the TM-Only condition was mostly found in texts containing fully formed sentences as opposed to texts primarily consisting of bullet points. The authors also wanted to investigate whether the student translators preferred professional translations over translations produced in the study and therefore had them compare the translations of a selected number of segments. The analysis showed that the participating translators could not distinguish their translations produced in the Post-Edit condition from the reference translations, while they considered the reference translations to be better than the translations produced in the TM-Only condition. The findings of Läubli et al. are highly interesting. It is notable that the quality of translations produced by student translators using MT is consistent with or even better than that of professionally produced translations.

However, as pointed out by the authors themselves, the translations might have been produced under very different conditions; for example, the professional translators might have been under time pressure when producing the translations. On the other hand, the quality of the translations produced by the students might also have been negatively influenced by the fact that the translations were not meant to be sold to a client afterwards. Finally, it is not clear what CAT tools the professional translators employed, if any.

Teixeira (2014b), in addition to translation time, was also interested in the effect of the presence of metadata on error scores. He thus had two reviewers assess the translations produced in the experiment and calculated error scores as the number of errors per 100 source words. The analysis showed that the presence of metadata did not affect error scores, although most translators thought they had made the fewest errors in the Visual task.
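In other words (with illustrative figures that are not taken from Teixeira’s data):

\[ \text{error score} = \frac{\text{number of errors}}{\text{number of source words}} \times 100 \]

so that, for instance, 6 errors found in the translation of a 400-word source text give an error score of 1.5.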