A Mind @ Play

random thoughts to oil the mind

memoQ and XSLT: Fun with Namespaces

Using memoQ to translate standard XLIFF (XML Localisation Interchange File Format) files can be made that bit more user friendly when you take advantage of the built-in feature to use XSLT transformations. Since I can’t get my head around namespaces, my simple transformation ended up strewn with unreadable references to local-name() nodes. As ever, there is an easier way.

Take a standard XLIFF file along the lines of:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
   <file source-language="en" datatype="plaintext" original="Project">
      <header/>
      <body>
         <trans-unit id="string">
            <source>This is the source content.</source>
         </trans-unit>
      </body>
   </file>
</xliff>

Once the XLIFF namespace has been declared in the stylesheet header and bound to a prefix (xlf here), we can identify the nodes in the source using standard XPath syntax:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xlf="urn:oasis:names:tc:xliff:document:1.2"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   exclude-result-prefixes="xsl xlf xsi"
>
<xsl:output method="html" omit-xml-declaration="yes" />
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html&gt;</xsl:text>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
   <link rel="stylesheet" type="text/css" href="yourStyles.css" />
   <title>XML Preview</title>
</head>
<body>
   <h1><xsl:value-of select="xlf:xliff/xlf:file/@original"/></h1>
   <xsl:for-each select="xlf:xliff/xlf:file/xlf:body/xlf:trans-unit">
      <div class="id"><xsl:value-of select="@id"/></div>
      <div class="content"><xsl:value-of select="xlf:source"/></div>
   </xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

This produces a basic, ugly but ultimately workable HTML file which memoQ can display in its preview pane:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
   <head>
      <link rel="stylesheet" type="text/css" href="yourStyles.css"/>
      <title>XML Preview</title>
   </head>
   <body>
      <h1>Project</h1>
      <div class="id">string</div>
      <div class="content">This is the source content.</div>
   </body>
</html>

Obviously you can add references to any other information available in your XLIFF files, and then serve and style up the resulting HTML in any desired shape or form, but this basic scaffolding might help someone out there avoid the namespace minefield I ran into!
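If you want to sanity-check the namespaced paths outside memoQ, here is a minimal sketch using Python's standard library. It is not what memoQ does internally; the NS mapping and the preview_rows helper are names made up for illustration. It pulls out the same values as the stylesheet above, and shows why an unprefixed path finds nothing.

```python
import xml.etree.ElementTree as ET

# Sample XLIFF along the lines of the one in the post (bytes, so the
# encoding declaration is honoured by the parser).
XLIFF = b"""<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
   <file source-language="en" datatype="plaintext" original="Project">
      <header/>
      <body>
         <trans-unit id="string">
            <source>This is the source content.</source>
         </trans-unit>
      </body>
   </file>
</xliff>"""

# Hypothetical prefix mapping: "xlf" is our own choice of prefix; only the
# namespace URI has to match the one declared in the XLIFF file.
NS = {"xlf": "urn:oasis:names:tc:xliff:document:1.2"}


def preview_rows(xliff_bytes):
    """Return the file's 'original' attribute and a list of (id, source) pairs."""
    root = ET.fromstring(xliff_bytes)  # root is the <xliff> element
    file_el = root.find("xlf:file", NS)
    title = file_el.get("original")
    rows = [
        (unit.get("id"), unit.findtext("xlf:source", namespaces=NS))
        for unit in file_el.findall("xlf:body/xlf:trans-unit", NS)
    ]
    return title, rows


if __name__ == "__main__":
    title, rows = preview_rows(XLIFF)
    print(title)
    for unit_id, source in rows:
        print(unit_id, "|", source)
    # Without the prefix the lookup silently finds nothing, which is the
    # trap that leads to local-name() workarounds.
    print(ET.fromstring(XLIFF).find("file"))
```

The prefix itself is arbitrary: as in the XSLT, it only needs to be bound to the same namespace URI as the default namespace of the source file.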

[Image courtesy of @emilep]

A Little Trick for Quality Assurance in memoQ

This post is also available in English.

The translation software memoQ has a useful function as part of its QA check which looks for doubled words in the target language. In my case it has pointed out small typos several times, where I had accidentally written an ‘and and’ or a ‘to to’. However, the benefit of this check is somewhat limited if your language regularly has forms that call for this kind of doubling. In German, think of sentences that require the words ‘die die’. The French translator, meanwhile, tears their hair out when memoQ flags a ‘nous nous’ for the umpteenth time.

With the help of the relatively new built-in regex function, though, this problem can actually be fixed. When editing your QA rule set, you can switch off the standard check under the Consistency tab and instead create a new rule under the Regex tab which replaces this check but takes exceptions into account. For French, for example, you can enter the following rule as a Forbidden regex match in target (unfortunately Kilgray’s help pages in German are not up to date, hence the English names here):

(?i)(?![nv]ous\b)(\b\S+\b)\s+\b\1\b

When active, this rule continues to check for doubled words in the target language, including common punctuation marks such as apostrophes, but ignores every occurrence of ‘nous nous’ or ‘vous vous’. The exceptions at the front of the rule can then be extended as required. The rule is certainly not flawless, but it can hugely reduce the number of false warnings without having to do without this check altogether.

[Photo by Ilya Pavlov on Unsplash]

memoQ QA Check Tweak

This post is also available in German.

memoQ has a handy little feature as part of its QA check which warns you whenever you double up a word in the target language. I’ve had it catch numerous little and ands and to tos which slip into my work on occasion. However, certain combinations of doubled-up words are fairly commonplace, which can lead to this feature producing lots of false warnings. A classic example in English might be two hads in a sentence like ‘I had had enough,’ but that pales in comparison to a language like French, which sees plenty of doubled-up words in pronominal verbs (nous nous lavons, vous vous souvenez etc.).

One way to fix this is to make use of the relatively new regex feature built into the QA check. Untick the option to check for duplicate words in the target under the Consistency tab. Then under the Regex tab we can replicate this functionality, while including our own exception to the rule. Add a new rule of the type Forbidden regex match in target, give it a relevant description, and then add this target regex:

(?i)(?![nv]ous\b)(\b\S+\b)\s+\b\1\b

When active, this rule will continue to highlight any duplicate words in the translation, including words that contain the usual punctuation marks such as apostrophes, but ignores any occurrences of nous nous or vous vous. Obviously these exceptions at the front can be replaced with whatever is required in the target language. The rule isn’t by any means flawless, and will for example also complain about repeated sequences of numbers, but it can help to reduce the number of false positives without having to abandon the check altogether.
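To try the rule out before pasting it into memoQ, here is a rough sketch in Python. Bear in mind that memoQ has its own regex engine, so edge cases may behave differently; the helpers build_duplicate_pattern and find_duplicates are made-up names for illustration. The pattern is the one above, with the exception list at the front generated from whatever words you pass in.

```python
import re


def build_duplicate_pattern(exceptions):
    """Build the doubled-word regex, skipping the given exception words.

    With exceptions ["nous", "vous"] this matches the same text as the
    rule in the post; the guard is simply written as an alternation
    instead of a character class.
    """
    if exceptions:
        alternatives = "|".join(re.escape(word) for word in exceptions)
        guard = rf"(?!(?:{alternatives})\b)"
    else:
        guard = ""
    # (\b\S+\b) captures a run of non-whitespace bounded by word boundaries,
    # \1 then requires the same run again after some whitespace.
    return re.compile(rf"(?i){guard}(\b\S+\b)\s+\b\1\b")


def find_duplicates(text, pattern):
    """Return every doubled-word match found in text."""
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    french = build_duplicate_pattern(["nous", "vous"])
    for sample in (
        "Nous nous lavons les mains",  # allowed exception
        "This is is a typo",           # genuine doubled word
        "Call 12 12 now",              # repeated numbers are flagged too
    ):
        print(sample, "->", find_duplicates(sample, french))
```

Swapping in a different list, say ["die"] for German or ["had"] for English, gives the equivalent rule for another target language.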

[Photo by Ilya Pavlov on Unsplash]

On Lady Mondegreen’s Eggcorns

Imperfection is part and parcel of how we communicate, and one of the beautiful things about the evolution of language is how little imperfections can create entirely new constructs, as words and phrases are misheard, misunderstood, misinterpreted and misstated. One of my favourite examples in this regard is the ‘mondegreen’, a term normally used to denote a misheard song lyric, although it originated with a line of poetry:

Ye Highlands and ye Lowlands,
Oh, where hae ye been?
They hae slain the Earl o’ Moray,
And Lady Mondegreen.

Percy’s Reliques of Ancient English Poetry

The poor victim Lady Mondegreen was in fact Sylvia Wright’s interpretation of hearing the true line: And laid him on the green.

However, aside from causing amusement and consternation, there’s only so much a misheard lyric can contribute to the language. But a word I came across today covers a much broader spectrum for when people mishear words and parse them through their own filters to make sense of the noise: eggcorns. The word itself has a cute origin: when you’re told for the first time that the egg-shaped seed in your hand is an ‘acorn’, thinking you heard ‘eggcorn’ seems a natural enough assumption.

Here’s a great list of some of the more common eggcorns around. It’s particularly interesting when more archaic words end up being given a new lease of life, such as when talking about testing your metal, or transforming the Spanish cucaracha into the more familiar cockroach.

[Image courtesy of Tamara Menzi @ unsplash.com]

Language Weirdness

This post is also available in German.

In the weird and wonderful world of words, which world of words is the weirdest? And if we replace ‘weird’ with ‘hard’, we find one of those eternal questions facing language learners: which language is more difficult?
