Wordfast - Getting started, building TM from old translations
Thread poster: twintrad
twintrad, French to English
Hello
I work in-house at a company and now need a CAT tool. I was looking into Wordfast and was wondering whether there is a way to build a TM from old translations (source/target versions)?
Thanks in advance
Melinda
Gerard de Noord, France, Member (2003), German to Dutch + ...
R.M. Susil Premaratne, Sri Lanka, Member (2007), Sinhala to English + ...
I suggest that you visit the website www.wordfast.com and forward your question there.
They will definitely give you a satisfactory reply.
twintrad, French to English | Wordfast, getting started... | Jul 24
Thanks both of you!
Melinda
FarkasAndras, Hungary, English to Hungarian + ... | The best I know of... | Jul 24
... is hunalign. I think it is the only aligner that reliably detects things like when a paragraph is missing in one of the texts. (Alignment is the "official" name of what you need to get done here.)
If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/megaTMs (europarl & acquis) invariably use tools like hunalign and NOT WinAlign or PlusTools. I haven't used PlusTools; it could be reasonably good for all I know. WinAlign isn't. But even if the PlusTools aligner works well, it will only provide an alignment based on the Wordfast segmentation of the documents. If the segmentation of the two documents doesn't match with great reliability (and it won't), then you'll have an awful lot of correcting to do, because if one segment is off somewhere, everything after it will be out of alignment until you correct it. Hunalign may misalign segments, but it automatically recovers from the error further down the line.
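To illustrate why a single missing segment is so costly with purely positional pairing, here is a minimal sketch. The segment labels and the `pair_by_position` helper are made up for the illustration; this is not how WinAlign, PlusTools or hunalign work internally.

```python
def pair_by_position(source, target):
    """Naive alignment: pair the i-th source segment with the i-th target segment."""
    return list(zip(source, target))


# Hypothetical data: the translation of S2 is missing from the target text.
source = ["S1", "S2", "S3", "S4"]
target = ["T1", "T3", "T4"]

pairs = pair_by_position(source, target)

# The pairing we actually want (S2 has no counterpart).
correct = {"S1": "T1", "S3": "T3", "S4": "T4"}

# Every pair after the gap is wrong, not just the one at the gap.
wrong = [(s, t) for s, t in pairs if correct.get(s) != t]
print(wrong)
```

Everything from the gap onward is paired with the wrong neighbour, which is the manual correction work described above.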
Google hunalign, read the description on the site, and, for preprocessing, use the sentence boundary detector from here: http://www.statmt.org/europarl/v3/tools.tgz
It's command line, so it won't do fancy graphics... but then I prefer fancy performance to fancy graphics.
Basic workflow description: you convert your files to txt, run the europarl tool to chop them into sentences, feed them to hunalign, copy the output to Excel, make corrections, delete unnecessary bits, insert tags to make a standard TMX file (or a Wordfast translation memory) out of it, copy to Notepad, save and use in WF.
Much of this can be automated: PlusTools for the txt conversion and the command line for merging txt files, with lots of search and replace and copy/paste all through.
Hunalign performs much better if you feed in a bilingual dictionary/glossary.
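The "insert tags to make a standard TMX file" step can also be scripted instead of done by search and replace. Below is a minimal sketch, assuming the corrected segment pairs are already available as a list of (source, target) tuples. The function name, the header values and the `fr`/`en` defaults are placeholders chosen for the illustration; check the result against what your version of Wordfast accepts.

```python
import xml.etree.ElementTree as ET


def pairs_to_tmx(pairs, src_lang="fr", tgt_lang="en"):
    """Build a small TMX 1.4 document from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-to-tmx-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            # ElementTree escapes &, < and > in the segment text.
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    demo = [("Bonjour & bienvenue", "Hello & welcome")]
    print(pairs_to_tmx(demo))
```

Generating the XML with a library rather than pasting tags by hand means characters such as `&` and `<` in the segments are escaped automatically.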
All of this requires what I consider fairly basic computer skills and some time investment. Nag me with questions if you need to (read manuals and google first).
If people are interested in the whole procedure, I may write up an article about how I did it. Also, if someone has a large amount of material they'd like aligned and no computer skills or time, we may be able to work something out.
[Edited at 2008-07-24 13:11]
Milan Condak, Czech Republic, English to Czech | I am using hunalign and +Tools/+Align | Jul 24
FarkasAndras wrote:
If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/megaTMs (europarl & acquis)
Basic workflow description:
you convert your files to txt,
run the europarl tool to chop it into sentences,
feed them to hunalign,
copy the output to excel,
make corrections,
delete unnecessary bits,
and insert tags to make a standard tmx file
(or wordfast translation memory) out of it,
copy to notepad, save and use in WF.
All of this can be automated; plustools for the txt conversion and command line for merging txt's, lots of search and replace and copy/paste all through.
Hunalign performs much better if you feed in a bilingual dictionary/glossary.
My workflow description:
I convert or save my files as txt,
sometimes extract them into sentences with Wordfast/Tools/Extract,
feed them to hunalign (I use a short, editable bat file).
I open the output in MS Word (an Excel cell has a limited size), convert the text to a table and delete the third column with the index.
I split the table into files of 100 pages each.
I run PlusTools/+Align and open a short file with a table created by PlusTools to activate the +Align menu; then I open the file to be corrected and close the short file.
I make corrections (mostly splitting some segments and deleting tildes),
I create a Wordfast TM with the Create TM button. I merge all the created TMs.
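The "delete the third column" and "delete tildes" steps can be sketched in a few lines of Python. This assumes hunalign's text output has one aligned pair per line with tab-separated fields (source, target, then a numeric third column), and that merged sentences are joined with ` ~~~ `; both are assumptions to verify against your own output files. The function names are made up for the example.

```python
MERGE_MARK = " ~~~ "  # assumed marker between sentences that were merged into one segment


def clean_line(line):
    """Return (source, target) from one output line, or None if a side is empty."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return None
    # Keep only the first two columns; the numeric third column is dropped.
    src = fields[0].replace(MERGE_MARK, " ").strip()
    tgt = fields[1].replace(MERGE_MARK, " ").strip()
    if not src or not tgt:
        return None  # one-sided line: leave it for manual review
    return src, tgt


def clean_pairs(lines):
    """Clean every line and skip the ones that have no usable pair."""
    return [pair for pair in map(clean_line, lines) if pair is not None]


if __name__ == "__main__":
    sample = [
        "First part. ~~~ Second part.\tPremière partie. Deuxième partie.\t0.87",
        "\tOrphan target\t-0.3",
        "Hello\tBonjour\t1.2",
    ]
    for src, tgt in clean_pairs(sample):
        print(src, "|", tgt)
```

Note that this only removes the markers; deciding whether a merged segment should be split into two pairs is still a manual correction.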
-
I tested Hunalign without a bilingual glossary, using only "null.dic", on all EU languages paired with Czech.
I thank the authors of Hunalign for this free tool.
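For those who would rather script the call than keep a bat file, here is a rough Python equivalent. It assumes the `hunalign` executable is on the PATH and that the argument order is dictionary, source file, target file, with the `-text`, `-utf` and `-realign` switches as described in the hunalign README; check all of this against your own version. The function names and file names are placeholders.

```python
import subprocess


def build_hunalign_command(source_txt, target_txt, dictionary="null.dic",
                           realign=False, exe="hunalign"):
    """Assemble the argument list for a hunalign run (nothing is executed here)."""
    cmd = [exe, "-text", "-utf"]  # text output, UTF-8 aware length counts
    if realign:
        # Assumed to help when the dictionary is small or empty (see README).
        cmd.append("-realign")
    cmd += [dictionary, source_txt, target_txt]
    return cmd


def run_hunalign(source_txt, target_txt, output_txt, **options):
    """Run hunalign and write its standard output to output_txt."""
    cmd = build_hunalign_command(source_txt, target_txt, **options)
    with open(output_txt, "w", encoding="utf-8") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Placeholder file names; prints the command rather than running it.
    print(" ".join(build_hunalign_command("en.txt", "cs.txt", realign=True)))
```

Building the command as a list keeps file names with spaces intact, which is a common source of trouble in bat files.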
Milan
[Edited at 2008-07-24 19:09]
Milan Condak, Czech Republic, English to Czech | Example of using Hunalign and PlusTools/+Align | Aug 2
Milan Condak wrote:
I tested Hunalign without the bilingual glossary only "null.dic" with Czech.
Milan
Here is an example of aligning an EN text with its CS machine translation:
http://www.condak.net/tools/hunalign2/en/00.html
Milan