Looking for a tool Thread poster: Brandis (X)
| Brandis (X) Local time: 02:22 English to German + ...
Hi all! I am looking for a tool that can extract the complete (source) content of a website; the format is of course .html. I have various websites here (automobiles, medical, etc.), and I thought a tool like this would be wonderful, especially for building pre-planned TMs and developing the target content over time. I would appreciate any help. Regards, Brandis
| | | Judy Rojas Chile Local time: 20:22 Spanish to English + ...
| Brandis (X) Local time: 02:22 English to German + ... TOPIC STARTER I know webreaper | Sep 23, 2004 |
Hi, I know this tool already. I am using others, but what I am looking for is a tool with a source terminology extraction function that works across multiple webpages pertaining to one topic or product, with a view to building professional TMs. But thank you. A closer description is Trados TagEditor, where one can extract terminology from multiple bilingual files; I am in search of something similar, only as a separate tool. Brandis
[Edited at 2004-09-23 01:13] | | |
Hello Brandis You might like to try Fusion. It has a terminology feature that I think would suit your needs. Best regards, Luciano Monteiro | |
Marc P (X) Local time: 02:22 German to English + ... Website retrieval and translation | Sep 23, 2004 |
Here's one way of doing it:

First, retrieve the web site with wget. For example, if you want to retrieve the OmegaT web site at www.omegat.org/omegat/omegat.html, you enter: wget http://www.omegat.org/omegat/omegat.html -r -p on the command line. The -r option causes folders to be saved recursively (i.e. sub-folders will be saved), the -p option causes any files needed for complete display of the pages to be saved.

Then you create a new project in OmegaT and place all the files you have downloaded in the /source folder of that project exactly as you downloaded them, i.e. with the same folder structure. (You can of course create the empty project first, then on the command line, switch to the /source folder, and then download the web site into it directly.)

When you have finished translating the html files in OmegaT, compiling the project in OmegaT will reproduce the structure with the translated files in the /target folder.

Get wget from: http://wget.sunsite.dk/ and OmegaT (latest version 1.4.3 is just out, September 2004) from: http://sourceforge.net/projects/omegat

wget and OmegaT both run on both Linux and Windows. Marc
| | | Brandis (X) Local time: 02:22 English to German + ... TOPIC STARTER
Luciano Monteiro wrote: Hello Brandis You might like to try Fusion. It has a terminology feature that I think would suit your needs. Best regards, Luciano Monteiro

But Fusion doesn't cover the website localisation aspect directly; one would need further tooling to reproduce a target website that mirrors the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible on an industrial scale. For large docs or multiple documents, Fusion is in that sense probably the best there is. Rgds, Brandis | | |
SDLX can handle web formats: HTML and HTML-like files (this week I was translating chunks of HTML files, i.e. incomplete HTML code, which its web formats filter accepted happily), and many other formats, including XML and SGML, as well as RC and some programming language files. It will not download a web site for you, but other than that it handles translation of tagged files pretty well. And until Sept. 30 it is available at half price. For more information go to http://www.sdl.com/intltransday HTH Piotr
| | | Brandis (X) Local time: 02:22 English to German + ... TOPIC STARTER
I was probably not clear in my posting. I was in fact looking for a free/shareware tool solely for the purpose of extracting single-word web content (individual terms). If you know of any, I would be thankful for any help. Regards, Brandis
Terminology lists? | Sep 24, 2004 |
Do you mean web sites that contain terminology lists from different areas? If yes, I don't think there is a universal tool for this specific task, because these lists can come in different formats, e.g. an HTML table, separate paragraphs, or lists (ordered and unordered). Piotr
| | | Brandis (X) Local time: 02:22 English to German + ... TOPIC STARTER I do not mean that | Sep 24, 2004 |
Hi again! A small correction. This could be any website. Take metalworking websites, for example: you may find anywhere from 100 to a few thousand of them, and all use some standard terminology in their product presentations or descriptions on the web. If one could extract that type of content to build a monolingual glossary first, then switch to the target-language websites and compare, one would have a field-specific glossary, I guess. That is the kind of tool I am looking for.

So far, Fusion (which doesn't process .html files) offers a wonderful term extraction facility based on the files fed to it, whereas other tools actually require you to do the translation in order to generate a TM. My search is hence two-fold: monolingual term extraction using functionality like Fusion's, but extracting from websites. In my case, my outsourcer either points me to the website or sends me the website for local processing, and I start with Trados, as I cannot process these sites directly in Fusion, despite its term extraction ability. Sometimes my outsourcer gives me a TM with 5 - 10% of the file prepared and fights over the price. Another point is that most web content is published globally (see KudoZ, where you mostly see web references), so the idea is, I guess, obvious by now.
Regards, Brandis
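For anyone with the same need, the monolingual extraction described above can be roughed out in a few lines of Python. This is only a minimal sketch under stated assumptions, not a finished tool: it assumes the site has already been saved locally (for example with wget, as Marc describes), that the files are UTF-8, and that a "term" is simply any word of three or more letters that is not in a stopword list. The names `html_to_text`, `count_terms`, `extract_terms`, and the `min_count` parameter are hypothetical, chosen for illustration; only the Python standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path
import re


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of one HTML document as a single string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Assumption: a candidate term is a run of 3 or more letters (any alphabet).
WORD = re.compile(r"[^\W\d_]{3,}")


def count_terms(text, stopwords=frozenset()):
    """Count lower-cased single-word candidates, ignoring stopwords."""
    words = (w.lower() for w in WORD.findall(text))
    return Counter(w for w in words if w not in stopwords)


def extract_terms(site_dir, stopwords=frozenset(), min_count=2):
    """Walk a downloaded site and return (term, count) pairs, most frequent first."""
    totals = Counter()
    for path in Path(site_dir).rglob("*.htm*"):
        html = path.read_text(encoding="utf-8", errors="replace")
        totals += count_terms(html_to_text(html), stopwords)
    return [(term, n) for term, n in totals.most_common() if n >= min_count]
```

Calling `extract_terms("source", stopwords={"the", "and"})` on a downloaded folder would give a frequency-sorted candidate list to review by hand. Running it once on the source-language sites and once on the target-language sites yields the two monolingual lists to compare; pairing the terms is still manual work, and a real term extractor would also handle multi-word terms and inflected forms.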