Website text extractor
Thread poster: Chopkins
Chopkins
Chopkins  Identity Verified
France
Local time: 20:34
French to English
May 22, 2017

Hello everyone,

I've had a few requests the past couple of months asking that I provide quotes for my services (translation of their websites). Although I've been in contact with a few who have been particularly forthcoming (in providing the source texts), others simply request that I use the text(s) directly from their websites.

I've seen previous threads but they are relatively old (dating back to 2007). Has anything changed in the past 10 years that would let me vacuum/extract text directly from the sites, so that I can then obtain a more precise word analysis with my CAT tool?

Thanks for any help that you may be able to provide me,

Chopkins


 
José Henrique Lamensdorf
José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 16:34
English to Portuguese
+ ...
In memoriam
Try these... May 22, 2017

HTTrack - http://www.httrack.com - freeware - to download entire web sites
CatsCradle - https://www.stormdance.net/software/catscradle/overview.htm - 30-day demo - to count words and translate; it has its own built-in CAT tool
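Once a site has been downloaded, the text can be pulled out of each saved page for a rough word count. The following is only a minimal sketch using Python's standard library, not how HTTrack or CatsCradle work internally; the names `VisibleTextParser`, `extract_text` and `count_words` are made up for illustration, and the counting rule (runs of letters and digits, with internal apostrophes and hyphens kept together) is an assumption that will differ somewhat from any given CAT tool's analysis.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring the contents of script/style-like tags."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text of an HTML string, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def count_words(text):
    """Rough word count (assumed rule, not a CAT tool's exact algorithm)."""
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))


if __name__ == "__main__":
    sample = (
        "<html><head><title>Home</title><style>p{color:red}</style></head>"
        "<body><p>Hello world</p><script>var x = 'hidden';</script></body></html>"
    )
    print(extract_text(sample))
    print(count_words(extract_text(sample)))
```

The extracted text can be saved as a plain .txt file and fed to a CAT tool for a proper analysis; the count here is only meant as a sanity check.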


 
Elif Baykara Narbay
Elif Baykara Narbay  Identity Verified
Türkiye
Local time: 22:34
German to Turkish
+ ...
Hi! May 22, 2017

Did you check the more recent post below?

http://deu.proz.com/forum/general_technical_issues/241763-how_to_count_the_number_of_words_on_a_website_suggestions_needed.html?print=1

The last post is dated 15 Feb 2013.


 
Maija Cirule
Maija Cirule  Identity Verified
Latvia
Local time: 21:34
German to English
+ ...
I would recommend May 22, 2017

Chopkins wrote:

Hello everyone,

I've had a few requests the past couple of months asking that I provide quotes for my services (translation of their websites). Although I've been in contact with a few who have been particularly forthcoming (in providing the source texts), others simply request that I use the text(s) directly from their websites.

I've seen previous threads but they are relatively old (dating back to 2007). Has anything changed in the past 10 years that would let me vacuum/extract text directly from the sites, so that I can then obtain a more precise word analysis with my CAT tool?

Thanks for any help that you may be able to provide me,

Chopkins


CatsCradle
It is a rather sophisticated program, but for large volumes it is one of the best aids, and it has an embedded CAT tool.

[Edited at 2017-05-22 16:18 GMT]


 
Chopkins
Chopkins  Identity Verified
France
Local time: 20:34
French to English
TOPIC STARTER
Have CatsCradle May 22, 2017

José Henrique Lamensdorf wrote:

HTTrack - http://www.httrack.com - freeware - to download entire web sites
CatsCradle - https://www.stormdance.net/software/catscradle/overview.htm - 30-day demo - to count words and translate; it has its own built-in CAT tool


Hi José,

Thank you very much for your reply.

I remember coming across your recommendation a few months ago and did wind up getting CatsCradle.

I like the program, but I have issues with subfolders and unnecessary content, so I would like to switch to a program that simply extracts text.

I may consider HTTrack if no other suggestion comes up.

Thank you for your suggestions,

Chopkins
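The subfolder and unnecessary-content issue can also be handled after downloading, by choosing which saved pages go into the count. This is a minimal sketch under assumptions: it presumes the site has already been mirrored into a local folder (for example with HTTrack), and the function name `select_pages` and the default folder names are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def select_pages(mirror_root, excluded_dirs=("images", "css", "js"),
                 suffixes=(".html", ".htm")):
    """Return the HTML files under mirror_root, skipping excluded subfolders.

    excluded_dirs holds lowercase folder names (hypothetical defaults);
    a page is skipped if any folder on its path matches one of them.
    """
    root = Path(mirror_root)
    pages = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Folder names between the mirror root and the file itself.
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part.lower() in excluded_dirs for part in parent_parts):
            continue
        pages.append(path)
    return pages
```

For example, `select_pages("mirror", excluded_dirs=("archive", "forum"))` would keep every saved page except those under folders named `archive` or `forum`, so that only the pages the client actually wants translated are counted.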


 
Chopkins
Chopkins  Identity Verified
France
Local time: 20:34
French to English
TOPIC STARTER
Thank you!!! May 22, 2017

Elif Baykara wrote:

Did you check the more recent post below?

http://deu.proz.com/forum/general_technical_issues/241763-how_to_count_the_number_of_words_on_a_website_suggestions_needed.html?print=1

The last post is dated 15 Feb 2013.


I think I may go with the last suggestion from the thread. Thank you for digging this one up from the forum archives.

Thanks again,

Chopkins


 
Chopkins
Chopkins  Identity Verified
France
Local time: 20:34
French to English
TOPIC STARTER
Curious to know if newer or better programs exist... May 22, 2017

Maija Cirule wrote:


CatsCradle
It is a rather sophisticated program, but for large volumes it is one of the best aids, and it has an embedded CAT tool.

[Edited at 2017-05-22 16:18 GMT]


Maija,

I have CatsCradle and was somewhat happy with it on a couple of projects; however, given its age, I wanted to know if anything else is worthwhile.

Thanks again for your suggestion!

-Chopkins


 
neilmac
neilmac
Spain
Local time: 20:34
Spanish to English
+ ...
Charge them extra May 24, 2017

Chopkins wrote:

Hello everyone,

... others simply request that I use the text(s) directly from their websites.


With this type of client, I just tell them the fee will be roughly twice what it would be for normal text in a normal Word-compatible format. If that doesn't get them rooting about their back office to find a workable copy for you, they must have more money than sense.


 
DZiW (X)
DZiW (X)
Ukraine
English to Russian
+ ...
sure May 24, 2017

Translating just the text and the diagrams/pictures (yikes!) won't do, because the site also requires a proper layout, and the client should consider culture-related peculiarities and how to redirect changeable/dynamic content for each language.

If there are just two or three languages, then a straightforward subdomain approach may be OK, but each one is a separate copy to handle and process.

Anyway, extra work requires extra payment.


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 15:34
Member (2008)
French to English
+ ...
Get client to extract May 24, 2017

Don't forget that the text in a website is often much more than what's visible when you view the page. There is internal text such as the content of menus, drop-down lists, tooltips, etc. Sometimes there is text hidden in JavaScript, which needs a programmer to extract. And sometimes changing such text can damage the page so that it doesn't work.

I will usually insist that the client get their webmaster to extract the text into a Word document, because it will usually take the webmaster's expertise to put the translation into the right places.
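To get a feel for how much of this non-visible text a page carries before quoting, the attributes that commonly hold it can be listed. This is a rough sketch with Python's standard library; the attribute list (`alt`, `title`, `placeholder`, `aria-label`, plus `meta` description/keywords) is an assumption about what is usually translatable, the names `HiddenTextParser` and `find_hidden_text` are made up for illustration, and text embedded in JavaScript is not covered at all.

```python
from html.parser import HTMLParser

# Attributes assumed to carry user-facing text (not an exhaustive list).
TRANSLATABLE_ATTRS = ("alt", "title", "placeholder", "aria-label")


class HiddenTextParser(HTMLParser):
    """Collects attribute text that does not show up as ordinary page text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []  # list of (tag, attribute, text)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name in TRANSLATABLE_ATTRS:
            value = attr_map.get(name)
            if value and value.strip():
                self.found.append((tag, name, value.strip()))
        # <meta name="description" content="..."> and keywords.
        if tag == "meta" and attr_map.get("name") in ("description", "keywords"):
            content = attr_map.get("content")
            if content and content.strip():
                self.found.append((tag, "content", content.strip()))


def find_hidden_text(html):
    """Return (tag, attribute, text) tuples for attribute-held text."""
    parser = HiddenTextParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

A long list from `find_hidden_text` is a good argument for asking the webmaster for a proper export, since each of those strings has to be put back into the right attribute afterwards.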


 
Volodymyr Pedchenko
Volodymyr Pedchenko
Local time: 21:34
English to Ukrainian
+ ...
AnyCount 3D downloads websites and counts words, characters and lines May 24, 2017

Hello Chopkins,

We released AnyCount 3D on New Year's Eve. Its main difference from previous versions is the ability to download websites and count their text; that is actually what the third dimension in its name stands for.

Add from web and get word count

The web-site copy is stored for your use.

You are welcome to download it at http://www.wordcountsoftware.com/

As it is a novelty for the translation world, we are open to suggestions on how to improve it.

Kind regards,
Vladimir.

[Edited at 2017-05-24 19:05 GMT]


 

