Update on TMF data repository

bitstrange · #3191

NomoneyNohoney wrote: I think we should remain collectively as Fools, both as homage to our origins and for continuity with the new site.

Remember though, TMF has a number of registered trademarks, and "Fool" seems to be one of them:
http://www.fool.co.uk/help/legal-information/

RandomAmbler · #4613

modellingman wrote:
Although RandomAmbler's scraping tool (see several posts earlier on this thread) is certainly quick, no one is presently sure whether it is up to this rate of activity, or indeed whether the server it is running from or the Fool server it is targeting will tolerate such a rate over sustained periods. In any event, as I'm sure its author would acknowledge, it would need quite a bit of refinement to get it to the point where it could scrape to produce output suitable for loading into a DB (either directly or via intermediate text files).

You're not wrong, modellingman. I've spent some time today converting the tool to export posts into a PDF archive rather than saving them individually, and adding some useful information for sorting, such as the post number on the board. On the whole this works fine, but I have noticed the odd post being overlooked when extracting large numbers in one go (1000+), probably because the TMF site hasn't responded promptly or a missing-page response came back. Dealing with these niggling but important issues is the tricky bit, as it tends to require much more edge-case code than the happy path.
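For illustration only, here is a rough sketch of the sort of edge-case handling involved. It is not the tool's actual code. It assumes a hypothetical `fetch` function that returns a post's text, returns `None` for a missing-page response, or raises on a slow or dropped connection. The names `fetch_with_retry` and `scrape_range` are made up for this example.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def fetch_with_retry(
    fetch: Callable[[int], Optional[str]],
    post_number: int,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try to fetch one post, waiting longer after each failed attempt.

    `fetch` is a hypothetical callable (an assumption, not a real API):
    it returns the post body, returns None for a missing-page response,
    or raises TimeoutError/ConnectionError when the site is slow.
    """
    for attempt in range(attempts):
        try:
            body = fetch(post_number)
        except (TimeoutError, ConnectionError):
            body = None
        if body is not None:
            return body
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None


def scrape_range(
    fetch: Callable[[int], Optional[str]],
    post_numbers: Iterable[int],
    **retry_options,
) -> Tuple[Dict[int, str], List[int]]:
    """Fetch many posts and record which ones were missed."""
    posts: Dict[int, str] = {}
    missed: List[int] = []
    for number in post_numbers:
        body = fetch_with_retry(fetch, number, **retry_options)
        if body is None:
            missed.append(number)  # keep a note so it can be retried later
        else:
            posts[number] = body
    return posts, missed
```

The point of returning the `missed` list is that a large run (1000+ posts) never silently drops anything: whatever could not be fetched is reported and can be fed back into `scrape_range` in a later pass.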

Anyway, if stooz has managed to scrape the entire set of discussion boards then that's really something, as I now know only too well!

Damian

PS: FWIW, the tool is still here: https://damiancannon.github.io/MotleyFoolDownloader/

The Lemon Fool
