Update on TMF data repository

bitstrange

Re: Update on TMF data repository

#3191

Post by bitstrange » November 10th, 2016, 10:46 am

NomoneyNohoney wrote:I think we should remain collectively as Fools, both as homage to our origins and for continuity with the new site.


Remember though, TMF has a number of registered trademarks, and "Fool" seems to be one of them:
http://www.fool.co.uk/help/legal-information/

RandomAmbler

Re: Update on TMF data repository

#4613

Post by RandomAmbler » November 13th, 2016, 8:41 pm

modellingman wrote:
Although RandomAmbler's scraping tool (see several posts earlier on this thread) is certainly quick, no one is presently sure whether it is up to this rate of activity, or indeed whether the server it is running from or the Fool server it is targeting will tolerate such a rate over sustained periods. In any event, as I'm sure its author would acknowledge, it would need quite a bit of refinement to get it to the point where it could scrape to produce output suitable for loading into a DB (either directly or via intermediate text files).



You're not wrong, modellingman. I've been spending some time today converting the tool to export posts into a PDF archive, rather than saving them individually, and adding some useful information for sorting, such as the post number on the board. On the whole this works fine, but I have noticed the odd post being overlooked when extracting large numbers in one go (1000+), probably because the TMF site hasn't responded promptly or a missing-page response came back. Dealing with these niggling but important issues is the tricky bit, as it tends to require much more edge-case code than the happy path.
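To illustrate the kind of edge-case handling meant here, below is a minimal Python sketch of one way to avoid silently dropping posts: retry each post a few times with a growing delay, and keep a list of the post numbers that still failed so they can be fetched again later. It is not the code from the tool itself. The names `fetch_with_retry` and `scrape_posts`, the `fetch` callable, and the retry settings are all hypothetical, and the sketch assumes that a slow response shows up as a `TimeoutError` or `ConnectionError` and a missing page as `None`.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical type: a function that takes a post number and returns the
# post text, returns None for a missing page, or raises on a slow response.
Fetch = Callable[[int], Optional[str]]


def fetch_with_retry(
    fetch: Fetch,
    post_id: int,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try one post up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            body = fetch(post_id)
        except (TimeoutError, ConnectionError):
            body = None  # treat a slow or dropped response like a missing page
        if body is not None:
            return body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    return None


def scrape_posts(
    fetch: Fetch, post_ids: Iterable[int], **retry_options
) -> Tuple[Dict[int, str], List[int]]:
    """Fetch every post; return the posts found plus the numbers that were missed."""
    posts: Dict[int, str] = {}
    missed: List[int] = []
    for post_id in post_ids:
        body = fetch_with_retry(fetch, post_id, **retry_options)
        if body is None:
            missed.append(post_id)  # recorded rather than silently skipped
        else:
            posts[post_id] = body
    return posts, missed
```

The `missed` list is the useful part: after a big run it says exactly which post numbers to try again, instead of leaving gaps that only show up later. The growing delay also eases the load on the target server, though whether it is gentle enough for sustained scraping is a separate question.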

Anyway, if stooz has managed to scrape the entire set of discussion boards then that's really something, as I now know only too well!

Damian

PS: FWIW, the tool is still here: https://damiancannon.github.io/MotleyFoolDownloader/

