Wayback Machine
Posted: Sat Aug 02, 2014 8:19 pm
by Deltharien
So I have a question for the nerds. I found a cached copy of vgtact.com on the Wayback Machine, and most of the pages are intact. If I save them manually (save link as...), I get a working copy of each page, minus the images. Vgtact has loads of sub-pages and images, so I tried to mirror the site using wget, which failed, and then httrack, which also failed, just not as miserably. My attempts trigger a freaky redirect to https://archive.org/.../index.php/Main_Page, meaning that as the links are crawled, I just end up with folders full of copies of 'Main_Page'.
Any advice on how to accomplish my end goal? Or has someone already done this? Thanks.
Re: Wayback Machine
Posted: Sat Aug 02, 2014 11:18 pm
by John Adams
HTTrack has an option on the Spiders tab (I think) about following robot rules. You can choose to ignore those rules, which may be what is rejecting the link searches. I never had a lot of luck HTTracking Wayback, so the VGOPlayers pieces you see were literally me doing a save-as on every page, then pointing directly to each image element (web slices) on there or on vanguardthegame.com until I got what I needed. Exhausting. Hope I don't need more, since vanguardthegame is also now gone.
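For anyone who wants to script the save-every-page routine instead of clicking through it, here is a rough sketch rather than a tested mirror tool. It only builds the addresses: one for the Wayback Machine's CDX listing of captured pages under a site, and one for an individual capture. The function names, the example.com address and the timestamp in the usage note are placeholders, and the filter on status 200 is an assumption that might help skip captures that are just redirects.

```python
from urllib.parse import urlencode

# Wayback Machine CDX listing endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site, limit=None):
    """Build a CDX query listing captures whose URL starts with `site`."""
    params = [
        ("url", site),
        ("matchType", "prefix"),         # everything under the site
        ("output", "json"),
        ("fl", "timestamp,original"),    # only the fields needed below
        ("filter", "statuscode:200"),    # assumption: skip redirect captures
        ("collapse", "urlkey"),          # one row per distinct page
    ]
    if limit is not None:
        params.append(("limit", str(limit)))
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_url(timestamp, original, raw=True):
    """Build the address of one capture; raw=True adds the id_ flag,
    which asks for the page as archived, without the Wayback toolbar."""
    flag = "id_" if raw else ""
    return "https://web.archive.org/web/{}{}/{}".format(timestamp, flag, original)
```

Each row of the listing gives a timestamp and an original address, which can be fed straight into snapshot_url, e.g. snapshot_url("20140101000000", "http://example.com/a.html"), and the result fetched with whatever downloader you like.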
Re: Wayback Machine
Posted: Sun Aug 03, 2014 4:33 am
by falloutdc
Same here. HTTrack failed on vanguardthegame because the site uses CSS files to reference its images and HTTrack does not modify those references, so I had to download literally hundreds of images by hand.
It downloaded the forums fine, however (some CSS files are missing, I think, but these are on the Wayback archive).
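The by-hand image hunt could probably be shortened by pulling the image references out of the saved CSS files first. This is a minimal sketch, assuming the images are referenced with ordinary url(...) entries; css_image_urls is a made-up helper name and the example.com addresses in the usage note are placeholders, not the real site layout.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_image_urls(css_text, css_url):
    """Return absolute URLs for every url(...) reference in a stylesheet.

    css_url is the address the stylesheet was originally served from,
    which is what relative references are resolved against.
    """
    found = []
    for match in CSS_URL_RE.finditer(css_text):
        target = match.group(2).strip()
        # Skip empty references and inline data: images.
        if not target or target.startswith("data:"):
            continue
        absolute = urljoin(css_url, target)
        if absolute not in found:  # keep first occurrence, drop repeats
            found.append(absolute)
    return found
```

Calling css_image_urls(open("site.css").read(), "http://example.com/css/site.css") would give a de-duplicated list of addresses to fetch, which still leaves the downloading and the rewriting of the CSS to be done separately.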
Back to the original poster: as you did not give a link, I did a search, and the third result was this:
https://web.archive.org/web/20140803113 ... ive.65538/
So send a tell to zewtastic to see if he still has the database.
If not, it should be possible to create rulesets to stop crawling certain links.
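To give an idea of what such a ruleset could look like, here is a small sketch of an include/exclude filter a crawler script might consult before following a link. The rule format ("+" to allow, "-" to block, last matching rule wins) and the patterns themselves are assumptions for illustration, not the syntax of any particular tool.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: "+" allows, "-" blocks; the last matching rule wins.
RULES = [
    "+*",                        # allow everything by default
    "-*/index.php/Main_Page*",   # skip the page the crawl keeps landing on
    "-*action=edit*",            # skip wiki edit links
]


def should_crawl(url, rules=RULES):
    """Return True if the URL is allowed by the rule list."""
    allowed = True
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # fnmatchcase treats * as "any run of characters", slashes included.
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed
```

With these rules should_crawl("http://example.com/index.php/Crafting") is allowed while anything ending in /index.php/Main_Page is skipped; the patterns would need tuning against the links the crawl actually encounters.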
Re: Wayback Machine
Posted: Sun Aug 03, 2014 7:38 am
by Deltharien
Thanks for the replies. I'd love to see the data from both vgtact and vanguardcrafters resurrected. The newer sites were a valiant effort but are still missing so much.