Wayback Machine
Moderator: Community Managers
-
- Posts: 52
- Joined: Sun May 18, 2014 12:19 pm
Wayback Machine
So I have a question for the nerds. I found a cached copy of vgtact.com on the Wayback Machine, and most of the pages are intact. If I save them manually (Save Link As...), I get a working copy of the page, minus the images. Vgtact has loads of sub-pages and images, so I tried to mirror the site using wget, which failed, and then HTTrack, which also failed, just not as miserably. There's a freaky redirect to https://archive.org/.../index.php/Main_Page that gets triggered by my attempts, meaning that as the links are crawled I just end up with folders full of copies of 'Main_Page'.
Any advice on how to accomplish my end goal? Or has someone already done this? Thanks.
- John Adams
- Retired
- Posts: 4581
- Joined: Wed Aug 28, 2013 9:40 am
- Location: Phoenix, AZ.
- Contact:
Re: Wayback Machine
HTTrack has an option on the Spiders tab (I think) about following robot rules. You can choose to ignore those; they may be what is rejecting the link searches. I never had a lot of luck HTTracking the Wayback Machine, so the VGOPlayers pieces you see were literally me doing a Save As on every page, then pointing directly to each image element (web slices) on there or on vanguardthegame.com until I got what I needed. Exhausting. Hope I don't need more, since vanguardthegame is also now gone.
Re: Wayback Machine
Same here: HTTrack failed on vanguardthegame because the site uses CSS files to reference its images and HTTrack does not rewrite those references, so I had to download literally hundreds of images by hand.
The forums, however, it downloaded fine (some CSS files are missing, I think, but those are on the Wayback archive).
Back to the original poster: as you did not give a link, I did a search, and the third result was this:
https://web.archive.org/web/20140803113 ... ive.65538/
So send a tell to zewtastic to ask whether he still has the database.
If not, it should be possible to create rulesets to stop crawling certain links.
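For anyone hitting the same two problems, here is a rough Python sketch of both ideas from this post: pulling the image references out of a stylesheet so they can be fetched separately, and a simple exclusion ruleset that decides which links a crawl should skip. This is only an illustration, not what HTTrack or wget do internally. The function names `css_image_urls` and `should_crawl`, the example.com address, and the exclusion patterns are all made up and would need adjusting to the real site.

```python
import re
from urllib.parse import urljoin

# Matches url(...) references in CSS, with optional single or double quotes.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_image_urls(css_text, css_url):
    """Return absolute URLs for every url(...) reference in a stylesheet.

    css_url is the address the stylesheet was served from; relative
    references are resolved against it. Duplicates are dropped and the
    order of first appearance is kept.
    """
    found = []
    for _quote, ref in CSS_URL_RE.findall(css_text):
        if not ref or ref.startswith("data:"):
            continue  # skip empty references and inline data URIs
        absolute = urljoin(css_url, ref)
        if absolute not in found:
            found.append(absolute)
    return found


# Hypothetical exclusion rules (assumption: adjust to whichever links
# send the crawl in circles on the site being mirrored).
EXCLUDE_PATTERNS = [
    re.compile(r"[?&]action="),  # e.g. edit/history style links
    re.compile(r"/Special:"),    # e.g. generated listing pages
]


def should_crawl(url):
    """Return True if no exclusion rule matches the URL."""
    return not any(p.search(url) for p in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    sample_css = """
    body  { background: url("../images/bg.png"); }
    .logo { background-image: url(/images/logo.gif) }
    """
    for image_url in css_image_urls(sample_css, "http://example.com/css/site.css"):
        print(image_url)
    print(should_crawl("http://example.com/index.php?title=Main_Page&action=edit"))
```

The list that comes back from `css_image_urls` could be handed to any downloader, and `should_crawl` plays the same role as a mirroring tool's include/exclude filters. One caveat: the sketch resolves references against the stylesheet's original address, so mapping the results back onto archived copies is left out here.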
-
- Posts: 52
- Joined: Sun May 18, 2014 12:19 pm
Re: Wayback Machine
Thanks for the replies. I'd love to see the data from both vgtact and vanguardcrafters resurrected. The newer sites were a valiant effort but are still missing so much.