Since I had been using seed, and its directory contained a sample web browser, I came up with a small plan: write a few hacks to the code to add a 'mirror' button that would trigger the analysis of the site and download all the assets (naming them as we needed).
While I did attempt to get this outsourced, the project was over-running so horrifically that me spending one day on it would be far more efficient than waiting for the remote developer to complete it.
The code and the process are pretty simple.
You fire up the browser, go to the site you want to download from (e.g. Backpack), log in, then press the mirror button. What happens then is:
* Request the list of links from the page by running a method in inject.js and returning JSON to the console (via console.log); see the first sketch after this list.
* Iterate through the link list, again calling a method from inject.js, which uses XMLHttpRequest to work out whether each link is HTML (in which case it is flagged for future parsing) or downloadable (second sketch below).
* If it's downloadable, request the file, return it via console.log() as a JSON-encoded array of bytes (yes, I know that's not amazingly efficient, but it worked...), and save it to the file system (third sketch below).
* Then iterate through the flagged links, going through the whole process again...
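The post doesn't include inject.js itself, so here is a minimal sketch of what the link-gathering step might look like; the function name collectLinks and the message shape are my assumptions, not the original code.

```javascript
// Hypothetical sketch of the link-gathering method in inject.js.
// The host process watches the console for the JSON payload.
function collectLinks() {
  var links = [];
  var anchors = document.querySelectorAll('a[href]');
  for (var i = 0; i < anchors.length; i++) {
    links.push(anchors[i].href); // already resolved to an absolute URL
  }
  console.log(JSON.stringify({ type: 'links', urls: links }));
}
```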
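The HTML-vs-downloadable test maps naturally onto a HEAD request and the Content-Type header. Another hedged sketch: classifyLink is a made-up name, and the synchronous XMLHttpRequest is just the simplest way to express the check.

```javascript
// Hypothetical sketch of the HTML-or-downloadable check.
function classifyLink(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('HEAD', url, false); // synchronous, for simplicity
  xhr.send(null);
  var type = xhr.getResponseHeader('Content-Type') || '';
  var kind = type.indexOf('text/html') === 0 ? 'html' : 'download';
  console.log(JSON.stringify({ type: 'classified', url: url, kind: kind }));
}
```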
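For the byte-array step, the classic pre-ArrayBuffer trick for pulling binary data through XMLHttpRequest was to override the MIME type with a user-defined charset and mask each character code down to a byte. A sketch along those lines, with fetchBytes as an assumed name:

```javascript
// Hypothetical sketch of the download path. The charset override keeps
// each byte intact in responseText so it can be serialised as numbers.
function fetchBytes(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, false);
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.send(null);
  var bytes = [];
  for (var i = 0; i < xhr.responseText.length; i++) {
    bytes.push(xhr.responseText.charCodeAt(i) & 0xff);
  }
  console.log(JSON.stringify({ type: 'file', url: url, data: bytes }));
}
```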
It uses a few folders to store the current state, so if it crashes or you kill it (it can eat through memory), it can carry on where it left off.
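The post only says "a few folders", so the layout below is an illustrative sketch rather than the original code, written against Node's fs module for brevity: each queued URL lives as a file under pending/ and moves to done/ once saved, so a restart simply re-reads whatever is still pending.

```javascript
// Illustrative sketch of a crash-resumable work queue on disk
// (assumed layout; not the original Seed-based implementation).
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const PENDING = 'state/pending';
const DONE = 'state/done';
for (const dir of [PENDING, DONE]) fs.mkdirSync(dir, { recursive: true });

// File name is a hash of the URL, so it is filesystem-safe and unique.
function keyFor(url) {
  return crypto.createHash('md5').update(url).digest('hex');
}

function enqueue(url) {
  if (!fs.existsSync(path.join(DONE, keyFor(url)))) {
    fs.writeFileSync(path.join(PENDING, keyFor(url)), url);
  }
}

function nextPending() {
  const entries = fs.readdirSync(PENDING);
  return entries.length
    ? fs.readFileSync(path.join(PENDING, entries[0]), 'utf8')
    : null;
}

function markDone(url) {
  fs.renameSync(path.join(PENDING, keyFor(url)), path.join(DONE, keyFor(url)));
}
```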
Using this tool we managed to rescue almost 10GB of data from Backpack.