Published 2012-01-14 00:00:00

One of the great things about the internet is the availability of cheap or free services online, so many clients are using Gmail, Dropbox, GitHub etc. for their business operations. But all too often they forget that these services are frequently playing the oldest game in the technology industry: "Vendor Lock-in".

The ones I mentioned are not too bad: you can cheaply and easily rescue or back up your data to another location, or move to an alternative provider. Not all of them are like that.

We are in the middle of a migration project from NetSuite (a SaaS, Oracle-based ERP system) to xTuple, which is an open source ERP system built around PostgreSQL. This is a slow and painful migration, as there is no standard for ERP data, and exporting over SOAP is slow and clumsy. Anyway, as a pleasant distraction from this large migration, the same client also wanted us to look at migrating from Backpack, a 37signals product.

Backpack, unlike all the SaaS systems I mentioned, has deliberately made it hard, or practically impossible, to migrate away from the service. The primary offering of Backpack is an online file storage service that lets you share files and folders with clients or suppliers. It is web based only (unlike Dropbox or box.net): there is no desktop client, so the web interface is the only way to access the files.

When I started looking at how the company could extract the data, I tried out a few of the classic tools, like wget and HTTrack. However, the heavy use of JavaScript and the convoluted login system with login keys meant that those kinds of tools did not work. The other requirement was the ability to organise the files into folders. By just mirroring the site, you would end up with thousands of folders called asset/123123/, where the number is probably the UID of the database record.
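To make the naming requirement concrete, here is a minimal sketch of turning a folder trail and a display name into a readable relative path, rather than an asset/123123/ style one. The function names (`safeName`, `targetPath`) and the example folder names are hypothetical and not taken from the actual tool.

```javascript
// Hypothetical helper: make a single path component safe to use on disk.
function safeName(name) {
    // Replace path separators and control characters, then tidy whitespace.
    var cleaned = String(name)
        .replace(/[\/\\\x00-\x1f]/g, '_')
        .replace(/\s+/g, ' ')
        .trim();
    // Avoid empty names and the special "." and ".." entries.
    if (cleaned === '' || cleaned === '.' || cleaned === '..') {
        return '_';
    }
    return cleaned;
}

// Hypothetical helper: join the folder trail shown in the web UI with the
// file's display name to get a relative path for the download.
function targetPath(folderTrail, fileName) {
    return folderTrail.concat([fileName]).map(safeName).join('/');
}

// Example: a file shown as "Q3 report.pdf" under Clients > Acme.
console.log(targetPath(['Clients', 'Acme'], 'Q3 report.pdf'));
// prints "Clients/Acme/Q3 report.pdf"
```

Sanitising each component separately means a display name containing a slash cannot silently create an extra folder level.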

So how to rescue the data... Read on for the trick.

Since I had been using Seed (the GNOME JavaScript engine using the WebKit backend), I remembered that the seed-example directory contained a sample web browser. I came up with a small plan: write a few hacks to that code to add a 'mirror' button, which would trigger the analysis of the site and the downloading of all the assets (naming them as we needed).

While I did attempt to get this outsourced, the project was over-running so horrifically that me spending one day on it turned out to be far more efficient than waiting for the remote developer to complete it.


The code and process is pretty simple.

You fire up the browser, go to the site you want to download from (e.g. Backpack), log in, then press the mirror button. What happens then is:

* It injects some JavaScript into the active page (inject.js), which contains all the routines for the following tasks.
* It requests the list of links from the page by running a method in inject.js, which returns JSON to the console (via console.log).
* It iterates through the link list, again calling a method from inject.js, which uses XMLHttpRequest to find out whether each link is HTML (in which case it is flagged for future parsing) or downloadable.
* If it's downloadable, it requests the file, returns a JSON-encoded array of bytes via console.log() (yes, I know it's not amazingly efficient, but it worked...), and saves the file to the file system.
* It then iterates through the links, going through the process again...
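The steps above can be sketched roughly as follows. This is not the original inject.js: every function name and the message format are assumptions made for illustration, and the actual XMLHttpRequest call is left out so the helpers stay self-contained.

```javascript
// Sketch of inject.js-style helpers. All names here are hypothetical.

// List the links on a page. `doc` is the page's `document`, or any
// object with a compatible getElementsByTagName method.
function listLinks(doc) {
    var seen = Object.create(null);
    var links = [];
    var anchors = doc.getElementsByTagName('a');
    for (var i = 0; i < anchors.length; i++) {
        var href = anchors[i].href;
        if (!href || seen[href]) {
            continue; // skip anchors without a target, and duplicates
        }
        seen[href] = true;
        links.push({ url: href, label: (anchors[i].textContent || '').trim() });
    }
    return links;
}

// Decide from a Content-Type header whether a link is a page to parse
// later ('html') or a file to fetch ('download').
function classify(contentType) {
    var type = String(contentType || '').split(';')[0].trim().toLowerCase();
    if (type === 'text/html' || type === 'application/xhtml+xml') {
        return 'html';
    }
    return 'download';
}

// Turn a binary string (one character per byte) into an array of byte
// values, so it can travel as JSON.
function toByteArray(binaryString) {
    var bytes = [];
    for (var i = 0; i < binaryString.length; i++) {
        bytes.push(binaryString.charCodeAt(i) & 0xff);
    }
    return bytes;
}

// Wrap a result as one line of JSON. In the page this line would be
// handed to console.log for the host browser to pick up.
function message(kind, payload) {
    return JSON.stringify({ kind: kind, payload: payload });
}

// Host side: parse a 'file' message and recover the raw bytes to save.
function decodeFile(line) {
    var msg = JSON.parse(line);
    return Uint8Array.from(msg.payload.bytes);
}
```

Inside the page, the link step would then look something like `console.log(message('links', listLinks(document)))`. Sending each byte as a JSON number inflates the payload several times over, which fits the "not amazingly efficient" remark.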

It uses a few folders to store the current state, so if it crashes or you kill it (it can eat through memory), it can carry on where it left off.

Using this tool we managed to rescue almost 10GB of data from Backpack.
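As an illustration of the resume idea, here is a minimal sketch of a crawl queue that can be saved and restored. Note the swap: the tool itself kept its state in folders on disk, whereas this sketch keeps it in one object serialised to JSON, purely to stay short. All names are hypothetical.

```javascript
// Hypothetical crawl state: URLs still to visit and URLs already handled.
function newState(startUrl) {
    return { pending: [startUrl], done: [] };
}

// Queue a URL unless it is already queued or finished.
function enqueue(state, url) {
    if (state.pending.indexOf(url) === -1 && state.done.indexOf(url) === -1) {
        state.pending.push(url);
    }
}

// Peek at the next URL to process, or null when nothing is left.
// The URL stays in `pending` until finish() is called, so a crash
// halfway through a download means it is retried after a restart.
function next(state) {
    return state.pending.length > 0 ? state.pending[0] : null;
}

// Record that a URL has been fully handled.
function finish(state, url) {
    var i = state.pending.indexOf(url);
    if (i !== -1) {
        state.pending.splice(i, 1);
    }
    if (state.done.indexOf(url) === -1) {
        state.done.push(url);
    }
}

// Serialise the state to text (to be written somewhere durable) and back.
function save(state) {
    return JSON.stringify(state);
}

function restore(text) {
    return JSON.parse(text);
}
```

Saving after every `finish()` call would mean that killing the process loses at most the item that was in flight.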


