Using rclone to Back Up Data

A Stony Brook email address also gives you access to a Google Drive account with virtually unlimited storage (the only restriction is that no single file can exceed 5 TB).  This article explains how to back up your data on SeaWulf to Google Drive using rclone.

Audience: Faculty, Researchers and Staff

This KB Article References: High Performance Computing
This Information is Intended for: Faculty, Researchers, Staff
Last Updated: August 02, 2017

Setting Up Rclone

To back up your data to Google Drive using rclone, first load the following modules:

module load shared
module load rclone/1.36
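
Before continuing, it can help to confirm that the module actually placed rclone on your PATH. The following is a minimal sketch rather than an official SeaWulf tool; the function name have_rclone is made up for illustration.

```shell
# Report whether an rclone executable is visible to the current shell
# after the modules above have been loaded.
have_rclone() {
    if command -v rclone >/dev/null 2>&1; then
        echo "rclone found: $(command -v rclone)"
        return 0
    fi
    echo "rclone not found; try 'module load rclone/1.36' first" >&2
    return 1
}

# Example usage (prints the installed version when rclone is available):
# have_rclone && rclone version
```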

When using rclone for the first time, you will need to go through a one-time configuration process.  The following steps (modified from the rclone documentation) will guide you through this process.  Note that we are using the configuration name "my_backup" in this guide, but you may choose whatever name you wish.  

In the shell, type the following to bring up the interactive configuration process:

rclone config

Next, go through each step in the setup process, entering the answers shown after each prompt below.

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> my_backup
Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Amazon Drive
   \ "amazon cloud drive"
 2 / Amazon S3 (also Dreamhost, Ceph, Minio)
   \ "s3"
 3 / Backblaze B2
   \ "b2"
 4 / Dropbox
   \ "dropbox"
 5 / Encrypt/Decrypt a remote
   \ "crypt"
 6 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
 7 / Google Drive
   \ "drive"
 8 / Hubic
   \ "hubic"
 9 / Local Disk
   \ "local"
10 / Microsoft OneDrive
   \ "onedrive"
11 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
12 / SSH/SFTP Connection
   \ "sftp"
13 / Yandex Disk
   \ "yandex"
Storage> 7
Google Application Client Id - leave blank normally.
client_id> <leave blank and hit enter>
Google Application Client Secret - leave blank normally.
client_secret> <leave blank and hit enter>
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n

At this point in the process you will be asked to log in to your Google account.  An internet browser may open automatically.  If it does not, copy and paste the link provided into a browser window, log in to your Google account, and click the "Allow" button.  Then copy the verification code and paste it into the shell prompt.  Note that you may receive an error message stating, "Failed to save new token in config file: section 'my_backup' not found".  This error can be disregarded.

From here, continue following the interactive process to complete the configuration:

[my_backup]
client_id =
client_secret =
token = {"access_token":"xxxx.x.xxxxx_xxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","RefreshToken":"1/xxxxxxxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxx","token_type":"Bearer","expiry":"2017-07-12T16:46:29.381523567-04:00"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
my_backup            drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

Now that the configuration process is complete, you are almost ready to back up your data.  Before you do, however, go back to your browser, navigate to Google Drive, and create a folder to store your backed-up data.  For the purposes of this guide, we will use a folder called "seawulf_backup".
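
If you would rather stay in the shell, the remote can be checked and the folder created with rclone itself. The sketch below assumes your rclone version provides the listremotes and mkdir subcommands; the helper names remote_configured and make_backup_folder are hypothetical and exist only for this example.

```shell
# Return success if a remote called "$1" appears in the rclone configuration.
# `rclone listremotes` prints one remote per line, each followed by a colon.
remote_configured() {
    rclone listremotes | grep -qxF "$1:"
}

# Create a folder on the remote, but only if the remote is configured.
# $1 = remote name, $2 = folder name
make_backup_folder() {
    if ! remote_configured "$1"; then
        echo "remote '$1' is not configured; run 'rclone config' first" >&2
        return 1
    fi
    rclone mkdir "$1:$2"
}

# Example usage, with the names used in this guide:
# make_backup_folder my_backup seawulf_backup
# rclone lsd my_backup:    # the new folder should appear in this listing
```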


Backing Up Data

Next, navigate to the directory on SeaWulf that contains the files and/or folders that you would like to back up.  To copy a single file to your Google Drive, type the following in the shell:

rclone copy ./myfile.txt my_backup:seawulf_backup

To copy a directory and all of its contents to Google Drive, use the following:

rclone copy ./mydir/ my_backup:seawulf_backup/mydir
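
For larger backups it may be worth previewing the transfer first and verifying it afterwards. The following sketch wraps the copy in a hypothetical helper, backup_and_verify, that uses rclone's --dry-run flag and its check subcommand; it is one possible workflow, not a required one.

```shell
# Copy a local directory to the remote and then verify the result.
# $1 = local directory, $2 = destination such as my_backup:seawulf_backup/mydir
backup_and_verify() {
    # Preview what would be transferred without changing anything.
    rclone copy --dry-run "$1" "$2" || return 1
    # Perform the actual copy.
    rclone copy "$1" "$2" || return 1
    # Compare source and destination; a non-zero exit status means a mismatch.
    rclone check "$1" "$2"
}

# Example usage:
# backup_and_verify ./mydir/ my_backup:seawulf_backup/mydir
```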

Note that there are several optional rclone arguments that you can set.  Two important options are:

--transfers=N (default N=4)
--drive-chunk-size=SIZE (default SIZE=8192 kBytes, which can also be written as 8M)

Increasing the values for these settings may increase transfer rates.  
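
Keep in mind that rclone buffers each upload chunk in memory, one per transfer, so raising both options multiplies memory use. The sketch below gives a rough estimate of that buffer size; the function name upload_buffer_mib is hypothetical, and the figures ignore rclone's other memory use.

```shell
# Estimate upload buffer memory in MiB as transfers x chunk size (in MiB).
# $1 = number of transfers, $2 = chunk size in MiB
upload_buffer_mib() {
    echo $(( $1 * $2 ))
}

# Default settings: 4 transfers with 8 MiB chunks
upload_buffer_mib 4 8     # prints 32
# A more aggressive choice: 8 transfers with 64 MiB chunks
upload_buffer_mib 8 64    # prints 512

# The matching rclone invocation would look like:
# rclone copy --transfers=8 --drive-chunk-size=64M ./mydir/ my_backup:seawulf_backup/mydir
```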

Although the speed at which rclone can copy data to Google Drive depends on a variety of factors (including the settings used and the available bandwidth), our benchmarks suggest that you may see single-file transfer speeds of around 350-450 megabits per second.
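
To get a feel for what these rates mean in practice, the transfer time for a single large file can be estimated with simple arithmetic. This is only a sketch: estimate_seconds is a hypothetical helper, it uses decimal units (1 GB = 8000 megabits), and real transfers will vary.

```shell
# Estimate transfer time in seconds for a single file.
# $1 = file size in gigabytes, $2 = transfer rate in megabits per second
# Integer arithmetic is used, so the result is rounded down.
estimate_seconds() {
    echo $(( $1 * 8000 / $2 ))
}

# 100 GB at 400 megabits per second
estimate_seconds 100 400    # prints 2000, i.e. roughly 33 minutes
```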

However, Google limits the number of files that can be transferred simultaneously.  Thus, if you wish to back up a directory with a large number of small files, the transfer rate may be much slower.  Because of this, it may be useful to create a compressed tarball of any directory with a large number of files before using rclone.  To do this, type the following in the shell:

tar -zcvf mydir.tar.gz ./mydir

This compressed archive file can then be copied to Google Drive with rclone as before.
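
Since the archive becomes your only backup copy of those files, it may be worth confirming that it is readable before uploading it. The sketch below wraps both steps in a hypothetical helper, archive_dir, which writes the archive to the current directory and prints its name.

```shell
# Create NAME.tar.gz from a directory and confirm the archive is readable.
# $1 = directory to archive
archive_dir() {
    name=$(basename "$1")
    # -C changes to the parent directory so the archive holds relative paths.
    tar -zcf "$name.tar.gz" -C "$(dirname "$1")" "$name" || return 1
    # Listing the archive reads it end to end, which catches truncation.
    tar -tzf "$name.tar.gz" >/dev/null || return 1
    echo "$name.tar.gz"
}

# Example usage:
# archive=$(archive_dir ./mydir)
# rclone copy "$archive" my_backup:seawulf_backup
```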

Some sample rclone scripts with additional options can also be found in the following SeaWulf directory:

/gpfs/projects/samples/rclone

 

Getting Help


The Division of Information Technology provides support on all of our services. If you require assistance please submit a support ticket through the IT Service Management system.

Supported By


IACS Support System