Box.com bulk transfer

Latest revision as of 15:22, 15 October 2019

UAB has an Enterprise contract with Box.com, which is currently in BETA.

This page describes what we have learned about doing bulk-transfers of data.


== Warning: size limitations ==

Box.com claims to have a 5G max filesize limit.

  • Now 15G max filesize (2/18/2016; private email), but "that file size limit is still considered to be in a beta phase"

There was a rumor this would be increased in early 2016. Officially, this is all we know: https://community.box.com/t5/Managing-Your-Content/What-s-the-maximum-file-size-I-can-upload/ta-p/307

If you need to work around this, you can use the Linux "split" utility (http://ss64.com/bash/split.html) to cut a large file into pieces below the limit:

# chop file into 4G pieces

split \
--bytes=4000m \
big_file.fastq.gz \
big_file.fastq.gz.split4g.

# record checksums of original and chunks

md5sum \
big_file.fastq.gz \
big_file.fastq.gz.split4g.* \
> big_file.fastq.gz.md5
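
On the receiving side the pieces have to be joined again and checked. The following is a minimal sketch, assuming the chunks and the .md5 file produced above have all been downloaded into the current directory. The function name rejoin_chunks is hypothetical, and md5sum is assumed to be the GNU coreutils version.

```shell
# rejoin_chunks is an illustrative helper name, not part of any tool.
# Usage: rejoin_chunks big_file.fastq.gz
rejoin_chunks() {
    local original="$1"                    # name of the file that was split
    local prefix="${original}.split4g."    # chunk prefix used in the split command above

    # split names its output aa, ab, ac, ... so the glob expands in the right order
    cat "${prefix}"* > "${original}" || return 1

    # md5sum -c recomputes every checksum listed in the file and compares it
    md5sum -c "${original}.md5"
}
```

If any line of the md5sum output reads FAILED, re-download that chunk and run the function again.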

== Warning: time stamps ==

When using an FTP client to transfer data up, it is easy to lose both modification and creation timestamps. In particular, many clients will (optionally) preserve modification time, but few will (optionally) preserve creation date.

{| class="wikitable"
|-
! FTP client
! cost
! platform
! preserve mod_date
! preserve create_date
|-
| lftp
| free
| linux/cmd_line
| yes
| no
|-
| SmartFTP
| $$
| Win Only/GUI
| yes
| can be enabled
|-
| fileZilla
| free
| linux & win/GUI_only
| can be enabled
| no
|-
| ftp_ssl
| free
| linux/cmd_line
| yes
| no
|}

Filezilla on create times

  • request closed, no plans to fix: https://trac.filezilla-project.org/ticket/2347
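
Because a client may drop time stamps on upload, one option is to record them in a manifest that travels with the data. This is only a sketch under stated assumptions: save_mtimes and restore_mtimes are hypothetical helper names, GNU find and GNU touch are required, and only modification times are covered (file names containing tabs or newlines are not handled).

```shell
# save_mtimes / restore_mtimes are illustrative names, not part of lftp or Box.

# Write "epoch_seconds<TAB>relative_path" for every file under src_dir.
save_mtimes() {
    local src_dir="$1" manifest="$2"
    # %T@ = modification time in seconds since the epoch, %P = path relative to src_dir
    ( cd "$src_dir" && find . -type f -printf '%T@\t%P\n' ) > "$manifest"
}

# Re-apply the recorded modification times to a downloaded copy in dest_dir.
restore_mtimes() {
    local dest_dir="$1" manifest="$2" tab epoch path
    tab="$(printf '\t')"
    while IFS="$tab" read -r epoch path; do
        touch -d "@${epoch}" "${dest_dir}/${path}"
    done < "$manifest"
}
```

Run save_mtimes local_src_dir mtimes.tsv before the upload, upload mtimes.tsv next to the data, and run restore_mtimes local_dest_dir mtimes.tsv after a later download.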

== Warning: Shared-to-you folders can't be moved ==

If someone creates a folder and shares it with you (as a reader, editor, co-owner, etc.), it will live in your top-level directory, and you will NOT be able to move it into any subfolder.

"Currently, users can't rearrange their own view of folders they are invited to collaborate within. As you note, when someone invites you to collaborate in a folder that you have never had access to before, you will see that folder on your root level." [box.com]

Workaround: if your collaborator makes you the full owner of the folder, then you will be able to move it.

Fix timeframe: "We've heard requests that people be able to rearrange their views before, and this is being considered as part of a larger product experience change next year" [box.com]

== Rclone ==

Check out our YouTube video: Setting up rclone for Box.com file transfer to/from Cheaha (https://youtu.be/UbFJV9TO4KE)

===== rclone config =====

To transfer data from Cheaha to Box, open a terminal (inside the VNC session) and load the rclone module:

module load rclone/1.48.0

The initial setup for Box involves getting a token from Box; rclone config walks you through it. Here is an example of how to make a remote called "remote". First run:

 rclone config

This will guide you through an interactive setup process:

No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Box
   \ "box"
[snip]
Storage> box
Box App Client Id - leave blank normally.
client_id> 
Box App Client Secret - leave blank normally.
client_secret> 
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> y
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code
--------------------
[remote]
client_id = 
client_secret = 
token = {"access_token":"XXX","token_type":"bearer","refresh_token":"XXX","expiry":"XXX"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
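
Once the dialog finishes, it can be worth confirming that the remote was actually saved before starting a transfer. A minimal sketch, assuming the remote was named "remote" as in the transcript above; remote_is_configured is a hypothetical helper name, while rclone listremotes is the standard rclone subcommand that prints each configured remote with a trailing colon.

```shell
# remote_is_configured is an illustrative name, not an rclone command.
# Succeeds if the given remote name appears in the rclone configuration.
remote_is_configured() {
    local name="$1"
    # listremotes prints one remote per line, e.g. "remote:"
    rclone listremotes | grep -Fxq "${name}:"
}
```

For example, remote_is_configured remote && rclone lsd remote: lists the top-level Box folders only if the remote exists.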

===== rclone ls =====

List the objects in the path with size and path.

rclone ls remote:path [flags]
 -h, --help   help for ls

Note that ls and lsl recurse by default; use "--max-depth 1" to stop the recursion. There are several related list commands:

  • ls to list size and path of objects only
  • lsl to list modification time, size and path of objects only
  • lsd to list directories only

Unlike ls and lsl, lsd does not recurse by default; use "-R" to make it recurse. Listing a nonexistent directory will produce an error.
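
The recursion behaviour is easy to mix up, so here is a small sketch of the three listing variants. The wrapper names are hypothetical; the rclone subcommands and flags are the ones described above.

```shell
# Illustrative wrappers; each takes a path such as remote:some/folder

# Files directly under the path only (ls recurses unless depth is limited)
box_ls_top()   { rclone ls --max-depth 1 "$1"; }

# All files below the path, with modification time, size and path
box_lsl_all()  { rclone lsl "$1"; }

# All directories below the path (lsd needs -R to recurse)
box_dirs_all() { rclone lsd -R "$1"; }
```

For example, box_ls_top remote:project shows only the files sitting directly in the project folder.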

===== rclone mkdir =====

Make the path if it doesn’t already exist.

rclone mkdir remote:path [flags]
  -h, --help   help for mkdir

===== rclone copy =====

Copy files from source to dest, skipping files that have already been copied. Note: use the -P/--progress flag to view real-time transfer statistics.

rclone copy source:sourcepath dest:destpath
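
After a large copy it is reasonable to confirm that both sides match. A minimal sketch, assuming a configured remote; box_copy_verified is a hypothetical helper name, and rclone check is the standard subcommand that compares source and destination by size and, where the backend supports it, by hash.

```shell
# box_copy_verified is an illustrative name, not an rclone command.
# Usage: box_copy_verified /data/run1 remote:run1
box_copy_verified() {
    local src="$1" dest="$2"

    # -P shows live transfer statistics while copying
    rclone copy -P "$src" "$dest" || return 1

    # Compare source and destination; exits non-zero if differences are found
    rclone check "$src" "$dest"
}
```

Because copy skips files that are already present, re-running the function after a failure only transfers what is still missing.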

===== rclone copy directory =====

Pending issue: https://github.com/rclone/rclone/issues/1228

Workaround:

rclone copy source:sourcepath "dest:DIR"

When DIR is passed as a variable, rclone creates a remote directory with that name.
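
rclone copy transfers the contents of the source directory rather than the directory itself, so the directory name has to be repeated on the destination side. One possible reading of the workaround, as a sketch: copy_dir_as_dir is a hypothetical helper name, and DIR is derived from the source path with basename.

```shell
# copy_dir_as_dir is an illustrative name, not an rclone command.
# Usage: copy_dir_as_dir /data/run1 remote
# Result: the files end up under remote:run1 instead of directly under remote:
copy_dir_as_dir() {
    local src_dir="$1" remote="$2" DIR

    # Last path component of the source, e.g. /data/run1 -> run1
    DIR="$(basename "$src_dir")"

    rclone copy "$src_dir" "${remote}:${DIR}"
}
```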

===== rclone delete =====

Remove the contents of path. rclone delete only deletes objects and leaves the directory structure alone. If you want to delete a directory and all of its contents, use rclone purge.

rclone delete remote:path [flags]
  -h, --help   help for delete

===== rclone purge =====

Remove the path and all of its contents.

rclone purge remote:path [flags]
  -h, --help   help for purge
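
Since delete and purge cannot be undone from the command line, a cautious pattern is to preview the operation first. A minimal sketch: box_delete_checked is a hypothetical helper name, and --dry-run is rclone's global flag that reports what would happen without changing anything.

```shell
# box_delete_checked is an illustrative name, not an rclone command.
# Usage: box_delete_checked remote:scratch/old_run
box_delete_checked() {
    local target="$1" answer

    # Preview: report what would be deleted, but delete nothing
    rclone delete --dry-run "$target" || return 1

    printf 'Really delete the objects listed above from %s? [y/N] ' "$target" >&2
    read -r answer
    if [ "$answer" = "y" ]; then
        rclone delete "$target"
    else
        echo "Nothing deleted." >&2
        return 1
    fi
}
```

The same preview works for purge: rclone purge --dry-run remote:path.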

== lftp mirror -R examples (UPload) ==

lftp mirror

  • "mirror" copies directory hierarchies DOWN from box.com to local
  • "mirror -R" copies directory hierarchies UP from local to box.com

Error handling

  • the Box server frequently loses the connection (fails) on particular files
  • just re-run the "mirror -R" and it will upload only new/failed files
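
The re-run advice can be automated with a small retry loop. This is a sketch under assumptions: retry_lftp is a hypothetical helper name, the script file is one like the box_upload.lftp created in the "scripted lftp" section, and lftp is assumed to exit non-zero when the mirror reports failed files.

```shell
# retry_lftp is an illustrative name, not part of lftp.
# Usage: retry_lftp box_upload.lftp [max_tries] [delay_seconds]
retry_lftp() {
    local script="$1" max_tries="${2:-5}" delay="${3:-10}" try=1

    while [ "$try" -le "$max_tries" ]; do
        # A re-run of "mirror -R" only uploads new or previously failed files
        lftp -f "$script" && return 0
        echo "lftp attempt ${try} of ${max_tries} failed" >&2
        try=$((try + 1))
        sleep "$delay"
    done
    return 1
}
```

A bounded number of attempts keeps a persistent problem, such as a wrong password, from looping forever.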

=== Interactive lftp ===

lftp ftp.box.com

> user BLAZERID@uab.edu users_BOX_external_password
> mirror --parallel=10 -R local_src_dir box_dest_dir

=== Single-line lftp (non-shared box) ===

Warning: This one makes the password visible to "ps", so it should only be used on personal machines.

lftp -u BLAZERID@uab.edu,users_BOX_external_password ftp.box.com << EOF
mirror --parallel=10 -R local_src_dir box_dest_dir
EOF

=== scripted lftp ===

cat > box_upload.lftp << EOF

open ftp.box.com
user BLAZERID@uab.edu users_BOX_external_password
mirror --parallel=10 -R local_src_dir /box_dest_dir
EOF
chmod 700 box_upload.lftp
lftp -f box_upload.lftp ; echo lftp_RC=$?

=== scripted lftp - externalize password ===

Store your Box external password once in the file ~/.netrc (this works for wget, lftp, etc.):

cat >> ~/.netrc << EOF

machine ftp.box.com
login BLAZERID@uab.edu
password user_Box_External_PW

EOF

chmod 700 ~/.netrc
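
Because ~/.netrc holds a password in plain text, it is worth checking that no one else can read it. A minimal sketch, assuming GNU stat; netrc_is_private is a hypothetical helper name.

```shell
# netrc_is_private is an illustrative name, not part of lftp or wget.
# Succeeds only if the file grants no permissions to group or others.
netrc_is_private() {
    local file="${1:-$HOME/.netrc}" perms

    # %a prints the permission bits in octal, e.g. 600 or 644 (GNU stat)
    perms="$(stat -c '%a' "$file")" || return 1

    case "$perms" in
        ?00) return 0 ;;   # owner-only access, e.g. 600 or 700
        *)   echo "$file has mode $perms; consider: chmod 600 $file" >&2
             return 1 ;;
    esac
}
```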

Then, for each transfer, create a local .lftp file without a password. This is much more secure and easier to keep up to date.

cat > box_upload.lftp << EOF

open ftp.box.com
mirror -R local_src_dir /box_dest_dir
EOF

lftp -f box_upload.lftp ; echo lftp_RC=$?

== lftp mirror examples (DOWNload) ==

=== scripted lftp ===

Arguments

  • --loop: keep restarting until there are no new files left to download; helps if someone else is uploading to that directory while you're downloading it!
  • -v: verbose level 1; includes bytes transferred and transfer speed
  • --parallel=10: use 10 concurrent TCP/IP connections (much faster)

cat > box_download.lftp << EOF

open ftp.box.com
user BLAZERID@uab.edu users_BOX_external_password
mirror --loop -v --parallel=10 /box_remote_src_dir local_dest_dir
EOF
chmod 700 box_download.lftp
lftp -f box_download.lftp ; echo lftp_RC=$?
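
When the same download is repeated for several folders, the script file can be generated instead of typed. A minimal sketch, assuming the password lives in ~/.netrc so that none is written into the script; make_download_script is a hypothetical helper name, and the directory arguments are quoted so that names containing spaces survive.

```shell
# make_download_script is an illustrative name, not part of lftp.
# Usage: make_download_script /box_remote_src_dir local_dest_dir [script_file]
make_download_script() {
    local remote_dir="$1" local_dir="$2" out="${3:-box_download.lftp}"

    # Credentials are expected to come from ~/.netrc, so no "user" line is written
    cat > "$out" <<EOF
open ftp.box.com
mirror --loop -v --parallel=10 "${remote_dir}" "${local_dir}"
EOF
}
```

The generated file is then run the same way as above: lftp -f box_download.lftp ; echo lftp_RC=$?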

== Issues to resolve ==

  • Routing over Internet2
      • we see our traffic randomly going over the commodity internet

== Linux Support ==

Unfortunately, Box doesn't provide a Linux client (is it on the road map?).
