Box.com bulk transfer

From Cheaha

Revision as of 16:47, 15 January 2016

UAB has an Enterprise contract with Box.com, which is currently in BETA.

This page describes what we have learned about doing bulk-transfers of data.

Warning: size limitations

Box.com states that the maximum file size is 5 GB.

There was a rumor that this limit would be increased in early 2016. Officially, this is all we know: https://community.box.com/t5/Managing-Your-Content/What-s-the-maximum-file-size-I-can-upload/ta-p/307

To work around this limit, use the Linux "split" utility to cut the file into pieces below 5 GB, and record checksums so the pieces can be verified later:

# chop file into ~4 GB pieces (4000 MiB each)
split \
    --bytes=4000m \
    big_file.fastq.gz \
    big_file.fastq.gz.split4g.

# record checksums of original and chunks
md5sum \
    big_file.fastq.gz \
    big_file.fastq.gz.split4g.* \
    > big_file.fastq.gz.md5
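After the pieces have been downloaded again, they can be checked and joined back together with md5sum and cat. The sketch below is a self-contained demo of the whole round trip on a small throwaway file, using 1k chunks instead of 4000m so it runs instantly; the temporary directory and the random test file are assumptions for illustration, while the file names follow the example above. It assumes a recent GNU coreutils (md5sum --ignore-missing is not available in old versions).

```shell
#!/usr/bin/env bash
# Demo: split a file, record checksums, then verify and reassemble the chunks.
# Uses a tiny generated file and 1k chunks so it runs in a fraction of a second.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for big_file.fastq.gz: 5000 random bytes.
head -c 5000 /dev/urandom > big_file.fastq.gz

# Same commands as above, with a much smaller chunk size.
split --bytes=1k big_file.fastq.gz big_file.fastq.gz.split4g.
md5sum big_file.fastq.gz big_file.fastq.gz.split4g.* > big_file.fastq.gz.md5

# Simulate the download side, where only the chunks and the .md5 file exist.
rm big_file.fastq.gz

# 1. Check each chunk; --ignore-missing skips the line for the absent original.
md5sum --check --ignore-missing big_file.fastq.gz.md5

# 2. split's suffixes (aa, ab, ac, ...) sort in order, so the glob
#    concatenates the chunks back in their original sequence.
cat big_file.fastq.gz.split4g.* > big_file.fastq.gz

# 3. Verify the rebuilt file (and the chunks again) against the recorded sums.
md5sum --check big_file.fastq.gz.md5
echo "reassembled OK in $workdir"
```

On real data only steps 1 to 3 are needed, run in the directory that holds the downloaded chunks and the .md5 file.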

Warning: time stamps

When uploading data with an FTP client, it is easy to lose both the modification and the creation timestamps. Many clients can optionally preserve the modification time, but few can preserve the creation date.

FTP client   Type / platform / cost           Keeps modification time   Keeps creation time
SmartFTP     GUI / Windows only / paid        yes                       can be enabled
lftp         command line / Linux / free      yes                       no
FileZilla    GUI / Linux + Windows / free     can be enabled            no
ftp_ssl      command line / Linux / free      yes                       no

See also: FileZilla on creation times.
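Because most clients lose at least one timestamp, one possible workaround (not something Box.com or the clients above provide) is to record modification times in a small manifest before uploading and re-apply them after downloading. The sketch below assumes GNU find and GNU touch; the function names save_mtimes and restore_mtimes and the tab-separated manifest format are hypothetical. Creation times cannot be restored this way, because Linux offers no standard way to set them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: save modification times before upload, restore them
# after download. Requires GNU find (-printf) and GNU touch (-d @EPOCH).
set -euo pipefail

# Write "<mtime as epoch seconds><TAB><path relative to DIR>" for every file.
# (File names containing newlines or tabs are not supported by this format.)
save_mtimes() {
  local dir=$1 manifest=$2
  (cd "$dir" && find . -type f -printf '%T@\t%p\n') > "$manifest"
}

# Re-apply the recorded mtimes to a re-downloaded copy of the directory.
restore_mtimes() {
  local dir=$1 manifest=$2 ts path
  while IFS=$'\t' read -r ts path; do
    if [ -e "$dir/$path" ]; then
      touch -m -d "@$ts" "$dir/$path"
    fi
  done < "$manifest"
}

# Usage demo on a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/data"
echo hello > "$demo/data/a.txt"
touch -m -d '2015-06-01 12:00:00' "$demo/data/a.txt"

save_mtimes "$demo/data" "$demo/mtimes.tsv"
touch "$demo/data/a.txt"   # simulate a transfer that reset the mtime to "now"
restore_mtimes "$demo/data" "$demo/mtimes.tsv"
date -r "$demo/data/a.txt" '+%F %T'   # prints 2015-06-01 12:00:00
```

Keep the manifest outside the directory being uploaded (or upload it alongside the data) so it survives the round trip.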

Warning: Shared-to-you folders can't be moved

If someone creates a folder and shares it with you (as a reader, editor, co-owner, etc.), it will appear in your top-level directory, and you will NOT be able to move it into any subfolder.

"Currently, users can't rearrange their own view of folders they are invited to collaborate within. As you note, when someone invites you to collaborate in a folder that you have never had access to before, you will see that folder on your root level." [box.com]

Workaround: if your collaborator makes you the full owner of the folder, then you will be able to move it.

Fix timeframe: "We've heard requests that people be able to rearrange their views before, and this is being considered as part of a larger product experience change next year" [box.com]

lftp mirror -R examples (UPload)

lftp mirror

  • "mirror" copies directory hierarchies DOWN from box.com to local
  • "mirror -R" copies directory hierarchies UP from local to box.com

error handling

  • the Box server frequently drops the connection (the transfer fails) on particular files
  • just re-run the same "mirror -R" command; it uploads only new or failed files.
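Re-running by hand can be automated. The sketch below is a hypothetical retry helper (the name retry and its argument order are made up for illustration) that re-runs a command until it exits with status 0 or a maximum number of attempts is reached. It assumes, as the lftp_RC check in the scripts on this page suggests, that lftp reports a failed mirror with a nonzero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry MAX DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX (>= 1) times, sleeping DELAY
# seconds between attempts. Returns the last exit status if all attempts fail.
retry() {
  local max=$1 delay=$2 attempt rc=0
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    "$@" && return 0
    rc=$?
    echo "attempt $attempt/$max failed (rc=$rc): $*" >&2
    if (( attempt < max )); then
      sleep "$delay"
    fi
  done
  return "$rc"
}
```

For example, with the box_upload.lftp script from the "scripted lftp" upload example, `retry 5 60 lftp -f box_upload.lftp ; echo lftp_RC=$?` makes up to five attempts, one minute apart; each attempt uploads only what is still missing.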

Interactive lftp

lftp ftp.box.com

> user BLAZERID@uab.edu users_BOX_external_password
> mirror --parallel=10 -R local_src_dir box_dest_dir

Single-line lftp (non-shared machine)

Warning: this form puts the password on the command line, where other users on the machine can see it with "ps", so use it only on a personal machine.

lftp -u BLAZERID@uab.edu,users_BOX_external_password ftp.box.com << EOF
mirror --parallel=10 -R local_src_dir box_dest_dir
EOF

scripted lftp

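# quoting 'EOF' stops the shell from expanding $ or ` characters in the password
cat > box_upload.lftp << 'EOF'
open ftp.box.com
user BLAZERID@uab.edu users_BOX_external_password
mirror --parallel=10 -R local_src_dir /box_dest_dir
EOF
# the script contains your password: make it accessible only to you
chmod 700 box_upload.lftp
lftp -f box_upload.lftp ; echo lftp_RC=$?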

lftp mirror examples (DOWNload)

scripted lftp

Arguments

  • --loop: repeat the mirror until no new or changed files are found; this helps if someone else is still uploading to that directory while you download it.
  • -v: verbose level 1, which reports bytes transferred and transfer speed.
  • --parallel=10: transfer up to 10 files at once over 10 concurrent TCP/IP connections (much faster).
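# quoting 'EOF' stops the shell from expanding $ or ` characters in the password
cat > box_download.lftp << 'EOF'
open ftp.box.com
user BLAZERID@uab.edu users_BOX_external_password
mirror --loop -v --parallel=10 /box_remote_src_dir local_dest_dir
EOF
# the script contains your password: make it accessible only to you
chmod 700 box_download.lftp
lftp -f box_download.lftp ; echo lftp_RC=$?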
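A downloaded tree can be checked end to end in the same spirit as the split example near the top of this page: record an md5sum for every file before uploading, keep that list outside the data directory, and check it after downloading. The sketch below assumes GNU find and md5sum; the function names make_manifest and check_manifest, the manifest name MANIFEST.md5, and the demo directories are hypothetical, and cp stands in for the "mirror -R" upload plus "mirror" download.

```shell
#!/usr/bin/env bash
# Sketch: checksum every file in a tree before upload, verify after download.
set -euo pipefail

# Write md5sums (paths relative to DIR) for all regular files under DIR.
# MANIFEST must live outside DIR, or it would checksum itself.
make_manifest() {
  local dir=$1 manifest=$2
  (cd "$dir" && find . -type f -exec md5sum {} +) > "$manifest"
}

# Check a downloaded copy of the tree; --quiet prints only failures.
check_manifest() {
  local dir=$1 manifest=$2
  (cd "$dir" && md5sum --check --quiet) < "$manifest"
}

# Usage demo on throwaway directories.
demo=$(mktemp -d)
mkdir -p "$demo/local_src_dir/sub"
echo one > "$demo/local_src_dir/a.txt"
echo two > "$demo/local_src_dir/sub/b.txt"
make_manifest "$demo/local_src_dir" "$demo/MANIFEST.md5"

# Stand-in for "mirror -R" up to Box and "mirror" back down.
cp -r "$demo/local_src_dir" "$demo/local_dest_dir"

check_manifest "$demo/local_dest_dir" "$demo/MANIFEST.md5" && echo "all files verified"
```

On real data, run make_manifest on local_src_dir before the upload and check_manifest on local_dest_dir after the download; any file that changed or went missing in transit is reported.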