Data Transfer
Moving Data
NOTE: On October 1st, 2019 the ncsa#Nearline endpoint will be read-only in preparation for the end of the NSF allocated phase of Blue Waters on December 19, 2019. Users are advised to review their storage needs in light of this change.
NCSA recommends using Globus Online for data transfer. Two endpoints are currently supported: ncsa#BlueWaters and ncsa#Nearline. The ncsa#BlueWaters endpoint transfers data to and from the Blue Waters Lustre file system, and the ncsa#Nearline endpoint accesses the Blue Waters Nearline storage system, which automatically migrates data to a tape subsystem. For each endpoint, Globus Online chooses the "least busy" server for each transfer, distributing the data movement load so that data moves as efficiently as possible. Use the ncsa#BlueWaters endpoint to transfer data into and out of your home and project spaces. Use the ncsa#Nearline endpoint to transfer data to and from the Nearline tape system.
Helpful Tips
- Do not use scp/sftp/globus-url-copy on the login nodes to transfer data. Performance will be far below that of Globus Online and the network load will impact other users.
- Use Globus Connect if you need to transfer files with your local office machine.
- Globus Online provides an interactive web interface as well as a command line interface. For more information, please refer to the User Documentation or contact help+bw@ncsa.illinois.edu.
- See the 2019 Blue Waters Symposium presentation on Nearline Data Retrieval for a recent discussion on retrieving data from Nearline during the final months of Blue Waters general availability.
Refer to the documentation https://bluewaters-archive.ncsa.illinois.edu/data-transfer-doc for additional information on data transfer.
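As a rough illustration of the command line interface mentioned in the tips above, the following Python sketch assembles a `globus transfer` invocation and prints it without running it. The endpoint IDs, paths, and task label are hypothetical placeholders, not values from this page. It also assumes a current Globus CLI, which identifies endpoints by UUID rather than by names such as ncsa#BlueWaters, so look up the real IDs before adapting it.

```python
import shlex

# Hypothetical placeholders: substitute real endpoint UUIDs before use.
SOURCE_ENDPOINT = "SOURCE-ENDPOINT-UUID"  # stands in for the Blue Waters endpoint
DEST_ENDPOINT = "DEST-ENDPOINT-UUID"      # stands in for a Globus Connect endpoint


def build_transfer_command(src_ep, src_path, dst_ep, dst_path, label, recursive=True):
    """Return the argument list for a `globus transfer` call (not executed here)."""
    cmd = ["globus", "transfer"]
    if recursive:
        cmd.append("--recursive")  # transfer a whole directory tree
    cmd += ["--label", label]      # human-readable name for the transfer task
    cmd += [f"{src_ep}:{src_path}", f"{dst_ep}:{dst_path}"]
    return cmd


command = build_transfer_command(
    SOURCE_ENDPOINT, "/projects/example/run01/",  # hypothetical source path
    DEST_ENDPOINT, "/home/user/run01/",           # hypothetical destination path
    label="run01 copy",
)

# Print a shell-quoted version for review instead of executing it.
print(shlex.join(command))
```

Building the argument list separately from execution makes it easy to inspect the command first, and to run it from a machine other than the login nodes, in line with the tips above.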
Sharing Data
The Blue Waters Pilot Data Sharing Service (DSS) is not currently available (as of 1/8/2016). The Blue Waters data team is reevaluating the service and assessing the requirements of the user community. Blue Waters partners can still share data using Globus Online from their project's share directory. For partners who participated in the pilot project and are currently sharing data, the catalogue listing of your shared data is no longer visible on the portal. The data still resides on the system in its current location and remains accessible via Globus Online sharing.
Contact help+bw@ncsa.illinois.edu for any concerns or questions.
Import/Export (IE) System
The Blue Waters system environment provides 28 dedicated Import/Export (IE) servers to efficiently manage and move data via Globus Online. The IE servers are provisioned to offload data movement from the external login nodes. Data transfer rates through the IE environment have been measured at greater than 100 GB/s aggregate for read and write. The IE servers (Dell R720) are configured with dual-socket 16-core processors, 192 GB of memory, an FDR InfiniBand (IB) card, and a 40 Gigabit Ethernet card dedicated to data transfer. The following schematic shows the relationship of the Blue Waters Lustre filesystem, IE servers and the local and wide area networks (LAN and WAN respectively).
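As a quick sanity check on the IE figures quoted above, the aggregate Ethernet bandwidth of the IE servers can be compared with the measured transfer rate. This is a back-of-the-envelope sketch only. It assumes decimal units, one 40 Gigabit Ethernet link per server, and no protocol overhead, and it treats the quoted 100 GB/s as a lower bound on the measured rate.

```python
# Figures taken from the text above.
IE_SERVERS = 28
ETHERNET_GBIT_PER_SERVER = 40   # one 40 Gigabit Ethernet card per IE server
MEASURED_GBYTE_PER_S = 100      # "greater than 100 GB/s aggregate"

# Theoretical aggregate line rate across all IE servers.
aggregate_gbit_per_s = IE_SERVERS * ETHERNET_GBIT_PER_SERVER
aggregate_gbyte_per_s = aggregate_gbit_per_s / 8  # 8 bits per byte

# Fraction of the theoretical Ethernet ceiling reached by the measured rate.
utilization = MEASURED_GBYTE_PER_S / aggregate_gbyte_per_s

print(f"Aggregate Ethernet line rate: {aggregate_gbit_per_s} Gb/s "
      f"({aggregate_gbyte_per_s:.0f} GB/s)")
print(f"Measured rate is at least {utilization:.0%} of that ceiling")
```

Under these assumptions the Ethernet ceiling works out to 140 GB/s, so a measured rate above 100 GB/s corresponds to more than roughly 70% of the theoretical line rate.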
Nearline (Tape) System
The Nearline system provides access to a mass storage system managed by IBM's HPSS software. There are 50 "mover" nodes that perform the main function of moving data into and out of the Nearline system. The mover nodes interact directly with the Globus Online service to share the data transfer load as efficiently as possible. Inside the system, the mover nodes access the tape libraries via 8 Gigabit Fibre Channel connections to 366 IBM TS1140 tape drives. The tape drives are spread across four separate dual-arm libraries, each with more than 15,800 slots. Each tape can hold 4 TB of data. The HPSS mover environment consists of Dell R720 machines, each with 16 processors, 256 GB of memory, two 8 Gigabit Fibre Channel (FC) cards, two bonded 40 Gigabit Ethernet cards, and one FDR IB card. The mover systems communicate over a separate management network with two 64-core IBM machines (in an active/passive failover configuration) that run the DB2 portion of the HPSS system. The basic relationships between the Nearline system, Blue Waters and the rest of the world are shown below.
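The library figures quoted above also imply a lower bound on raw tape capacity, worked out in the sketch below. It assumes every slot holds one tape at the stated 4 TB capacity, uses decimal units, and ignores compression, cleaning cartridges, and empty slots, so the result is illustrative rather than an official capacity figure.

```python
# Figures taken from the text above.
LIBRARIES = 4
MIN_SLOTS_PER_LIBRARY = 15_800   # "more than 15,800 slots" each
TAPE_CAPACITY_TB = 4             # capacity of one tape

# Lower bounds, since each library has more than 15,800 slots.
min_slots = LIBRARIES * MIN_SLOTS_PER_LIBRARY
min_capacity_tb = min_slots * TAPE_CAPACITY_TB
min_capacity_pb = min_capacity_tb / 1000  # decimal petabytes

print(f"Slots across all libraries: more than {min_slots:,}")
print(f"Raw capacity if every slot held a 4 TB tape: "
      f"more than {min_capacity_pb:.1f} PB")
```

With these inputs the four libraries hold more than 63,200 slots, or more than about 252.8 PB of raw tape capacity if fully populated.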