FAQ: Difference between revisions

From Cheaha
(→‎What kind of security environment do you provide?: Fix linux link and add GNU reference)
Revision as of 15:14, 14 June 2012

A FAQ for things you might like to know

Networking Questions

General

What type of networking is used on campus?

The campus network is an Ethernet packet-based network.

What is Ethernet?

Ethernet is a family of packet-based computer networking technologies for local area and wide area networks (LANs and WANs). Most laptops, desktop computers, server computers, cable modems and DSL modems have built-in support for Ethernet networks. For more information and history, read the Wikipedia entry on Ethernet.

(Credits Wikipedia:Ethernet April 08, 2011)

What is the recommended configuration for a researcher's network connection?

It depends on the work that you do. If your work frequently involves moving data sets to and from your computer for visualization, analysis, or collaboration, you should seriously consider a 100Mbs full-duplex network connection as your baseline.

What is the difference between Mbs and MBs?

"Mbs" stands for "megabits per second". "MBs" stands for "megabytes per second". A lower-case "b" designates bits (1's and 0's) and an upper-case "B" designates bytes. 1 byte equals 8 bits.

Bits are used to measure network data transfer rates (bits per second) and bytes are used to measure data storage sizes. When stored data is moved across a network, however, it is convenient to measure the transfer rate as the number of bytes of stored data moved in one second.
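As a quick illustration of the bit/byte distinction, here is a minimal Python sketch that converts between the two units. The function names are made up for this example and are not part of any tool described in this FAQ.

```python
BITS_PER_BYTE = 8  # 1 byte equals 8 bits


def megabits_to_megabytes(megabits):
    """Convert a quantity in megabits (Mb) to megabytes (MB)."""
    return megabits / BITS_PER_BYTE


def megabytes_to_megabits(megabytes):
    """Convert a quantity in megabytes (MB) to megabits (Mb)."""
    return megabytes * BITS_PER_BYTE


if __name__ == "__main__":
    # Raw conversion only; no protocol overhead is taken into account here.
    print(megabits_to_megabytes(100))
    print(megabytes_to_megabits(700))
```

The conversion is a plain factor of 8. Real transfers lose some capacity to overhead, so the raw figure is an upper bound.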

What do 10Mbs, 100Mbs, and 1Gbs mean?

Network speeds are listed by the number of bits (1's and 0's) they can transfer in one second. Modern networks transfer millions of bits per second, designated "Mbs" and read "mega-bits per second". Common network speeds are 10Mbs, 100Mbs, and 1000Mbs. 1000 megabits are equal to 1 gigabit, and 1000Mbs is typically written "1Gbs" and read "one gigabit per second" (1 billion bits per second).

How fast are 10Mbs, 100Mbs, and 1Gbs networks?

To get a sense for the performance of different network speeds, it's easiest to use the following rules of thumb for comparing network speeds to data set sizes and their transfer time:

  • 10Mbs can transfer 1MBs
  • 100Mbs can transfer 10MBs
  • 1000Mbs (1Gbs) can transfer 100MBs

A CDROM can hold 700MB of data. Transferring this much data would take about 7 seconds on a 1Gbs network, 70 seconds (more than 1 minute) to transfer on a 100Mbs network, and 700 seconds (more than 10 minutes) to transfer on a 10Mbs network.
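The CDROM figures above can be reproduced with a few lines of Python. This is only a sketch of the rule of thumb (usable megabytes per second is roughly one tenth of the link speed in megabits per second); the function names are made up for illustration.

```python
def rule_of_thumb_rate_mbytes(link_mbits):
    """Approximate usable MBs for a link speed given in Mbs."""
    return link_mbits / 10


def transfer_seconds(size_mbytes, link_mbits):
    """Estimated seconds to move size_mbytes over a link of link_mbits."""
    return size_mbytes / rule_of_thumb_rate_mbytes(link_mbits)


if __name__ == "__main__":
    for link in (10, 100, 1000):
        seconds = transfer_seconds(700, link)
        print(f"{link}Mbs: about {seconds:.0f} seconds for a 700MB CDROM")
```

Substitute the size of your own data set for 700 to get a rough feel for how long a transfer would take on each connection type.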

What's the justification for this transfer rate rule of thumb?

The logic for this metric is that a 10Mbs (10 megabit per second) network connection will move 10 million bits per second. Data is measured in 8-bit bytes and the rule of thumb for Ethernet is that performance peaks at 80% capacity: 80% of 10 million bits is 8 million bits, or 1 million bytes, per second. This provides the easy conversion factor of 10Mbs=1MBs. Note that the lower-case "b" means "bits" and upper-case "B" means bytes, i.e. 8 bits. The network speeds scale up easily by factors of 10. So a 100 megabit per second connection is capable of transferring 10 megabytes per second, and a 1000 megabit per second connection is capable of transferring 100 megabytes per second.

Theoretically, a 100Mbs connection will transfer 100 million bits in one second, or about 10 megabytes (MB) per second. This means you would be able to transfer a CD's worth of data (about 700MB) in about 70 seconds, about 1 minute. (Compare this to a 10x slower connection of 10Mbs, where it would take 700 seconds, more than 10 minutes.)
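The arithmetic behind the conversion factor can be written out as a short Python sketch. It assumes only the two numbers given above (8 bits per byte and the 80% rule of thumb); the constant and function names are made up for illustration.

```python
BITS_PER_BYTE = 8
ETHERNET_EFFICIENCY = 0.8  # rule of thumb: performance peaks at 80% capacity


def usable_mbytes_per_second(link_mbits):
    """Megabytes per second a link can sustain under the 80% rule of thumb."""
    return link_mbits * ETHERNET_EFFICIENCY / BITS_PER_BYTE


if __name__ == "__main__":
    for link in (10, 100, 1000):
        rate = usable_mbytes_per_second(link)
        print(f"{link}Mbs is roughly {rate:.0f}MBs")
```

Because 0.8 divided by 8 is 0.1, the result is always one tenth of the link speed, which is why the rule of thumb scales so cleanly by factors of 10.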

Network Structure

How much network bandwidth is available on campus?

Individual network connections at 10Mbs, 100Mbs, or 1Gbs speeds can be delivered to any location on the campus network at standard rates. Additionally, wireless network connectivity is available across campus.

What does the campus network look like?

The campus network can be visualized as a collection of network trees, roughly one per building, with the root of each tree connecting to an expandable high bandwidth core network backplane (currently running at 10Gbs).

The depth of each individual tree is determined by the physical layout of and number of network ports in each building. Each tree is typically no more than three layers deep, including the leaf nodes. The leaf nodes are the end-user connections, i.e. wired wall ports or wifi connections. The internal nodes of each tree are network switches and the switches are connected to the next layer via fast connections (currently running at 1Gbs).

Each tree (each building) connects to the core network backplane via a fast connection (currently running at 1Gbs). At this core network connection, the data packets are routed to their final destination on- or off-campus.

How is the campus network connected to off-campus networks?

The campus core network backplane is connected to off-campus networks like the commercial Internet (Google, Facebook, Amazon) and national high bandwidth research networks (Internet2 and NLR) which provide high speed connections to research institutions and labs across the country. The fastest network route to a specific off-campus destination is chosen automatically as the network packets move off-campus.

Custom configurations to meet unique research needs or specific performance targets can be designed. This requires advanced planning and an understanding of the proposed research workloads and workflow. Please contact Research Computing. The cost for these customizations can often be included in research proposals.

Ordering Information

How do I order or upgrade a network connection?

Computer data connections are ordered from UAB IT Telecommunications Services via their service request form.

To place an order you will need to provide a general ledger account number for billing and identify the location (building address) of the service request. The wall-jack identification number for the network connection will be needed to complete the service request and can be entered on the form.

If you have questions please contact UABCOMM@uab.edu or call 4-0503.

Who pays for my network connection?

You do.

Network connections are accounted for via a federally regulated service center run by UAB IT. The rates are set based on the cost to deliver the service. Money to pay for network connectivity can come from any legitimate source: directly through grants, indirect grant funds routed to departments, or other departmental or research support funds.

How much do network connections cost?

Standard service center rates apply to all network connections (10Mbs, 100Mbs, and 1Gbs). Discounted rates for upgrading existing connections to higher data rates are available. Additionally, network switches can be ordered at a fixed lease rate to supply many network connections to an area.

Please contact UAB IT Telecommunications for rates at UABCOMM@uab.edu or call 4-0503.

Network Performance

How do I measure my campus network connection speed?

The UAB IT SpeedTest server speedtest.dpo.uab.edu will run a data transfer test from your computer to the SpeedTest server and rate the performance of a data connection.

How do I measure my network bandwidth from my computer to Cheaha?

You can test the data transfer performance between your computer and Cheaha by using iperf. To run an iperf test you will need to install iperf on your desktop. Iperf is readily available on Windows, Mac, and Linux. It is already installed on Cheaha.

To run a 30-second iperf test moving data from Cheaha to your computer:

  1. Start iperf from a command shell on your desktop in "server" mode
  iperf -s -i 1
  2. Log into your Cheaha account
  3. Start iperf from the command shell on Cheaha in "client" mode
  /opt/iperf/bin/iperf  -c <ip-of-your-computer> -t 30 -i 1

The iperf program output on Cheaha will update a data transfer rate to your computer every second for 30 seconds. If this data rate is not what you expect or is confusing, please send an email with the output from both command windows to support@vo.uabgrid.uab.edu.

Note: This test requires that your computer have a public IP address that can be accessed by Cheaha.

Note: The iperf server listens on port 5001 by default. Your local desktop (the iperf server here) should allow incoming connections on this port from Cheaha. The following iptables command will append a rule to open port 5001 for incoming TCP connections from Cheaha. Consult your local system/network administrator before adding this rule.

  sudo /sbin/iptables -A INPUT -p tcp -s 164.111.161.10 --dport 5001 -j ACCEPT

Note: Iperf will currently only test your speed for data transferred from Cheaha to your client. This should provide a reasonable estimate for data transferred to Cheaha as well.
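Before running the test, it can help to confirm that the iperf server port is reachable at all. The following is a minimal Python sketch, assuming Python is available on the machine you run it from; running it from Cheaha against your desktop's IP address exercises the same network path iperf will use. The function name port_is_open and the script name in the usage comment are made up for illustration.

```python
import socket
import sys


def port_is_open(host, port=5001, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # Usage: python check_port.py <ip-of-your-computer>
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(f"{host}:5001 reachable: {port_is_open(host)}")
```

A result of False while iperf is running in server mode suggests a firewall or addressing problem rather than a slow network.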

What factors impact the actual speeds I can expect in the real world?

The actual transfer rates you get depend on three factors: software, hardware, and other users.

Data transfer software and computer hardware can significantly impact real world transfer rates. If you are transferring lots of data, you will see your best performance with software that can keep the network full, computer hardware that is not slower than the data network, and a network connection sized for your data sets and patience.

How does my copying software impact my transfer speeds?

The software you use to transfer data is the most important factor in maximizing data throughput. Most traditional copy methods move data in a single-file line. Modern computer hardware hides this software inefficiency and can easily keep a 10Mbs connection full and can do OK with a 100Mbs connection. If you are moving lots of data or using a 1Gbs network, you need to use software tuned for high-speed data transfer.

High speed data transfer software uses multiple single-file lines in parallel to improve network throughput. This software must be used at both ends of the data transfer in order to coordinate the parallel transfer streams. You won't get very far if you are smart but your peer is not.
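The idea of several streams working in parallel can be sketched in Python. This is only an illustration under simplifying assumptions: it copies between two local files rather than across a network, and all function names are made up. Real high-speed transfer tools also coordinate the streams between both ends of the connection.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, streams):
    """Split total_size bytes into contiguous (offset, length) ranges."""
    chunk = max(1, -(-total_size // streams))  # ceiling division
    return [(off, min(chunk, total_size - off))
            for off in range(0, total_size, chunk)]


def copy_range(src_path, dst_path, offset, length):
    """Copy one byte range; each call opens its own file handles."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        dst.write(src.read(length))  # reads the whole range into memory


def parallel_copy(src_path, dst_path, streams=4):
    """Copy src_path to dst_path using several concurrent streams."""
    size = os.path.getsize(src_path)
    # Pre-size the destination so each stream can write at its own offset.
    with open(dst_path, "wb") as dst:
        dst.truncate(size)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(copy_range, src_path, dst_path, off, length)
                   for off, length in split_ranges(size, streams)]
        for future in futures:
            future.result()  # re-raise any error from a worker
```

Each stream handles its own slice of the file, so a stall in one stream does not leave the whole transfer idle.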

What high-speed data transfer software can keep up?

It's important to use improved data transfer software that can move data in parallel streams. (Post examples)


How does my computer hardware impact my transfer speeds?

Computer hardware also impacts transfer speeds. Your slowest piece of hardware will dictate your maximum data transfer rate. If you have a slow disk (you should read that as "an external USB hard drive"), you will be limited by its data transfer speeds.
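The "slowest piece of hardware" rule amounts to taking the minimum over every component on the data path. A minimal Python sketch follows; the rates in the example are hypothetical placeholders rather than measurements, and the function name is made up.

```python
def bottleneck(rates_mbytes):
    """Return the (name, rate) of the slowest component on the data path."""
    name = min(rates_mbytes, key=rates_mbytes.get)
    return name, rates_mbytes[name]


if __name__ == "__main__":
    # Hypothetical rates in megabytes per second, for illustration only.
    path = {
        "external USB drive": 30,
        "1Gbs network connection": 100,
        "file server disk": 200,
    }
    name, rate = bottleneck(path)
    print(f"Limited by the {name} at about {rate}MBs")
```

Upgrading any component other than the one reported will not speed up the transfer.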

Additionally, your computer may be fast but it still has to manage your workload and coordinate use of all the devices in your computer, including the network connection. If you are crunching numbers or doing heavy visualizations at the same time you are trying to transfer data, your computer may not be able to keep up. Note that this scenario is common when you are reading data for your visualization off a file server. Sometimes you need to move your data before you can use it.

How do I measure my off-campus network connection speed?

The SpeedTest.net service can be used to measure your connection to key points on the Internet. To run this test, choose the Atlanta, GA connection point. This will run a data transfer test from your computer, off-campus to the SpeedTest.net server hosted by Comcast in Atlanta, GA. The test will rate the performance of a data transfer.

Atlanta is a good test destination because this is where UAB's Internet-bound traffic actually connects to the commodity Internet. This test will show the network performance to our nearest off-campus neighbor. If you want to share the results of this test with others, please be sure to click the "Share this Test" and then "Copy" buttons. This will provide you a URL to a PNG image capturing the results of this test that anyone can load in their browser.

What factors impact my off-campus network connection speed?

It is important to understand that Internet traffic speeds are highly variable. Transfer speed depends heavily on the network capacity and use along the entire path from your desktop to the location with which you are exchanging data. It also depends on the capabilities of your desktop and the server that is the target of your data transfer. If the networks or remote sites are overloaded or have insufficient bandwidth, then your data transfer speeds will be limited by those conditions.

As an example, you can try a speed test to a network destination other than Atlanta, GA or a speed test hosted by a network provider other than Comcast. The speed tests from Ookla.net and Speakeasy.net may show different performance for the selected destinations. You may also find the information at SpeedTest.org informative.

Internet Questions

Can I make up a host name for my computer for use on the Internet?

No. The Internet relies on a host name look-up service called DNS (Domain Name System). Host names must be registered in the DNS in order to use them on the Internet.

What is DNS?

DNS is the Domain Name System. It is the address look-up service for the Internet. It is the system that allows all computers to know the correct address for a particular name. The DNS has certain rules to follow for registering a public name. The main rule is that you can only name things in your own domain. For example, you can't register a name like mycomputer.google.com, because only Google has the right to use the google.com domain name.

For a basic introduction to the DNS please see these helpful links:

Can I use an "_" (underscore) in my host name?

No. The DNS system does not support using the "_" in host names.
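A candidate host name can be checked before you request it. The Python sketch below applies the conventional host name rules: each dot-separated label may contain only letters, digits, and hyphens, may not start or end with a hyphen, and is at most 63 characters long. The function name and the sample names in the comments are made up for illustration.

```python
import re

# One label: letters, digits, and hyphens; no leading or trailing hyphen;
# between 1 and 63 characters.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name):
    """Return True if name follows the conventional host name rules."""
    if name.endswith("."):
        name = name[:-1]  # a single trailing dot is allowed
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    # Hypothetical names: the first passes, the second contains an underscore.
    print(is_valid_hostname("my-computer.example.edu"))
    print(is_valid_hostname("my_computer.example.edu"))
```

Replacing the underscore with a hyphen is usually the simplest fix.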

But can't I just call my host whatever I want?

Yes, you can. But you need to understand that all host naming on the Internet is defined from the perspective of whatever computer you are on at the moment. If you make up your own host name for some computer and record it locally, you can certainly use that host name from your local computer; however, you will be the only person who knows about the name.

In order to let anyone know the name and reach the same computer, you need to register your host name in a public database used by all computers on the Internet. That database is the DNS. It is the only common reference point for name-to-IP mappings on the Internet. In order to register this public name you need to follow the rules for assigning names in the DNS.
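To see which addresses a name maps to from the perspective of the computer you are on, you can ask the local resolver directly. This is a minimal Python sketch using the standard socket module; the function name lookup is made up for illustration.

```python
import socket


def lookup(hostname):
    """Return the sorted IP addresses the local resolver reports for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # the name is not known from this computer
    # Each entry ends with a socket address whose first item is the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(lookup("localhost"))
```

The system resolver consults locally recorded names as well as the DNS, so a name you made up and recorded locally will resolve on your computer but return an empty list everywhere else.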

Storage Questions

Is there storage space for research data?

The rapidly growing demand for research storage is clearly recognized. Solutions for hosting research data are under active development (and funding discussions) as part of the UABgrid Pilot. Currently, research storage is only available through the traditional compute cluster interface of Cheaha.

How can I contribute to the development of research storage?

The best way to contribute to the development of research storage is to share your storage requirements. It will be helpful if you can share information on the following topics with an email to support@vo.uabgrid.uab.edu:

  1. How much data do you currently store?
  2. How are you solving your research data problem today?
  3. How much do you expect your data to grow in the next year?
  4. Are you building an analysis pipeline that has known storage expectations?
  5. Do you need to archive your data? How long?
  6. Do you need to keep all your data on-line?
  7. Do you ever delete your data?
  8. How expensive is it for you to recreate derived data products?

How can I use the existing research storage on Cheaha?

The generally available research storage on the cluster is designated to support storage requirements for the construction of data analysis pipelines where data needs to be shared by multiple users on the cluster. To request such storage please send an email to support@vo.uabgrid.uab.edu.

The pilot research storage on the cluster is being developed to support the much broader use case of data sharing in collaborations. If you are interested in participating in this pilot, please send a use case and justification of your project to support@vo.uabgrid.uab.edu.

What best-practices exist for storing my research data?

There are many solutions for storing your research data. Simply keeping it on your desktop is one option. As data grows it is often necessary to move it off your system. Most people find some form of USB Drive to be an acceptable solution. One solution that has become popular is the use of DroboFS.

Note: No endorsements are made of any product or of the fitness of any solution.

Cheaha Cluster

How do I get an account to use cluster computing on Cheaha?

Please send an email to the support group requesting an account. Include your UAB BlazerID and some information about which group you are a part of here on campus and what your plans are for using the cluster.

How do I get started using the cluster after I have an account?

A basic getting started guide is available and should answer questions about how to log in to Cheaha and submit a batch job.

How do I cut-and-paste into a terminal window, ctrl+c always exits my commands?

Using a terminal window for an SSH session from your desktop, you can cut-n-paste into that terminal window from your desktop, e.g. you may want to copy the example job commands in the getting started guide. The exact key combination varies depending on the terminal program you use, but it is often Shift+Ctrl+C to copy and Shift+Ctrl+V to paste. On Macs, the normal command+c keystroke often works since it doesn't generate the ctrl+c character sequence.

How can I view HTML files on the cluster without transferring them to my desktop?

If you need to view files that are formatted using HTML, e.g. documentation for some tool you are using or HTML formatted output produced by your job, an easy way to view that content is the elinks command. ELinks is a terminal-based web browser that you can use directly from your SSH terminal session. Simply enter the command elinks filename.html and it will display a text-only rendering of the HTML content. ELinks is also a convenient choice for accessing regular web sites, for example elinks http://google.com.

More advanced options for viewing HTML files include starting your SSH session with X-forwarding, e.g. ssh -X, and launching Firefox to display on your desktop. Your desktop needs to support X11 and should be on-campus (due to network traffic load) to use this option.

Other options not documented here include launching a VNC session to display Firefox, which will work better for off-campus access, or to use a file system client like SSHFS to mount your home directory on your desktop and then use your desktop web browser to load the HTML files.

Collaboration Tools

How do I edit a wiki page on docs?

Users are encouraged to create original content and improve existing content on the docs wiki. Please see the introduction to docs for more guidance on editing wiki pages.

How do I link to a file on docs with alternate text?

There are two ways to link to a file uploaded to docs and provide alternate text:

  1. Link to the file summary page from which the file can then be downloaded. Alternate text can be provided by prefixing the File namespace with a colon and using the vertical bar to separate the text:
 [[:File:name-of-file.jpg|link text for file]]
  2. Link directly to the file so it is immediately available to the client web browser
 [[Media:name-of-file.jpg|link text for file]]

More information on these methods and other file and image link syntax can be found on the MediaWiki Help page for Images.

Why is the wiki markup syntax different between my project space and the docs wiki?

The "Projects" wikis are implemented using a tool called Trac and follow a formatting convention popularized by earlier wikis, mainly MoinMoin. The "Docs" wiki is implemented using a tool called MediaWiki and follows a formatting convention popularized by Wikipedia. Because these communities have focused on addressing specific use cases, software developers in the case of Trac and document writers in the case of MediaWiki, their formatting conventions differ significantly in their details.

Section heading markup (using '=' to designate section headings) and external urls (typing in a bare URL like http://google.com) are typically portable between the two wikis, but details like table layout vary widely.

An easy option is to leave pages in place and reference them by name from the Projects or Docs wikis.

Should I post XYZ to the list/group/forum?

If you participate in an on-line discussion group and are asking yourself if you should post some sort of content to that group, thank you! Asking this question shows self-restraint and consideration for others. These are the core tenets of on-line etiquette, or netiquette. Netiquette is the term used to describe rules of behavior for on-line discourse. The good news is that netiquette rules are pretty much the same as the basic rules of human interaction you learned as a child, so they should be really familiar to you by now. Respect others, and they will respect you.

There is one primary additional consideration to keep in mind when participating in on-line discussions. On-line discussions should generally be considered public because you are communicating with more than one person at a time. This means that whatever you say and do on-line is amplified across all the people who will read your comments. This simple fact provides solid guidance for how to act in a forum and what information to post:

  1. Your post will be seen by many people. Make sure it's relevant to the discussions that are typical of the group to which you are sending it.
  2. Your email will be received by many people. You should think of "email" as a primitive "copy" command. For example, sending your email to 100 people will make 100 copies of the email and all the documents you have attached. Make sure the information you are including in your email or attaching to it is really worth each person having their own copy. There are completely legitimate reasons to share information and email is a powerful copy command; however, you should use that power wisely and follow the conventions of the group with whom you are communicating. A simple heads up: most groups will frown on attaching large files to messages sent to a mailing list.

If you want to learn more about netiquette or need more guidance here are some links that might be helpful: Netiquette Book, Mailing List and Newsgroup netiquette, and more Mailing List and Newsgroup netiquette.

Security Questions

What kind of security environment do you provide?

The Research Computing System (RCS) is built on top of the Linux kernel and GNU system platform. Linux is a Unix-like environment. This means that we provide an environment that builds on top of the file-process abstraction that is inherent in all Unix-like environments. The ownership and permissions of any resource (file, group of files, or processes) can be configured to allow only authorized access to the resource. Linux supports a large collection of security features and others can be added if needed. If you can think it, you can build it.

Each user of the Research Computing System is assigned a unique identity that is used to control access to resources in the system. Your access rights are determined by your affiliations and the interfaces through which you access the system.

What interfaces are available to the system?

The Research Computing System environment can be accessed via the web, a command line interface (SSH), and desktop file shares (CIFS). Access via the Open Science Grid is under development.

What are the security features of the command line interface?

Access to the command line interface is provided by SSH via Cheaha. SSH requires you to use your system username and password. It grants you access to processes and files owned by you. SSH provides programmatic control of your files and processes. SSH is most common with users and developers of high-performance computing (HPC). By default, you are assigned a personal directory (i.e. your home directory) and a scratch directory (temporary, high-speed storage for large files on which you are computing). These are the only storage locations to which you have write access. All commands you execute (processes you run) will operate under your user identity and be restricted by file access permissions.
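You can inspect the ownership and permissions that govern your access from within an SSH session. The following is a minimal Python sketch for Unix-like systems, assuming Python is available in your session; the function names are made up for illustration.

```python
import os
import stat


def describe_access(path):
    """Return the owner's numeric user ID and the permission string for path."""
    info = os.stat(path)
    return info.st_uid, stat.filemode(info.st_mode)


def owned_by_me(path):
    """Return True if the current user owns path (Unix-like systems only)."""
    return os.stat(path).st_uid == os.getuid()


if __name__ == "__main__":
    home = os.path.expanduser("~")
    uid, mode = describe_access(home)
    print(f"{home}: owner uid {uid}, mode {mode}, mine: {owned_by_me(home)}")
```

The permission string uses the same notation as ls -l, so a value such as drwx------ means only the owner can read, write, or enter the directory.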

What is the security configuration for the desktop file sharing interface?

Access to the desktop file sharing interface is provided by CIFS, i.e. standard Microsoft Windows file sharing that is available on all computing platforms (Linux, Mac, Windows). Access is restricted to on-campus (or VPN) clients. Access requires using your system username and password. It grants you access to your personal directory (i.e. your home directory). All access is limited to manipulating files that you own in your personal storage. Desktop file sharing helps create a seamless user experience between your desktop and the command-line interface (you have an identical view of your home directory from your desktop and the HPC environment). It also enables you to build storage solutions for your research needs.

What is the security configuration for the web interface?

How do I control my affiliations?

Please send an email to support describing the security configuration you would like to establish and we will work with you to implement it.

Why can't I manage my own affiliations?

Our goal is to provide a comprehensive, integrated, user-managed affiliation and permissions system. Today you can self-manage your affiliations to the degree supported by the interfaces and tools you use; however, coordination of these settings across tools is not universal.

Misc

UABgrid

UABgrid is an infrastructure pilot of UAB IT Research Computing. More information can be found in the UABgrid FAQ, though this information may be dated.