Science Gateways
About Science Gateways
A science gateway is a web-based interface to access HPC computers and storage systems. Gateways allow science teams to access data, perform shared computations, and generally interact with NERSC resources over the web. Common gateway goals are
- to improve ease of use in HPC so that more scientists can benefit from NERSC resources
- to create collaborative workspaces around data and computing for science teams that use NERSC
- to make your data accessible and useful to the broader scientific community.
NERSC encourages its users to create their own science gateways by using the resources described on this page. The center engages with science teams interested in using web services, assists with deployment, accepts feedback, and tries to recycle successful approaches into methods that other science teams can benefit from. Below you will find links to current projects and details about the building blocks available to NERSC users. If you would like to participate, or if you have questions, please open a ticket with NERSC Consulting.
Science Gateway Availability and Support
Developers of science gateway applications hosted at NERSC should be aware that if their gateways critically depend on NERSC infrastructure then their gateways will inherit availability from NERSC's underlying infrastructure to some degree. Some examples:
- If the Community file system is out of service for multiple days and a science gateway uses scripts, HTML templates, or other web content stored on the Community file system, then the site will be unavailable for the duration of the Community outage.
- In contrast, applications that depend only on data files stored at NERSC (e.g. on Community) will lose only the functionality that those data files make possible. In such cases it is up to maintainers of gateways to inform their users of any degradation in service.
- Applications that submit jobs to one of the supercomputer system queues via the SF API may be unable to support that functionality for the duration of a system outage; however, proper use of the API can mean that a web site keeps functioning, just with decreased functionality. It is up to gateway maintainers to handle failures gracefully in such cases.
Science gateway application developers should keep in mind that NERSC's goal is to make sharing scientific data and high-performance computing resources over the web practical, but NERSC is not a web hosting service. Developers and users should not anticipate availability approaching what is available in commercial offerings. The SF API provides one avenue for users who require >99% uptime for a web presence that exposes NERSC resources like data or job submission. This offers a clean separation between the web application and NERSC infrastructure that can be managed by developers.
If a science gateway or website does not clearly depend on NERSC resources or data, we encourage users to pursue other hosting solutions. For example, Google Sites is an excellent alternative for users who merely want to establish a web presence for their project. Such simple websites are not within the scope of science gateways at NERSC, and we do not provide support to users attempting to set them up.
The service level for NERSC science gateway support is formally 8x5 (business hours). Outside of those hours the NERSC Data and Analytics Services staff provide support on a best-effort basis.
Gateway Technologies
NERSC provides science teams with the building blocks to create their own science gateways and web interfaces into NERSC. Many of these interfaces are built on web and database technologies.
Web Methods for Data
Science gateways can be configured to provide public unauthenticated access to data sets and services. The following features are available to projects that wish to enable gateway access to their data through the web. Other features can be made available on request. Direct access to the Community file system and HPSS tape archives is described in the table below.
NERSC Resource | Path On NERSC Resource | URL on the Web |
---|---|---|
Community | /global/cfs/cdirs/myproj/www | https://portal.nersc.gov/cfs/myproj/ |
DNA file system | /global/dna/projectdirs/myproj/mysubproj/www | https://portal.nersc.gov/dna/myproj/mysubproj/ |
HPSS archive (home) | /home/m/myuser/www | https://portal.nersc.gov/archive/home/m/myuser/www/ |
HPSS archive (project) | /home/projects/myproj/www | https://portal.nersc.gov/archive/home/projects/myproj/www/ |
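The path-to-URL mapping in the table can be expressed as a small helper. The sketch below covers only the Community and DNA rows, and it assumes that anything below `www` appears under the project URL with the same relative path; `public_url` is a hypothetical name, not a NERSC tool.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

PORTAL = "https://portal.nersc.gov"

# (file system root, URL label) pairs taken from the Community and DNA rows.
_ROOTS = [
    (PurePosixPath("/global/cfs/cdirs"), "cfs"),
    (PurePosixPath("/global/dna/projectdirs"), "dna"),
]


def public_url(path: str) -> str:
    """Return the portal URL for a path at or below a project's www directory."""
    p = PurePosixPath(path)
    for root, label in _ROOTS:
        try:
            parts = p.relative_to(root).parts
        except ValueError:
            continue  # path is not under this root
        if "www" not in parts or parts.index("www") == 0:
            raise ValueError(f"{path} is not inside a project www directory")
        i = parts.index("www")
        # Segments before "www" name the project (and any subproject);
        # the "www" segment itself does not appear in the URL.
        base = f"{PORTAL}/{label}/" + "/".join(quote(s) for s in parts[:i]) + "/"
        # Segments after "www" are the published file or subdirectory.
        return base + "/".join(quote(s) for s in parts[i + 1:])
    raise ValueError(f"{path} is not under a known web-enabled root")


print(public_url("/global/cfs/cdirs/myproj/www"))
print(public_url("/global/dna/projectdirs/myproj/mysubproj/www"))
print(public_url("/global/cfs/cdirs/myproj/www/data/run1.h5"))
```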
Web Methods for Computing
Science gateways can use a REST API (the SF API) to access the NERSC center, including file management, job submission, and accounting interfaces. These interfaces allow you to run large or small jobs on NERSC machines through the web. The SF API examples show how to interact with the API in Python. Other programming-language and web-toolkit-level building blocks include:
- Back-end PHP programming environments upon request. Please contact NERSC Consulting.
- Conduits to PostgreSQL/MySQL/NoSQL databases.
- Modern Web 2.0 interfaces with AJAX front-ends such as Google Maps and visualization kits.
- OPeNDAP access to large data sets (netCDF and HDF5)
- Access to NERSC file systems and HPSS through SF API, grid tools, or other custom interfaces
Database Methods
Science gateways can also access data from NERSC's science database nodes. These are specially configured nodes which support MySQL, Postgres, and MongoDB for high-performance access. More detail on the science gateway database services is provided on the Databases page. Some examples of database methods used by gateways are
- Access file catalogs and other persistently stored collections from your batch jobs
- Connect a web-based gateway to datasets stored in a database (read and read-write)
- Store, search, and analyze data objects (e.g., job output) through map/reduce-like MongoDB methods
- Expose public read-only data collections through database protocols
For more information on databases for user science data, please submit a question or request via the science database request form.
Science Gateways in Production
Science gateways that have moved from development to providing services to broader communities are listed on the Science Gateways index page.
Nagios monitoring and service level checks of gateway functions are available.
Getting Started
A Community directory is a good place to host a science gateway. Both Community and HPSS allow users to create a special web directory. You can publish data through a publicly accessible URL by simply making an appropriate subdirectory called "www". The procedure differs slightly depending on which file system you choose, as detailed below.
How to publish your data on the Community file system to the web:
Log in to a NERSC system:
ssh dtn.nersc.gov
In the above example, you can replace dtn with any other NERSC compute platform that has access to Community. Create a www directory in your Community directory:
mkdir /global/cfs/cdirs/yourproject/www
Make sure your Community directory and the www directory are world executable and that the www directory is also world readable. If not, the owner of each of them will need to change its permissions:
chmod 751 /global/cfs/cdirs/yourproject/
chmod 755 /global/cfs/cdirs/yourproject/www
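Before publishing, it can help to check these permission bits programmatically. The following is a minimal sketch using only the Python standard library; `web_permission_problems` is a hypothetical helper, and the demonstration runs on a throwaway temporary directory rather than a real project path.

```python
import os
import stat
import tempfile


def web_permission_problems(project_dir: str) -> list:
    """List permission problems that would stop <project_dir>/www being served."""
    problems = []
    www = os.path.join(project_dir, "www")
    proj_mode = os.stat(project_dir).st_mode
    www_mode = os.stat(www).st_mode
    # Both directories must be traversable by "other" (world executable).
    if not proj_mode & stat.S_IXOTH:
        problems.append(f"{project_dir} is not world executable")
    if not www_mode & stat.S_IXOTH:
        problems.append(f"{www} is not world executable")
    # The www directory must also be listable by "other" (world readable).
    if not www_mode & stat.S_IROTH:
        problems.append(f"{www} is not world readable")
    return problems


# Demonstration on a temporary directory standing in for a project directory.
with tempfile.TemporaryDirectory() as proj:
    www_dir = os.path.join(proj, "www")
    os.mkdir(www_dir)
    os.chmod(proj, 0o751)
    os.chmod(www_dir, 0o750)  # deliberately too strict
    print(web_permission_problems(proj))
    os.chmod(www_dir, 0o755)  # the mode recommended above
    print(web_permission_problems(proj))
```

An empty list means the two directories carry the bits set by the `chmod` commands above; files copied into `www` still need to be world readable individually.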
Copy your data to this www directory. Any public data will need to be world readable. Add PHP and HTML files to this directory to build custom gateway interfaces to the data. Any data under /global/cfs/cdirs/yourproject/www will be publicly accessible through https://portal.nersc.gov/cfs/yourproject/.
How to publish data in HPSS to the web:
You can also publish data in the HPSS archive system directly to a public URL on the web. Note that this is not intended to be a high-performance interface; it is just a quick way to make data publicly available.
Generally we recommend that users share data from the Community File System when creating a science gateway. Sharing data from the HPSS tape archives via a science gateway should be reserved for infrequent accesses to a data pool that is too large to be practically kept on the Community File System. If you need to serve very large files very frequently via a science gateway, please contact NERSC Consulting for assistance.
Retrieving data from HPSS via a science gateway can be very slow. If files have not been accessed in some time, they will have to be retrieved from tape. If you are accessing multiple files, multiple tapes may need to be read, and special care will need to be taken to retrieve the data files in an efficient order. Finally, the number of concurrent connections per IP address is limited to two. All of these factors can combine to cause long delays in file retrieval from HPSS via a NERSC web portal.
Log in to the archive via hsi:
hsi -h archive.nersc.gov
Create a www directory:
mkdir /home/projects/DIRNAME/www
Make sure the parent directory and the www directory are world executable and that the www directory is also world readable. If not, the owner of each of them will need to change its permissions:
chmod 751 /home/projects/DIRNAME
chmod 755 /home/projects/DIRNAME/www
The data in the www directory will now be available at a URL of the form https://portal.nersc.gov/archive/home/projects/DIRNAME/www/{FILE|DIR} where DIRNAME is the project directory and FILE|DIR is the name of a file or directory.
Files will be downloaded directly, while directories will give you a listing. Note that all files and directories in the path must be world readable.
Here is an example: https://portal.nersc.gov/archive/home/projects/incite11/www/1935
For a home directory in HPSS, the permissions should be as follows (where i is the initial of your home directory's name):
chmod 751 /home/i/HOMEDIRNAME
chmod 755 /home/i/HOMEDIRNAME/www
The data in www should then be available at a URL of the form https://portal.nersc.gov/archive/home/i/HOMEDIRNAME/www/{FILE|DIR}.
Note
Downloads of files stored on tape may take some time to start while the tape robot finds and mounts the correct tape.
Moving Beyond Simple Gateway Functions
If you are building a web gateway to your science at NERSC, please contact us by opening a ticket with NERSC Consulting. We are interested in engaging directly with science teams so that you can build a gateway that meets your specific needs.