Features available on the hubs#
This document is a concise description of various features we can optionally enable on a given JupyterHub. Explicit instructions on how to do so should be provided in a linked how-to document.
GPUs are heavily used in machine learning workflows, and we support provisioning GPUs for users on all major platforms.
See the associated howto guide for more information on enabling this.
Users of our hubs often need to be granted specific cloud permissions so they can use features of the cloud provider they are on, without having to do a bunch of cloud-provider specific setup themselves. This helps keep code cloud provider agnostic as much as possible, while also improving the security posture of our hubs.
‘Requester Pays’ access#
By default, the organization hosting data on Google Cloud pays for both storage and bandwidth costs of the data. However, Google Cloud also offers a requester pays option, where the bandwidth costs are paid for by the organization requesting the data. This is very commonly used by organizations that provide big datasets on Google Cloud storage, to sustainably share costs of maintaining the data.
Requester Pays is a feature that a bucket can have.
Allow access to external
Requester Payes buckets#
If buckets outside the project have the
Requester Payes flag, then we need to:
hub_cloud_permissions.allow_access_to_external_requester_pays_bucketsin the terraform config of the cluster (see the guide at Enabling specific cloud access permissions)
this will allow them to be charged on their project for access of such outside buckets
When this feature is enabled, users on a hub accessing cloud buckets from
other organizations marked as
Requester Pays will increase our cloud bill.
Hence, this is an opt-in feature.
Requester Pays flag on community buckets#
The buckets that we set for communities, inside their projects can also have this flag enabled on them, which means that other people outside will be charged for their usage.
This is not supported yet by our terraform. Follow 2i2c-org/infrastructure#3746 to check when support will be added.
‘Scratch’ buckets on object storage#
Users often want one or more object storage buckets
to store intermediate results, share big files with other users, or
to store raw data that should be accessible to everyone within the hub.
We can create one more more buckets and provide all users on the hub
equal access to these buckets, allowing users to create objects in them.
A single bucket can also be designated as as scratch bucket, which will
SCRATCH_BUCKET (and a deprecated
PANGEO_SCRATCH) environment variable
of the form
<s3 or gcs>://<bucket-name>/<user-name>. This can be used by individual
users to store objects temporarily for their own use, although there is nothing
preventing other users from accessing these objects!
‘Persistent’ buckets on object storage#
This is exactly the same as scratch bucket storage, but without a rule deleting
contents after a set number of days. This is helpful for storing intermediate computational
results that take a while to compute, and are consistently used throughout the lifetime
of a project. We set the environment variable
PERSISTENT_BUCKET to the form
<s3 or gcs>://<bucket-name>/<user-name> so users can put stuff in this.
Objects put in
PERSISTENT_BUCKET must be deleted by the users when no longer in use
to prevent cost overruns! This can not be managed by 2i2c.