Difference between revisions of "Policy"


Revision as of 15:37, 16 March 2018

Introduction

GIGA provides all storage spaces and processing computers. Access to each storage space is strictly limited to its owning user.

Upon joining GIGA, technical staff are also given access to a group storage space (share) for storing, updating and deleting data and files that must be shared with, and accessible to, everyone in their group or platform.

All members in each designated "group share" are responsible for managing their own data according to each group’s established procedures.

If the business need arises, GIGA members have a third option for sharing data and files.

For this purpose, a sharing system has been created that lets users work collaboratively, in a way similar to group shares, but with “non-university” customers.

Members of each specific share are responsible for managing their data according to the rules defined later in this document.

Purpose

The purpose of this policy is to describe the data storage resources available at GIGA and to define acceptable use of that storage.

The IT infrastructure is configured to optimally support the technology requirements of our constituents.

Effective management and use of individual, group and “outside” storage by constituents will enable system administrators to manage GIGA’s computing resources more efficiently and, by extension, will allow each member to make the most of what is given to them.

Storage

Simple representation of our storage structure here

Personal computer

GIGA works, in some cases, with other services.

One of them is UDIMED, a decentralized IT unit of the ULg within the Faculty of Medicine, acting as a relay for the SeGI.

It manages and maintains the Faculty's computers.

On arrival, each new member must either ask their PI for a computer or use their own.

In the latter case, the personal computer must be configured. The required changes are:

  • Defining credentials
  • Mounting the shared storage
  • Installing some non-scientific tools (e.g. DoX, antivirus, office software)
  • Installing some scientific tools
  • Setting up our printers


Because this option comes with various rules, conditions and restrictions, it is recommended to avoid it and to ask for a computer from GIGA itself instead.

Mass Storage

GIGA provides a large and complex infrastructure which includes a big (yet not infinite) storage space as well as its equivalent for backing up relevant data, plus sharing and group working spaces. Because our user community is also large, everyone must be aware of the rules and good practices; in return, users can expect GIGA to provide a proper working environment.

Because the whole infrastructure is Linux-based, it imposes certain requirements:

  • A strict naming convention for both directories and files: no spaces (use underscores instead) and English-style names without accents.
  • The backup, nobackup and archive directories described later in this document can be created anywhere in the directory structure, but users are asked to be careful when doing so (that? or force them to ask IT service first ?). See example
  • Each user must be aware of the quotas on their private space and on their group's space. The storage is large, but with so many users generating numerous large datasets, everyone must avoid duplicating files. To that end, spaces are provided where all input data is stored and backed up. Users can temporarily create a copy in their private space while working on the data, but they are asked to delete it once the work is done and its results have been moved to their shared group space. See example
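
The naming rule above (no spaces, no accents) can be checked or applied automatically before data is copied to the storage. The following is only a minimal sketch: the function names `sanitize_name` and `is_valid_name` are hypothetical and are not tools provided by GIGA, and characters with no plain-ASCII equivalent are simply dropped.

```python
import re
import unicodedata


def sanitize_name(name: str) -> str:
    """Return a version of a file or directory name without spaces or accents."""
    # Decompose accented characters (e.g. "é" becomes "e" + a combining accent),
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace each run of whitespace with a single underscore.
    return re.sub(r"\s+", "_", ascii_only.strip())


def is_valid_name(name: str) -> bool:
    """True if the name already follows the rule (nothing would be changed)."""
    return name == sanitize_name(name)
```

For instance, `sanitize_name("Résultats finaux 2018.txt")` gives `Resultats_finaux_2018.txt`.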

User home

Each person working at GIGA is given a personal space of 100 GB on our storage infrastructure. By default, these spaces are not backed up.

This directory already contains a few sub-directories:

  • One called backup which is, as the name implies, backed up
  • The others are generated based on your assignment to projects; there you will be able to work with your project associates and PI
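
To keep an eye on the 100 GB personal quota, a user could estimate how much space a directory occupies. The sketch below is an illustration only: it assumes decimal gigabytes and simply sums file sizes, which may differ from how the storage system itself accounts for quotas. The names `directory_size` and `quota_usage` are hypothetical.

```python
import os

# 100 GB personal quota, assuming decimal gigabytes (an assumption, see above).
USER_HOME_QUOTA_BYTES = 100 * 10**9


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            full_path = os.path.join(root, filename)
            # Skip symbolic links so that linked data is not counted.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def quota_usage(path: str, quota_bytes: int = USER_HOME_QUOTA_BYTES) -> float:
    """Fraction of the quota used by path (1.0 means the quota is full)."""
    return directory_size(path) / quota_bytes
```

A value of `quota_usage(...)` close to 1.0 would be a sign that copies of input data should be cleaned up.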

Group home

Because everyone works in teams on one or more projects, shared spaces of 2 TB are also provided for each of our platforms (called PTF) and research groups (called URT). Quotas can be expanded if justified. By default, these spaces are backed up every night.

These group spaces contain all of the group's projects, which are all created the same way. Like the user homes, they have sub-directories:

  • One called nobackup which is, as the name implies, not backed up
  • If you are working for a URT, you also get two others called ShareIMG and ShareGEN. They are special shared spaces where the data you ordered from the PTF teams is delivered to you. IMG stands for imaging and GEN for genomics sequencing

DoX

Besides scientific material, there are also more "office-like" documents. UDIMED provides each GIGA member with a 100 GB space called DoX. This space is a cloud storage service which, like its alternatives (e.g. Google Drive, Dropbox), lets you synchronize your data across your electronic devices, including Windows, Mac and Linux computers as well as iPhone, iPad and Android devices. Its content is backed up every night by default.

This system provides:

  • Versioning of its content
  • An internal editor
  • A way to share with non-ULiege users (meaning users without credentials)

Sharing outside GIGA

2nd DoX ? (TBD, in testing phase)

GitLab

When working on scientific datasets, people may need to use and eventually write code. For that purpose a specialized environment called GitLab is provided (Need to talk about spaces provided). It is a framework that allows collaborative work on programming code and/or plain text (this does not include Microsoft Office, LibreOffice or OpenOffice documents) and that also offers a powerful versioning system. Its content is backed up every night by default.

This system provides:

  • Versioning of its content at an atomic level (file by file, down to any single character)
  • Easy access through a web interface
  • The ability to work in structured teams with granular access control (up to the PIs and/or their own computing specialists)
  • The ability to create projects, called repositories, that are either entirely private to your groups or public, for example to store supplementary material for published papers
  • A more robust and powerful, yet harder to master, way of handling projects through Linux terminals. Presentations and workshops can be given on demand


To keep the environment clean, which benefits both users and administrators, users are asked to:

  • Name groups and projects in a readable way. See example

Backup

The hosted data, which is the work of GIGA members, is important enough to be given backup solutions that are as secure as they are flexible.

Each of these solutions has its own conditions of use, so it is of the highest importance that users strictly follow its rules.

Failing to do so may result in skipped backups, corrupted backups or many other unwanted effects.

Tapes

The tape library is a robot with 4 drives, currently working with 620 tapes.

Its retention period is 28 days under average use of the storage spaces, but this can become shorter if activity on the filesystems is high.

Because this process generates quite a heavy load, it runs at night, in a window between 11 pm and 7 am.

Because it can handle 2 TB to 2.5 TB during that window, users are asked to:

  • Avoid changing more than 2 TB of data per day
  • Ask GIGA's IT before moving more than 1 TB
  • Create and use nobackup directories as much as possible, so that fast-changing, long-running or massive work on filesets stays out of the backup
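
The thresholds in the list above can be expressed as a small helper that a user might run before a large copy or move. This is a sketch only: it assumes decimal terabytes, treats "moving" and "daily changes" as the same volume of data, and the function name `backup_advice` and its return values are hypothetical.

```python
TB = 10**12  # assuming decimal terabytes

ASK_IT_THRESHOLD = 1 * TB     # ask GIGA's IT before moving more than this
DAILY_CHANGE_LIMIT = 2 * TB   # avoid changing more than this per day


def backup_advice(bytes_changed: int) -> str:
    """Classify a planned change in a backed-up space by its size in bytes."""
    if bytes_changed > DAILY_CHANGE_LIMIT:
        # More than the nightly window is expected to absorb comfortably.
        return "avoid"
    if bytes_changed > ASK_IT_THRESHOLD:
        return "ask_it"
    return "ok"
```

Work that would be classified as "avoid" is a natural candidate for a nobackup directory.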

Offline Archiving

Archiving is a longer-term but less flexible solution. Its advantages are reduced space usage and the possibility of keeping a listing of its content in a text file.

It is intended for members who have huge datasets that they know they will not need for a few years. When entering their request, they are asked to specify an expiration date, which can be 10 years or more if justified.

As with the nobackup directories, users can create archive directories containing the datasets, then contact GIGA's IT to explain their needs. (Default delay when files are in that dir, modification after request's reception ?)
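
The idea of keeping an archive's content listed in a text file can be illustrated as follows. This is a sketch under assumptions, not the procedure actually used by GIGA's IT: the function name `write_manifest` and the one-path-per-line format are hypothetical.

```python
import os


def write_manifest(archive_dir: str, manifest_path: str) -> int:
    """List every file below archive_dir in a text file, one relative path per line.

    Returns the number of files listed.
    """
    entries = []
    for root, _dirs, files in os.walk(archive_dir):
        for filename in files:
            full_path = os.path.join(root, filename)
            entries.append(os.path.relpath(full_path, archive_dir))
    # Sort so that the listing is stable from one run to the next.
    entries.sort()
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for entry in entries:
            manifest.write(entry + "\n")
    return len(entries)
```

Such a listing makes it possible to check what an archive contains without restoring it.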

Data Processing

Environments called clusters are also provided for high-throughput data processing (computation). Each comes with its own pros and cons.

What they have in common is that users:

  • Who want to use them need a certain level of knowledge of Linux terminal navigation and SSH (secure connections). Presentations and workshops can be given on demand. See example
  • Must never run processing on the main frame, and therefore also need to know the basics of submitting jobs to a cluster (through software called Slurm). See example
  • Can access their private and shared spaces transparently
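
As an illustration of what submitting a job through Slurm involves, the sketch below builds the text of a batch script. The `#SBATCH` options used (`--job-name`, `--cpus-per-task`, `--mem`, `--time`, `--output`) are standard Slurm options, but the helper name `make_batch_script`, the default values and the example command are assumptions; partitions, accounts and limits specific to GIGA's clusters are not covered here.

```python
def make_batch_script(job_name: str, command: str, cpus: int = 1,
                      memory: str = "4G", time_limit: str = "01:00:00") -> str:
    """Return the text of a Slurm batch script that runs a single command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --time={time_limit}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x_%j.out",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be saved to a file, for example `align.sh`, and submitted with `sbatch align.sh`, so that the work runs on a compute node rather than on the machine the user is logged in to.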

Primary cluster

Hosted at SeGI, this cluster is our main one.

Each of its nodes was bought by PIs from specific platforms or research groups.

Users therefore may or may not have access to a given node, although the system is designed to be collaborative: anyone can contribute to one or more nodes and gain access to the shared cluster.

It is intended for both public and production use. Users' private and shared storage spaces are linked to it seamlessly.

See example

Scratch

Within the cluster there are special spaces that serve as temporary storage for the data users manipulate while executing their workflows.

They come in two forms:

  • Each node has a /local directory. These are relatively small but extremely fast because they sit on the processing machine itself. The retention policy strictly requires users to purge their own datasets at the end of each run
  • The second one, called /gallia, is larger and still extremely fast because it is directly connected to the scientific backbone. Users must submit a request to GIGA's IT to get credentials for it. The retention policy gives them 15 days before an automatic purge
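
To avoid losing results to the automatic purge, a user might list which files in a scratch directory have passed the 15-day limit. The sketch below is only an approximation: it assumes that age is measured from each file's last modification time, which may not be the criterion the purge actually uses, and the function name `files_due_for_purge` is hypothetical.

```python
import os
import time

PURGE_AFTER_DAYS = 15  # delay before the automatic purge, as stated in the policy


def files_due_for_purge(scratch_dir, max_age_days=PURGE_AFTER_DAYS, now=None):
    """Return the sorted paths of files last modified more than max_age_days ago."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 24 * 60 * 60
    due = []
    for root, _dirs, files in os.walk(scratch_dir):
        for filename in files:
            full_path = os.path.join(root, filename)
            # Assumption: a file's age is measured from its modification time.
            if os.path.getmtime(full_path) < cutoff:
                due.append(full_path)
    return sorted(due)
```

Files returned by this function should be moved to a group space if they are still needed.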

See example

Secondary cluster

Hosted at GIGA, this cluster works like the primary one but requires a separate credentials request before it can be used with users' private and shared storage spaces.

To obtain them, users must send a request to GIGA's IT, providing their ULiege username and the platform or group they belong to.

See example

Cluster of the French-speaking Belgian universities

The Consortium des Équipements de Calcul Intensif, or CÉCI, has its own means of sharing high-performance scientific computing facilities and resources, and of pooling know-how and expertise among the universities' clusters. Its system and software infrastructure is similar to GIGA's, to make things as easy as possible for users.

It allows GIGA members to use its infrastructure under some conditions:

  • The user must request credentials on the CÉCI website
  • CÉCI is intended solely for public use; no production is allowed on it
  • Except on one of their clusters, NIC4 (ULiege's), GIGA users will not have direct access to their private and group storage. They will therefore have to handle the transfer of their data from our storage to CÉCI's themselves.

See example

Use cases

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA