Eureka Installation: Storage Server
- Install the NFS server packages: `apt-get install nfs-kernel-server nfs-common`
- Edit the export list: `vi /etc/exports`
- Start rpcbind: `rpcbind start` (GUI: enable NFS)
- Log in to DSM:
  - `Control Panel` > `File Services` > `SMB / AFP / NFS`
  - Activate `Enable NFS`
  - Reference: https://kb.synology.com/zh-tw/DSM/help/DSM/AdminCenter/file_winmacnfs_nfs?version=7
- Restart: `/etc/init.d/nfs-kernel-server restart`
- Check: `showmount -e <NFS IP>`
- Mount: `mount -t nfs 192.168.0.253:/volume1/gpucluster3 /projectY/`
  or edit `/etc/fstab`: `tumaz:/home /home nfs defaults 0 0`
- Mount all devices: `mount -a`
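A minimal end-to-end sketch of the client-side mount, assuming the server is `eater` (192.168.0.253) exporting `/volume1/gpucluster3` and that `/projectY` is the desired mount point (both are examples taken from above; adjust to your setup). The fstab options mirror the ones used later on this page:

```sh
# Create and open up the mount point (path is an example)
mkdir -p /projectY
chmod 755 /projectY

# One-off mount for testing
mount -t nfs 192.168.0.253:/volume1/gpucluster3 /projectY

# Persistent mount: append an entry to /etc/fstab, then mount everything
echo "eater:/volume1/gpucluster3 /projectY nfs auto,bg,hard,intr 0 0" >> /etc/fstab
mount -a

# Verify the share is mounted
df -h /projectY
```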
- Create a space on `eater` for an existing user on `eureka`
  - Register the target user's info in `/etc/passwd` and `/etc/shadow` on `tumaz`
  - Create the user's directory on `eater`:
    - `ssh gamer04`
    - `ssh admincalab@eater`
    - `sudo -i`
    - `cd /volume1/gpucluster3`
    - `mkdir <user_name>`
    - `chown <uid>:<gid> <user_name>`, where `<uid>` and `<gid>` are recorded in `/etc/passwd` on `spartan`
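A minimal sketch of the lookup-and-create step, assuming `getent` is available on the node that knows the user (e.g. `spartan`); the `chmod 755` at the end is an assumption that mirrors the permissions used for the gpucluster directories later on this page:

```sh
# 1. On spartan (or any node that knows the user), read the numeric uid/gid
getent passwd <user_name>        # fields: name:x:uid:gid:...

# 2. On eater (after `ssh admincalab@eater` and `sudo -i`), create the
#    directory and hand ownership to that uid/gid
mkdir /volume1/gpucluster3/<user_name>
chown <uid>:<gid> /volume1/gpucluster3/<user_name>
chmod 755 /volume1/gpucluster3/<user_name>   # assumed permissions
```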
- Expand storage volume
  - Open DSM > `Storage Manager` > `Storage Pool` > `Action` > `Add Drive`
  - Drag HDDs from the left side to the right.
  - Click `Next`
  - Click `Apply`
- Add a new storage volume
  - Open DSM > `Storage Manager` > `Volume` > `Create`
  - Select RAID 6
  - Maximize `Modify allocated size` by clicking `Max`
  - Choose `Btrfs` instead of `ext4`
  - `Apply`. Setting up the new storage volume takes a few days, and `/volume?` is created automatically.
  - `Control Panel` > `Shared Folder` > `Create` > `Create`
    - General: name: `gpucluster?`
    - Advanced: choose `Enable data checksum for advanced data integrity` and `Enable file compression`
  - @server: `chmod 755 gpucluster?/`
  - @server: `vi /etc/exports` and append the following line at the end of `/etc/exports`:
    `/volume?/gpucluster? 192.168.0.0/24(rw,async,no_wdelay,no_root_squash,insecure_locks,sec=sys)`
  - @server: `exportfs -arv`
  - Check that the newly added volume is exported to clients: `showmount -e`
  - @eureka00: `mkdir /projectW`
  - @eureka00: `chmod 755 /project?`
  - @eureka00: `vi /etc/fstab` and append the following line at the end of `/etc/fstab`:
    `eater:/volume?/gpucluster? /project? nfs auto,bg,hard,intr 0 0`
    Repeat this step on all computing nodes --> do NOT directly copy the file `eureka00:/etc/fstab` to all computing nodes.
  - Check that the NFS clients can see the newest volume:
    @eureka00: `showmount -e eater`
    @eureka02: `showmount -e eater`
  - Mount the newest volume on all computing nodes: `pdsh -w eureka[00-33] mount -a` (see the verification sketch below)
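A short verification pass after the mount step, assuming `pdsh` is set up as above and using `/projectW` as an example mount point (replace with the actual name):

```sh
# Mount everything listed in /etc/fstab on all computing nodes
pdsh -w eureka[00-33] mount -a

# Confirm the new volume is visible and mounted everywhere
pdsh -w eureka[00-33] "df -h /projectW"
```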
- Change from an ext4 volume to a Btrfs volume:
  https://www.synology.com/en-us/knowledgebase/DSM/tutorial/Storage/How_to_change_from_ext4_volume_to_btrfs_volume
- Synchronize with an NTP server
  - `Control Panel` > `System` > `Regional Options` > `Time`
  - Choose `Synchronize with an NTP server` and pick `pool.ntp.org`
  - `Update Now` > `Apply`
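A quick sanity check of the result, assuming SSH access to `eater` with the account used elsewhere on this page:

```sh
# Compare the NAS clock against a computing node's clock (both in UTC)
date -u
ssh admincalab@eater date -u
```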
- Set up network link aggregation
  - `Control Panel` > `Network` > `Network Interface` > `Create` > `Create Bond`
  - Follow the creation steps with the default settings and choose all (4) network interfaces to create the bond.
  - Check and test: write four 2 GB files to the NAS from 4 individual nodes at the same time:
    `for i in 01 02 03 04; do ssh eureka$i dd if=/dev/zero of=[target file name] bs=2G count=1 & done`
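A slightly more explicit variant of the same test, assuming each node has the NFS share mounted at `/projectY` (the mount point and file names are placeholders). It writes 2 GiB per node in parallel and prints each node's `dd` throughput report, which should add up to roughly the bonded bandwidth:

```sh
# Write one 2 GiB file per node in parallel; conv=fdatasync forces the data
# to be flushed so the reported rate reflects real NFS throughput
for i in 01 02 03 04; do
  ssh eureka$i "dd if=/dev/zero of=/projectY/bondtest_$i bs=1M count=2048 conv=fdatasync" &
done
wait

# Remove the test files afterwards
for i in 01 02 03 04; do
  ssh eureka$i "rm -f /projectY/bondtest_$i"
done
```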
- If both the `power` and `alert` lights flash constantly:
  - Turn the NAS off.
  - Pull all the hard disks halfway out, then turn the NAS on again and check whether the power light still flashes constantly.
    -> Do NOT pull a disk out while the machine is on; otherwise, data will be lost.
    -> If the machine does not shut down after holding the power button for 10 seconds, unplug the power directly.
  - If `power` and `alert` keep flashing, the motherboard has a problem.
- If drive reconnection errors happen frequently, run an extended S.M.A.R.T. test:
  - Open DSM
  - Open `Storage Manager`
  - Click `HDD/SSD` in the left column
  - Choose the HDD to be tested
  - Click `Health Info`
  - Click the `S.M.A.R.T.` tab
  - Click `Extended test` and `Start`
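The same test can also be started from an SSH session on the NAS with `smartctl`, assuming smartmontools is present in DSM; the device path below is an example and must be checked first:

```sh
# List drives to confirm the target device path (e.g. /dev/sda)
smartctl --scan

# Start an extended (long) self-test on the chosen drive
smartctl -t long /dev/sda

# Check progress and results once the test has had time to run
smartctl -a /dev/sda
```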
- If the warranty has expired, send the disk to 虹谷資訊 for maintenance:
  https://www.hongku.com.tw/
  5F., No. 1, Sec. 1, Chongqing N. Rd., Datong Dist., Taipei City
  If the warranty has not expired yet, send it to the store where it was originally bought.
  --> Label the index of every disk before drawing it out, as below. (very important!!!)

        -----------------------
        |    [power light]    |
        -----------------------
        |    1     |    2     |
        |    3     |    4     |
        |    5     |    6     |
        |    7     |    8     |
        |    9     |   10     |
        |   11     |   12     |
        -----------------------

  After the repair is completed:
  - Insert the hard drives back in their original order (very important!!!)
  - Boot
  - Finish (eureka will mount automatically)
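After boot, a quick way to confirm that every computing node sees the NFS shares again, assuming `pdsh` and the `/project?` mount points described earlier on this page:

```sh
# List the project mounts on all computing nodes
pdsh -w eureka[00-33] "df -h | grep /project"
```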
- Replace the disks in the master machine. The OS is spread across all disks in the master machine, so disks can only be replaced one at a time:
  1. Back up the data in the disk volume.
  2. Delete the target directory: `Control Panel` > `Shared Folder` > `Delete`
  3. Delete the volume:
     - `ironman`: `Storage Manager` > `Volume` > `Delete`
     - `eater`: `Storage Manager` > `Volume` > `Delete`
  4. [Optional] Delete or modify the disk group:
     - `ironman`: `Storage Manager` > `Disk Group` > `Delete`
     - `eater`: `Storage Manager` > `Storage Pool` > `Delete`
  5. [Optional] Rebuild the disk group:
     - `ironman`: `Storage Manager` > `Disk Group` > `Create`
     - `eater`: `Storage Manager` > `Storage Pool` > `Create`, choose `RAID 6`
  6. Rebuild the volume:
     - `ironman`: `Storage Manager` > `Volume` > `Create`, choose `Btrfs` as the file system
     - `eater`: `Storage Manager` > `Volume` > `Create`, choose `Btrfs` as the file system
  7. After the new disk volume is built, directly draw out one disk and replace it.
  8. Repair the disk array:
     - `ironman`: `Storage Manager` > `Disk Group` > `Manage` > `Repair`
     - `eater`: `Storage Manager` > `Storage Pool` > `Action` > `Repair`
  Repeat steps 7 and 8 until all disks are replaced.
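If you want to watch the repair from an SSH session on the NAS rather than in the DSM GUI, the Linux md status file shows the resync progress; this assumes DSM manages the array with md and is a read-only check only:

```sh
# Show RAID arrays and rebuild/resync progress
cat /proc/mdstat

# Poll once a minute until the repair reaches 100%
while true; do date; cat /proc/mdstat; sleep 60; done
```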
- To replace the disks in the NAS expansion unit:
  1. Back up the data in the disk volume.
  2. Delete the target directory: `Control Panel` > `Shared Folder` > `Delete`
  3. Delete the volume:
     - `ironman`: `Storage Manager` > `Volume` > `Delete`
     - `eater`: `Storage Manager` > `Volume` > `Delete`
  4. Delete or modify the disk group:
     - `ironman`: `Storage Manager` > `Disk Group` > `Delete`
     - `eater`: `Storage Manager` > `Storage Pool` > `Delete`
  5. Replace all disks in the NAS extension.
  6. Rebuild the disk group:
     - `ironman`: `Storage Manager` > `Disk Group` > `Create`, choose `RAID 6`
     - `eater`: `Storage Manager` > `Storage Pool` > `Create`, choose `RAID 6`
  7. Rebuild the volume:
     - `ironman`: `Storage Manager` > `Volume` > `Create`, choose `Btrfs` as the file system
     - `eater`: `Storage Manager` > `Volume` > `Create`, choose `Btrfs` as the file system
- Connect the new expansion unit to the master and insert the disks.
  - Create a new disk group: `Storage Manager` > `Disk Group` / `Storage Pool` > `Create`
  - Choose the disks in the new extension.
  - Choose `RAID 6` as the array type.
  - Create a new disk volume: `Storage Manager` > `Volume` > `Create`, and choose `Btrfs` as the file system.
- Run data scrubbing manually:
  - Log in to DSM
  - Open `Storage Manager`
  - `Storage Pool` > `Data Scrubbing` > `Action` > `Manual run`
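On a Btrfs volume, scrubbing can also be started and checked over SSH with the generic Btrfs tooling, assuming `/volume1` is the volume in question (scheduling it through the DSM GUI remains the usual path):

```sh
# Start a scrub of the volume and check its progress later
btrfs scrub start /volume1
btrfs scrub status /volume1
```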