Network Storage Project (SYS265) - Chromosom3/TechNotes GitHub Wiki
Seaweed
For our group project, our team used SeaweedFS, an open-source distributed file system. SeaweedFS is well suited for use in cloud environments and includes documentation for Amazon S3 integration. SeaweedFS also supports replication and failover to protect your data.
Network Diagram
Machines used:
hostname | IP address / subnet mask | purpose |
---|---|---|
fwSYS26501-Group-2 | 10.0.5.2/24 | firewall |
win10-team2 | 10.0.5.151/24 | windows 10 client |
ubuntu1-group2 | 10.0.5.10/24 | seaweedfs master server |
ubuntu2-group2 | 10.0.5.11/24 | seaweedfs volume server |
ubuntu3-group2 | 10.0.5.12/24 | seaweedfs volume server |
Server Hierarchy
Installation Process:
The first step of installing SeaweedFS is to update the package lists and install the required tools to their most recent versions:
sudo apt update
sudo apt install vim curl wget zip git -y
sudo apt install build-essential autoconf automake gdb git libffi-dev zlib1g-dev libssl-dev -y
Next, install golang from the APT repository:
sudo apt install golang
Next, clone the SeaweedFS GitHub repository:
cd ~
git clone https://github.com/chrislusf/seaweedfs.git
Finally, install by navigating to the seaweedfs folder and running make install. ⚠️ Note that this step takes a fair amount of time and resources, and you may see a notice about high CPU usage in the final steps.
cd ~/seaweedfs
make install
Once everything is installed, copy the weed binary to a directory on your PATH:
sudo cp ~/go/bin/weed /usr/local/bin/
Verify the SeaweedFS version, and congratulations, SeaweedFS is installed:
weed version
Set up the master server
Now that SeaweedFS is installed, the master server can be set up. There are two different ways to start the master server. One way is to run it under systemd. To do this, create a file called seaweedmaster.service in /etc/systemd/system and fill it in so it looks like the screenshot below. Notice the value defaultReplication=002 in the screenshot; this means that a file will be replicated on two other servers on the same rack. Here is a link to the file as well.
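For reference, a minimal sketch of what a seaweedmaster.service unit can look like. Only the defaultReplication=002 flag and the 10.0.5.10 master address come from this page; the -mdir path and Restart policy are assumptions, so check your own screenshot/file for the exact values:

```ini
[Unit]
Description=SeaweedFS master server
After=network.target

[Service]
# -mdir is where the master stores its metadata (path assumed here)
ExecStart=/usr/local/bin/weed master -ip=10.0.5.10 -port=9333 -mdir=/var/lib/seaweedfs -defaultReplication=002
Restart=on-failure

[Install]
WantedBy=multi-user.target
```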
After updating the file, reload the systemd daemon, then start and enable the seaweedmaster service:
sudo systemctl daemon-reload
sudo systemctl start seaweedmaster
sudo systemctl enable seaweedmaster
Make sure the service is running:
sudo systemctl status seaweedmaster.service
If there is an issue running the SeaweedFS master as a service, it can also be run manually with the following command:
weed master &
Set up the volume servers
Now that the master server is set up, the volume servers can be set up. Multiple volume servers can run on a single computer, or the volume servers can be distributed across many computers. First, make sure SeaweedFS is installed on all of the servers.
Next, create directories for the volume servers. For our environment, we created a directory called seaweedshare on ubuntu1, ubuntu2, and ubuntu3. Then create a file called seaweedvolume1.service and fill it in so it looks like the screenshot below. Here is a link to the file as well.
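For reference, a minimal sketch of what a seaweedvolume1.service unit can look like. The /seaweedshare path is assumed from the directory name above, -mserver points at the master from the machine table, and -port=8081 matches the volume-server URLs used later on this page; verify against your own file:

```ini
[Unit]
Description=SeaweedFS volume server
After=network.target

[Service]
# -dir is the data directory created above; -mserver is the master address
ExecStart=/usr/local/bin/weed volume -dir=/seaweedshare -mserver=10.0.5.10:9333 -port=8081
Restart=on-failure

[Install]
WantedBy=multi-user.target
```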
After that, reload the daemon, then start and enable the service:
sudo systemctl daemon-reload
sudo systemctl start seaweedvolume1
sudo systemctl enable seaweedvolume1
Check that the service is running without any issues as well:
sudo systemctl status seaweedvolume1.service
The SeaweedFS web GUI
SeaweedFS includes a web interface, which can be accessed by entering the IP address of the master server on port 9333. For our environment, navigating a browser to 10.0.5.10:9333 brings up this page.
Notice where it says leader 10.0.5.10:9333; this is the master server's IP address. The "Other Masters" section is where other master servers' IP addresses would show up, but in our environment there is only one master server. The Topology section shows the volume servers: what data center they are part of, what rack they are on, their IP addresses, the number of volumes, and the volume IDs.
When clicking on one of the volume servers, this is what the page looks like.
This shows the path of the volume and how many GiB are free. There is also the Volumes section, which shows how many files are part of a particular volume, how big the files are, the ttl field, and whether the volume is read-only.
Adding files
The process to add a file to a SeaweedFS server uses the curl command. First, curl the SeaweedFS master server as shown below. Make sure to replace the IP with your master server's IP address.
curl http://10.0.5.10:9333/dir/assign
The command returns a JSON response containing a fid and a url. Use those values in the next command to upload the file to a volume server.
curl -F file=@/home/champuser/test.txt http://10.0.5.11:8081/11,01053467242
curl -F file=@/Your-Local-File http://URL/FID
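The two-step flow above can be scripted. The response string below is a hypothetical example of the assign JSON, using the fid and volume-server address from this page; on a live cluster you would capture it with `RESP=$(curl -s http://10.0.5.10:9333/dir/assign)` instead:

```shell
# Hypothetical assign response; on a live cluster use:
#   RESP=$(curl -s http://10.0.5.10:9333/dir/assign)
RESP='{"fid":"11,01053467242","url":"10.0.5.11:8081","publicUrl":"10.0.5.11:8081","count":1}'

# Extract the fid and url fields (sed keeps this dependency-free;
# `jq -r .fid` / `jq -r .url` is cleaner if jq is installed)
FID=$(echo "$RESP" | sed -n 's/.*"fid":"\([^"]*\)".*/\1/p')
URL=$(echo "$RESP" | sed -n 's/.*"url":"\([^"]*\)".*/\1/p')

echo "upload target: http://$URL/$FID"
# Then upload with:
#   curl -F file=@/home/champuser/test.txt "http://$URL/$FID"
```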
Viewing files
To access files, you can use the web GUI mentioned earlier. Continuing the example above, to access the file you would put one of the following URLs in the browser:
http://10.0.5.11:8081/11,01053467242
http://10.0.5.11:8081/11/01053467242
http://10.0.5.11:8081/11/01053467242/test.txt
These addresses are based on the http://URL/FID format and will be unique to each file.
Deleting files
Deleting a file from SeaweedFS is similar to uploading the file. Run the following curl command to delete the file:
curl -X DELETE http://10.0.5.11:8081/11,01053467242
curl -X DELETE http://URL/FID
Creating temporary files
SeaweedFS allows a time-to-live (TTL) to be attached to files. Once the TTL expires, the file will be deleted. For more information on storing a file with TTL, view the SeaweedFS wiki entry, found here. Uploading a file with a TTL is similar to uploading a normal file, as shown below; note that the TTL should be requested when assigning the fid as well as when uploading, so the file lands on a TTL-enabled volume.
curl http://10.0.5.10:9333/dir/assign?ttl=3m
curl -F file=@/home/champuser/test.txt http://10.0.5.11:8081/11,01053467242?ttl=3m
curl -F file=@/Your-Local-File http://URL/FID?ttl=TIME
Presentation and Demo video:
Watch the video here
Reflection
SeaweedFS seems like a powerful network file system. That being said, I don't think I would ever go out of my way to use it. Personally, I'm not a fan of needing to use a filer to manage FIDs to keep track of files. SeaweedFS seems to be focused more on being a backend for other applications, and I don't have a use case like that right now. I do like how easy it is to configure replication on the systems, though. I also found the SeaweedFS wiki helpful when building our cluster and understanding the components. Overall, I think I would prefer something more user-friendly, like Windows DFS.