Cluster - LinuxUserGroupUWSP/RackMesa GitHub Wiki

#What

The cluster is a distributed system for large-scale computation. Users of the cluster will be able to run relatively long-running processes in a fraction of the time, with a speedup that scales close to linearly with the number of hosts.

#Why

The reasoning behind building the cluster is twofold. First, it gives our engineers experience with distributed systems, high-availability patterns, and techniques for calculating and optimizing for resource-sensitive infrastructure. Second, it lets local researchers calculate results faster, so more experiments become feasible and the pace of research increases.

#How

##Overview

Our cluster will be built primarily using the Apache ecosystem. GlusterFS will be our distributed filesystem, Apache Mesos will abstract our CPU and RAM resources, Apache Spark will be the computation engine, Apache Zeppelin will provide a web-based interface to Spark, Apache ZooKeeper will maintain our configurations to provide high availability, and Foreman will deploy our systems.

##Specifications

There are to be at least three ZooKeeper instances and at least three Mesos masters. Each of these must run on its own host, for a total of at least six hosts. Every host will run a Mesos agent and a Spark agent, with the three ZooKeeper instances configured as their master.

Most if not all of the hosts will be older commodity laptops, so running them 24/7 will introduce memory errors and a risk of overheating. To mitigate memory errors, the cluster should go through a rolling restart every week, spread over the course of that week.

Given our small client base, we only need to run Zeppelin on one host.
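The weekly rolling restart can be sketched as a simple schedule that gives every host its own restart day. This is only an illustration under assumptions: the host names are hypothetical, and the helper `weekly_restart_plan` is not part of any tool named on this page. Restarting one host per day means at most one ZooKeeper instance or one Mesos master is down at a time, so two of three remain available.

```python
from datetime import date, timedelta

# Hypothetical host names; this page does not name the actual machines.
HOSTS = ["node1", "node2", "node3", "node4", "node5", "node6"]


def weekly_restart_plan(hosts, week_start):
    """Map each host to its own restart day in the week beginning at week_start."""
    if len(hosts) > 7:
        # With more than seven hosts, some would have to share a day.
        raise ValueError("more hosts than days in the week")
    return {host: week_start + timedelta(days=i) for i, host in enumerate(hosts)}


# 2024-01-01 is used only as an example week start.
plan = weekly_restart_plan(HOSTS, date(2024, 1, 1))
for host, day in plan.items():
    print(f"{host}: restart on {day:%A %Y-%m-%d}")
```

With six hosts, one day of the week stays free, which leaves room for a seventh host or for a day without restarts.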

Cluster Architecture

##Build

###Prerequisites

###Installation

NOTE: Only three hosts need to run "mesos-master", but every host should run "mesos-slave".
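As a rough planning aid, the role layout described in the note can be written down as a small script. This is a sketch under assumptions, not our actual deployment: the host names are hypothetical, port 2181 is ZooKeeper's default client port, and the `/mesos` path is only a common convention. The `zk://host:port,.../path` form is the address format Mesos accepts for a ZooKeeper-backed master.

```python
# Hypothetical host names; substitute the real ones.
ZOOKEEPER_HOSTS = ["zk1", "zk2", "zk3"]
MASTER_HOSTS = ["master1", "master2", "master3"]
ZK_PORT = 2181  # ZooKeeper's default client port


def services_for(host):
    """Return the services a host should run: every host gets mesos-slave."""
    services = ["mesos-slave"]
    if host in MASTER_HOSTS:
        services.insert(0, "mesos-master")
    if host in ZOOKEEPER_HOSTS:
        services.insert(0, "zookeeper")
    return services


def zk_master_url(zk_hosts, port=ZK_PORT, path="/mesos"):
    """Build the ZooKeeper address that Mesos masters and agents point at."""
    return "zk://" + ",".join(f"{h}:{port}" for h in zk_hosts) + path


# A majority of masters must agree, so three masters need a quorum of two.
quorum = len(MASTER_HOSTS) // 2 + 1

for host in ZOOKEEPER_HOSTS + MASTER_HOSTS:
    print(host, services_for(host))
print("master address:", zk_master_url(ZOOKEEPER_HOSTS))
print("quorum:", quorum)
```

Keeping the layout in one place makes it easy to check that exactly three hosts are assigned "mesos-master" before installing anything.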