Setup Runbook - NETESOLUTIONS/ERNIE GitHub Wiki
- Create the ERNIE Azure Resource Group
  - Create the ERNIE-LRS Recovery Services vault
    - Properties > Backup Configuration > Update > Storage replication type = Locally-redundant
    - Create the backup policy
- Create the ernie1-nsg Azure Network Security Group
  - Open port 22 only on the Azure firewall in the Network Security Group settings
Security Center > Security policy > {subscription} > View effective policy > {policy assignment} > Parameters >
- This requires subscription owner privileges
- Disk encryption should be applied on virtual machines = Disabled
- Log in to Azure under the NETE Azure Pay-As-You-Go subscription
- Add a VM:
  - Name = ernie-{purpose}
  - Basic
    - Region = East US 2
    - Image = CIS CentOS Linux 7.5
    - Select an appropriate server size
  - Disks
    - Add an appropriate number of premium storage disks
  - Networking
    - Virtual network = ERNIE-vnet
    - Public IP = new
    - NSG = Advanced > ernie1-nsg
    - Accelerated networking = off
  - Management
    - OS guest diagnostics = on
  - Tags
    - Add vm = {VM name}
  - Create
- Configure public DNS
## Update OMI to 1.4.2-3+ ##
sudo rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm
sudo yum update -y omi
sudo rm -rf /home/omi /var/spool/mail/omi
## Azure CLI ##
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
{ cat <<'HEREDOC'
[azure-cli]
name=Azure CLI
baseurl=https://packages.microsoft.com/yumrepos/azure-cli
enabled=1
gpgcheck=1
gpgkey=https://packages.microsoft.com/keys/microsoft.asc
HEREDOC
} | sudo tee /etc/yum.repos.d/azure-cli.repo
sudo chmod a+r /etc/yum.repos.d/*
yum check-update
sudo yum install -y azure-cli
# Add the EPEL repo
sudo yum install -y epel-release
## Add the elrepo ##
# Latest Linux kernel updates
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
## Add the Open Fusion repo ##
# Get GNU Parallel updates beyond v20160222, which is a very old version
sudo rpm --import http://repo.openfusion.net/RPM-GPG-KEY-openfusion
{ cat <<'HEREDOC'
[OpenFusion]
name=Open Fusion
baseurl=http://repo.openfusion.net/centos7-x86_64
enabled=1
gpgcheck=1
HEREDOC
} | sudo tee /etc/yum.repos.d/OpenFusion.repo
# Add a Midnight Commander CentOS 7 binary repo
sudo wget http://download.opensuse.org/repositories/home:/laurentwandrebeck:/mc/CentOS_7/home:laurentwandrebeck:mc.repo -O /etc/yum.repos.d/home_laurentwandrebeck_mc.repo
References:
Upload logrotate configuration from the repo to e.g. ~/Workspaces/ERNIE/Config/storage/etc, then:
sudo cp -Rv ~/Workspaces/ERNIE/Config/storage/etc /
# Fix SELinux context user for some log files
sudo chcon -u system_u /var/log/*
sudo yum install -y parallel
- PCRE grep: sudo yum install -y pcre-tools
- jq: sudo yum install -y jq
- lftp, an FTP client: sudo yum install -y lftp
- 7-Zip: sudo yum install -y p7zip
- Midnight Commander, an Orthodox File Manager (OFM): sudo yum install -y mc
- sudo yum install -y nano
  - Some people use neither emacs nor vi
- glances, advanced top-like server resource stats: sudo yum install -y glances
- sudo yum install -y qrencode: QR code generation, e.g. for Google Authenticator
E.g. packages required for compiling monit sources:
- libtool: sudo yum install -y libtool
- PAM development support: sudo yum install -y pam-devel
- SSL header files: sudo yum install -y openssl-devel
- Increase sudo timeout: sudo sed --in-place --regexp-extended 's/(Defaults.*env_reset).*/\1,timestamp_timeout=60/' /etc/sudoers
- Create the core team group: sudo groupadd erniecore
- [] TBD. Create end user group: sudo groupadd ernieusers
- Create core team Linux users and add ernie_admin and core team users to the erniecore (as the primary group) and wheel groups.
- Configure PAM: sudo yum install -y google-authenticator
  - Upload and copy PAM config files to /etc/pam.d/
- Configure system banner: upload and copy issue.net file to /etc/
- Configure SSH: upload and copy sshd_config file to /etc/ssh/
- Set up SSH for System Accounts
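The sudo timeout edit in the list above rewrites /etc/sudoers in place, so it may be worth previewing the substitution first. The following is a minimal sketch, assuming GNU or BSD sed; preview_sudo_timeout is a hypothetical helper name, and /tmp/sudoers.new is an assumed scratch path, neither of which is part of the runbook.

```shell
#!/usr/bin/env bash
# Sketch: apply the runbook's sudoers substitution to text on stdin
# instead of editing /etc/sudoers directly.
# preview_sudo_timeout is a hypothetical helper name.
preview_sudo_timeout() {
  # -E is the short form of --regexp-extended
  sed -E 's/(Defaults.*env_reset).*/\1,timestamp_timeout=60/'
}

# Try it on a sample line rather than the real file
printf 'Defaults    env_reset\n' | preview_sudo_timeout

# On the server, the result could be syntax-checked before it is installed:
#   sudo cat /etc/sudoers | preview_sudo_timeout > /tmp/sudoers.new
#   sudo visudo -c -f /tmp/sudoers.new
```

Because the expression replaces everything after env_reset on the matching line, running it twice yields the same line, but any other options that followed env_reset on that line would be dropped.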
For each additional disk:
- Create a new VM drive in Azure Portal.
- On the machine: partition, format and mount the drive.
  - If throughput > 200 MB/s or IOPS > 1000 (e.g. for Premium HDD storage, 1 TB), use xfs. Otherwise (e.g. for Standard HDD or SSD storage, 1 TB), use the ext4 file system. See How to Choose Your Red Hat Enterprise Linux File System for details.
- Add the disk UUID to /etc/fstab. For example, add the following line:
UUID=43204a4e-48b4-4c44-8db2-bc411fe10da4 /data1 xfs defaults,nofail 1 2
- Reference: Add a disk to a Linux VM
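The fstab step above can be sketched as a small helper that formats the entry from a UUID, a mount point, and a file system type. This is only an illustration: fstab_entry is a hypothetical helper name, and /dev/sdc1 in the comments is an assumed device name that will differ per VM.

```shell
#!/usr/bin/env bash
# Sketch: build an /etc/fstab line in the same shape as the runbook example.
# fstab_entry is a hypothetical helper name.
fstab_entry() {
  local uuid="$1" mount_point="$2" fs_type="$3"
  # nofail lets the VM boot even if the data disk is missing
  printf 'UUID=%s %s %s defaults,nofail 1 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Reproduce the example line from the runbook
fstab_entry 43204a4e-48b4-4c44-8db2-bc411fe10da4 /data1 xfs

# On the server (device name is an assumption):
#   uuid=$(sudo blkid -s UUID -o value /dev/sdc1)
#   fstab_entry "$uuid" /data1 xfs | sudo tee -a /etc/fstab
#   sudo mount -a   # verify the new entry mounts cleanly
```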
- [] TBD PAR-496 Evaluate the need for firewalling
- Azure can do firewalling via Azure NSG, so we need neither firewalld nor iptables firewalls. The hardening script enables iptables/ip6tables, but doesn't do anything with firewalld.
- Stop and disable the Linux firewalld service:
sudo systemctl stop firewalld
sudo systemctl disable firewalld
- If the project decides to enforce tunneling, Azure firewall (Azure dashboard > Network security group) should be used.
Azure Dashboard > Virtual Machines > {server} > Backup >
- Recovery Services vault > Select existing = ERNIE-LRS
- Choose backup policy = Maximum-9-points
- Enable Backup
- Azure Monitor setup as documented did not work: the monitor was enabled, but no data is being recorded.
- Linux Diagnostic Extension 3.0 setup as documented failed with a Python syntax error.
sudo yum install -y epel-release
sudo yum install -y clamav-server clamav-data clamav-update clamav-filesystem clamav clamav-scanner-systemd clamav-devel clamav-lib clamav-server-systemd
sudo setsebool -P antivirus_can_scan_system 1
sudo setsebool -P clamd_use_jit 1
sudo cp /etc/clamd.d/scan.conf /etc/clamd.d/scan.conf.backup
sudo sed -i -e "s/^Example/#Example/" /etc/clamd.d/scan.conf
sudo sed -i -e "s/#LocalSocket /LocalSocket /" /etc/clamd.d/scan.conf
sudo cp /etc/freshclam.conf /etc/freshclam.conf.backup
sudo sed -i -e "s/^Example/#Example/" /etc/freshclam.conf
sudo freshclam
sudo bash -c "cat >/usr/lib/systemd/system/freshclam.service <<EOF
[Unit]
Description = freshclam scanner
After = network.target
[Service]
Type = forking
ExecStart = /usr/bin/freshclam -d -c 2
Restart = on-failure
PrivateTmp = true
[Install]
WantedBy=multi-user.target
EOF
"
sudo systemctl start freshclam
sudo systemctl enable freshclam
sudo systemctl start clamd@scan
sudo systemctl enable clamd@scan
This sets up:
- ClamAV services
- Periodic DB updates
- [] TODO. Check on that. There were root emails with warning messages.
- Disabled on-access scan
sudo clamscan -i -r /home /erniedev_data1: this job fails with exit code 1 if any infected files are found. Running it as root should avoid access errors, which trigger exit code 2.
For more info, see How to Install ClamAV on CentOS 7.
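The exit codes mentioned for the scan job can be turned into readable messages with a small wrapper. This is a minimal sketch rather than the project's actual job: describe_clamscan_status is a hypothetical helper name, and treating every code other than 0 and 1 as an error is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: map a clamscan exit status to a short description.
# describe_clamscan_status is a hypothetical helper name.
describe_clamscan_status() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "infected files found" ;;
    *) echo "scan error" ;;   # e.g. 2 for access errors
  esac
}

# Show the mapping for the three codes discussed above
for code in 0 1 2; do
  echo "exit code ${code}: $(describe_clamscan_status "${code}")"
done

# On the server, the wrapper could follow the scan:
#   sudo clamscan -i -r /home /erniedev_data1
#   status=$?
#   describe_clamscan_status "$status"
#   exit "$status"
```

Re-raising the original status at the end keeps the job failing on infections or errors, as the runbook describes.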
- Install Postgres per the recipes
- Configure Postgres per the recipes
- Set up user access
- Add the postgres user to the erniecore group (sudo usermod -a -G erniecore postgres) and restart Postgres
- /etc/profile.d/postgres_defaults.sh:
export PGDATA=/var/lib/pgsql/11/data
export PGDATABASE=ernie
- This makes scripts (which mostly connect to Postgres on the same server) less verbose and more portable between systems. It'd also help a lot if connection parameters ever need to change.
- Users can override these defaults via the command line, particularly to connect under their own accounts, e.g. psql -U dk.
- For Jenkins-executed local scripts these could be set on the fly in Manage Jenkins > Configure System > Global properties > Environment variables.
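Scripts that may run where the profile script has not been sourced (for example, under a non-login shell) could fall back to the same defaults themselves. The sketch below illustrates that idea under stated assumptions: apply_pg_defaults is a hypothetical helper name, and the fallback values simply repeat the ones from postgres_defaults.sh.

```shell
#!/usr/bin/env bash
# Sketch: use the runbook defaults only when the caller has not set its own.
# apply_pg_defaults is a hypothetical helper name.
apply_pg_defaults() {
  # ":=" assigns only when the variable is unset or empty
  : "${PGDATA:=/var/lib/pgsql/11/data}"
  : "${PGDATABASE:=ernie}"
  export PGDATA PGDATABASE
}

apply_pg_defaults
echo "PGDATABASE=${PGDATABASE}"
```

A caller can still override a default for a single command, e.g. PGDATABASE=scratch ./my_script.sh, where both the database and script names are placeholders.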
Allocate hard drive space and create tablespaces. See Postgres Server Performance Tuning. Make sure that the Postgres service user set up above (postgres) can read and write to the parent and the actual tablespace directories:
- {module}_tbs per each module
- p2_studies_tbs, theta_plus_tbs, sb_plus_tbs, tri_citations_tbs for large case study tables
- index_tbs for all indexes
- temp_tbs for the Postgres temp_tablespaces setting and for all staging tables
- user_tbs for the Postgres default_tablespace setting and for non-public (user) objects
- ernie1_museum_tbs for the data moved from ERNIE1
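The tablespace list above can be turned into SQL with a short generator. This is a sketch under assumptions: tablespace_sql is a hypothetical helper name, and /erniedev_data1/pgsql is an assumed base directory chosen for illustration, not a location the runbook specifies.

```shell
#!/usr/bin/env bash
# Sketch: print CREATE TABLESPACE statements, one directory per tablespace.
# tablespace_sql is a hypothetical helper name.
tablespace_sql() {
  local name="$1" base_dir="$2"
  printf "CREATE TABLESPACE %s LOCATION '%s/%s';\n" "$name" "$base_dir" "$name"
}

# /erniedev_data1/pgsql is an assumed base directory
for tbs in index_tbs temp_tbs user_tbs ernie1_museum_tbs; do
  tablespace_sql "$tbs" /erniedev_data1/pgsql
done

# On the server, each directory must exist, be empty, and belong to postgres
# before the statement is executed:
#   sudo mkdir -p /erniedev_data1/pgsql/index_tbs
#   sudo chown postgres:postgres /erniedev_data1/pgsql/index_tbs
#   tablespace_sql index_tbs /erniedev_data1/pgsql | sudo -u postgres psql
```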
- Install Java 11
- Install Neo4j per the Neo4j recipes
- Add the neo4j user to the core group: sudo usermod -a -G erniecore neo4j
- Install Anaconda3 distribution
- Navigate to the link for the most recent version, then download it on the server, e.g.
wget https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-x86_64.sh
chmod ug+x Anaconda3*.sh
sudo ./Anaconda3*.sh
- Accept the license agreement
- Enter the following location: /anaconda3
- Accept the defaults in other prompts, answer no to installing Visual Studio Code, and finish the installation
- Set up environment:
sudo alternatives --install /usr/local/bin/python python /usr/bin/python2.7 1
sudo alternatives --install /usr/local/bin/python python /anaconda3/bin/python 2
- Install modules:
sudo /anaconda3/bin/pip install psycopg2
sudo /anaconda3/bin/pip install pandas
sudo /anaconda3/bin/pip install tzlocal
sudo /anaconda3/bin/pip install lxml
sudo /anaconda3/bin/pip install inflect
sudo /anaconda3/bin/pip install graphene_sqlalchemy
sudo /anaconda3/bin/pip install Flask-GraphQL
- Grant permissions to all users:
sudo chmod o+rx -R /anaconda3
- TBD [] Figure out what permissions are exactly needed for Anaconda and installed packages to be executable by all users.
- sudo yum install -y centos-release-scl: Software Collections (SCL) is a community project that allows you to build, install, and use multiple versions of software on the same system, without affecting system default packages.
- sudo yum install -y devtoolset-9
- Activate the Developer Toolset 9 environment with:
scl enable devtoolset-9 bash
- [] TBD this might be optional. Install MPICH. Download sources, unarchive and:
mkdir build && cd build
../configure
make
# Using the default installation directory: /usr/local/bin
sudo make install
- Install C++ 14+ and activate the Developer Toolset 9 environment
- Download latest sources and unarchive.
cmake .
make
sudo cp -v bin/* /usr/local/bin/
- Install Jenkins per the Jenkins recipes
  - Move Jenkins user to the main pardicore group: sudo usermod -g pardicore jenkins
  - Use /erniedev_data1/jenkins_home as JENKINS_HOME
- Configure Jenkins per the Jenkins recipes
  - Configure Global Security > Enable security, Security Realm = Jenkins’ own user database
  - Naming Strategy
    - Pattern = (CG|CT|Derwent|FDA|WoS|CaseStudy|ERNIE)+(-[A-Za-z0-9]+)+
    - Description: A job name must conform to the following convention: "{module: CG|CT|Derwent|FDA|WoS|CaseStudy|ERNIE}[-{branch}]-{do something}[-{option}][-{option}]". Examples: "CG-update", "CG-mybranch-download-data-GW1". Each word component consists of alphanumerics only. This name pattern can be changed in Configure System.
    - force existing = on
- Create an integration in NETE Slack for Jenkins to post to #ernie-notifications
- Create a Postgres user:
psql -c "CREATE USER jenkins SUPERUSER;"
  - For Jenkins jobs to connect locally via Unix sockets
  - [] TODO Transition to pg_read_server_files, pg_write_server_files roles
- Create Jenkins jobs
  - Use Slack integration tokens from the integration created in Slack
- Install and configure Docker Distribution per the Code Review Systems recipes using UPSOURCE_HOME=/erniedev_data1/upsource
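Candidate job names can be checked against the Naming Strategy pattern before creating jobs. The sketch below uses grep -E and adds ^ and $ anchors on the assumption that Jenkins matches the pattern against the whole job name; is_valid_job_name is a hypothetical helper name, and the last two sample names are made up to show rejections.

```shell
#!/usr/bin/env bash
# Sketch: test job names against the runbook's naming pattern.
# The ^...$ anchors are an assumption about whole-name matching.
JOB_NAME_PATTERN='^(CG|CT|Derwent|FDA|WoS|CaseStudy|ERNIE)+(-[A-Za-z0-9]+)+$'

# is_valid_job_name is a hypothetical helper name
is_valid_job_name() {
  printf '%s\n' "$1" | grep -Eq "$JOB_NAME_PATTERN"
}

# The first two names come from the runbook; the last two are made-up negatives
for name in CG-update CG-mybranch-download-data-GW1 update-CG CG_update; do
  if is_valid_job_name "$name"; then
    echo "${name}: ok"
  else
    echo "${name}: rejected"
  fi
done
```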
For the #ernie-notifications channel:
- Create a Jenkins integration
- Create an email integration for monit
- Add GitHub Slack app. In the channel:
/github subscribe NETESOLUTIONS/ERNIE
- JIRA integration: JIRA Administration > Projects > ERNIE > Slack integration >
- Add Team > NETEtysons
- Configure > Channel = #ernie-notifications
- Configure > Trigger events = Issue Created, Issue Updated, Issue Assigned, Issue Resolved, Issue Closed, Issue Commented, Issue Reopened, Issue Deleted, Issue Moved, TO DO, In Progress, Done
- Save
- Install:
sudo yum install -y monit
sudo systemctl start monit
- After you install and start a service, set up monitoring for all services running on a particular server, e.g.:
  - Upload and copy all monit configuration files from the Config directory:
sudo cp -v ~/Workspaces/ERNIE/Config/**/etc/monit.d/*.conf /etc/monit.d
  - For Postgres:
SQL> CREATE USER root WITH PASSWORD :'password'; CREATE DATABASE root OWNER root;
sudo monit reload
- To check monitored status, use sudo monit summary. To check service details, use sudo monit status.
The following steps are optional, depending on whether you want public access to the Spark cluster.
- Log in to Azure under the NETE Azure Pay-As-You-Go subscription
- Go to ERNIE-vnet > Subnets
- Add a subnet to the Virtual Network with enough address space to contain the nodes of the Spark cluster you wish to create
- Specify the NSG you want to attach to the Subnet if it already exists. If it does not, create a new NSG to attach to the subnet, being aware of what you should leave open (access from the same private range, access from Azure services, etc.) while trying to close out the outside world. This link provides further detail on the Inbound/Outbound rules that should remain in place when configuring an NSG for use with an HDInsight Cluster: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-extend-hadoop-virtual-network#hdinsight-ip-1