Setup Runbook - NETESOLUTIONS/ERNIE GitHub Wiki

Azure

  • Create the ERNIE Azure Resource Group
  • Create the ERNIE-LRS Recovery Service Vault
    • Properties > Backup Configuration > Update > Storage replication type = Locally-redundant
    • Create the backup policy
  • Create the ernie1-nsg Azure Network Security Group
  • Open port 22 only on the Azure firewall in the Network Security Group settings
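
The portal steps above could also be scripted with the Azure CLI. The following is a minimal sketch, not a tested procedure for this subscription: the resource group name ERNIE, the rule name AllowSSH and the priority 1000 are assumptions, and the location eastus2 is taken from the VM region used later in this runbook. The backup policy is left out because it needs a policy definition that this page does not provide. The function only prints the commands so that they can be reviewed first.

```shell
# Print the Azure CLI commands that mirror the portal steps above.
# Assumptions (not from the runbook): resource group "ERNIE",
# rule name "AllowSSH", rule priority 1000.
print_azure_setup_commands() {
  rg="${1:-ERNIE}"
  location="${2:-eastus2}"
  cat <<EOF
az group create --name $rg --location $location
az backup vault create --resource-group $rg --name ERNIE-LRS --location $location
az backup vault backup-properties set --resource-group $rg --name ERNIE-LRS --backup-storage-redundancy LocallyRedundant
az network nsg create --resource-group $rg --name ernie1-nsg
az network nsg rule create --resource-group $rg --nsg-name ernie1-nsg --name AllowSSH --priority 1000 --direction Inbound --access Allow --protocol Tcp --destination-port-ranges 22
EOF
}

print_azure_setup_commands
```

After review, the printed commands could be run one by one from a shell where az login has already succeeded.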

Security Policy Assignment

Security Center > Security policy > {subscription} > View effective policy > {policy assignment} > Parameters >

  • This requires subscription owner privileges
  1. Disk encryption should be applied on virtual machines = Disabled

Linux VM

Create Azure VM

  1. Log in to Azure under the NETE Azure Pay-As-You-Go subscription
  2. Add a VM:
    • Name = ernie-{purpose}
  3. Basic
    • Region = East US 2
    • Image = CIS CentOS Linux 7.5
    • Select an appropriate server size
  4. Disks
    • Add an appropriate number of premium storage disks
  5. Networking
    • Virtual network = ERNIE-vnet
    • Public IP = new
    • NSG = Advanced > ernie1-nsg
    • Accelerated networking = off
  6. Management
    • OS guest diagnostics = on
  7. Tags
    • Add vm = {VM name}
  8. Create
  9. Configure public DNS

Set up system

Customize

## Update OMI to 1.4.2-3+ ##
sudo rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm
sudo yum update -y omi
sudo rm -rf /home/omi /var/spool/mail/omi

## Azure CLI ##
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
{ cat <<'HEREDOC'
[azure-cli]
name=Azure CLI
baseurl=https://packages.microsoft.com/yumrepos/azure-cli
enabled=1
gpgcheck=1
gpgkey=https://packages.microsoft.com/keys/microsoft.asc
HEREDOC
} | sudo tee /etc/yum.repos.d/azure-cli.repo
sudo chmod a+r /etc/yum.repos.d/*
yum check-update
sudo yum install -y azure-cli

# Add the EPEL repo
sudo yum install -y epel-release

## Add the elrepo ##
# Latest Linux kernel updates

sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

## Add the Open Fusion repo ##
# Get GNU Parallel updates beyond v20160222, which is a very old version 

sudo rpm --import http://repo.openfusion.net/RPM-GPG-KEY-openfusion

{ cat <<'HEREDOC'
[OpenFusion]
name=Open Fusion
baseurl=http://repo.openfusion.net/centos7-x86_64
enabled=1
gpgcheck=1
HEREDOC
} | sudo tee /etc/yum.repos.d/OpenFusion.repo


# Add a Midnight Commander CentOS 7 binary repo
sudo wget http://download.opensuse.org/repositories/home:/laurentwandrebeck:/mc/CentOS_7/home:laurentwandrebeck:mc.repo -O /etc/yum.repos.d/home_laurentwandrebeck_mc.repo

Set up log rotation

Upload the logrotate configuration from the repo to e.g. ~/Workspaces/ERNIE/Config/storage/etc, then:

sudo cp -Rv ~/Workspaces/ERNIE/Config/storage/etc /

# Fix SELinux context user for some log files
sudo chcon -u system_u /var/log/*

Script Tools

  • GNU Parallel: sudo yum install -y parallel
  • PCRE grep: sudo yum install -y pcre-tools
  • jq: sudo yum install -y jq
  • lftp, an FTP client: sudo yum install -y lftp
  • 7-Zip: sudo yum install -y p7zip

Interactive Tools

  • Midnight Commander, an Orthodox File Manager (OFM): sudo yum install -y mc
  • nano: sudo yum install -y nano
    • Some people use neither emacs nor vi
  • glances, advanced top-like server resource stats: sudo yum install -y glances
  • qrencode, QR code generation, e.g. for Google Authenticator: sudo yum install -y qrencode

Linux SDKs

E.g. packages required for compiling monit sources:

  • libtool: sudo yum install -y libtool
  • PAM development support: sudo yum install -y pam-devel
  • SSL header files: sudo yum install -y openssl-devel

Configure authentication, groups and users

  1. Increase sudo timeout: sudo sed --in-place --regexp-extended 's/(Defaults.*env_reset).*/\1,timestamp_timeout=60/' /etc/sudoers
  2. Create the core team group: sudo groupadd erniecore
  3. [] TBD. Create end user group: sudo groupadd ernieusers
  4. Create core team Linux users, then add ernie_admin and the core team users to erniecore (as the primary group) and to wheel.
  5. Configure PAM
    • sudo yum install -y google-authenticator
    • Upload and copy PAM config files to /etc/pam.d/
  6. Configure system banner: upload and copy issue.net file to /etc/
  7. Configure SSH: upload and copy sshd_config file to /etc/ssh/
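
Because a malformed /etc/sudoers can lock everyone out of sudo, it may be worth previewing the substitution from step 1 before applying it in place. The sketch below applies the same expression to sample text on standard input; set_sudo_timeout is a hypothetical helper name, and sed -E is used as the short form of --regexp-extended.

```shell
# Apply the runbook's substitution to text on stdin (no file is modified).
set_sudo_timeout() {
  sed -E 's/(Defaults.*env_reset).*/\1,timestamp_timeout=60/'
}

# Preview the effect on a typical sudoers line.
printf 'Defaults    env_reset\n' | set_sudo_timeout
```

The expression replaces anything that follows env_reset on the line, so running it a second time leaves the line unchanged. After editing the real file, sudo visudo -c checks the syntax.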

(Neo4j Server) SSH for System Accounts

Add storage

For each additional disk:

  1. Create a new VM drive in Azure Portal.
  2. On the machine: partition, format and mount the drive.
  3. Add the disk UUID to /etc/fstab. For example, add the following line:
UUID=43204a4e-48b4-4c44-8db2-bc411fe10da4 /data1 xfs defaults,nofail 1 2
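
The fstab entry in step 3 can be generated rather than typed by hand, which avoids copy errors in the UUID. This is a minimal sketch: fstab_entry is a hypothetical helper, and the device name in the comment is an assumption about how the new disk shows up.

```shell
# Build an /etc/fstab entry in the same form as the example above.
# Arguments: filesystem UUID, mount point, filesystem type (default: xfs).
fstab_entry() {
  printf 'UUID=%s %s %s defaults,nofail 1 2\n' "$1" "$2" "${3:-xfs}"
}

# On a real server the UUID would come from a command such as:
#   sudo blkid -s UUID -o value /dev/sdc1   (/dev/sdc1 is a hypothetical device)
fstab_entry 43204a4e-48b4-4c44-8db2-bc411fe10da4 /data1
```

The output could be appended with sudo tee -a /etc/fstab and then verified with sudo mount -a before the next reboot.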

Configure swap

Disable Linux Firewall

  • [] TBD PAR-496 Evaluate the need for firewalling
  • Azure can do firewalling via the Azure NSG, so we need neither firewalld nor iptables. The hardening script enables iptables/ip6tables, but doesn't do anything with firewalld.
  • Stop and disable the Linux firewalld service:
sudo systemctl stop firewalld
sudo systemctl disable firewalld
  • If the project decides to enforce tunneling, Azure firewall (Azure dashboard > Network security group) should be used.
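
To confirm that the service is really off after the two commands above, a check along these lines could be used. It is a sketch only; firewalld_is_off is a hypothetical helper name.

```shell
# Report whether firewalld is fully off: neither running nor enabled at boot.
# Returns 0 only when both checks pass.
firewalld_is_off() {
  if systemctl is-active --quiet firewalld; then
    echo "firewalld is still running"
    return 1
  fi
  if systemctl is-enabled --quiet firewalld; then
    echo "firewalld is still enabled at boot"
    return 1
  fi
  echo "firewalld is stopped and disabled"
}
```

Calling firewalld_is_off on the server prints one of the three messages, and its exit status can be used in a provisioning script.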

Set up Backup

Azure Dashboard > Virtual Machines > {server} > Backup >

  1. Recovery Services vault > Select existing = ERNIE-LRS
  2. Choose backup policy = Maximum-9-points
  3. Enable Backup

Azure Monitor

  • Azure Monitor setup as documented did not work: the monitor was enabled, but no data is being recorded.
  • Linux Diagnostic Extension 3.0 setup as documented failed with a Python syntax error.

ClamAV

sudo yum install -y epel-release
sudo yum install -y clamav-server clamav-data clamav-update clamav-filesystem clamav clamav-scanner-systemd clamav-devel clamav-lib clamav-server-systemd
sudo setsebool -P antivirus_can_scan_system 1
sudo setsebool -P clamd_use_jit 1
sudo cp /etc/clamd.d/scan.conf /etc/clamd.d/scan.conf.backup
sudo sed -i -e "s/^Example/#Example/" /etc/clamd.d/scan.conf
sudo sed -i -e "s/#LocalSocket /LocalSocket /" /etc/clamd.d/scan.conf
sudo cp /etc/freshclam.conf /etc/freshclam.conf.backup
sudo sed -i -e "s/^Example/#Example/" /etc/freshclam.conf
sudo freshclam
sudo bash -c "cat >/usr/lib/systemd/system/freshclam.service <<EOF
[Unit]
Description = freshclam scanner
After = network.target
[Service]
Type = forking
ExecStart = /usr/bin/freshclam -d -c 2
Restart = on-failure
PrivateTmp = true
[Install]
WantedBy=multi-user.target
EOF
"
sudo systemctl start freshclam
sudo systemctl enable freshclam
sudo systemctl start clamd@scan
sudo systemctl enable clamd@scan

This sets up:

  1. ClamAV services
  2. Periodic DB updates
    • [] TODO. Check on that. There were root emails with warning messages.
  3. Disabled on-access scan
To scan files periodically, create a Jenkins job that runs sudo clamscan -i -r /home /erniedev_data1. This job would fail with exit code 1 if any infected files are found. Running it as root should avoid access errors, which would trigger exit code 2.
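
The exit codes mentioned above can be spelled out in the job's console log so that a failed build is easier to read. This is a minimal sketch of such a wrapper; describe_clamscan_status is a hypothetical helper, and the commented lines show how it might be combined with the scan in a Jenkins shell step.

```shell
# Translate a clamscan exit code into a message.
# Per the clamscan man page: 0 = no virus found, 1 = virus(es) found,
# 2 = some error(s) occurred.
describe_clamscan_status() {
  case "$1" in
    0) echo "clean: no infected files found" ;;
    1) echo "infected: at least one infected file found" ;;
    *) echo "error: scan did not complete cleanly (exit code $1)" ;;
  esac
}

# Possible use in a Jenkins shell step (commented out here; requires ClamAV):
#   status=0
#   sudo clamscan -i -r /home /erniedev_data1 || status=$?
#   describe_clamscan_status "$status"
#   exit "$status"
describe_clamscan_status 1
```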

For more info, see How to Install ClamAV on CentOS 7.

Postgres

Installation

  1. Install Postgres per the recipes
  2. Configure Postgres per the recipes
  3. Set up user access
  4. Add postgres user to the erniecore group: sudo usermod -a -G erniecore postgres and restart Postgres

Default client parameters

  • /etc/profile.d/postgres_defaults.sh:
export PGDATA=/var/lib/pgsql/11/data
export PGDATABASE=ernie
  • This makes scripts (which mostly connect to Postgres on the same server) less verbose and more portable between systems. It'd also help a lot if connection parameters ever need to change.
  • Users can override these defaults via the command line, particularly to connect under their own accounts, e.g. psql -U dk.
  • For Jenkins-executed local scripts these could be set on the fly in Manage Jenkins > Configure System > Global properties > Environment variables.
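
The override behavior described above can be illustrated without a database. The sketch below only models the lookup order for the database name: an explicit choice first, then PGDATABASE, then the user name. effective_database and the sandbox database are hypothetical names, and using the OS user as the last fallback is a simplification of what Postgres clients do internally.

```shell
# Model how a Postgres client picks the database name:
# an explicit argument wins, then PGDATABASE, then the user name.
effective_database() {
  explicit="${1:-}"
  echo "${explicit:-${PGDATABASE:-$(id -un)}}"
}

PGDATABASE=ernie            # as set in /etc/profile.d/postgres_defaults.sh
effective_database          # falls back to the environment default
effective_database sandbox  # an explicit choice overrides the default
```

With the defaults in place, a script can call psql with no connection arguments, and a user can still run psql -U dk to connect under a personal account.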

Tablespaces

Allocate hard drive space and create the tablespaces listed below. See Postgres Server Performance Tuning. Make sure that the Postgres service user set up above (postgres) can read and write both the parent directories and the tablespace directories themselves:

  1. {module}_tbs per each module
  2. p2_studies_tbs, theta_plus_tbs, sb_plus_tbs, tri_citations_tbs for large case study tables
  3. index_tbs for all indexes
  4. temp_tbs for the Postgres temp_tablespaces setting and for all staging tables
  5. user_tbs for the Postgres default_tablespace and for non-public (user) objects
  6. ernie1_museum_tbs for the data moved from ERNIE1
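
Creating each tablespace follows the same three steps: make the directory, give it to the postgres user, and register it in Postgres. The sketch below prints those steps for review; tablespace_steps is a hypothetical helper, and the parent directory /erniedev_data1/pgsql is an assumed location, not one prescribed by this runbook.

```shell
# Print the shell and SQL steps for one tablespace.
# Arguments: tablespace name, parent directory.
tablespace_steps() {
  name="$1"
  dir="$2/$name"
  cat <<EOF
sudo mkdir -p $dir
sudo chown postgres:postgres $dir
psql -c "CREATE TABLESPACE $name LOCATION '$dir';"
EOF
}

# /erniedev_data1/pgsql is an assumed parent directory.
for tbs in index_tbs temp_tbs user_tbs; do
  tablespace_steps "$tbs" /erniedev_data1/pgsql
done
```

The psql command has to run as a Postgres superuser, and the target directory must be empty when the tablespace is created.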

(Neo4j Server) Neo4j

  • Install Java 11
  • Install Neo4j per the Neo4j recipes
  • Add neo4j user to the core group: sudo usermod -a -G erniecore neo4j

Python 3

  • Install Anaconda3 distribution
    1. Navigate to the link for the most recent version, then download it on the server, e.g. wget https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-x86_64.sh
    2. chmod ug+x Anaconda3*.sh
    3. sudo ./Anaconda3*.sh
    4. Accept the license agreement
    5. Enter the following location: /anaconda3
    6. Accept the defaults for the other prompts, answer no to installing Visual Studio Code, and finish the installation
  • Set up environment:
    1. sudo alternatives --install /usr/local/bin/python python /usr/bin/python2.7 1
    2. sudo alternatives --install /usr/local/bin/python python /anaconda3/bin/python 2
  • Install modules:
    • sudo /anaconda3/bin/pip install psycopg2
    • sudo /anaconda3/bin/pip install pandas
    • sudo /anaconda3/bin/pip install tzlocal
    • sudo /anaconda3/bin/pip install lxml
    • sudo /anaconda3/bin/pip install inflect
    • sudo /anaconda3/bin/pip install graphene_sqlalchemy
    • sudo /anaconda3/bin/pip install Flask-GraphQL
  • Grant permissions to all users: sudo chmod o+rx -R /anaconda3
    1. TBD [] Figure out what permissions are exactly needed for Anaconda and installed packages to be executable by all users.
References:
  1. Anaconda Documentation - Installing on Linux

C++ 14+

  • sudo yum install -y centos-release-scl: Software Collections (SCL) is a community project that lets you build, install, and use multiple versions of software on the same system without affecting the system default packages.
  • sudo yum install -y devtoolset-9
  • Activate the Developer Toolset 9 environment with: scl enable devtoolset-9 bash
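
A quick way to see why the toolset is needed is to compare compiler versions. The sketch below checks a GCC version string against the release that completed C++14 language support (GCC 5); gcc_supports_cxx14 is a hypothetical helper name.

```shell
# Succeed when a GCC version string is new enough for full C++14 support.
# GCC has supported the complete C++14 language since version 5.
gcc_supports_cxx14() {
  major="${1%%.*}"
  [ "$major" -ge 5 ]
}

# CentOS 7 ships GCC 4.8.5; Developer Toolset 9 provides GCC 9.
for v in 4.8.5 9.3.1; do
  if gcc_supports_cxx14 "$v"; then echo "$v: OK"; else echo "$v: too old"; fi
done
```

Inside the activated environment, the check could be run against the live compiler with gcc_supports_cxx14 "$(g++ -dumpversion)".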

HipMCL

  • [] TBD this might be optional. Install MPICH. Download the sources, unarchive them, and run the following in the source directory:
mkdir build
cd build
../configure
make
# Using the default installation directory: /usr/local/bin
sudo make install
  • Install C++ 14+ and activate the Developer Toolset 9 environment
  • Download latest sources and unarchive.
  • cmake .
  • make
  • sudo cp -v bin/* /usr/local/bin/

Jenkins

  • Install Jenkins per the Jenkins recipes
    • Move the Jenkins user to the main erniecore group: sudo usermod -g erniecore jenkins
    • Use /erniedev_data1/jenkins_home as JENKINS_HOME
  • Configure Jenkins per the Jenkins recipes
    • Configure Global Security > Enable security, Security Realm = Jenkins’ own user database
    • Naming Strategy
      • Pattern = (CG|CT|Derwent|FDA|WoS|CaseStudy|ERNIE)+(-[A-Za-z0-9]+)+
      • Description:
        A job name must conform to the following convention: "{module: CG|CT|Derwent|FDA|WoS|CaseStudy|ERNIE}[-{branch}]-{do something}[-{option}][-{option}]". Examples: "CG-update", "CG-mybranch-download-data-GW1". Each word component consists of alphanumerics only. This name pattern can be changed in Configure System.
      • force existing = on
  • Create an integration in NETE Slack for Jenkins to post to #ernie-notifications
  • Create a Postgres user: psql -c "CREATE USER jenkins SUPERUSER;"
  • For Jenkins jobs to connect locally via Unix sockets
    • [] TODO Transition to pg_read_server_files, pg_write_server_files roles
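
Job names can be checked against the Naming Strategy pattern above before a job is created. This is a minimal sketch that assumes Jenkins requires the whole name to match the pattern; valid_job_name is a hypothetical helper, and the last two names in the loop are made-up counterexamples.

```shell
# Check a job name against the Naming Strategy pattern from this section.
# grep -x requires the whole name to match the pattern.
valid_job_name() {
  printf '%s\n' "$1" |
    grep -Eqx '(CG|CT|Derwent|FDA|WoS|CaseStudy|ERNIE)+(-[A-Za-z0-9]+)+'
}

for name in CG-update CG-mybranch-download-data-GW1 update CG_update; do
  if valid_job_name "$name"; then echo "$name: valid"; else echo "$name: invalid"; fi
done
```

The first two names are the examples from the description and pass; the other two fail because one lacks a module prefix and the other uses an underscore instead of a hyphen.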

Jenkins jobs

  • Create Jenkins jobs
  • Use Slack integration tokens from the integration created in Slack

Upsource

Slack Integrations

For the #ernie-notifications channel:

  1. Create a Jenkins integration
  2. Create an email integration for monit
  3. Add GitHub Slack app. In the channel: /github subscribe NETESOLUTIONS/ERNIE
  4. JIRA integration: JIRA Administration > Projects > ERNIE > Slack integration >
    • Add Team > NETEtysons
    • Configure > Channel = #ernie-notifications
    • Configure > Trigger events: = Issue Created,Issue Updated,Issue Assigned,Issue Resolved,Issue Closed,Issue Commented,Issue Reopened,Issue Deleted,Issue Moved,TO DO,In Progress,Done
    • Save

monit

  • Install: sudo yum install -y monit
  • sudo systemctl start monit
  • After you install and start a service, set up monitoring for all services running on that server, e.g.:
    1. Upload and copy all monit configuration files from the Config directory: sudo cp -v ~/Workspaces/ERNIE/Config/**/etc/monit.d/*.conf /etc/monit.d
    2. For Postgres: SQL> CREATE USER root WITH PASSWORD :'password'; CREATE DATABASE root OWNER root;
    3. sudo monit reload
  • To check monitored status, use sudo monit summary. To check service details, use sudo monit status.

HDInsight Cluster VM - Optional restricted access

The following steps are optional, depending on whether you want public access to the Spark cluster.

Create Subnet

  1. Log in to Azure under the NETE Azure Pay-As-You-Go subscription
  2. Go to ERNIE-vnet > Subnets
  3. Add a subnet to the Virtual Network with enough address space to contain the nodes of the Spark cluster you wish to create
  4. Specify the NSG to attach to the subnet if it already exists. If it does not, create a new NSG and attach it, keeping open what must stay open (access from the same private range, access from Azure services, etc.) while closing off the outside world. For details on the inbound/outbound rules that must remain in place when configuring an NSG for an HDInsight cluster, see: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-extend-hadoop-virtual-network#hdinsight-ip-1
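
When sizing the subnet in step 3, note that Azure reserves five addresses in every subnet, so the usable count is smaller than the raw block size. The sketch below does that arithmetic for a given prefix length; azure_usable_addresses is a hypothetical helper name.

```shell
# Number of addresses available to hosts in an Azure subnet.
# Argument: prefix length (e.g. 24 for a /24).
# Azure reserves five addresses in every subnet.
azure_usable_addresses() {
  prefix="$1"
  echo $(( (1 << (32 - prefix)) - 5 ))
}

for prefix in 24 26 28; do
  echo "/$prefix: $(azure_usable_addresses "$prefix") usable addresses"
done
```

For example, a /28 leaves 11 usable addresses, which may be too few once head, worker and other cluster nodes are counted.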