BigCouch Setup - dmwm/WMCore GitHub Wiki

Building BigCouch

To build the BigCouch RPM we need the following set of spec/patch files:

  • bigcouch-add-cmsauth-to-chttpd.patch
  • bigcouch_cms_auth.erl.file
  • bigcouch-fix-sconscript.patch
  • bigcouch-megapatch.patch
  • bigcouch.spec
  • bigcouch-ssl-replication.patch

All of them originated from the CouchDB port and were later modified by A. Melo and myself to adapt them to BigCouch release 0.4.2b.

I put all of these files into the AFS area /afs/cern.ch/user/v/valya/public/bigcouch/cmsdist/. I'll request their inclusion in the cmsdist area via a pull request later.

Please note that the bigcouch.spec file requires further clean-up and improvement, as its setup procedure is currently quite messy (e.g. a local git area, tweaks to the rebar and SCons configuration files, etc.). Unfortunately the BigCouch team uses those tools, and they can be configured only to a limited extent via external environment settings. As a first draft, however, the spec file is suitable for building the BigCouch RPM.

For reference, BigCouch depends on the following packages:

  • curl spidermonkey openssl icu4c erlang couchapp python

The final RPM contains a full Erlang/OTP release, which has its own configuration files in place. Unfortunately they cannot be relocated, and the use of environment variables is limited in Erlang. Therefore the CMS-specific configuration will be provided separately and will replace the Erlang/OTP files during the deployment step.

Installation

Please find all configuration files, including the CMS deploy/manage scripts, in the /afs/cern.ch/user/v/valya/public/bigcouch/cfg/ area. Here is the complete list:

  • default.ini
  • deploy
  • local.ini
  • manage
  • monitoring.ini
  • ports.config
  • vm.args

The default.ini file contains all settings for BigCouch. The local.ini file contains CMS-specific settings with path templates, which are supposed to be overwritten during the deployment step. The ports.config file contains settings for the Erlang kernel module, including the range of open ports used by Erlang processes between participating cluster nodes. The vm.args file contains the set of parameters for the Erlang VM.

Please note: the Erlang code (the core VM part) uses strict pattern matching and reports no errors if a parameter name does not match. This makes debugging very tricky. Therefore, before changing anything in the .ini/.config/*.args files, please consult the Erlang documentation or local experts. For example, the configuration file name must end with the .config extension. At start-up the program expects certain files in specific places (either in the local area or within the Erlang release area), so relocating them will leave your program non-functional without any hint of what went wrong.

During the deployment step, which installs the BigCouch release, I removed the Erlang configuration files from the install area and made soft links to the CMS ones, which were deployed into the /data/current/config/bigcouch/ area on the VM. This step is required for two reasons:

  • during the build step we don't want to distribute CMS-specific configuration files
  • upon installation Erlang expects all of those files in its release install area

I'll let the HTTP group review this procedure; see deploy_bigcouch_sw in the deploy script.
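The link step described above can be sketched roughly as follows. This is only an illustration, not the content of deploy_bigcouch_sw: the helper name link_cms_configs is hypothetical, and the location of the configuration directory inside the Erlang release is passed in as an argument because it is an assumption here.

```shell
#!/bin/sh
# Hypothetical sketch of the link step; the real logic lives in
# deploy_bigcouch_sw of the deploy script.
#   $1: directory holding the CMS configuration files
#   $2: configuration directory inside the Erlang release (assumed layout)
#   remaining arguments: names of the files to link
link_cms_configs() {
    cms_cfg_dir=$1
    release_cfg_dir=$2
    shift 2
    for name in "$@"; do
        # Fail early: Erlang gives no hint when an expected file is missing.
        if [ ! -f "$cms_cfg_dir/$name" ]; then
            echo "error: missing $cms_cfg_dir/$name" >&2
            return 1
        fi
        # Replace the file shipped with the release by a soft link.
        rm -f "$release_cfg_dir/$name"
        ln -s "$cms_cfg_dir/$name" "$release_cfg_dir/$name"
    done
}
```

It would be called with the CMS configuration area, the release configuration directory and the list of file names, e.g. vm.args and local.ini.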

I chose the 9100-9105 port range for Erlang processes, as suggested in the online documentation. The exact number of ports to open is an open question, and I think it should be tuned based on application utilization. These ports must be open for the hosts which participate in the Erlang cluster (see the iptables rules below).
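For illustration, a port range like this is usually handed to the Erlang kernel application through its inet_dist_listen_min and inet_dist_listen_max parameters. The sketch below writes such a file; it assumes ports.config uses exactly these two parameters (the actual file in the cfg area may contain more), and write_ports_config is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: write an Erlang kernel config restricting the
# distribution port range. Assumes the standard kernel parameters
# inet_dist_listen_min / inet_dist_listen_max; the real ports.config
# in the cfg area may contain more settings.
write_ports_config() {
    min_port=$1
    max_port=$2
    out_file=$3
    # As noted earlier, the file name must end with the .config extension.
    case "$out_file" in
        *.config) ;;
        *) echo "error: $out_file must end with .config" >&2; return 1 ;;
    esac
    cat > "$out_file" <<EOF
[{kernel, [
    {inet_dist_listen_min, $min_port},
    {inet_dist_listen_max, $max_port}
]}].
EOF
}
```

Calling write_ports_config 9100 9105 ports.config would produce a file matching the range chosen above.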

Finally, the manage script follows the standard cmsweb template and contains options to start/stop the BigCouch service. It still requires some additional tweaking, though. For instance, I have not yet implemented sysboot. Also, the bigcouch application reads the location of its log file from the Erlang configuration file. Therefore log rotation must be handled separately (not as in many cmsweb apps, which pipe their stdout into the rotatelogs app). I did, however, make the location of the bigcouch log file configurable via local.ini, and its destination is set by the deploy script.

Deployment

To deploy BigCouch onto a CERN VM I followed the standard cmsweb deployment procedure outlined in https://cms-http-group.web.cern.ch/cms-http-group/tutorials/index.html

Here is a summary of all the steps I performed on my VM (the user account settings will of course be taken care of once the HTTP group includes the bigcouch app in the full list of supported apps on the VMs):

sudo /usr/sbin/groupadd _bigcouch

sudo /usr/sbin/useradd -M -g _bigcouch -s /bin/nologin -c "CMSWEB BigCouch App" -d /data/empty _bigcouch

sudo chgrp _bigcouch /data/logs/bigcouch

sudo chgrp _bigcouch /data/state/bigcouch

mkdir -p /data/state/bigcouch/{database,replication,stagingarea}

sudo chgrp _bigcouch /data/state/bigcouch/{database,replication,stagingarea}

mkdir /data/cfg/bigcouch/

cp /afs/cern.ch/user/v/valya/public/bigcouch/cfg/* /data/cfg/bigcouch/

Then I followed the standard cmsweb deployment procedure.

Please note that I created three areas for BigCouch in /data/state, but I don't use the stagingarea directory. The first two are used by BigCouch itself.

Once the bigcouch app is installed, I can start it in the usual way, like any other cmsweb application, via the manage script.

Cluster node configuration

This step is required when we need to set up a BigCouch cluster. Several ports must be open across the participating nodes:

  • 4369 epmd port (Erlang Port Mapper Daemon)
  • 9100-9105 port range to be used by Erlang nodes (TCP traffic only)

As such I applied the following set of iptables rules:

sudo /sbin/iptables -I INPUT -s $host -p tcp --dport 4369 -j ACCEPT

sudo iptables -I INPUT -s $host -m state --state NEW -m tcp -p tcp --dport 9100:9105 -j ACCEPT

sudo /etc/init.d/iptables status

sudo /etc/init.d/iptables save

Here $host is the hostname of a node which will participate in the cluster.
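With several peers, the same pair of rules is needed for each of them. A minimal sketch that only prints the commands for review instead of running them; print_cluster_rules is a hypothetical helper and the host names are placeholders.

```shell
#!/bin/sh
# Print (rather than execute) the iptables commands for each peer node,
# so that they can be reviewed before being run with sudo.
print_cluster_rules() {
    for host in "$@"; do
        # epmd port
        echo "/sbin/iptables -I INPUT -s $host -p tcp --dport 4369 -j ACCEPT"
        # port range used by the Erlang nodes
        echo "/sbin/iptables -I INPUT -s $host -m state --state NEW -m tcp -p tcp --dport 9100:9105 -j ACCEPT"
    done
}

# Example with placeholder host names:
print_cluster_rules node1.example.org node2.example.org
```

Printing first keeps the sketch free of side effects; the reviewed lines can then be executed with sudo and saved as shown above.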

BigCouch database management

BigCouch uses the following set of ports:

  • 15984 for HTTP
  • 15986 for admin usage
  • 16984 for HTTPS
  • 16986 for HTTPS admin usage

Before inserting documents you can check cluster membership:

curl http://127.0.0.1:15984/_membership

Then you may add a new node, e.g. das-dbs3.cern.ch, to the cluster, where <name> stands for the Erlang node name of that node:

curl -X PUT http://127.0.0.1:15986/nodes/<name>@das-dbs3.cern.ch -d {}

Here is an example of how to create a new DB comprised of 32 partitions, where each document is stored 3 times:

curl -X PUT 'http://127.0.0.1:15984/test_db?n=3&q=32'
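As a quick sanity check on these parameters: with q partitions and n copies of each, the cluster has to host q × n shard copies in total. The sketch below builds the creation URL and computes that number; create_db_url and shard_copies are hypothetical helpers, and nothing is sent to the server.

```shell
#!/bin/sh
# Build the URL for creating a database with n copies and q partitions.
create_db_url() {
    base_url=$1
    db=$2
    n=$3
    q=$4
    echo "$base_url/$db?n=$n&q=$q"
}

# Total number of shard copies the cluster has to host: n * q.
shard_copies() {
    echo $(( $1 * $2 ))
}

url=$(create_db_url http://127.0.0.1:15984 test_db 3 32)
echo "curl -X PUT '$url'"
echo "shard copies: $(shard_copies 3 32)"
```

For the example above this gives 96 shard copies spread over the cluster nodes.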

Insert one doc

curl -X PUT http://127.0.0.1:15984/test_db/doc_1 -H content-type:application/json -d '{"a":1,"b":2}'

Retrieve one doc

curl http://127.0.0.1:15984/test_db/doc_1

CouchDB to BigCouch replication

We can replicate one DB into another as follows:

  • create new wmstats DB on BigCouch

curl -X PUT http://127.0.0.1:15984/wmstats

  • start replication from CouchDB into BigCouch

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicator -d '{"target":"http://127.0.0.1:15984/wmstats","source":"wmstats"}'

Here we replicate the content of the wmstats CouchDB database running on 127.0.0.1:5984 into the BigCouch wmstats database running on 127.0.0.1:15984.
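When more than one database has to be moved, the replication document can be generated rather than typed by hand. A minimal sketch, assuming the database keeps the same name on both sides and that names contain no characters needing JSON escaping; replication_doc is a hypothetical helper and only prints the document.

```shell
#!/bin/sh
# Print a replication document for one database.
#   $1: database name (used as local source and as target name)
#   $2: base URL of the target BigCouch instance
# Assumes the name needs no JSON escaping.
replication_doc() {
    printf '{"target":"%s/%s","source":"%s"}\n' "$2" "$1" "$1"
}

# Example for the wmstats database used above:
replication_doc wmstats http://127.0.0.1:15984
```

The printed document can be passed to curl with -d when posting to the _replicator database of the source CouchDB, as in the command above.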

References: