Hadoop setup - clhedrick/kerberos GitHub Wiki


starting

  • Do "virsh list" on ilab1, 2, 3, to see what's already running. Make sure you don't start the same VM on two different hosts.
  • Use "virsh start NAME" to start the vm's that aren't running.
Here's current distribution:
  • ilab1: data1, dataservices2
  • ilab2: data2, jupyter
  • ilab3: data3, dataservices1, dataservices3, dataservices4
The reason for this is that dataservices2 and jupyter are both big, and dataservices1, 3, and 4 are all small.
  • browser to https://data-services1.cs.rutgers.edu
  • login as admin, with ambari UI password stored in 1password
  • In the left margin you'll see a list of services. At the bottom there is an action menu. Do "start all"
  • Similarly you can stop with "stop all."
  • Starting takes a very long time, like 900 sec.
  • if you need to stop the VMs, use "virsh shutdown NAME"
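Putting the checks above together, a helper along these lines can be run from anywhere with ssh access to the VM hosts (a sketch; the host and VM names are the ones listed above):

#!/bin/bash
# Show what's already running on each VM host before starting anything,
# so the same VM is never started on two different hosts.
for h in ilab1 ilab2 ilab3; do
    echo "== $h =="
    ssh "$h" virsh list
done
# then start whatever is missing on the appropriate host, e.g. on ilab1:
#   virsh start data1
#   virsh start dataservices2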

restoration from backup

Restore NetApp snapshots for all 6 images. E.g.

  • on ilab1, cd /var/lib/hadoop-shared
  • mv images images.hold
  • mkdir images
  • cp -a .snapshot/hourly.2018-06-06_0405/images/* images/ - obviously your date will be different
  • start all 6 images
We keep two copies of the namenode data, one local and one on NFS. When you restore, the local copy will be out of date compared to the actual HDFS data. Thus you need to replace the local copy with the one on the NFS server.
  • ssh to data-services2
  • cd /hadoop/
  • rm -rf hadoop/hdfs
  • cp -a hdfs hadoop/
Make very sure you delete the old, local copy, and not the one on NFS. It will be nearly impossible to recover from deleting the copy on the NFS server.
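Spelled out, the dangerous step looks like this (my reading of the steps above; the comments mark which copy is which):

# on data-services2
cd /hadoop
# hadoop/hdfs is the LOCAL copy -- stale after a restore, safe to delete
rm -rf hadoop/hdfs
# hdfs (directly under /hadoop) is the NFS copy -- never delete this one
cp -a hdfs hadoop/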

installation

This is a log of what I did. However, if you're going to install a new version, you should use the Ambari documentation at hortonworks.com. It has step-by-step instructions. What I did is based on them. There are a few matters of interpretation, so this will show you how I interpreted them.

Do the following in every VM that is going to be used; items marked with + are done by ansible hadoop.yml.

  • NOTE: For HDP 3 Kerberos to work, the ambari node must be part of the cluster. It needs all the users. Of course no services are installed on it other than the mandatory ones.
  • make sure java is 1.8
  • make sure python is 2.7.x
  • +in bash, ulimit -Sn and ulimit -Hn should both be at least 10,000. I created /etc/security/limits.d/30-nofile.conf, setting nofile to 10000 for everyone. It's not clear to me whether this is needed: ambari creates limits files for each of its users with their own specifications. Where the documentation talks about this, it may only mean to check that limits can be set to 10,000, not that we should actually set them ourselves, since ambari will do that for its own users. But for the moment I've set it to 10,000 for everyone.
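For reference, a limits.d file matching that description might look like this (a sketch; the exact file isn't reproduced in this page):

# /etc/security/limits.d/30-nofile.conf
# raise the open-file limit to 10000 for everyone;
# ambari also writes its own per-user limits files
*       soft    nofile  10000
*       hard    nofile  10000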
  • +allow root ssh from the main ambari server to each of the vms
  • make sure nokrb5conf is set in the ansible hosts file. Kerberizing will replace krb5.conf, and we need to use theirs. (Actually we use a mix, starting with theirs but adding the services.)
  • +install the JCE, get http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
  • +install the JCE: unzip -o -j -q jce_policy-8.zip -d /usr/jdk64/jdk1.8.0_60/jre/lib/security/, using the appropriate location. Finding the location can be a challenge because of the levels of indirection. In my case it's /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/lib/security/
  • +Create the users and the hadoop group. Some of the users have gotten into LDAP; the install scripts try to make changes to /etc/passwd, so LDAP users confuse them.
  • Make sure the places exist for the data. Normally stuff goes on /hadoop. It should be mounted in a unique place
  • In the following data-services1 and data-services3 will change for different installations. Make sure to use the right names.
  • install databases on the ambari node (data-services1) and the utility node (data-services3): the ambari database, hive database, and oozie database
  • We will use the default mysql port. Users and database names are ambari-db, hive-db, and oozie-db; the password will be the cluster service password. However, the actual database names drop the -, as - isn't legal in them.
  • db root password is the same.
  • Ambari currently supports MySQL 5.6 and MariaDB 10 (10.2 with HDP 3). Yum has mariadb 5.5, so I downloaded the latest 10.0, the version for use with systemd.
  • For HDP 3 I installed the mariadb repo per instructions at mariadb.org and installed from there. There's already a service available called mariadb. To get yum install to work, I had to erase mariadb-bench-5.5.60-1.el7_5.x86_64. Skip the following:
    • untar the distribution in /usr/local/, symlink the versioned directory to mysql (This is the default location.)
    • mysql user and group already exist
    • following instructions in INSTALL-BINARY
    • start with bin/mysqld_safe -u mysql
    • ln -s /var/lib/mysql/mysql.sock /tmp
  • bin/mysql_secure_installation --basedir=/usr/local/mysql, default answers, set password using hadoop service password in 1password
  • mysql -u root should now require a password. With "use mysql; select * from user;" verify that there are only entries for root and that they have passwords
    • create /etc/systemd/system/mariadb.service.d/start.conf
[Service]

ExecStart=
ExecStart=/usr/local/mysql/bin/mysqld_safe -u mysql
ExecStartPre=
This overrides parameters in the system mariadb.service, which we assume exists
    • kill the mysql process
    • verify that systemctl start mariadb will start it properly
  • when you go into production, don't forget "systemctl enable mariadb" on data-services1 and data-services3 so mariadb starts automatically
  • verify that mysql-connector-java.noarch is installed. Should be in /usr/share/java
  • setup ambari database on data-services1
mysql -u root -p

CREATE USER 'ambari-db'@'%' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'ambari-db'@'%';
CREATE USER 'ambari-db'@'localhost' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'ambari-db'@'localhost';
CREATE USER 'ambari-db'@'data-services5.cs.rutgers.edu' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'ambari-db'@'data-services5.cs.rutgers.edu';
FLUSH PRIVILEGES;

where xx is replaced with the real password

  • on data-services3
HDP 2:

CREATE USER 'hive-db'@'localhost' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'hive-db'@'localhost';
CREATE USER 'hive-db'@'%' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'hive-db'@'%';
CREATE USER 'hive-db'@'data-services3.cs.rutgers.edu' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'hive-db'@'data-services3.cs.rutgers.edu';

CREATE DATABASE hivedb;

CREATE USER 'oozie-db'@'%' IDENTIFIED BY 'xx';
GRANT ALL PRIVILEGES ON *.* TO 'oozie-db'@'%';

CREATE DATABASE ooziedb;

FLUSH PRIVILEGES;

HDP 3:
-- by default we get root at localhost but not the actual hostname
grant all privileges on *.* to 'root'@'data-services7.cs.rutgers.edu' identified by 'xx';

create database hive;
grant all privileges on hive.* to 'hive'@'localhost' identified by 'xx';
grant all privileges on hive.* to 'hive'@'%.cs.rutgers.edu' identified by 'xx';

create database ranger;
grant all privileges on ranger.* to 'ranger'@'localhost' identified by 'xx';
grant all privileges on ranger.* to 'ranger'@'%.cs.rutgers.edu' identified by 'xx';

create database rangerkms;
grant all privileges on rangerkms.* to 'rangerkms'@'localhost' identified by 'xx';
grant all privileges on rangerkms.* to 'rangerkms'@'%.cs.rutgers.edu' identified by 'xx';

create database oozie;
grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'xx';
grant all privileges on oozie.* to 'oozie'@'%.cs.rutgers.edu' identified by 'xx';

create database superset DEFAULT CHARACTER SET utf8;
grant all privileges on superset.* to 'superset'@'localhost' identified by 'xx';
grant all privileges on superset.* to 'superset'@'%.cs.rutgers.edu' identified by 'xx';

create database druid DEFAULT CHARACTER SET utf8;
grant all privileges on druid.* to 'druid'@'localhost' identified by 'xx';
grant all privileges on druid.* to 'druid'@'%.cs.rutgers.edu' identified by 'xx';

exit;


Back on data-services1, set up the ambari database itself:

create database ambaridb;
use ambaridb;
source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql;
  • ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
The command above doesn't do a real ambari setup; it just distributes the connector.
  • ambari-server setup, take defaults until advanced database setup; say y
  • mysql
  • localhost
  • 3306
  • ambaridb
  • ambari-db
  • services password
  • ambari-server start
  • go to data-services1:8080, login as admin/admin
  • configure
  • [not] We need one non-default configuration. There's a problem between yarn and Java 8 that causes yarn to kill jobs because it thinks they are out of memory. To fix it, go to yarn configuration and in yarn-site custom configuration, add yarn.nodemanager.pmem-check-enabled=false
To enable ssl for ambari
  • get ssl cert. need cert and key, and the key may temporarily need to be readable to import it
  • ambari-server setup-security
  • give it cert and key. I used port 443, since this is the main service on that server
  • note that you have to fix the base URL for tez to be https, port 443
  • ambari-server restart
The following sets up a trust store. Our certs are all commercial, and the default Java trust store has the usual commercial providers, so many things don't need this. However I ended up having to do it to get the views in Ambari to work.
  • use openssl s_client -connect krb1.cs.rutgers.edu:636 and also for krb2 and capture the certs in krb1.crt and krb2.crt
  • /usr/jdk64/jdk1.8.0_112/bin/keytool -import -file /etc/ssl/cert.crt -alias ambari-server -keystore ambari-server-truststore
  • /usr/jdk64/jdk1.8.0_112/bin/keytool -import -file /etc/ssl/krb1.crt -alias krb1 -keystore ambari-server-truststore
  • /usr/jdk64/jdk1.8.0_112/bin/keytool -import -file /etc/ssl/krb2.crt -alias krb2 -keystore ambari-server-truststore
  • /usr/jdk64/jdk1.8.0_112/bin/keytool -import -file /etc/ssl/krb4.crt -alias krb4 -keystore ambari-server-truststore
  • put the trust store in /etc/ssl
  • ambari-server stop
  • ambari-server setup-security; use option 4, setup truststore to point ambari-server-truststore just created
  • ambari-server start
To enable SSL for zeppelin
  • I had a disaster doing this the first time. I've tried things to make it work that may not actually be needed.
  • setenv RANDFILE .rnd [if]
  • openssl pkcs12 -export -in ilab_cs_rutgers_edu.crt -inkey ilab_cs_rutgers_edu.key -name zeppelin -out ilab.pk12 [password]
  • keytool -importkeystore -deststorepass hadoop -destkeystore zeppelin-keystore.jks -srckeystore ilab.pk12 -srcstoretype PKCS12
  • get the first-level CA cert in a pem file. One way to do that is to go to a system with the same cert using firefox, look at the certs, and export the first one up from the host.
  • keytool -import -file ca.crt -keystore zeppelin-truststore.jks
  • put the files in /etc/ssl/zeppelin on [data-services1] and data-services2. The keystore should be 440, group hadoop, but for the moment they are public. Probably the copy on data-services1 isn't needed.
  • I tried to do this on port 443, but the system won't start when I do; I have no idea why not. So we have SSL on port 9995.
In zeppelin, config, advanced
zeppelin.ssl = true
zeppelin.ssl.client.auth = false
zeppelin.ssl.key.manager.password = hadoop
zeppelin.ssl.keystore.password = hadoop
zeppelin.ssl.keystore.path = /etc/ssl/zeppelin/zeppelin-keystore.jks
zeppelin.ssl.keystore.type = JKS
zeppelin.ssl.truststore.password = hadoop
zeppelin.ssl.truststore.path = /etc/ssl/zeppelin/zeppelin-truststore.jks
zeppelin.ssl.truststore.type = JKS

To access LDAP for passwords: note that you still need to create the users, though there's a sync feature.

  • ambari-server setup-ldap
  • obvious answers, but url is krb1.cs.rutgers.edu:636, not really a url [HDP]. Used krb4 as secondary.
  • HDP 3: for ldap specify member attribute memberUid, base cn=compat,dc=cs,dc=rutgers,dc=edu. The default base doesn't give a real member list when there are nested groups.
  • ambari-server restart
  • [not] edit /usr/lib/python2.6/site-packages/ambari_server/serverUtils.py. replace SERVER_API_HOST with data-services1.cs.rutgers.edu. otherwise it uses 127.0.0.1, which will fail for ssl
  • [not] fix /usr/lib/ambari-server/ambari-server-2.6.0.0.267.jar. This is complex. I'll give instructions below.
  • put users you want to sync in a file users.txt. Ended up not doing this; just used a group
  • ambari-server sync-ldap --groups=groups.txt
Note that it will create all the users in the group. Thus we don't actually need a user list.
  • to sync nightly, put group name (e.g. ambari-users) in /etc/ambari.group, and in /etc/cron.d/sync-ldap, add
3  3  *  *  *  root ambari-server sync-ldap --groups /etc/ambari.group --ldap-sync-admin-name=admin --ldap-sync-admin-password=XXXX

[so] hbase zookeeper session timed out. Did three things:

  • in hbase, increased heap to 3G from 1G
  • in hbase, increased zookeeper session timeout to 3 min
  • in zookeeper, increased the length of a single tick to 9000 ms. Tick * 20 gives the timeout, so this is 3 min. Both timeouts have to be adjusted.
[HDP] hive defaulted to allowing no one to create tables. I believe they expect we'll use ranger.
  • hdfs dfs -chmod 777 /warehouse/tablespace/managed/hive
[HDP] Yarn defaulted to only allowing yarn to submit jobs. In yarn, in the capacity scheduler, for the default queue, change the submit ACL to *
  • yarn.scheduler.capacity.root.default.acl_submit_applications=*
[HDP] After kerberos (and maybe before), HDP 3 timeline 2 wouldn't work. It couldn't talk to a special embedded copy of hbase that it uses. I ended up putting the reader into maintenance mode and ignoring errors. Also in yarn:
  • yarn.timeline-service.version set to 1.5f
  • yarn.timeline-service.versions set to 1.5f
Needed to change memory allocations. With the initial settings we could only do 1 - 2 map reduce jobs at once. The map reduce job size was way too large, and YARN was set to use too little memory.
  • In YARN, Node memory 150 GB, max memory of job 20 GB
  • In Mapreduce, Map and Appmaster 4 GB, Reduce 8 GB. This is the same as Spark settings in Zeppelin
When saving the new values in the Ambari UI, you'll get a warning that lots of settings aren't as recommended. Keep them, with one exception: When changing the mapreduce settings, it will offer to change JVM parameters to match. You must accept that.
  • In hive, reduce heap size to 20g
  • In tez, reduce am memory to 20g
Note that we're not enforcing the memory allocation. A map-reduce user can specify a larger JVM than their memory allocation allows. I tried to turn on the softer of the two limits, and was unable to run the Spark pi demo. Furthermore, there were no usable error messages. I don't want to cause obscure problems for Spark users.
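As a rough sanity check on those numbers (my arithmetic, not anything from the HDP docs): with 150 GB of YARN memory per node and 4 GB map/AppMaster containers, each node can hold about 150 / 4 ≈ 37 containers, or roughly 110 across the 3 data nodes, where the initial settings allowed only 1-2 map reduce jobs at once.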

Fixing ambari-server

[not] The following section is for fixing sync of users. But if you use a group, it doesn't matter.

There's a bug in the code for retrieving data from ldap. See AMBARI-24029 To fix it,

  • retrieve the source. This is ambari version 2.6.0.0, so get the corresponding tar file and untar it
  • make sure mvn and npm are installed
  • make sure your ~/.m2/settings.xml includes
<settings>
         <mirrors>
            <mirror>
               <id>public</id>
               <mirrorOf>*</mirrorOf>
               <url>http://nexus-private.hortonworks.com/nexus/content/groups/public</url>
            </mirror>
         </mirrors>
</settings>
  • edit ./ambari-server/src/main/java/org/apache/ambari/server/security/ldap/AmbariLdapDataPopulator.java
  • in line 675 replace the while with the following. This adds one null test:
   } while (configuration.getLdapServerProperties().isPaginationEnabled()
             && processor.getCookie() != null && processor.getCookie().getCookie() != null);
  • in the main directory, do "mvn package -DskipTests=true". You only need to go until ambari-server is built.
  • copy ./ambari-server/target/classes/org/apache/ambari/server/security/ldap/AmbariLdapDataPopulator* to org/apache/ambari/server/security/ldap/ in your work directory. There should be 4 .class files
  • copy /usr/lib/ambari-server/ambari-server-2.6.0.0.267.jar to your working directory.
  • save a copy of the original
  • jar uf ambari-server-2.6.0.0.267.jar org/apache/ambari/server/security/ldap/*
  • use jar tf to make sure you replaced 4 class files
  • put the new version of the jar file in /usr/lib/ambari-server

services

Initially I started with a reduced set of services. I added some and ended up with all except accumulo, atlas, knox, logsearch, ranger, druid

failed: accumulo, druid; both known bugs. I believe the others are unnecessary.

oozie apparently installed and started, but the test failed; I believe because "su hdfs" isn't enough in a kerberized environment, and the test used it.

Sqoop and mahout apparently installed, but the test failed. Don't know why, but it hung and I'm pretty sure nothing was actually present on the system. I suggest deploying sqoop separately because of this. I did sqoop -version and mahout -version on all hosts to make sure they were present.

Falcon requires a hack. After installing it, you'll get an alert because the web UI doesn't respond. There's a jar that Hortonworks can't ship. It has to be installed manually.

wget http://search.maven.org/remotecontent?filepath=com/sleepycat/je/5.0.73/je-5.0.73.jar -O /usr/hdp/current/falcon-server/server/webapp/falcon/WEB-INF/lib/je-5.0.73.jar

chown falcon:hadoop /usr/hdp/current/falcon-server/server/webapp/falcon/WEB-INF/lib/je-5.0.73.jar

I had trouble starting falcon. It complained about locking. I relocated /hadoop/falcon to /var/falcon, leaving a symlink; I was suspicious that NFS might not be handling locking correctly. That seemed to fix it.

kerberos

  • ambari will replace krb5.conf. Go to the kerberos section, configs, advanced krb5.conf. You'll find a template for the generated krb5.conf. Change the renew time to 365d and add the appdefaults section for kgetcred, pam_kmkhomedir, and register-cc. You need to restart the services to get it to regenerate krb5.conf.
[libdefaults]
  noaddresses = true
I think this can be done within ambari by specifying a setting, but this seemed safest
  • make sure krb5.conf has default_ccache_name = /tmp/krb5cc_%{uid}. There's an ansible setting for this
  • install the JCE: unzip -o -j -q jce_policy-8.zip -d /usr/jdk64/jdk1.8.0_60/jre/lib/security/ (the version number will be different). This was done above, but to the default java; ambari installs its own. It also installs a JCE, so it's not clear to me whether this is needed
  • services must be running for this
  • On the kerberos server do "ipa config-mod --defaultgroup=allusers". Without this, attempts to create users will fail. Our default group doesn't have a GID, because there are performance issues if all users are in a Posix group; thus our default is normally a non-posix group, which means the GID would have to be specified manually when creating users. (There ought to be a better way.)
  • [not] in ambari go to host/#/experimental, select enableipa, fairly near the bottom, save
  • in ambari, under admin, kerberos, enable kerberos
  • choose ipa
  • enter various data; i used the actual kerberos admin with the admin password, not saving it. This may not be needed, but I had odd errors before so this seemed safest.
  • the rest is automatic
  • when it's done, on the kerberos server do "ipa config-mod --defaultgroup=ipausers" to put the group back.
Note that it's hard to redo this, because the setup process wants to create principals. You'd have to delete them all. Or save the key tables, pick the manual method, and put the key tables back manually. That works as long as the set of services is the same.

To see what was created in ipa, "ipa user-find ilab" should find the users. To find services you'll need to look for services on all the hosts in the cluster. Or go to /etc/security/keytabs on each host and do klist -k -t on all the keytabs. That's the safest way to find all the principals.
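A quick way to enumerate them (a sketch; run it on each host in the cluster):

#!/bin/bash
# list every principal in every service keytab on this host
for kt in /etc/security/keytabs/*.keytab; do
    echo "== $kt =="
    klist -k -t "$kt"
done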

Added oozie as a user to ipa, to avoid complaints from the NetApp.

Once the system is kerberized, the hdfs user is hard to get to. To allow privileged operations, in IPA create a group hdfs and add administrators to it.

storage

By default data is kept in /hadoop. All HDFS data is stored in 6 copies: 3 nodes, as documented in all descriptions of HDFS, but what isn't documented is that ambari sets up each node to keep copies of the data in /hadoop/hdfs and /hadoop/hadoop/hdfs. (This is not true for HDP 3.) It makes no sense to have 6 copies on the same NFS file system, or even the same NFS file server.

First, change the replication factor from 3 to 1. This is done in ambari, hdfs, config. In the search box type replication.

[not] Next, put /hadoop on local storage on all servers, except mount /hadoop/hdfs on NFS. For the name node this will replicate the files on local and NFS, as by default they set up redundant directories. I think this is fine. However on data nodes, the two directories are combined, i.e. it's like striping. So for the data nodes both /hadoop/hdfs and /hadoop/hadoop/hdfs should be on NFS. If you change the mount points you need to update /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist.

Moved /hadoop/yarn and /hadoop/hadoop/yarn to NFS also, mounting in the expected place.

[HDP3] Initially I set up the system with /hadoop mounted on edinburgh-10g, a separate directory for each system. However after everything was set up, I made /hadoop local, copying files in most directories from the NFS directory to local. I mounted just the following from NFS:

  • hdfs
  • storm, data-services only
  • yarn, data nodes only
The idea was that if something is a collection of files it goes on NFS; if it's more like a database it goes on local. It's possible that the HDFS files for data-services should be local also, as they are namenode data, not files.

Spark defaults temporary files to /tmp. To create a space for them, create /hadoop/tmp on all systems, mounted in the same place as /hadoop/yarn, etc.

Then in spark and spark2 config, Custom spark-defaults,

spark.local.dir=/hadoop/tmp
spark.executor.extraJavaOptions=-Djava.io.tmpdir=/hadoop/tmp  
spark.driver.extraJavaOptions=-Djava.io.tmpdir=/hadoop/tmp

Add /usr/hdp/current/zookeeper-server/bin/zkCleanup.sh -n 4 to crontab on both services and data nodes.
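For example, the cron entry might look like this (the schedule and user are my choice; the -n 4 snapshot-retention count is from the text above):

# /etc/cron.d/zkcleanup -- purge old zookeeper snapshots/logs, keeping 4
0 3 * * * root /usr/hdp/current/zookeeper-server/bin/zkCleanup.sh -n 4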

Another exception: on data-services3, /hadoop/var is on NFS. It's for statistics. It's already 1.3 GB; I'm worried it will grow too much.

Hbase: giving users access

Hbase ACLs don't do quite what we want. You can't create a table without create access, but global create access is too dangerous. (Without Kerberos, things are even worse: anyone can do anything to any table.)

So to give a user access, create a namespace for them and give them full control of that namespace.

On one of the data nodes:

as root
kinit -k -t /etc/security/keytabs/hbase.headless.keytab hbase-ilab@CS.RUTGERS.EDU [maybe hbase-ilab2, etc]
hbase shell
create_namespace 'netid'
grant 'netid', 'RWCA', '@netid'
^D

Tables in a namespace look like ns:table, i.e. the namespace is prefixed to the table name, separated by a colon. Otherwise they appear to work the same.

If a class wants to do this, create a script or webapp that lets any user create a namespace.

(It's possible that if we disable ACLs it will do the right thing, but I haven't found documentation for that yet.)

Zeppelin

FYI: Zeppelin stores notebooks in HDFS /user/zeppelin/notebook. (In HDP 2, it started there but moved to a local directory, /usr/hdp/current/zeppelin-server/notebook/, after Kerberization.) Because notebooks can be shared, Zeppelin owns them all, and there's a file showing who is authorized to access what, in HDFS /user/zeppelin/conf/notebook-authorization.json.

The interpreter configuration is stored in HDFS, /user/zeppelin/conf/interpreter.json. I strongly suggest keeping a copy of this file, since Zeppelin now and then will mess it up, and you'll want to be able to restore it.

In ambari, zeppelin, in the shiro.ini section, edit the template.

  • In HDP 3, comment out the static password for admin and the 2 lines later for the password matcher. In this version you can't mix static passwords with LDAP.
  • uncomment the two lines for PAM authentication. Make the authentication domain zeppelin, not sshd. Make sure others such as ldap are commented.
Copy the sshd pam configuration to zeppelin. Note that sssd will be set to allow any ilab user to login. We want them to be able to use Zeppelin, but not to ssh in. So the sshd config file should have at the beginning of the account section
account    [default=ignore success=3] pam_localuser.so
account    [default=ignore success=2] pam_succeed_if.so user ingroup slide
account    optional     pam_echo.so ****** This system is not intended for login except through Zeppelin; please use data1, data2 or data3 for ssh if you need the hadoop cluster, otherwise use ilab.cs.rutgers.edu
account    required     pam_deny.so

In /etc/pam.d/zeppelin the auth section should be

auth       required     pam_sepermit.so
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        required    pam_sss.so forward_pass

auth       include      postlogin
auth       optional     pam_exec.so /usr/libexec/hdfsmkdir
# Used with polkit to reauthorize users in remote sessions                                                                                                        
-auth      optional     pam_reauthorize.so prepare

Pam for zeppelin runs as zeppelin. This causes trouble with the normal pam stack, so we use an abbreviated one. pam_unix will always fail unless you're in the local passwd file, but we need it to read the password. pam_sss would try to get two factors separately, which won't work in a web context; pam_unix will get just one and pass it on to pam_sss.

Also, for the scripts to run, zeppelin needs to be able to sudo. Create a file in /etc/sudoers.d:

## override defaults
Defaults>root   shell_noargs , \
                preserve_groups , \
                ! env_reset , \
                ignore_dot , \
                ! requiretty

##LCSR Specific Group
#zeppelin  ALL = (root) NOPASSWD: ALL
zeppelin  ALL = (ALL) NOPASSWD: ALL

Note that credentials won't be registered for renewal. Zeppelin doesn't call session, and we can't tell who is logged in anyway. Renewal is only done once the user starts an interpreter. See below on session management.



In the users section of shiro.ini remove all users except admin. Admin needs both the admin and coresysadmins roles, which isn't the default.

back to shiro.ini:

  • in roles section remove all roles except admin
  • at the end, in URL security, use
/api/version = anon
/api/interpreter/setting/restart/** = authc
/api/interpreter/** = authc, roles[coresysadmins]
/api/configurations/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
#/** = anon
/** = authc

R by default requires knitr, but CentOS doesn't include it with R. To load it and some other useful packages, do

sudo R -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
sudo R -e "install.packages('knitr', repos = 'http://cran.us.r-project.org')"
sudo R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
sudo R -e "install.packages(c('devtools','mplot', 'googleVis'), repos = 'http://cran.us.r-project.org'); require(devtools); install_github('ramnathv/rCharts')"
sudo R -e "install.packages('data.table', repos = 'http://cran.us.r-project.org')"

Zeppelin runs as zeppelin. That means that the pam modules run as zeppelin. The pam module creates hdfs directories for users. For that to work:

  • zeppelin has to be set so it can setuid to anyone. That has to be true anyway for user impersonation to work
  • /etc/security/keytabs/hdfs.headless.keytab has to be readable by zeppelin. There are lots of ways to do this. I used "setfacl -m u:zeppelin:r /etc/security/keytabs/hdfs.headless.keytab"
Zeppelin has hung. I'm restarting it at 4am daily with a cron job on data-services1.
#!/bin/bash

curl -u admin:xxx -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop Zeppelin via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' https://data-services1.cs.rutgers.edu/api/v1/clusters/ilab/services/ZEPPELIN

# actually took 16 sec
sleep 60

# Killing zeppelin doesn't kill the interpreters, but leaves them as orphans.
# Even though the processes are owned by users, they're part of a single
# zeppelin session.
ssh data-services2.cs.rutgers.edu loginctl terminate-user zeppelin

sleep 60

curl -u admin:xxx -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start Zeppelin via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' https://data-services1.cs.rutgers.edu/api/v1/clusters/ilab/services/ZEPPELIN

# took 48 sec

[HDP] In Ambari, zeppelin custom zeppelin-site, set zeppelin.interpreter.lifecyclemanager.class to org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager. That will cause idle interpreters to be killed after an hour.

[not] By default, new notebooks use Livy2. We want them to use Spark2. I can't tell how the ordering is done. Edit /usr/hdp/2.6.3.0-235/zeppelin/webapps/webapp/app.05bbdae750681c30f521.js. Find the last occurrence of note.defaultinterpreter. It is

e.note.defaultInterpreter=n.interpreterSettings[0]
Change the 0 to 3.

To make it permanent, replace the file in /usr/hdp/2.6.3.0-235/zeppelin/lib/zeppelin-web-0.7.3.2.6.3.0-235.war.

Setting admin password

Somewhere that you have a valid Maven pom file, do "mvn dependency:get -DgroupId=org.apache.shiro.tools -DartifactId=shiro-tools-hasher -Dclassifier=cli -Dversion=1.4.0"

Find shiro-tools-hasher-1.4.0-cli.jar in your ~/.m2 directory

In that directory, type "java -jar shiro-tools-hasher-1.4.0-cli.jar -gs -p ". Give it the ambari UI admin password. You'll get back something like "$shiro1$SHA-256$500 .... "

Try that java command without arguments to see what options you have. You might want to use sha512.

Edit shiro.ini. In the main section, add

passwordMatcher = org.apache.shiro.authc.credential.PasswordMatcher
iniRealm.credentialsMatcher = $passwordMatcher

In the user section, change admin = to "admin=$shiro1...., admin,coresysadmins". I.e. insert the encrypted password. Surprisingly, it doesn't seem necessary to quote it. The admin,coresysadmins are the roles it's in. We need coresysadmins because that's what is authorized to do some of the configuration.
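So the users section ends up looking something like this (hash truncated, as above):

[users]
admin = $shiro1$SHA-256$500..., admin, coresysadmins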

User impersonation

In zeppelin, in the zeppelin-env script, add at the end:

export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER="false"

restart it.

Of course that requires an entry in /etc/sudoers.d allowing zeppelin to become any user.

Now in zeppelin, interpreters, under sh, choose per-user isolated. That will expose the "impersonate user" option; click it. Now you'll get sh running as the user, though without kerberos authentication. Do the same thing for all the interpreters, except maybe md, which I assume doesn't need a process.

If you're going to use impersonation for anything, you have to use it for everything. To make impersonation work, the user zeppelin has to be able to become any user; that makes it effectively root. We can't let users run as that user. So I've set it for everything except md, which I assume doesn't have an issue.

I added an interpreter %python. See below. Also, I recreated %spark and %spark2 with default settings, except the python interpreter. The initial one didn't work; I think that's because user impersonation didn't work with the default yarn mode. When I recreated it, it defaulted to mode local[*], where the normal Kerberos authentication works. For %spark and %spark2, make sure SPARK_HOME is set to /usr/hdp/current/spark-client/ or spark2-client.

Because users have to be able to create log files,

chmod 777 /var/log/zeppelin

Important: change zeppelin.interpreter.config.upgrade to false. Otherwise interpreter configs are reset at every restart.

Creating user sessions

We want each user's interpreters to run in a separate logind session, with its own cgroup. That lets us set memory limits, etc. We also do Kerberos ticket management on a per-interpreter basis. The reason is that, at least in HDP 3, we can't tell who is logged in until they start an interpreter.

  • They get a kerberos ticket on login
  • When they start an interpreter, try to renew it. If that works, register it with renewd, so tickets won't expire during a long job. Renewd will work, since the interpreter process is owned by the user.
  • If we can't renew the Kerberos credential, it's probably expired. They need to log out and log back in; we won't start the interpreter until they do.
  • Thus we can guarantee that interpreters always have credentials available. But if all interpreters time out, eventually the user won't be able to start a new one, because the Kerberos credentials will have expired. We can't do better with this version of Zeppelin. (The next one will let us see who is logged in.)

Old version

At the end of /etc/pam.d/sudo

session    [default=2 success=ignore] pam_exec.so quiet /usr/libexec/sudozeppelin
-session   optional     pam_systemd.so
session    optional     pam_exec.so /usr/libexec/setlimits.sh 
This tests whether sudo is being called from zeppelin, and if so does the two key pam calls.

Here is sudozeppelin:

#!/bin/sh

#printenv >> /tmp/sudozep
#echo pid $$ >> /tmp/sudozep

# This is intended to be called from pam.d/sudo session.
# It checks whether this is a sudo being done by zeppelin to start a
#   user process. If so, it succeeds. Otherwise it fails.
# That lets pam call pam_systemd just for zeppelin user jobs.
# Normally sudo doesn't want to create a new session, but for
#   Zeppelin user jobs we do.
# Put the sudo process's PID into the root cgroups. That removes
#   them from the session they're currently in. pam_systemd won't
#   start a new session if a process is already in a session, so
#   this is needed for pam_systemd to do anything.

if test "$PAM_RUSER" = "zeppelin" -a "$PAM_USER" \!= "root" -a "$PAM_TYPE" = "open_session"; then
  MYPPID=`awk '/^PPid:/{print $2}' /proc/$$/status`
  CMD=`cat /proc/$MYPPID/cmdline`
  if echo "$CMD" | grep -q "sudo.*-H.*-u.*source /usr/hdp/current/zeppelin-server" ; then
     echo $MYPPID >/sys/fs/cgroup/systemd/cgroup.procs 
     echo $MYPPID >/sys/fs/cgroup/memory/cgroup.procs 
     echo $MYPPID >/sys/fs/cgroup/cpu,cpuacct/cgroup.procs 
     exit 0
  fi
fi

exit 1

New version

/etc/pam.d/sudo, session section. With the new zeppelin we have an idle timeout, so we don't need to restart every day. But we also can't tell what users are logged in, so there's no way to know which tickets to renew. Thus we call a script that tries to do kinit -R when you start an interpreter. If it works you have another 24 hours, and we register with renewd. If it fails, the user needs to login again to get Kerberos credentials.

Unfortunately we have no easy way to generate errors that the user will see. It turns out that if a pam script fails, Zeppelin will print the name of the failing script. So we name the script Your-session-has-expired.-Logout-and-login-again.

The first sudozeppelin prepares for system-auth by removing the process from any cgroups. Without that, pam_systemd doesn't create a login session, but assumes they're already in one.

sudozeppelin2 checks whether sudo is being invoked by zeppelin, since we want to check for Kerberos credentials and set memory limits if so.

# if zeppelin, remove from current cgroup so system-auth will work correctly                                                                       
session    optional     pam_exec.so quiet /usr/libexec/sudozeppelin
session    optional     pam_keyinit.so revoke
session    required     pam_limits.so
session    optional     pam_exec.so /usr/libexec/setlimits.sh
session    include      system-auth
# is this zeppelin?                                                                                                                                
session    [default=2 success=ignore] pam_exec.so quiet /usr/libexec/sudozeppelin2
# if so, do setlimits 
session    required     pam_exec.so /usr/libexec/Your-session-has-expired.-Logout-and-login-again.                                                                                            
session    optional     pam_exec.so /usr/libexec/setlimits.sh

/usr/libexec/sudozeppelin:

#!/bin/sh

#printenv >> /tmp/sudozep
#echo pid $$ >> /tmp/sudozep

# This is intended to be called from pam.d/sudo session.
# It checks whether this is a sudo being done by zeppelin to start a
#   user process. If so, it succeeds. Otherwise it fails.
# That lets pam call pam_systemd just for zeppelin user jobs.
# Normally sudo doesn't want to create a new session, but for
#   Zeppelin user jobs we do.
# Put the sudo process's PID into the root cgroups. That removes
#   them from the session they're currently in. pam_systemd won't
#   start a new session if a process is already in a session, so
#   this is needed for pam_systemd to do anything.

if test "$PAM_RUSER" = "zeppelin" -a "$PAM_USER" \!= "root" -a "$PAM_TYPE" = "open_session"; then
  MYPPID=`awk '/^PPid:/{print $2}' /proc/$$/status`
  CMD=`cat /proc/$MYPPID/cmdline`
  if echo "$CMD" | grep -q "sudo.*-H.*-u.*source /usr/hdp/current/zeppelin-server" ; then
     echo $MYPPID >/sys/fs/cgroup/systemd/cgroup.procs 
     echo $MYPPID >/sys/fs/cgroup/memory/cgroup.procs 
     echo $MYPPID >/sys/fs/cgroup/cpu,cpuacct/cgroup.procs 
     exit 0
  fi
fi

exit 0

/usr/libexec/sudozeppelin2:

#!/bin/sh

#printenv >> /tmp/sudozep
#echo pid $$ >> /tmp/sudozep

# test to see if this is a sudo from zeppelin for a user interpreter

if test "$PAM_RUSER" = "zeppelin" -a "$PAM_USER" \!= "root" -a "$PAM_TYPE" = "open_session"; then
  MYPPID=`awk '/^PPid:/{print $2}' /proc/$$/status`
  CMD=`cat /proc/$MYPPID/cmdline`
  if echo "$CMD" | grep -q "sudo.*-H.*-u.*source /usr/hdp/current/zeppelin-server" ; then
     exit 0
  fi
fi

exit 1

/usr/libexec/Your-session-has-expired.-Logout-and-login-again.

#!/bin/sh

# field 3 of the passwd entry is the uid, which names the user's credential cache
LOGIN=`getent passwd "$PAM_USER" | cut -d: -f3`

if sudo -u "$PAM_USER" kinit -R -c /tmp/krb5cc_"$LOGIN"; then
  # register for renewed                                                                                                                           
  touch /run/renewdccs/FILE:\\tmp\\krb5cc_"$LOGIN"
  exit 0
else
  exit 1
fi

Spark properties

You can set any spark property in the interpreter setup. For livy and livy2, prefix the property name with livy.

I've had to set spark.port.maxRetries to 1000. It defaults to 16, which limits the number of spark sessions to 16; we can't live with that. I ended up setting this in the spark defaults for spark and spark2 cluster-wide, so the setting here isn't really needed anymore.

Livy and livy2:

livy.spark.executor.cores 4
livy.spark.executor.instances 3
livy.spark.executor.memory 4g
livy.spark.driver.cores 4
livy.spark.driver.memory 4g
livy.spark.port.maxRetries 10000

Spark and Spark2

SPARK_HOME /usr/hdp/current/spark2-client/   [omit 2 for spark]
master local[4]
spark.cores.max 4
spark.driver.memory 4g
spark.executor.memory 4g
spark.port.maxRetries 1000

Remove spark.yarn kerberos keytab and principal. Despite their name, they get used for local as well, which causes obvious problems. (The user can't read the key table.)

python setup

data-services2 and data123 must be set up like Jupyter:

  • /usr/lib/anaconda3 needs to be there
  • /etc/profile.d/pythonuser.sh and .csh need to be there. Note that I'm using a slightly different pythonuser on data123, because I don't want to change any more than I have to mid-semester.
  • automounts for /common/users and /common/clusterdata need to be set up
On data123, pythonuser.sh
export PYTHONUSERBASE="/common/clusterdata/$USER/local"
export PYSPARK_PYTHON=/usr/lib/anaconda3/bin/python3
# this is probably OK all the time, but to avoid breaking other things I'm
# only setting it for interactive shells. This is a bash-ism, but /bin/sh
# is bash for us.
export HDP_VERSION=2.6.3.0-235
if ! test `expr match "$PATH" "/usr/lib/anaconda3/bin"` -gt 0 ; then
   PATH="/usr/lib/anaconda3/bin:${PATH}"
fi
pythonuser.csh
setenv PYTHONUSERBASE "/common/clusterdata/$USER/local"
setenv PYSPARK_PYTHON "/usr/lib/anaconda3/bin/python3"
# this is probably OK all the time, but to avoid breaking other things I'm
# only setting it for interactive shells.
setenv HDP_VERSION 2.6.3.0-235
if (`expr match "$PATH" "/usr/lib/anaconda3/bin"` == 0) set path = ("/usr/lib/anaconda3/bin" $path)
The copies on jupyter and data-services2 don't have the HDP_VERSION, but that's just because I wanted to minimize retesting. I recommend adding it there.

At some point this broke spark-submit. Needed to convert /etc/hadoop/conf/topology_script.py to be compatible with python3. At the beginning replace the whole import section with

from __future__ import print_function
import sys, os
try:
    from string import join
except ImportError:
    join = lambda s: " ".join(s)
try:
    import ConfigParser
except ImportError:  # covers python3's ModuleNotFoundError, and works under python2 too
    import configparser as ConfigParser
then find the print rack statement and change it to print(rack).

However ambari will overwrite this.

  • copy it to topology_script3.py
  • change net.topology.script.file.name in Ambari's HDFS configuration to point to topology_script3.py
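Roughly (a sketch of the two bullets above):

# keep a python3-compatible copy where ambari won't overwrite it
cp /etc/hadoop/conf/topology_script.py /etc/hadoop/conf/topology_script3.py
# then in Ambari's HDFS configuration set:
#   net.topology.script.file.name = /etc/hadoop/conf/topology_script3.py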
Must install the python interpreter:
  • /usr/hdp/2.6.3.0-235/zeppelin/bin/install-interpreter.sh -n python
  • In interpreters create it, set the python to /usr/lib/anaconda3/bin/python
  • set to per-user impersonate
  • in the notes, enable it
Python, %spark and %spark2 all have settings for the python to use. I use /usr/local/bin/zsparkpy
#!/bin/bash

# for %spark2.pyspark and %python
# uses zeppelin's special matplotlib backend and the normal spark 2 python stuff

export PYTHONPATH="/usr/hdp/2.6.3.0-235/zeppelin/interpreter/lib/python:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH"
export PYSPARK_PYTHON="/usr/lib/anaconda3/bin/python3"
export PYTHONUSERBASE="/common/clusterdata/$USER/local"

exec /usr/lib/anaconda3/bin/python3 "$@"

%spark.pyspark uses /usr/local/bin/zsparkpy1

#!/bin/bash

# for %spark.pyspark
# uses the normal spark (1) python stuff

export SPARK_HOME="/usr/hdp/current/spark-client"
export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH"

#export PYTHONPATH="/usr/hdp/2.6.3.0-235/zeppelin/interpreter/lib/python:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH"
export PYSPARK_PYTHON="/usr/bin/python"
export PYTHONUSERBASE="/common/clusterdata/$USER/local"

exec /usr/bin/python "$@"

The configurations for %spark and %spark2 must be edited to change the python from "python" to /usr/local/bin/zsparkpy[1]. Also, for %spark, you need to add SPARK_HOME as /usr/hdp/current/spark-client/. Without that, python and R don't work.

The newest version of Zeppelin will run ipython if possible. For that to work, the following modules must be installed in the version of python involved. I did it for anaconda's python and the system python2, just in case someone chooses python 2 explicitly.

  • the command is "python -m pip install PACKAGE", but you may need to type the full path to python to get the right one
  • python -m pip install ipykernel
  • python -m pip install jupyter-client
  • python -m pip install ipython
  • python -m pip install grpcio
  • python -m pip install protobuf
I also installed py4j.

HOWEVER, for the anaconda environment you really want to use conda to do the installation. Several of those things are there already. What you actually need to do is

conda install py4j
conda install grpcio
conda install protobuf

Documentation for Zeppelin doesn't mention all of this, particularly protobuf. Zeppelin checks for most of it and will warn you, but if protobuf is missing, it will mysteriously fail. (Fixed in a future release.)

To get python2 to work, I ended up installing Anaconda python2 in /usr/lib/anaconda2. It needs the same "conda install" commands. User documentation says how to activate it. Rather than just running the executable, I use a python2 version of /usr/local/bin/zsparkpy, this one called /usr/local/bin/zsparkpy2:

#!/bin/bash

# for %spark2.pyspark and %python
# uses special matlib backend and normal spark 2 python stuff

export PYTHONPATH="/usr/hdp/3.1.0.0-78/zeppelin/interpreter/lib/python/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH"
export PYSPARK_PYTHON="python2"
export PYTHONUSERBASE="/common/clusterdata/$USER/local"
export PATH="/usr/lib/anaconda2/bin:$PATH"

exec /usr/lib/anaconda2/bin/python2 "$@"

livy

Livy is part of spark, so there's no option to add it in the normal services menu. Instead, go to the host that you want to be the server (services2 in this case), and click add. You'll see an option for the livy server. Similarly add the spark2 client and then livy for spark2 to data-services2.

The following says how to set up Livy: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_zeppelin-component-guide/content/config-livy-interp.html. But it looks to me like it's really about using livy for Zeppelin to access Spark. I'm not clear whether that's actually needed.

views

[this] Files view:

I had terrible problems getting the files view to work. Most of it was that krb5.conf on data-services1 isn't generated by ambari, and thus was the default. noaddresses must be set true, or it fails.

Other than that, the published instructions work. It is at least possible that none of this was needed, and the only problem was noaddresses in krb5.conf.

  • In hdfs configuration, in custom core-site [note]
hadoop.proxyuser.ambari-server-ilab.hosts *
hadoop.proxyuser.root.groups *
hadoop.proxyuser.root.hosts *
  • in the view itself,
WebHDFS Authorization auth=KERBEROS;proxyuser=ambari-server-ilab
cluster configuration: custom
WebHDFS FileSystem URI*   webhdfs://data-services2.cs.rutgers.edu:50070
  • grant permission to some group all users are in

hdfs won't start

HDFS has a namenode, a backup name node, and data nodes. In our case the three data* systems are the data nodes.

When you start, the data nodes start first, then the namenode. The name node waits for all of the data nodes to check in. Each one reports the number of blocks it has. The namenode knows how many there are supposed to be. It waits until it sees them all.

If everything works, you have a 2 min wait. (It waits for 2 min before checking.) If it never sees all the blocks, it doesn't come up.

There are a couple of reasons it might not see all the blocks:

  • One of the data nodes isn't up, or there's a network or Kerberos issue. You'll need to fix those.
  • There's a problem with the file system. Try fsck as explained in the next section. You'll need to fix it until fsck is clean before it will come online. Fsck can be run with the file system still in safe mode, but if it shows a problem, you'll have to exit safe mode to fix it. If you do that while ambari is still trying to start, the system will come up; you don't want that. So if there's a failure in fsck, abort the startup, fix hdfs, then stop all and start all.
  • Fsck says everything is OK, but a few blocks are missing. On the name node, /var/log/hadoop/hdfs/hadoop-hdfs-namenode-data-servicesX.cs.rutgers.edu.log should show messages saying that not all blocks are there yet.
If fsck shows that the file system is OK, and there are just a few blocks missing, you can force it to come online using
hdfs dfsadmin -safemode leave

hdfs recovery

Note that hdfs depends upon Zookeeper. If it didn't start, fix that first.

hdfs is a distributed file system. The data is stored on data1, 2, and 3; servers run on those nodes. However they are coordinated by the name server, which is on data-services2. If there are issues, they show up as the name server / namenode not coming online. In Ambari, the startup will simply hang at that point. It's not obvious when it's hung, since it takes several minutes to come online normally, but you can tell by looking at the log. Do "ls -lt" in data-services2:/var/log/hadoop/hdfs. There will be two new files with the same name, ending in .log and .out. You want .log.

I have seen it work if I try again. I.e. use ambari to shut down all services and then start all services again. Let's assume that doesn't work.

If there's an issue, you will want to abort the ambari startup. That will leave hdfs running, but in readonly mode. It changes to read/write when it comes on line. The HDFS terminology is "safe mode."

The following commands require you to be the user "hdfs"

su - hdfs
klist -k -t /etc/security/keytabs/hdfs.headless.keytab 
# note the principal; it should be something like hdfs-ilab@CS.RUTGERS.EDU
kinit -k -t /etc/security/keytabs/hdfs.headless.keytab PRINCIPAL

This command will do an fsck of hdfs:

hdfs fsck /    > file

The two kinds of errors you're most likely to see are corrupt blocks / files, and under replicated files. Corrupt files will keep the system from coming up. Look through the file to see if you care about any of them. Most likely they'll be temporary files. If there are ones you care about, you can find them from a backup on data-services4. It has daily snapshots, if you need to go back.
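To pull just the problem files out of that report (an assumption about the exact wording fsck uses; check your own output):

# corrupt / missing entries are flagged in upper case in the fsck report
grep -E 'CORRUPT|MISSING' file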

The following command will remove all corrupt files. Note that you have to leave safe mode, to allow writes:

hdfs dfsadmin -safemode leave
hdfs fsck / -delete

The fsck will report errors. With "-delete" it actually fixes them, but that's not obvious from the output. You can do it again to verify that they're fixed.

Under replicated files mean that somehow something set a replication factor over 3. Since we only have 3 data nodes, you can't replicate a file more than 3 times. We normally use 1: since all 3 replicas are on the same NFS file server, replication doesn't seem to buy much. The following will find all under replicated files and fix them by setting the replication count to 1. Note that this assumes you're using bash.

hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; sudo -u hdfs hadoop fs -setrep 1 $hdfsfile; done

Try "hdfs fsck / " again, and make sure everything is OK.

At this point I would do an ambari shutdown followed by an ambari startup. Some of the systems may get into an odd state if HDFS isn't working.

Zeppelin won't start correctly

Note that Zeppelin depends upon HDFS. If there's an issue with HDFS, fix it first.

Zeppelin is restarted nightly at 4am. It doesn't recover idle sessions, so this is the only way to prevent an infinite number of sessions from building up. (This is fixed in the next release.) The cron job is on data-services1 (the Ambari system), because it uses Ambari commands to do the restart. See "crontab -l" on that host. You'll see a check command that follows the restart by a few minutes. If the restart failed, it sends me email.

To make sure it's up, go to https://data-services2.cs.rutgers.edu:9995 and make sure you see a login screen.

If the restart fails, the first thing I would do is login to ambari as admin and try to restart Zeppelin again. That might actually work, if there was a transient failure in the environment.

Let's assume the restart fails, or results in a server that gives an error.

To see what is going on, on data-services2, look at /var/log/zeppelin/zeppelin-zeppelin-data-services2.cs.rutgers.edu.log.

There's one failure mode we've seen that is likely to happen again: During startup, Zeppelin loads all the user notebooks. Unfortunately if there's a problem with one of them, it can abort the startup. (This has supposedly been fixed in a newer version.)

You can tell which notebook caused the trouble by going to /usr/hdp/current/zeppelin-server/notebook/. Do

ls -f | cat > /tmp/foo

"ls -f" means to do a listing without sorting. Normally "ls" sorts in alphabetical order. However Zeppelin loads files in the order that the directory entries occur in the directory. That's what "ls -f" shows. The problem will be the one after the latest that the log shows was loaded.

Remove it or move it, and restart zeppelin. I would look at the JSON file in the directory, and either send it to the user or pull out just the code and send it. Otherwise the user could lose substantial work.

hbase recovery

We had a crash. After we came up, hbase wouldn't stay online. Look on data-services2, in /var/log/hbase/hbase-hbase-master-data-services2.cs.rutgers.edu.log.

This entry has information you probably won't need, but some of it is magic that's hard to find, so I included it in case you do.

There are two major things to look at:

  • the hbase table itself
  • the zookeeper state information
Our problem was that it wouldn't come online. It turned out that deleting the zookeeper state information fixed it. The last entry here says how to do that. You need to use zkCli.sh, which is a CLI for zookeeper, and delete the entries for hbase-secure.

If the table data is bad, you can try to fix it with hbase hbck.

Now for the narrative, some of which you won't need:

  • The first problem was "Waiting for namespace table to be online. Time waited ...". Eventually it times out and hbase stops. Google came up with the drastic solution of deleting all of /apps/hbase/data/WALs/. That worked. But the WAL is the write-ahead log, so this will lose data in progress.
  • Now it came up, but the web page said 2 servers were in transition, and they wouldn't go online. I tried hbase hbck, and that didn't help. A real fix looked complex, so I removed hbase from the two servers involved, and the one left worked fine. I then reinstalled hbase on them. I assume this also would lose some data. At this point it looks OK.
  • After more experiments it seems likely that just stopping the node in trouble is enough. It causes another node to take over the region.
Here's how to use the tool that's supposed to help recover:

on data-services2

  • su - hbase
  • kinit -k -t /etc/security/keytabs/hbase.headless.keytab hbase-ilab@CS.RUTGERS.EDU
  • export HBASE_SERVER_JAAS_OPTS=-Djava.security.auth.login.config=/usr/hdp/current/hbase-client/conf/hbase_master_jaas.conf
  • hbase hbck
Further reading suggests that the "region in transition" can be fixed by deleting hbase information in zookeeper. It apparently keeps allocations of regions to region servers and other state information. Supposedly you can delete all of it and restarting hbase will recreate it.

Running the tool is interesting. Create a file /usr/local/zkcli.conf

Client {  
com.sun.security.auth.module.Krb5LoginModule required  
useKeyTab=true  
keyTab=/etc/security/keytabs/hdfs.headless.keytab  
storeKey=true  
useTicketCache=false  
principal="hdfs-ilab@CS.RUTGERS.EDU"
}; 

Now do

  • export JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zkcli.conf"
  • /usr/hdp/current/zookeeper-server/bin/zkCli.sh
"rmr /hbase" is supposed to do it, except there are two hbases. It looks like you want hbase-secure. Of course hbase needs to be down for this. Note that I haven't tried this.

The problem is that rmr /hbase-secure runs into security problems. To fix that, ignore the two things above (which give you only read-only access). On data-services2:

  • cd /usr/hdp/current/zookeeper-server
  • java -cp "./zookeeper.jar:lib/slf4j-api-1.6.1.jar" org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:password
  • it outputs super:XXX; copy that
  • in ambari's zookeeper config, find zookeeper-env template config, add to the end
  • export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dzookeeper.DigestAuthenticationProvider.superDigest=super:XXX" where xxx is the encrypted password printed
  • save, and restart zookeeper.
  • /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server data-services2.cs.rutgers.edu
  • addauth digest super:password
Now you can do rmr /hbase-secure

Once it works, I recommend removing the entry from Ambari's zookeeper config and restarting. The problem with this approach is that you're exposing the encrypted form of the password to everyone; remember that everyone can see our configurations. Of course it's only the encrypted password, but still it's not a good idea.

zookeeper recovery

We ran into a situation where components were failing to start, and the logs showed missing info in zookeeper. What's worse, which components failed differed when you started again.

Zookeeper runs on data-services2, data-services3, and data1. Look at the directories under /hadoop/zookeeper. On data1, the current database file was a few bytes, while the others were megabytes; I suspect the disk had filled. This explains the inconsistent results, since it depends upon which component talked to which copy.

If you have just one bad database, fixing it is easy. Shut down all the hadoop components using ambari. Start the two good zookeepers (just zookeeper, not the rest of hadoop). On the bad node, remove all files from /hadoop/zookeeper/version-2, then start its zookeeper; it will notice that its files are missing and fetch a good copy. Now you can start the rest of hadoop.
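A sketch of that recovery, assuming data1 is the node with the bad copy:

# 1. in ambari: stop all services
# 2. in ambari: start zookeeper on data-services2 and data-services3 only
# 3. on data1, with its zookeeper still down:
rm -rf /hadoop/zookeeper/version-2/*
# 4. start zookeeper on data1; it re-syncs from the good quorum
# 5. start the rest of hadoop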

checking yarn status

During a period when lots of users are working, you may want to see who is doing what, so you can deal with the possibility that yarn might run out of resources. There are several tools:

  • resource manager ui - this shows all jobs, current and past, and total resources used. However finding resource usage for each job is a drag.
  • "yarn application -list" - will list all jobs currently running. Doesn't show resource usage though.
  • "yarn application -list -appStates ALL" - will show the whole history.
  • "yarn application -status application_1542400534304_0694" - shows detailed info on one job
  • "yarn top" - this is the best summary. In order to see specifics of all jobs you need to su to "yarn" with the right Kerberos credentials. Those should always be set up on data-services3. If you had to set it up you would kinit from /etc/security/keytabs/rm.service.keytab . It doesn't appear that credentials exist other than on data-services3.
If you need more than will fit on the screen, try "yarn top -rows 1000 | more".