Hadoop - animeshtrivedi/notes GitHub Wiki

How to enable short-circuit local reads

First, set up the native Hadoop library (libhadoop.so). Short-circuit reads need it for UNIX domain sockets. You can check it with:

   $ hadoop checknative -a
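Without `-a`, `checknative` only requires libhadoop and exits with code 1 if that check fails. With `-a`, it also fails when optional codecs such as bzip2 are missing. The sketch below helps when you ran `-a` and want to know whether the failure is about libhadoop (which short-circuit reads need) or only about a codec. It assumes the usual `name:  true|false [path]` line format of the output. The function name `has_libhadoop` is hypothetical.

```shell
# Sketch: read `hadoop checknative -a` output on stdin and succeed only if
# the "hadoop:" entry (libhadoop.so) is reported as true. Optional codecs
# such as bzip2 or openssl are ignored.
# Assumes lines like "hadoop:  true /path/to/libhadoop.so.1.0.0".
has_libhadoop() {
  awk '$1 == "hadoop:" { found = ($2 == "true") } END { exit !found }'
}
```

Usage: `hadoop checknative -a 2>/dev/null | has_libhadoop && echo "libhadoop OK"`.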

You need to set java.library.path so the JVM can load libhadoop.so. First, put the Hadoop native library directory in LD_LIBRARY_PATH:

export LD_LIBRARY_PATH="/home/your_hdfs/lib/native/":$LD_LIBRARY_PATH

Then pass it to the JVM in the ./bin/hadoop script:

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS -Djava.library.path=$LD_LIBRARY_PATH"

That is enough to start with.
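Changes made directly to ./bin/hadoop are lost when Hadoop is upgraded. A possible alternative is to set the same variables in etc/hadoop/hadoop-env.sh, which the Hadoop launcher scripts source. The sketch below does that and keeps the placeholder path from above. `HADOOP_NATIVE_DIR` is a hypothetical variable name. The sketch only adds the `:` separator when LD_LIBRARY_PATH was already non-empty, because an empty entry in LD_LIBRARY_PATH is treated as the current directory.

```shell
# Sketch for etc/hadoop/hadoop-env.sh, instead of editing bin/hadoop.
# HADOOP_NATIVE_DIR is hypothetical; the path is the placeholder used above.
HADOOP_NATIVE_DIR="/home/your_hdfs/lib/native"

# Prepend the native dir; append ":<old value>" only if LD_LIBRARY_PATH was non-empty.
export LD_LIBRARY_PATH="${HADOOP_NATIVE_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Tell the JVM where to find libhadoop.so.
export HADOOP_OPTS="${HADOOP_OPTS:+${HADOOP_OPTS} }-Djava.library.path=${HADOOP_NATIVE_DIR}"
```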

Next, short-circuit reads need a UNIX domain socket. I created a folder at /var/lib/hadoop-hdfs/ and gave my user access to it, and that was all that was needed (the DataNode creates the socket file itself). Then put these properties in hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
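The directory step and a post-restart check can be scripted. The sketch below assumes a Linux host with GNU coreutils, and the helper names `make_socket_dir` and `check_dn_socket` are hypothetical. The Hadoop short-circuit documentation asks that only the HDFS user or root be able to create the socket path, which is why locations under /var/run or /var/lib are common. For that reason the sketch makes the directory owned by the DataNode user with mode 755.

```shell
# Sketch only: source these helpers into a shell.

# Create DIR owned by OWNER (the user running the DataNode), writable only by OWNER.
# Run as root (e.g. via sudo) when DIR is under /var/lib.
make_socket_dir() {
  local dir="$1" owner="$2"
  mkdir -p "$dir" || return 1
  chown "$owner" "$dir" || return 1
  chmod 755 "$dir"
}

# Succeed if the given path exists and is a UNIX domain socket.
# The DataNode creates this socket on startup when dfs.domain.socket.path is set.
check_dn_socket() {
  local sock="${1:-/var/lib/hadoop-hdfs/dn_socket}"
  if [ -S "$sock" ]; then
    echo "ok: $sock is a UNIX domain socket"
  else
    echo "missing: $sock (restart the DataNode after editing hdfs-site.xml)" >&2
    return 1
  fi
}
```

Usage example: run `make_socket_dir /var/lib/hadoop-hdfs "$USER"` as root. After restarting the DataNode, `check_dn_socket "$(hdfs getconf -confKey dfs.domain.socket.path)"` checks the path that the loaded configuration actually uses.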

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html

DISCARDABLE MEMORY AND MATERIALIZED QUERIES

https://hortonworks.com/blog/dmmq/

DISCARDABLE DISTRIBUTED MEMORY: SUPPORTING MEMORY STORAGE IN HDFS

https://hortonworks.com/blog/ddm/

Misc. notes about Hadoop

How to see block distribution

In Crail

./bin/crail fsck -t getLocations -f /sql/parquet-100m/part-00002-fc266a6a-663b-4ece-a2c8-453d54f784b9.parquet -y 0 -l 1280255769

In HDFS

./bin/hdfs fsck /sql/parquet-100m/part-00000-505ae4a0-0f0a-4210-a27c-bd854d95787e.parquet -files -blocks -locations
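The fsck listing can be turned into a per-node distribution. The sketch below counts how many block replicas each DataNode holds and reads the fsck output on stdin. It assumes the Hadoop 2.7+/3.x output format, where each replica is printed as `DatanodeInfoWithStorage[host:port,storageID,storageType]`. Older releases print locations differently and would need a different pattern. The function name `blocks_per_datanode` is hypothetical.

```shell
# Sketch: count block replicas per DataNode (host:port) from
# `hdfs fsck <path> -files -blocks -locations` output on stdin.
# Assumes replicas appear as "DatanodeInfoWithStorage[host:port,storage,type]".
blocks_per_datanode() {
  grep -o 'DatanodeInfoWithStorage\[[^],]*' |
    sed 's/^DatanodeInfoWithStorage\[//' |
    sort | uniq -c | sort -rn
}
```

Usage: `./bin/hdfs fsck /sql/parquet-100m -files -blocks -locations | blocks_per_datanode`. Passing a directory instead of a single file shows how the whole dataset is spread across nodes. Each replica is counted, so with replication 3 every block contributes three counts.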