Hadoop

- Consolidating many small files from S3 to HDFS using Hadoop
- Thumbstack Spark and Cloudera Impala