Java: Reading LZO-compressed Files - adofu/tec-blog GitHub Wiki

Today I ran into a silly problem: I tried to read data compressed with LzoCodec using LzopCodec, and it cost me quite a while. Here is a quick note for the record.

A brief introduction to LzopCodec and LzoCodec:

  • Files compressed with LzoCodec end in `.lzo_deflate`; the matching codec class is `com.hadoop.compression.lzo.LzoCodec`
  • Files compressed with LzopCodec end in `.lzo`; the matching codec class is `com.hadoop.compression.lzo.LzopCodec`
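The two bullet points above boil down to matching the file suffix to a codec class. A minimal sketch of that mapping (the `CodecChooser` class and `codecClassFor` method are hypothetical names, not part of hadoop-lzo):

```java
// Hypothetical helper: pick the codec class name from the file suffix.
public class CodecChooser {
    public static String codecClassFor(String fileName) {
        // Check the longer suffix first, since ".lzo" is a prefix of neither
        // but order still matters for readability and future suffixes.
        if (fileName.endsWith(".lzo_deflate")) {
            return "com.hadoop.compression.lzo.LzoCodec";
        } else if (fileName.endsWith(".lzo")) {
            return "com.hadoop.compression.lzo.LzopCodec";
        }
        throw new IllegalArgumentException("Not an LZO-compressed file: " + fileName);
    }

    public static void main(String[] args) {
        System.out.println(codecClassFor("part-00000.lzo_deflate")); // prints com.hadoop.compression.lzo.LzoCodec
        System.out.println(codecClassFor("part-00000.lzo"));         // prints com.hadoop.compression.lzo.LzopCodec
    }
}
```

Using the wrong pairing (as I did) makes the decompressor fail on a stream format it does not understand.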

Reading an LzoCodec-compressed file

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.LinkedList;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

private static Configuration conf = new Configuration(true);
private static Class<?> codecClass;
private static CompressionCodec codec;

static {
    String path = "/usr/local/webserver/hadoop/etc/hadoop/";
    conf.addResource(new Path(path + "core-site.xml"));
    conf.addResource(new Path(path + "hdfs-site.xml"));
    try {
        // Load the LzoCodec class; for .lzo files use LzopCodec instead
        codecClass = Class.forName("com.hadoop.compression.lzo.LzoCodec");
    } catch (ClassNotFoundException e) {
        throw new ExceptionInInitializerError(e);
    }
    codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
}

public List<String> readFile(String dir) {
    List<String> list = new LinkedList<String>();
    FileSystem hdfs = null;
    try {
        Path path = new Path(dir);
        hdfs = FileSystem.get(URI.create(dir), conf);

        // List the files under the directory on HDFS
        FileStatus[] fileStatus = hdfs.listStatus(path);
        // Read the contents of each file in turn
        for (FileStatus status : fileStatus) {
            InputStream input = null;
            try {
                input = hdfs.open(status.getPath());
                // Wrap the raw stream in a decompressing stream
                input = codec.createInputStream(input);
                list.addAll(IOUtils.readLines(input, "utf8"));
            } finally {
                if (input != null) {
                    input.close();
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (hdfs != null) {
                hdfs.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return list;
}
```
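Hard-coding the codec class is exactly what led to my mistake. Hadoop can also pick the codec from the file extension via `CompressionCodecFactory`, assuming the LZO codecs are registered in the `io.compression.codecs` configuration property. A minimal sketch (the `AutoCodecReader` class name is hypothetical):

```java
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class AutoCodecReader {
    // Open an HDFS file, letting Hadoop choose the decompressor
    // (.lzo_deflate -> LzoCodec, .lzo -> LzopCodec) from the extension.
    public static InputStream open(FileSystem fs, Path file, Configuration conf)
            throws java.io.IOException {
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(file); // null if no codec matches
        InputStream in = fs.open(file);
        return codec == null ? in : codec.createInputStream(in);
    }
}
```

With this approach a directory containing both `.lzo_deflate` and `.lzo` files can be read by the same code path.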