Home - Genetalks/gtz GitHub Wiki

What is GTX.Zip?

GTX.Zip (GTZ for short) is a professional FASTQ/BAM compressor that can also be used as a general-purpose data compression tool. It is developed by GTXLab of Genetalks Inc. GTX.Zip rapidly compresses DNA sequencing files and directories at a very high compression rate and produces a single compressed file, which simplifies data storage, distribution, and transmission. Unlike other compression tools, GTX.Zip focuses on high compression rate, high speed, and convenient data extraction.

GTX.Zip is available as three products: GTX.Zip Professional, GTX.Zip Enterprise, and GTX.Zip Cloud.

GTX.Zip Professional

GTX.Zip Professional is a stand-alone version that provides local compression. It is run from the command line to compress and decompress local genomic data.

  • It is intended for companies, institutions, and individual users with large amounts of local sequencing data.
  • Installation Guide.

GTX.Zip Enterprise

GTX.Zip Enterprise is designed to make full and flexible use of designated local computing capacity and the corporate intranet to provide distributed compression and transmission services for companies and data centers.

  • It is intended for large enterprises and data centers that hold PB-scale sequencing data and need distributed compression on their own computing clusters.
  • Installing this software on the LAN requires a professional engineer.
    If you are interested, please contact us.

GTX.Zip Cloud

GTX.Zip Cloud provides compression and distribution services for users who already keep large amounts of sequencing data online. The compression server is created in the same region as the storage server, so there is no hidden traffic fee. GTX.Zip Cloud supports Alibaba Cloud, AWS, Tencent Cloud, Huawei Cloud, and others.

  • It is intended for companies that store and distribute large amounts of sequencing data in the cloud.
  • You can get it by visiting http://gtz.io.

Supported Bioinformatics Analysis Software

BWA is the most widely used software package for mapping DNA sequences. BWA 0.7 for GTX.Zip is a build of it that can take XXX.gtz files as input. It consists of two programs: bwa 0.7 and bwa-opt 0.7.

  • bwa-opt 0.7 is an optimized version that is about 30% faster than standard bwa, and its mapping results are fully consistent with those of standard bwa.
  • If you want more software for GTX.Zip, please click here.

Features

  • High Compression Ratio: GTX.Zip implements context-model compression with a variety of optimized prediction models, and balances concurrency against memory consumption, to achieve an extremely high compression rate. For FASTQ files, GTX.Zip can compress the original file to as little as 2.53% of its size. Its compression is about 3-6 times better than that of gzip, which can save up to 80% of storage space and transfer costs.

    Data file        Compression rate of GTX.Zip    Compression rate of fastq.gz
    Nova_wes_1.fq    2.53%                          17.15%
    Nova_wes_2.fq    3.45%                          18.34%
    nova_wgs_1.fq    3.18%                          17.55%
    nova_wgs_2.fq    3.93%                          18.66%
    nova_rna_1.fq    4.56%                          17.70%
    nova_rna_2.fq    5.39%                          18.94%

  • High Performance: GTX.Zip fully exploits CPU concurrency, the Haswell CPU architecture, and newer instruction sets such as AVX2 and BMI2. This gives GTX.Zip high compression speed even on an ordinary server, with a throughput of 1100 MB/s on a single compression node. GTX.Zip Enterprise supports large-scale distributed compression.

  • Safety Guarantee: Thanks to its high speed, GTX.Zip runs a decompress-and-restore test during compression. Compression completes only after the restored data has been confirmed to be exactly the same as the source data. MD5 validation is also performed to ensure data integrity.

  • Software Ecosystem: GTX.Zip provides command-line and GUI decompression software for Linux, macOS, and Windows. It also provides SDK interfaces for languages such as Python, C, and C++, so third-party developers can read and write gtz files (the GTX.Zip compression format) directly. For example, gtz versions of bcl2fastq, fastp, and BWA are now supported by the community.
    To get this software, go to GTZ Ecology Softwares.

  • Nirvana Plan:
    As enterprise-level software, GTX.Zip includes a Nirvana Plan for high-availability requirements, so that users can decompress compressed data back to the original data even under extreme conditions. The Nirvana Plan's dual availability protection strategy is as follows:

    • GTX.Zip is hosted on multiple sites. The http://gtz.io website, GitHub, and other sites will permanently host all versions of GTX.Zip, so that it is always available to everyone, free of charge.
    • To ensure that compressed data can be restored to the original file under any conditions, a pre-embedded micro decompression program can first be extracted from the compressed data and then used to decompress the file.
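
The Safety Guarantee item describes a compress, restore, and compare cycle backed by MD5 validation. A minimal sketch of that pattern follows. It assumes Python's standard gzip module as a stand-in codec, because the GTX.Zip SDK calls are not documented on this page; the function names are hypothetical and this is not GTX.Zip's actual implementation.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def compress_with_verification(original: bytes) -> bytes:
    """Compress, then restore and compare digests before accepting the result.

    gzip is only a stand-in codec (an assumption for illustration);
    GTX.Zip's own API is not shown here.
    """
    compressed = gzip.compress(original)
    restored = gzip.decompress(compressed)
    if md5_hex(restored) != md5_hex(original):
        raise ValueError("round-trip check failed: restored data differs from source")
    return compressed


if __name__ == "__main__":
    # A small synthetic FASTQ-like payload, repeated to give the codec something to do.
    record = b"@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n" * 1000
    packed = compress_with_verification(record)
    print(f"{len(record)} bytes -> {len(packed)} bytes, round trip verified")
```

The compressed output is returned only after the restored bytes hash to the same digest as the source, which mirrors the "done only after the data has been confirmed" rule described above.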

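The figures in the compression-rate table under High Compression Ratio can be related to the "3-6 times" and "up to 80%" claims with simple arithmetic. The sketch below assumes that each rate is compressed size divided by original size, as the "compress the original fastq file to 2.53%" wording suggests; the helper names are hypothetical.

```python
# Compression rates in percent (compressed size / original size),
# copied from the table: (GTX.Zip, fastq.gz).
RATES = {
    "Nova_wes_1.fq": (2.53, 17.15),
    "Nova_wes_2.fq": (3.45, 18.34),
    "nova_wgs_1.fq": (3.18, 17.55),
    "nova_wgs_2.fq": (3.93, 18.66),
    "nova_rna_1.fq": (4.56, 17.70),
    "nova_rna_2.fq": (5.39, 18.94),
}


def improvement_over_gzip(gtz_pct: float, gz_pct: float) -> float:
    """How many times smaller the GTX.Zip output is than the gzip output."""
    return gz_pct / gtz_pct


def saving_vs_gzip(gtz_pct: float, gz_pct: float) -> float:
    """Fraction of the gzip-compressed size saved by using GTX.Zip instead."""
    return 1 - gtz_pct / gz_pct


if __name__ == "__main__":
    for name, (gtz, gz) in RATES.items():
        print(
            f"{name}: {improvement_over_gzip(gtz, gz):.1f}x smaller than fastq.gz, "
            f"{saving_vs_gzip(gtz, gz):.0%} of the fastq.gz size saved"
        )
```

For the six files listed, the improvement factor falls roughly between 3.5 and 6.8, and the saving relative to the fastq.gz size falls roughly between 72% and 85%.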
System Environment Requirements

  • 64-bit Linux (CentOS 6.5 or later, or Ubuntu 12.04 or later)
  • For good performance, a server with 32 cores and 64 GB of memory is recommended (minimum: 4 cores and 8 GB of memory), or one with a configuration equivalent to an AWS c4.8xlarge instance.
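
A host can be compared against these figures before installation. The sketch below is an illustration only and is not part of GTX.Zip: it assumes an x86 Linux host whose /proc/cpuinfo has a "flags" line, and it treats AVX2 and BMI2 (the instruction sets named under High Performance) as informational checks rather than documented hard requirements. The function names are hypothetical.

```python
import os

# Instruction sets named under High Performance (informational, not a stated requirement).
REQUIRED_FLAGS = frozenset({"avx2", "bmi2"})


def missing_cpu_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> set:
    """Return the required flags absent from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.partition(":")[2].split())
            return set(required) - present
    # No 'flags' line found (for example on non-x86 hosts): report everything as missing.
    return set(required)


def meets_minimum(cores: int, mem_gb: float) -> bool:
    """Check the stated minimum of 4 cores and 8 GB of memory."""
    return cores >= 4 and mem_gb >= 8


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        missing = missing_cpu_flags(fh.read())
    cores = os.cpu_count() or 0
    # Physical memory in GiB; a nominal 8 GB machine may report slightly less.
    mem_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    print(f"cores={cores}, memory={mem_gib:.1f} GiB, missing flags={sorted(missing)}")
    print("meets stated minimum:", meets_minimum(cores, mem_gib))
```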

Contact Us

If you have any questions, feel free to contact [email protected] or create a new GitHub issue.

License

See LICENSE for details.
