
      TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data

      research-article


          Abstract

          This article presents a fast and lossless algorithm for compressing diffraction data, achieving up to 85% reduction in file size while processing up to 2000 frames of 512 × 512 pixels per second. This breakthrough in compression technology is a significant step towards more efficient analysis and storage of large diffraction data sets.
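To put the figures quoted above in more familiar units, the short Python sketch below converts them into a compression ratio and a pixel throughput. The byte rate additionally assumes 16-bit pixels; that pixel depth is an assumption made here for illustration and is not stated in the summary.

```python
# Figures quoted in the summary above.
frames_per_second = 2000
pixels_per_frame = 512 * 512
reduction = 0.85  # "up to 85% reduction in file size"

# ASSUMPTION (not from the article summary): 16-bit pixels, i.e. 2 bytes each.
bytes_per_pixel = 2

# Pixel and raw-byte throughput implied by the frame rate.
pixels_per_second = frames_per_second * pixels_per_frame
raw_bytes_per_second = pixels_per_second * bytes_per_pixel

# An 85% size reduction leaves 15% of the original size.
compression_ratio = 1 / (1 - reduction)

if __name__ == "__main__":
    print(f"pixels per second:    {pixels_per_second:,}")
    print(f"raw bytes per second: {raw_bytes_per_second:,} (assuming 16-bit pixels)")
    print(f"compression ratio:    {compression_ratio:.2f}:1")
```

Under these numbers, an 85% reduction corresponds to a ratio of roughly 6.7:1, and the quoted frame rate corresponds to about 524 million pixels per second.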

          Abstract

          High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed, but had a much worse compression rate. TRPX files are byte-order independent and upon compilation the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
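As a rough illustration of the kind of comparison the abstract describes (compression efficiency and speed of general-purpose lossless codecs), the Python sketch below times two standard-library codecs on a synthetic frame: `zlib` (DEFLATE, the algorithm used by gzip) and `bz2` (bzip2). It is a minimal sketch under stated assumptions, not the authors' benchmark. The frame generator, the 16-bit little-endian pixel layout, the peak fraction and the helper names `synthetic_frame` and `benchmark` are all hypothetical, and TRPX itself is not included because the abstract does not describe its internals.

```python
import bz2
import random
import struct
import time
import zlib


def synthetic_frame(size=512, seed=0):
    """Return a size x size frame of 16-bit counts as little-endian bytes.

    ASSUMPTION: mostly low background counts with rare strong peaks.
    This is a made-up stand-in for a diffraction frame, not real data.
    """
    rng = random.Random(seed)
    pixels = []
    for _ in range(size * size):
        if rng.random() < 0.001:
            pixels.append(rng.randint(1000, 60000))  # rare bright peak
        else:
            pixels.append(rng.randint(0, 7))  # weak background
    return struct.pack(f"<{len(pixels)}H", *pixels)


def benchmark(name, compress, decompress, frame):
    """Time one compression pass and verify a lossless round trip."""
    start = time.perf_counter()
    packed = compress(frame)
    elapsed = time.perf_counter() - start
    if decompress(packed) != frame:
        raise ValueError(f"{name}: round trip is not lossless")
    return {
        "codec": name,
        "reduction": 1 - len(packed) / len(frame),  # fraction of size saved
        "seconds": elapsed,
    }


# Standard-library codecs only; default compression levels.
CODECS = [
    ("zlib (DEFLATE, as in gzip)", zlib.compress, zlib.decompress),
    ("bz2 (bzip2)", bz2.compress, bz2.decompress),
]

if __name__ == "__main__":
    frame = synthetic_frame()
    for name, compress, decompress in CODECS:
        result = benchmark(name, compress, decompress, frame)
        print(f"{result['codec']:<28} "
              f"reduction {result['reduction']:.1%}  "
              f"time {result['seconds'] * 1e3:.1f} ms")
```

Timings from a single pass on synthetic data are only indicative; a fair comparison would use real detector frames, repeat the measurement, and also time decompression.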


          Author and article information

          Contributors
          Role: Editor
          Journal
          Acta Crystallogr A Found Adv
          Acta Cryst. A
          Acta Crystallographica. Section A, Foundations and Advances
          International Union of Crystallography
          2053-2733
          01 November 2023
          25 September 2023
          Volume: 79
          Issue: Pt 6 (publisher ID: a230600)
          Pages: 536-541
          Affiliations
          [a] Biozentrum, University of Basel, Basel, Switzerland
          [b] Laboratory of Nanoscale Biology, Paul Scherrer Institute, Villigen, Switzerland
          Czech Academy of Sciences, Czech Republic
          Author notes
          Correspondence e-mail: jp.abrahams@unibas.ch
          Article
          lu5031 ACSAD7 S205327332300760X
          10.1107/S205327332300760X
          10626653
          37743849
          2eb27731-2050-4af0-800c-cf072c5880a5
          © Matinyan and Abrahams 2023

          This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

          History
          Received: 10 May 2023
          Accepted: 31 August 2023
          Page count
          Pages: 6
          Funding
          Funded by: HORIZON EUROPE Marie Sklodowska-Curie Actions
          Award ID: 956099
          Funded by: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
          Award ID: 205320_201012
          This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska Curie grant agreement No. 956099, and from the Swiss National Science Foundation project grant No. 205320_201012.
          Categories
          Research Papers

          compression, terse/prolix, trpx, lossless, diffraction data, cryo-em data, lossless data compression
