Femto is a novel image compression methodology and related suite of tools designed at MHacks: Refactor. It was created to let machine learning algorithms run more efficiently on images by compressing them using pre-computed features. Further, a femto-encoded image can be made smaller than the equivalent JPEG without sacrificing information and is transmissible as text, making it both lightweight and low-overhead.
The femto suite includes an automated setup tool for Apache Spark (PySpark) and the SciPy stack on Linode servers, allowing distributed scientific computing using master and slave configurations on multiple machines.
Femto treats an image as an array of pixels and performs singular value decomposition (SVD) to isolate its important features. On the server side, the singular values are encoded using a collection of performance optimizations: multi-stage clustering to increase redundancy, indexing using k-strings, and standard text compression. A pre-chosen amount of encoded data is then sent to the client, where image reconstruction or data analysis takes place.
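The SVD step described above can be illustrated with a minimal NumPy sketch. It covers only rank-k truncation of a single grayscale array. It does not reproduce femto's clustering, k-string indexing, or text compression stages, and the function names (`svd_compress`, `svd_reconstruct`, `stored_values`), the synthetic test image, and the choice of k = 8 are assumptions made for illustration rather than part of the femto tools.

```python
import numpy as np

def svd_compress(image, k):
    """Return the rank-k SVD factors of a 2-D grayscale image (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(image.astype(float), full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k], s[:k], Vt[:k, :]

def svd_reconstruct(U_k, s_k, Vt_k):
    """Rebuild an approximate image from the truncated factors."""
    # Scaling the columns of U_k by s_k is equivalent to U_k @ diag(s_k).
    return (U_k * s_k) @ Vt_k

def stored_values(shape, k):
    """Count of numbers kept by a rank-k factorization of an m x n image."""
    m, n = shape
    return k * (m + n + 1)

# Synthetic 64 x 64 test image: a smooth pattern plus a little noise (assumed data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
image = np.outer(np.sin(4 * x), np.cos(3 * x)) + 0.01 * rng.standard_normal((64, 64))

k = 8
U_k, s_k, Vt_k = svd_compress(image, k)
approx = svd_reconstruct(U_k, s_k, Vt_k)

# Relative reconstruction error in the Frobenius norm.
rel_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
# Ratio of raw pixel count to the number of values kept.
ratio = image.size / stored_values(image.shape, k)
print(f"relative error: {rel_error:.4f}, values stored: 1/{ratio:.2f} of the original")
```

In this sketch, the "pre-chosen amount of encoded data" corresponds to the choice of k: a smaller k sends fewer values to the client at the cost of a larger reconstruction error.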
The femto protocol compresses large TIF images by a factor of up to 8 without significant loss in quality or information. Additionally, femto output is between 40% and 50% of the size of the equivalent JPEG encoding (when converting from large TIF or Raw files).
Tested Use Cases
Femto has been tested using clustering algorithms (Lloyd's) and computer vision algorithms (blob identification). Although the femto files were smaller, results on images reconstructed with the femto protocol were just as accurate as results on uncompressed TIF images, and the algorithms completed faster.
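A comparison of this kind can be sketched as follows: run Lloyd's algorithm on the pixel intensities of an original image and of its low-rank reconstruction, then check how far the cluster centroids move. This is an illustrative sketch, not the test harness used for femto. The helper `lloyd_1d`, the synthetic two-tone image, and the rank of 2 are all assumptions.

```python
import numpy as np

def lloyd_1d(values, k, iters=20):
    """Lloyd's algorithm on scalar pixel intensities (hypothetical helper)."""
    values = np.asarray(values, dtype=float).ravel()
    # Deterministic initialization: centroids spread evenly over the value range.
    centroids = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    # Final assignment against the converged centroids.
    labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    return centroids, labels

# Synthetic two-tone image with mild noise (assumed data, not a femto test image).
rng = np.random.default_rng(1)
image = np.full((32, 32), 0.2)
image[:, 16:] = 0.8
image += 0.02 * rng.standard_normal(image.shape)

# Rank-2 reconstruction via truncated SVD, standing in for a decoded image.
U, s, Vt = np.linalg.svd(image, full_matrices=False)
rank = 2
reconstructed = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

c_orig, _ = lloyd_1d(image, 2)
c_rec, _ = lloyd_1d(reconstructed, 2)
# Largest movement of any centroid between the two runs.
max_shift = np.max(np.abs(c_orig - c_rec))
print("original centroids:", c_orig)
print("reconstructed centroids:", c_rec)
print("max centroid shift:", max_shift)
```

A small centroid shift indicates that clustering on the reconstruction finds essentially the same groups as clustering on the original, which is the sense of "just as accurate" used here.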