https://github.com/genomoncology/FuzzTypes/tree/main
Danger Testing
dangertesting.com
Open source, high-throughput, fault-tolerant vector embedding pipeline
Simple API endpoint that ingests large volumes of raw data, processes it, and stores or returns the resulting vectors quickly and reliably
dgarnitz • GitHub - dgarnitz/vectorflow: VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
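The ingest → process → store flow described above can be sketched as a small, self-contained Python pipeline. This is not VectorFlow's code or API: `chunk_text`, `toy_embed`, `InMemoryVectorStore`, and `ingest` are hypothetical names. The hashed bag-of-words embedding only stands in for a real model, and the retry-with-backoff loop is one assumed way to get fault tolerance.

```python
from __future__ import annotations

import hashlib
import math
import time
from dataclasses import dataclass, field
from typing import Callable


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split raw text into fixed-size character chunks that overlap slightly."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for 'a vector DB of your choice'."""
    ids: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add(self, doc_id: str, vector: list[float], text: str) -> None:
        self.ids.append(doc_id)
        self.vectors.append(vector)
        self.texts.append(text)

    def query(self, vector: list[float], k: int = 1) -> list[tuple[str, float]]:
        # toy_embed returns unit-length vectors, so the dot product is cosine similarity.
        scores = [sum(a * b for a, b in zip(vector, v)) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [(self.ids[i], scores[i]) for i in ranked[:k]]


def ingest(
    doc_id: str,
    raw_text: str,
    store: InMemoryVectorStore,
    embed: Callable[[str], list[float]] = toy_embed,
    max_retries: int = 3,
    backoff: float = 0.5,
) -> int:
    """Chunk -> embed -> store, retrying each embedding call on transient errors."""
    stored = 0
    for n, chunk in enumerate(chunk_text(raw_text)):
        for attempt in range(max_retries):
            try:
                vector = embed(chunk)
                break
            except (ConnectionError, TimeoutError):
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                time.sleep(backoff * 2 ** attempt)  # exponential backoff
        store.add(f"{doc_id}#{n}", vector, chunk)
        stored += 1
    return stored


store = InMemoryVectorStore()
print(ingest("doc1", "vector embeddings power semantic search " * 20, store))  # 5
print(store.query(toy_embed("semantic search"), k=1))
```

Swapping `toy_embed` for a real model client and `InMemoryVectorStore` for a database client would keep the same chunk → embed → write shape; only the two stand-ins change.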
Exon is an analysis toolkit for life-science applications. It features:
- Support for many file formats from bioinformatics, proteomics, and others
- Local filesystem and object storage support
- Arrow FFI primitives for multi-language support
- SQL-based access to bioinformatics data: general DML and some DDL support
wheretrue • GitHub - wheretrue/exon: Exon is an OLAP query engine specifically for biology and life science applications.
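To illustrate what SQL-based access to bioinformatics data can look like, here is a small sketch using Python's built-in `sqlite3` over FASTA records. This is not Exon's engine, SQL dialect, or file-format support; the `fasta(id, sequence)` table, `parse_fasta`, `load_fasta`, and `GC_QUERY` are hypothetical, chosen only to show a per-sequence computation (GC content) expressed as a query.

```python
import sqlite3


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (id, sequence) pairs; the id is the first word of the header."""
    records, seq_id, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(parts)))
            seq_id, parts = (line[1:].split() or [""])[0], []
        else:
            parts.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(parts)))
    return records


def load_fasta(conn: sqlite3.Connection, text: str) -> None:
    """Load FASTA records into a hypothetical fasta(id, sequence) table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fasta (id TEXT PRIMARY KEY, sequence TEXT)")
    conn.executemany("INSERT INTO fasta VALUES (?, ?)", parse_fasta(text))


# GC content per sequence, computed in SQL (assumes uppercase sequences).
GC_QUERY = """
SELECT id,
       length(sequence) AS len,
       round(100.0 * (length(sequence)
             - length(replace(replace(sequence, 'G', ''), 'C', ''))) / length(sequence), 1) AS gc_pct
FROM fasta
ORDER BY gc_pct DESC
"""

FASTA = """>seq1 example
ATGCGC
GC
>seq2
ATAT
>seq3
GGGCCC
"""

conn = sqlite3.connect(":memory:")
load_fasta(conn, FASTA)
print(conn.execute(GC_QUERY).fetchall())
# [('seq3', 6, 100.0), ('seq1', 8, 75.0), ('seq2', 4, 0.0)]
```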
DataTrove
DataTrove is a library to process, filter and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality.
DataTrove processing pipelines are platform-agnostic, running out of the box locally or on a slurm cluster. Its (relatively) low memory...
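The idea of chaining prebuilt processing blocks (filter, deduplicate) with room for custom ones can be sketched with plain Python generators. This is a hypothetical illustration, not DataTrove's API: `Document`, `Block`, `length_filter`, `exact_dedup`, and `run_pipeline` are invented names, and the deduplication shown is exact hashing of normalized text, the simplest variant.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Document:
    id: str
    text: str


# A processing block consumes a stream of documents and yields a (possibly smaller) stream.
Block = Callable[[Iterable[Document]], Iterator[Document]]


def length_filter(min_words: int) -> Block:
    """Drop documents with fewer than `min_words` whitespace-separated words."""
    def block(docs: Iterable[Document]) -> Iterator[Document]:
        for doc in docs:
            if len(doc.text.split()) >= min_words:
                yield doc
    return block


def exact_dedup() -> Block:
    """Keep only the first document seen for each normalized-text hash."""
    def block(docs: Iterable[Document]) -> Iterator[Document]:
        seen: set[str] = set()
        for doc in docs:
            normalized = " ".join(doc.text.lower().split())
            key = hashlib.sha1(normalized.encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                yield doc
    return block


def run_pipeline(docs: Iterable[Document], blocks: list[Block]) -> list[Document]:
    """Chain the blocks lazily, then materialize the final stream."""
    stream: Iterable[Document] = docs
    for block in blocks:
        stream = block(stream)
    return list(stream)


docs = [
    Document("a", "The quick brown fox jumps"),
    Document("b", "the  QUICK brown fox jumps"),
    Document("c", "too short"),
    Document("d", "A completely different sentence here"),
]
kept = run_pipeline(docs, [length_filter(3), exact_dedup()])
print([d.id for d in kept])  # ['a', 'd']
```

Because each block is a generator, documents stream through one at a time; only the dedup block keeps state (the set of hashes seen so far). Adding custom functionality in this sketch means writing another function with the same `Block` shape.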
