sgkit: Statistical genetics toolkit in Python

sgkit: Statistical genetics toolkit in Python

Sgkit is a Python package that provides a variety of analytical genetics methods through the use of general-purpose frameworks such as Xarray, Pandas, Dask and Zarr. The sgkit API makes as few assumptions as possible about the origin, structure, and intended use of genetic data by adopting a set of domain-specific conventions that allow such data to be used within this broader ecosystem of tools. The package is designed for complex workflows over large distributed datasets but attempts to make it as easy as possible to scale down to smaller datasets and access simpler functionality for those that may be new to Python (though there is still a good bit of work to done on this front). See Getting Started for more details.

Sgkit is inspired heavily by scikit-allel and Hail, both popular Python genetics toolkits with a respective focus on population and quantitative genetics.

Indices and tables