latentcor: Fast Computation of Latent Correlations for Mixed Data

Introduction

latentcor is a Python package for estimating latent correlations with mixed data types (continuous, binary, truncated, and ternary) under the latent Gaussian copula model. For references on the estimation framework, see

Statement of need

No Python package is currently available that allows accurate and fast correlation estimation from mixed-variable data in a unified manner. latentcor is thus the first stand-alone Python package for computing latent correlations that takes into account all variable types (continuous/binary/ordinal/zero-inflated), has an optimized memory footprint, and is computationally efficient, making latent correlation estimation almost as fast as rank-based correlation estimation.

Installation

The easiest way to install latentcor is using pip.

pip install latentcor
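
To confirm that the installation succeeded, one option is to query the installed version from Python. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of latentcor.

```python
from importlib import metadata


def installed_version(package="latentcor"):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of latentcor.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if latentcor is installed, otherwise None.
print(installed_version())
```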

Example

Let’s import gen_data, get_tps and latentcor from latentcor.

from latentcor import gen_data, get_tps, latentcor

First, we generate a pair of variables of different types with sample size n=100 to serve as example data. Here the first variable is ternary and the second variable is continuous.

simdata = gen_data(n = 100, tps = ["ter", "con"])
print(simdata['X'][ : 6, : ])
[[ 2.          1.50695058]
 [ 1.          2.21447941]
 [ 2.         -0.62085717]
 [ 0.         -2.44107528]
 [ 1.         -1.47804335]
 [ 2.         -1.23434909]]
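
For intuition about what such data look like under a latent Gaussian copula model, the following sketch draws a correlated pair of latent normal variables and discretizes the first one into three levels. It is an illustration under stated assumptions, not the implementation of gen_data: the helper name simulate_ternary_continuous, the cutoffs, and the latent correlation rho are all chosen here for the example.

```python
import random


def simulate_ternary_continuous(n, rho, cutoffs=(-0.5, 0.5), seed=0):
    """Draw n rows of (ternary, continuous) data from a latent bivariate normal.

    Hypothetical helper for illustration; not part of latentcor.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) = rho in the latent space.
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Discretize the first latent variable into levels 0, 1, 2
        # by counting how many cutoffs it exceeds.
        level = sum(z1 > c for c in cutoffs)
        rows.append((float(level), z2))
    return rows


rows = simulate_ternary_continuous(n=100, rho=0.5)
for row in rows[:6]:
    print(row)
```

Only the discretized first column and the continuous second column are observed; the latent correlation rho is the quantity that a latent correlation estimator aims to recover.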

Then we can estimate the latent correlation matrix based on these two variables using the latentcor function.

estimate = latentcor(simdata['X'], tps = ["ter", "con"])
print(estimate['R'])
          0         1
0  1.000000  0.580435
1  0.580435  1.000000
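
To illustrate the rank-based idea behind latent correlation, the sketch below handles the simplest case only: two continuous variables. Under a Gaussian copula, a standard result relates Kendall's tau to the latent correlation r by r = sin(pi/2 * tau). The code is a plain standard-library illustration, not latentcor's implementation; the function names kendall_tau and latent_corr_continuous are hypothetical, and pairs involving binary, ternary, or truncated variables require different mappings that are not shown here.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a, computed by comparing all pairs (O(n^2))."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def latent_corr_continuous(x, y):
    """Map Kendall's tau to a latent correlation for a continuous pair."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


# Simulate latent normals with correlation rho (value assumed for the example).
rng = random.Random(1)
rho = 0.6
z1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
z2 = [rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in z1]

# Strictly increasing transforms change the marginals but not the ranks.
x = [math.exp(a) for a in z1]
y = [b ** 3 for b in z2]

r_hat = latent_corr_continuous(x, y)
print(round(r_hat, 2))
```

Because the estimate depends on the data only through ranks, applying it to the transformed variables x and y gives the same value as applying it to the latent z1 and z2.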

Community Guidelines

  • Contributions and suggestions to the software are always welcome. Please consult our contribution guidelines prior to submitting a pull request.

  • Report issues or problems with the software using GitHub’s issue tracker.

  • The easiest way to replicate the development environment of latentcor is using pip:

pip install -r requirements_dev.txt

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.