Get started
A simple example with two variables
Let’s import gen_data, get_tps and latentcor from the package latentcor.
from latentcor import gen_data, get_tps, latentcor
First, we generate a pair of variables of different types with sample size n = 100, which will serve as example data. Here the first variable is ternary and the second variable is continuous.
simdata = gen_data(n = 100, tps = ["ter", "con"])
print(simdata['X'][:6, :])
[[ 2. 0.58681373]
[ 0. 1.56689355]
[ 1. 1.03330599]
[ 1. 1.8223853 ]
[ 1. 0.17617261]
[ 1. -0.3987981 ]]
simdata['plotX']
The output of gen_data is a dictionary with 2 elements:
simdata['X']: a matrix (\(100\times 2\)); the first column is the ternary variable and the second column is the continuous variable.
simdata['plotX']: None here, since showplot = False (set showplot = True in the gen_data input to display a plot of the generated data).
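How gen_data builds its data is not shown here. As a rough illustration of how a ternary and a continuous variable can share one latent correlation, the numpy sketch below draws a latent bivariate normal pair and cuts the first coordinate into three levels. The function name sketch_ter_con, the latent correlation rho = 0.5 and the cut points are hypothetical choices for illustration, not gen_data's actual implementation or defaults.

```python
import numpy as np

def sketch_ter_con(n=100, rho=0.5, cuts=(-0.5, 0.8), seed=0):
    """Hypothetical sketch (not gen_data): latent normal pair -> (ternary, continuous)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Latent bivariate normal sample with correlation rho
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    # np.digitize maps values below cuts[0] to 0, between the cuts to 1, at or above cuts[1] to 2
    ternary = np.digitize(z[:, 0], bins=cuts).astype(float)
    # The second coordinate is kept as the continuous variable
    continuous = z[:, 1]
    return np.column_stack([ternary, continuous])

X = sketch_ter_con()
print(X.shape)  # (100, 2)
print(X[:3])
```

Because the ternary column is a monotone cut of a latent normal variable, the dependence between the two observed columns is still driven by rho, the latent correlation that latentcor aims to recover.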
Then we use get_tps to guess the data types automatically.
data_types = get_tps(simdata['X'])
print(data_types)
['ter' 'con']
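The exact rules inside get_tps are not described here. One plausible kind of heuristic is to classify each column by its number of distinct values. The helper guess_types below and its thresholds (including the repeated-zeros check used to flag the truncated type "tru") are hypothetical, so the real get_tps may behave differently, especially on edge cases.

```python
import numpy as np

def guess_types(X):
    """Hypothetical heuristic, NOT get_tps's actual rule: classify columns by distinct values."""
    tps = []
    for col in np.asarray(X, dtype=float).T:
        n_unique = np.unique(col).size
        if n_unique == 2:
            tps.append("bin")
        elif n_unique == 3:
            tps.append("ter")
        elif np.min(col) == 0 and np.sum(col == 0) > 1:
            # Repeated zeros at the lower bound suggest a zero-inflated (truncated) variable
            tps.append("tru")
        else:
            tps.append("con")
    return np.array(tps)

# Small toy matrix: 3 distinct values in column 0, distinct real values in column 1
X = np.array([[2.0, 0.59], [0.0, 1.57], [1.0, 1.03], [1.0, -0.40]])
print(guess_types(X))  # ['ter' 'con']
```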
Then we can estimate the latent correlation matrix of these 2 variables using the latentcor function.
estimate = latentcor(simdata['X'], tps = data_types)
print(estimate['R'])
0 1
0 1.000000 0.550859
1 0.550859 1.000000
print(estimate['Rpointwise'])
0 1
0 1.00000 0.55141
1 0.55141 1.00000
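The off-diagonal entries of R (0.550859) and Rpointwise (0.55141) differ by a factor of about 0.999. That is consistent with a final shrinkage step of the form \(\hat R = (1-\nu)\tilde R + \nu I\) with \(\nu = 0.001\), where \(\tilde R\) is the positive semi-definite projection of Rpointwise. Both the formula and the value of nu are assumptions here, to be checked against the Mathematical framework for estimation and the nu parameter mentioned in the description of estimate['R']. The numpy sketch applies that step to the printed Rpointwise; since this 2×2 matrix is already positive definite, the nearest-correlation projection is skipped.

```python
import numpy as np

# Pointwise estimate as printed above (rounded to 5 decimals)
R_pointwise = np.array([[1.0, 0.55141],
                        [0.55141, 1.0]])

nu = 0.001  # assumed shrinkage parameter

# Already positive definite, so the projection step would leave it unchanged
assert np.all(np.linalg.eigvalsh(R_pointwise) > 0)

# Assumed shrinkage toward the identity: keeps a unit diagonal and makes R strictly positive definite
R = (1 - nu) * R_pointwise + nu * np.eye(2)
print(np.round(R, 6))  # off-diagonal 0.550859, matching estimate['R']
```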
print(estimate['plot'])
None
print(estimate['K'])
0 1
0 1.000000 0.306667
1 0.306667 1.000000
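The K matrix printed above holds Kendall's \(\tau_a\) correlations. As a sketch of what \(\tau_a\) measures, the hypothetical helper kendall_tau_a below (not part of latentcor) counts concordant minus discordant pairs and divides by all \(n(n-1)/2\) pairs; it builds an \(n\times n\) matrix, which is fine for n = 100.

```python
import numpy as np

def kendall_tau_a(x, y):
    """Hypothetical helper: Kendall's tau-a for two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sign of every pairwise difference; tied pairs give 0, so they count as neither concordant nor discordant
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair appears twice in the n x n matrix (the diagonal is 0),
    # so divide by n(n-1) rather than n(n-1)/2
    return (dx * dy).sum() / (n * (n - 1))

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a([1, 1, 2], [1, 2, 3]))        # about 0.6667: the tied pair counts as neither
```

Unlike \(\tau_b\) (the default variant of scipy.stats.kendalltau), \(\tau_a\) does not adjust its denominator for ties, so the two can differ noticeably for discrete variables such as the ternary one here.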
print(estimate['zratios'])
[[0.3 nan]
[0.8 nan]]
The output of latentcor, stored here in estimate, is a dictionary with several elements:
estimate['R']: the estimated final latent correlation matrix. This matrix is guaranteed to be strictly positive definite (through the statsmodels.stats.correlation_tools.corr_nearest projection and the parameter nu; see Mathematical framework for estimation) if use.nearPD = True.
estimate['Rpointwise']: the matrix of pointwise estimated correlations. Because the entries are estimated pointwise, this matrix is not guaranteed to be positive semi-definite.
estimate['plot']: None by default, as showplot = False in latentcor. Otherwise, it displays a heatmap of the latent correlation matrix.
estimate['K']: the Kendall \(\tau\) (\(\tau_{a}\)) correlation matrix for these \(2\) variables.
estimate['zratios']: an array with one column per variable, as in the printed output above. Here the first column is a (\(2\times1\)) vector of cumulative proportions for the ternary variable: its first entry is the proportion of zeros and its second entry is the proportion of zeros and ones. The second column is numpy.nan for the continuous variable.
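For a ternary variable coded 0, 1, 2, the two cumulative proportions described for estimate['zratios'] can be computed directly. The helper ternary_zratios and the toy vector below are hypothetical, with the toy values chosen only so that the result matches the 0.3 and 0.8 shown above.

```python
import numpy as np

def ternary_zratios(x):
    """Hypothetical helper: cumulative proportions for a ternary variable coded 0, 1, 2."""
    x = np.asarray(x)
    # First entry: proportion of zeros; second entry: proportion of zeros and ones
    return np.array([np.mean(x == 0), np.mean(x <= 1)])

x = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 2])
print(ternary_zratios(x))  # [0.3 0.8]
```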