torchvinecopulib.util package¶
Module contents¶
- class torchvinecopulib.util.ENUM_FUNC_BIDEP(value)[source]¶
Bases:
Enum
an enum class for bivariate dependence measures
- chatterjee_xi = functools.partial(<function chatterjee_xi>)¶
- ferreira_tail_dep_coeff = functools.partial(<function ferreira_tail_dep_coeff>)¶
- kendall_tau = functools.partial(<function kendall_tau>)¶
- mutual_info = functools.partial(<function mutual_info>)¶
- wasserstein_dist_ind = functools.partial(<function wasserstein_dist_ind>)¶
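Each member's value wraps its measure in functools.partial: a bare function assigned in an Enum body becomes a method of the class rather than a member, while a partial survives as a callable value. A minimal sketch of the same pattern, with hypothetical stub measures standing in for the library's own:

```python
from enum import Enum
from functools import partial


def _kendall_stub(x, y):
    # placeholder bivariate dependence measure (illustration only)
    return 0.0


def _xi_stub(x, y):
    return 0.0


class BidepMeasure(Enum):
    # functools.partial keeps each function a *value*;
    # a bare function here would be treated as a method, not a member
    kendall_tau = partial(_kendall_stub)
    chatterjee_xi = partial(_xi_stub)


# dispatch by enum member:
# BidepMeasure.kendall_tau.value(x, y)
```

This is why the members above render as `functools.partial(<function ...>)`: the partial wrapper is what makes function-valued members enumerable.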
- torchvinecopulib.util.cdf_func_kernel(obs: Tensor, is_scott: bool = True) callable [source]¶
estimate of the cumulative distribution function (CDF) by kernel density estimation (KDE); returns a callable CDF
- Parameters:
obs (torch.Tensor) – observations, of shape (n,1)
is_scott (bool, optional) – whether to use Scott’s rule for bandwidth, defaults to True
- Returns:
a CDF function by KDE
- Return type:
callable
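As an illustration (not the library's implementation), a Gaussian-kernel CDF estimate with Scott's rule h = σ·n^(−1/5) can be sketched in plain Python; the Silverman-style fallback for is_scott=False is an assumption:

```python
import math


def kde_cdf(obs, is_scott=True):
    """Return a Gaussian-kernel CDF estimate: F(x) = mean(Phi((x - x_i) / h))."""
    n = len(obs)
    mean = sum(obs) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in obs) / (n - 1))
    # Scott's rule for 1-d data: h = sigma * n**(-1/5);
    # the else-branch (Silverman-style factor 0.9) is an assumption for illustration
    h = std * n ** (-0.2) if is_scott else 0.9 * std * n ** (-0.2)

    def cdf(x):
        # standard normal CDF via erf, averaged over kernels centered at each obs
        return sum(
            0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0)))) for xi in obs
        ) / n

    return cdf
```

The returned closure is monotone in x and smooth, which is what makes a KDE-based CDF convenient for pseudo-observation transforms.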
- torchvinecopulib.util.chatterjee_xi(x: Tensor, y: Tensor, M: int = 1) float [source]¶
revised Chatterjee’s rank correlation coefficient (ξ); the max over (X,Y) and (Y,X) is taken so that the measure is symmetric
Chatterjee, S., 2021. A new coefficient of correlation. Journal of the American Statistical Association, 116(536), pp.2009-2022.
Lin, Z. and Han, F., 2023. On boosting the power of Chatterjee’s rank correlation. Biometrika, 110(2), pp.283-299. “a large negative value of ξ has only one possible interpretation: the data does not resemble an iid sample.”
- Parameters:
x (torch.Tensor) – obs of shape (n,1)
y (torch.Tensor) – obs of shape (n,1)
M (int) – number of right nearest neighbors
- Returns:
revised Chatterjee’s rank correlation coefficient (ξ), symmetrized by taking the max over (X,Y) and (Y,X)
- Return type:
float
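For intuition, the classic (M = 1, no-ties) form of the coefficient can be sketched as follows; this is a simplified stand-in for the library's revised estimator, which uses M right nearest neighbors. With pairs sorted by x and r_i the rank of the paired y values, ξ_n = 1 − 3·Σ|r_{i+1} − r_i| / (n² − 1):

```python
def chatterjee_xi_sketch(x, y):
    """Chatterjee's xi (classic no-ties formula), symmetrized via max over (x,y),(y,x)."""

    def xi(a, b):
        n = len(a)
        order = sorted(range(n), key=lambda i: a[i])  # sort pairs by a
        # rank of b[i] among all b values, visited in a-sorted order
        r = [sum(bj <= b[i] for bj in b) for i in order]
        return 1.0 - 3.0 * sum(abs(r[k + 1] - r[k]) for k in range(n - 1)) / (n * n - 1)

    # max over both orderings makes the statistic symmetric in (x, y)
    return max(xi(x, y), xi(y, x))
```

Note that ξ detects functional dependence in either direction: a perfectly decreasing y yields the same value as a perfectly increasing one.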
- torchvinecopulib.util.debye1(x: float) float [source]¶
computes the Debye function of order 1.
- Parameters:
x (float) – upper limit of the integral
- Returns:
Debye function of order 1; 0 if x<=0
- Return type:
float
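A quadrature sketch of D₁(x) = (1/x)·∫₀ˣ t/(eᵗ − 1) dt, returning 0 for x ≤ 0 as documented; composite Simpson's rule is used here, and the library's actual numerical scheme may differ:

```python
import math


def debye1_sketch(x, n_step=1000):
    """Debye function of order 1 via composite Simpson quadrature; 0 if x <= 0."""
    if x <= 0:
        return 0.0

    def integrand(t):
        # t / (e^t - 1); the limit as t -> 0 is 1; expm1 keeps precision for small t
        return 1.0 if t == 0.0 else t / math.expm1(t)

    h = x / n_step  # n_step must be even for Simpson's rule
    s = integrand(0.0) + integrand(x)
    for k in range(1, n_step):
        s += (4 if k % 2 else 2) * integrand(k * h)
    return (h / 3.0) * s / x
```

For small x the series D₁(x) ≈ 1 − x/4 + x²/36 gives a quick sanity check.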
- torchvinecopulib.util.ferreira_tail_dep_coeff(x: Tensor, y: Tensor) float [source]¶
pairwise tail dependence coefficient (λ) estimator, taken as the max over rotations 0, 90, 180, 270, so it is symmetric for (x,y), (y,1-x), (1-x,1-y), (1-y,x), (y,x), (1-x,y), (1-y,1-x), (x,1-y); x and y are both of shape (n, 1) with values inside (0, 1)
Ferreira, M.S., 2013. Nonparametric estimation of the tail-dependence coefficient.
- torchvinecopulib.util.kendall_tau(x: Tensor, y: Tensor, tau_min: float = -0.999, tau_max: float = 0.999) float [source]¶
Kendall’s τ rank correlation coefficient; x and y are both of shape (n, 1), and the result is clamped to [tau_min, tau_max]
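The τ-b variant (which corrects for ties) with the clamping suggested by the tau_min/tau_max parameters can be sketched as below; whether the library computes τ-b exactly this way is an assumption:

```python
import math


def kendall_tau_sketch(x, y, tau_min=-0.999, tau_max=0.999):
    """Kendall's tau-b via an O(n^2) pair scan, clamped into [tau_min, tau_max]."""
    n = len(x)
    conc = disc = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
                if dy == 0:
                    ties_y += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                conc += 1  # concordant pair
            else:
                disc += 1  # discordant pair
    n0 = n * (n - 1) // 2
    tau = (conc - disc) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
    # clamping keeps downstream inverse transforms (e.g. tau -> copula
    # parameter) away from the degenerate boundaries +/-1
    return min(max(tau, tau_min), tau_max)
```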
- torchvinecopulib.util.mutual_info(x: Tensor, y: Tensor, is_sklearn: bool = True) float [source]¶
mutual information; requires scikit-learn or fastkde to be installed. x and y are both of shape (n, 1)
Purkayastha, S., & Song, P. X. K. (2024). fastMI: A fast and consistent copula-based nonparametric estimator of mutual information. Journal of Multivariate Analysis, 201, 105270.
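The library delegates the estimation to scikit-learn or fastkde; purely as a dependency-free illustration of the quantity itself, a plug-in histogram estimator (in nats) looks like this:

```python
import math


def mutual_info_hist(x, y, n_bin=10):
    """Plug-in MI estimate (nats) from a 2-d histogram of (x, y); assumes non-constant data."""
    n = len(x)

    def bin_idx(v, lo, hi):
        # clamp so the maximum lands in the last bin
        return min(int((v - lo) / (hi - lo) * n_bin), n_bin - 1)

    lox, hix, loy, hiy = min(x), max(x), min(y), max(y)
    joint = {}
    for xi, yi in zip(x, y):
        key = (bin_idx(xi, lox, hix), bin_idx(yi, loy, hiy))
        joint[key] = joint.get(key, 0) + 1
    # marginal counts from the joint histogram
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    # MI = sum p(i,j) * log(p(i,j) / (p(i) p(j))) over occupied cells
    mi = 0.0
    for (i, j), c in joint.items():
        pij = c / n
        mi += pij * math.log(pij / ((px[i] / n) * (py[j] / n)))
    return mi
```

For perfectly dependent data falling uniformly on the diagonal, the estimate equals log(n_bin); copula-based estimators such as fastMI reduce the bias this binning introduces.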
- torchvinecopulib.util.ref_count_hfunc(dct_tree: dict, tpl_first_vs: tuple[tuple[int, frozenset]] = (), tpl_sim: tuple[int] = ()) tuple[dict, tuple[int], int] [source]¶
reference counting for each data vertex during the conditional-simulation workflow, used for garbage collection (memory release) and for source-vertex selection;
1. when len(tpl_sim) < num_dim: vertices not in tpl_sim are set on the top lv, vertices in tpl_sim are set at the deepest lvs
2. when len(tpl_sim) == num_dim: check tpl_first_vs to move vertices up
- Parameters:
dct_tree (dict) – dct_tree inside a DataVineCop object, of the form {lv: {(v_l, v_r, s): bidep_func}}
tpl_first_vs (tuple[tuple[int, frozenset]], optional) – tuple of vertices (explicitly arranged in conditioned - conditioning set) that are taken as known at the beginning of a simulation workflow, only used when len(tpl_sim)==num_dim, defaults to tuple()
tpl_sim (tuple[int], optional) – tuple of vertices in a full simulation workflow, gives flexibility to experienced users, defaults to tuple()
- Returns:
number of visits for each vertex; tuple of source vertices in this simulation workflow from shallowest to deepest; number of hfunc calls
- Return type:
tuple[dict, tuple[int], int]
- torchvinecopulib.util.solve_ITP(f: callable, a: float, b: float, eps_2: float = 1e-09, n_0: int = 1, k_1: float = 0.2, k_2: float = 2.0, j_max: int = 31) float [source]¶
Solve an arbitrary function for a zero-crossing.
Oliveira, I.F. and Takahashi, R.H., 2020. An enhancement of the bisection method average performance preserving minmax optimality. ACM Transactions on Mathematical Software (TOMS), 47(1), pp.1-24.
https://docs.rs/kurbo/0.8.1/kurbo/common/fn.solve_itp.html
https://en.wikipedia.org/wiki/ITP_method
! It is assumed that f(a) < 0 and f(b) > 0, otherwise unexpected results may occur.
The ITP method has tuning parameters. This implementation hardwires k2 to 2.0, both because it avoids an expensive floating point exponentiation, and because this value has been tested to work well with curve fitting problems.
The n0 parameter controls the relative impact of the bisection and secant components. When it is 0, the number of iterations is guaranteed to be no more than the number required by bisection (thus, this method is strictly superior to bisection). However, when the function is smooth, a value of 1 gives the secant method more of a chance to engage, so the average number of iterations is likely lower, though there can be one more iteration than bisection in the worst case.
The k1 parameter is harder to characterize, and interested users are referred to the paper, as well as encouraged to do empirical testing. To match the paper, a value of 0.2 / (b - a) is suggested, and this is confirmed to give good results. When the function is monotonic, the returned result is guaranteed to be within epsilon of the zero crossing.
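A sketch of the ITP iteration under the stated assumption f(a) < 0 < f(b); parameter names mirror the signature above, and the truncation/projection details follow the Oliveira & Takahashi paper rather than this library's exact code:

```python
import math


def solve_itp_sketch(f, a, b, eps=1e-9, n_0=1, k_1=0.2, k_2=2.0, j_max=64):
    """ITP root finder; assumes f(a) < 0 < f(b) brackets a zero-crossing."""
    ya, yb = f(a), f(b)
    n_half = math.ceil(math.log2((b - a) / (2.0 * eps)))  # bisection iteration budget
    n_max = n_half + n_0
    j = 0
    while b - a > 2.0 * eps and j < j_max:
        x_half = 0.5 * (a + b)
        r = eps * 2.0 ** (n_max - j) - 0.5 * (b - a)  # minmax radius around midpoint
        delta = k_1 * (b - a) ** k_2
        # interpolation: regula-falsi (secant) estimate
        x_f = (ya * b - yb * a) / (ya - yb)
        sigma = 1.0 if x_half >= x_f else -1.0
        # truncation: pull the secant point toward the midpoint by delta
        x_t = x_f + sigma * delta if delta <= abs(x_half - x_f) else x_half
        # projection: keep the query inside the minmax interval
        x_itp = x_t if abs(x_t - x_half) <= r else x_half - sigma * r
        y_itp = f(x_itp)
        if y_itp > 0.0:
            b, yb = x_itp, y_itp
        elif y_itp < 0.0:
            a, ya = x_itp, y_itp
        else:
            return x_itp
        j += 1
    return 0.5 * (a + b)
```

The projection step is what preserves bisection's worst-case bound while letting the secant estimate accelerate convergence on smooth functions.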
- torchvinecopulib.util.wasserstein_dist_ind(x: Tensor, y: Tensor, p: int = 2, reg: float = 0.1, num_step: int = 50, seed: int = 0) float [source]¶
- Wasserstein distance from bivariate copula observations to the independence bivariate copula, computed by ot.sinkhorn2 (averaged for each observation).
Requires pot (POT: Python Optimal Transport) to be installed.
- Parameters:
x (torch.Tensor) – copula obs of shape (n,1)
y (torch.Tensor) – copula obs of shape (n,1)
p (int, optional) – p-norm to calculate distance between each vector pair, defaults to 2
reg (float, optional) – regularization strength, defaults to 0.1
num_step (int, optional) – number of steps in the independent bivariate copula grid, defaults to 50
seed (int, optional) – random seed for torch.manual_seed(), defaults to 0
- Returns:
Wasserstein distance from bicop obs to indep bicop
- Return type:
float
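ot.sinkhorn2 comes from the POT package; the Sinkhorn iteration underlying it can be sketched without dependencies as below (uniform weights, small samples, and none of the library's independence-grid construction or per-observation averaging):

```python
import math


def sinkhorn_dist_sketch(xs, ys, p=2, reg=0.1, n_iter=200):
    """Entropic OT cost between two point clouds (uniform weights) via Sinkhorn iterations."""
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n  # uniform source weights
    b = [1.0 / m] * m  # uniform target weights
    # pairwise cost: sum of per-coordinate |.|^p
    C = [[sum(abs(u - w) ** p for u, w in zip(pi, qj)) for qj in ys] for pi in xs]
    K = [[math.exp(-c / reg) for c in row] for row in C]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # alternate scaling to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # transport plan P_ij = u_i K_ij v_j; regularized cost = <P, C>
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))
```

Smaller reg gives a sharper (closer to exact Wasserstein) plan at the cost of slower, less stable iterations, which is why reg is exposed as a tuning parameter.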