高维协方差矩阵结构检验
Testing the Structures of High-dimensional Covariance Matrices
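【Author】 Xu Lin (许林);
【Advisor】 Shi Ningzhong (史宁中);
【Author information】 Northeast Normal University (东北师范大学), Probability Theory and Mathematical Statistics, 2014, PhD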
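【Summary】 With the rapid development and wide application of computer technology, high-dimensional data are now widely collected and stored (see [7]). When the dimension of the data is large relative to the sample size, many classical limit theorems show their limitations, because they usually assume that the dimension is fixed while the sample size tends to infinity. Over the past two decades, many methods for analyzing high-dimensional data have been studied, for example [3], [6], [5], [12]. Among these results, large-dimensional random matrix theory ([7]) and the central limit theorem for linear spectral statistics (CLT, [60]) have proved valuable for statistical inference on high-dimensional data. Testing the proportionality of two high-dimensional covariance matrices has long been an important problem in multivariate statistical analysis, with significant practical applications. In Chapter 2 of this thesis, assuming that the sample size and the data dimension grow proportionally, we use random matrix theory to establish a central limit theorem for the pseudo-likelihood ratio test statistic (PLRT) under the null hypothesis. Extensive simulation studies show that the PLRT performs well for testing the proportionality of high-dimensional covariance matrices. In Chapter 3, we extend the scope of the pseudo-likelihood ratio test statistic and propose a new test (LZ) with wider applicability. Simulations show that the new pseudo-likelihood ratio test statistic (LZ) has good power for both Gaussian and non-Gaussian distributions. Moreover, when the dimension is large relative to the sample size, the new statistic clearly outperforms existing tests. Similarly, testing the independence of two high-dimensional variables is valuable both in high-dimensional data analysis and in practice. For example, in clinical trials we are often interested in whether two sets of high-dimensional variables are independent. Chapter 4 of this thesis proposes a new pseudo-likelihood ratio test statistic (PLST) for testing the independence of two high-dimensional variables and proves its asymptotic normality. Extensive simulation results show that the type I error of the test statistic meets the nominal requirement and that the test has high power.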
【Abstract】 With the rapid development and wide application of computer techniques, data of very high dimension can now be collected and stored. Such data are called high-dimensional or large-dimensional data (see [7]). Many traditional estimation and testing tools are no longer valid, or perform badly, for such data, since these methods are usually based on classical asymptotic theorems that assume a large sample size and a fixed dimension. Therefore, new statistical methods for high-dimensional data analysis have been studied over the last twenty years (see [3], [6], [5], [12], etc.). In particular, random matrix theory (RMT) [7] and the central limit theorems (CLT, [60]) for linear spectral statistics play important roles in statistical inference.

In the second chapter of this thesis, we consider testing the proportionality of high-dimensional covariance matrices from two different populations. Proportionality of covariance matrices is the simplest form of heteroscedasticity between populations and has extensive applications in economics, discriminant analysis, etc. This thesis generalizes the work of [5] and concerns the test of proportionality of two high-dimensional covariance matrices Σ1 and Σ2, allowing any positive constant of proportionality. Based on modern random matrix theory, it proposes a pseudo-likelihood ratio test (PLRT) and proves the asymptotic normality of the test statistic as the dimension and the two sample sizes tend to infinity proportionally. Simulation studies show that the pseudo-likelihood ratio test behaves well for both high-dimensional Gaussian and non-Gaussian distributions.

The independence test for two multivariate variables is a classical testing problem in multivariate statistical analysis and is also widely used in practice. For instance, in canonical correlation analysis, it is important to know whether two sets of variates are independent. In addition, in micro-array data analysis on genes and in DNA testing, it is meaningful to check whether groups of genes are correlated. The fourth chapter of this thesis proposes a new test procedure, based on a trace criterion, for testing the independence of two sets of large-dimensional multivariate variables. A new test statistic is developed using modern random matrix theory; it behaves well whether the dimension is small or large relative to the sample size. Under some regularity conditions, the asymptotic normality of the new test statistic is established as the dimensions and the sample size tend to infinity simultaneously and proportionally. Numerical simulations illustrate that the new test statistic is more effective than existing ones.
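The remark that fixed-dimension asymptotics break down when the dimension grows with the sample size can be illustrated numerically. The following is only a minimal sketch and is not one of the tests proposed in the thesis: it assumes Gaussian data with identity population covariance, and the function names `sample_cov_eigen_range` and `mp_support` are hypothetical. Classical theory would suggest that all sample eigenvalues are close to 1, whereas the Marchenko-Pastur law from random matrix theory places them on the interval with edges (1 ± √y)², where y is the ratio of dimension to sample size.

```python
import numpy as np


def sample_cov_eigen_range(n, p, seed=0):
    """Smallest and largest eigenvalue of the sample covariance matrix
    of n i.i.d. N(0, I_p) observations (population mean taken as known, 0)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))   # rows are observations
    s = x.T @ x / n                   # p x p sample covariance matrix
    eig = np.linalg.eigvalsh(s)       # eigenvalues in ascending order
    return eig[0], eig[-1]


def mp_support(y):
    """Edges of the Marchenko-Pastur support for ratio y = p / n, 0 < y <= 1."""
    return (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2


if __name__ == "__main__":
    n, p = 800, 400                   # dimension is half the sample size
    lo, hi = sample_cov_eigen_range(n, p)
    edge_lo, edge_hi = mp_support(p / n)
    print(f"sample eigenvalue range:  [{lo:.3f}, {hi:.3f}]")
    print(f"Marchenko-Pastur support: [{edge_lo:.3f}, {edge_hi:.3f}]")
```

Although every population eigenvalue equals 1 in this setup, the sample eigenvalues spread over a wide interval, which is why statistics built from them need limit theorems of the kind described in the abstract.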
【Key words】 high-dimensional data; random matrices; covariance matrices; pseudo-likelihood ratio test; CLT;