Spectral Binary Star Analysis Based on Rough Set and Cluster Voting Mechanism
【Abstract】 Spectral binaries are objects whose spectra show two dominant components. These two-component spectra are complex and diverse, their origins vary, and their signal-to-noise ratio (SNR) is relatively low. Many existing methods separate a two-component spectrum into two spectra for analysis, but separation cannot guarantee the accuracy of the resulting spectra, and a single run of an existing clustering method has low reliability. This paper proposes a method for analyzing and evaluating spectral binaries based on rough sets and a cluster voting mechanism. It combines multiple clusterings by voting and gives a graded reliability for the class assigned to each spectrum. The method has two parts. (1) The spectral binary data set is reconstructed using clustering algorithms built on different principles: the labels produced by each algorithm are aligned with the Hungarian algorithm and then used as attributes of each spectrum. (2) A voting mechanism assigns each spectrum a class, with the number of votes reflecting how consistent the clustering results are; rough sets are defined to trace the characteristics of each class, and the upper and lower approximation sets give the reliability of each spectrum's class assignment. The spectral binary set released in DR10 of LAMOST (the Guo Shoujing Telescope) is used as the analysis sample. Four clustering algorithms, partition-based K-means, the model-based Gaussian mixture model (GMM), spectral clustering, and agglomerative (hierarchical) clustering, are used to reconstruct the data set. With the lower bound on the number of votes, μ, set to 2, voting yields clustering results at reliability levels of 1, 0.75, and 0.5. About 1/3 of the samples have a reliability of 1, meaning that the four clustering results agree completely for these samples. A statistical analysis of SNR by class and by number of votes shows that samples with few votes have relatively low SNR, which is one reason why different algorithms assign them to different classes. The physical origins of the six classes of spectra with a reliability of 1 are analyzed; they are mainly of two kinds, binary stars and Galactic nebula + target star. The differences between cluster labels may arise from the flux difference between the two components or from data-processing steps such as splicing and calibration. Some cases may also be pipeline misclassifications caused by low spectral quality; the sky distribution of these spectra is broadly consistent with studies of the distribution of low-quality data.
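The align-then-vote idea in the two parts of the method can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names and the toy label lists are hypothetical. The alignment step uses an exhaustive search over label permutations, which reaches the same optimum as the Hungarian algorithm but is only practical for a small number of classes (for larger problems, `scipy.optimize.linear_sum_assignment` is the usual choice). The abstract does not state the exact rough set definition, so the sketch assumes that the lower approximation of a class holds the spectra every algorithm assigns to it and the upper approximation holds the spectra at least one algorithm assigns to it.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so that they agree with `reference` as often as possible.

    Exhaustive permutation search, standing in for the Hungarian algorithm.
    """
    # overlap[i][j] = number of samples labelled i here and j in the reference
    overlap = [[0] * n_clusters for _ in range(n_clusters)]
    for ref, lab in zip(reference, labels):
        overlap[lab][ref] += 1
    best = max(
        permutations(range(n_clusters)),
        key=lambda perm: sum(overlap[i][perm[i]] for i in range(n_clusters)),
    )
    return [best[lab] for lab in labels]


def vote(aligned_runs, mu=2):
    """Return (label, reliability) per sample; label is None below mu votes.

    Reliability is the share of runs that voted for the winning label.
    Ties are broken in favour of the label seen first (an assumption).
    """
    n_runs = len(aligned_runs)
    results = []
    for sample_labels in zip(*aligned_runs):
        label, votes = Counter(sample_labels).most_common(1)[0]
        results.append((label if votes >= mu else None, votes / n_runs))
    return results


def approximations(aligned_runs, label):
    """Assumed rough-set reading: lower = unanimous, upper = any vote."""
    per_sample = list(zip(*aligned_runs))
    lower = {i for i, labs in enumerate(per_sample) if all(x == label for x in labs)}
    upper = {i for i, labs in enumerate(per_sample) if any(x == label for x in labs)}
    return lower, upper


# Toy example: four hypothetical clustering runs over six spectra, three classes.
runs = [
    [0, 0, 1, 1, 2, 2],  # reference run
    [1, 1, 2, 2, 0, 0],  # same partition, permuted label names
    [2, 2, 0, 0, 1, 0],  # disagrees on the last spectrum
    [0, 1, 1, 1, 2, 2],  # disagrees on the second spectrum
]
aligned = [runs[0]] + [align_labels(runs[0], r, 3) for r in runs[1:]]
results = vote(aligned, mu=2)
```

With four runs, the possible reliabilities for an accepted label are 1, 0.75, and 0.5, matching the levels reported in the abstract for μ = 2.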
【Key words】 Spectral binary star; Spectral analysis; Clustering algorithm; LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope)
- 【Source】 Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2025, Issue 02
- 【Classification No.】 P153
- 【Downloads】 22