Improvement of Fast HAC Clustering Algorithm and Application to Unsupervised Speech Segmentation
DOI: 10.12677/CSA.2020.108153
Authors: Wei Zhanjiang (韦占江), Liang Yu (梁宇), School of Software, Yunnan University, Kunming, Yunnan, China
Keywords: Unsupervised; Phoneme; HAC Algorithm; Speech Segmentation; Adjacent
Abstract: Hierarchical agglomerative clustering (HAC) is a widely used clustering method. Because phonemes in speech features are closely tied to contiguous stretches of time, this paper improves the fast HAC algorithm to better perform unsupervised segmentation of speech signals into phoneme-like units. The algorithm rests on the observation that features within the same segment are more similar to one another than features in different segments. Similarity is measured by the Euclidean distance between adjacent features, which yields a doubly linked list of adjacent distances for the input speech features. Each node in the list holds the distance between two adjacent speech features together with pointers to the previous and next nodes. The algorithm traverses this list to find the minimum distance, merges the corresponding adjacent features, and repeats until a single cluster remains or a threshold is met. The entire process is unsupervised. The method outperforms the fast HAC algorithm: it clusters more than 65 times faster, uses less memory, and can be applied to zero-resource speech segmentation.
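The adjacent-merge procedure described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are plain lists of floats (for example, MFCC vectors), a merged segment is represented by the mean of its frames, and the names `segment`, `Node`, `max_dist`, and `min_segments` are hypothetical. As in the abstract, each list node stores the Euclidean distance to its neighbour plus `prev`/`next` pointers; here it also carries the frame range and running sum of the segment it stands for (an assumption). Each iteration traverses the list for the smallest distance, merges that pair, and recomputes only the two distances the merge affects.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(eq=False)
class Node:
    """One segment in the doubly linked list (hypothetical layout)."""
    start: int                  # index of the first frame in the segment
    end: int                    # index one past the last frame
    total: List[float]          # element-wise sum of the segment's frames
    dist: float = math.inf      # distance to the next segment (inf for the tail)
    prev: Optional["Node"] = field(default=None, repr=False)
    next: Optional["Node"] = field(default=None, repr=False)

    def centroid(self) -> List[float]:
        # Assumption: a segment is represented by the mean of its frames.
        n = self.end - self.start
        return [x / n for x in self.total]


def _refresh(node: Optional[Node]) -> None:
    """Recompute the Euclidean distance between `node` and its right neighbour."""
    if node is None:
        return
    if node.next is None:
        node.dist = math.inf
    else:
        node.dist = math.dist(node.centroid(), node.next.centroid())


def segment(frames: Sequence[Sequence[float]],
            max_dist: float = math.inf,
            min_segments: int = 1) -> List[Tuple[int, int]]:
    """Merge adjacent frames bottom-up and return (start, end) frame ranges.

    max_dist and min_segments are hypothetical stopping parameters.
    """
    if not frames:
        return []
    # Build the list: one node per frame, linked to its temporal neighbours.
    nodes = [Node(i, i + 1, [float(x) for x in f]) for i, f in enumerate(frames)]
    for left, right in zip(nodes, nodes[1:]):
        left.next, right.prev = right, left
    for node in nodes:
        _refresh(node)
    head, count = nodes[0], len(nodes)

    while count > min_segments:
        # Traverse the list to find the closest adjacent pair (leftmost on ties).
        best, cur = head, head.next
        while cur is not None:
            if cur.dist < best.dist:
                best = cur
            cur = cur.next
        if best.next is None or best.dist > max_dist:
            break  # threshold reached
        # Merge the right neighbour into `best` and unlink it.
        victim = best.next
        best.end = victim.end
        best.total = [a + b for a, b in zip(best.total, victim.total)]
        best.next = victim.next
        if victim.next is not None:
            victim.next.prev = best
        count -= 1
        # Only the distances on either side of the merged node change.
        _refresh(best)
        _refresh(best.prev)

    out, cur = [], head
    while cur is not None:
        out.append((cur.start, cur.end))
        cur = cur.next
    return out


frames = [[0.0], [1.0], [3.0], [10.0], [12.0], [30.0]]
print(segment(frames, min_segments=3))  # [(0, 3), (3, 5), (5, 6)]
```

Because merges are restricted to temporal neighbours, each step only recomputes two distances instead of updating a full pairwise distance matrix as in general HAC. The search for the minimum here is a plain linear traversal, mirroring the abstract's description.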
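Citation: Wei, Z.J. and Liang, Y. (2020) Improvement of Fast HAC Clustering Algorithm and Application to Unsupervised Speech Segmentation. Computer Science and Application, 10(8), 1464-1470. https://doi.org/10.12677/CSA.2020.108153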
