Text clustering is an important technique in data mining and machine learning. It is widely used in event discovery and tracking, document summarization, search result clustering, and other tasks. Although text clustering has been studied extensively, many challenging problems remain open:
(1) How to set the number of clusters? Is it possible to automatically discover the number of clusters from the data?
(2) How to deal with the sparsity of short texts?
(3) How to automatically discover abnormal documents in a dataset?
(4) How to deal with concept drift when clustering text streams?
We have proposed model-based algorithms for text clustering that address the above challenges to a certain extent. The related data and code are available at: https://github.com/jackyin12