Persiancoclust

Notation
Objective Function of Models
Co-clustering
Datasets
Model
Visualization
Cite
References

Notation

$\mathbf{X}$: Word-Document counts, Movie-Viewer ratings, Product-Customer purchases matrices
$\mathbf{R}$: Row-coefficient matrix
$\mathbf{B}$: Summarization matrix
$\mathbf{C}$: Column-coefficient matrix

Objective function of models

$NMTF_{\alpha}$

D_{\alpha}(\mathbf{X}|| \mathbf{RBC}^{\top})

$ONMTF_{\alpha}$

D_{\alpha}(\mathbf{X}|| \mathbf{RBC}^{\top})
+
\delta \; Tr(\mathbf{R}\Psi_{g}\mathbf{R}^{\top})
+
\beta \;  Tr(\mathbf{C}\Psi_{s}\mathbf{C}^{\top}),

Co-clustering

NMTF

Screenshot: 'README.md'

Example of Co-clustering on Word-Document Matrix

Screenshot: 'README.md'

Word Cloud Co-clustering for Digikala Persian Comments

Screenshot: 'README.md'

Datasets

Datasets	Documents	Words	Number of clusters
Digikala	3261	10728	3
Digimag	6896	80160	7
Persian news	1644	28216	8
Psychological advice text Persian	79	1929	11
Snappfood	3891	4303	3

For more details see this page

import pickle
                                                                   # Read Data Sets ------->  Digikala
# Loading pickle data from a file
with open('tfidf_Digikala.pkl', 'rb') as f:
        tfidf_Digikala = pickle.load(f)

# Loading pickle data from a file
with open('labels_Digikala', 'rb') as f:
        labels_Digikala = pickle.load(f)

true_labels = np.sort(labels_Digikala)

Model

from NMTFcoclust.Models.NMTFcoclust_ONMTF_alpha import ONMTF
from NMTFcoclust.Models.NMTFcoclust_NMTF_alpha import NMTF

ONMTF_alpha = ONMTF(n_row_clusters = 3, n_col_clusters = 3, delta = 0.03,  beta = 0.03,  alpha = 0.1, max_iter=1)
ONMTF_alpha.fit(tfidf_Digikala)

NMTF_alpha = NMTF(n_row_clusters = 3, n_col_clusters = 3, alpha = 2, max_iter=1)
NMTF_alpha.fit(tfidf_Digikala)

from sklearn.metrics import confusion_matrix 

confusion_matrix(np.sort(true_labels), np.sort(ONMTF_alpha.row_labels_))

from NMTFcoclust.Evaluation.EV import Process_EV

Process_Ev = Process_EV( np.sort(true_labels), tfidf_Digikala , ONMTF_alpha) 
Process_Ev = Process_EV( np.sort(true_labels), tfidf_Digikala , NMTF_alpha) 



Accuracy (Acc):0.8761116222017786
Normalized Mutual Info (NMI):0.6836524406477642
Adjusted Rand Index (ARI):0.7667679710034221
Confusion Matrix   (CM):
[[2181  201    0]
 [   0  216  203]
 [   0    0  460]]

Screenshot: 'README.md'

Word Cloud Co-clustering for Persian News

Screenshot: 'README.md'

Cite

Please cite the following paper in your publication if you are using Persiancoclust in your research:

 @article{Persiancoclust, 
    title={Orthogonal Non-negative Matrix Tri-Factorization with $\alpha$-Divergence for Persian Text Co-clustering.}, 
    DOI={Preprint}, 
    journal={Iranian Journal of Science (preprint)}, 
    authors={Saeid Hoseinipour, Mina Aminghafari, Adel Mohammadpour}, 
    year={2023}
}