
Table of Contents

  1. Notation
  2. Objective Function of Models
  3. Co-clustering
  4. Datasets
  5. Model
  6. Visualization
  7. Cite
  8. References

Notation

  • $\mathbf{X}$: Data matrix, e.g. word-document counts, movie-viewer ratings, or product-customer purchases

  • $\mathbf{R}$: Row-coefficient matrix

  • $\mathbf{B}$: Summarization matrix

  • $\mathbf{C}$: Column-coefficient matrix

Objective function of models

  • $NMTF_{\alpha}$

    $$D_{\alpha}(\mathbf{X}\,\|\,\mathbf{R}\mathbf{B}\mathbf{C}^{\top})$$
    
  • $ONMTF_{\alpha}$

    $$D_{\alpha}(\mathbf{X}\,\|\,\mathbf{R}\mathbf{B}\mathbf{C}^{\top}) + \delta\,\mathrm{Tr}(\mathbf{R}\Psi_{g}\mathbf{R}^{\top}) + \beta\,\mathrm{Tr}(\mathbf{C}\Psi_{s}\mathbf{C}^{\top})$$
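The README does not spell out $D_{\alpha}$. The sketch below assumes the α-divergence form of Cichocki et al. [6], $D_{\alpha}(\mathbf{X}\|\mathbf{Y}) = \frac{1}{\alpha(\alpha-1)}\sum_{ij}\big(x_{ij}^{\alpha}y_{ij}^{1-\alpha} - \alpha x_{ij} + (\alpha-1)y_{ij}\big)$ with $\mathbf{Y}=\mathbf{RBC}^{\top}$, and evaluates both objectives in NumPy. The shapes ($\mathbf{X}$: n×m, $\mathbf{R}$: n×g, $\mathbf{B}$: g×s, $\mathbf{C}$: m×s), the helper names `alpha_divergence` and `onmtf_alpha_objective`, and the choice $\Psi = \mathbf{1}\mathbf{1}^{\top} - \mathbf{I}$ in the usage are illustrative assumptions, not the NMTFcoclust implementation.

```python
import numpy as np


def alpha_divergence(X, Y, alpha, eps=1e-12):
    """Alpha-divergence D_alpha(X || Y) summed over all entries (assumed Cichocki et al. form)."""
    X = np.asarray(X, dtype=float) + eps  # eps keeps logs and negative powers finite
    Y = np.asarray(Y, dtype=float) + eps
    if np.isclose(alpha, 1.0):
        # Limit alpha -> 1: generalized Kullback-Leibler divergence KL(X || Y)
        return float(np.sum(X * np.log(X / Y) - X + Y))
    if np.isclose(alpha, 0.0):
        # Limit alpha -> 0: generalized KL with the arguments swapped, KL(Y || X)
        return float(np.sum(Y * np.log(Y / X) - Y + X))
    terms = X**alpha * Y**(1.0 - alpha) - alpha * X + (alpha - 1.0) * Y
    return float(np.sum(terms) / (alpha * (alpha - 1.0)))


def onmtf_alpha_objective(X, R, B, C, alpha, delta, beta, Psi_g, Psi_s):
    """D_alpha(X || R B C^T) + delta Tr(R Psi_g R^T) + beta Tr(C Psi_s C^T)."""
    Y = R @ B @ C.T  # low-rank reconstruction of X
    # Tr(R Psi R^T) = sum_{k,l} Psi_kl (R^T R)_kl, which avoids forming an n x n matrix
    penalty_rows = np.sum(Psi_g * (R.T @ R))
    penalty_cols = np.sum(Psi_s * (C.T @ C))
    return alpha_divergence(X, Y, alpha) + delta * penalty_rows + beta * penalty_cols


# Toy data: 6 rows x 5 columns, 2 row clusters and 2 column clusters
rng = np.random.default_rng(0)
X = rng.random((6, 5))
R = np.eye(2)[[0, 0, 0, 1, 1, 1]]  # 6 x 2 one-hot row memberships, so R^T R is diagonal
C = np.eye(2)[[0, 0, 1, 1, 1]]     # 5 x 2 one-hot column memberships, so C^T C is diagonal
B = rng.random((2, 2))
Psi = np.ones((2, 2)) - np.eye(2)  # assumed penalty matrix: ones off the diagonal, zeros on it

nmtf = alpha_divergence(X, R @ B @ C.T, alpha=2.0)
onmtf = onmtf_alpha_objective(X, R, B, C, alpha=2.0, delta=0.03, beta=0.03, Psi_g=Psi, Psi_s=Psi)
```

With this choice of $\Psi$, the trace terms add up the off-diagonal entries of $\mathbf{R}^{\top}\mathbf{R}$ and $\mathbf{C}^{\top}\mathbf{C}$, so they vanish for orthogonal (one-hot) factors and the two objectives coincide; setting $\delta=\beta=0$ recovers the $NMTF_{\alpha}$ objective.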
    

Co-clustering

[Figure: NMTF]

[Figure: Example of co-clustering on a word-document matrix]

[Figure: Word cloud co-clustering for Digikala Persian comments]
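Hard co-cluster assignments can be read from the fitted coefficient matrices by giving each row (document) the row cluster with the largest entry in $\mathbf{R}$ and each column (word) the column cluster with the largest entry in $\mathbf{C}$. The sketch below assumes this argmax rule; it has not been checked against how NMTFcoclust fills `row_labels_`. The helpers `coclusters_from_factors` and `block_means` are hypothetical, and the block means are only one simple way to see the kind of block-level summary that $\mathbf{B}$ captures.

```python
import numpy as np


def coclusters_from_factors(R, C):
    """Assign each row/column to the cluster with its largest coefficient (argmax rule)."""
    row_labels = np.argmax(R, axis=1)  # one label per row of X
    col_labels = np.argmax(C, axis=1)  # one label per column of X
    return row_labels, col_labels


def block_means(X, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Average of X inside each (row cluster, column cluster) block; NaN for empty blocks."""
    means = np.full((n_row_clusters, n_col_clusters), np.nan)
    for k in range(n_row_clusters):
        for l in range(n_col_clusters):
            block = X[np.ix_(row_labels == k, col_labels == l)]
            if block.size:
                means[k, l] = block.mean()
    return means


# Toy example: 4 documents x 4 words with two obvious co-clusters
X = np.array([[5, 4, 0, 0],
              [4, 5, 0, 1],
              [0, 0, 3, 4],
              [1, 0, 4, 3]], dtype=float)
R = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]])
C = np.array([[0.9, 0.0], [0.7, 0.1], [0.1, 0.8], [0.0, 0.6]])

rows, cols = coclusters_from_factors(R, C)
print(rows, cols)  # [0 0 1 1] [0 0 1 1]
print(block_means(X, rows, cols, 2, 2))  # dense diagonal blocks, sparse off-diagonal blocks
```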

Datasets

| Dataset | Documents | Words | Number of clusters |
|---|---|---|---|
| Digikala | 3261 | 10728 | 3 |
| Digimag | 6896 | 80160 | 7 |
| Persian news | 1644 | 28216 | 8 |
| Psychological advice text Persian | 79 | 1929 | 11 |
| Snappfood | 3891 | 4303 | 3 |

For more details see this page

import pickle
import numpy as np

# Read data set: Digikala
# Load the TF-IDF document-word matrix from a pickle file
with open('tfidf_Digikala.pkl', 'rb') as f:
    tfidf_Digikala = pickle.load(f)

# Load the true document labels from a pickle file
with open('labels_Digikala', 'rb') as f:
    labels_Digikala = pickle.load(f)

# Sorted true class labels of the documents
true_labels = np.sort(labels_Digikala)

Model

from NMTFcoclust.Models.NMTFcoclust_ONMTF_alpha import ONMTF
from NMTFcoclust.Models.NMTFcoclust_NMTF_alpha import NMTF

# ONMTF_alpha: alpha-divergence NMTF with orthogonality penalties weighted by delta (rows) and beta (columns)
ONMTF_alpha = ONMTF(n_row_clusters=3, n_col_clusters=3, delta=0.03, beta=0.03, alpha=0.1, max_iter=1)
ONMTF_alpha.fit(tfidf_Digikala)

# NMTF_alpha: alpha-divergence NMTF without orthogonality penalties
NMTF_alpha = NMTF(n_row_clusters=3, n_col_clusters=3, alpha=2, max_iter=1)
NMTF_alpha.fit(tfidf_Digikala)

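from sklearn.metrics import confusion_matrix
from NMTFcoclust.Evaluation.EV import Process_EV

# Compare true and predicted document (row) labels. row_labels_ follows the
# row order of tfidf_Digikala, so it is not sorted: sorting it separately would
# break the link between each document and its predicted cluster.
confusion_matrix(true_labels, ONMTF_alpha.row_labels_)

# Report Acc, NMI, ARI and the confusion matrix for each fitted model
Process_Ev = Process_EV(true_labels, tfidf_Digikala, ONMTF_alpha)
Process_Ev = Process_EV(true_labels, tfidf_Digikala, NMTF_alpha)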



Accuracy (Acc):0.8761116222017786
Normalized Mutual Info (NMI):0.6836524406477642
Adjusted Rand Index (ARI):0.7667679710034221
Confusion Matrix   (CM):
[[2181  201    0]
 [   0  216  203]
 [   0    0  460]]
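The internals of `Process_EV` are not shown here. Clustering accuracy is commonly defined as the fraction of documents that land on the diagonal after the best one-to-one matching of predicted clusters to true classes. A minimal sketch assuming that definition (the function name `clustering_accuracy` is hypothetical) recovers the reported accuracy from the confusion matrix above, since 2857 of the 3261 Digikala documents fall on its diagonal.

```python
from itertools import permutations

import numpy as np


def clustering_accuracy(cm):
    """Best-matching accuracy from a square confusion matrix (rows: true, columns: predicted).

    Tries every one-to-one mapping of predicted clusters to classes, which is
    cheap for a handful of clusters (k! mappings).
    """
    cm = np.asarray(cm)
    k = cm.shape[0]
    best = max(sum(cm[i, p[i]] for i in range(k)) for p in permutations(range(k)))
    return best / cm.sum()


# Confusion matrix reported above for the Digikala data
cm = np.array([[2181, 201, 0],
               [0, 216, 203],
               [0, 0, 460]])
print(clustering_accuracy(cm))  # 2857 / 3261, about 0.8761
```

For many clusters, the brute-force search can be replaced by `scipy.optimize.linear_sum_assignment(cm, maximize=True)`, which solves the same matching problem in polynomial time.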


[Figure: Word cloud co-clustering for Persian news]

Cite

Please cite the following paper in your publication if you are using Persiancoclust in your research:

@article{Persiancoclust,
    title={Orthogonal Non-negative Matrix Tri-Factorization with $\alpha$-Divergence for Persian Text Co-clustering},
    journal={Iranian Journal of Science (preprint)},
    author={Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour},
    year={2023}
}

References

[1] Farahani et al., ParsBERT: Transformer-based model for Persian language understanding, Neural Processing Letters (2021).

[2] Yoo et al., Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on Stiefel manifolds, Information Processing and Management (2010).

[3] Ding et al., Orthogonal nonnegative matrix tri-factorizations for clustering, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006).

[4] Long et al., Co-clustering by block value decomposition, Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (2005).

[5] Li et al., Nonnegative matrix factorization on orthogonal subspace, Pattern Recognition Letters (2010).

[6] Cichocki et al., Non-negative matrix factorization with $\alpha$-divergence, Pattern Recognition Letters (2008).

[7] Hoseinipour et al., Orthogonal parametric non-negative matrix tri-factorization with $\alpha$-divergence for co-clustering, Expert Systems with Applications (2023).

[8] Hoseinipour et al., Sparse exponential family latent block model for co-clustering, Computational Statistics and Data Analysis (2023, preprint).
