Table of Contents
Notation
$\mathbf{X}$: Word-Document counts, Movie-Viewer ratings, Product-Customer purchases matrices
$\mathbf{R}$: Row-coefficient matrix
$\mathbf{B}$: Summarization matrix
$\mathbf{C}$: Column-coefficient matrix
Objective function of models
$NMTF_{\alpha}$
D_{\alpha}(\mathbf{X}|| \mathbf{RBC}^{\top})$ONMTF_{\alpha}$
D_{\alpha}(\mathbf{X}|| \mathbf{RBC}^{\top}) + \delta \; Tr(\mathbf{R}\Psi_{g}\mathbf{R}^{\top}) + \beta \; Tr(\mathbf{C}\Psi_{s}\mathbf{C}^{\top}),
Co-clustering
NMTF

Example of Co-clustering on Word-Document Matrix

Word Cloud Co-clustering for Digikala Persian Comments

Datasets
| Datasets | Documents | Words | Number of clusters |
|---|---|---|---|
| Digikala | 3261 | 10728 | 3 |
| Digimag | 6896 | 80160 | 7 |
| Persian news | 1644 | 28216 | 8 |
| Psychological advice text Persian | 79 | 1929 | 11 |
| Snappfood | 3891 | 4303 | 3 |
For more details see this page
import pickle
# Read Data Sets -------> Digikala
# Loading pickle data from a file
with open('tfidf_Digikala.pkl', 'rb') as f:
tfidf_Digikala = pickle.load(f)
# Loading pickle data from a file
with open('labels_Digikala', 'rb') as f:
labels_Digikala = pickle.load(f)
true_labels = np.sort(labels_Digikala)
Model
from NMTFcoclust.Models.NMTFcoclust_ONMTF_alpha import ONMTF
from NMTFcoclust.Models.NMTFcoclust_NMTF_alpha import NMTF
ONMTF_alpha = ONMTF(n_row_clusters = 3, n_col_clusters = 3, delta = 0.03, beta = 0.03, alpha = 0.1, max_iter=1)
ONMTF_alpha.fit(tfidf_Digikala)
NMTF_alpha = NMTF(n_row_clusters = 3, n_col_clusters = 3, alpha = 2, max_iter=1)
NMTF_alpha.fit(tfidf_Digikala)
from sklearn.metrics import confusion_matrix
confusion_matrix(np.sort(true_labels), np.sort(ONMTF_alpha.row_labels_))
from NMTFcoclust.Evaluation.EV import Process_EV
Process_Ev = Process_EV( np.sort(true_labels), tfidf_Digikala , ONMTF_alpha)
Process_Ev = Process_EV( np.sort(true_labels), tfidf_Digikala , NMTF_alpha)
Accuracy (Acc):0.8761116222017786
Normalized Mutual Info (NMI):0.6836524406477642
Adjusted Rand Index (ARI):0.7667679710034221
Confusion Matrix (CM):
[[2181 201 0]
[ 0 216 203]
[ 0 0 460]]

Word Cloud Co-clustering for Persian News

Cite
Please cite the following paper in your publication if you are using Persiancoclust in your research:
@article{Persiancoclust,
title={Orthogonal Non-negative Matrix Tri-Factorization with $\alpha$-Divergence for Persian Text Co-clustering.},
DOI={Preprint},
journal={Iranian Journal of Science (preprint)},
authors={Saeid Hoseinipour, Mina Aminghafari, Adel Mohammadpour},
year={2023}
}
References
Built With
- co-clustering
- nmtf
- persian-text-mining
- python
- wordclouds
Log in or sign up for Devpost to join the conversation.