3D Garbage Image Classification

Team Garbage Pro: Hengguang Cui (hcui15), Ruichen Zhu (rzhu30), Runchang Zhou (rzhou32), Qiwen Li (qli97)

Check-in 1: 2470 Project Outline

11/12/2022

Introduction: What problem are you trying to solve and why? (Hengguang Cui)

We want to develop a model that classifies 3D images of garbage. In this project, we use the 3D-MNIST dataset as a stand-in for 3D garbage images. In our literature review, we saw two mainstream approaches to applying deep learning to 3D point clouds: one converts 3D objects to 2D, and the other processes the 3D data directly. We find that existing methods for extracting features from 3D point cloud data have some limitations, so we would like to develop an image-based method that projects 3D objects onto spheres using a new kind of spherical projection based on a Gaussian Mixture Model (GMM), and then feeds the result to Spherical CNNs. The main goal of this project is to project 3D images onto spheres with a GMM and to classify them with a Spherical CNN-based model.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project? (Runchang Zhou)

[1] Saifullahi Aminu Bello, Shangshu Yu, and Cheng Wang. 2020. Review: deep learning on 3D point clouds. arXiv [cs.CV]. Retrieved from http://arxiv.org/abs/2001.06280. This paper is a guide for beginners like us in the field of deep learning on 3D point clouds. It explains why applying deep learning to 3D point clouds is difficult and surveys some recent approaches to the challenge, such as converting the point cloud into a structured grid or applying deep learning to point clouds directly, as in PointNet and DGCNN. The authors also discuss several 3D datasets and how these approaches perform on them.

[2] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. 2019. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 38, 5, Article 146 (October 2019), 12 pages. https://doi.org/10.1145/3326362. In this paper, Wang et al. focus on the Dynamic Graph CNN (DGCNN), which updates its edges dynamically after each EdgeConv layer based on the features of the previous layer. The method is inspired by PointNet and CNNs and adds dynamic graph updating with KNN. The authors also find that geometric features influence 3D recognition. In their tests, DGCNN performs well on various tasks such as classification, part segmentation, and semantic segmentation.

[3] Chems-Eddine Himeur, Thibault Lejemble, Thomas Pellegrini, Mathias Paulin, Loic Barthe, and Nicolas Mellado. 2021. PCEDNet: A Lightweight Neural Network for Fast and Interactive Edge Detection in 3D Point Clouds. ACM Trans. Graph. 41, 1, Article 10 (February 2022), 21 pages. https://doi.org/10.1145/3481804. Himeur et al. introduce the Point Cloud Edge Detection Network (PCEDNet), which takes 16 six-dimensional vectors grouped by order and passes them through fully connected layers with a sigmoid activation function. With this approach, processing is much faster than with a standard CNN. However, we all believe its data processing is too complicated for our project, and we would like to do more work on the deep learning side.

[4] Zhangjie Cao, Qixing Huang, and Karthik Ramani. 2017. 3D Object Classification via Spherical Projections. arXiv. http://arxiv.org/abs/1712.04426. Compared with applying deep learning to point clouds directly, which is limited by resolution, converting to 2D makes it feasible to process large-scale datasets. In that case, however, the choice of projection viewpoints is vital. The main idea of the paper is to convert a 3D object into a form that a 2D CNN can use by projecting it onto a centered spherical domain. There are two kinds of spherical projection: depth-based projection, which is generated by shooting a ray from each point on the sphere to the center, and image-based projection, which captures a 3x12 grid of images of the input object, 36 viewpoints in total. This method tries to solve the resolution and viewpoint issues of traditional approaches.

[5] Vanessa S.E. Bierling, Paul D. McNicholas. 2018. A latent-class mixture model for incomplete longitudinal Gaussian Data. https://doi.org/10.1002/9780470510445.ch24. Bierling explains a method for clustering longitudinal data using a latent Gaussian mixture model, which allows the variability in measurements taken at a large number of time points, p, to be explained using a smaller number of time points, q. This theory could be applied to Depth-based Spherical Projections.

[6] Taco S. Cohen, Mario Geiger, Jonas Koehler, and Max Welling. 2018. Spherical CNNs. arXiv. http://arxiv.org/abs/1801.10130. This paper explains the theory of Spherical CNNs, whose higher layers operate on signals defined on SO(3), the three-dimensional manifold of 3D rotations. The authors apply techniques from non-commutative harmonic analysis to develop a generalized FFT-based correlation algorithm that addresses their two main challenges: 1. It is hard to define the rotation of a spherical filter by one pixel. 2. The computational cost is very high, O(n^6). They use a generalized Fourier transform (GFT) and a corresponding fast algorithm (GFFT) as the analogous transforms for the sphere and the rotation group. Instead of a standard CNN, we plan to use Spherical CNNs for our project.

Data: What data are you using (if any)? (Hengguang Cui)

We use 3D MNIST as our dataset. It includes 5,000 training and 1,000 test 3D point clouds stored in HDF5 format. We are still discussing how to preprocess the data: either turn the 3D data into 2D or spherical form, or process it in 3D directly. We have decided to apply a CNN to the image dataset; depending on the method we finally choose, it will be a traditional CNN or a Spherical CNN. The methodology below is based on multiview image-based approaches with a new kind of spherical projection: the Gaussian Mixture Model (GMM).
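As a rough illustration of one preprocessing option, the sketch below centers a point cloud at its centroid, rescales it so the farthest point lies on the unit sphere, and expresses each point in spherical coordinates. It is only a sketch under assumptions: the helper names `normalize_cloud` and `to_spherical` are hypothetical, the cloud is assumed to be a plain list of (x, y, z) tuples already read from the HDF5 file, and the sample values are made up.

```python
import math

def normalize_cloud(points):
    """Center a point cloud at its centroid and scale it into the unit ball."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    max_r = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if max_r == 0.0:  # degenerate cloud: all points coincide
        return centered
    return [(x / max_r, y / max_r, z / max_r) for x, y, z in centered]

def to_spherical(point):
    """Return (r, theta, phi): radius, polar angle in [0, pi], azimuth from atan2."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    # Clamp before acos so rounding error cannot push the ratio outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi

# Made-up toy cloud (assumption: not taken from 3D MNIST).
cloud = [(1.0, 2.0, 3.0), (3.0, 2.0, 3.0), (2.0, 2.0, 5.0), (2.0, 2.0, 1.0)]
unit_cloud = normalize_cloud(cloud)
spherical = [to_spherical(p) for p in unit_cloud]
```

Centering removes the dependence on where the object sits in space, which is one simple way to approach translation invariance before any projection is applied.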

Methodology: What is the architecture of your model? (Qiwen Li)

How: Based on our literature review, our initial idea is to follow Cao’s spherical projections for the input and then feed the result to Spherical CNNs. We are considering a Gaussian Mixture Model (GMM) as our spherical projection to extract the shapes and features of the 3D objects and turn them into inputs for the Spherical CNN, which we then train for the classification task. We train the model on 3D MNIST.

Our novelty lies mostly in using new GMM-based methods to better represent the 3D features of the target object, and in converting them into suitable input for training the CNN. We also try to make the model translation- and rotation-invariant.

Our initial idea for improving the GMM-based 3D representation is that the projection should not only capture the depth variance but also reflect the importance of each data point, which we can define by assigning the point a weight. The weight can depend on, for example, the data points neighboring it. How do we define the neighborhood? Its radius can be left as a hyperparameter for us to experiment with.

Backup idea: one of our teammates proposed using a Dynamic Graph CNN, with dynamic updating based on the previous layer, rather than a traditional CNN.
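The weighting idea can be sketched as follows. This is one possible reading of it, not the team's implementation: each point gets a weight from the number of neighbors within a radius, and every cell of a theta-phi grid on the sphere accumulates weighted Gaussian bumps measured in angular distance. The names `neighbor_weights` and `project_to_sphere`, the grid size, the weight formula, and the sample values are all assumptions made for illustration.

```python
import math

def neighbor_weights(points, radius):
    """Weight each point by the fraction of the cloud (itself included) within `radius`."""
    n = len(points)
    weights = []
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points)
                    if j != i and math.dist(p, q) <= radius)
        weights.append((1 + count) / n)  # +1 keeps isolated points at a nonzero weight
    return weights

def unit(v):
    """Scale a nonzero vector to unit length."""
    r = math.sqrt(sum(c * c for c in v))
    return tuple(c / r for c in v)

def project_to_sphere(points, weights, sigma, n_theta=8, n_phi=16):
    """Sum weighted Gaussian bumps (in angular distance) on a theta-phi grid."""
    # Points at the origin have no direction, so they are skipped.
    pairs = [(unit(p), w) for p, w in zip(points, weights) if any(p)]
    grid = []
    for i in range(n_theta):
        theta = math.pi * (i + 0.5) / n_theta      # polar angle of the cell center
        row = []
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi        # azimuth of the cell
            g = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            value = 0.0
            for d, w in pairs:
                cos_a = max(-1.0, min(1.0, sum(a * b for a, b in zip(g, d))))
                angle = math.acos(cos_a)
                value += w * math.exp(-angle * angle / (2.0 * sigma * sigma))
            row.append(value)
        grid.append(row)
    return grid

# Made-up toy cloud: two nearby points close to the pole, one isolated on the equator.
points = [(0.0, 0.0, 1.0), (0.1, 0.0, 0.9), (1.0, 0.0, 0.0)]
weights = neighbor_weights(points, radius=0.3)
grid = project_to_sphere(points, weights, sigma=0.3)
```

In this toy case the cells near the pole, where two mutually neighboring points sit, receive larger values than the cell near the isolated equatorial point. Both `radius` and `sigma` are the kind of hyperparameters the text proposes to tune.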

Metrics: What constitutes “success?” (Qiwen Li)

  • What experiments do you plan to run? Experiments: For the GMM part, we plan to a) tune the (µ, σ) parameters and use spherical as the covariance type for the GMM, b) use multiple concentric spheres, and c) to be decided. Base: Apart from our proposed GMM + CNN model, we plan to experiment with some well-known traditional machine learning classifiers: logistic regression, support vector machines (SVM), and K Nearest Neighbours (KNN). We will turn the task into a “One vs. Rest” problem and modify our dataset accordingly. The performance of these models will be our baseline. Our base goal is for our proposed model to outperform them.

  • Metrics: Accuracy rate.

  • Target: Our model’s accuracy should be substantially better than that of the traditional classification models (Hail Deep Learning) and should reach 75%.

  • Stretch goal: Even higher accuracy, and further improvements to the 3D representation.
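To make the baseline comparison and the accuracy metric concrete, here is a minimal sketch of a KNN baseline scored against the 75% target. It assumes each sample has already been flattened into a feature vector. The helper names and the toy data are made up for illustration and are not 3D MNIST values.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy stand-ins for flattened features (assumption: made-up values).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.05, 0.1), (1.0, 0.95), (0.4, 0.4)]
test_y = [0, 1, 1]

preds = [knn_predict(train_x, train_y, q) for q in test_x]
acc = accuracy(test_y, preds)
meets_target = acc >= 0.75  # 75% is the target accuracy stated in the list above
```

On this toy data the third sample is misclassified, so the accuracy is 2/3 and the target check fails, which is the kind of comparison the baseline models are meant to support.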

Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.) (Ruichen Zhu)

  • What broader societal issues are relevant to your chosen problem space? Currently, about 1.3 billion tons of solid waste are generated around the world. This number will keep increasing, and the cost of managing this waste has already reached hundreds of billions of US dollars. In addition, many waste treatments are ineffective and may even harm people’s health and the environment, including the ocean and the air. As a result, improving the effectiveness of waste management and reducing its harm to the world is urgent and needs everyone’s attention.

  • Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? Our community and the environment as a whole are the biggest stakeholders in this project. Currently, many mistakes are made in garbage classification, both by people sorting manually and by AI classifiers. Countless times, a wrong classification treats recyclable waste as non-recyclable, and at that scale the errors do real harm to the earth. Thus, a better algorithm is needed to improve the accuracy of waste classification.

Check-in 2

11/30/2022

Introduction: This can be copied from the proposal. We want to develop a model that classifies 3D images of garbage. In this project, we use the 3D-MNIST dataset as a stand-in for 3D garbage images. In our literature review, we saw two mainstream approaches to applying deep learning to 3D point clouds: one converts 3D objects to 2D, and the other processes the 3D data directly. We found that existing methods for extracting features from 3D point cloud data have some limitations, so we chose an image-based method that converts 3D objects to spheres with spherical projections and feeds them to Spherical CNNs. The main goal of this project is to project 3D images onto spheres with a Gaussian Mixture Model (GMM) and to classify them with a Spherical CNN-based model.

Challenges: What has been the hardest part of the project you’ve encountered so far?

None of us is familiar with S2CNN, which is not a widely used package. It took us a long time to understand it and to fix the module error caused by the package itself. The main challenge for the whole project, and the hardest part, is converting the 3D data to spheres with our Gaussian Mixture Model. Currently, we are working on visualizing the 3D plots for the final poster and report, and on reaching a higher accuracy through a better choice of layers for the model.

Insights: Are there any concrete results you can show at this point? How is your model performing compared with expectations? We have experimented with a 3D-based deep learning method, a 3D CNN on the 3D MNIST dataset, and achieved an accuracy of 66.95% on the test set. We have set up our spherical Gaussian Mixture Model, which converts data points in 3D onto the sphere surface, and the spherical CNN, but have not yet tested them. We then experimented with Random Forest and achieved an accuracy of 64.47%. We have also experimented with logistic regression and SVM on our task. We adopted the idea of One vs. Rest (OvR) to convert the multiclass classification into binary classifications. With 60000 images, the highest accuracy we could obtain is 35.56% for logistic regression and 40.33% for SVMs. (How: we introduced an algorithm that runs ten iterations, one per class, and in each iteration we update the label of each observation to 1 (yes) or 0 (no) depending on the digit under observation.)
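The One vs. Rest relabeling described above can be sketched as follows. The relabeling step mirrors the text: one iteration per class, with labels rewritten to 1 or 0. The binary model is a deliberately simple centroid-based stand-in (an assumption, swapped in for logistic regression or an SVM so the example stays self-contained), and all names and toy data are hypothetical.

```python
import math

def relabel(labels, positive_class):
    """One-vs-Rest relabeling: 1 for the class under observation, 0 for all others."""
    return [1 if y == positive_class else 0 for y in labels]

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def fit_binary(xs, binary_labels):
    """Stand-in binary model: remember the centroid of each side."""
    pos = centroid([x for x, b in zip(xs, binary_labels) if b == 1])
    neg = centroid([x for x, b in zip(xs, binary_labels) if b == 0])
    return pos, neg

def score(model, x):
    """Higher means 'more like the positive class'."""
    pos, neg = model
    return math.dist(x, neg) - math.dist(x, pos)

def fit_ovr(xs, labels, classes):
    """Train one binary model per class (ten iterations for the digits 0-9)."""
    return {c: fit_binary(xs, relabel(labels, c)) for c in classes}

def predict_ovr(models, x):
    """Pick the class whose binary model is most confident."""
    return max(models, key=lambda c: score(models[c], x))

# Made-up three-class toy data (assumption: not 3D MNIST features).
xs = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.0, 5.0), (0.2, 5.0)]
labels = [0, 0, 1, 1, 2, 2]
models = fit_ovr(xs, labels, classes=[0, 1, 2])
predictions = [predict_ovr(models, q) for q in [(0.1, 0.1), (5.1, 4.9), (0.1, 4.8)]]
```

Choosing the class with the highest binary score is what turns the separate yes/no models back into a single multiclass prediction.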

Plan: Are you on track with your project? What do you need to dedicate more time to? What are you thinking of changing, if anything?

Yes, we are on track with our project. We need to dedicate more time to tuning the Gaussian mean and standard deviation parameters and to the choice of layers. We also need to handle multiple spheres that share the same center but have different radii. All of this is meant to make our model more accurate. We are also working on how to reflect the importance of each data point, in addition to capturing the depth variance, when calculating projections in GMMs. Most things work as expected, and so far nothing in our plan needs to change.
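One way to picture the spheres with a shared center and different radii is to assign each point to the smallest sphere that contains it, so that each shell can later be projected on its own. The sketch below is only an assumption about how this could be organized: `shell_index`, `split_into_shells`, the radii, and the sample points are hypothetical.

```python
import math

def shell_index(point, radii):
    """Index of the smallest sphere (radii sorted ascending) that contains `point`."""
    r = math.sqrt(sum(c * c for c in point))
    for k, radius in enumerate(radii):
        if r <= radius:
            return k
    return len(radii) - 1  # clamp points beyond the outermost sphere

def split_into_shells(points, radii):
    """Group points by the concentric shell they fall in; one list per sphere."""
    shells = [[] for _ in radii]
    for p in points:
        shells[shell_index(p, radii)].append(p)
    return shells

# Made-up radii and points, assuming the cloud is already centered and scaled.
radii = [0.25, 0.5, 1.0]
points = [(0.1, 0.0, 0.0), (0.0, 0.4, 0.0), (0.0, 0.0, 0.9), (0.3, 0.3, 0.3)]
shells = split_into_shells(points, radii)
```

Each shell could then be projected separately and stacked as one input channel per sphere, although how the channels are combined is left open here.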

Other Links

Final Writeup: https://docs.google.com/document/d/1TqnpH3W6odgpgI6ly7SqVpP0B7D2OMN_jkCRvDgG41k/edit?usp=sharing

Outline: https://docs.google.com/document/d/1U1r9iyMy8qM_su_FrLOsW8GluhSEvCxOr3qq8HHowWs/edit?usp=sharing

Updates

2470 Project Reflection

3D Garbage Image Classification

Team Garbage Pro: Hengguang Cui (hcui15), Ruichen Zhu (rzhu30), Runchang Zhou (rzhou32), Qiwen Li (qli97)

11/30/2022

Introduction: This can be copied from the proposal. We want to develop a model that classifies 3D images of garbage. In this project, we use the 3D-MNIST dataset as a stand-in for 3D garbage images. In our literature review, we saw two mainstream approaches to applying deep learning to 3D point clouds: one converts 3D objects to 2D, and the other processes the 3D data directly. We found that existing methods for extracting features from 3D point cloud data have some limitations, so we chose an image-based method that converts 3D objects to spheres with spherical projections and feeds them to Spherical CNNs. The main goal of this project is to project 3D images onto spheres with a Gaussian Mixture Model (GMM) and to classify them with a Spherical CNN-based model.

Challenges: What has been the hardest part of the project you’ve encountered so far?

None of us is familiar with S2CNN, and it is not a well-known package. It took us a long time to understand it and to fix the module error caused by the package itself. The main challenge for the whole project, and the hardest part, is converting the 3D data to spheres with our Gaussian Mixture Model. Currently, we are working on visualizing the 3D plots for the final poster and report, and on reaching a higher accuracy through a better choice of layers for the model.

Insights: Are there any concrete results you can show at this point? How is your model performing compared with expectations? We have experimented with a 3D-based deep learning method, a 3D CNN on the 3D MNIST dataset, and achieved an accuracy of 65.5%. We have set up our spherical Gaussian Mixture Model, which converts data points in 3D onto the sphere surface, and the spherical CNN, but have not yet tested them. We then experimented with Random Forest and achieved an accuracy of 64.47%. We have also experimented with logistic regression and SVM on our task. We adopted the idea of One vs. Rest (OvR) to convert the multiclass classification into binary classifications. With 60000 images, the highest accuracy we could obtain is 35.56% for logistic regression and 40.33% for SVMs. (How: we introduced an algorithm that runs ten iterations, one per class, and in each iteration we update the label of each observation to 1 (yes) or 0 (no) depending on the digit under observation.)

Plan: Are you on track with your project? What do you need to dedicate more time to? What are you thinking of changing, if anything?

Yes, we are on track with our project. We need to dedicate more time to tuning the Gaussian mean and standard deviation parameters as well as to the choice of layers. We also need to handle multiple spheres that share the same center but have different radii. All of this is meant to make our model more accurate. We are also working on a way to make the projection calculation in GMMs reflect the importance of each data point in addition to capturing the depth variance. Most things work as expected, and so far nothing in our plan needs to change.
