Overview of Project




Project Summary

Our data set is hosted on the University of California, Irvine (UCI) Machine Learning Repository and lists various physical characteristics of the abalone samples. These characteristics include sex, shell dimensions, and weight, but the most important trait for this project was the number of rings within the shell, which indicates the abalone's age. The larger the number of rings, the more mature and valuable the abalone is. After extracting, transforming, and loading the data, we began testing models on it. We aimed for a high coefficient of determination (R2) score, but found that the data is not sufficient to let us predict the ring count with confidence.
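For readers unfamiliar with the metric, the coefficient of determination can be sketched in a few lines of plain Python. This is only an illustration of the standard definition (1 minus the ratio of residual to total sum of squares), not the code from our notebook; the function name r_squared and the ring counts below are made up for the example.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical ring counts, invented purely for illustration.
rings_true = [9, 7, 10, 8, 11]

# A model that predicts every value exactly scores 1.0.
perfect = r_squared(rings_true, rings_true)

# A model that always predicts the mean (9.0 here) scores 0.0.
baseline = r_squared(rings_true, [9.0] * len(rings_true))
```

A score near 1 means the model accounts for most of the variation in ring count, while a score near 0 means it does no better than guessing the average.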

Data Set

UC Irvine Machine Learning Abalone Data Set

Tools

Google Colaboratory was our environment of choice for running our models: it makes collaboration easy and lets our code run regardless of a user's computer specifications. Within the notebook, we used pandas, Matplotlib, scikit-learn, TensorFlow, and KerasTuner.

Models

The first model we tested was a Multiple Linear Regression. It is one of the simpler machine learning models to employ and one of the easiest to interpret. We wanted to see whether the ring count had a linear relationship with the other measurements of the abalone. After running and refining the regressions, we obtained an R2 score of 0.5459, meaning the model explained roughly 55% of the variance in ring count. This was a moderate result, but not what we were hoping for. In an attempt to obtain more favorable predictions, we tried several other algorithms, including k-Nearest Neighbors, Support Vector Regression, Random Forest Regression, and many different Neural Network models.
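The comparison described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not our notebook code: the features are synthetic stand-ins generated with NumPy rather than the real abalone measurements, and the hyperparameters (n_neighbors=5, n_estimators=100, the default SVR kernel, the 75/25 split) are assumptions chosen only to make the example run. The scores it prints say nothing about the real data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the abalone data): three numeric features
# and a noisy target that depends linearly on the first two of them.
rng = np.random.default_rng(0)
n_samples = 500
X = rng.uniform(0.0, 1.0, size=(n_samples, 3))
y = 3.0 + 10.0 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(0.0, 2.0, n_samples)

# Hold out a test set so that R2 is measured on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative assumptions, not tuned values.
models = {
    "Multiple Linear Regression": LinearRegression(),
    "k-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
    "Support Vector Regression": SVR(),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=100, random_state=0
    ),
}

# Fit each model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R2 = {score:.4f}")
```

Scoring every model on the same held-out split keeps the R2 values comparable across algorithms.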