Project Summary
Our data set is hosted on the University of California, Irvine Machine Learning Repository
and lists various physical characteristics of abalone samples. Some of these characteristics
include sex, length, diameter, height, and weight, but the most important trait for this
project was the number of rings within the shell, which indicates the abalone's age. The
larger the number of rings, the more mature and valuable the abalone is. After extracting,
transforming, and loading the data, we began testing models on it. We attempted to obtain a
high coefficient of determination (R²) score for our model, yet we found that the available
features were not sufficient for us to predict the ring count confidently.
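For reference, the coefficient of determination compares a model's squared prediction errors with the spread of the ring counts around their mean. Here y_i is the observed ring count of sample i, ŷ_i is the model's prediction for it, and ȳ is the mean ring count over the n samples:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```

A score of 1 means every prediction is exact, a score of 0 means the model does no better than always predicting the mean ring count, and on held-out data the score can even be negative.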
Data Set
UC Irvine Machine Learning Abalone Data Set
Tools
Google Colaboratory was our environment of choice for running our models: it made collaboration
easy, and it let anyone run our code regardless of their own computer's specifications. Within
the notebook, we used Pandas, Matplotlib, scikit-learn, TensorFlow, and Keras Tuner.
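The extract-transform-load step can be sketched with Pandas. This is a minimal sketch, assuming the raw comma-separated abalone.data file, which has no header row. The snake_case column names, the helper functions load_abalone and split_features_target, and the three sample rows are hypothetical and only illustrate the format; they are not the project's actual code or data.

```python
import io

import pandas as pd

# Hypothetical snake_case names for the attributes listed in the UCI
# "abalone.names" file; the raw "abalone.data" file has no header row.
COLUMNS = [
    "sex", "length", "diameter", "height",
    "whole_weight", "shucked_weight", "viscera_weight", "shell_weight",
    "rings",
]

# Illustrative rows in the same comma-separated layout (assumed, for demo only).
SAMPLE = """M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
"""


def load_abalone(source) -> pd.DataFrame:
    """Read the raw file, name the columns, and one-hot encode Sex."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # Sex is categorical (M = male, F = female, I = infant), so expand it
    # into 0/1 indicator columns that regression models can consume.
    return pd.get_dummies(df, columns=["sex"], prefix="sex", dtype=int)


def split_features_target(df: pd.DataFrame):
    """Separate the predictors from the ring count we want to predict."""
    X = df.drop(columns="rings")
    y = df["rings"]
    return X, y


if __name__ == "__main__":
    data = load_abalone(io.StringIO(SAMPLE))
    X, y = split_features_target(data)
    print(X.columns.tolist())
    print(y.tolist())
```

In a notebook, io.StringIO(SAMPLE) would be replaced by the path or URL of the downloaded file.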
Models
The first model we tested was multiple linear regression. This is one of
the simpler machine learning models to employ, and one of the easiest to interpret. We wanted
to see whether there was a linear relationship between the physical measurements and the ring
count of the abalone shells. After running and refining the regressions, we obtained an R² score
of 0.5459, which was moderate but not what we were hoping for. In an attempt to obtain more
favorable predictions, we tried several other algorithms, including k-Nearest
Neighbors, Support Vector Regression, and Random Forest Regression, as well as many different
neural network models.
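One way to line up the regressors named above is to fit each one on the same train/test split and compare their R² scores on the held-out data. The scikit-learn sketch below does this; the hypothetical helper compare_models, the 75/25 split, the hyperparameters, the feature scaling for k-Nearest Neighbors and SVR, and the synthetic make_regression data standing in for the abalone features are all assumptions, not the project's actual settings. The neural network models are left out.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_models(X, y, random_state=0):
    """Fit each regressor on one shared split and return its test-set R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    models = {
        "linear_regression": LinearRegression(),
        # kNN and SVR (RBF kernel by default) are sensitive to feature scale,
        # so standardize inside a pipeline fit on the training split only.
        "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
        "svr": make_pipeline(StandardScaler(), SVR()),
        "random_forest": RandomForestRegressor(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data; swap in the abalone features and ring counts.
    X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
    for name, score in compare_models(X, y).items():
        print(f"{name}: R^2 = {score:.4f}")
```

Wrapping the scaler in a pipeline keeps test-set statistics out of the training step, so every model is scored on data it has not influenced.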