This article contains supplementary materials for the final project of the Advanced Machine Learning course ECE5424: Lifestyle Prediction via Biomarkers. It provides vector-based figures and essential links for a better reading experience. Because the Python code and Jupyter notebooks were not a mandatory part of the submission, only part of the results are shown here.
To balance inline content against links, only the essential SVG figures are shown inline in the article; all other content can be accessed via hyperlinks.
The server hosting this site is located in China, so pages may load with some delay. The content distribution server is also hosted in China, so the content caching time may be longer than expected. Each embedded link opens the specific file in a new browser tab. The vector-based figures shown inline can be viewed separately by right-clicking and choosing “Open Image in New Tab”, where you can zoom in to see the details.
The final version of the project report can be accessed here.
Dataset
Raw dataset Pandas Profiling Report
Pairplot
The pairplot of the dataset is generated with the Seaborn module.
The following link contains the pairplots of all features for reference. The file is over 100 MB, so the transfer can take a very long time. I initially considered putting the file on OneDrive, but a public link there would only be valid for 30 days, so I decided to host it on my content distribution server for easier management.
Pairplot of all features using 1000 samples (long file caching time, 100 MB PDF file)
Preprocessing
Exploratory Data Analysis
Feature Engineering
Feature Selection
This cumulative figure was not included in the final project report because the authors forgot to add it. It is an important figure: it shows that, for each metric, 25 features are enough to account for more than 80% of the total importance.
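The idea behind a cumulative importance curve can be sketched as follows. This is an illustration only: the function name `features_needed` and the toy scores are hypothetical, and the project's actual importance metrics are not reproduced here.

```python
import numpy as np

def features_needed(importances, threshold=0.80):
    """Return how many top-ranked features reach `threshold` of total importance."""
    scores = np.asarray(importances, dtype=float)
    order = np.argsort(scores)[::-1]                      # most important first
    cumulative = np.cumsum(scores[order]) / scores.sum()  # running share of the total
    # First position where the running share reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(scores))                               # guard against rounding at the end
    return k, order, cumulative

# Toy importance scores for five hypothetical features.
toy_scores = [0.05, 0.40, 0.10, 0.30, 0.15]
k, order, cumulative = features_needed(toy_scores)
print(f"{k} features cover {cumulative[k - 1]:.0%} of the total importance")
```

Plotting `cumulative` against the feature rank gives a curve of the same type as the figure, and repeating the calculation for each metric shows where each one crosses the 80% line.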
Machine Learning Models
Naive Bayes
The following hyperlink provides the same figure with more sample points; however, the PDF file takes longer to render. At first it shows only a white window, which does not indicate an error: the CPU needs time to do the heavy lifting of generating the graphics. You are welcome to check out the file.
Height vs Weight Decision Boundary with More Samples (rendering takes about 1 minute)
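A decision-boundary figure of this type can be sketched as below. The sketch makes several assumptions that are not taken from the project: it uses scikit-learn's `GaussianNB` as the Naive Bayes variant, the two classes are synthetic, and the class means and sample counts are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is required

import matplotlib.pyplot as plt
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data over (height, weight); all values are assumptions.
rng = np.random.default_rng(0)
n_per_class = 200
class0 = rng.normal(loc=[160.0, 55.0], scale=[6.0, 6.0], size=(n_per_class, 2))
class1 = rng.normal(loc=[180.0, 85.0], scale=[6.0, 6.0], size=(n_per_class, 2))
X = np.vstack([class0, class1])
y = np.repeat([0, 1], n_per_class)

model = GaussianNB().fit(X, y)

# Predict the class on a dense grid; the colour change marks the boundary.
h_axis = np.linspace(X[:, 0].min() - 5, X[:, 0].max() + 5, 200)
w_axis = np.linspace(X[:, 1].min() - 5, X[:, 1].max() + 5, 200)
hh, ww = np.meshgrid(h_axis, w_axis)
regions = model.predict(np.c_[hh.ravel(), ww.ravel()]).reshape(hh.shape)

fig, ax = plt.subplots()
ax.contourf(hh, ww, regions, alpha=0.3)        # predicted class regions
ax.scatter(X[:, 0], X[:, 1], c=y, s=10)        # the samples themselves
ax.set_xlabel("height")
ax.set_ylabel("weight")
fig.savefig("nb_boundary.png", dpi=100)
```

Increasing `n_per_class` adds more points to the scatter layer, which is the part that makes a vector file slower to render.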