Final project
Materials from class on Saturday, July 11, 2020
Overview
- Review the project requirements. The project is required for credit-seeking students and optional for guests
- Sign up for the final project using the BIOS691 Final project sign-up form, due 06/19/2020
- Teams of two are encouraged - both teammates should agree and sign up
- Project proposal submission due 06/26/2020
- Final project submission due 07/12/2020
- Ask questions by e-mail
Data
- Register on kaggle.com
- Select and enter one of the completed Kaggle competitions based on topics/code learned in class. Examples:
  - Kannada MNIST - recognize handwritten digits written in the Kannada script
  - Kuzushiji Recognition - transcribe ancient Kuzushiji into contemporary Japanese characters
  - Aerial Cactus Identification - determine whether an image contains a columnar cactus
  - Movie Review Sentiment Analysis - classify the sentiment of sentences from the Rotten Tomatoes dataset
  - Dog Breed Identification - determine the breed of a dog in an image
- Explore Kaggle datasets for other interesting projects
- Consult with the instructor if you would like to use your own data
Proposal Submission (due 06/26/2020)
- Register on GitHub. Learn about GitHub in section 4, "Git and GitHub", of the Reproducible research tools
- Create a repository for your project
- If working in a team, add your teammate as a collaborator
- Add a README file describing the selected project. Address the following points:
  - Data type (numerical, text, or images) and a link to the data source
  - Problem type (binary or multiclass classification, regression)
  - Proposed network architecture (feed-forward, CNN, RNN, etc.); testing multiple architectures is encouraged
Solution
- Create your code in RMarkdown
- Describe and implement data download and processing
- Select, justify, and implement a network architecture suitable for the data
- Train and evaluate the performance of the network
- If needed, tweak the network architecture or add regularization to improve model performance
- Explain all steps and describe the achieved performance
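The evaluation step in the list above can be sketched as follows. This is a minimal illustration in plain Python, not part of the course materials: the project itself should be written in RMarkdown, and the helper names `split_indices` and `accuracy`, the 20% validation fraction, and the fixed seed are all assumptions chosen for the example. It shows one common way to hold out validation samples and to report accuracy for a classification problem.

```python
import random


def split_indices(n_samples, validation_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and validation sets.

    The 0.2 default and the seed are illustrative assumptions, not course requirements.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_validation = int(round(n_samples * validation_fraction))
    return indices[n_validation:], indices[:n_validation]


def accuracy(true_classes, predicted_classes):
    """Return the fraction of samples whose predicted class equals the true class."""
    if len(true_classes) != len(predicted_classes):
        raise ValueError("inputs must have the same length")
    if len(true_classes) == 0:
        raise ValueError("inputs must not be empty")
    correct = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return correct / len(true_classes)
```

Reporting accuracy on the held-out validation samples, rather than on the training samples, gives a more honest picture of how the model may perform on the competition test set.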
Many Kaggle competitions accept late submissions in the form of text files containing, e.g., samples and predicted classes. Such submissions are benchmarked against previous submissions, so you can compare your model's performance with that of previous solutions.
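As a rough illustration of such a submission file, the sketch below writes one row per test sample with its identifier and predicted class. It is written in plain Python only to show the file layout; the project code itself should be in RMarkdown. The function name `write_submission` and the column names `id` and `label` are assumptions: each competition defines its own required header and format, so check the sample submission file on the competition page.

```python
import csv


def write_submission(path, sample_ids, predicted_classes,
                     id_column="id", label_column="label"):
    """Write a CSV file with one row per test sample: identifier, predicted class.

    The default column names are hypothetical; use the ones the competition requires.
    Returns the number of data rows written.
    """
    if len(sample_ids) != len(predicted_classes):
        raise ValueError("sample_ids and predicted_classes must have the same length")
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([id_column, label_column])
        writer.writerows(zip(sample_ids, predicted_classes))
    return len(sample_ids)
```

For example, `write_submission("submission.csv", [1, 2, 3], [7, 0, 4])` would produce a header line followed by three rows, which can then be uploaded through the competition's late submission page.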
Final Submission (due 07/12/2020)
- Add your code solving the selected competition to your GitHub repository
- Your code should run (knit) with minimal modifications, e.g., adjusting file paths
- If working in a team, ensure that both teammates have committed to the repository