Teaching Calendar 2024-2025

Quarter 1 Archive

Quarter 2

  • 11/6/2024 (Wednesday)
  • 11/8/2024 (Friday)
    • Continue k-NN. Finish notebook from last class
    • Work on Digits dataset (html)
    • Here’s a nice writeup of the approximate nearest neighbor problem with an eye towards ML applications like Spotify and Netflix recommendations (it may be behind a paywall; I’m not sure).
  • 11/12/2024 (Tuesday)
    • Topic: Intro to decision trees
    • Work on the Decision Tree Notebook (html)
    • Note: Oops, I forgot that sklearn’s decision trees can’t handle categorical features directly. Ugh. The fix is to one-hot or ordinal encode everything, but that makes for ugly trees.
    • Some notes on entropy are contained in this chapter. I go through the ABCDEFGH example from class here, starting on p. 63.
    • The Wikipedia article on Huffman trees is pretty good too, if you want notes on that algorithm.
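As a companion to the entropy notes, here is a minimal sketch of computing Shannon entropy in bits. The two eight-symbol distributions are hypothetical and chosen for easy arithmetic; they are not necessarily the ABCDEFGH numbers used in class.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions over eight symbols A..H (assumed for illustration).
uniform = [1 / 8] * 8
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]

print(entropy(uniform))  # 3.0 bits: eight equally likely symbols need 3 bits each
print(entropy(skewed))   # 2.0 bits: a skewed distribution is cheaper to encode on average
```

The skewed case is where a Huffman code pays off: frequent symbols get short codewords, so the average code length can drop below the 3 bits a fixed-length code would need.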
  • 11/14/2024 (Thursday)
  • 11/18/2024 (Monday)
  • 11/20/2024 (Wednesday)
    • Notes on the math behind SVMs; the ‘kernel trick’
    • Please finish SVM lab and pick a topic for research
    • Research project due next week: topic of your choice, in-depth EDA, comparison of multiple techniques, selecting the best technique, post-analysis
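To make the ‘kernel trick’ concrete, here is a small sketch that assumes a degree-2 polynomial kernel on 2-D inputs (one common textbook choice, not necessarily the kernel used in the notes). It checks numerically that the kernel value equals a dot product in an explicit 3-D feature space, without the kernel ever building that space.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map whose ordinary dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Arbitrary example points (assumed for illustration).
x, y = (1.0, 2.0), (3.0, -1.0)

lhs = poly_kernel(x, y)     # computed in the original 2-D space
rhs = dot(phi(x), phi(y))   # computed in the mapped 3-D space
print(lhs, rhs)             # the two agree up to floating-point rounding
```

An SVM only ever needs dot products between points, so swapping in the kernel gives the effect of the higher-dimensional mapping at the cost of the cheap 2-D computation.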
  • 11/22/2024 (Friday)
    • Class time for research project
  • 11/24/2024 (Tuesday)
    • Research project
    • Quick presentations of techniques and findings
  • 12/3/2024 (Tuesday)
    • Cross Validation and Grid Search notebook and optional dataset to gunzip (if the notebook open_ml doesn’t work)
    • Apply CV and GS to one of your notebooks (maybe the last research project)
    • Work on FAQ document as time allows (this will be project #4)
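For intuition about what cross validation and grid search do under the hood, here is a rough pure-Python sketch. It is not sklearn's implementation; `k_fold_indices`, `grid_search`, and the `score_fn` callback are hypothetical names introduced for illustration. In the notebooks, sklearn's `KFold` and `GridSearchCV` handle this with many more options.

```python
from itertools import product

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; every sample lands in exactly one validation fold."""
    indices = list(range(n_samples))
    # The first (n_samples % n_folds) folds get one extra sample.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

def grid_search(param_grid, score_fn, n_samples, n_folds=5):
    """Try every parameter combination and return (best_params, best_mean_score).

    score_fn(params, train_idx, val_idx) is a hypothetical callback that fits a
    model on the training rows and returns its score on the validation rows.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, n_folds)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Toy scorer (assumed): pretends that k = 3 is the ideal setting.
def toy_score(params, train_idx, val_idx):
    return -abs(params["k"] - 3)

best_params, best_score = grid_search({"k": [1, 3, 5]}, toy_score, n_samples=10)
print(best_params, best_score)
```

The key point is that each parameter combination is scored on every fold and the mean is compared, so a setting that only looks good on one lucky split does not win.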
  • 12/5/2024 (Thursday)
  • 12/9/2024 (Monday)
    • Ensemble methods
    • Warm up question!
    • See notes here and complete all “to do” sections.
    • Upload your final notebook to grumpy. Please name it Ensemble_Methods.ipynb
    • Report #2 is due at the end of next week. Find your dataset if you haven’t already. Plan to use CV, GridSearch and ensemble methods in this report.
    • Friday and Tuesday will probably be report workdays.
  • 12/11/2024 (Wednesday)
    • ROC-AUC curves, tuning decision thresholds
    • Read through the notes here and here. The first link is all review for you, but has great visuals and interactive demos. The second one does a great job demonstrating ROC curves.
    • Revisit the Mushroom project.
      • (You may want to omit the neural network from this because it’s slow)
      • Make ROC curves and compute AUC scores for each of the classifiers we sampled.
      • Make one plot with ROC curves for all the models on the same graph
      • For at least one model:
        • Plot an ROC curve with cross validation (documentation here)
        • Tune the classifier threshold using sklearn’s tuning ability (documentation here)
      • Upload your results to grumpy, named Mushroom-tuned.ipynb
      • Finalize your ideas for Report-02.ipynb, due next Friday
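As a reminder of what an ROC curve and its AUC actually measure, here is a minimal pure-Python sketch that sweeps the decision threshold over the predicted scores. The labels and scores are made up for illustration, and the helper names are hypothetical; in the notebook you would normally use `roc_curve` and `roc_auc_score` from `sklearn.metrics` instead.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists from sweeping the threshold over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]  # a threshold above every score predicts no positives
    for thr in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve via the trapezoid rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

# Made-up labels and predicted probabilities (assumed for illustration).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_curve_points(y_true, scores)
print(list(zip(fpr, tpr)))
print(auc(fpr, tpr))  # 0.75
```

Each point on the curve corresponds to one threshold, which is why tuning the decision threshold amounts to picking a point on this curve that suits the cost of false positives versus false negatives.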
  • 12/13/2024 (Friday)
    • Turn in Ensemble and Mushroom notebooks
    • Work on Report-02.ipynb
      • Pick a new dataset (see me for rare exceptions!)
      • Limit yourself to binary classification (you can do multiclass as an extension if you want but start with binarized data)
      • Focus on new skills: CV, GridSearch, Ensembles and reporting AUC and drawing ROC curves
      • Even if your first model’s accuracy is 98.5%, you still need to improve it using the new techniques!
      • Due next Thursday
      • Quick notes on precision/recall curves, RandomizedSearchCV, Naive Bayes and Bayes error rate.
  • 12/17/2024 (Tuesday)
    • Work on the report due this week
    • Be sure to consult the specification from last class and the rubric
    • PANDOC update. Looks like “sudo apt-get install texlive texlive-xetex pandoc” will get it working
  • 1/16/2025 (Thursday)
  • 1/23/2025 (Thursday)
    • Make sure report 2 is turned in
    • Turn in the anomaly assignment if you have it, or turn it in later. I won’t count it toward this quarter because of the snow days.
    • Today, let’s install tensorflow. See instructions.
  • 1/30/2025 (Thursday)