"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 10, 2022

Day 3 - Projects - Features - Ideas

Project summary list

Project #1 - A Machine Learning Approach to Assess Education Policies in Brazil

Report - Link

  • Regression model to predict the current quality index of schools
  • Clustering model to identify groups of schools with similar profiles
  • Classification model to predict goal achievement in schools

Key Features

  • Spending on transportation for students
  • Spending on food for students and workers
  • Spending on construction and maintenance of schools
  • Spending on salaries of school employees
  • Number of students
  • Number of professors, separated by level of education
  • Number of laboratories, computers and offices
  • School performance according to Ideb (Brazil's Basic Education Development Index) in 2013, 2015 and 2017
  • School goal for Ideb in 2013, 2015 and 2017
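
To make the feature list concrete, here is a minimal sketch of how one school-year record could be turned into model inputs and a classification label. Everything in it is an assumption for illustration: the `SchoolYear` field names are hypothetical, the per-student normalization is my own choice, and defining "goal achieved" as score >= goal is a guess at how the report's label might be built, not something the report confirms.

```python
from dataclasses import dataclass


@dataclass
class SchoolYear:
    # Hypothetical field names that mirror the key features listed above.
    transport_spending: float
    food_spending: float
    maintenance_spending: float
    salary_spending: float
    n_students: int
    ideb_score: float  # observed Ideb for the year
    ideb_goal: float   # Ideb goal set for the same year


def feature_row(s: SchoolYear) -> list:
    """Spending per student, so schools of different sizes are comparable."""
    n = max(s.n_students, 1)  # guard against division by zero
    return [
        s.transport_spending / n,
        s.food_spending / n,
        s.maintenance_spending / n,
        s.salary_spending / n,
        float(s.n_students),
    ]


def goal_achieved(s: SchoolYear) -> int:
    """Assumed label definition: 1 if the school met or beat its Ideb goal."""
    return int(s.ideb_score >= s.ideb_goal)


# Made-up numbers, only to show the shapes involved.
school = SchoolYear(12_000.0, 30_000.0, 18_000.0, 240_000.0, 600, 5.4, 5.1)
print(feature_row(school))
print(goal_achieved(school))
```

The same rows could feed all three models: the score as a regression target, the label as a classification target, and the feature vector alone for clustering.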

My Observations - The same approach could be applied to Indian schools. Based on publicly available data, we could predict dropouts, decide where aid is needed, and identify poor performers proactively.

Project #2 - Fraud detection using Machine Learning

Dataset

  • PaySim - a Kaggle dataset for fraud detection
  • 6 million + mobile payment transactions
  • 6 different categories of transactions
  • 8312 fraudulent transactions

Class weight-based approach

  • In a fraud detection system, correctly detecting fraudulent transactions is more critical than avoiding false alarms, so misclassifying a certain number of non-fraud transactions is acceptable.
  • Penalize misclassification of fraud transactions more heavily than misclassification of non-fraud transactions.
  • Assign a higher weight to the fraud class to obtain high recall on that class and to counter the data imbalance.
  • Ensure no more than ~1% false positives.
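
The class weight idea can be sketched in plain Python. This is not the project's code: the "balanced" weighting formula, the helper names, and reading "~1% false positives" as a false positive rate measured over non-fraud transactions are all my assumptions, and the counts below are toy numbers rather than the PaySim figures.

```python
import math


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


def weighted_log_loss(y_true, p_fraud, weights):
    """Mean cross-entropy with each sample scaled by its class weight."""
    eps = 1e-12
    loss = 0.0
    for y, p in zip(y_true, p_fraud):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        loss -= weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss / len(y_true)


def threshold_for_max_fpr(scores, labels, max_fpr=0.01):
    """Lowest threshold such that flagging score > threshold keeps FPR <= max_fpr."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    allowed = int(max_fpr * len(negatives))  # non-fraud rows we may flag
    if allowed >= len(negatives):
        return min(scores)
    return negatives[allowed]


# Toy imbalance: 990 non-fraud (0) vs 10 fraud (1).
weights = balanced_class_weights({0: 990, 1: 10})
print(weights)

# A confident miss on a fraud row costs far more than one on a non-fraud row.
print(weighted_log_loss([1], [0.1], weights), weighted_log_loss([0], [0.9], weights))

# Toy scores: 200 non-fraud rows and 2 fraud rows.
scores = [i / 1000 for i in range(200)] + [0.9, 0.15]
labels = [0] * 200 + [1, 1]
print(threshold_for_max_fpr(scores, labels, max_fpr=0.01))
```

Many scikit-learn classifiers accept a `class_weight` parameter that applies this kind of weighting during training, so in practice the first helper is rarely written by hand.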

Project #3 - Image Super-Resolution Via a Convolutional Neural Network

Key Notes

  • SRCNN comprises three convolutional blocks corresponding to patch extraction, non-linear mapping, and reconstruction

  • SRCNN surpasses non-neural methods for the task of super-resolution
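
To get a feel for how small the three-block network is, here is a rough sketch that walks through the layers and counts weights. The concrete sizes (9x9, 1x1 and 5x5 kernels, 64 and 32 filters, a single input channel, 33x33 input patches, no padding) are assumptions taken from the base setting in the original SRCNN paper, not from the notes above, and the helper names are made up.

```python
# (block name, kernel size, input channels, output channels)
# Assumed 9-1-5 configuration with 64 and 32 filters on one image channel.
LAYERS = [
    ("patch extraction", 9, 1, 64),
    ("non-linear mapping", 1, 64, 32),
    ("reconstruction", 5, 32, 1),
]


def conv_output_size(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def summarize(input_size=33):
    """Return per-block (name, output size, weight count) and the total weights."""
    size = input_size
    total = 0
    rows = []
    for name, kernel, c_in, c_out in LAYERS:
        size = conv_output_size(size, kernel)
        n_weights = kernel * kernel * c_in * c_out  # biases not counted
        total += n_weights
        rows.append((name, size, n_weights))
    return rows, total


rows, total = summarize(33)
for name, size, n_weights in rows:
    print(f"{name}: output {size}x{size}, {n_weights} weights")
print("total weights:", total)  # 8032 under the assumptions above
```

Under these assumptions a 33x33 patch shrinks to 21x21, which is why the network's output has to be compared against the central region of the ground-truth patch.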

Image Super-Resolution using an Efficient Sub-Pixel CNN

Using The Super Resolution Convolutional Neural Network for Image Restoration

Project #4 - Explore Co-clustering on Job Applications

Kaggle Job Recommendation Challenge

  • ~ 1.6m unique job applications
  • ~ 360k unique jobs
  • ~ 320k unique job applicants
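
Co-clustering groups the rows and columns of a matrix at the same time, so a natural input here is an applicant-by-job matrix with a 1 wherever an application exists. A quick back-of-the-envelope sketch, assuming that layout and using the rounded counts above (the helper names and toy pairs are hypothetical), shows how sparse that matrix would be:

```python
from collections import defaultdict


def density(n_entries, n_rows, n_cols):
    """Fraction of cells in an n_rows x n_cols matrix that are filled."""
    return n_entries / (n_rows * n_cols)


def build_sparse_rows(applications):
    """Map each applicant to the set of jobs they applied to."""
    rows = defaultdict(set)
    for applicant, job in applications:
        rows[applicant].add(job)
    return dict(rows)


# Rounded figures from the list above: ~1.6m applications, ~320k applicants, ~360k jobs.
approx_density = density(1_600_000, 320_000, 360_000)
print(f"{approx_density:.2e}")

# Toy data: (applicant, job) pairs.
pairs = [("a1", "j1"), ("a1", "j2"), ("a2", "j2"), ("a3", "j3")]
sparse = build_sparse_rows(pairs)
print(sparse)
print(density(len(pairs), 3, 3))
```

With only about one cell in 70,000 filled, a dense array is out of the question, so whatever co-clustering method is used has to work on a sparse representation like the one sketched here.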


Report - Explore Co-clustering on Job Applications

Project #5 - Fake News Identification

Key Notes

  1. Tokenize the body and headline with the Punkt sentence tokenizer from the NLTK NLP library. This tokenizer uses an unsupervised machine learning algorithm, pre-trained on a general English corpus, to distinguish sentence-ending punctuation from other punctuation (for example, periods in abbreviations), taking into account the position of words in a sentence.
  2. Tokenize words with our own algorithm, which also takes care of lemmatization.
  3. Tag each sample with the tokens obtained from the entire headline set and the entire body set.
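
The three steps can be sketched with the standard library alone. This is only an illustration: the regex sentence splitter is a naive stand-in for NLTK's Punkt tokenizer, `toy_lemma` is crude suffix stripping rather than real lemmatization, and reading step 3 as a bag-of-words count over a shared vocabulary is my assumption about what "tagging" means in the project.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive stand-in for Punkt: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize_words(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def toy_lemma(token):
    """Crude suffix stripping; a placeholder for a real lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tokens_of(text):
    """Steps 1 and 2: sentences, then words, then lemmas."""
    return [toy_lemma(t) for s in split_sentences(text) for t in tokenize_words(s)]


def build_vocabulary(texts):
    """Collect the token set over the whole headline (or body) collection."""
    return sorted({tok for text in texts for tok in tokens_of(text)})


def tag_sample(text, vocabulary):
    """Step 3 (assumed reading): count each vocabulary token in one sample."""
    counts = Counter(tokens_of(text))
    return [counts[tok] for tok in vocabulary]


headlines = ["Markets rally. Stocks jump!", "Stocks fall"]
vocab = build_vocabulary(headlines)
print(vocab)
print([tag_sample(h, vocab) for h in headlines])
```

The body set would get its own vocabulary in the same way, giving each sample one vector for its headline and one for its body.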

Poster - Link

Keep Exploring!!!
