Project Summary List
Project #1 - A Machine Learning Approach to Assess Education Policies in Brazil
Report - Link
- Regression model to predict current quality index of schools
- Clustering model to identify groups of schools with similar profiles
- Classification model to predict goal achievement in schools
- Key Features
  - Spending on student transportation
  - Spending on food for students and staff
  - Spending on school construction and maintenance
  - Spending on school employee salaries
  - Number of students
  - Number of teachers, broken down by level of education
  - Number of laboratories, computers and offices
  - School performance on the Ideb (Índice de Desenvolvimento da Educação Básica, Brazil's basic education development index) in 2013, 2015 and 2017
  - School Ideb targets for 2013, 2015 and 2017
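A minimal sketch of how the regression and classification targets could be set up, assuming the per-school features listed above have already been assembled into a numeric matrix. The column names, the synthetic data, the use of plain least-squares regression, and the definition of "goal achieved" as an Ideb score at or above the school's target are all assumptions for illustration; the report may use different models and encodings.

```python
import numpy as np

# Hypothetical column names standing in for the per-school features listed above.
FEATURES = ["transport_spend", "food_spend", "construction_spend",
            "salary_spend", "n_students", "n_labs_computers_offices"]

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]

def goal_labels(ideb_score, ideb_goal):
    """Classification target (assumed): 1 if the school reached its Ideb goal, else 0."""
    return (np.asarray(ideb_score) >= np.asarray(ideb_goal)).astype(int)

# Synthetic, noise-free data purely to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, len(FEATURES)))          # standardized feature values
true_coefs = np.array([0.2, 0.1, 0.05, 0.3, -0.1, 0.15])
y = 4.5 + X @ true_coefs                           # synthetic "Ideb" scores
intercept, coefs = fit_ols(X, y)
labels = goal_labels([5.1, 4.0], [4.8, 4.2])       # first school met its goal, second did not
```

Because the synthetic scores are an exact linear function of the features, least squares recovers the intercept and coefficients; on real school data a regularized or tree-based model may fit better.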
My Observations - The same approach could be applied to Indian schools: using publicly available data, we could proactively predict dropouts, target aid, and identify poor performers.
Project #2 - Fraud detection using Machine Learning
Dataset
- PaySim - a synthetic Kaggle dataset of simulated mobile money transactions for fraud detection
- 6 million+ mobile payment transactions
- 6 different categories of transactions
- 8312 fraudulent transactions
Class weight-based approach
- In a fraud detection system, it is more critical to detect fraudulent transactions correctly, and acceptable to misclassify a certain number of non-fraudulent ones.
- Penalize misclassification of fraudulent transactions more heavily than misclassification of non-fraudulent ones
- Assign a higher weight to the fraud class to obtain high recall on that class and counter the class imbalance.
- Keep the false-positive rate at or below ~1%
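The class-weight idea above can be sketched in plain Python. This is a minimal illustration, not the project's code: the weights use the "balanced" formula n_samples / (n_classes * count_c) that scikit-learn applies for `class_weight="balanced"`, the loss is a class-weighted binary cross-entropy, and `threshold_for_max_fpr` is a hypothetical helper that picks a score cut-off so that at most ~1% of non-fraud transactions are flagged.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * count_c): the rare fraud class gets a large weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(labels, fraud_probs, weights):
    """Mean binary cross-entropy, each sample scaled by the weight of its true class."""
    total = 0.0
    for y, p in zip(labels, fraud_probs):
        nll = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * nll
    return total / len(labels)

def threshold_for_max_fpr(scores, labels, max_fpr=0.01):
    """Hypothetical helper: choose a threshold so that flagging `score > threshold`
    marks at most max_fpr of the non-fraud (label 0) samples as fraud."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    allowed = int(max_fpr * len(negatives))  # false positives we can tolerate
    if allowed >= len(negatives):
        return float("-inf")
    # Only negatives strictly above the (allowed+1)-th highest negative score get flagged.
    return negatives[allowed]

# Toy example: 990 legitimate transactions, 10 fraudulent ones.
labels = [0] * 990 + [1] * 10
w = balanced_class_weights(labels)                  # fraud weight is 50.0
missed_fraud = weighted_log_loss([1], [0.1], w)     # fraud scored as 10% likely
false_alarm = weighted_log_loss([0], [0.9], w)      # legit scored as 90% likely fraud
```

With these weights, missing a fraud costs roughly 100 times more than raising a false alarm of the same confidence, which pushes the model toward high recall on the fraud class; the threshold step then enforces the ~1% false-positive budget.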
Project #3 - Image Super-Resolution Via a Convolutional Neural Network
Key Notes
- SRCNN comprises three convolutional layers, corresponding to patch extraction, non-linear mapping, and reconstruction
- SRCNN outperforms non-neural super-resolution methods, such as bicubic interpolation and sparse-coding-based approaches
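The three SRCNN stages map directly onto three convolutions. Below is a minimal NumPy sketch of the forward pass only, with random untrained weights; the 9-1-5 filter sizes and 64/32 filter counts follow the basic setting of the original SRCNN paper, and the input is assumed to be a low-resolution image already upscaled (e.g. bicubic) to the target size. The function names are hypothetical, and training is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w, b):
    """'Valid' (no padding) multi-channel convolution, CNN-style cross-correlation.
    x: (H, W, C_in), w: (k, k, C_in, C_out), b: (C_out,) -> (H-k+1, W-k+1, C_out)."""
    k = w.shape[0]
    patches = sliding_window_view(x, (k, k), axis=(0, 1))  # (H-k+1, W-k+1, C_in, k, k)
    return np.einsum("hwcij,ijco->hwo", patches, w) + b

def init_srcnn(rng, f1=9, f2=1, f3=5, n1=64, n2=32, channels=1):
    """Small random (untrained) weights and zero biases for the three layers."""
    def layer(k, c_in, c_out):
        return rng.normal(0.0, 1e-3, size=(k, k, c_in, c_out)), np.zeros(c_out)
    return {
        "extract": layer(f1, channels, n1),  # patch extraction and representation
        "map": layer(f2, n1, n2),            # non-linear mapping
        "recon": layer(f3, n2, channels),    # reconstruction
    }

def srcnn_forward(y, params):
    """y: upscaled low-resolution image of shape (H, W, channels)."""
    h1 = np.maximum(conv2d(y, *params["extract"]), 0.0)  # ReLU
    h2 = np.maximum(conv2d(h1, *params["map"]), 0.0)     # ReLU
    return conv2d(h2, *params["recon"])                  # linear output layer

rng = np.random.default_rng(0)
params = init_srcnn(rng)
y = rng.random((33, 33, 1))            # a 33x33 single-channel (luminance) patch
out = srcnn_forward(y, params)         # shape (21, 21, 1) because no padding is used
n_params = sum(w.size + b.size for w, b in params.values())
```

The whole network has only about 8K parameters (8,129 including biases), which is part of why SRCNN is fast at inference time compared with the iterative sparse-coding methods it replaced.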
Image Super-Resolution using an Efficient Sub-Pixel CNN
Using The Super Resolution Convolutional Neural Network for Image Restoration
Project #4 - Explore Co-clustering on Job Applications
Kaggle Job Recommendation Challenge
- ~1.6M unique job applications
- ~360K unique jobs
- ~320K unique job applicants
- Report - Explore Co-clustering on Job Applications
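Co-clustering groups applicants and jobs at the same time, treating the applicant x job matrix as a bipartite graph. The sketch below is one possible technique, a two-way version of Dhillon's spectral co-clustering written in NumPy; the report's actual method is not stated here, the function name and toy matrix are hypothetical, and the sign split replaces the k-means step the full algorithm uses. At the real scale (~1.6M applications) a sparse matrix and a library implementation such as scikit-learn's `SpectralCoclustering` would be more practical.

```python
import numpy as np

def spectral_cobipartition(A):
    """Two-way spectral co-clustering of a nonnegative applicant x job matrix.

    Normalize A_n = D1^{-1/2} A D2^{-1/2}, take the second left/right singular
    vectors, rescale them by D1^{-1/2} / D2^{-1/2}, and split on their sign.
    Assumes every applicant and every job has at least one application.
    """
    A = np.asarray(A, dtype=float)
    r = 1.0 / np.sqrt(A.sum(axis=1))  # D1^{-1/2}: inverse sqrt of applicant degrees
    c = 1.0 / np.sqrt(A.sum(axis=0))  # D2^{-1/2}: inverse sqrt of job degrees
    An = r[:, None] * A * c[None, :]
    u, s, vt = np.linalg.svd(An)      # singular values come out in descending order
    z_rows = r * u[:, 1]              # second left singular vector, rescaled
    z_cols = c * vt[1, :]             # matching right singular vector, rescaled
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)

# Hypothetical toy data: applicants 0-1 apply to jobs 0-1, applicants 2-3 to
# jobs 2-3, and applicant 1 also applied to job 2.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
applicant_groups, job_groups = spectral_cobipartition(A)
```

Applicants 0-1 land in the same co-cluster as jobs 0-1, and applicants 2-3 with jobs 2-3; the 0/1 label values themselves may be swapped, since singular vectors are only defined up to sign.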
Project #5 - Fake News Identification
Key Notes
- Split the body and headline into sentences with the Punkt sentence tokenizer from the NLTK library. Punkt is an unsupervised machine learning algorithm, pre-trained on a general English corpus, that can distinguish sentence-ending punctuation from other uses of the same marks (such as abbreviations), using cues such as which words tend to start sentences.
- Tokenize the words of each sentence with our own algorithm, and apply lemmatization.
- Tag each sample with the tokens obtained from the entire headline set and the entire body set.
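The word-tokenization and tagging steps can be sketched as a binary bag-of-words over two vocabularies, one built from all headlines and one from all bodies. This is an interpretation, not the project's code: the regex tokenizer is a hypothetical stand-in for "our algorithm", the Punkt sentence split is skipped, and lemmatization is a pluggable function (for example NLTK's `WordNetLemmatizer().lemmatize` could be passed in) that defaults to leaving words unchanged.

```python
import re

WORD_RE = re.compile(r"[a-z0-9']+")  # hypothetical stand-in word tokenizer

def identity(word):
    return word

def tokenize(text, lemmatize=identity):
    """Lowercase, split into word tokens, then lemmatize each token."""
    return [lemmatize(w) for w in WORD_RE.findall(text.lower())]

def build_vocab(texts, lemmatize=identity):
    """Sorted set of every token seen across a collection (e.g. all headlines)."""
    return sorted({t for text in texts for t in tokenize(text, lemmatize)})

def tag_sample(headline, body, head_vocab, body_vocab, lemmatize=identity):
    """Binary indicators: which headline-vocabulary tokens occur in this headline,
    followed by which body-vocabulary tokens occur in this body."""
    h = set(tokenize(headline, lemmatize))
    b = set(tokenize(body, lemmatize))
    return [int(t in h) for t in head_vocab] + [int(t in b) for t in body_vocab]

# Made-up samples, not from the project's dataset.
samples = [("Aliens land in Paris", "Officials deny the report."),
           ("Markets rise", "Stocks rose on Monday.")]
head_vocab = build_vocab([h for h, _ in samples])
body_vocab = build_vocab([b for _, b in samples])
features = [tag_sample(h, b, head_vocab, body_vocab) for h, b in samples]
```

Keeping separate headline and body vocabularies lets a downstream classifier weigh the same word differently depending on where it appears, which matters when a headline may not match its body.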
Poster - Link
Keep Exploring!!!