Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Day 3 - Projects - Features

August 10, 2022

Day 3 - Projects - Features - Ideas

Project summary list -

2018 List
2017 List
2019 List
2020 List
2018 List

Project #1 - A Machine Learning Approach to Assess Education Policies in Brazil

Report - Link

Regression model to predict current quality index of schools
Clustering model to identify groups of schools with similar profiles
Classification model to predict goal achievement in schools
Key Features
Spending on transportation for students
Spending on food for students and workers
Spending on construction and maintenance of schools
Spending on salaries of school employees
Number of students
Number of professors separated by level of education
Number of laboratories, computers and offices
School performance according to Ideb in 2013, 2015 and 2017
School goal for Ideb in 2013, 2015 and 2017
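
One way to set up the target for the goal-achievement classifier from the last two features is to compare each school's Ideb score with its goal for the same year. This is a minimal sketch: the record layout, field names, and sample values are hypothetical, and the report may define goal achievement differently.

```python
from dataclasses import dataclass


@dataclass
class SchoolYear:
    """One school in one Ideb assessment year (hypothetical layout)."""
    school_id: str
    year: int
    ideb_score: float  # observed school performance
    ideb_goal: float   # goal set for that school and year


def goal_achieved(record: SchoolYear) -> bool:
    # Assumption: a goal counts as achieved when the score meets or exceeds it.
    return record.ideb_score >= record.ideb_goal


# Made-up sample values, for illustration only.
records = [
    SchoolYear("A", 2013, 4.8, 4.5),
    SchoolYear("A", 2015, 4.9, 5.0),
    SchoolYear("B", 2017, 5.6, 5.6),
]

labels = [goal_achieved(r) for r in records]
```

The resulting labels could then serve as the target column, with the spending and headcount features as inputs.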

My Observations - The same approach could be applied to Indian schools. Based on publicly available data, we could proactively predict dropouts, plan aid, and identify poor performers.

Project #2 - Fraud detection using Machine Learning

Dataset

PaySim - a Kaggle dataset for fraud detection
More than 6 million mobile payment transactions
6 different categories of transactions
8312 fraudulent transactions

Class weight-based approach

In a fraud detection system, it’s more critical to correctly detect fraudulent transactions, and it is acceptable to misclassify a certain number of non-fraudulent ones.
Penalize misclassification of fraud transactions more heavily than misclassification of non-fraud transactions.
Assign a higher weight to the fraud class to obtain high recall on that class and to counter the class imbalance.
Keep false positives to no more than ~1%.
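
The class-weight idea can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the "balanced" weighting formula (n_samples / (n_classes * class_count)), the weighted log loss, the toy labels and probabilities, and the reading of "false positives" as the false-positive rate among non-fraud transactions are all choices made for this example.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Binary log loss where each sample is scaled by its class weight."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


def false_positive_rate(labels, probs, threshold):
    """Share of non-fraud samples (label 0) scored at or above the threshold."""
    negatives = [p for y, p in zip(labels, probs) if y == 0]
    return sum(p >= threshold for p in negatives) / len(negatives)


# Toy data (made up): 98 non-fraud samples (0) and 2 fraud samples (1).
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)

# Scenario 1: one fraud case is scored low (a missed fraud).
probs_missed_fraud = [0.1] * 98 + [0.1, 0.9]
# Scenario 2: one non-fraud case is scored high (a false alarm).
probs_false_alarm = [0.9] + [0.1] * 97 + [0.9, 0.9]

loss_missed_fraud = weighted_log_loss(labels, probs_missed_fraud, weights)
loss_false_alarm = weighted_log_loss(labels, probs_false_alarm, weights)
fpr = false_positive_rate(labels, probs_false_alarm, threshold=0.5)
```

With these weights a missed fraud costs far more than a false alarm, which pushes a model toward high recall on the fraud class. On the dataset sizes listed above, the same formula would weight the fraud class several hundred times more heavily than the non-fraud class.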

Project #3 - Image Super-Resolution Via a Convolutional Neural Network

Key Notes

SRCNN comprises three convolutional blocks corresponding to patch extraction, non-linear mapping, and reconstruction

SRCNN surpasses non-neural methods for the task of super-resolution
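
The three-block layout can be written down as plain layer specs to see how small the network is. This is a sketch under assumptions: the 9-1-5 kernel sizes, the 64 and 32 filter counts, the single input channel, and the 33x33 training patch are taken from the commonly cited base SRCNN configuration, not from this post, and the convolutions are treated as unpadded with stride 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """One square convolution, stride 1, no padding (assumed)."""
    name: str
    in_channels: int
    out_channels: int
    kernel_size: int

    def weight_count(self) -> int:
        # Filter weights only; biases are not counted.
        return self.kernel_size ** 2 * self.in_channels * self.out_channels

    def output_size(self, input_size: int) -> int:
        # An unpadded convolution shrinks each side by kernel_size - 1.
        return input_size - self.kernel_size + 1


# Assumed base configuration: 9-1-5 kernels, 64 and 32 filters, 1 channel.
SRCNN_LAYERS = [
    ConvLayer("patch extraction", 1, 64, 9),
    ConvLayer("non-linear mapping", 64, 32, 1),
    ConvLayer("reconstruction", 32, 1, 5),
]


def total_weights(layers) -> int:
    return sum(layer.weight_count() for layer in layers)


def final_output_size(layers, input_size: int) -> int:
    size = input_size
    for layer in layers:
        size = layer.output_size(size)
    return size


def receptive_field(layers) -> int:
    # Each stride-1 layer widens the receptive field by kernel_size - 1.
    return 1 + sum(layer.kernel_size - 1 for layer in layers)


weights = total_weights(SRCNN_LAYERS)
patch_out = final_output_size(SRCNN_LAYERS, 33)
field = receptive_field(SRCNN_LAYERS)
```

Under these assumptions the network has only a few thousand weights, and each output pixel depends on a small neighbourhood of the input image.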

Image Super-Resolution using an Efficient Sub-Pixel CNN

Using The Super Resolution Convolutional Neural Network for Image Restoration

Project #4 - Explore Co-clustering on Job Applications

Kaggle Job Recommendation Challenge

~ 1.6m unique job applications
~ 360k unique jobs
~ 320k unique job applicants

Report - Explore Co-clustering on Job Applications

Project #5 - Fake News Identification

Key Notes

Tokenize the body and headline with the Punkt sentence tokenizer from the NLTK library. This tokenizer uses an unsupervised machine learning algorithm, pre-trained on a general English corpus, and can tell sentence-ending punctuation apart from other punctuation (such as periods in abbreviations) based on the position of words in a sentence.
Tokenize words with our algorithm and apply lemmatization.
Tag each sample with the tokens obtained from the entire headline set and the entire body set.
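
The tagging step can be sketched as building a vocabulary from the whole headline set and then describing each sample by its token counts over that vocabulary. This is a simplified stand-in, not the project's pipeline: a regular expression replaces the Punkt and word tokenizers, lemmatization is skipped (tokens are only lowercased), and the sample headlines and function names are made up for illustration.

```python
import re
from collections import Counter

# Stand-in tokenizer: lowercase runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(texts):
    """Map every token seen in the text set to a column index."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def tag_sample(text, vocab):
    """Count how often each vocabulary token appears in one sample."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[token]] = count
    return vector


# Made-up headline set.
headlines = ["Aliens land in Paris", "Markets rally in Asia"]
headline_vocab = build_vocabulary(headlines)

sample_vector = tag_sample("Aliens in Paris in spring", headline_vocab)
```

The same two functions could be applied to the body set with its own vocabulary, giving each sample one vector for its headline and one for its body.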

Poster - Link

Keep Exploring!!!
