"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;
Showing posts with label datascieneprojects. Show all posts

August 15, 2022

AI Projects - Ideas / Inspirations

Project #1 - Fashion Clothing Category Classification

Report - Link


Models - Link

Good start to work on fashion attributes

Project #2 - Time Series based Wikipedia Traffic prediction to aid Caching algorithms

Key Notes

  • Caching algorithms such as LRU and LFU are among the most widely used algorithms in the industry, in systems ranging from storage systems and in-memory key-value stores to routers
  • Unlike many other ML projects, such as image recognition, the performance of ML for caching is not measured against human-annotated ground truth but against LRU
  • LSTM and CNN based architectures with custom loss function
  • Develop a custom loss function by adding a recall-maximizing term to the binary cross-entropy, then tune the new hyper-parameter
  • Tune the loss function parameter to place a higher weight on positive samples
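
Since the notes above benchmark the learned caching policy against LRU, it helps to have an LRU hit rate to compare with. The following is a minimal sketch of such a baseline, not the project's code; the function name `lru_hit_rate` and the request trace are made up for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a request trace through an LRU cache and return the hit rate."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / len(requests) if requests else 0.0


# Hypothetical page-request trace, for illustration only.
trace = ["A", "B", "A", "C", "B", "D", "A"]
baseline_small = lru_hit_rate(trace, capacity=2)
baseline_large = lru_hit_rate(trace, capacity=3)
```

A learned policy would be replayed over the same trace and cache size, and its hit rate compared with this number.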

Loss functions - Link


From link

A custom loss function can be created by defining a function that takes the true values and predicted values as required parameters. The function should return an array of losses. The function can then be passed at the compile stage. 
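
As a rough illustration of that signature, here is a sketch of a binary cross-entropy that puts extra weight on positive samples. It is written with plain Python lists rather than Keras tensors so it runs without any framework, and it is not the project's actual loss. The name `weighted_bce` and the `pos_weight` hyper-parameter are assumptions made for this example.

```python
import math


def weighted_bce(y_true, y_pred, pos_weight=5.0, eps=1e-7):
    """Per-sample binary cross-entropy with extra weight on positive samples.

    pos_weight is a hypothetical hyper-parameter: values above 1 penalize
    missed positives (false negatives) more heavily, which favors recall.
    """
    losses = []
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
        losses.append(loss)
    return losses


# A missed positive costs pos_weight times more than an equally wrong negative.
missed_positive, false_alarm = weighted_bce([1, 0], [0.1, 0.9])
```

In Keras, the same idea would be expressed with tensor operations and the function passed as `model.compile(loss=...)`.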

Project #3 - Long Term Stock Prediction Based On Financial Statements

  • Feature engineering with key indicators, such as Price-to-Book Ratio, Price-to-Earnings Ratio, Debt-to-Equity Ratio
  • This project focuses on building an end-to-end LSTM model for long term stock prediction based on historical financial statements
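
The indicators named above can be derived from a few statement fields plus market data. This is a sketch using common textbook definitions, which may differ from the ones the project used (for example, debt-to-equity is sometimes computed from total debt instead of total liabilities). The function name and the sample numbers are hypothetical.

```python
def key_ratios(price, shares_outstanding, total_equity, net_income, total_liabilities):
    """Compute three valuation and leverage ratios from raw inputs."""
    market_cap = price * shares_outstanding
    return {
        "price_to_book": market_cap / total_equity,      # market value vs. book value
        "price_to_earnings": market_cap / net_income,    # same as price / earnings per share
        "debt_to_equity": total_liabilities / total_equity,
    }


# Made-up figures, for illustration only.
ratios = key_ratios(
    price=10.0,
    shares_outstanding=100,
    total_equity=500.0,
    net_income=50.0,
    total_liabilities=250.0,
)
```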

Dataset - Link

  • Balance sheet features, including 30 data fields: cash and cash equivalents, short-term investments, net receivables, inventory, other current assets, total current assets, long-term investments, fixed assets, goodwill, intangible assets, other assets, deferred asset charges, total assets, accounts payable, short-term debt / current portion of long-term debt, other current liabilities, total current liabilities, long-term debt, other liabilities, deferred liability charges, misc. stocks, minority interest, total liabilities, common stocks, capital surplus, retained earnings, treasury stock, other equity, total equity, total liabilities and equity
  • Income statement features, including 18 data fields: total revenue, cost of revenue, gross profit, research and development, sales general and admin., non-recurring items, other operating items, operating income, add’l income/expense items, earnings before interest and tax, interest expense, earnings before tax, income tax, minority interest, equity earnings/loss unconsolidated subsidiary, net income-cont. operations, net income, net income applicable to common shareholders.
  • Cash flow statement features, including 18 data fields: net income, depreciation, net income adjustments, accounts receivable, changes in inventories, other operating activities, liabilities, net cash flow-operating, capital expenditures, investments, other investing activities, net cash flows-investing, sale and purchase of stock, net borrowings, other financing activities, net cash flows-financing, effect of exchange rate, net cash flow.

Paper #4 - Stock Market Prediction using CNN and LSTM

  • Starting with a data set of 130 anonymous intra-day market features and trade returns
  • This study is based on a financial dataset extracted from the Jane Street Market Prediction competition on Kaggle [16]. The dataset is composed of 2,390,491 records, each defined by 130 anonymous features measured sequentially over 500 days at different time steps during each day.
  • Rolling cross validation
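
Rolling cross validation keeps every test block strictly after its training block, so no future data leaks into training. Below is a minimal sketch assuming an expanding training window; the paper may use a fixed-size sliding window instead. The function name and parameters are illustrative.

```python
def rolling_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices); each test block follows its train block."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size  # training window grows each fold
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


# 10 time-ordered samples, 3 folds, at least 4 samples to train on.
splits = list(rolling_splits(n_samples=10, n_folds=3, min_train=4))
```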

Predict bucket move: 5 / 10 / 15 / 20 / break

Project #5 - Film Success Prediction Using NLP Techniques

  • Our dataset may be separated into two major parts: a set of structured categorical and numerical data retrieved from IMDb, and a set of scripts from which we generate word frequency vectors and scene description vectors 
  • For the categorical structured data, we use a dense layer without bias to serve as a trainable embedding layer which returns a 128 dimensional embedding of the data.
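
A dense layer without bias acts as an embedding because multiplying a one-hot vector by the weight matrix simply selects one row of it. The sketch below shows this with plain Python lists. The 128 dimensions come from the note above, while the 5 categories, the random weights and the helper names are made up.

```python
import random


def dense_no_bias(x, weights):
    """Compute x @ weights for one input row, with no bias term."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]


def one_hot(index, size):
    """Return a one-hot vector with a 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


random.seed(0)
n_categories, embed_dim = 5, 128  # 5 categories is an arbitrary choice
weights = [[random.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(n_categories)]

# The layer output for category 3 is row 3 of the weight matrix.
embedding = dense_no_bias(one_hot(3, n_categories), weights)
```

Because the rows are ordinary trainable weights, they are learned along with the rest of the network.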

Paper #6 - Generating Six-Word Stories

  • The six-word story is a format of flash storytelling that rose to popularity through the famous tale allegedly written by Ernest Hemingway
  • PRAW (the Python Reddit API Wrapper)
  • Query data from the r/sixwordstories subreddit. Use this to generate meaningful tweets
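
Posts pulled from the subreddit will not all be exactly six words long, so a filtering step is likely needed before training. This sketch assumes the post titles have already been fetched (for instance with PRAW) and only shows the filter. The helper name and the sample titles are hypothetical.

```python
import re


def is_six_word_story(title):
    """Return True if the title has exactly six whitespace-separated words."""
    # Ignore tokens that are pure punctuation, such as a stray dash.
    words = [w for w in title.split() if re.search(r"\w", w)]
    return len(words) == 6


# Hypothetical post titles standing in for results fetched from Reddit.
titles = [
    "For sale: baby shoes, never worn.",
    "This title has far too many words to qualify.",
    "Too short.",
]
stories = [t for t in titles if is_six_word_story(t)]
```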

Project #7 - Changing people’s hair color in images




  • The training set is split into two sets trainA and trainB. 
  • The images from trainB are presented to the discriminator with their actual hair color. 
  • Images from trainA are given as input to the generator along with a target hair color that is randomly sampled from all the hair colors occurring in trainB.
  • The discriminator tries to classify images from trainB (labeled with their actual hair color) as 1 and generated images (labeled with the target hair color) as 0, whereas the generator tries to fool the discriminator with realistic generated images. 
  • To succeed at this, the generator has to match the target hair color in the generated image.
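
The data flow described above can be sketched without any neural network code. The example below only builds the labeled batch the discriminator would see. The generator is replaced by a placeholder function, and the assumption that target colors are drawn uniformly from the distinct colors in trainB is mine; all names and sample data are hypothetical.

```python
import random


def sample_targets(train_a, train_b_colors, rng):
    """Pair each trainA image with a target color drawn from the colors seen in trainB."""
    palette = sorted(set(train_b_colors))
    return [(image, rng.choice(palette)) for image in train_a]


def discriminator_batch(train_b, generated):
    """Label real (image, color) pairs as 1 and generated pairs as 0."""
    real = [(image, color, 1) for image, color in train_b]
    fake = [(image, color, 0) for image, color in generated]
    return real + fake


def placeholder_generator(image, target_color):
    # Stand-in for the real generator network.
    return f"{image}->{target_color}"


rng = random.Random(0)
train_a = ["a1", "a2"]
train_b = [("b1", "blond"), ("b2", "black"), ("b3", "blond")]

targets = sample_targets(train_a, [color for _, color in train_b], rng)
generated = [(placeholder_generator(img, color), color) for img, color in targets]
batch = discriminator_batch(train_b, generated)
```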

Code - Link

Project #8 - Detect Depression



Project #9 - Predict loan default


Project #10 - Link Prediction with Graph Neural Networks and Knowledge Extraction

  • Graph Neural Network: the number of GNN layers is limited because of Laplacian smoothing (stacking many layers makes node representations increasingly similar)

  • Knowledge Extraction: We use BERN [4] to extract named entities from the abstract of each article.
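
The smoothing effect that limits GNN depth can be seen on a toy graph. The sketch below uses plain mean aggregation over each node and its neighbors, without learned weights or nonlinearities, as a simplified stand-in for a GNN layer. The graph and features are made up.

```python
def propagate(adjacency, features):
    """One smoothing step: average each node's feature with its neighbors' features."""
    n = len(adjacency)
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if adjacency[i][j] or j == i]
        out.append(sum(features[j] for j in neighbors) / len(neighbors))
    return out


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Hypothetical 4-node path graph 0-1-2-3 with one scalar feature per node.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [1.0, 0.0, 0.0, -1.0]

history = [spread(features)]
for _ in range(20):
    features = propagate(adjacency, features)
    history.append(spread(features))
```

The spread shrinks with every step, so after many layers the nodes become nearly indistinguishable, which hurts tasks such as link prediction.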

Project #11 - Finding a hairstyle that fits your facial features

Tried Adam and RMSProp, and ended up using the AdamOptimizer described in the lecture

More Reads

Keep Exploring!!!

August 10, 2022

Day 3 - Projects - Features - Ideas

Project summary list - 

Project #1 - A Machine Learning Approach to Assess Education Policies in Brazil

Report - Link

  • Regression model to predict current quality index of schools
  • Clustering model to identify groups of schools with similar profiles
  • Classification model to predict goal achievement in schools
  • Key Features
  • Spending on transportation for students
  • Spending on food for students and workers
  • Spending on construction and maintenance of schools
  • Spending on salaries of school employees
  • Number of students
  • Number of professors separated by level of education
  • Number of laboratories, computers and offices
  • School performance according to Ideb in 2013, 2015 and 2017
  • School goal for Ideb in 2013, 2015 and 2017

My Observations - We could apply the same approach to Indian schools. Based on publicly available data, we can predict dropouts, target aid, and identify poor performers proactively

Project #2 - Fraud detection using Machine Learning

Dataset

  • PaySim - a Kaggle dataset for fraud detection
  • 6 million + mobile payment transactions
  • 6 different categories of transactions
  • 8312 fraudulent transactions

Class weight-based approach

  • In a fraud detection system, it’s more critical to correctly detect fraudulent transactions, and it is acceptable to misclassify a certain number of non-fraud transactions.
  • Penalize misclassification of fraud transactions more than non-fraud transactions
  • Assign higher weights to fraud class to obtain high recall on that class and counter data imbalance.
  • Ensure no more than ~1% false positives
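
One common way to pick such class weights is the "balanced" heuristic, which weights each class by n_samples / (n_classes * class_count). The notes do not say which weights the project used, so this is only a sketch. The toy labels are made up and far less imbalanced than the real data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Sum of class weights over misclassified samples."""
    return sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)


# Toy labels (1 = fraud), for illustration only.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)

# One missed fraud plus one false alarm.
cost = weighted_error([1, 0], [0, 1], weights)
```

With these weights a missed fraud costs far more than a false alarm, which pushes the model toward higher recall on the fraud class.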

Project #3 - Image Super-Resolution Via a Convolutional Neural Network

Key Notes

  • SRCNN comprises three convolutional blocks corresponding to patch extraction, non-linear mapping, and reconstruction

  • SRCNN surpasses non-neural methods for the task of super-resolution

Image Super-Resolution using an Efficient Sub-Pixel CNN

Using The Super Resolution Convolutional Neural Network for Image Restoration

Project #4 - Explore Co-clustering on Job Applications

Kaggle Job Recommendation Challenge

  • ~ 1.6m unique job applications
  • ~ 360k unique jobs
  • ~ 320k unique job applicants


  • Report - Explore Co-clustering on Job Applications

Project #5 - FAKE NEWS IDENTIFICATION

Key Notes

  1. Tokenize the body and headline with the Punkt sentence tokenizer from the NLTK NLP library. This tokenizer uses an unsupervised machine learning algorithm pre-trained on a general English corpus, and can distinguish sentence-ending punctuation from other punctuation marks, as well as the position of words in a sentence.
  2. Tokenize words with our algorithm, and take care of lemmatization.
  3. Tag each sample with the tokens obtained from entire headline set, and body set.
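
One reading of step 3 is a bag-of-words encoding: build a vocabulary from the whole headline (or body) set, then represent each sample by its token counts over that vocabulary. The sketch below follows that reading, which is an assumption. It uses a naive regular-expression tokenizer as a stand-in for Punkt and skips lemmatization. All names and sample headlines are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive lowercase word tokenizer, a stand-in for the NLTK pipeline."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Collect the sorted set of tokens seen across all documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def tag_sample(text, vocabulary):
    """Return token counts for one sample, aligned with the vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[token] for token in vocabulary]


headlines = ["Aliens land in Paris", "Paris mayor denies aliens"]
vocab = build_vocabulary(headlines)
vector = tag_sample("Aliens, aliens everywhere in Paris", vocab)
```

Tokens that never appeared in the headline set, such as "everywhere" here, are simply ignored.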

Poster - Link

Keep Exploring!!!

AI Projects - Inspirations - Notes - Post 2 (10 Projects)

Project #1 - Music Recommendation

Key Observations

  • Free and customizable recommendations
  • Dataset - Spotify Million Playlist Dataset
  • Collaborative filtering model
  • Two-stage nearest neighbourhood model

  • Popularity, Energy 
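
The notes do not spell out the two stages of the nearest-neighbour model. One plausible reading, sketched below as an assumption, is to first find the playlists most similar to a seed set of tracks, then rank the tracks in those playlists by how often they occur. The function names, the use of Jaccard similarity and the sample playlists are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(seed_tracks, playlists, k=2, n=3):
    """Recommend up to n tracks based on the k playlists closest to the seed."""
    seed = set(seed_tracks)
    # Stage 1: find the k playlists most similar to the seed.
    neighbors = sorted(playlists, key=lambda p: jaccard(seed, set(p)), reverse=True)[:k]
    # Stage 2: rank unseen tracks by how many neighbor playlists contain them.
    votes = Counter()
    for playlist in neighbors:
        for track in dict.fromkeys(playlist):  # de-duplicate, keep order
            if track not in seed:
                votes[track] += 1
    return [track for track, _ in votes.most_common(n)]


playlists = [["a", "b", "c"], ["a", "b", "c", "d"], ["x", "y", "z"]]
suggestions = recommend(["a", "b"], playlists, k=2, n=2)
```

Audio attributes such as popularity and energy could be used to re-rank the candidates from the second stage.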



Project #2 - LegalEgo

Key Observations

  • Contract Review documents
  • Automated annotations

  • Output labels and annotations

Project #3 - Autosuggest

Key Observations

  • Articles that support our claim



Project #4 - Cryptocurrency prediction

Key Observations





Project #5 - Fashion Recommendation

Key Observations




Project #6 - Instaavatar

Key Observations

  • Random Avatars


Project #7 - Image to Calories detection

Key Observations



Project #8 - Augmented Image Search

Key Observations


Project #9 - Product search

Key Observation



Project #10 - Sign language

Key Observation


Keep Exploring!!!










August 09, 2022

AI Projects - Inspirations - Notes - Post 1 (10 Projects)

Project #1 - Generate NFT style images

Key observations

  • Generate images
  • Convex combination
  • Image denoising

  • Architecture

Project #2 - Emoji generation by text

Key observations

  • Emoji adoption is rising
  • Impact of Emoji

  • Architecture

Project #3 - Trading by breaking news

Key Observations

  • Collect headlines
  • Analyze and inform
  • Summary and Stock opinion
  • Bearish / Bullish



  • Keywords, similar articles


Project #4 - Vision powered cooking items

Key Observations

  • Use vision to map ingredients to the recipe
  • The key step is detecting vegetables and other ingredients

Project #5 - Misinformation detection

Key Observations

  • Auto detect misinformation
  • Confirm or dispute info comparing sources


  • Prediction and Explanation of it

Project #6 - Edit audio by editing text

Key Observations

  • Audio corrections
  • Edit and remove and retain only needed items


Project #7 - Schedule power and reduce consumption

Key Observations

  • Reduce emissions / Track by activity
  • Retrain every hour
  • Hot swap models




Project #8 - Political bias detection

Key Observations

  • Stay informed in a fair way
  • Classify biased news sources
  • Highlight bias in news sources
  • Polarity chrome extension to detect biases 
  • Highlight portions that contribute to bias


Project #9 - Photo app to summarize

Key Observations

  • Multiple instances of pics
  • Clusters images into sensible scene classes
  • Rank pics by the aesthetic quality
  • Clustering and quality assessment seamlessly


Project #10 - 911 operator assistant

Key Observations

  • Respond faster
  • NE, Emergency detection
  • Nearest Help
  • Google Maps API
  • Closest route



Stay Tuned for Part II from 1hr 20 mins

Keep Collecting inspiration!!!