"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 15, 2022

AI Projects - Ideas / Inspirations

Project #1 - Fashion Clothing Category Classification

Report - Link


Models - Link

A good starting point for working on fashion attributes

Project #2 - Time Series based Wikipedia Traffic prediction to aid Caching algorithms

Key Notes

  • Caching algorithms such as LRU and LFU are among the most widely used algorithms in industry, in systems ranging from storage systems and in-memory key-value stores to routers
  • Unlike many other ML projects, such as image recognition, the performance of ML for caching is measured not against human-annotated ground truth but against LRU
  • LSTM- and CNN-based architectures with a custom loss function
  • Develop a custom loss function by adding a recall-maximizing term to the binary cross-entropy, then tune the new hyper-parameter
  • Tune the loss function parameter to place a higher weight on positive samples. The custom loss function is as shown below
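
The notes do not give the exact form of the recall term, so the following is only a minimal sketch of one common way to weight positive samples more heavily in binary cross-entropy. The function name weighted_bce and the parameter pos_weight are assumptions for illustration, not names from the project.

```python
import math


def weighted_bce(y_true, y_pred, pos_weight=2.0, eps=1e-7):
    """Per-sample binary cross-entropy with extra weight on positives.

    pos_weight > 1 penalises missed positives (false negatives) more,
    which pushes the model towards higher recall. pos_weight is the
    hyper-parameter to tune (hypothetical name).
    """
    losses = []
    for t, p in zip(y_true, y_pred):
        # Clip predictions so log() never sees 0 or 1 exactly.
        p = min(max(p, eps), 1.0 - eps)
        loss = -(pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p))
        losses.append(loss)
    return losses


# A missed positive costs pos_weight times more than a missed negative.
missed_positive = weighted_bce([1], [0.1], pos_weight=3.0)[0]
missed_negative = weighted_bce([0], [0.9], pos_weight=3.0)[0]
```

With pos_weight = 1.0 this reduces to plain binary cross-entropy, so the plain loss is a natural starting point when tuning.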

Loss functions - Link


From link

A custom loss function can be created by defining a function that takes the true values and predicted values as required parameters. The function should return an array of losses. The function can then be passed at the compile stage. 
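
Since the key notes for this project measure ML-based caching against LRU rather than against labelled ground truth, a baseline hit rate is needed for any comparison. A minimal sketch of an LRU simulation over a request trace follows; the function name lru_hit_rate and the sample trace are assumptions for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a request trace through an LRU cache and return the hit rate."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(requests) if requests else 0.0


# Hypothetical trace of page requests.
trace = ["a", "b", "a", "c", "b", "d", "a"]
baseline = lru_hit_rate(trace, capacity=2)
```

A learned caching policy would be replayed over the same trace and judged by whether its hit rate beats this baseline.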

Project #3 - Long Term Stock Prediction Based On Financial Statements

  • Feature engineering with key indicators, such as Price-to-Book Ratio, Price-to-Earnings Ratio, Debt-to-Equity Ratio
  • This project focuses on building an end-to-end LSTM model for long term stock prediction based on historical financial statements
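
The key indicators above can be derived from a handful of statement fields plus the market capitalisation. The sketch below uses the standard textbook definitions; the class and field names are assumptions, and market capitalisation has to come from price data because it is not part of the financial statements.

```python
from dataclasses import dataclass


@dataclass
class Fundamentals:
    market_cap: float         # share price x shares outstanding (assumed input)
    net_income: float         # from the income statement
    total_equity: float       # from the balance sheet
    total_liabilities: float  # from the balance sheet


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def key_indicators(f):
    """Compute the three ratios named in the notes."""
    return {
        "price_to_earnings": safe_ratio(f.market_cap, f.net_income),
        "price_to_book": safe_ratio(f.market_cap, f.total_equity),
        "debt_to_equity": safe_ratio(f.total_liabilities, f.total_equity),
    }


# Made-up numbers, for illustration only.
example = key_indicators(Fundamentals(1000.0, 50.0, 400.0, 600.0))
```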

Dataset - Link

  • Balance sheet features, including 30 data fields: cash and cash equivalents, short-term investments, net receivables, inventory, other current assets, total current assets, long-term investments, fixed assets, goodwill, intangible assets, other assets, deferred asset charges, total assets, accounts payable, short-term debt / current portion of long-term debt, other current liabilities, total current liabilities, long-term debt, other liabilities, deferred liability charges, misc. stocks, minority interest, total liabilities, common stocks, capital surplus, retained earnings, treasury stock, other equity, total equity, total liabilities and equity
  • Income statement features, including 18 data fields: total revenue, cost of revenue, gross profit, research and development, sales general and admin., non-recurring items, other operating items, operating income, add’l income/expense items, earnings before interest and tax, interest expense, earnings before tax, income tax, minority interest, equity earnings/loss unconsolidated subsidiary, net income-cont. operations, net income, net income applicable to common shareholders.
  • Cash flow statement features, including 18 data fields: net income, depreciation, net income adjustments, accounts receivable, changes in inventories, other operating activities, liabilities, net cash flow-operating, capital expenditures, investments, other investing activities, net cash flows-investing, sale and purchase of stock, net borrowings, other financing activities, net cash flows-financing, effect of exchange rate, net cash flow.

Paper #4 - Stock Market Prediction using CNN and LSTM

  • Starting with a data set of 130 anonymous intra-day market features and trade returns
  • This study is based on a financial dataset extracted from the Jane Street Market Prediction competition on Kaggle [16]. The dataset is composed of 2,390,491 records, each defined by 130 anonymous features measured sequentially over 500 days at different time steps during each day.
  • Rolling cross validation
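
Rolling cross validation keeps every test fold strictly after its training fold, so the model never sees the future. The notes do not say which variant the paper uses; the sketch below assumes a fixed-size training window that slides forward, and the function and parameter names are hypothetical.

```python
def rolling_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) for a sliding-window split.

    The training window always precedes the test window in time.
    step defaults to test_size, so test folds do not overlap.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        split = start + train_size
        yield list(range(start, split)), list(range(split, split + test_size))
        start += step


# Ten time-ordered samples, train on 4, test on the next 2.
folds = list(rolling_splits(10, train_size=4, test_size=2))
```

For day-based data such as the 500 trading days mentioned above, the indices would typically be days rather than individual records, so that one day is never split across train and test.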

Predict bucket move: 5 / 10 / 15 / 20 / break

Project #5 - Film Success Prediction Using NLP Techniques

  • Our dataset may be separated into two major parts: a set of structured categorical and numerical data retrieved from IMDb, and a set of scripts from which we generate word frequency vectors and scene description vectors 
  • For the categorical structured data, we use a dense layer without bias to serve as a trainable embedding layer which returns a 128 dimensional embedding of the data.
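
A dense layer without bias works as an embedding because multiplying a one-hot vector by the weight matrix simply selects one row of that matrix. The small sketch below shows this in plain Python; it uses 4 dimensions instead of the 128 in the notes to keep the demo readable, and all names are assumptions.

```python
import random


def one_hot(index, size):
    """Return a one-hot vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def dense_no_bias(x, weights):
    """Compute x @ weights with no bias term (weights is n_in x n_out)."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]


rng = random.Random(0)
n_categories, embed_dim = 5, 4
# In a real model these weights would be learned during training.
weights = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(n_categories)]

# The output for category 2 is exactly row 2 of the weight matrix.
embedding = dense_no_bias(one_hot(2, n_categories), weights)
```

Adding a bias would shift every category's vector by the same amount, which is why leaving it out makes the layer behave like a pure lookup table.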

Paper #6 - Generating Six-Word Stories

  • The six-word story is a format of flash storytelling that rose to popularity through the famous tale allegedly written by Ernest Hemingway
  • PRAW (the Python Reddit API Wrapper)
  • Query data from the r/sixwordstories subreddit
  • Use this to generate meaningful tweets
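
A minimal sketch of pulling stories with PRAW, assuming the story text is in the post title and that only titles of exactly six words should be kept. The credentials are placeholders you would get by registering a Reddit app, and the helper names are hypothetical.

```python
def is_six_words(text):
    """True when the text has exactly six whitespace-separated words."""
    return len(text.split()) == 6


def fetch_six_word_stories(client_id, client_secret, user_agent, limit=100):
    """Fetch top posts from r/sixwordstories and keep six-word titles.

    Needs the third-party praw package and valid Reddit API credentials.
    """
    import praw  # imported lazily so the filter above works without it

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    submissions = reddit.subreddit("sixwordstories").top(limit=limit)
    return [s.title for s in submissions if is_six_words(s.title)]


# The filter can be checked offline on the famous example.
classic = "For sale: baby shoes, never worn."
keep = is_six_words(classic)
```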

Project #7 - Changing people’s hair color in images




  • The training set is split into two sets trainA and trainB. 
  • The images from trainB are presented to the discriminator with their actual hair color. 
  • Images from trainA are given as input to the generator along with a target hair color that is randomly sampled from all the hair colors occurring in trainB.
  • The discriminator tries to classify images from trainB (labeled with their actual hair color) as 1 and generated images (labeled with the target hair color) as 0, whereas the generator tries to fool the discriminator with realistic generated images.
  • To succeed at this, the generator has to match the target hair color in the generated image.
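
The data flow in the steps above can be sketched without any deep learning library. The code below only shows how one discriminator batch is assembled; the generator is a stand-in function, the names are hypothetical, and sampling uniformly over the distinct colors in trainB is an assumption (sampling by frequency would also fit the description).

```python
import random


def sample_target_colors(train_b, n, rng):
    """Sample n target colors from the distinct hair colors seen in trainB."""
    palette = sorted({color for _, color in train_b})
    return [rng.choice(palette) for _ in range(n)]


def discriminator_batch(train_a, train_b, generator, rng):
    """Build (image, hair_color, label) triples for one discriminator step.

    train_a: list of images; train_b: list of (image, actual_color) pairs.
    Real trainB images get label 1, generated images get label 0.
    """
    real = [(image, color, 1) for image, color in train_b]
    targets = sample_target_colors(train_b, len(train_a), rng)
    fake = [(generator(image, target), target, 0)
            for image, target in zip(train_a, targets)]
    return real + fake


def fake_generator(image, color):
    """Placeholder for the real generator network."""
    return f"{image}_{color}"


batch = discriminator_batch(
    ["a1", "a2"],
    [("b1", "blond"), ("b2", "black")],
    fake_generator,
    random.Random(0),
)
```

Because a generated image is always paired with its target color, the discriminator can reject an image that looks realistic but has the wrong hair color, which is what forces the generator to match the target.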

Code - Link

Project #8 - Detect Depression



Project #9 - Predict loan default


Project #10 - Link Prediction with Graph Neural Networks and Knowledge Extraction

  • Graph Neural Network: the number of GNN layers is limited because of Laplacian smoothing (stacking many layers makes node representations increasingly similar)

  • Knowledge Extraction: We use BERN [4] to extract named entities from the abstract of each article.
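
The Laplacian smoothing effect mentioned above can be seen on a toy graph. The sketch below assumes the simplest propagation rule, where each node replaces its feature with the mean of itself and its neighbours, with no weights or non-linearity. The graph and the names are made up for illustration.

```python
def propagate(features, neighbours):
    """One smoothing step: average each node with its neighbours."""
    smoothed = []
    for node, nbrs in enumerate(neighbours):
        group = [node] + nbrs  # include the node itself (self-loop)
        smoothed.append(sum(features[j] for j in group) / len(group))
    return smoothed


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Path graph 0 - 1 - 2 - 3 with one scalar feature per node.
neighbours = [[1], [0, 2], [1, 3], [2]]
features = [0.0, 0.0, 1.0, 1.0]

spreads = [spread(features)]
for _ in range(20):
    features = propagate(features, neighbours)
    spreads.append(spread(features))
```

The spread shrinks towards zero as steps are stacked, meaning the nodes become hard to tell apart, which is the reason for keeping the number of layers small.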

Project #11 - Finding a hairstyle that fits your facial features

Compared Adam and RMSProp and ended up using the AdamOptimizer described in the lecture

More Reads

Keep Exploring!!!
