"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 20, 2023

Data vs Ideas vs Perspectives vs New Ideas Papers

  • Share your analysis, not the news; every paper has some perspective, so analyze it and connect it to past lessons
  • Data engineering has similarities to feature engineering
  • Feature engineering needs domain and data science lens
  • Data engineering needs ETL / ELT
  • Vector databases / multimodal models merge text, data, and audio into one representation
  • Data science has multiple focus areas: forecasting, regression, recommendations, anomaly detection
  • NLP covers NER, summarization, topic modeling, sentiment analysis
  • Vision has Classification, Segmentation, Object detection, Action recognition
  • 2015 to 2018 - Age of ML (Regression, SVM, Decision Trees, Random Forest)
  • 2015 to 2019 - CNN, RNN, LSTM
  • 2020-22 - Transformers, BERT
  • 2023 - LLM Models, ChatGPT

More new tech will keep coming; filter the signal from the noise.

Bard has rewritten the notes above with more content:

Data engineering and feature engineering are both important steps in the machine learning process. 

  • Data engineering is the process of collecting, cleaning, and organizing data so that it can be used for machine learning. 
  • Feature engineering is the process of transforming data into features that are useful for machine learning models. 

Both data engineering and feature engineering are essential for creating accurate and reliable machine learning models.

Feature engineering requires a deep understanding of the domain and the data science lens. 

  • The domain knowledge helps the feature engineer to understand the meaning of the data and how to transform it into features that are relevant to the problem at hand. 
  • The data science lens helps the feature engineer to understand the statistical properties of the data and how to transform it into features that are useful for machine learning models.
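Both lenses above can be illustrated with a small sketch. The record fields (`amount`, `timestamp`) and the specific transforms are illustrative assumptions, not a prescribed recipe:

```python
import math
from datetime import datetime

def engineer_features(txn):
    """Turn a raw transaction record into model-ready features.
    Field names and transforms here are illustrative assumptions."""
    ts = datetime.fromisoformat(txn["timestamp"])
    return {
        # domain lens: weekend activity often behaves differently
        "is_weekend": 1 if ts.weekday() >= 5 else 0,
        # data-science lens: log transform tames a skewed amount distribution
        "log_amount": math.log1p(txn["amount"]),
        "hour": ts.hour,
    }

raw = {"amount": 120.0, "timestamp": "2023-05-20T14:30:00"}
features = engineer_features(raw)
```

The domain knowledge motivates *which* features to build (weekend flag, hour of day); the statistical lens motivates *how* to transform them (log-scaling a skewed variable).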

Data engineering needs ETL (extract, transform, load) or ELT (extract, load, transform) processes

  • ETL or ELT processes are used to collect, clean, and organize data so that it can be used for machine learning. 
  • ETL processes typically involve extracting data from various sources, transforming it into a consistent format, and loading it into a data warehouse or data lake. 
  • ELT processes typically involve extracting data from various sources, loading it into a data warehouse or data lake, and then transforming it into a consistent format.
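The difference between the two orderings can be sketched in a few lines. This is a minimal in-memory toy where the "warehouse" is just a Python list; the function names and sample rows are made up for illustration:

```python
def extract():
    # pretend these raw rows come from two different source systems
    return [{"name": " Alice ", "age": "34"}, {"name": "BOB", "age": "29"}]

def transform(rows):
    # normalize everything into one consistent format
    return [{"name": r["name"].strip().title(), "age": int(r["age"])} for r in rows]

def etl(warehouse):
    # ETL: transform *before* loading into the warehouse
    warehouse.extend(transform(extract()))

def elt(lake):
    # ELT: load the raw rows first, transform later inside the store
    lake.extend(extract())
    lake[:] = transform(lake)

wh, lk = [], []
etl(wh)
elt(lk)
# both end with the same cleaned rows; only the order of steps differs
```

In real systems the choice matters because ELT keeps the raw data around for re-processing, while ETL loads only the cleaned form.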

Vector databases and multi-model databases are emerging technologies that can be used to store and process large amounts of data

  • Vector databases are designed to store and search vector embeddings, which can represent text, images, or audio. 
  • Multi-model databases are designed to store and process large amounts of data from a variety of sources, including text, audio, and video. 
  • These technologies can be used to improve the performance of machine learning models that are trained on large amounts of data.
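The core operation of a vector database is similarity search over embeddings. Below is a toy stand-in where the "index" is a plain dictionary and the embeddings are hand-made; production systems use approximate nearest-neighbor indexes over learned embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# toy "vector database": id -> embedding (values are illustrative)
index = {
    "doc_text": [0.9, 0.1, 0.0],
    "doc_audio": [0.1, 0.9, 0.2],
    "doc_image": [0.0, 0.2, 0.9],
}

def search(query, k=1):
    # rank stored vectors by similarity to the query embedding
    ranked = sorted(index, key=lambda i: cosine(index[i], query), reverse=True)
    return ranked[:k]

best = search([0.8, 0.2, 0.1])
```

Because text, audio, and images can all be embedded into the same vector space, one search operation serves all modalities, which is the "merge into one form" idea from the notes above.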

Machine Learning has multiple areas of focus, including forecasting, regression, recommendations, and anomaly detection

  • Forecasting is the process of predicting future values of a variable. 
  • Regression is the process of finding a relationship between two or more variables. 
  • Recommendations are the process of suggesting items to users based on their past behavior. 
  • Anomaly detection is the process of identifying unusual or unexpected events. 

These areas of focus are all important for data scientists who are working to solve real-world problems.
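As a concrete taste of one of these areas, anomaly detection in its simplest form can be a z-score test: flag any point that sits too many standard deviations from the mean. The threshold and sample readings below are illustrative:

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

readings = [10, 11, 9, 10, 12, 11, 10, 95]  # 95 is the injected outlier
outliers = zscore_anomalies(readings, threshold=2.0)
```

Real anomaly detectors (isolation forests, autoencoders) handle multivariate and seasonal data, but the underlying question is the same: how far is this point from what is normal?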

Natural language processing (NLP) is a field of computer science that deals with the interaction between computers and human (natural) languages. 

  • NLP has a variety of applications, including text classification, sentiment analysis, and summarization. 
  • Text classification is the process of assigning a category to a piece of text. 
  • Sentiment analysis is the process of determining the sentiment of a piece of text, such as whether it is positive, negative, or neutral. 
  • Summarization is the process of creating a shorter version of a piece of text that retains the most important information.
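Sentiment analysis, in its most basic lexicon-based form, just counts positive and negative words. The word lists below are a made-up illustration; real systems use trained models rather than hand-written lexicons:

```python
# Tiny lexicon-based sentiment scorer (word lists are illustrative).
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def sentiment(text):
    """Label text positive, negative, or neutral by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = sentiment("The summary feature is great and I love it")
```

The same classify-a-piece-of-text shape underlies text classification in general; only the label set and the model change.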

Computer vision is a field of computer science that deals with the extraction of meaningful information from digital images or videos

  • Computer vision has a variety of applications, including image classification, object detection, and action recognition. 
  • Image classification is the process of assigning a category to an image. 
  • Object detection is the process of identifying objects in an image. 
  • Action recognition is the process of identifying actions in a video.
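Image classification can be sketched without any deep learning at all: represent each image as a vector of pixels and assign it to the nearest class centroid. The 2x2 "images" and centroid values below are toy assumptions; real pipelines use CNNs or vision transformers over full-size images:

```python
def flatten(img):
    # image as a 2D grid of grayscale pixels -> flat feature vector
    return [p for row in img for p in row]

def distance(a, b):
    # squared Euclidean distance between two pixel vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

# class "centroids" assumed to be learned elsewhere (values illustrative)
CENTROIDS = {
    "bright": [0.9, 0.9, 0.9, 0.9],
    "dark": [0.1, 0.1, 0.1, 0.1],
}

def classify(img):
    """Assign the image to the class with the nearest centroid."""
    pixels = flatten(img)
    return min(CENTROIDS, key=lambda c: distance(CENTROIDS[c], pixels))

label = classify([[0.8, 0.95], [0.85, 0.9]])
```

A CNN replaces the hand-picked centroids with learned filters, but the output step is the same: map pixels to the most likely category.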

The field of machine learning has seen rapid progress in recent years. 

  • In the early 2010s, machine learning was primarily used for regression and classification tasks. 
  • In the mid-2010s, deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), were developed and began to be used for a wider range of tasks, such as image classification and natural language processing. 
  • In the late 2010s and early 2020s, even more powerful deep learning techniques, such as transformers, were developed and began to be used for a wider range of tasks, such as machine translation and text summarization.

The field of machine learning is constantly evolving and new technologies are emerging all the time. 

It is important for data scientists to stay up-to-date on the latest trends in machine learning so that they can use the most effective techniques for solving real-world problems.

Keep Exploring!!!

