Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Next Paper Read

November 06, 2020

Next Paper Read - Docker, RDBMS to ML

Paper #1 - An introduction to Docker for reproducible research

Key Notes

Docker provides a binary image in which all the software has already been installed, configured and tested

Technical Issues in Software Deployment

Software Dependency Hell
Imprecise documentation

Docker Features

Operating system (OS) level virtualization based on Linux containers (LXC)
Portable deployment of containers across platforms, with component reuse
Versioning of container images
Docker containers share the Linux kernel with the host machine
Sharing the kernel makes Docker much more lightweight and higher performing than full virtual machines

Components

Dockerfiles provide a simple script (similar to a Makefile) that defines exactly how to build up the image
Docker also supports Automated Builds through the Docker Hub (hub.docker.com).
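
To make the Dockerfile idea concrete, here is a minimal sketch that renders a small Dockerfile as text from Python. The helper name render_dockerfile, the base image tag, the pinned package and the script name are all assumptions chosen for illustration; they do not come from the paper.

```python
# Hypothetical helper (not from the paper): renders a minimal Dockerfile as text.
def render_dockerfile(base_image: str, packages: list[str], script: str) -> str:
    lines = [
        f"FROM {base_image}",                                    # base image to start from
        "RUN pip install --no-cache-dir " + " ".join(packages),  # pin dependencies
        f"COPY {script} /work/{script}",                         # add the analysis code
        "WORKDIR /work",
        f'CMD ["python", "{script}"]',                           # default command
    ]
    return "\n".join(lines) + "\n"


# Assumed values for illustration only.
dockerfile = render_dockerfile("python:3.11-slim", ["numpy==1.26.4"], "analysis.py")
print(dockerfile)
```

Pinning the base image tag and the package versions is what lets the same image be rebuilt later, which is the reproducibility point the paper makes.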

Paper #2 - The Relational Data Borg is Learning

Key Notes

RDBMS in Data Science
Widespread need for efficient data processing
Process beyond classical database workloads
According to the survey cited in the paper, 65% of data is relational. Retail has the most structured data :)

Automated Feature Learning Approach

Key Features for Retail Stores

Items in stores
Store information
Demographics for areas around the stores
Inventory units for items in stores on particular dates
Weather Information

Queries based on Filters

Feature extraction query that joins these relations on keys for dates, locations, zipcode, and items
LMFAO (Layered Multiple Functional Aggregates Optimisation)
PCA over relational data
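
A feature extraction query of the kind described above could look roughly like the sketch below, using Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical; the paper's actual schema may differ.

```python
import sqlite3

# In-memory database with a hypothetical retail schema (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items(item_id INTEGER PRIMARY KEY, category TEXT, price REAL);
CREATE TABLE stores(store_id INTEGER PRIMARY KEY, zipcode TEXT);
CREATE TABLE demographics(zipcode TEXT PRIMARY KEY, population INTEGER);
CREATE TABLE weather(store_id INTEGER, date TEXT, max_temp REAL);
CREATE TABLE inventory(store_id INTEGER, item_id INTEGER, date TEXT, units INTEGER);

INSERT INTO items VALUES (1, 'dairy', 1.5), (2, 'bakery', 2.0);
INSERT INTO stores VALUES (10, '90001'), (11, '90002');
INSERT INTO demographics VALUES ('90001', 50000), ('90002', 30000);
INSERT INTO weather VALUES (10, '2020-11-01', 20.0), (11, '2020-11-01', 18.0),
                           (10, '2020-11-02', 22.0);
INSERT INTO inventory VALUES (10, 1, '2020-11-01', 5), (10, 2, '2020-11-01', 3),
                             (11, 1, '2020-11-01', 7), (10, 1, '2020-11-02', 4);
""")

# Join the relations on their keys: item, store, zipcode, and (store, date).
feature_query = """
SELECT inv.date, s.store_id, i.category, i.price, d.population, w.max_temp, inv.units
FROM inventory AS inv
JOIN items AS i ON i.item_id = inv.item_id
JOIN stores AS s ON s.store_id = inv.store_id
JOIN demographics AS d ON d.zipcode = s.zipcode
JOIN weather AS w ON w.store_id = inv.store_id AND w.date = inv.date
ORDER BY inv.date, s.store_id, i.item_id
"""
rows = conn.execute(feature_query).fetchall()
for row in rows:
    print(row)
```

Each output row is one training example: the features come from several relations, and the inventory units act as the label.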

Insights

Running aggregates over days, weeks, months; min, max, average, median aggregates, or aggregates over many-to-many relationships and categorical attributes
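
As a rough illustration of such aggregates, the sketch below computes per-month min, max and average in SQL with sqlite3, and the median in Python because SQLite has no built-in median aggregate. The inventory table and its rows are made up for the example.

```python
import sqlite3
from statistics import median

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory(item_id INTEGER, date TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [
        (1, "2020-10-05", 4), (1, "2020-10-19", 8),
        (1, "2020-11-02", 6), (1, "2020-11-09", 2), (1, "2020-11-16", 10),
    ],
)

# Min, max and average units per month, computed inside the database.
monthly = conn.execute("""
SELECT strftime('%Y-%m', date) AS month, MIN(units), MAX(units), AVG(units)
FROM inventory
GROUP BY month
ORDER BY month
""").fetchall()

# Median per month, computed in Python from the raw rows.
units_by_month = {}
for month, units in conn.execute(
    "SELECT strftime('%Y-%m', date), units FROM inventory"
):
    units_by_month.setdefault(month, []).append(units)
medians = {month: median(values) for month, values in units_by_month.items()}

print(monthly)
print(medians)
```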

ML Tasks

One-hot encoded categorical attributes
New database workload motivated by a machine learning application
Similar aggregates are derived for k-means clustering
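
One way to see why one-hot encoded categorical attributes fit a database workload: summing a value over the one-hot columns gives the same numbers as a GROUP BY on the category, without ever materialising the one-hot matrix. The sketch below checks this on made-up data; the table and column names are assumptions.

```python
import sqlite3

# Made-up (category, units) rows.
rows = [("dairy", 5.0), ("bakery", 3.0), ("dairy", 7.0), ("produce", 4.0)]
categories = sorted({category for category, _ in rows})


def one_hot(category):
    """Explicit one-hot vector over the sorted category list."""
    return [1.0 if category == k else 0.0 for k in categories]


# Dense route: accumulate sum(one_hot(category) * units) column by column.
dense_xy = [0.0] * len(categories)
for category, units in rows:
    for j, value in enumerate(one_hot(category)):
        dense_xy[j] += value * units

# Database route: the same numbers fall out of a group-by aggregate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales(category TEXT, units REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sparse_xy = dict(
    conn.execute("SELECT category, SUM(units) FROM sales GROUP BY category")
)

print(dense_xy)
print(sparse_xy)
```

The group-by result only stores one entry per category that actually occurs, which is why it scales better than a wide one-hot matrix.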

IFAQ (Iterative Functional Aggregate Queries) Framework

IFAQ can automatically synthesise and optimise aggregates from ML+DB workloads

Key Insights / Lessons

Turn the learning problem into a database problem.
Exploit the problem structure to lower the complexity.
Generate optimised code to lower the constant factors.
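
A small sketch of the first lesson, under simplifying assumptions: for a one-feature linear regression, the model parameters depend on the data only through a handful of aggregates (count and sums), so those can be computed by the database directly over the join. This is not the paper's system, just the underlying idea; the schema and data are made up so that units = 2 * price + 1 exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items(item_id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE inventory(item_id INTEGER, units REAL);
INSERT INTO items VALUES (1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0);
INSERT INTO inventory VALUES (1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0);
""")

# Sufficient statistics for least squares, computed as aggregates over the join.
n, sx, sy, sxx, sxy = conn.execute("""
SELECT COUNT(*), SUM(i.price), SUM(v.units),
       SUM(i.price * i.price), SUM(i.price * v.units)
FROM inventory AS v
JOIN items AS i ON i.item_id = v.item_id
""").fetchone()

# Closed-form solution of the normal equations for units = intercept + slope * price.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

print(slope, intercept)
```

The training step never sees the joined rows, only five numbers, which is what makes it possible to push the work into the database.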

There is no Data Science without Database - RDBMS :) :)

Happy Learning!!!
