Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Why RDBMS is more CA not CP

January 20, 2023

Why RDBMS is more CA not CP

The CAP theorem states that a distributed data store can guarantee at most two of these three properties: Consistency, Availability, and Partition tolerance.

My perspective with some context

Example: why CA is more important than P for online banking

We cannot allow a deduction against an incorrect amount
Every deduction has to be based on the most recently committed data
It has to happen in real time
Data may be partitioned by region, base account, or branch to reduce the size of each database
Reads may be routed to another system while writes go to the target (main) system, to avoid overhead
Ideally, there is one main system and a backup copy system
The main property met here is consistency (you always see your most recent information)
In this OLTP-style setup, the data is mainly Consistent and Available
This database may not be available across multiple regions unless you are an international customer
In this setup, you can always achieve consistency and availability. But when the same person goes to another place or country, network latency may come into effect. We cannot replicate the same copy in real time if there are data constraints
Consider what is mainly met. You can still achieve partition tolerance with BCP (business continuity planning), replication, or other options, but everything you add to the system carries overhead
Latency also depends on storage type, query conditions, and indexes
Two-phase commit or write-ahead logging can be used to recover and retry when an operation times out
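The first two points (no deduction against an incorrect amount, every deduction based on recently committed data) can be sketched on a single node with Python's built-in sqlite3 module. This is a minimal sketch, assuming a hypothetical `accounts` table and hypothetical helpers `open_bank` and `debit`. `BEGIN IMMEDIATE` takes SQLite's write lock before the balance is read, a form of pessimistic locking, so no other writer can change the balance between the check and the update.

```python
import sqlite3


def open_bank(db_path=":memory:"):
    # isolation_level=None: autocommit mode, so we control BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # hypothetical schema
    )
    return conn


def debit(conn, account_id, amount):
    """Deduct amount only if the committed balance covers it; return the new balance."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise KeyError(account_id)
        if row[0] < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute("COMMIT")
        return row[0] - amount
    except Exception:
        conn.execute("ROLLBACK")  # leave the committed balance untouched
        raise


if __name__ == "__main__":
    conn = open_bank()
    conn.execute("INSERT INTO accounts VALUES ('A', 100)")
    print(debit(conn, "A", 30))  # prints 70
```

A failed debit rolls back, so the balance a later reader sees is always a committed one.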

A CA database can be built as a relational database (e.g., PostgreSQL) deployed to multiple nodes using replication, but CA systems are usually single-node systems. Databases that adhere to ACID properties focus on consistency and represent the traditional approach.
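To make "deployed to multiple nodes using replication" concrete, here is a toy sketch of log-based replication: the primary appends each change to an ordered log, and a replica replays that log in order. The classes `Primary` and `Replica` and the in-memory log are hypothetical; real systems such as PostgreSQL ship their write-ahead log instead. The `max_entries` parameter is an assumption used only to simulate replication lag, which shows how a replica that has not caught up serves stale reads.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Primary node: applies each write, then appends it to an ordered change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # entries are (lsn, key, value)

    def write(self, key, value):
        lsn = len(self.log) + 1  # log sequence number, starting at 1
        self.data[key] = value
        self.log.append((lsn, key, value))
        return lsn


@dataclass
class Replica:
    """Replica node: replays the primary's log entries strictly in order."""
    data: dict = field(default_factory=dict)
    applied_lsn: int = 0  # last log entry this replica has applied

    def catch_up(self, primary, max_entries=None):
        # Entry with lsn n sits at index n - 1, so everything from index
        # applied_lsn onward has not been replayed yet.
        pending = primary.log[self.applied_lsn:]
        if max_entries is not None:
            pending = pending[:max_entries]  # simulate replication lag
        for lsn, key, value in pending:
            self.data[key] = value
            self.applied_lsn = lsn
        return self.applied_lsn

    def read(self, key):
        return self.data.get(key)


if __name__ == "__main__":
    primary, replica = Primary(), Replica()
    primary.write("acct:A", 100)
    primary.write("acct:A", 70)  # a deduction of 30
    replica.catch_up(primary, max_entries=1)
    print(replica.read("acct:A"))  # prints 100: stale, the replica is behind
    replica.catch_up(primary)
    print(replica.read("acct:A"))  # prints 70: caught up
```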

These are all points a CA system needs to consider before getting into 'P'. When you copy or divide data and store it, you are accountable for managing all of it: how recent each copy is, and what to do when a copy is not available.
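The two questions above (how recent is each copy, and what to do when one is unreachable) can be turned into an explicit read-routing policy. This is a minimal sketch, assuming staleness is measured as the number of log entries a copy is behind the primary; `Copy`, `choose_read_copy`, and `max_lag` are hypothetical names. The policy prefers a fresh replica, falls back to the primary, and refuses when neither is usable.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of the data (the primary or a replica)."""
    name: str
    applied_lsn: int        # how far this copy has replayed the change log
    available: bool = True  # False if the copy cannot be reached right now


def choose_read_copy(primary, replicas, max_lag):
    """Return the copy to read from.

    Prefer a reachable replica that is at most max_lag log entries behind the
    primary; otherwise fall back to the primary; if the primary is also
    unreachable, refuse rather than serve data of unknown freshness.
    """
    fresh = [
        r for r in replicas
        if r.available and primary.applied_lsn - r.applied_lsn <= max_lag
    ]
    if fresh:
        return max(fresh, key=lambda r: r.applied_lsn)  # most recent replica
    if primary.available:
        return primary
    raise RuntimeError("no copy is both reachable and recent enough")


if __name__ == "__main__":
    primary = Copy("primary", applied_lsn=10)
    replicas = [Copy("r1", applied_lsn=9), Copy("r2", applied_lsn=5)]
    print(choose_read_copy(primary, replicas, max_lag=2).name)  # prints r1
    replicas[0].available = False
    print(choose_read_copy(primary, replicas, max_lag=2).name)  # prints primary
```

Raising instead of silently reading `r2` is the consistency-first choice; an availability-first system would return the stale copy.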

In practice, a distributed system always needs to be partition tolerant, which leaves us to choose between Consistency and Availability when a partition occurs. Hence, there is a trade-off between consistency and availability.
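The trade-off shows up at the moment a node is cut off. This is a simplified sketch, assuming a majority-based cluster; `Node`, `in_majority`, and `read` are hypothetical names. Under the CP policy a node on the minority side refuses to answer, while under the AP policy it answers with what it has, possibly stale. A real CP store would also read from that majority (for example through quorum reads or a consensus leader) to guarantee the latest value.

```python
from dataclasses import dataclass


@dataclass
class Node:
    value: int            # last value this node has seen
    reachable_peers: int  # other nodes it can currently contact
    cluster_size: int     # total number of nodes, including this one

    def in_majority(self):
        # True if this node plus the peers it can reach form a strict majority.
        return self.reachable_peers + 1 > self.cluster_size // 2


def read(node, mode):
    """Serve a read under a simplified CP or AP policy.

    CP: a node cut off from the majority refuses, because the majority side
        may have accepted newer writes it cannot see.
    AP: every node answers with whatever it has, even if that is stale.
    """
    if mode not in ("CP", "AP"):
        raise ValueError("mode must be 'CP' or 'AP'")
    if mode == "CP" and not node.in_majority():
        raise TimeoutError("partitioned from the majority; refusing the read")
    return node.value


if __name__ == "__main__":
    minority = Node(value=100, reachable_peers=1, cluster_size=5)
    print(read(minority, "AP"))  # prints 100, which may be stale
```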

There is much more to the decision than CAP. Different perspectives to consider when choosing the right database:

Strict data types - Schema on write
Schemaless data - Schema on read
Read-only immutable data
Eventually consistent data
Dirty read vs Committed data
Multi-version concurrency control
Replicate data based on logs
Replay committed logs
Data sharding
Consistency options (two-phase commit, pessimistic locking)
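Of the consistency options listed, two-phase commit is the one that spans several nodes or shards. This is a minimal in-memory sketch of its voting and decision phases; `Participant` and `two_phase_commit` are hypothetical names. It deliberately omits what makes real 2PC hard: durable logging of votes and decisions, timeouts, and a coordinator that crashes after phase 1, which can leave prepared participants blocked.

```python
class Participant:
    """Hypothetical participant (e.g. one shard) in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether this participant will vote yes
        self.state = "idle"

    def prepare(self):
        # Phase 1: stage the change and vote yes (True) or no (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on every participant or on none of them; return True if committed."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


if __name__ == "__main__":
    shards = [Participant("shard-1"), Participant("shard-2", can_commit=False)]
    print(two_phase_commit(shards))         # prints False
    print([p.state for p in shards])        # prints ['aborted', 'aborted']
```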

Partition tolerance is understood as the ability of the system to continue operating in the presence of network partitions. These occur when two or more "islands" of network nodes arise, temporarily or permanently, that cannot connect to each other. Some also understand partition tolerance as the ability of a system to cope with the dynamic addition and removal of nodes.
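The "islands" in this definition can be found mechanically: treat the nodes as a graph whose edges are the links that still work, and each connected component is an island. This is a minimal sketch; `network_islands` and the node names are hypothetical. The network is partitioned whenever the function returns more than one island.

```python
from collections import defaultdict


def network_islands(nodes, links):
    """Group nodes into islands: sets of nodes that can still reach each other.

    nodes: iterable of node names
    links: iterable of (a, b) pairs that are currently working (undirected)
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen, islands = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from start.
        island, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            island.add(node)
            stack.extend(graph[node] - seen)
        islands.append(island)
    return islands


if __name__ == "__main__":
    islands = network_islands(["n1", "n2", "n3", "n4"], [("n1", "n2"), ("n3", "n4")])
    print(len(islands) > 1)  # prints True: the network is partitioned
```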

Every database is built with some trade-offs.

Keep Thinking!!!
