"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 18, 2020

Data Perspectives

Different perspectives to consider when choosing the right database:
  • Strict data types - Schema on write
  • Schemaless data - Schema on read
  • Read-only immutable data
  • Eventually consistent data
  • Dirty read vs Committed data
  • Multi-version concurrency control
  • Replicate data based on logs
  • Replay committed logs
  • Data sharding
  • Read-heavy, consistent data - RDBMS
  • Write-heavy, read-light data - HBase, Cassandra
  • Document-based storage - MongoDB, CouchDB
  • CAP, ACID Properties
Things I Wished More Developers Knew About Databases
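
To make the first two perspectives concrete, here is a minimal sketch of schema on write versus schema on read. It is only an illustration: `USER_SCHEMA`, `write_strict`, `write_schemaless` and `read_names` are hypothetical names made up for this example, and plain Python lists stand in for a real database.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
USER_SCHEMA = {"id": int, "name": str}


def write_strict(table, record):
    """Schema on write: validate before storing and reject bad records."""
    if set(record) != set(USER_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected in USER_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    table.append(record)


def write_schemaless(store, raw_json):
    """Schema on read: store the raw document without checking its shape."""
    store.append(raw_json)


def read_names(store):
    """Interpret documents at read time, tolerating missing fields."""
    return [json.loads(doc).get("name", "<unknown>") for doc in store]


# Schema on write: the malformed record never reaches the table.
strict_table = []
write_strict(strict_table, {"id": 1, "name": "Ada"})
try:
    write_strict(strict_table, {"id": "2", "name": "Bob"})
    rejected = False
except TypeError:
    rejected = True

# Schema on read: everything is stored, the reader copes with gaps.
doc_store = []
write_schemaless(doc_store, '{"id": 1, "name": "Ada"}')
write_schemaless(doc_store, '{"id": "2"}')
names = read_names(doc_store)
```

The trade-off shows up in where the error handling lives: the strict writer pays the validation cost once per insert, while the schemaless store pushes that cost onto every reader.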

Similar, deeper-dive topics from the tweet conversation:
  • Read-heavy vs write-heavy. Inserts vs updates. Vacuuming
  • Replication or not, transaction logging, why indexes matter, performance tuning, I/O scheduler, Unicode, gender isn't binary
  • Locks, cache effects, isolation levels
  • I/O bound vs network bound, especially with replication; scaling strategy; concurrent vs distributed.
  • Materialized views, and the dangers of invalidating them unexpectedly.
  • Connection pooling, scaling techniques for distributed applications and systems, performance improvement, query optimization, etc.
  • I'd be interested in how this applies to a distributed system. Concurrency (specifically MVCC), connections, DB threading, backpressure handling
  • Disk storage implementation and optimization
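
Multi-version concurrency control and vacuuming appear in both lists, so here is a toy sketch of how they fit together. It assumes a single process and a simple integer clock. `MVCCStore` and its methods are hypothetical names for this example, not any real database's API, and write conflicts are not handled at all.

```python
class MVCCStore:
    """Toy multi-version store: every committed write adds a new version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending
        self._clock = 0      # timestamp of the last commit

    def begin(self):
        """Return a snapshot timestamp; reads see commits up to this point."""
        return self._clock

    def commit_write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def vacuum(self, oldest_active_ts):
        """Drop versions that no active snapshot can still see."""
        for key, versions in self._versions.items():
            visible = [v for v in versions if v[0] <= oldest_active_ts]
            # Keep the newest version visible to the oldest snapshot,
            # plus every version committed after it.
            keep_from = len(visible) - 1 if visible else 0
            self._versions[key] = versions[keep_from:]


store = MVCCStore()
store.commit_write("balance", 100)
snapshot = store.begin()            # a reader starts here
store.commit_write("balance", 80)   # a later writer commits

old_view = store.read("balance", snapshot)       # reader still sees 100
new_view = store.read("balance", store.begin())  # new reader sees 80

# Once the old reader has finished, its version can be reclaimed.
store.vacuum(store.begin())
remaining = store._versions["balance"]
```

Readers never block the writer here because each one reads from its own snapshot; the price is the pile of old versions, which is why vacuuming matters for update-heavy workloads.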

Keep Thinking!!! 
