Events
- Data has to go into the database
- Kafka: all of your data is a stream of events
- Kafka has an opinionated view of the world
Sensors
- Sensor data are events
- Car companies with Internet-connected devices
- Log entries are operational data
- Logs are events
Databases
- Database changes can also be events
- Table - Collection of Key-value pairs
- Modifications as messages
- Updates can be a stream of messages
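
To make the "modifications as messages" idea concrete, here is a minimal sketch that rebuilds a key-value table by replaying a stream of updates. The event shape (a key plus a value, with None meaning "delete") and the name materialize are assumptions made for illustration, not anything Kafka defines.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed event shape: (key, value); a value of None means the key was deleted.
ChangeEvent = Tuple[str, Optional[str]]


def materialize(changelog: Iterable[ChangeEvent]) -> Dict[str, str]:
    """Replay change events in order to rebuild the current table."""
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # delete; ignore keys that are already gone
        else:
            table[key] = value    # insert or overwrite
    return table


changelog = [
    ("user:1", "alice"),
    ("user:2", "bob"),
    ("user:1", "alice-smith"),  # later update wins
    ("user:2", None),           # delete
]
table = materialize(changelog)
print(table)
```

Replaying the same messages in the same order always yields the same table, which is why a table and its update stream can be treated as two views of the same data.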
Uses of Stream
- Data Pipeline
- React / Process / Transform
- Web App -> Streaming Platform -> Hadoop
- "Product View Request"
- Forward compatible (receive requests from multiple interfaces)
- New services can listen to the stream, so the system is easy to extend and evolve going forward
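
A rough sketch of why this is easy to extend: the producer publishes a "product view" event once, and any number of services listen to it independently. ToyStream is a hypothetical in-memory stand-in written for this note, not Kafka's API, and the topic name and event fields are made up.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class ToyStream:
    """In-memory stand-in for a streaming platform (hypothetical, not Kafka)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Every subscriber sees every event; the publisher knows none of them.
        for handler in self._handlers[topic]:
            handler(event)


stream = ToyStream()

# Existing listener: archive raw events (stands in for the Hadoop sink).
hadoop_sink: List[Event] = []
stream.subscribe("product-views", hadoop_sink.append)

# New listener added later, without touching the publisher.
view_counts: DefaultDict[str, int] = defaultdict(int)


def count_view(event: Event) -> None:
    view_counts[event["product_id"]] += 1


stream.subscribe("product-views", count_view)

# Requests can arrive from multiple interfaces but share one event format.
stream.publish("product-views", {"product_id": "p-42", "source": "web"})
stream.publish("product-views", {"product_id": "p-42", "source": "mobile"})
```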
Kafka Overview
- Producers -> Kafka Cluster -> Consumers
- Data Model - Log
- Write comes at the end
- Log file
- Multiple consumers can read from the log
- Reader - Consumer
- Writer - Producer
- Kafka Topic = Partitioned Log
- Kafka is a distributed message queue
- Each Topic is a partitioned Log
- Partitioned among multiple computers (Brokers)
- The producer decides which partition to write to
- Kafka guarantees ordering within a partition
- There is no global ordering across partitions
- Table and stream are isomorphic
- Group of Consumers
- Consumer groups are a handy way to divide a topic's partitions among multiple consumers
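
The partitioned-log model above can be sketched in a few lines. This is a toy, assuming a CRC32 hash of the key to pick the partition (Kafka's default partitioner uses a different hash, murmur2) and a simple round-robin split for the consumer group; ToyTopic and assign are hypothetical names.

```python
import zlib
from typing import Dict, List, Tuple


class ToyTopic:
    """A topic as a list of append-only partition logs (illustrative only)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: str) -> int:
        # Assumption: CRC32 of the key; the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Write at the end of the chosen partition; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


def assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin split of partitions among the consumers of one group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


topic = ToyTopic(num_partitions=4)
for i in range(3):
    topic.append("car-1", f"reading-{i}")
topic.append("car-2", "reading-0")

# All "car-1" messages sit in one partition, in the order they were written.
p = topic.partition_for("car-1")
car1_values = [v for k, v in topic.partitions[p] if k == "car-1"]

assignment = assign(4, ["c1", "c2"])
```

Because each key lands in a single partition, per-key order is preserved even though nothing orders messages across partitions.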
Scalability of File System
- Write Part
- Read Part
- Indexes, Merge, Log Tree
- Hundreds of MB/s throughput
- Commodity hardware
- O(1) writes
- Replication
- Fault Tolerance
- Partitioning
- Elastic Scaling
- Partitioning
- One leader partition
- Multiple Followers
- One broker acts as a controller
- Partitioning is to scale a Topic
- Leader and Follower Partitions
- Four Partitions and Three Replicas
- ISR (In-Sync Replica): a replica that is caught up with the leader
- For a write to be committed, it has to be written by the leader and replicated to every other ISR
- Watch your ISR list
- Upgrade all the brokers in a rolling fashion (keep them on the same version)
- Broker Kept Failing
- Leader Failure
- More partitions give more throughput
- More partitions take longer to rebalance the cluster
- Router Configuration problem
- Custom Environment
- Kafka partition reassignment tool
- Generate Migration Instructions
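
The ISR commit rule noted above can be illustrated with a small calculation: a message counts as committed only once every in-sync replica has it, so the committed position is the minimum log end across the ISR. The broker names and offsets below are made up, and high_watermark is a hypothetical helper, not a Kafka API.

```python
from typing import Dict, Set


def high_watermark(log_end_offsets: Dict[str, int], isr: Set[str]) -> int:
    """Number of messages replicated by every in-sync replica (leader included)."""
    return min(log_end_offsets[broker] for broker in isr)


# Made-up state for one partition with three replicas.
log_end_offsets = {"broker-1": 10, "broker-2": 10, "broker-3": 7}

# While the lagging broker-3 is still in the ISR, it holds back the commit point.
isr = {"broker-1", "broker-2", "broker-3"}
committed_all = high_watermark(log_end_offsets, isr)

# Once broker-3 is dropped from the ISR, the commit point can advance.
isr_shrunk = {"broker-1", "broker-2"}
committed_shrunk = high_watermark(log_end_offsets, isr_shrunk)

print(committed_all, committed_shrunk)
```

This is one reason to watch the ISR list: a shrinking ISR keeps writes flowing but leaves fewer copies of each committed message.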
Happy Learning Best Practices!!!