B Tree Basics - TalkPoints
Listen to the twitter space!
Join us this Sunday by 5pm GMT to review Chapter 2 of Database Internals by @ifesdjeen. @haile_lagi will lead discussions this week! See you soon.https://t.co/UtMsYoHXC2
— System Design Conversations (@sysdsgn) February 28, 2024
Simple walkthrough/guide
Recap storage engines at the heart B-Tree. Data structures contiguous vs pointer based.
- Honorable mention hashmap Binary Search Trees - Dictionary Data type/Map balancing going beyond Big-O mutability vs immutability briefly concurrency
disk considerations data set cannot fit in memory, we must cache and page in and out of memory.
“we have to perform balancing, relocate nodes, and update pointers rather frequently. Increased maintenance costs make BSTs impractical as on-disk data structuree”
HDDs vs SSDs how they work important - block device abstraction locality of reference
goals:
- minimize expensive disk seeks
- min random IO, max sequential IO
- minimize pointer management
B Trees: we actually mean Bplus Tree. sorted data structure, pointer based = binary search. why do we want search trees? balancing and order
B-Trees are useful for:
- In-memory indexes (also used here!)
- persisted on disk storage organisation. <– we’re here.
“Since B-Trees are a page organization technique (i.e., they are used to organize and navigate fixed-size pages), we often use terms node and page interchangeably.”
terminology: seperator key root internal node contains seperator key leaf node degree/branching factor occupancy
desired properties:
- high fanout
- low height
operations
- insert
- search (point and range) + block transfers
- deletion
internal operations:
- splits on insertion
- merges on deletion
not covered:
- how do we actually get to disk?
- encoding, concurrency etc
closing - implementation is on chapter four & six, we shall revisit again!