Today I Learned
  • Today I Learned
  • Database
    • Intro to Database Systems
      • Advanced SQL
      • Database Storage
        • Disk oriented Architecture
        • Storage Hierarchy
        • Disk-oriented DBMS
        • Why not use the OS?
        • Storage manager
        • Database Pages
        • Record IDs
        • Tuple layout
      • Database storage II
        • Log-structured file organization
        • Tuple storage
        • Postgres: numeric
        • Large value
        • External value storage
        • System catalogs
        • Observation
        • OLTP
        • OLAP
        • N-ARY STORAGE MODEL (NSM)
        • Decomposition storage model (DSM)
        • Tuple Identification
      • Buffer Pool & Memory management
        • Buffer pool organization
        • Buffer pool meta-data
        • Page Table vs. Page Directory
        • Allocation Polocies
        • Buffer Pool Optimizations
          • Multiple Buffer Pools
          • Pre-Fetching
          • Scan Sharing
        • Buffer Pool Bypass
        • OS Page Cache
        • Buffer Replacement Policies
        • Least-Recently Used
        • Clock
        • Problems
        • Better Policies: LRU-K
        • Better Policies: Localization
        • Better Policies: Priority Hints
        • Dirty Pages
        • Background Writing
      • Hash Table
        • Hash tables
          • Chain hashing
          • Extendible hashing
          • Linear hashing
      • Tree indexes I
        • Table indexes
        • B+ Tree
          • Properties
          • Nodes
        • B-Tree vs. B+Tree
        • Cluster indexes
        • Selection conditions
        • Node size
        • Merge threshold
        • Non-Unique indexes
        • Intra-node search
        • Prefix compression
        • Suffix truncation
        • Bulk insert
        • Pointer swizziling
      • Tree Indexes II
        • B+Tree: Duplicate Keys
        • Implicit Indexes
        • Partial Indexes
        • Covering Indexes
        • Index Include Columns
        • Functional/Expression Indexes
        • Observation
        • Trie Index
          • Trie Index Properties
          • Trie key span
          • Radix Tree
          • Observation
        • Inverted Index
        • Query Types
        • Design decisions
      • Multi-Threaded Index Concurrency Control
        • Locks vs. Latches
        • Latch modes
        • Latch implementations
        • Hash table latching
        • B+ Tree concurrency control
        • Latch crabbing/coupling
        • Leaf node scans
        • Page 5
      • Sorting & Aggregations
        • External Merge Sort
          • 2-Way External Merge Sort
        • Using B+ Trees For Sorting
          • Case #1 - Clustered B+ Tree
          • Case #2 - Unclustered B+ Tree
        • Aggragations
          • Sorting aggregation
          • Alternatives to sorting
          • Hashing aggregate
  • Back-end
    • Docker
      • Basic cmd
    • Network
      • The internet
        • Protocol Stacks and Packets
        • Networking Infrastructure
        • Internet Infrastructure
        • The Internet Routing Hierarchy
        • Application Protocols: HTTP and the World Wide Web
        • Transmission Control Protocol
        • Internet Protocol
    • Programming Languages
      • Compiled vs Interpreted
  • Personal
    • Reflections on the Tiny Things I Learned in
      • 2023
Powered by GitBook
On this page
  1. Database
  2. Intro to Database Systems
  3. Sorting & Aggregations
  4. Aggragations

Hashing aggregate

Populate an ephemeral hash table as the DBMS scans the table. For each record, check whether there is already an entry in the hash table:

DISTINCT: Discard duplicate

GROUP BY: Perform aggregate computation

If everything fits in memory, then it is easy

If the DBMS must spill data to disk, then we need to be smarter...

PreviousAlternatives to sortingNextDocker

Last updated 2 years ago