GMBA 612 โ€ข Database Design & SQL

The World of Data

An interactive guide to understanding where data lives, how it flows, and why it matters for your career.

The Data Ecosystem at a Glance

Before you write your first SQL query, let's see the big picture. Click any card to explore further.

๐Ÿ—„๏ธ

Databases

The organized, real-time storage that powers every app you use. When you check your bank balance or add items to a cart, a database responds in milliseconds.

Operational โ€ข Real-time
๐Ÿญ

Data Warehouses

Historical archives optimized for analysis. When Netflix decides what shows to produce, they're querying years of viewing data from their warehouse.

Historical โ€ข Analytics
๐ŸŒŠ

Data Lakes

Raw, unstructured storage for "we might need this later." Images, logs, sensor dataโ€”dump it all in and figure out what to do with it later.

Raw โ€ข Flexible
๐Ÿ“‹

Structured Data

Neat rows and columns, like a spreadsheet. Names, dates, pricesโ€”anything that fits cleanly into a table with predictable formats.

Tables โ€ข SQL-friendly
๐ŸŽจ

Unstructured Data

The messy stuff: emails, videos, social media posts, PDFs. It doesn't fit in rows and columns, but it's 80% of the world's data.

Messy โ€ข Growing fast
โšก

Operational Systems

The workhorses that keep businesses running second by second. Processing orders, updating inventory, recording transactions.

OLTP โ€ข Transactional
๐Ÿ“ˆ

Analytics Systems

Where you go to ask "what happened?" and "what should we do?" Dashboards, reports, and the queries that drive strategic decisions.

OLAP โ€ข Decision support
๐ŸŒ

Big Data

When your data is so large, fast, or varied that traditional tools can't handle it. Think billions of social media posts or IoT sensor readings.

Volume โ€ข Velocity โ€ข Variety
๐Ÿ’ก Think of it like a restaurant...
Database = The kitchen, handling orders in real-time
Data Warehouse = The accounting office, analyzing last year's sales
Data Lake = The storage room, holding everything "just in case"
๐Ÿ’ก Or like your closet...
Structured data = Neatly folded shirts in labeled drawers
Unstructured data = That pile of stuff you shoved in the back

Where Does Data Live?

Different storage solutions for different problems. Choosing the right one is a million-dollar decision.

๐Ÿ—„๏ธ

Operational Databases

The real-time engines powering every transaction

๐Ÿ“ What It Is

An operational database (also called OLTP - Online Transaction Processing) handles the day-to-day transactions of a business. Every time you swipe a card, place an order, or update your profile, an operational database is working behind the scenes.

These databases are optimized for speed and reliability. They need to handle thousands of small read/write operations per second without breaking a sweat.

๐Ÿข Real-World Examples

  • Amazon: Tracking inventory levels, processing orders, updating shipping status
  • Your Bank: Recording every deposit, withdrawal, and transfer
  • Uber: Matching riders with drivers, tracking locations in real-time
  • Starbucks App: Maintaining your rewards points and order history
๐Ÿญ

Data Warehouses

Historical archives optimized for complex analysis

๐Ÿ“ What It Is

A data warehouse pulls data from multiple operational systems and organizes it for analysis. Unlike operational databases that answer "what's happening now?", warehouses answer "what happened over time?" and "why?"

Data is typically loaded in batches (nightly, hourly) and structured to make analytical queries fastโ€”even across billions of rows.

๐Ÿข Real-World Examples

  • Netflix: Analyzing viewing patterns across years to decide which shows to produce
  • Walmart: Combining sales, inventory, and weather data to optimize stocking
  • Capital One: Analyzing transaction history for fraud patterns and credit risk
๐ŸŒŠ

Data Lakes

Raw storage for anything and everything

๐Ÿ“ What It Is

A data lake stores data in its raw, original formatโ€”no cleaning, no structuring. It's the "dump it now, figure it out later" approach. This is perfect for data you might need someday but aren't sure how you'll use it.

Data lakes are cheap and flexible, but can become "data swamps" without proper governance. The trade-off: flexibility vs. organization.

๐Ÿข Real-World Examples

  • Healthcare: Storing MRI images, doctor notes, sensor data from wearables
  • Social Media: Raw posts, images, click streams, engagement data
  • IoT/Manufacturing: Millions of sensor readings from factory equipment
โœจ

Data Lakehouses

The best of both worlds (newer approach)

๐Ÿ“ What It Is

A lakehouse combines the flexibility of a data lake with the performance and reliability of a data warehouse. It's a relatively new architecture that's gaining traction fast.

Think of it as a data lake that learned to organize itself. You get cheap storage for raw data AND the ability to run fast analytical queriesโ€”without maintaining two separate systems.

๐Ÿข Key Technologies

  • Databricks: Pioneer of the lakehouse concept
  • Snowflake: Cloud data platform with lakehouse capabilities
  • Delta Lake: Open-source storage layer that brings reliability to data lakes
  • Apache Iceberg: Open table format for huge analytic datasets
๐Ÿ“Š Quick Comparison
Feature Database (OLTP) Data Warehouse Data Lake Lakehouse
Purpose Run the business Analyze the business Store everything Both analyze & store
Data Structure Highly structured Structured Raw / any format Structured + raw
Query Speed Fast (small queries) Fast (big queries) Slow without prep Fast
Cost $$ $$$ $ $$
Best For Transactions BI & reporting ML, archival Modern analytics

Types of Data

Not all data is created equal. Understanding the differences shapes everything from storage to analysis.

๐Ÿ“‹

Structured Data

Neat, organized, and SQL-friendly

๐Ÿ“ What It Is

Structured data fits neatly into rows and columnsโ€”like a spreadsheet or database table. Each field has a defined type (text, number, date), and the format is consistent and predictable.

This is the data you'll work with most in this course. It's what SQL was built for.

๐Ÿ“ฆ Examples

  • Customer records: Name, email, phone, address
  • Financial transactions: Date, amount, account, category
  • Product catalogs: SKU, name, price, quantity
  • Employee data: ID, department, salary, hire date
๐ŸŽจ

Unstructured Data

The wild westโ€”80% of all data

๐Ÿ“ What It Is

Unstructured data doesn't fit into tables. It's free-form, variable, and messy. Think of all the content that doesn't naturally organize into rows: documents, images, videos, conversations.

This is where most of the world's data livesโ€”and it's growing fast. Traditional databases struggle with it, which is why we need data lakes and specialized tools.

๐Ÿ“ฆ Examples

  • Emails and chat messages
  • Social media posts and comments
  • Images, videos, audio files
  • PDF documents, contracts, legal filings
  • Sensor logs and IoT data streams
๐Ÿ”€

Semi-Structured Data

The middle ground

๐Ÿ“ What It Is

Semi-structured data has some organizationโ€”tags, markers, hierarchiesโ€”but doesn't enforce a rigid schema. It's more flexible than structured data but more organized than pure unstructured data.

This format has exploded with web applications and APIs. When apps talk to each other, they usually exchange semi-structured data.

๐Ÿ“ฆ Examples

  • JSON: The language of web APIs (you'll see this everywhere)
  • XML: Document markup, configuration files
  • CSV files: Tabular but no enforced types
  • Log files: Timestamped entries with variable content
๐ŸŒ

Big Data

When traditional tools hit their limits

๐Ÿ“ The 3 V's (Classic Definition)

Volume: Too much data to fit on one machine. We're talking terabytes, petabytes, exabytes.

Velocity: Data arriving too fast for traditional processing. Real-time streams from millions of users.

Variety: Too many formats and sources to handle with one tool. Structured + unstructured + everything in between.

๐Ÿข Real-World Scale

  • Google: Processes 8.5 billion searches per day
  • Facebook: Handles 4+ petabytes of new data daily
  • Netflix: Streams to 260M+ subscribers, all personalized
  • IoT: 41 billion connected devices expected by 2027
๐ŸŽฏ The Data Spectrum
๐Ÿ“‹
Structured
Tables, SQL
โ†”
๐Ÿ”€
Semi-Structured
JSON, XML
โ†”
๐ŸŽจ
Unstructured
Images, Text

Operational vs. Analytical Systems

Two different mindsets, two different architectures. Most companies need both.

โšก

OLTP: Online Transaction Processing

Running the business, one transaction at a time

๐Ÿ“ What It Does

OLTP systems handle the constant stream of small, fast transactions that keep a business running. Every time you tap your credit card, add to cart, or like a post, an OLTP system processes that action.

Key priorities: Speed, reliability, data integrity. If Amazon's order system is slow for even a second, they lose millions in sales.

โš™๏ธ Characteristics

  • Many concurrent users (thousands to millions)
  • Simple queries: INSERT, UPDATE, SELECT by ID
  • Small amounts of data per transaction
  • Normalized design (no redundant data)
  • ACID compliance (data is always consistent)
๐Ÿ“ˆ

OLAP: Online Analytical Processing

Understanding the business, one question at a time

๐Ÿ“ What It Does

OLAP systems answer complex business questions by crunching large volumes of historical data. When an executive asks "How did sales compare across regions last quarter?", an OLAP system provides the answer.

Key priorities: Query flexibility, historical depth, aggregation performance. Speed matters, but insight matters more.

โš™๏ธ Characteristics

  • Fewer users (analysts, executives, data scientists)
  • Complex queries: JOINs, GROUP BY, window functions
  • Large data scans (millions to billions of rows)
  • Denormalized design (optimized for reading)
  • Historical data spanning months or years
โš–๏ธ OLTP vs. OLAP: Side by Side
Aspect OLTP (Operational) OLAP (Analytical)
Primary Question "What's happening now?" "What happened and why?"
Users Customers, employees, apps Analysts, executives, data teams
Query Type Simple, targeted (find 1 record) Complex, broad (aggregate millions)
Data Volume per Query Small (1-100 rows) Large (millions of rows)
Update Frequency Continuous (real-time) Batch (hourly, daily, weekly)
Design Approach Normalized (avoid redundancy) Denormalized (optimize reads)
Example Tool MySQL, PostgreSQL, Oracle Snowflake, Redshift, BigQuery
๐Ÿ’ก Restaurant Analogy
OLTP = The kitchen during dinner rush. Fast, precise, handling many orders simultaneously. Each order is small and needs immediate attention.

OLAP = The accountant reviewing last month's sales. Looking at big-picture patterns, comparing periods, finding insights.
๐Ÿ’ก Hospital Analogy
OLTP = The ER system tracking patients right now. Real-time updates, critical for operations.

OLAP = Epidemiologists analyzing 10 years of patient data to spot health trends across regions.

How Data Flows Through an Organization

From raw capture to strategic insightโ€”the journey every byte takes.

๐Ÿ”„ The Data Pipeline
๐Ÿ“ฅ
Capture
Apps, sensors, users
โ†’
๐Ÿ—„๏ธ
Store
Databases, lakes
โ†’
๐Ÿ”„
Transform
ETL, cleaning
โ†’
๐Ÿ“Š
Analyze
Reports, ML
โ†’
๐ŸŽฏ
Act
Decisions, automation
๐Ÿ”„

ETL: Extract, Transform, Load

The plumbing that moves data between systems

๐Ÿ“ What It Is

ETL is the process of moving data from operational systems (where it's created) to analytical systems (where it's analyzed). It's the bridge between the chaos of raw data and the clean datasets analysts need.

๐Ÿ“ฆ The Three Steps

  • Extract: Pull data from source systems (databases, APIs, files)
  • Transform: Clean, validate, standardize, enrich the data
  • Load: Push the processed data into the destination (warehouse, lake)
๐Ÿ”—

APIs: How Systems Talk to Each Other

The universal language of modern data exchange

๐Ÿ“ What It Is

An API (Application Programming Interface) is how software systems request and share data with each other. When your weather app shows the forecast, it's calling a weather API. When you log in with Google, that's an API call.

APIs are the glue holding modern applications together. Understanding them is increasingly essential for anyone working with data.

๐Ÿข Examples You Use Daily

  • Payment processing: Stripe API handles credit cards
  • Maps: Google Maps API powers location features
  • Social login: OAuth APIs let you "Sign in with Google"
  • Internal tools: Companies expose APIs for their own systems to communicate
๐Ÿข A Real-World Example: E-Commerce Company
๐Ÿ›’
Orders DB
OLTP
๐Ÿ‘ฅ
Customers DB
OLTP
๐Ÿ“ฆ
Inventory DB
OLTP
๐Ÿ“ฑ
Clickstream
Event logs
โ†’
๐Ÿ”„
ETL Pipeline
Nightly processing
โ†’
๐Ÿญ
Data Warehouse
Snowflake / Redshift
โ†’
๐Ÿ“Š
Dashboards
Tableau, Looker
๐Ÿค–
ML Models
Recommendations
๐Ÿ“
Ad Hoc SQL
Analyst queries

Check Your Understanding

Let's see how well you've grasped the concepts. Click an answer to see if you're right.

๐ŸŽฏ Quick Quiz

Score: 0 / 6
Question 1 of 6
A bank recording your ATM withdrawal in real-time would use which type of system?
A Data Warehouse
B Operational Database (OLTP)
C Data Lake
D OLAP System

โœ… Operational databases (OLTP) handle real-time transactionsโ€”recording every deposit, withdrawal, and transfer as they happen. Data warehouses and OLAP systems are for historical analysis, not real-time operations.

Question 2 of 6
Netflix analyzing 5 years of viewing patterns to decide which shows to produce would use:
A Operational Database
B Data Warehouse
C Transactional System
D NoSQL Database

โœ… Data warehouses are optimized for analyzing large volumes of historical data. They're designed for complex queries that span years of dataโ€”perfect for strategic decisions like content investment.

Question 3 of 6
Customer email addresses, purchase dates, and order totals are examples of:
A Structured data
B Unstructured data
C Big data
D Semi-structured data

โœ… Structured data fits neatly into rows and columns with predictable formats. Email addresses, dates, and numbers are classic examplesโ€”they're exactly what SQL databases are built for.

Question 4 of 6
What does the "3 V's" of Big Data refer to?
A Value, Verification, Visualization
B Variables, Vectors, Versions
C Volume, Velocity, Variety
D Validation, Virtualization, Vulnerability

โœ… The 3 V's define when data becomes "big data": Volume (too much for one machine), Velocity (arriving too fast), and Variety (too many different formats).

Question 5 of 6
ETL stands for:
A Enter, Test, Launch
B Extract, Transform, Load
C Encode, Transfer, Log
D Evaluate, Track, Learn

โœ… Extract, Transform, Load is the process of moving data from source systems to analytical systems. You pull data out (extract), clean it up (transform), and put it where analysts can use it (load).

Question 6 of 6
A Data Lakehouse combines the features of:
A OLTP and OLAP systems
B Data Lakes and Data Warehouses
C Structured and Unstructured databases
D SQL and NoSQL databases

โœ… A Lakehouse combines the cheap, flexible storage of a data lake with the fast query performance of a data warehouse. It's a newer architecture that's becoming increasingly popular.