Not a book, but found a series of videos on database design that I found pretty helpful. Link: Database Design
- What is a Database?
- stores data
- more complex than spreadsheets
- easily get what you want and secure
- What is a Relational Database?
- relation is connection between data
- entity: anything we store data about (e.g. person)
- attribute: things we store (e.g. name, hair color, etc)
- attributes store information about the entity
- row is all the attributes for a specific entity
- column is all values for a specific attribute type
- entity type: category of entity (e.g. user)
- attribute type: category of attributes
- row is also a ‘tuple’
- Relational Database Management Systems (‘RDBMS’)
- query: searching data and doing things with it
- without management system, would have to go through and do things manually
- 1) can run fancy queries
- 2) view mechanism: change what is shown. more secure
- 3) transactions. complete all the way or doesn’t happen at all
- 4) creates consistency for front-end
- e.g. MySQL, Microsoft SQL Server, Oracle DB, PostgreSQL
- DB and DBMS used as one
- Structured Query Language (‘SQL’)
- language to communicate with databases
- 1) DDL: defines DB structure
- 2) DML: manipulates data within
- field names is data definition e.g. CREATE
- inserting is data manipulation e.g. UPDATE
- DB breaks things into pieces
- can join to create new views
- Naming Conventions
- pattern to keep things consistent
- write SQL in all CAPS
- all lowercase for things I write myself
- underscores for spaces
- for foreign keys, make sure named same thing
- might be different for other systems
- What is Database Design?
- data integrity: correct, up-to-date, no disconnects, no repeat
- 1) Conceptual Schema: how things are related, brainstorming, general
- 2) Logical Schema: plan tables, columns, etc
- 3) Physical Schema: what DBMS, what server, how do people access, security, etc?
- continuum of general idea to specific implementations
- DB design method to separate info and prevent duplication
- Data Integrity
- no repeating, no incorrect, no broken relationships
- relational from “relations” mathematical connection of sets
- 1) entity: unique entities. use id_keys.
- 2) referential: reference id of another table for dependent relationships. can use foreign key constraints (e.g. if parent is removed, remove child)
- 3) domain: acceptable values for a column. what and range? data types e.g. Char(20)
- Database Terms
- data: info we store
- database: where we store
- relational database: stores in tables
- DBMS: how to manage and control db with code
- RDBMS: specific for relational dbs
- Null: value is not entered
- anomalies: errors
- integrity (entity, referential, domain): protect against anomalies
- entity: anything we store data about
- attributes: things we store about entities
- relations: tables
- tuple: row, all attributes about an entity
- table: physical representation of a relation
- table rows: specific individual entry
- table columns: specific attribute
- file: table
- record: row
- field: column
- value: information that we put into a specific column
- entry: entered data or row
- DB design: design to remove anomalies and have integrity
- schema: drawn out structure of DB
- normalization: multiple steps to get best database design
- naming conventions: consistency used
- keys: make everything unique within a table
- More Database Terms
- SQL: connect to db
- Data Definition Language (‘DDL’): define db
- Data Manipulation Language (‘DML’): update, delete, search, etc
- SQL Keywords: reserved words (e.g. SELECT). Don’t use in naming
- Frontend: what user sees. Allow access to data in different ways.
- Backend: what goes on behind the scenes. server-side code.
- Client: accesses db/servers
- Server-side scripting language: runs server
- Views: take data from db and illustrate it in different ways
- Joins: connect data from multiple tables
- Atomic Values
- value stores one thing
- i.e. first, middle, and last name as separate columns
- allows more complex queries
- every column header should be singular
- Relationships
- in db, everything is connected
- break into small tables and maintain connections/relationships
- One-to-One Relationships
- one entity has one connection with another entity
- has 1, assigned to 1
- One-to-Many Relationships
- one entity has relationships with multiple entities
- but the other entity can only be connected to one
- Many-to-Many Relationships
- all entities can have multiple relationships
- many-to-many relationships don’t work very well in relational dbs
- Designing One-to-One Relationships
- store one-to-one relationships as an attribute
- if you want to include more information that is dependent on entity with relationship, may want to create new table and connect with foreign keys
- keep table to be about one thing, make sure all attributes are about the entity
- Designing One-to-Many Relationships
- include foreign key to the many side
- parent table won’t have foreign key
- Parent Tables and Child Tables
- primary key is the parent
- child always had the foreign key
- foreign key will be on many side (child) and have value of parent
- parent doesn’t inherit anything from the child
- when we have entity, think about if it has a parent
- Designing Many-to-Many Relationships (M:N)
- doesn’t work because parent is child and parent
- need intermediary or junction table
- 1 to many to intermediary from both sides
- intermediary table has foreign keys pointing to both sides
- child is individual connection, shouldn’t repeat
- Summary of Relationships
- not always defined relationships for all circumstance
- depends on the application of the db
- 1:1. include as attribute
- 1:M. new table. make child have foreign key.
- M:N. include new intermediary table.
- binary is about two entities
- Introduction to Keys
- 1) keeps everything unique. email, username is example of natural key
- 2) should also be never changing bc connect things by keys. if changes, need to update connections
- 3) should never be null
- Primary Key Index
- index points you to the data
- without indexes, has to go through every single row
- key is a type of index
- SELECT * FROM users WHERE first_name = ‘Caleb’
- Look up Table
- lookup table has all options with a key
- another table can reference with id in lookup table
- only need to change one item, requires less maintenance
- 1) keys protect integrity. Don’t have to worry about only some updating.
- 2) keeps everything unique. w/o lookup table, must be repeated.
- 3) increases speed
- 4) makes updating easier.
- 5) allows for added complexity. can include more information in lookup table.
- often used when a set number of options
- Superkey and Candidate Key
- primary and foreign are main two you hear about
- superkey: any number of columns that forces a row to be unique.
- superkey can be multiple columns
- candidate key: least number of columns that force a row to be unique
- e.g. username can be enough
- can every row be unique? what superkey is questioning
- how many columns are needed?
- how many candidate keys do I have?
- fn + ln + mn + birthday + address
- defining key to use for table
- called candidate key because you can pick
- Primary Key and Alternate Key
- pick one of candidate keys
- unique, never changing, never null
- user name is a good natural key
- candidate keys not picked are alternate keys
- if you have good alternate key index it to make faster (e.g. email)
- Surrogate Key and Natural Key
- categories of primary keys
- natural key: fits criteria of primary key by itself
- either try to always use surrogate or natural
- surrogate ids are added (E.g. user_id, sale_id, comment_id)
- auto-increment automatically increases by 1
- Should I use Surrogate Keys or Natural Keys?
- natural key pros: don’t define new data, real world meaning
- natural key cons: sometimes hard to find, could change
- surrogate key pros: typically numbers, easier to work with
- surrogate key cons: have to add a new column, can be confusing
- typically pick one and use throughout database
- Foreign Key
- reference to a primary key in same or different table
- foreign key columns can have reference rules
- make every value in row point to a primary key in another table
- every table has one primary key, which can have multiple columns
- foreign keys connect tables
- NOT NULL Foreign Key
- db will not accept an empty answer
- primary keys always required
- foreign keys not required, can be empty, or updated
- foreign keys can change because references can change
- sometimes want foreign keys to be nullable other times not
- Foreign Key Constraints
- Foreign keys keep things consistent
- referential integrity
- make sure children update with parent
- prevent creating children w/ a parent that doesn’t exist
- ON DELETE or ON UPDATE. talking about parent.
- when update or delete parent, want something to happen to child
- RESTRICT: throw error. parent won’t be deleted or changed.
- CASCADE: do same to child. e.g. if parent is deleted, children deleted.
- SET NULL: set children to null for that value. children can’t have not null characteristic or will error and not change.
- Simple Key, Composite Key, Compound Key
- Simple Key: key has one column
- Composite Key: two or more columns. Tends to be natural keys
- Compound Key: sometimes interchangeable with composite. Key with multiple column where they are all keys themselves. i.e. intermediary tables. Every combination must be unique.
- With intermediate table, some people add a surrogate key on top
- not a requirement, but some DBMS not good with composite keys
- Review and Key Points
- Superkey: any number of columns that ensures uniqueness
- Candidate Key: least number of columns to ensure uniqueness
- Primary Key: main key for table
- Alternate Keys: non picked candidate keys
- Foreign Key: references a primary key and makes connections between tables
- primary key and foreign key most important
- Surrogate: Random number and no real world meaning
- Natural: Already in db
- pick one or the other
- Simple Key: one column
- Composite: multiple column
- Compound: all columns are keys themselves
- Foreign Key Constraints: ON UPDATE, ON DELETE and name them (restrict, cascade, set null)
- Introduction to Entity Relationship Modeling
- A standard for drawing db
- EER, ERD or ER
- method used to draw out db structure
- DDL – Data Definition Language
- There are entity relationship management programs
- give it a name on top of a square
- add the columns you want in there
- Cardinality
- relationship type between different tables
- can be one to one or one to many
- +—< (three/crow feet) one row can connect to many rows
- +—+ one to one
- Modality
- whether child is required
- +—–o+ : 0 means does not have not null characteristic. accept nulls. zero or up to 1.
- +—–++ : at least one card. can’t have null
- +—–o< : zero or more
- +—–+< : 1 or more
- Introduction to Database Normalization
- process of going through and planning to avoid errors
- systematic way to create a good structured database
- 3 Normal Forms: 1NF, 2NF, 3NF
- must be in an earlier normal form before going to next
- First Normal Form of Database Normalization (‘1NF’)
- domain of each attribute contains only atomic values and and value of each attribute contains only a single value from that domain
- e.g. address column can have a lot of pieces of data so is not atomic
- you could break columns into pieces
- e.g. if you have two values in one spot
- you could break up problem column and break into a new table
- Second Normal Form of Database Normalization (‘2NF’)
- Deals with partial dependencies
- every non-prime attribute (non candidate key) is dependent on the whole of every candidate key
- for intermediary tables, information pertaining to both tables should be contained here
- error would be if ISBN is in a book_author table because only dependent on book
- Third Normal Form of Database Normalization (‘3NF’)
- transitive dependency
- e.g. if you have a review table with reviews, stars, star_meaning
- star_meaning is dependent on stars and stars may change if review changes
- create a new table with star_id, star, and star_meaning
- drop stars and star_meaning in last table and add star_id
- makes sure all attributes of table are determined only by the candidate keys of table and not any non-prime attributes
- 1NF: make things atomic
- 2NF: remove partial dependencies
- 3NF: remove transitive dependencies
- Indexes (Clustered, Non-clustered, Composite Index)
- index is like a list with information of where things are
- non-clustered: data is not where index is. just sorts data and shows you where to go.
- clustered: actual data is reorganized in a way easy to use.
- table scan: go through all the data
- index seek, db knows where to start searching for the data
- downside to index, table has to update whenever you update and the indexes themselves
- WHERE clause using primary key, usually faster bc sorted and easier to find
- if you don’t use an indexed column on where, can take very long time
- join on indexed columns to make it faster
- composite: can make an index for multiple columns
- Data Types
- Each column should have some type
- 3 main categories
- 1) Date: date, time, datetime. timestamp is exact moment in time.
- 2) Numeric: decimal, binary. floating points done in binary. decimal generally more accurate for math. binary has problem with some math like fractions. integer types only whole numbers. signed or unsigned (can’t be neg). changes max value.
- 3) String: any character or letter in ” “. e.g. char(8) length will be 8, varchar(8) length can change
- Introduction to Joins
- present information in more user friendly way
- DML – Data Manipulation Language. Changing presentation
- design in user-friendly way
- Inner Join
- takes two tables and return entries based on some intersecting point
- will eliminate other points
- i.e. the intersecting parts of a Venn diagram
- DML done through SELECT (syntax varies by language)
- SELECT col1, col2, coletc FROM table1 INNER JOIN table2 ON table1.col1 = table2.col1
- the dot makes it a qualified column
- Inner Join on 3 Tables
- inner joins result in smaller data sets
- Inner Join on 3 Tables (Example)
- user table, comment table, video table
- user only connected to video through comment
- SELECT username, title, comment from user INNER JOIN comment ON user.user_id = comment.user_id INNER JOIN video ON video.video_id = comment.video_id
- users with no comments will be excluded
- videos without comments will be excluded
- all comments will be returned
- Introduction to Outer Joins
- returns all rows on one of the tables
- LEFT JOIN will return all rows on left and right info only where connected
- 3 types: left, right, and full
- Right Outer Join
- return all rows on the right and data where connection from left table
- SELECT * FROM table(left)
- most people don’t use right, usually will use left join and switch tables
- JOIN with NOT NULL Columns
- some columns will be NOT NULL
- every single row in that column will have a value
- can have same result with LEFT JOIN and INNER JOIN
- LEFT JOIN with users will give all users and all comments
- Outer Join Across 3 Tables
- initial join will create a new table
- next join will be on this table
- Alias
- give something more user friendly name
- AS keyword
- e.g. SELECT user AS customer
- e.g. SELECT first_name as “first name”
- can rename tables too
- Self Join
- Join table with itself
- could use when a column has a foreign key referring to a primary key on same table
- e.g. referred_by_id references user_id. want more information of referred_by_id such as email
- uses alias on table