Database Administration
The DBA role:
- primarily concerned with "maintenance"/"ops" phase
- but should be consulted during all phases of development
- "Database Administrator" or "DBA" often framed as a "job" or a "person"
- large companies have many DBAs
- small company developer is the DBA
- the DBA role can be made redundant by a cloud-based DBMS or "Database as a Service" (DBaaS)
Data Administrator(management role):
- data policies, procedures and standards
- planning
- data conflict resolution
- managing info repository(data dictionary)
- internal marketing (using data to persuade leaders to take particular actions)
- similar to "Chief Data Officer"
Database Administrator(technical role):
- analyze and design DB
- select DBMS/tools/vendor
- install and upgrade DBMS
- tune DBMS performance
- manage security, privacy, integrity
- backup and recovery
Architecture of a Database Management System
Exists as one entity in two places
- In memory(when the database is running)
- Physically on disk
Both places
- Manage Data
- Manage Performance(how it performs as it is used and grows)
- Manage Concurrency(manages high volumes of users)
- Manage Recoverability(assist in recovery and availability)
One place is persistent, the other transient
- Disk representation is always present
- Memory representation is transient - it only exists while the DBMS is running
Query Processing
Parsing(compiling):
- Syntax is correct & can "compile"
- DBMS User Permissions
- Access to resources (data, code, the ability to record changes and retrieve results)
Optimizing:
- Execution Plan & Execution Cost
- Evaluate indexes, table scans, hashing
- Eliminate worst, consider best options
- Lowest cost is theoretically "best" (see the EXPLAIN sketch below)
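A minimal sketch of inspecting an execution plan in MySQL with EXPLAIN, assuming the employee table used in the view example later in these notes (the column names and value are illustrative):

EXPLAIN
SELECT employeeid, lastname
FROM employee
WHERE departmentid = 42;
-- The output shows whether the optimizer chose an index on departmentid or a
-- full table scan, along with an estimated row count (a rough cost measure).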
Execution:
- Meet the ACID test
- Atomic: all of the transaction's changes succeed or all fail (see the transaction sketch below)
- Ensure resources are available (data, change logs, memory, a cursor to do the work for the user)
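A minimal sketch of atomicity in MySQL, assuming a hypothetical account table: both updates become permanent together on COMMIT, or neither does on ROLLBACK.

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE accountid = 1;
UPDATE account SET balance = balance + 100 WHERE accountid = 2;
COMMIT;       -- both changes succeed together
-- ROLLBACK;  -- would undo both changes, leaving no partial transfer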
Concurrency Control
- Manages the work of the DBMS
- Transaction Manager handles all aspects of the SQL transaction - which DBMS user wants WHAT resource
- Lock Manager maintains a list of which resources are locked, by which user, and at what level
- not only tables and indexes, but also buffers, cursors and the memory addresses of resources (see the locking sketch after this list)
- Essential to manage large scalable DBMS
- Enables 100,000s of concurrent users
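A minimal sketch of explicit row locking in MySQL (InnoDB), again assuming a hypothetical account table; the Lock Manager records the row lock until the transaction commits:

START TRANSACTION;
-- Lock the row so no other session can change it until we commit
SELECT balance FROM account WHERE accountid = 1 FOR UPDATE;
UPDATE account SET balance = balance - 100 WHERE accountid = 1;
COMMIT;  -- releases the row lock held by this session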
Storage
File & Access Methods:
- Disk to memory to disk
- read a single buffer or a block of buffers
Buffer Pool
- Data in memory(row data, index data)
- Organized
Disk Space Management:
- how to organize the growth of data on disk efficiently, by writing efficiently
Writing to disk happens at the buffer level.
Buffer pool:
- Many object types (tables, indexes, undo)
- Each buffer contains rows, B-tree leaf entries, etc.
Each buffer can have one of four status types:
- Current (the in-use, currently committed version of the data)
- Active (the most recent change, which may not yet be committed)
- Stale (an old version of the data)
- Aged (old and about to be removed from the buffer pool)
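A small sketch of inspecting the InnoDB buffer pool in MySQL; both commands are standard, and the numbers you see will depend on the server:

-- Size of the in-memory buffer pool, in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- The BUFFER POOL AND MEMORY section reports pages read, pages created,
-- and the buffer pool hit rate
SHOW ENGINE INNODB STATUS;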
Log Manager & Recovery
The Log Manager records all changes
- Statement
- Transaction
- Rollback values
- Before and after values
- Begin timestamp
- Database
- Data dictionary changes
Database Performance
What affects database performance?
- caching data in memory, e.g. data buffers
- placement of data files across disc drives
- use of fast storage such as SSD
- database replication and server clustering
- use of indexes to speed up searches and joins
- good choice of data types(especially PKs)
- good program logic
- good query execution plans
- good code(no deadlocks)
Caching Data in Memory:
- data and code found in memory
- avoids a read
- reads are expensive
- goal is to minimize reads (and writes); some writes are necessary (recovery logs, changed data)
- "in-memory databases": all code and all data are loaded into memory when the database starts and stay there until shutdown
Data file location & Fast Disks(SSD)
- spread the files across the physical server
- we can't avoid writes
- spread files across many disks, avoid contention(many users competing for same resource)
- recovery logs are written constantly, so put them on faster disks
- SSD(Solid state drives)
- no moving parts - nothing to break down
- faster I/O
Distribution & Replication
Distributed data
- spreads the load
- data kept only where it is needed
- less work per physical server - faster response times
Replicated data:
- spreads load
- less work per physical server - faster response times
When to create indexes:
- column is queried frequently(used in WHERE clauses)
- columns that are used for joins(PK to FK)
- primary keys(automatic in most DBMS)
- foreign keys(automatic in MySQL)
- unique columns(automatic in most DBMS)
- large tables only - small tables do not require indexes
- wide range of values(good for regular indexes)
- small range of values(good for bitmap/hash indexes)
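A short sketch applying these guidelines to the employee table used in the view example below; the email column is an assumption added for illustration:

-- Column queried frequently in WHERE clauses
CREATE INDEX idx_employee_lastname ON employee (lastname);

-- Foreign key column used to join employee to department
CREATE INDEX idx_employee_departmentid ON employee (departmentid);

-- Unique column (the index also enforces uniqueness)
CREATE UNIQUE INDEX idx_employee_email ON employee (email);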
Good Choices
- good data types(integers for PK FK & PFK)
- good program logic & code
- Transaction design(BEGIN TRANSACTION, SELECT, UPDATE, COMMIT)
- Avoid long complex transactions that never commit or savepoint
- Avoid coding deadlocks
- Appropriate locking strategy; consider lock timeouts (see the sketch after this list)
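A minimal sketch of this transaction design in MySQL, with a savepoint and a session lock-wait timeout; innodb_lock_wait_timeout is a standard InnoDB variable, while the account table and values are illustrative:

-- Fail fast instead of queueing behind long-held locks (value in seconds)
SET SESSION innodb_lock_wait_timeout = 5;

START TRANSACTION;
SELECT balance FROM account WHERE accountid = 1 FOR UPDATE;
SAVEPOINT before_update;
UPDATE account SET balance = balance - 100 WHERE accountid = 1;
-- ROLLBACK TO SAVEPOINT before_update;  -- undo just the update if needed
COMMIT;  -- keep the transaction short: commit as soon as the work is done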
Security
Threats to databases
- Loss of integrity
- keep data consistent
- free of errors or anomalies
- Loss of availability
- must be available to authorized users for authorized purposes
- Loss of confidentiality
- must be protected against unauthorized access
- To protect databases against these types of threats, different kinds of countermeasures can be implemented:
- access control
- encryption
Access Control
The security mechanism of a DBMS must include provisions for restricting access to data
Access control is handled by the DBA creating user accounts for those with a legitimate need to access the DB
The database keeps track of all operations performed by all users (a usage or audit log)
When tampering is suspected, perform an audit. A database audit consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period
Need to control online and physical access to the database
Types of discretionary privileges
- account level, DBA specifies the particular privileges that each user holds regarding the database as a whole, i.e. the operations they can carry out on the database
- table level, DBA controls a user's privilege to access particular tables or views
- schema level, DBA controls a user's privilege to access a particular schema in the database
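A short sketch of these privilege levels in MySQL; the user, schema and table names are illustrative:

CREATE USER 'scott'@'localhost' IDENTIFIED BY 'illustrative-password';

-- Account level: privileges on the server as a whole
GRANT CREATE, DROP ON *.* TO 'scott'@'localhost';

-- Schema level: privileges on one schema (database)
GRANT SELECT, INSERT ON hrdb.* TO 'scott'@'localhost';

-- Table level: privileges on a particular table
GRANT SELECT, UPDATE ON hrdb.employee TO 'scott'@'localhost';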
Using views
CREATE VIEW vEmployee AS
SELECT employeeid, firstname, lastname, departmentid, bossid
FROM employee;
GRANT SELECT ON vEmployee TO SCOTT;
Encryption
Particular tables or columns may be encrypted to:
- protect sensitive data (e.g. passwords) while it is transmitted over a network, preventing interception by a third party
- protect data stored in the database (e.g. credit card numbers), providing some protection in case of unauthorized access
Data is encoded using an algorithm, authorized users are given keys to decipher data
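A small sketch of column-level encryption with MySQL's built-in AES functions; the payment table and the hard-coded key are purely illustrative (a real key must be managed securely, never embedded in the SQL):

-- Store a credit card number encrypted (AES_ENCRYPT returns binary data)
INSERT INTO payment (customerid, card_number_enc)
VALUES (1, AES_ENCRYPT('4111111111111111', 'illustrative-key'));

-- Only code that holds the key can decipher the value
SELECT CAST(AES_DECRYPT(card_number_enc, 'illustrative-key') AS CHAR) AS card_number
FROM payment
WHERE customerid = 1;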
Web security
Injection: Injection flaws, such as SQL, NoSQL, OS and LDAP injection, occur when untrusted data is sent to an interpreter as part of a command or query. The attacker's hostile data can trick the interpreter into executing unintended commands or accessing data without proper authorization.
Broken Authentication: Application functions related to authentication and session management are often implemented incorrectly, allowing attackers to compromise passwords, keys, or session tokens, or to exploit other implementation flaws to assume other users' identities temporarily or permanently.
How to prevent injection:
- sanitize user inputs
- pass inputs as parameters to a stored procedure or prepared statement, rather than building the SQL string directly in the code (see the sketch below)
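A minimal sketch of binding input as a parameter instead of concatenating it into the SQL string, using a MySQL server-side prepared statement; the table and the value of @input are illustrative:

-- The user-supplied value is bound as data, never parsed as SQL
PREPARE find_employee FROM
  'SELECT employeeid, lastname FROM employee WHERE lastname = ?';
SET @input = 'Smith';            -- value coming from the application or user
EXECUTE find_employee USING @input;
DEALLOCATE PREPARE find_employee;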
Backup and Recovery
A backup is a copy of your data.
If data becomes corrupted or deleted or held to ransom it can be restored from the backup copy.
A backup and recovery strategy is needed to plan how data is backed up and to plan how it will be recovered.
Protects data against human error, hardware or software malfunction, malicious activity, and natural or man-made disasters, and helps satisfy government regulation.
Failures can be divided into the following categories:
- Statement failure(Syntactically incorrect)
- User Process failure(The process doing the work fails)
- Network failure(Network failure between the user and the database)
- User error(User accidentally drops the rows, tables, database)
- Memory failure(Memory fails, becomes corrupt)
- Media failure(Disk failure, corruption, deletion)
Types of backups:
- Physical vs Logical
- Online vs Offline
- Full vs Incremental
- Onsite vs Offsite
Physical vs Logical backup
Physical backup:
- raw copies of files and directories
- suitable for large databases that need fast recovery
- database is preferably offline("cold" backup) when backup occurs
- MySQL Enterprise Backup automatically handles file locking, so the database is not wholly offline
- backup should include logs
- backup is only portable to machines with a similar configuration
- to restore
- shut down DBMS
- copy backup over current structure on disk
- restart DBMS
Logical backup:
- backup completed through SQL queries
- slower than physical
- SQL selects rather than OS copy
- output is larger than physical
- doesn't include log or config files
- machine independent
- server is available during the backup
- in MySQL the backup can be made using
- mysqldump
- SELECT ... INTO OUTFILE
- to restore (see the sketch below)
- use mysqlimport, or LOAD DATA INFILE within the mysql client
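A small sketch of the SQL side of a logical backup and restore; the file path and table are illustrative, and the server's secure_file_priv setting controls where files may be written:

-- Logical backup: write the rows out as a text file
SELECT * FROM employee
INTO OUTFILE '/var/backups/employee.txt';

-- Restore: load the rows back into the (empty) table
LOAD DATA INFILE '/var/backups/employee.txt'
INTO TABLE employee;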
Online vs Offline backup
Online(or HOT) backup:
- backups occur when the database is "live"
- clients don't realize a backup is in progress
- need to have appropriate locking to ensure integrity of data
Offline(or COLD) backup:
- backups occur when the database is stopped
- to maximize availability to users, back up from a replication server rather than the live server
- simpler to perform
- a cold backup is preferable, but not possible in all situations, e.g. applications that cannot tolerate downtime
Full vs Incremental backup
Full:
- a full backup is where the complete database is backed up
- it includes everything you need to get the database operational in the event of a failure
Incremental:
- only the changes since last backup are backed up
- for most databases this means backing up only the log files
- to restore:
- stop the database, copy backed up log files to disk
- start the database and tell it to redo the log files
Create a backup policy
Backup strategy is usually a combination of full and incremental backups. For example, weekly full backup and weekday incremental backup.
Conduct backups when database load is low
If you replicate the database, use the mirror database for backups to negate any performance concerns with the main database.
TEST your backup before you NEED your backup!
Offsite backup
Enables disaster recovery, because backup is not physically near the disaster site.
Example solutions:
- backup tapes transported to underground vault
- remote mirror database maintained via replication
- backup to Cloud
Other ways to reduce risk of data loss
Replication
MySQL Master-Slave replication
- one writer and many readers
- some protection against server failure
- multiple copies of data
- but accidental data deletion is also replicated
MySQL Replication (basic Master/Slave principles)
MySQL replication is asynchronous: data is replicated from one MySQL instance (the Master) to another MySQL instance (the Slave). The whole replication operation is carried out mainly by three processes, two of them on the Slave (the SQL process and the IO process) and one on the Master (the IO process). To set up replication, the binary log (bin-log) must first be enabled on the Master; otherwise replication cannot work, because the entire replication process is essentially the Slave fetching that log from the Master and then executing, strictly in order, all of the operations recorded in it.
A replication architecture usually consists of one master and one or more slaves. The master receives the application's writes (both the reads and the writes inside a transaction are handled by the master), while the slaves receive reads. Any data change on the master is replicated to the slaves. Intuitively, a replication architecture provides: 1) data backed up at multiple points, improving availability; 2) separation of reads and writes, improving the cluster's concurrency (it is not load balancing); 3) the ability to move non-real-time data operations onto the slaves.
Replication has the following advantages:
1) Scale-out: distribute the load across several slaves to improve performance. All writes, and the reads inside transactions, are handled by the master; other reads are forwarded to the slaves. For applications with a high read-to-write ratio, replication can raise concurrency by adding slave nodes. Because writes can only be committed on the master, scaling out does little to improve write concurrency; write-intensive applications should consider other architectures.
2) Data safety: a slave can pause its own replication process without interrupting data requests on the master, so a backup service can run on the slave; regular full backups are one way to protect the data. (Running the backup on the master would require putting the master into read-only mode, which means blocking all write requests.)
3) Analytics: data is created on the master, so data analysis can run on a slave without affecting the master's performance. When MySQL is used for data analysis (or as the data source of an analytics platform), a slave is usually used as the input.
4) Geographic distribution: if the master is physically far away, slaves can be deployed close to where the data is needed so that it can be used locally instead of always accessing the remote master; this helps greatly with data analysis, backup and disaster recovery.
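A minimal sketch of pointing a slave at a master using the classic MySQL replication statements (CHANGE MASTER TO / START SLAVE; newer versions rename these); the host, user and binary-log coordinates are illustrative:

-- Run on the slave: where the master is and where to start reading its binary log
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'illustrative-password',
  MASTER_LOG_FILE = 'binlog.000001',
  MASTER_LOG_POS = 4;

START SLAVE;           -- starts the IO thread and the SQL thread described above
SHOW SLAVE STATUS\G    -- check that both threads are running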
Clusters
Many Writers and Many Readers:
- usually Linux/Unix Only
- automatic synchronous partition
- protection against server failure
- multiple copies of data
- but accidental data deletion is also replicated
MySQL Cluster supports much larger data volumes, and its architecture and internals are correspondingly more complex. Cluster is an easily scalable, real-time, ACID-compliant transactional database with a fully distributed, multi-master architecture and no single point of failure. MySQL Cluster can be deployed on ordinary commodity machines; it scales horizontally across many nodes, automatically shards and load-balances data between servers, and is aimed at applications that are both read- and write-intensive. Data can be accessed through SQL as well as NoSQL interfaces. Its design philosophy is very similar to Hadoop's: it assumes that every node can fail and targets very large, multi-tenant clusters, so it provides data redundancy, automatic migration and automatic failover to guarantee availability and robustness.
MySQL Cluster uses a pluggable storage engine that is quite different from the usual MySQL engines (InnoDB, MyISAM). Without going into detail, its core characteristic is that the data set is not stored on any single MySQL instance but is distributed across multiple Data Nodes: a table's rows may be spread over several physical nodes, and every piece of data is stored redundantly on multiple Data Nodes. Any data change is applied on a group of Data Nodes synchronously (strictly synchronous, presumably via two-phase commit) to keep the data consistent.
Replication is simple in architecture and easy to manage; Fabric refines and complements the Replication model by adding automatic failover and sharding, supporting larger-scale data access with less manual intervention; Cluster is a fully distributed architecture, a solution aimed at large-scale data storage.
RAID
RAID attributes:
- software or hardware RAID
- available on Windows, Linux, Unix
- some RAID levels protect against drive failure
- RAID (Redundant Array of Independent Disks) means an array built from independent disks that has redundancy.
- A disk array combines many relatively cheap disks into one large-capacity disk group, and uses the combined effect of the individual disks to raise the performance of the whole storage system. With this technique, data is split into many segments that are stored across the individual drives.
- A disk array can also use parity checking: if any one drive in the array fails, the data can still be read, and when the data is rebuilt it is recalculated and written onto a new drive.
Reference (in Chinese): https://blog.csdn.net/blackmanren/article/details/53376868