Databases and SQL for Data Science课程笔记-第一周

本课程教授SQL语言基础,以及关系数据库模型

  • Describe SQL, data, database, relational database
  • 列举5个SQL basic commands

Terms: SQL, Data, Database, Relational, RDBMS

SQL (Structured Query Language):

Used for accessing data in relational databases

关系型数据库

Data stored in tabular form is a relational database.

关系型数据库管理系统

RDBMS=Relational database management system: a set of software tools that controls the data: access, organization and storage

5 basic SQL queries: CREATE, INSERT, SELECT, UPDATE, DELETE

CREATE TABLE and SELECT statement

  • 区分Data Definition Language statements and Data Manipulation Language statements
  • 如何使用Entity name和attributes来构建关系型数据库表格

DDL & DML

DDL: Data Definition Language. for define, change or drop data
e.g. CREATE Table: DDL which is used for defining structure of the database objects

create table TEST (
    ID integer,
    NAME varchar(30)
    );
    
// drop a table and create a new one
drop table COUNTRY;
create table COUNTRY (
    ID integer PRIMARY KEY NOT NULL,
    CCODE char(2),
    NAME varchar(60)
    );
-- Examples to CREATE and DROP tables

create table TABLENAME (
    COLUMN1 datatype,
    COLUMN2 datatype,
    COLUMN3 datatype,
        ...
    ) ;

create table TEST (
    ID integer,
    NAME varchar(30)
    );

create table COUNTRY (
    ID int NOT NULL,
    CCODE char(2),
    NAME varchar(60),
    PRIMARY KEY (ID)
    );

drop table COUNTRY;
create table COUNTRY (
    ID integer PRIMARY KEY NOT NULL,
    CCODE char(2),
    NAME varchar(60)
    );

DML: Data Manipulation Language. for read and modify data
e.g. SELECT: The output of a select statement is called the Result Set or a Result Table.
INSERT, UPDATE, DELETE

select * from COUNTRY where ID < 5 ;
select * from COUNTRY where CCODE = 'CA'; 
COUNT, DISTINCT, LIMIT built-in functions with SELECT

COUNT retrieves specific rows
DISTINCT is used to remove duplicate values
LIMIT is used for restricting the number of rows retrieved from the database

INSERT, UPDATE, DELETE

INSERT: add rows to the table
UPDATE: change one or more rows
DELETE: remove one or more rows

Relational model & databases

Information and data models

Difference between information and data models

An information model is an abstract formal representation of entities that includes their properties, relationships, and the operations that can be performed on them. The entities being modeled can be from the real world, such as a library.
在这里插入图片描述
Information models and data models are different and serve different purposes.
An information model is at the conceptual level and defines relationships between objects. Data models are defined in a more concrete level, are specific, and include details.
A data model is the blueprint of any database system.

Types of information models:

在这里插入图片描述
The most familiar is the hierarchical, typically used to show organization charts.
The hierarchical model organizes its data using a tree structure. The root of the tree is the parent node followed by child nodes. A child node cannot have more than one parent. However, a parent can have many child nodes.
在这里插入图片描述

the advantage of relational model

The relational model is the most used data model for databases, because this model allows for data independence. Data is stored in simple data structure: tables. This provides logical data independence, physical data independence, and physical storage independence. An entity-relationship data model or ER data model, is an alternative to a relational data model. Using a simplified library database as an example, this figure shows an Entity Relationship Diagram or ERD, that represents entities called tables and their relationships.
We have authors who write books, borrowers who take books out on loan, various copies of each book etc. This is the final ER diagram.
An entity-relationship model proposes thinking of a database as a collection of entities, rather than being used as a model on its own. The ER model is used as a tool to design relational databases. In the ER model, entities are objects that exist independently of any other entities in the database. It is simple to convert an ER diagram into a collection of tables.

The building blocks of an ER diagram are entities and attributes.

Entities have attributes, which are the data elements that characterize the entity.
Attributes tell us more about the entity.
In an ER diagram, an entity is drawn as a rectangle, and attributes are drawn as ovals. Entities can be a noun, person, place, or thing.
在这里插入图片描述
Using a simplified library as an example, a book is an example of an entity. Attributes are certain properties of characteristics of an entity and tell us more about the entity. The Entity book has attributes such as the book title, the addition of the book, the year the book was written etc. Attributes are connected to exactly one entity. The entity book becomes a table in the database, and the attributes become the columns in a table.
在这里插入图片描述
Continuing the simplified library example, books are written by authors, book is an entity, an author is an entity. For the entity author, the ER diagram would look like this. The entity author has attributes, such as the author’s last name, first name, email, city, country, and an author ID to uniquely identify the author. The entity author becomes a table in the database, and the attributes become the columns in the table.
在这里插入图片描述
In the simplified library database, we progress through the process of identifying entities, such as borrowers who take books out on loan, various copies of each book, and copies of books out on loan. This is the final ER Diagram. Each entity becomes a table in the database.

Types of relationships

the building blocks of a relationship

, explain the symbols used in a relationship set, and describe the difference between the one-to-one and many-to-many relationship.
在这里插入图片描述
The building blocks of a relationship are: entities, relationship sets, and crows foot notations.

the symbols used in a relationship set

Entity sets are represented by a rectangle.
Relationship sets are represented by a diamond, with lines connecting associated entities.
Different techniques are used in representing relationships. For ease of understanding, this lesson uses the crows foot notations. Some of these are the greater than symbol, the less than symbol and a vertical line.

three kinds of relationships between entities

在这里插入图片描述
The thick lines indicate each entity in the entity set is involved in at least one and exactly one relationship. This is called a one-to-one relationship. Only entities are used in the relationship diagrams. Attributes are omitted because they can clutter the diagrams.

在这里插入图片描述

More than one author can write a book. This can be represented with the different notation called crow’s foot notation. In this case a less than symbol. This indicates that one book entity is participating in more than one relationship in the relationship set. This is called a one-to-many relationship. This could also be called a many-to-one relationship, in that many authors write a single book.
在这里插入图片描述
To represent many authors writing many books use the greater than and less than symbols on either side of the relationship set. This is called a many-to-many relationship. Each entity in the entity set is participating in more than one relationship. Many books being written by many authors, or many authors writing many books.

Mapping Entities to Tables将实体关系图映射到表

explain how an ER Diagram translates into a relational database table with rows and columns转换成关系型数据库中以行和列组成的表格
Entity relationship diagrams are the basic foundations for designing a database. In the relational database design, begin with an ER Diagram, or ERD and later map the ERD to the tables in the database. In this example, we use the ERD for entity book.

在这里插入图片描述
我们把实体和属性分开,把实体变成表格,属性变成columns列。
In this case, entity book becomes a table with the same name, book. All the attributes translate into columns in the table. We will now see how a table would be represented in a relational database model.
在这里插入图片描述

Relational Model Concepts

the history of the relational model

The relational model was first proposed in 1970 and is based on a mathematical model and mathematical terms.

relational terms: relation, degree and cardinality

在这里插入图片描述
The building blocks of the relational model are relation and sets. The relational model of data is based on the concept of relation. A relation is a mathematical concept based on the idea of sets.
A set is an unordered collection of distinct elements. It is a collection of items of the same type. It would have no order and no duplicates.
关系就代表表格。
schema不包括行,只有列名attributes。
instance就是表格的行cardinality与列degree。
A relational database is a set of relations.
A relation is also the mathematical term for a table. A table is a combination of rows and columns.

the difference between a relational schema and a relational instance

A relation is made up of two parts, relational schema and relational instance.
在这里插入图片描述
A relational schema specifies the name of a relation and the attributes. The example here is the entity author. Author is the name of the relation. Author_ ID is an attribute which can hold the data type char, which is a character string of a fixed length. Likewise, lastname, firstname, email and city have the data type varchar, which is a character string of a variable length. The last attribute country, also has a data type of char. This constitutes the relational schema.
在这里插入图片描述
A relational instance is a table made up of the attributes or columns and the tuples or rows. The columns are the attributes or fields. The rows are tuples. 在这里插入图片描述
Degree refers the number of attributes or columns in a relation. Cardinality refers to the number of tuples or rows. In this example, the degree is six because there are six columns, and the cardinality is five because there are five tuples or rows.

在这里插入图片描述
A relation is a mathematical concept based on idea of sets and is the mathematical term for a table.
A relation is made up of two parts: relational schema and relational instance.
A relational schema specifies the name of a relation and the attributes.
A relational instance is a table made up of the attributes or columns and the tuples or rows. Degree refers to the number of attributes or columns in a relation.
Cardinality refers to the number of tuples or rows in a relation.

Additional Information

difference between schema and table

A (database) schema is the formal description of the organization and the structure of data in the database. This description includes the definitions of tables, columns, data types, indexes and much more. In a database, a table is a data set in which the data is organized in to set of vertical columns and horizontal rows. Number of columns in a table is specified in the database schema, but it can hold any number of rows. Tables also contain information such as constraints on the values in the columns and this information are called meta-information.

Create Schema

A SQL schema is identified by a schema name, and includes a authorization identifier to indicate the user or account who owns the schema. Schema elements include tables, constraints, views, domains and other constructs that describe the schema.

A schema is created using the CREATE SCHEMA statement. For example, we can create a schema called LIBRARY for this course:

CREATE SCHEMA LIBRARY AUTHORIZATION ‘Robert’

The data types used can be: numeric, character-string, bit-string, Boolean, DATE, timestamp, etc.

CREATE TABLE Statement
The CREATE TABLE statement includes these clauses:

· DEFAULT

· CHECK

Use the DEFAULT clause in the CREATE TABLE statement to specify the default value for the database server to insert into a column when no explicit value for the column is specified.

Use the CHECK clause to designate conditions that must be met before data can be assigned to a column during an INSERT or UPDATE statement.

During an insert or update, if the check constraint of a row evaluates to false, the database server returns an error. The database server does not return an error if a row evaluates to NULL for a check constraint. In some cases, you might want to use both a check constraint and a NOT NULL constraint.

SELECT Statement
The basic structure of the SELECT statement is formed from three clauses: SELECT, FROM and WHERE.

is a list of attribute names whose values are to be retrieved by the query

is a list of the relation names required to process the query

is a conditional(Boolean) expression that identifies the tuples to be retrieved by the query

In situations where you might want to use multiple IF-THEN-ELSE statements, you can often use a single SELECT statement instead. The SELECT statement allows a CLIST to select actions from a list of possible actions. An action consists of one or more statements or commands. The SELECT statement has the following syntax, ending with the END statement. You can use the SELECT statement with or without the initial test expression.

SELECT [test expression]

WHEN [expression1]

(action)

WHEN [expression2]

WHEN [expression3]

[OTHERWISE]

(action)

END

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
R is one of the most popular, powerful data analytics languages and environments in use by data scientists. Actionable business data is often stored in Relational Database Management Systems (RDBMS), and one of the most widely used RDBMS is Microsoft SQL Server. Much more than a database server, it’s a rich ecostructure with advanced analytic capabilities. Microsoft SQL Server R Services combines these environments, allowing direct interaction between the data on the RDBMS and the R language, all while preserving the security and safety the RDBMS contains. In this book, you’ll learn how Microsoft has combined these two environments, how a data scientist can use this new capability, and practical, hands-on examples of using SQL Server R Services to create real-world solutions. How this book is organized This book breaks down into three primary sections: an introduction to the SQL Server R Services and SQL Server in general, a description and explanation of how a data scientist works in this new environment (useful, given that many data scientists work in “silos,” and this new way of working brings them in to the business development process), and practical, hands-on examples of working through real-world solutions. The reader can either review the examples, or work through them with the chapters. Who this book is for The intended audience for this book is technical—specifically, the data scientist—and is assumed to be familiar with the R language and environment. We do, however, introduce data science and the R language briefly, with many resources for the reader to go learn those disciplines, as well, which puts this book within the reach of database administrators, developers, and other data professionals. Although we do not cover the totality of SQL Server in this book, references are provided and some concepts are explained in case you are not familiar with SQL Server, as is often the case with data scientists.

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值