15-445 Lecture #1: Relational Model & Relational Algebra

1 Databases
A database is an organized collection of inter-related data that models some aspect of the real world (e.g., modeling the students in a class or a digital music store). People often confuse "databases" with "database management systems" (e.g., MySQL, Oracle, MongoDB). A database management system (DBMS) is the software that manages a database.

A DBMS is the system that manages a DB; a DB is a collection of data.

2 Flat File Strawman
The database is stored as comma-separated value (CSV) files that the DBMS manages. Each entity is stored in its own file. The application has to parse the files each time it wants to read or update records. Each entity has its own set of attributes, so in each file, records are delimited by new lines, while the attributes within a record are delimited by commas.

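The DB is saved as text files in CSV form, and the application has to parse the meaning of the files itself whenever it reads or writes records.

Each record takes one line, and each attr is separated by a comma.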
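
A minimal sketch of what this application-side parsing can look like. The file contents, the column layout (name, artist, year) and the helper `find_albums_by_artist` are all made up for illustration; they are not from the lecture.

```python
import csv
import io

# Hypothetical contents of an albums file: name, artist, year.
ALBUMS_CSV = """Album A,Artist X,1993
Album B,Artist X,1994
Album C,Artist Y,1990
"""

def find_albums_by_artist(csv_text, artist):
    """Scan every line and return the (name, year) pairs for one artist."""
    matches = []
    for name, row_artist, year in csv.reader(io.StringIO(csv_text)):
        if row_artist == artist:
            # The application has to know that the third column is a year
            # and convert it itself; nothing in the file enforces that.
            matches.append((name, int(year)))
    return matches

print(find_albums_by_artist(ALBUMS_CSV, "Artist Y"))
```

Every lookup is a full scan of the file, and every application that shares the file has to repeat the same parsing logic, which is where the issues listed next come from.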

Issues with Flat File
• Data Integrity
– How do we ensure that the artist is the same for each album entry?
– What if somebody overwrites the album year with an invalid string?
– How do we store that there are multiple artists on one album?
• Implementation
– How do you find a particular record?
– What if we now want to create a new application that uses the same database?
– What if two threads try to write to the same file at the same time?
• Durability
– What if the machine crashes while our program is updating a record?
– What if we want to replicate the database on multiple machines for high availability?

↑ This list is very important; keep an eye on how the DBMS solves each of these problems later on.

3 Database Management System
A DBMS is software that allows applications to store and analyze information in a database.
A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases.

The definition and purpose of a DBMS.

Early DBMSs
Database applications were difficult to build and maintain because there was a tight coupling between logical and physical layers. The logical layer is which entities and attributes the database has while the physical layer is how those entities and attributes are being stored. Early on, the physical layer was defined in the application code, so if we wanted to change the physical layer the application was using, we would have to change all of the code to match the new physical layer.

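Early DB applications did not separate the logical layer from the physical layer; the two were coupled together, which made development very inconvenient.

The logical layer is which entities and attrs the DB has (?)
The physical layer is how the DB is actually stored.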

4 Relational Model
Ted Codd noticed that people were rewriting DBMSs every time they wanted to change the physical layer, so in 1970 he proposed the relational model to avoid this. The relational model has three key points:
• Store database in simple data structures (relations).
• Access data through high-level language.
• Physical storage left up to implementation.

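Previously, every time people wanted to change the physical layer of a DB they had to rewrite the DBMS, so the relational model was proposed.

  • Use relations to represent how the DB is stored
  • Use a high-level language to access the data
  • Physical storage left up to implementation.(?)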

A data model is a collection of concepts for describing the data in a database. The relational model is an example of a data model.
A schema is a description of a particular collection of data, using a given data model.

The relational data model defines three concepts:
• Structure: The definition of relations and their contents. This is the attributes the relations have and the values that those attributes can hold.
• Integrity: Ensure the database’s contents satisfy constraints. An example constraint would be that any value for the year attribute has to be a number.
• Manipulation: How to access and modify a database’s contents.
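
As a rough illustration of the integrity concept, the sketch below declares the "year must be a number" constraint to the DBMS instead of checking it in application code. It uses Python's built-in `sqlite3` module; the `album` table, its columns and the helper `try_insert` are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical album relation; the CHECK clause is the integrity constraint.
conn.execute(
    "CREATE TABLE album ("
    " name TEXT,"
    " year INTEGER CHECK (typeof(year) = 'integer'))"
)

def try_insert(name, year):
    """Return True if the DBMS accepts the tuple, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO album (name, year) VALUES (?, ?)", (name, year))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Album A", 1993))          # a numeric year
print(try_insert("Album B", "not a year"))  # an invalid string
```

The second insert is rejected by the DBMS itself, so every application that uses the database gets the same guarantee.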

TODO

A relation is an unordered set that contains the relationship of attributes that represent entities. Since the relationships are unordered, the DBMS can store them in any way it wants, allowing for optimization.
A tuple is a set of attribute values in the relation; the set of values an attribute is allowed to take is known as its domain. Originally, values had to be atomic or scalar, but now values can also be lists or nested data structures. Any attribute can take the special value NULL, which means that the attribute is undefined for the given tuple.
A relation with n attributes is called an n-ary relation.

So a relation is just a table, and a tuple is one row of the table?

Keys
A relation’s primary key uniquely identifies a single tuple. Some DBMSs automatically create an internal primary key if you do not define one. A lot of DBMSs have support for autogenerated keys so an application does not have to manually increment the keys.
A foreign key specifies that an attribute from one relation has to map to a tuple in another relation.

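The primary key of a table uniquely identifies one tuple in the table. Some DBMSs automatically create one if you do not define a primary key. Many DBMSs support auto-incrementing primary keys, so whoever inserts the data does not have to set them by hand.

Foreign keys: TODO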
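
A small sketch of both kinds of key, using Python's built-in `sqlite3` module. The `artist` and `album` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other DBMSs behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE album ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " artist_id INTEGER REFERENCES artist(id))"
)

# No id is supplied here: the DBMS generates the primary key.
cur = conn.execute("INSERT INTO artist (name) VALUES ('Artist X')")
artist_id = cur.lastrowid
conn.execute("INSERT INTO album (name, artist_id) VALUES ('Album A', ?)", (artist_id,))

# An album that points at a non-existent artist violates the foreign key.
try:
    conn.execute("INSERT INTO album (name, artist_id) VALUES ('Album B', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(artist_id, rejected)
```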

5 Data Manipulation Languages (DMLs)
A DML is a language for storing and retrieving information from a database. There are two classes of languages for this:
• Procedural: The query specifies the (high-level) strategy the DBMS should use to find the desired result.
• Non-Procedural: The query specifies only what data is wanted and not how to find it.

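A DML is used to access the DB. One kind is imperative and the other is declarative. The difference is that with the latter you only state which data you need, while with the former you also have to specify how to find that data.

SQL belongs to the latter.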

6 Relational Model & Relational Algebra
Relational Algebra is a set of fundamental operations to retrieve and manipulate tuples in a relation. Each operator takes in one or more relations as inputs, and outputs a new relation. To write queries we can “chain” these operators together to create more complex operations.

Relational algebra is a set of operators for retrieving and manipulating the data in relations.

Select
Select takes in a relation and outputs a subset of the tuples from that relation that satisfy a selection predicate.
The predicate acts like a filter, and we can combine multiple predicates using conjunctions and disjunctions.
Syntax: σ_predicate(R).

Pick out all the tuples that satisfy the condition.
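
A minimal sketch of select in Python, assuming a relation is modelled as a list of dicts. The relation `R(a_id, b_id)`, its contents and the helper name `select` are illustrative assumptions.

```python
# Hypothetical relation R(a_id, b_id), modelled as a list of dict tuples.
R = [
    {"a_id": "a1", "b_id": 101},
    {"a_id": "a2", "b_id": 102},
    {"a_id": "a2", "b_id": 103},
    {"a_id": "a3", "b_id": 104},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Conjunction of two predicates: a_id = 'a2' AND b_id > 102.
result = select(R, lambda t: t["a_id"] == "a2" and t["b_id"] > 102)
print(result)
```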

Projection
Projection takes in a relation and outputs a relation with tuples that contain only the specified attributes. You can rearrange the ordering of the attributes in the input relation as well as manipulate the values.
Syntax: π_{A1, A2, ..., An}(R).

Pick out the attrs you care about from the relation (every tuple is still included, but only the given attrs are returned).
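
A minimal sketch of projection under the same list-of-dicts assumption. The helper `project` and the output attribute name `b_offset` are made up for illustration, and this sketch does not remove duplicate output tuples.

```python
# Hypothetical relation R(a_id, b_id), modelled as a list of dict tuples.
R = [
    {"a_id": "a1", "b_id": 101},
    {"a_id": "a2", "b_id": 102},
    {"a_id": "a2", "b_id": 103},
]

def project(relation, **columns):
    """pi: each keyword maps an output attribute name to a function of the input tuple."""
    return [{name: fn(t) for name, fn in columns.items()} for t in relation]

# pi_{b_id - 100, a_id}(sigma_{a_id = 'a2'}(R)):
# the attributes are reordered and one value is computed from b_id.
a2_only = [t for t in R if t["a_id"] == "a2"]
result = project(a2_only, b_offset=lambda t: t["b_id"] - 100, a_id=lambda t: t["a_id"])
print(result)
```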

Union
Union takes in two relations and outputs a relation that contains all tuples that appear in at least one of the input relations. Note: The two input relations have to have the exact same attributes.
Syntax: (R ∪ S).

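Union requires the two relations to have the same attrs.

A union merges the tuples of the two relations. One thing I do not understand: what I found when searching says duplicates have to be removed, but for some reason the slides for this course do not remove them.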

Intersection
Intersection takes in two relations and outputs a relation that contains all tuples that appear in both of the input relations. Note: The two input relations have to have the exact same attributes.
Syntax: (R ∩ S).

Same as above: the two relations need to have the same attrs.

Take the tuples they have in common.

Difference
Difference takes in two relations and outputs a relation that contains all tuples that appear in the first relation but not the second relation. Note: The two input relations have to have the exact same attributes.
Syntax: (R − S).

Take the tuples that are in R but not in S.
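
Union, intersection and difference map directly onto Python's set operators if each relation is modelled as a set of value tuples. The relations `R` and `S` below are made-up examples with the same attributes `(a_id, b_id)`. This sketch uses set semantics, so a tuple that appears in both inputs shows up only once in the union; a bag-semantics union such as SQL's `UNION ALL` would keep the duplicate.

```python
# Hypothetical relations with identical attributes (a_id, b_id).
R = {("a1", 101), ("a2", 102), ("a3", 103)}
S = {("a3", 103), ("a4", 104), ("a5", 105)}

union = R | S         # R ∪ S: tuples in at least one input
intersection = R & S  # R ∩ S: tuples in both inputs
difference = R - S    # R − S: tuples in R but not in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```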

Product
Product takes in two relations and outputs a relation that contains all possible combinations for tuples from the input relations.
Syntax: (R × S).

Combine the tuples of the two relations, producing every possible combination.
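
A short sketch of product using `itertools.product` from the standard library. `R` and `S` are made-up relations stored as lists of value tuples.

```python
from itertools import product

# Hypothetical relations R(a_id, b_id) and S(a_id, b_id).
R = [("a1", 101), ("a2", 102)]
S = [("a3", 103), ("a4", 104), ("a5", 105)]

# R × S: concatenate every tuple of R with every tuple of S.
cross = [r + s for r, s in product(R, S)]

for row in cross:
    print(row)
```

The output has len(R) * len(S) tuples, which is why a product over large relations gets expensive quickly.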

Join
Join takes in two relations and outputs a relation that contains all the tuples that are a combination of two tuples where, for each attribute that the two relations share, the values of that attribute in both tuples are the same.
Syntax: (R ⋈ S).

If the shared attrs have the same values, the two tuples can be joined.
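
A naive nested-loop sketch of this join, with relations modelled as lists of dicts. The relations `R(a_id, b_id)` and `S(b_id, c)` and the helper `natural_join` are illustrative assumptions; a real DBMS can choose from several join algorithms.

```python
# Hypothetical relations that share the attribute b_id.
R = [
    {"a_id": "a1", "b_id": 101},
    {"a_id": "a2", "b_id": 102},
    {"a_id": "a3", "b_id": 103},
]
S = [
    {"b_id": 102, "c": "x"},
    {"b_id": 103, "c": "y"},
    {"b_id": 104, "c": "z"},
]

def natural_join(r, s):
    """Combine every pair of tuples that agree on all attributes the relations share."""
    joined = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                joined.append({**t, **u})
    return joined

print(natural_join(R, S))
```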

Observation
Relational algebra is a procedural language because it defines the high-level steps of how to compute a query. For example, σ_{b_id=102}(R ⋈ S) says to first do the join of R and S and then do the select, whereas (R ⋈ (σ_{b_id=102}(S))) will do the select on S first, and then do the join. These two statements will actually produce the same answer, but if there is only 1 tuple in S with b_id=102 out of a billion tuples, then (R ⋈ (σ_{b_id=102}(S))) will be significantly faster than σ_{b_id=102}(R ⋈ S).
A better approach is to say the result you want, and let the DBMS decide the steps it wants to take to compute the query. SQL will do exactly this, and it is the de facto standard for writing queries on relational model databases.

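Relational algebra is procedural, because when you write relational algebra you spell out how the data is to be looked up.

SQL is different.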
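
The sketch below compares the two orderings on a tiny made-up pair of relations, `R(a_id, b_id)` and `S(b_id, c)`, and then states the same query declaratively in SQL through Python's built-in `sqlite3` module. The data, the helper `join` and the table layout are assumptions for illustration only.

```python
import sqlite3

R = [("a1", 101), ("a2", 102), ("a3", 103)]  # hypothetical R(a_id, b_id)
S = [(101, "x"), (102, "y"), (103, "z")]     # hypothetical S(b_id, c)

def join(r, s):
    """Nested-loop join of R and S on their shared attribute b_id."""
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]

# Plan 1: join first, then select b_id = 102.
plan1 = [t for t in join(R, S) if t[1] == 102]
# Plan 2: select b_id = 102 on S first, then join.
plan2 = join(R, [t for t in S if t[0] == 102])

# Declarative version: describe the result and let the DBMS pick the steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a_id TEXT, b_id INTEGER)")
conn.execute("CREATE TABLE S (b_id INTEGER, c TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?)", R)
conn.executemany("INSERT INTO S VALUES (?, ?)", S)
sql_result = conn.execute(
    "SELECT a_id, b_id, c FROM R NATURAL JOIN S WHERE b_id = 102"
).fetchall()

print(plan1, plan2, sql_result)
```

All three produce the same tuples; only the amount of intermediate work differs, and with SQL that choice is left to the DBMS.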
