Computer Specialty English (计算机专业英语教程, Second Edition), Chapter 4: Database Fundamentals

The English articles here are from Computer Specialty English, Second Edition (《计算机专业英语教程第二版》), typed out by hand for the purpose of studying English. I'm not familiar with copyright law, but this is purely for learning purposes; consider this a disclaimer, if one is needed.

Chapter 4 Database Fundamentals

4.1 Introduction to DBMS

  A database management system (DBMS) is an important type of programming system, used today on the biggest and the smallest computers. As for other major forms of system software, such as compilers and operating systems, a well-understood set of principles for database management systems has developed over the years, and these concepts are useful both for understanding how to use these systems effectively and for designing and implementing DBMS's. A DBMS is a collection of programs that enables you to store, modify, and extract information from a database. There are many different types of DBMS's, ranging from small systems that run on personal computers to huge systems that run on mainframes. The database sits between the application programs and the end users.

The Capabilities of DBMS

  There are two qualities that distinguish database management systems from other sorts of programming systems.

  1) The ability to manage persistent data,and 

  2) The ability to access large amounts of data efficiently.

  Point 1) merely states that there is a database which exists permanently; the contents of this database are the data that a DBMS accesses and manages.

  Point 2) distinguishes a DBMS from a file system, which also manages persistent data. A DBMS's capabilities are needed most when the amount of data is very large, because for small amounts of data, simple access techniques, such as linear scans of the data, are usually adequate.

  While we regard the above two properties of a DBMS as fundamental, there are a number of other capabilities that are almost universally found in commercial DBMS's. These are:

  *  Support for at least one data model,or mathematical abstraction through which the user can view the data.

  *  Support for certain high-level languages that allow the user to define the structure of data, access data, and manipulate data.

  *  Transaction management,the capability to provide correct,concurrent access to the database by many users at once.

  *  Access control,the ability to limit access to data by unauthorized users,and the ability to check the validity of data.

  *   Resiliency,the ability to recover from system failures without losing data.

  Data Models——Each DBMS provides at least one abstract model of data that allows the user to see information not as raw bits, but in more understandable terms. In fact, it is usually possible to see data at several levels of abstraction. At a relatively low level, a DBMS commonly allows us to visualize data as composed of files.

  Efficient File Access——The ability to store a file is not remarkable: the file system associated with any operating system does that. The capability of a DBMS is seen when we access the data of a file. For example, suppose we wish to find the manager of employee "Clark Kent". If the company has thousands of employees, it is very expensive to search the entire file to find the one with NAME="Clark Kent". A DBMS helps us to set up "index files," or "indices," that allow us to access the record for "Clark Kent" in essentially one stroke no matter how large the file is. Likewise, insertion of new records or deletion of old ones can be accomplished in time that is small and essentially constant, independent of the file's length. Another thing a DBMS helps us to do is navigate among files, that is, to combine values in two or more files to obtain the information we want.
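The index lookup described above can be sketched with Python's sqlite3 module (chosen here purely for illustration; the text names no particular DBMS, and the table and data below are invented):

```python
import sqlite3

# In-memory database; the employee table and its rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Clark Kent", "Perry White"), ("Lois Lane", "Perry White")],
)

# Without an index, this lookup scans the whole table; with one, the
# DBMS can jump almost directly to the matching record, in essentially
# constant time regardless of the file's length.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")

row = conn.execute(
    "SELECT manager FROM employee WHERE name = ?", ("Clark Kent",)
).fetchone()
print(row[0])  # -> Perry White
```

The speedup is invisible at two rows, of course; the point of an index is that the same query stays fast at millions of rows.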

  Query Languages——To make access to files easier, a DBMS provides a query language, or data manipulation language, to express operations on files. Query languages differ in the level of detail they require of the user, with systems based on the relational data model generally requiring less detail than languages based on other models.

  Transaction Management——Another important capability of a DBMS is the ability to manage simultaneously large numbers of transactions, which are procedures operating on the database. Some databases are so large that they can only be useful if they are operated upon simultaneously by many computers; often these computers are dispersed around the country or the world. The database systems used by banks, accessed almost instantaneously by hundreds or thousands of automated teller machines (ATMs), as well as by an equal or greater number of employees in the bank branches, are typical of this sort of database. An airline reservation system is another good example.

  Sometimes, two accesses do not interfere with each other. For example, any number of transactions can be reading your bank balance at the same time, without any inconsistency. But if you are in the bank depositing your salary check at the exact instant your spouse is extracting money from an automatic teller, the result of the two transactions occurring simultaneously and without coordination is unpredictable. Thus, transactions that modify a data item must "lock out" other transactions trying to read or write that item at the same time. A DBMS must therefore provide some form of concurrency control to prevent uncoordinated access to the same data item by more than one transaction.
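A minimal sketch of this kind of coordinated update, using Python's sqlite3 module (the account table and amounts are invented). Wrapping both updates in one transaction makes them succeed or fail together, and SQLite locks out conflicting writers while the transaction is open, which is a simple form of the concurrency control described above:

```python
import sqlite3

# Illustrative account table with two invented account holders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('you', 100), ('spouse', 100)")
conn.commit()

# A deposit and a withdrawal in one transaction: either both updates
# become visible or neither does.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "UPDATE account SET balance = balance + 50 WHERE owner = 'you'")
        conn.execute(
            "UPDATE account SET balance = balance - 30 WHERE owner = 'spouse'")
except sqlite3.Error:
    pass  # on failure, neither update is applied

balances = dict(conn.execute("SELECT owner, balance FROM account"))
print(balances)  # -> {'you': 150, 'spouse': 70}
```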

  Even more complex problems occur when the database is distributed over many different computer systems, perhaps with duplication of data both to allow faster local access and to protect against the destruction of data if one computer crashes.

  Security of Data——A DBMS must not only protect against loss of data when crashes occur, as we just mentioned, but it must prevent unauthorized access. For example, only users with a certain clearance should have access to the salary field of an employee file, and the DBMS must be able to associate with the various users their privileges to see files, fields within files, or other subsets of the data in the database. Thus a DBMS must maintain a table telling, for each user known to it, what access privileges the user has for each object. For example, one user may be allowed to read a file, but not to insert or delete data; another may not be allowed to see the file at all, while a third may be allowed to read or modify the file at will.
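The privilege table described here can be sketched as a small lookup structure. This is a toy illustration in Python, not any DBMS's actual implementation; the users, objects, and actions are all invented:

```python
# A toy privilege table: for each (user, object) pair, the set of
# allowed operations. A user absent from the table has no access.
privileges = {
    ("alice", "employee_file"): {"read"},
    ("bob", "employee_file"): {"read", "insert", "delete"},
    # carol does not appear: no access at all
}

def is_allowed(user: str, obj: str, action: str) -> bool:
    """Consult the privilege table before performing an operation."""
    return action in privileges.get((user, obj), set())

assert is_allowed("bob", "employee_file", "delete")       # full access
assert not is_allowed("alice", "employee_file", "delete")  # read-only
assert not is_allowed("carol", "employee_file", "read")    # no access
```

A real DBMS stores this table in the database itself and checks it on every statement; the lookup logic, however, is essentially this simple.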

 

DBMS Types

  Designers developed three different types of database structures: hierarchical, network, and relational. Hierarchical and network structures were developed first, but the relational structure has become dominant. While the relational design is dominant, the older databases have not been dropped. Companies that installed a hierarchical system such as IMS in the 1970s will be using and maintaining these databases for years to come even though new development is being done on relational systems. These older systems are often referred to as legacy systems.

 

 

 4.1 Reading Material

PostgreSQL

  PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later.

 

  Features—— PostgreSQL is an open-source descendant of this original Berkeley code. It supports SQL92 and SQL99 and offers many modern features:

    *  complex queries

    *  foreign keys

    *  triggers

    *  views

    *  transactional integrity

    *  multiversion concurrency control

    Additionally,PostgreSQL can be extended by the user in many ways, for example, by adding new:

    *  data types

    *  functions

    *  operations

    *  aggregate functions

    *  index methods

    *  procedural languages

    And because of the liberal license, PostgreSQL can be used, modified, and distributed by everyone free of charge for any purpose, be it private, commercial, or academic.

 

  Advantages—— PostgreSQL offers many advantages for your company or business over other database systems.

  1) Immunity to over-deployment

  Over-deployment is what some proprietary database vendors regard as their #1 license compliance problem. With PostgreSQL, no one can sue you for breaking a licensing agreement, as there is no associated licensing cost for the software.

  This has several additional advantages:

  *  More profitable business models with wide-scale deployment.

  *  No possibility of being audited for license compliance at any stage.

  *  Flexibility to do concept research and trial deployments without needing to include additional licensing costs.

  

  2) Better support than the proprietary vendors

  In addition to our strong support offerings, we have a vibrant community of PostgreSQL professionals and enthusiasts that your staff can draw upon and contribute to.

  

  3) Significant saving on staffing costs

  Our software has been designed and created to have much lower maintenance and tuning requirements than the leading proprietary databases, yet still retain all of the features, stability, and performance.

  In addition to this, our training programs are generally regarded as being far more cost effective, manageable, and practical in the real world than those of the leading proprietary database vendors.

 

  4) Legendary reliability and stability

  Unlike many proprietary databases, it is extremely common for companies to report that PostgreSQL has never, ever crashed for them in several years of high-activity operation. Not even once. It just works.

 

  5) Extensible

  The source code is available to all at no charge. If your staff have a need to customise or extend PostgreSQL in any way then they are able to do so with a minimum of effort, and with no attached costs. This is complemented by the community of PostgreSQL professionals and enthusiasts around the globe that also actively extend PostgreSQL on a daily basis.

 

  6) Cross platform

  PostgreSQL is available for almost every brand of Unix (34 platforms with the latest stable release), and Windows compatibility is available via the Cygwin framework. Native Windows compatibility is also available with version 8.0 and above.

 

  7) Designed for high volume environments

  We use a multiple row data storage strategy called MVCC to make PostgreSQL extremely responsive in high volume environments. The leading proprietary database vendor uses this technology as well, for the same reasons.

 

  8) GUI database design and administration tools

  Several high quality GUI tools exist to both administer the database (pgAdmin, pgAccess) and do database design (Tora, Data Architect).

 

 

 

4.2 Structure of the Relational Database

 

  The relational model is the basis for any relational database management system (RDBMS). A relational model has three core components: a collection of objects or relations, operators that act on the objects or relations, and data integrity methods. In other words, it has a place to store the data, a way to create and retrieve the data, and a way to make sure that the data is logically consistent.

  A relational database uses relations, or two-dimensional tables, to store the information needed to support a business. Let's go over the basic components of a traditional relational database system and look at how a relational database is designed. Once you have a solid understanding of what rows, columns, tables, and relationships are, you'll be well on your way to leveraging the power of a relational database.

    

Tables, Rows, and Columns

  A table in a relational database, alternatively known as a relation, is a two-dimensional structure used to hold related information. A database consists of one or more related tables.

  Note: Don't confuse a relation with relationships. A relation is essentially a table, and a relationship is a way to correlate, join, or associate two tables.

  A row in a table is a collection or instance of one thing, such as one employee or one line item on an invoice. A column contains all the information of a single type, and the piece of data at the intersection of a row and a column, a field, is the smallest piece of information that can be retrieved with the database's query language. For example, a table with information about employees might have a column called LAST_NAME that contains all of the employees' last names. Data is retrieved from a table by filtering on both the row and the column.

 

Primary Keys, Datatypes, and Foreign Keys

 

   The examples throughout this article will focus on the hypothetical work of Scott Smith, database developer and entrepreneur. He just started a new widget company and wants to implement a few of the basic business functions using a relational database to manage his Human Resources (HR) department.

  Relation: A two-dimensional structure used to hold related information, also known as a table.

  Note: Most of Scott's employees were hired away from one of his previous employers, some of whom have over 20 years of experience in the field. As a hiring incentive, Scott has agreed to keep the new employees' original hire dates in the new database.

  Row: A group of one or more data elements in a database table that describes a person, place, or thing.

  Column: The component of a database table that contains all of the data of the same name and type across all rows.

  You'll learn about database design in the following sections, but let's assume for the moment that the majority of the database design is completed and some tables need to be implemented. Scott creates the EMP table to hold the basic employee information, and it looks something like this:

 EMPNO  ENAME  JOB        MGR   HIREDATE   SAL   COMM  DEPTNO
 7369   SMITH  CLERK      7902  17-DEC-80   800         20
 7499   ALLEN  SALESMAN   7698  20-FEB-81  1600   300   30
 7521   WARD   SALESMAN   7698  22-FEB-81  1250   500   30
 7566   JONES  MANAGER    7839  02-APR-81  2975         20
 7839   KING   PRESIDENT        17-NOV-81  5000         10
 7902   FORD   ANALYST    7566  03-DEC-81  3000         20

   Notice that some fields in the Commission (COMM) and Manager (MGR) columns do not contain a value; they are blank. A relational database can enforce the rule that fields in a column may or may not be empty. In this case, it makes sense for an employee who is not in the Sales department to have a blank Commission field. It also makes sense for the president of the company to have a blank Manager field, since that employee doesn't report to anyone.

  

  Field: The smallest piece of information that can be retrieved by the database query language. A field is found at the intersection of a row and a column in a database table.

  On the other hand, none of the fields in the Employee Number (EMPNO) column are blank. The company always wants to assign an employee number to an employee, and that number must be different for each employee. One of the features of a relational database is that it can ensure that a value is entered into this column and that it is unique. The EMPNO column, in this case, is the primary key of the table.

  

  Primary Key: A column (or columns) in a table that makes the row in the table distinguishable from every other row in the same table.

  Notice the different datatypes that are stored in the EMP table: numeric values, character or alphabetic values, and date values.

  As you might suspect, the DEPTNO column contains the department number for the employee. But how do you know what department name is associated with what number? Scott created the DEPT table to hold the descriptions for the department codes in the EMP table.

 DEPTNO  DNAME       LOC
 10      ACCOUNTING  NEW YORK
 20      RESEARCH    DALLAS
 30      SALES       CHICAGO
 40      OPERATIONS  BOSTON

  The DEPTNO column in the EMP table contains the same values as the DEPTNO column in the DEPT table. In this case, the DEPTNO column in the EMP table is considered a foreign key to the same column in the DEPT table.

  A foreign key enforces the concept of referential integrity in a relational database. The concept of referential integrity not only prevents an invalid department number from being inserted into the EMP table, but it also prevents a row in the DEPT table from being deleted if there are employees still assigned to that department.
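Both halves of referential integrity can be demonstrated with Python's sqlite3 module (chosen for illustration; note that SQLite only enforces foreign keys when the pragma below is enabled). The EMP and DEPT tables here are simplified versions of the ones above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT,"
    " deptno INTEGER REFERENCES dept(deptno))"
)
conn.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH')")
conn.execute("INSERT INTO emp VALUES (7369, 'SMITH', 20)")

# Inserting an employee with a nonexistent department is rejected...
try:
    conn.execute("INSERT INTO emp VALUES (9999, 'NOBODY', 99)")
    rejected_insert = False
except sqlite3.IntegrityError:
    rejected_insert = True

# ...and so is deleting a department that still has employees.
try:
    conn.execute("DELETE FROM dept WHERE deptno = 20")
    rejected_delete = False
except sqlite3.IntegrityError:
    rejected_delete = True

print(rejected_insert, rejected_delete)  # -> True True
```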

 

  Foreign Key: A column (or columns) in a table that draws its values from a primary or unique key column in another table. A foreign key assists in ensuring the data integrity of a table.

 

  Referential Integrity: A method employed by a relational database system that enforces one-to-many relationships between tables.

 

Data Modeling

  Before Scott created the actual tables in the database, he went through a design process known as data modeling. In this process, the developer conceptualizes and documents all the tables for the database. One of the common methods for modeling a database is called ERA, which stands for entities, relationships, and attributes. The database designer uses an application that can maintain entities, their attributes, and their relationships. In general, an entity corresponds to a table in the database, and the attributes of the entity correspond to columns of the table.

 

  Data Modeling: A process of defining the entities, attributes, and relationships between the entities in preparation for creating the physical database.

 

  The data-modeling process involves defining the entities, defining the relationships between those entities, and then defining the attributes for each of the entities. Once a cycle is complete, it is repeated as many times as necessary to ensure that the designer is capturing what is important enough to go into the database. Let's take a closer look at each step in the data-modeling process.

Defining the Entities

  First, the designer identifies all of the entities within the scope of the database application. The entities are the persons, places, or things that are important to the organization and need to be tracked in the database. Entities will most likely translate neatly to database tables. For example, for the first version of Scott's widget company database, he identifies four entities: employees, departments, salary grades, and bonuses. These will become the EMP, DEPT, SALGRADE, and BONUS tables.

Defining the Relationships Between Entities

  Once the entities are defined, the designer can proceed with defining how each of the entities is related. Often, the designer will pair each entity with every other entity and ask, "Is there a relationship between two entities?" Some relationships are obvious; some are not.

  In the widget company database, there is most likely a relationship between EMP and DEPT, but depending on the business rules, it is unlikely that the DEPT and SALGRADE entities are related. If the business rules were to restrict certain salary grades to certain departments, there would most likely be a new entity that defines the relationship between salary grades and departments. This entity would be known as an associative or intersection table and would contain the valid combinations of salary grades and departments.

  Associative Table: A database table that stores the valid combinations of rows from two other tables and usually enforces a business rule. An associative table resolves a many-to-many relationship.

  In general, there are three types of relationships in a relational database:

  *   One-to-many  The most common type of relationship is one-to-many. This means that for each occurrence in a given entity, the parent entity, there may be one or more occurrences in a second entity, the child entity, to which it is related. For example, in the widget company database, the DEPT entity is a parent entity, and for each department, there could be one or more employees associated with that department. The relationship between DEPT and EMP is one-to-many.

   *  One-to-one  In a one-to-one relationship, a row in a table is related to only one or none of the rows in a second table. This relationship type is often used for subtyping. For example, an EMPLOYEE table may hold the information common to all employees, while the FULLTIME, PARTTIME, and CONTRACTOR tables hold information unique to full-time employees, part-time employees, and contractors, respectively. These entities would be considered subtypes of an EMPLOYEE and maintain a one-to-one relationship with the EMPLOYEE table. These relationships are not as common as one-to-many relationships, because if one entity has an occurrence for a corresponding row in another entity, in most cases, the attributes from both entities should be in a single entity.

  *  Many-to-many  In a many-to-many relationship, one row of a table may be related to many rows of another table, and vice versa. Usually, when this relationship is implemented in the database, a third entity is defined as an intersection table to contain the associations between the two entities in the relationship. For example, in a database used for school class enrollment, the STUDENT table has a many-to-many relationship with the CLASS table--one student may take one or more classes, and a given class may have one or more students. The intersection table STUDENT_CLASS would contain the combinations of STUDENT and CLASS to track which students are in which classes.
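The STUDENT_CLASS intersection table just described can be sketched with Python's sqlite3 module; the student and class rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE class   (class_id   INTEGER PRIMARY KEY, title TEXT);
-- The intersection table resolves the many-to-many relationship:
-- each row records one valid (student, class) pairing.
CREATE TABLE student_class (
    student_id INTEGER REFERENCES student(student_id),
    class_id   INTEGER REFERENCES class(class_id),
    PRIMARY KEY (student_id, class_id)
);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO class   VALUES (10, 'Math'), (20, 'History');
INSERT INTO student_class VALUES (1, 10), (1, 20), (2, 10);
""")

# Who is enrolled in Math? Join through the intersection table.
rows = conn.execute("""
    SELECT s.name FROM student s
    JOIN student_class sc ON sc.student_id = s.student_id
    JOIN class c ON c.class_id = sc.class_id
    WHERE c.title = 'Math' ORDER BY s.name
""").fetchall()
print([r[0] for r in rows])  # -> ['Ann', 'Ben']
```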

Assigning Attributes to Entities

   Once the designer has defined the entity relationships, the next step is to assign the attributes to each entity. This is physically implemented using columns, as shown here for the SALGRADE table as derived from the salary grade entity.

 GRADE  LOSAL  HISAL
 1        700   1200
 2       1201   1400
 3       1401   2000

 

Iterate the Process: Are We There Yet?

  After the entities, relationships, and attributes have been defined, the designer may iterate the data modeling many more times. When reviewing relationships, new entities may be discovered. For example, when discussing the widget inventory table and its relationship to a customer order, the need for a shipping restrictions table may arise.

  Once the design process is complete, the physical database tables may be created. Logical database design sessions should not involve physical implementation issues, but once the design has gone through an iteration or two, it's the DBA's job to bring the designers "down to earth". As a result, the design may need to be revisited to balance the ideal database implementation versus the realities of budgets and schedules.

 

 4.2.1 Reading Material

 

 National Geochemical Database

  The broad objective of this project is to maintain and enhance the National Geochemical Database (NGDB). The NGDB consists of 1) the original RASS and PLUTO data from the USGS labs, which are now stored in a common format under the ORACLE relational database management system; 2) the NURE data, which have been reformatted and reside currently on the following web site: http://pubs.usgs.gov/of/1997/ofr-97-0492/ where downloads may be made on the basis of 1:250,000-scale quadrangles; and 3) the newly generated data (approximately 1996-present), which reside on the Laboratory Information Management System. The enhancements to the NGDB will enable both USGS scientists and external customers to more easily extract immediately usable data on a national, regional, and local scale to help establish a baseline for the abundance and spatial distribution of chemical elements in the Earth's surficial materials. Specific short-term objectives include:

  Linking the newly developed ORACLE-based database to the Laboratory Information Management System (LIMS) to provide for the smooth transfer of newly generated data from the LIMS to the NGDB.

  Implement the new Sample Submittal Information procedure on a nationwide basis throughout the USGS. This procedure has only been implemented at this time (June 2002) in the Central Region. Without this new system in place, it is possible that more errors and omissions regarding the nature and location of samples may be generated.

  Complete the re-formatting of the NURE HSSR database based on 1:250,000-scale quadrangles, compile the quadrangle-based data into one large data set, and provide these data to the public via a web site and CD/DVD.

  Complete the upgrading of archival USGS geochemical data for Alaska and release these to the public via a web site and CD/DVD. 

  Initiate the upgrading of the remainder (non-Alaska) portion of the USGS-generated data. Generate subsets of the master databases containing data in a format more useful to geochemists so they do not have to wade through the process of extracting the data they need from the entire database.

  Communicate and coordinate the work within this Project with other data delivery efforts within the Bureau such as NatWeb, GEODE, and Spatial Data Delivery.

  Produce map representations of the database showing the spatial variation of chemical species throughout the nation and within sub-regions that are of priority to the USGS.

  Relevance and Impact

  An accurate, easily accessible geochemical database containing multi-element information on the surficial materials of the nation is vital if the USGS is to respond quickly to earth science issues raised by Congress and land management and environmental protection agencies. A nationally consistent geochemical database provides baseline information on the natural abundance and spatial variation of chemical elements to which changes caused by agricultural and irrigation practices, waste disposal, urbanization, industrial pollution, mineral exploration and mining activities, environmental remediation and restoration activities, and other land-use practices can be compared. Human-induced chemical changes to the environment are superimposed on a variable natural geochemical background where trace-element abundances can range over several orders of magnitude within short distances. These variations are inadequately documented and their existence is often overlooked in the setting of public policy. Important aspects of change cannot be measured, or their consequences anticipated, unless the present composition of the earth's surface materials is known. In her 2000 Presidential address to the Geological Society of America, Mary Lou Zoback identified six "grand challenges in earth and environmental science". The first of these was "recognizing the signal within the natural variability". Zoback stated that "documenting and understanding natural variability is a vexing topic in almost every environmental problem. How do we recognize and understand changes in natural systems if we don't understand the range of baseline values?" Preserving and enhancing the vast amount of geochemical data within MRP's databases will provide a powerful tool for addressing this "grand challenge".
The ultimate goal of producing and electronically disseminating the vast amount of geochemical data within MRP's databases directly supports many of the goals and objectives as stated in the Science Strategy of the Geologic Division (Bohlen and others, 1999). These databases are essential for understanding the relationship between geologic processes and human health, ecosystem structure and function, and the distribution of energy and mineral resources. This project also serves as the focal point of requests for geochemical data from outside customers. From June 2001 through May 2002, the predecessor project (National Geochemical Database Project) received over 100 requests for data from Federal, state, and local government clients; private sector clients; and internal USGS clients. At a conservatively estimated cost of $300 per sample for collection, preparation, and chemical analysis, the geochemical databases under MRP management represent an expenditure of over $500 million of taxpayer money. To realize the fullest possible return for this investment, these data must be archived in perpetuity in an easily accessible and user-friendly format for full utilization by the wide array of customers that need geochemical data to accomplish their work.

 

4.3 Brief Introduction of SQL

 

   SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI, it is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database or retrieving data from a database. Some common relational database management systems that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs to do with a database.

  A relational database system contains one or more objects called tables. The data or information for the database are stored in these tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather".

  City, state, high, and low are the columns. The rows contain the data for this table:

 Weather
 city         state       high  low
 Phoenix      Arizona      105   90
 Tucson       Arizona      101   92
 Flagstaff    Arizona       88   69
 San Diego    California    77   60
 Albuquerque  New Mexico    80   82

   The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:     

select "column1"
  [, "column2", etc]
from "tablename"
[where "condition"];

 

 

  [] = optional

   The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a "*" to select all columns.

  The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results.

  The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.
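Putting the pieces together, here is the sample weather table queried with a simple select statement, run through Python's sqlite3 module purely for illustration (the ORDER BY is added only to make the output order predictable):

```python
import sqlite3

# Load the sample weather table from the text into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (city TEXT, state TEXT, high INTEGER, low INTEGER)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", [
    ("Phoenix", "Arizona", 105, 90),
    ("Tucson", "Arizona", 101, 92),
    ("Flagstaff", "Arizona", 88, 69),
    ("San Diego", "California", 77, 60),
    ("Albuquerque", "New Mexico", 80, 82),
])

# select city, high from weather where state = 'Arizona';
rows = conn.execute(
    "SELECT city, high FROM weather WHERE state = 'Arizona' ORDER BY high DESC"
).fetchall()
print(rows)  # -> [('Phoenix', 105), ('Tucson', 101), ('Flagstaff', 88)]
```

The where clause filters the rows (only Arizona cities), and the column list after select filters the columns (only city and high).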

  The create table statement is used to create a new table. Here is the format of a simple create table statement:

    

create table "tablename"
("column1" "data type",
"column2" "data type",
"column3" "data type");

 

  Format of create table if you were to use optional constraints:

 

create table "tablename"
(
"column1" "data type" [constraint],
"column2" "data type" [constraint],
"column3" "data type" [constraint]
);

 

[] = optional

 

 Note: You may have as many columns as you'd like, and the constraints are optional.

 Example:

create table employee
(
first varchar(15),
last varchar(20),
age number(3),
address varchar(30),
city varchar(20),
state varchar(20)
);

 

 

To create a new table, enter the keywords create table followed by the table name, followed by an open parenthesis, followed by the first column name, followed by the data type for that column, followed by any optional constraints, and followed by a closing parenthesis. It is important to make sure you use an open parenthesis before the first column name, and a closing parenthesis after the end of the last column definition. Make sure you separate each column definition with a comma. All SQL statements should end with a ";".

   The table and column names must start with a letter and can be followed by letters, numbers, or underscores -- not to exceed a total of 30 characters in length. Do not use any SQL reserved keywords as names for tables or columns (such as "select", "create", "insert", etc.).

  Data types specify what the type of data can be for that particular column. If a column called "Last_Name" is to be used to hold names, then that particular column should have a "varchar" (variable-length character) data type.

  Note: Here are the most common data types:

char(size)       Fixed-length character string. Size is specified in parenthesis. Max 255 bytes.
varchar(size)    Variable-length character string. Max size is specified in parenthesis.
number(size)     Number value with a max number of column digits specified in parenthesis.
date             Date value.
number(size,d)   Number value with a maximum number of digits of "size" total, with a maximum number of "d" digits to the right of the decimal.

  What are constraints? When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is basically a rule associated with a column that the data entered into that column must follow. For example, a "unique" constraint specifies that no two records can have the same value in a particular column. They must all be unique. The other two most popular constraints are "not null" which specifies that a column can't be left blank, and "primary key". A "primary key" constraint defines a unique identification of each record (or row) in a table. 
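  Building on the employee example above, here is a sketch of how the constraints just described might be attached to columns; the id and ssn columns are hypothetical additions used only to illustrate "primary key" and "unique":

```sql
create table employee
(
id      number(6)   primary key,   -- uniquely identifies each record
first   varchar(15) not null,      -- this column can't be left blank
last    varchar(20) not null,
ssn     varchar(11) unique,        -- no two records may share a value
age     number(3)                  -- no constraint: blanks allowed
);
```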

  The insert statement is used to insert or add a row of data into the table. 

  To insert records into a table, enter the keywords "insert into" followed by the table name, followed by an open parenthesis, followed by a list of column names separated by commas, followed by a closing parenthesis, followed by the keyword "values", followed by the list of values enclosed in parenthesis. The values that you enter will be held in the rows and they will match up with the column names that you specify. Strings should be enclosed in quotes, and numbers should not.

insert into "tablename"
(first_column,...last_column)
values
(first_value,...last_value);

    In the example below, the column name first will match up with the value 'Luke', and the column name state will match up with the value 'Georgia'.

  Example:

insert into employee
    ( first, last, age, address, city, state)
    values ('Luke', 'Duke', 45, '2130 Boars Nest',
                'Hazard Co', 'Georgia') ;

 

  Note: All strings should be enclosed between single quotes:'string' 

 

 

  Updating Records

  The update statement is used to update or change records that match a specified criteria. This is accomplished by carefully constructing a where clause.

update "tablename"
set "columnname"="newvalue"
    [,"nextcolumn"="newvalue2"...]
where "columnname" 
    OPERATOR "value"
    [and|or "column"
    OPERATOR "value"];
    
[]=optional

 

 Examples:

update phone_book
    set area_code=623
    where prefix=979;

update phone_book
    set last_name='Smith', prefix=555, suffix=9292
    where last_name='Jones';

update employee
    set age=age+1
    where first_name='Mary' and last_name='Williams';

 

 

  Deleting Records

  The delete statement is used to delete records or rows from the table.

  

delete from "tablename"
where "columnname"
    OPERATOR "value"
    [and|or "column"
    OPERATOR "value"];

[]=optional

 

  Examples:

delete from employee;

 

  Note: if you leave off the where clause, all records will be deleted!

delete from employee
    where lastname='May';

delete from employee
    where firstname='Mike' or firstname='Eric';

 

   To delete an entire record/row from a table, enter "delete from" followed by the table name, followed by the where clause which contains the conditions for deletion. If you leave off the where clause, all records will be deleted.

  The drop table command is used to delete a table and all rows in the table.

  To delete an entire table including all of its rows, issue the drop table command followed by the table name. Drop table is different from deleting all of the records in the table. Deleting all of the records in the table leaves the table including column and constraint information. Dropping the table removes the table definition as well as all of its rows.
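  The difference can be seen in a short sketch: after the delete, the empty employee table still exists and accepts new rows; after the drop, the table is gone entirely:

```sql
delete from employee;     -- table remains, now with zero rows

insert into employee      -- still works: the table definition survives
    (first, last, age, address, city, state)
    values ('Daisy', 'Duke', 24, '2130 Boars Nest',
            'Hazard Co', 'Georgia');

drop table employee;      -- table definition and all rows are removed
-- any further insert or select on employee would now fail
```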

drop table "tablename";

 

Example:

drop table myemployees_ts0211;

 

 

 

 4.3.1 Reading Material

What's .NET?

  .NET is both a business strategy from Microsoft and its collection of programming support for what are known as Web services, the ability to use the Web rather than your own computer for various services. Microsoft's goal is to provide individual and business users with a seamlessly interoperable and Web-enabled interface for applications and computing activities that are increasingly Web browser-oriented. The .NET platform includes servers; building-block services, such as Web-based data storage; and device software. It also includes Passport, Microsoft's fill-in-the-form-only-once identity verification service.

  The .NET platform is expected to provide:

  *  The ability to make the entire range of computing devices work together and to have user information automatically updated and synchronized on all of them.

  *  Increased interactive capability for Web sites, enabled by greater use of XML (Extensible Markup Language) rather than HTML.

  *  A premium online subscription service that will feature customized access and delivery of products and services to the user from a central starting point for the management of various applications, such as e-mail, for example, or software, such as Office.NET.

  *  Centralized data storage, which will increase efficiency and ease of access to information, as well as synchronization of information among users and devices.

  *  The ability to integrate various communications media, such as e-mail, faxes, and telephones.

  *  For developers, the ability to create reusable modules, which should increase productivity and reduce the number of programming errors.

  According to Bill Gates, Microsoft expects that .NET will have as significant an effect on the computing world as the introduction of Windows. One concern being voiced is that although .NET's services will be accessible through any browser, they are likely to function more fully on products designed to work with .NET code.

  The full release of .NET is expected to take several years to complete, with intermittent releases of products such as a personal security service and new versions of Windows and Office that implement the .NET strategy coming on the market separately. Visual Studio .NET is a development environment that is now available. Windows XP supports certain .NET capabilities.

 

4.4 Applications of Database

 

  Database systems are designed to manage large bodies of information. Management of data involves both defining structures for storage of information and providing mechanisms for the manipulation of information. In addition, the database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible anomalous results.

  Because information is so important in most organizations, computer scientists have developed a large body of concepts and techniques for managing data.

  Databases are widely used. Here are some representative applications:

  Banking: For customer information, accounts, and loans, and banking transactions.

  Airlines: For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner--terminals situated around the world accessed the central database system through phone lines and other data networks.

  Universities: For student information, course registrations, and grades.

  Credit card transactions: For purchases on credit cards and generation of monthly statements.

  Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.

  Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds.

  Sales: For customer, product, and purchase information.

  Manufacturing: For management of supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items.

  Human Resources: For information about employees, salaries, payroll taxes and benefits, and for generation of paychecks.

  Databases form an essential part of almost all enterprises today.

  Over the course of the last four decades of the twentieth century, use of databases grew in all enterprises. In the early days, very few people interacted directly with database systems, although without realizing it they interacted with databases indirectly--through printed reports such as credit card statements, or through agents such as bank tellers and airline reservation agents. Then automated teller machines came along and let users interact directly with databases. Phone interfaces to computers (interactive voice response systems) also allowed users to deal directly with databases-- a caller could dial a number, and press phone keys to enter information or to select alternative options, to find flight arrival/departure times, for example, or to register for courses in a university.

  The Internet revolution of the late 1990s sharply increased direct user access to databases. Organizations converted many of their phone interfaces to databases into Web interfaces, and  made a variety of services and information available online. For instance, when you access an online bookstore and browse a book or music collection, you are accessing data stored in a database. When you enter an order online, your order is stored in a database. When you access a bank Web site and retrieve your bank balance and transaction information, the information is retrieved from the bank's database system. When you access a Web site, information about you may be retrieved from a database, to select which advertisements should be shown to you.Furthermore,data about your Web accesses may be stored in a database.

  Thus, although user interfaces hide details of access to a database, and most people are not even aware they are dealing with a database, accessing databases forms an essential part of almost everyone's life today.

  The importance of database systems can be judged in another way -- today, database system vendors like Oracle are among the largest software companies in the world, and database systems form an important part of the product line of more diversified companies like Microsoft and IBM.

 

4.4.1 Reading Material

Database New Horizon

  The modern database era began in 1970, when E.F.Codd published his paper "A Relational Model of Data for Large Shared Data Banks." His idea enabled the logical manipulation of data to be independent of its physical location, greatly simplifying the work of application developers.

  Now we are poised for another leap forward. Databases will scale to gargantuan proportions, span multiple locations and maintain information in heterogeneous formats. And they will be autonomous and self-tuning. The major database vendors are pursuing these goals in different ways.

  Thirty years ago, IBM researcher Selinger invented "cost-based" query optimization, by which searches against relational databases such as IBM's DB2 minimized computer resources by finding the most efficient access methods and paths. Now Selinger is leading an effort at IBM called Leo-- for Learning Optimizer--that she says will push DB2 optimization into a new realm.

  Rather than optimizing a query once, when it's compiled, Leo will watch production queries as they run and fine-tune them as it learns about data relationships and use needs. For example, Leo would come to realize that a ZIP code can be associated with only one state, or that a Camry is made only by Toyota, even if those rules aren't specified in advance.

  Selinger says Leo will be most helpful in large and complex databases, and in databases where interdata relationships exist but aren't explicitly declared by database designers. Leo is likely to be included in commercial releases of DB2 in about three years, she says.

  Whether the future of databases is the traditional, relational and SQL model with XML technologies incorporated into it or a new XML-based model is a matter of debate. XML will become the dominant format for data interchange with its flexibility and ability to provide self-description, according to Don Chamberlin, a database technology researcher at IBM.

  Relational databases, he said, will be fitted with front ends to support XML and process queries based on the XQuery standard. XML will become the "lingua franca" for exchange of data. "We'll also see some large relational systems adapt to XML as a native format," Chamberlin said. Technologists are in the early stages of development of XML technologies. SQL will not go away, but there are new data formats for which it just was not designed, he said.

  Sun's Rick Cattell, a distinguished engineer at the company, had a less dominant outlook for XML, saying very few people are going to store XQuery data in an XML format. "I think the momentum behind relational databases is insurmountable," Cattell said, adding that he was drawing on his experience with object-oriented databases, which were unable to unseat relational databases in enterprise IT shops. Developers, Cattell said, will need tools to convert relational data to XML and vice versa.

  Currently, performance on the Web is hindered because of translations between Java and XML data formats. Eventually, an extension of XQuery will replace both Java and SQL, according to some experts.

  The next step in the evolution of databases is to provide a more powerful way to query them than what is being done on search sites such as Google today.

  Experts are expecting tuple space technology, which is intended to make it easier to store and fetch data by recognizing patterns. And in-memory database technology is a "no-brainer," but there is not enough memory available yet to accommodate it.

  Microsoft Corp. says users will never be persuaded to dump everything--e-mail, documents, audio/video, pictures, spreadsheets and so on -- into one gigantic database. Therefore, the software vendor is developing technology that will allow a user to seamlessly reach across multiple, heterogeneous data stores with a single query.

  Microsoft's Unified Data project involves three steps. First, the company will devise "schemas" based on XML that define data types. Then it will develop methods for distributed databases. For example, a user might search for a document that references Microsoft, and the document "tells" the query that there's also a media file in another place that references Microsoft.

  The technology will appear in 18 months in SQL Server. It will be added to other Microsoft products in ensuing years.

  Oracle Corp. says its customers are moving toward data stores of huge size and complexity, spread over multiple locations. The company says its products will not only evolve to handle those kinds of jobs, but will also do them extraordinarily well. "Over the next couple of releases, we'll see essentially fully autonomous databases," says Robert Shimp, vice president of database marketing.

  Oracle also wants to facilitate collaboration for people in different companies with widely varying information types. "What doesn't exist today is the underlying infrastructure, or plumbing, that's capable of managing all these diverse types of data," Shimp says. "What you need is the ability to link all these clustered databases around the globe into a single, unified view for the individual user."

  Elsewhere, researchers are finding that the best design for some database applications isn't a traditional database at all, but rather data streams. Researchers at Stanford University are working on ways that continuous flows of information--such as Web site hits, stock trades or telecommunications traffic-- can be passed through queries and then archived or discarded. A query might, for example, be written to look continuously for suspicious patterns in network traffic and then spit out an alert.

  The problem in handling some kinds of problems with a traditional database management system is one of timeliness, says Jennifer Widom, a computer science professor at Stanford. "If you want to put a stream of data into a DBMS, you have to at some point stop, create a load file, load the data and then query it, " she says. "Data stream queries are continuous; they just sit there and give you new answers automatically."

  Widom and her colleagues are developing algorithms for stream queries, and she says her group will develop a comprehensive data stream management system. A prototype of such a system will take a number of years to develop, and the underlying technology will then be either licensed or offered as freeware, she says.

Reprinted from: https://www.cnblogs.com/kangzhibao/archive/2013/06/06/3120877.html
