[关闭]
@Sarah 2015-12-06T16:07:15.000000Z 字数 49459 阅读 1103

SQL 完整PPT 检索专用

SQL


Chapter 7 SQL Fundamentals

标签 : SQL



The basic commands and functions of SQL
SQL for Data Declaration(DDL)
SQL for Data Management and Retrieval (DML)
Anatomy of an SQL Statement

一、Data definition language(DDL)

physical design+ maintenance

Commands that define a database, including creating, altering, and dropping tables and establishing constraints
Define schema objects, primarily table structures and indexes.

◦ CREATE
◦ ALTER
◦ DROP

1.Create

  1. //Syntax
  2. CREATE TABLE tablename (
  3. column name1 datatype constraint,
  4. column name 2 datatype constraint
  5. );

SQL Constraints

  • NOT NULL constraint
    • Ensures that column does not accept nulls
    • UNIQUE constraint
    • Ensures that all values in column are unique
    • DEFAULT constraint
    • Assigns value to attribute when a new row is added to table
    • CHECK constraint
    • Validates data when attribute value is entered

2.Alter

  1. //ALTER TABLE Syntax:
  2. ALTER TABLE tablename ADD PRIMARY KEY(fieldname);
  3. //For foreign key, use FOREIGN KEY in place of PRIMARY KEY
  1. //Syntax:
  2. ALTER TABLE tablename
  3. ADD|MODIFY|RENAME COLUMN| DROP columnnames;
  4. //Three options:
  5. ADD //adds a column
  6. MODIFY //changes column characteristics
  7. DROP //deletes a column
  8. //Can also be used to:
  9. //Add table constraints
  10. ALTER TABLE tablename ADD PRIMARY KEY(fieldname);
  11. //Remove table constraints
  12. ALTER TABLE emp DROP CONSTRAINT ck_bonus2;

3.Drop

  1. //Syntax:
  2. DROP TABLE tablename;


二、Data manipulation language(DML)

implementation

Commands that maintain and query a database
Manipulate schema objects, primarily table rows.

◦ INSERT
◦ SELECT
◦ COMMIT
◦ UPDATE
◦ ROLLBACK
◦ DELETE

1.SELECT

Displaying table information
most powerful and complex statement
Nearly any question regarding who, what, when, what if and how may can be answered with SELECT.
Once you understand the SELECT statement you can apply it to UPDATE, INSERT, and DELETE statements.
SELECT statement is composed of several distinct keywords known as clauses. Each clause has one or more keywords.
There is no limit on the number of columns you can specify in the SELECT clause.

  1. //Viewing a User’s Table Names
  2. SELECT table_name FROM user_tables;
  3. SELECT * FROM user_tables;
  4. //View constraint information
  5. SELECT * FROM user_constraints
  6. SELECT constraint_name, constraint_type, table_name FROM USER_CONSTRAINTS
  7. //Viewing Tablespace information
  8. DESCRIBE user_tablespaces
  9. SELECT * FROM USER_TABLESPACES
  10. //Used to list contents of table
  11. SELECT columnlist
  12. FROM tablename;
  13. //Columnlist represents one or more attributes, separated by commas
  14. //Asterisk (*) can be used as wildcard character to list all columns

Fine-tune SELECT command by adding restrictions to search criteria using:

◦ Conditional restrictions
◦ Arithmetic operators
◦ Logical operators
◦ Special operators

Selecting Rows with Conditional Restrictions
Select partial table contents by placing restrictions on rows to be included in output

  1. //Add conditional restrictions to SELECT statement, using WHERE clause
  2. SELECT columnlist
  3. FROM tablelist
  4. [ WHERE conditionlist ] ;

Arithmetic Operators: The Rule of Precedence

Perform operations within parentheses
Perform power operations
Perform multiplications and divisions
Perform additions and subtractions

Additional SELECT Query Keywords

Logical operators work well in the query environment
SQL provides useful functions that:

Count

Find minimum and maximum values
Calculate averages, etc.

SQL allows user to limit queries to:
◦ Entries having no duplicates
◦ Entries whose duplicates may be grouped

2.Saving Table Changes

Changes made to table contents are not physically saved on disk until:

◦ Database is closed
◦ Program is closed
◦ COMMIT command is used

  1. //Syntax
  2. COMMIT [WORK];
  3. //Will permanently save any changes made to any table in the database

3.INSERT

◦ Used to enter data into table

  1. //Syntax:
  2. INSERT INTO columnname
  3. VALUES (value1, value2, , valueN);

When entering values, notice that:

◦ Row contents are entered between parentheses
◦ Character and date values are entered between apostrophes
◦ Numerical entries are not enclosed in apostrophes
◦ Attribute entries are separated by commas
◦ A value is required for each column
Use NULL for unknown values

4.UPDATE

Modify data in a table

  1. //Syntax:
  2. UPDATE tablename
  3. SET columnname = expression [, columnname = expression]
  4. [WHERE conditionlist];

5.Indexes

6.ROLLBACK

Undoes changes since last COMMIT
Brings data back to prechange values

COMMIT and ROLLBACK only work with commands to add, modify, or delete table rows

  1. //Syntax:
  2. ROLLBACK;

7.DELETE

Deletes a table row

  1. //Syntax:
  2. DELETE FROM tablename
  3. [WHERE conditionlist ];
  4. //WHERE condition is optional

8.Logical Operators: AND, OR, and NOT

9.Special Operators

  • BETWEEN: checks whether attribute value is within a range

  • IS NULL: checks whether attribute value is null

  • LIKE: checks whether attribute value matches given string pattern

  • IN: checks whether attribute value matches any value within a value list

  • EXISTS: checks if subquery returns any rows

10.Ordering a Listing

ORDER BY clause is useful when listing order is important

  1. //Syntax:
  2. SELECT columnlist
  3. FROM tablelist
  4. [WHERE conditionlist]
  5. [ORDER BY columnlist [ASC | DESC]];

Ascending order by default

11.Listing Unique Values

DISTINCT clause produces list of only values that are different from one another

  1. //Example:
  2. SELECT DISTINCT V_CODE
  3. FROM PRODUCT;
  • Access places nulls at the top of the list
  • Oracle places it at the bottom
  • Placement of nulls does not affect list contents
    Aggregate Functions

12.function

13.Grouping Data

Frequency distributions created by GROUP BY clause within SELECT statement [在where后面 在order by 前面]

  1. //Syntax:
  2. SELECT columnlist
  3. FROM tablelist
  4. [WHERE conditionlist]
  5. [GROUP BY columnlist]
  6. [HAVING conditionlist]
  7. [ORDER BY columnlist [ASC | DESC] ] ;

14.Joining Database Tables

Joining Tables with an Alias

Recursive Joins

Alias is especially useful when a table must be joined to itself

Recursive query

Use aliases to differentiate the table from itself



SELECT Clauses

1.SELECT(required)
◦ List the columns (and expressions) that should be returned from the query
2.FROM(required)
◦ Indicate the table(s) or view(s) from which data will be obtained
** 3.WHERE**
◦ Indicate the conditions under which a row will be included in the result
4.GROUP BY
◦ Used with aggregate functions to summarize data
5.HAVING
◦ Filters the result of aggregate functions in grouped information, similar to WHERE
6. ORDER BY
◦ Sorts the result according to specified criteria



三、Data Control Language (DCL)

implementation+ maintenance

Commands that control a database, including administering privileges and committing data



SQL Environment

Terminology

Relation = table
Tuple or record = row
Attribute or field = column

Data Types

Data type selection is usually dictated by nature of data and by intended use
Supported data types:

◦ Number(L,D), Integer, Smallint, Decimal(L,D)
◦ Char(L), Varchar(L), Varchar2(L)
◦ Date, Time, Timestamp
◦ Real, Double, Float
◦ Interval day to hour
◦ Many other types
◦ RDBMS specific



有用的ppt
此处输入图片的描述

此处输入图片的描述
此处输入图片的描述
此处输入图片的描述
此处输入图片的描述
此处输入图片的描述

CH 8

标签(空格分隔): SQL


Why should you join tables together?

combining records from diffrient tables is called joining.

How do you join tables together?

through the common columns.(primary and foreign-key columns).

Two steps to join tables

Place the names of the tables in the FROM clause. The table names must be separated by commas.
Add a condition to the where clause, describing how the tables are related.

SQL Join Operators

Join operation merges rows from two tables and returns the rows with one of the following:

Have common values+common columns (Natural join)

Mee given join condition (Equality or inequality)

Have common values in common columns or have no matching values (Outer join)

Only returns rows meeting criteria (Inner join)

Types of joins

1.Equi Join
Natural Join
matching values in the matching columns
Eliminates duplicate columns
Used when tables share one or more common attributes with common names

  1. Syntax:
  2. SELECT column-list FROM table1 NATURAL JOIN table2
  3. SELECT column-list FROM table 1, table 2
  4. WHERE table1.column = table2.column

JOIN USING Clause

Returns only rows with matching values in the column indicated in the USING clause

  1. Syntax:
  2. SELECT column-list FROM table1 JOIN table2 USING (common-column)

JOIN USING operand does not require table qualifiers
Oracle returns error if table name is specified

JOIN ON Clause

Used when tables have no common attributes
Returns only rows that meet the join condition
Typically includes equality comparison expression of two columns

  1. Syntax:
  2. SELECT column-list FROM table1 JOIN table2 ON join-condition

What about duplicate column names?

Fully qualify column names when tables share the same column names (e.g. cus_code)
Is this invoice cus_code or customer cus_code?
Need to fully qualify the column so the analyzer knows which column to display (e.g. invoice.cus_code and customer.cus_code)
Isn’t that a lot of typing?
To minimize typing, use aliases
Invoice i and customer c
i.cus_code and c.cus_code

2.Non-Equi Join

Joins records from two tables that do not have common columns and values.
An operator rather than the equal sign is used.
Same as Cross Join

  1. Syntax:
  2. SELECT column-list FROM table 1, table 2 WHERE table1.column [between, >, <] table2.column

Outer Joins

Returns rows matching the join condition
Also returns rows with unmatched attribute values for tables to be joined

Three types

Left
Right
Full

Left and right designate order in which tables are processed

Outer Joins

Left outer join

Returns rows in left side table with unmatched values
Returns rows matching the join condition

  1. Syntax: SELECT column-list FROM table1 LEFT JOIN table2 ON join-condition

Right outer join

Returns rows matching join condition
Returns rows in right side table with unmatched values
Outer Joins
Full outer join
Returns rows matching join condition
Returns all rows with unmatched values in either side table

  1. Syntax:
  2. SELECT column-list
  3. FROM table1 RIGHT JOIN table2
  4. ON join-condition

SELF Joins

Returns rows from the same table.
Example: employee table that contains a column for the employee’s supervisor.
When joining a table to itself you are listing the same table multiple times in the FROM clause
Requires a table alias

  1. Syntax:
  2. SELECT column-list
  3. FROM table1 alias, table2 alias
  4. WHERE alias.1column_name = alias2.column_name

Cross Join

Performs relational product of two tables
Also called Cartesian product
Same as Non-Equi Join
Joins each record in the first table to every record in the second

  1. Syntax:
  2. SELECT column-list FROM table1 CROSS JOIN table2
  3. SELECT column-list FROM table 1, table 2 (no where clause specified)

Quick tip

Always check your WHERE clause.
You should have one less join condition than the number of tables listed in the FROM clause.
Good practice to test your join expressions before adding additional expressions and clauses to your statement.

Aliases

Column Alias

Allow you to change the column’s heading in the output.
SELECT columnname AS alias

Table Alias

Allow you to avoid lengthy table names, helpful when joining tables.
Table aliases can be used as qualifiers for column names in the select clause

  1. SELECT column-list FROM tablename alias
  2. Example:
  3. SELECT name FROM employee e
  4. SELECT e.name, e.salary FROM employee e, payroll p
  5. WHERE e.id = p.id

Powerful tool for enhancing basic queries.
Takes 0 or more arguments and returns a single value.
Used for performing calculations, converting data, modifying data, manipulating rows, formatting columns.

Types of Built-in Functions

Single-row functions
Work on columns from each row and return one result per row.
Group functions or Aggregate functions
Manipulate data in a group of rows and return a single result.

SQL Functions

Generating information from data often requires many data manipulations
SQL functions are similar to functions in programming languages
Functions always use numerical, date, or string value
Value may be part of a command or attribute in a table
Function may appear anywhere in an SQL statement

Single-Row Functions

Work on data items from each row
Return one value for each row
Types of single-row functions

Character Functions
Number Functions
Date Functions
Conversion Functions
General Functions
Character Functions

String manipulations are the most used functions in programming
String manipulation function examples:
Concatenation
Printing in uppercase or lowercase
Finding length of an attribute

CONCATENATION
UPPER/LOWER
SUBSTR
LENGTH

Numeric Functions

Numeric functions take numeric values and return a numeric value.
Common numeric functions

TRUNC
POWER
ABS
MOD
SIGN
FLOOR
CEIL
ROUND
CEIL/FLOOR

Date and Time Functions

All SQL-standard DBMSs support date and time functions
Date functions take one parameter and return a value
Date/time data types are implemented differently by different DBMS vendors
ANSI SQL standard defines date data types, but not how data types are stored
Oracle stores dates internally with day, month, year, century, hour, minute, and second information. The default date display format is DD-MON-YY.
SYSDATE – common date function, doesn’t take any arguments, returns system’s current date.

TO_CHAR
TO_DATE
ADD_MONTHS
LAST_DAY

Conversion Functions

Take a value of given data type and convert it to the equivalent value in another data type
Oracle conversion functions:
TO_CHAR: takes a date value, converts to character string
TO_DATE: takes character string representing a date, converts it to actual date in Oracle format
SQL Server uses CAST and CONVERT functions

TO_NUMBER

General Functions

NVL: converts a null value to an actual value supplied as an argument

  1. SYNTAX: NVL(column, value)

NVL2: checks for null as well as not null values

  1. SYNTAX: NVL2(column, notnullvalue, nullvalue)

DECODE: conditional statement, tests a column or expression and for each of tis matching value, provides an action.

Group or Aggregate Functions

Perform an operation on a group of rows and return one result.
Multi-row functions.

Aggregate Functions

AVG compute average value for a table or group

COUNT function tallies number of non-null values of an attribute

Takes one parameter: usually a column name

MAX and MIN find highest (lowest) value in a table or group

Compute MAX value in inner query

Compare to each value returned by the query

STDDEV computes the standard deviation for a table or group

SUM computes total sum for any specified attribute

VARIANCE computes variance of the table or group

Grouping Data

Works on data that is grouped.
Tell the database how to group or categorize the data with ###a GROUP BY clause.
Used with Aggregate functions.

  1. Syntax:
  2. SELECT columnlist
  3. FROM tablelist
  4. [WHERE conditionlist]
  5. [GROUP BY columnlist]
  6. [HAVING conditionlist]
  7. [ORDER BY columnlist [ASC | DESC] ] ;

Having Clause
Restricts groups.
The WHERE clause restricts rows, the GROUP BY clause groups remaining rows.
The GROUP function works on each group, and the HAVING clause keeps the groups that match the group condition.
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Grouping Data
Important tips:

subqueries

Subqueries and Correlated Queries
Often necessary to process data based on other processed data
Subquery is a query inside a query, normally inside parentheses

Subqueries Can be used within a WHERE, HAVING, FROM, or SELECT

2 types of subqueries

Single-row: a subquery that returns only one row of data
Multiple-row: a subquery that returns more than one row of data

Single-Row subquery
Subquery must be enclosed within parentheses.
Returns one column form one row, of no row is returned the value is NULL. If more than one row returned an error occurs.
ORDER BY clause cannot be used in a subquery.

Common uses of subqueries:
As a conditional expression in the WHERE clause or HAVING clause
Create a table using a subquery: create table and populate with data
Insert data into a table using a subquery
Update data in a table using a subquery: update table set columnname = value where operator (select subquery)
Delete date from a table using a subquery

WHERE Subqueries
Most common type uses inner SELECT subquery on right side of WHERE comparison
Requires a subquery that returns only one single value
Value generated by subquery must be of comparable data type
Can be used in combination with joins
syntax 补充

HAVING Subqueries
HAVING clause restricts the output of a GROUP BY query
Applies conditional criterion to the grouped rows
Syntax 补充

Multirow Subquery
Returns more than one row
Uses special operators
IN

IN Subqueries
Used when comparing a single attribute to a list of values
Syntax 补充

FROM Subqueries
Specifies the tables from which the data will be drawn
Can use SELECT subquery in the FROM clause
View name can be used anywhere a table is expected
The subquery is similar to a virtual table or temporary table.
The subquery is executed first, creating a temporary instance of the table.
The temporary or virtual table can be joined with other tables.
syntax 补充

Attribute List Subquery
SELECT statement uses attribute list to indicate columns to include in the result set.
Columns can be attributes of base tables
Result of aggregate function
Attribute list can also include subquery expression: inline subquery
Must return one value; otherwise an error code is raised.
Good for computations on each row.
Cannot use an alias in the attribute list
SQL Example File
Syntax

  1. SELECT standard_disclaimer FROM company_stuff WHERE meaningful_content = 'NONE';
  2. SELECT * FROM management WHERE clue > 0;
  3. SELECT * FROM client WHERE clue > 0;
  4. SQL> select count(*) from developer_brain;
  5. select count(*) from developer_brain
  6. *
  7. ERROR at line 1: ORA-00942: table or view does not exist

Correlated Subqueries
Inner query can reference columns form the outer query.
Inner query executes once for each row in the outer query.
Correlated because inner query is related to the outer query
Inner query references column of outer subquery
Similar to a nested loop.
Can also be used with the EXISTS special operator

Syntax

Tips for Writing SQL

Quick tip

Relational Set Operators

Keywords that are used to combine records from different result sets.

SELECT statements produce the record sets that the set operators combine.

Relational Set Operators

UNION
INTERSECT
MINUS

Work properly if relations are union-compatible
Names of relation attributes must be the same and their data types must be identical

SET Operator rules

The SET operator is placed between the two SELECT statements.
Each of the SELECT clauses must have the same number of expressions.
Each expression in the SELECT clause must have the same data type as the corresponding expression in the other SELECT clauses.
Each SELECT statement is a part of one overall statement.

UNION

Combines rows from two or more queries without including duplicate rows

  1. //Example:
  2. SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE,
  3. FROM CUSTOMER
  4. UNION
  5. SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE,
  6. FROM CUSTOMER_2
  7. //Can be used to unite more than two queries

UNION ALL

Produces a relation that retains duplicate rows

  1. Example query:
  2. SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, FROM CUSTOMER
UNION ALL
SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, FROM CUSTOMER_2;
  3. //Can be used to unite more than two queries

INTERSECT

Combines rows from two queries, returning only the rows that appear in both sets

  1. //Syntax:
  2. query INTERSECT query
  3. Example query:
  4. SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, FROM CUSTOMER
INTERSECT
SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, FROM CUSTOMER_2

MINUS

Combines rows from two queries
Returns only the rows that appear in the first set but not in the second

  1. Syntax: query MINUS query
  2. Example:
  3. SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, FROM CUSTOMER
MINUS
  4. SELECT CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, FROM CUSTOMER_2

Oracle Sequences

MS Access AutoNumber data type fills a column with unique numeric values
Independent object in the database
Named, used anywhere a value expected
Not tied to a table or column
Generate numeric values that can be assigned to any column in any table
Created and deleted at any time

Create Sequence
View sequences (Table)
Inserting a new row into a table that uses a sequence
Delete (DROP) a Sequence

Views

Oracle object that gives the user a logical view of data.
Logical representation of a subset of data from one or more tables.
Users can view data from multiple tables without using complex join conditions.
Hide the names of the underlying tables so the user doesn’t know where the data came from.
Stored as a SELECT statement in the Data Dictionary.

Views: Why uses views?

Mask complexity
Enhance the confidentiality of data: the SELECT and WHERE clauses can filter the expressions and data available to the user.
Some information can only be produced through views (joining views)
Can be used for reports

Views

Can use the name of a view anywhere a table name is expected in a SQL statement.
Dynamically updated view is re-created on demand each time it is invoked.

Virtual Tables: Creating a View
View: Virtual table based on a SELECT query
Base tables: Tables on which the view is based

  1. Syntax:
  2. CREATE VIEW viewname AS SELECT query

See all the views and the SQL query associated with that view.

  1. SELECT * FROM ALL_VIEWS
  2. SQL Example File
  3. Create Views
  4. Select data from the view

PL/SQL

Procedural Language SQL
- Makes it possible to use and store procedural code with SQL statements.
- Merge SQL and traditional programming constructs such as variables, conditional processing (IF-THEN-ELSE), basic loops (FOR WHILE) and error trapping.
- Used to create Anonymous PL/SQL blocks, Triggers, Stored Procedures, PL/SQL functions

PL/SQL guidelines

Pl/sql block

Why use Pl/SQL?
Isolate critical code
All applications access shared code
Better maintenance and logic control

Persistent stored module (PSM) is a block of code containing:
Standard SQL statements
Procedural extensions
Stored and executed at the DBMS server
Procedural SQL (PL/SQL) enables you to:
Store procedural code and SQL statements in database
Merge SQL and traditional programming constructs

Stored Procedures
Named collection of procedural and SQL statements.
Advantages?
Substantially reduce network traffic and increase performance.
No transmission of individual SQL statements over network
Reduce code duplication by means of code isolation and code sharing.
Minimize chance of errors and cost of application development and maintenance

Stored Procedures: Guidelines

Follows same rules as Anonymous PL/SQL block.

To execute the stored procedure:

  1. EXEC procedure_name(parameter_list);

Stored Procedures
Stored Procedure

PL/SQL Processing with Cursors

What if the SQL statement returns more than one value?
It will return an error unless you use a cursor.
Cursor: special construct in procedural SQL to hold data rows returned by SQL query (similar to an array)
Implicit cursor: automatically created when SQL returns only one value
Explicit cursor: holds the output of an SQL statement that may return two or more rows
Cursor-style processor retrieves data from cursor one row at a time
Current row is copied to PL/SQL variables

triggers

Block of PL/SQL code stored within the database.
Fired by a database transaction.
Cannot be called directly.

Triggers: syntax
Requires the following parts:
Triggering timing: BEFORE, AFTER
Triggering event: on INSERT, UPDATE, or DELETE
Triggering level: statement-level versus row-level
statement-level: executed once, before or after the triggering statement is completed.
row-level: executed once for each row affected by the triggering statement; if you update 10 rows, the trigger executes 10 times.
Triggering action: code enclosed between the BEGIN and END keywords.

Uses of triggers

When would a trigger be useful?
Generating values for derived columns.
Performing validity checks.
Implementing business rules.
Maintaining referential integrity.
Enforce constraints.
Used to provide auditing features.
Can be used to update, insert, and call other stored procedures.

##
Embedded SQL 

SQL statements contained within an application programming language such as C# or Java.
Run-time mismatch: SQL is a nonprocedural, interpreted language; each instruction is parsed, its syntax is checked, and it is executed on instruction at a time. All of the processing takes place at the server.
Data type mismatch: SQL provides several data types, but some of them might not match data types used in different host languages.

Embedded SQL framework defines:
Standard syntax to identify embedded SQL code within host language
Standard syntax to identify host variables
Communication area exchanges status and error information between SQL and host language
Embedded SQL
Static SQL
Embedded SQL in which programmer uses predefined SQL statements and parameters.
End users of programs are limited to actions that were specified in application programs
SQL statements will not change while application is running.
Embedded SQL

Dynamic SQL
SQL statement is not known in advance, but instead is generated at run time
Program can generate SQL statements at run-time that are required to respond to ad hoc queries
Attribute list and condition are not known until end user specifies them
Tends to be much slower than static SQL
Requires more computer resources

Summary

PL/SQL can be used to create triggers, stored procedures, and PL/SQL functions
A stored procedure is a named collection of SQL statements
When SQL statements return more than one value inside the PL/SQL code, cursor is needed
Embedded SQL uses SQL statements within an application programming language

CH 10

标签(空格分隔): SQL


Transaction Management
and Concurrency Control

What Is a Transaction?

Logical unit of work that must be either entirely completed or aborted
Successful transaction changes database from one consistent state to another
One in which all data integrity constraints are satisfied
Most real-world database transactions are formed by two or more database requests

Transaction Sequence

Create Invoice (insert)
Create Line(s) (insert)
Update product (update PROD_QOH)
Create Acct_transaction (INSERT)

What if insert acct_transaction fails? The customer paid but it isn’t recorded?

What if prod_qoh update fails?
What if prod_qoh = 1 and transaction 1 takes last product after transaction 2 already checks inventory status?

Transaction Properties (ACIDS)

Atomicity: All operations of a transaction must be completed
Consistency: Permanence of database’s consistent state
Isolation: Data used during transaction cannot be used by second transaction until the first is completed
Durability: Once transactions are committed, they cannot be undone
Serializability: Concurrent execution of several transactions yields consistent results. Multiuser databases are subject to multiple concurrent transactions

Transaction Management with SQL

ANSI has defined standards that govern SQL database transactions

Transaction support is provided by two SQL statements:
COMMIT and ROLLBACK

Transaction sequence must continue until:
COMMIT statement is reached
ROLLBACK statement is reached
End of program is reached
Program is abnormally terminated

Autocommit

Environment variable – default setting is to OFF
When AUTOCOMMIT is set to ON, every DML statement is written to the disk as soon as it is executed and there is no rollback.
If the system crashes, any statements after the last COMMIT are rolled back, so partial changes to tables are not permanently written.
Before the COMMIT is executed, data is being pulled form the database’s temporary storage area.
Other users cannot view the results of DML queries until the user commits the changes.
Other users cannot change the data in the locked rows.
After a COMMIT, the changes become permanent, all users can view the changes, locks are released.

The Transaction Log

A record for the beginning of transaction

For each transaction component:
Type of operation being performed (update, delete, insert)
Names of objects affected by transaction
“Before” and “after” values for updated fields
Pointers to previous and next transaction log entries for the same transaction
Ending (COMMIT) of the transaction

Concurrency Control

Ensures database integrity – entity integrity, referential integrity and attribute integrity.
Coordination of simultaneous transaction execution in a multiprocessing database
Objective is to ensure serializability of transactions in a multiuser environment

Three main problems:
Lost updates
Uncommitted data
Inconsistent retrievals

Lost update problem:
Two concurrent transactions update same data element
One of the updates is lost, overwritten by the other transaction
Lost Update

Uncommitted data phenomenon:
Two transactions are executed concurrently
First transaction rolled back after second already accessed uncommitted data
Uncommitted data

Inconsistent retrievals:
First transaction accesses data
Second transaction alters the data
First transaction accesses the data again
Transaction might read some data before they are changed and other data after changed
Yields inconsistent results

The Scheduler
Special DBMS program
Purpose is to establish order of operations within which concurrent transactions are executed

Interleaves execution of database operations:
Ensures serializability
Ensures isolation
Serializable schedule
Interleaved execution of transactions yields same results as serial execution

Concurrency Control
with Locking Methods
Guarantees exclusive use of a data item to a current transaction
Required to prevent another transaction from reading inconsistent data

Pessimistic locking
Use of locks based on the assumption that conflict between transactions is likely

Lock manager
Responsible for assigning and policing the locks used by transactions

Lock Granularity
Indicates level of lock use
Locking can take place at following levels:
Database
Table
Page
Row
Field (attribute)
Lock Granularity (cont’d.)
Database-level lock: Entire database is locked
Table-level lock: Entire table is locked
Page-level lock: Entire diskpage is locked
Row-level lock: Allows concurrent transactions to access different rows of same table, even if rows are located on same page
Field-level lock: Allows concurrent transactions to access same row, requires use of different fields (attributes) within the row

Lock Types

Binary lock
Two states: locked (1) or unlocked (0)
Exclusive lock
Access is specifically reserved for transaction that locked object
Must be used when potential for conflict exists
Shared lock
Concurrent transactions are granted read access on basis of a common lock
Two-Phase Locking
to Ensure Serializability
Defines how transactions acquire and relinquish locks
Guarantees serializability, but does not prevent deadlocks
Growing phase: Transaction acquires all required locks without unlocking any data
Shrinking phase: Transaction releases all locks and cannot obtain any new lock
Governed by the following rules:
Two transactions cannot have conflicting locks
No unlock operation can precede a lock operation in the same transaction
No data are affected until all locks are obtained

Deadlocks
Condition that occurs when two transactions wait for each other to unlock data
Possible only if one of the transactions wants to obtain an exclusive lock on a data item
No deadlock condition can exist among shared locks
Deadlocks

Three techniques to control deadlock:
Prevention
Detection
Avoidance

Choice of deadlock control method depends on database environment
Low probability of deadlock; detection recommended
High probability; prevention recommended

Locking in Oracle
User does not have to write any explicit statements to lock tables.
Oracle uses automatic locking with the least restrictions.
2 lock classes – exclusive and shared.
Exclusive prevents sharing a resource until a lock is released.

Shared allows sharing for read only purposes.
Database Recovery Management
Restores database to previous consistent state
Based on atomic transaction property
All portions of transaction are treated as single logical unit of work

All operations are applied and completed to produce consistent database
If transaction operation cannot be completed:

Write-ahead-log protocol: ensures transaction logs are written before data is updated

Redundant transaction logs: ensure physical disk failure will not impair ability to recover
Buffers: temporary storage areas in primary memory

Checkpoints: operations in which DBMS writes all its updated buffers to disk
Transaction Recovery
Deferred-write technique
Only transaction log is updated

Recovery process: identify last checkpoint
If transaction committed before checkpoint: Do nothing
If transaction committed after checkpoint: Use transaction log to redo the transaction
If transaction had ROLLBACK operation: Do nothing

Transaction Recovery

Write-through technique
Database is immediately updated by transaction operations during transaction’s execution
Recovery process: identify last checkpoint
If transaction committed before checkpoint: Do nothing
If transaction committed after last checkpoint: DBMS redoes the transaction using “after” values
If transaction had ROLLBACK or was left active: Do nothing because no updates were made

CH 11

标签(空格分隔): SQL


Database Performance-Tuning Concepts
Goal of database performance is to execute queries as fast as possible

Database performance tuning

Set of activities and procedures designed to reduce response time of database system
All factors must operate at optimum level with minimal bottlenecks
Good database performance starts with good database design
Database Performance-tuning
Activities and procedures designed to reduce the response time of database systems.

Goal: Execute queries as fast as possible; Operate at optimum level with minimal bottlenecks
Good database performance starts with good database design

Database Performance Tuning
Client
Server – Hardware, Software
Architecture - Network
Database Processor – Optimizer, Database Statistics
Query Processing
SQL Statements
Indexes
Caching
Guidelines for better performance
DBMS Architecture
All data in database are stored in data files
Data files
Automatically expand in predefined increments known as extends
Grouped in file groups or table spaces
Table space or file group
Logical grouping of several data files that store data with similar characteristics
Performance Tuning: 
Client and Server
Client side – Examples?
Examples:
Connection time and connection speed
SQL Query result set (size)
SQL Query Processing – using minimal amount of server resources
Solution: SQL Query performance tuning, Network Performance
Server side – Examples?
Examples:
Server processing capabilities
Database processing of requests
DBMS – optimal use of resources
Solution: DBMS performance tuning, Server Capacity Planning
DBMS Architecture
DBMS Architecture
Data cache or buffer cache: shared, reserved memory area. Stores most recently accessed data blocks in RAM
SQL cache or procedure cache: stores most recently executed SQL statements
DBMS retrieves data from permanent storage and places it in RAM.
Typical DBMS processes:
Listener, user, scheduler, lock manager, optimizer

#Performance Tuning:
DBMS Architecture
All data in database are stored in data files.
Data files automatically expand in predefined increments known as extends. Grouped in file groups or table spaces.
Input/output request: low-level data access operation to/from computer devices
Data cache is faster than data in data files. DBMS does not wait for hard disk to retrieve data.
Majority of performance-tuning activities focus on minimizing I/O operations

#Performance Tuning:
DB Processing
Query Optimizer
Database Statistics
Query Processing
Indexing

Database Query Optimizer

The optimizer is built-in software that determines the most efficient way to execute a SQL statement.
Cost-based query optimization algorithm
Uses statistical information about the database
Algorithms based on statistics about objects being accessed
Adds up processing cost, I/O costs, resource costs to derive total cost
Rule-based query optimization algorithm
Based on a set of user-defined rules to determine the best query access strategy
Rules assign a fixed cost to each operation

What does Oracle use? CBO

Database Query optimizer

The database can execute a SQL statement in multiple ways
full table scans
index scans
nested loops
hash joins
The optimizer considers many factors related to the objects and the conditions in the query when determining an execution plan.
This determination is an important step in SQL processing and can greatly affect execution time.

r
When the user submits a SQL statement for execution, the optimizer performs the following steps:
The optimizer generates a set of potential plans for the SQL statement.
The optimizer estimates the cost of each plan based on statistics in the data dictionary.
The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, which includes I/O, CPU, and memory.
The optimizer compares the plans and chooses the plan with the lowest cost.
The output from the optimizer is an execution plan  that describes the optimum method of execution.
Database Query optimizer
Database Statistics

Measurements about database objects and available resources:
Tables
Indexes
Number of processors used
Processor speed
Temporary space available
Make critical decisions about improving query processing efficiency
Can be gathered manually by DBA or automatically by DBMS

Query Processing

DBMS processes queries in three phases
Parsing
DBMS parses the query and chooses the most efficient access/execution plan
Execution
DBMS executes the query using chosen execution plan
Fetching
DBMS fetches the data and sends the result back to the client

Database statistics
Query Processing

SQL Parsing Phase

Break down query into smaller units
Transform original SQL query into slightly different version of original SQL code
Query optimizer analyzes SQL query and finds most efficient way to access data
Validated for syntax compliance
Validated against data dictionary (Tables and column names are correct, User has proper access rights)
Analyzed and decomposed into components
Optimized
Prepared for execution

SQL Parsing Phase
Access plans are DBMS-specific
Translate client’s SQL query into a series of complex I/O operations
Required to read the data from the physical data files and generate result set
DBMS checks if access plan already exists for query in SQL cache
DBMS reuses the access plan to save time
If not, optimizer evaluates various plans and chosen plan is placed in SQL cache.

Data Access Paths

Full table scan used even when there are indexes.
Why is a full table scan faster for accessing large amounts of data?
Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in a table. Full table scans can use larger I/O calls, and making fewer large I/O calls is cheaper than making many smaller calls.
Rowid scan: The rowid of a row specifies the data file and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row, because the exact location of the row in the database is specified.
Index scan: More efficient to use index to access table than to scan all rows in table sequentially

Query Processing Bottlenecks

Delay introduced in the processing of an I/O operation that slows the system
CPU
RAM
Hard disk
Network
Application code
SQL Execution Phase & Fetching Phase
All I/O operations indicated in access plan are executed
Locks acquired
Data retrieved and placed in data cache
Transaction management commands processed
Rows of resulting query result set are returned to client
DBMS may use temporary table space to store temporary data

Indexes and Query Optimization

Indexes
Crucial in speeding up data access
Facilitate searching, sorting, and using aggregate functions as well as join operations
Ordered set of values that contains index key and pointers
Row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. If the statement accesses only columns of the index, then Oracle Database reads the indexed column values directly from the index, rather than from the table.

Indexes and Query Optimization
Data sparsity: number of different values a column could possibly have
Indexes implemented using:
Hash indexes
B-tree indexes
Bitmap indexes
DBMSs determine best type of index to use

#Using Hints to Affect 
Optimizer Choices
Optimizer might not choose best plan
Makes decisions based on existing statistics
Statistics may be old
Might choose less-efficient decisions
Optimizer hints: special instructions for the optimizer embedded in the SQL command text

SQL Performance Tuning

Evaluated from client perspective
Most current relational DBMSs perform automatic query optimization at the server end
Most SQL performance optimization techniques are DBMS-specific
Rarely portable
Majority of performance problems are related to poorly written SQL code
Carefully written query usually outperforms a poorly written query
Index Selectivity
Indexes are used when:
Indexed column appears by itself in search criteria of WHERE or HAVING clause
Indexed column appears by itself in GROUP BY or ORDER BY clause
MAX or MIN function is applied to indexed column
Data sparsity on indexed column is high
Measure of how likely an index will be used
General guidelines for indexes:
Create indexes for each attribute in WHERE, HAVING, ORDER BY, or GROUP BY clause
Do not use in small tables or tables with low sparsity
Declare primary and foreign keys so optimizer can use indexes in join operations
Declare indexes in join columns other than PK/FK

Conditional Expressions

Normally expressed within WHERE or HAVING clauses of SQL statement
Restricts output of query to only rows matching conditional criteria
Common practices for efficient SQL:
Use simple columns or literals in conditionals
P_PRICE > 10 is faster than P_QOH > P_MIN * 1.10
Numeric field comparisons are faster
Equality comparisons are faster than inequality
DB can do a direct search when using =
NULL and LIKE are the slowest of all
Transform conditional expressions to use literals
P_PRICE – 10 instead use P_PRICE = 17

Conditional Expressions (cont’d.)
Write equality conditions first
AND: use condition most likely to be false first
OR: use condition most likely to be true first
Avoid NOT
NOT(P_PRICE > 10.00) instead P_PRICE <= 10.00

SQL Performance Tips

Avoid correlated subqueries
Avoid foreign key constriants
Drop indexes before loading data
Pull only the columns you need (avoid select *)
Avoid outer joins
Avoid calculated fields in Join and Where clauses

http://web.synametrics.com/top10performancetips.htm

DBMS Performance Tuning
Includes managing DBMS processes in primary memory and structures in physical storage
DBMS performance tuning at server end focuses on setting parameters used for the:
Data cache
SQL cache
Sort cache
Optimizer mode
DBMS Performance Tuning (cont’d.)
Some general recommendations for creation of databases:
Use RAID (Redundant Array of Independent Disks) to provide balance between performance and fault tolerance
Minimize disk contention
Put high-usage tables in their own table spaces
Assign separate data files in separate storage volumes for indexes, system, and high-usage tables
DBMS Performance Tuning (cont’d.)
Take advantage of table storage organizations in database
Partition tables based on usage
Use denormalized tables where appropriate
Store computed and aggregate attributes in tables
DBMS Performance Tuning
Includes managing DBMS processes in primary memory and structures in physical storage
DBMS performance tuning at server end focuses on setting parameters used for the:
Data cache
SQL cache
Sort cache
Optimizer mode
DBMS Performance Tuning
Some general recommendations for creation of databases:
Use RAID (Redundant Array of Independent Disks) to provide balance between performance and fault tolerance
Minimize disk contention
Put high-usage tables in their own table spaces
Assign separate data files in separate storage volumes for indexes, system, and high-usage tablesTake
Take advantage of table storage organizations in database
Partition tables based on usage
Use denormalized tables where appropriate
Store computed and aggregate attributes in tables

Query Optimization Example

Example illustrates how query optimizer works
Based on QOVENDOR and QOPRODUCT tables
Uses Oracle SQL*Plus
SQL Commands for DB Performance
Create or Update Statistics
ANALYZE TABLE employee COMPUTE STATISTICS;

Explain plan

EXPLAIN PLAN FOR (your query);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Create Index
CREATE INDEX empnum_NDX1 ON employee(emp_num);

Interpreting an Explain Plan
http://www.morganslibrary.org/pres/lad10/peoug_xplan.pdf
http://docs.oracle.com/cd/B28359_01/server.111/b28274/ex_plan.htm
http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-explain-the-explain-plan-052011-393674.pdf
Explain plan video - https://www.youtube.com/watch?v=SwKIqRV5l3k; https://www.youtube.com/watch?v=bcVbR_FPWKk
Autotrace - http://www.dba-oracle.com/t_OracleAutotrace.htm

Database Statistics
DBA_TABLES
DBA_OBJECT_TABLES
DBA_TAB_STATISTICS
DBA_TAB_COL_STATISTICS
DBA_TAB_HISTOGRAMS
DBA_INDEXES
DBA_IND_STATISTICS
DBA_CLUSTERS
DBA_TAB_PARTITIONS
DBA_TAB_SUBPARTITIONS
DBA_IND_PARTITIONS
DBA_IND_SUBPARTITIONS
DBA_PART_COL_STATISTICS
DBA_PART_HISTOGRAMS
DBA_SUBPART_COL_STATISTICS
DBA_SUBPART_HISTOGRAMS

data CH13 summary

标签 : SQL


What is big data? Velocity, variety and volume

Business intelligence is a term used to describe a comprehensive, cohesive and integrated set of tools and processes used to support business decision making.

Data > information > knowledge > wisdom

Collect & store operational data > aggregate operational data into decision support data > analyze decision support data to generate information >…

Traditional info system focus on operational automation and reporting

BI focus on strategic and tactical use of information

Business intelligence architecture is composed of data, people, processes, technology and the management of such components. Main goal: improved decision making

Operational data: mostly stored in relational db, optimized to support transactions representing daily operations

Operational data vs analytical data
- Extract operational data for BI processing.
- BI effectiveness depends on quality of data gathered at operational level
- Operational data seldom well-suited for decision support tasks
- Need reformat data in order to be useful for business intelligence

Operational vs Decision support data:

Requirements for a decision support DBMS:

  1. Db schema: aggregated and summarized data; queries extract multidimensional time slices
  2. Data extraction and loading: supports different data sources; check for inconsistent data
  3. Db size: dbms must support very large db
  4. End user analytical interface

Data warehouse:

Data marts

Star Schemas

Performance-Improving Techniques for the Star Schema

Four techniques to optimize data warehouse design:

  1. Normalizing dimensional tables(Achieve semantic simplicity &Facilitate end-user navigation through the dimensions)
  2. Maintaining multiple fact tables to represent different aggregation levels
  3. Denormalizing fact tables (improves data access performance and saves data storage space)
  4. Partitioning and replicating tables (Replication makes copy of table and places it in different location)
    Data Analytics(Subset of BI functionality)
    • Encompasses a wide range of mathematical, statistical, and modeling techniques
    • Purpose of extracting knowledge from data
    • Tools can be grouped into two separate areas: #

Explanatory analytics & Predictive analytics

Data Mining

Online analytical processing OLAP
Multidimensional analysis of business data, provides the ability for complex calculations, trend analysis, and sophisticated data modeling.
Three main characteristics:


  1. Multidimensional data analysis techniques
    Data are processed and viewed as part of a multidimensional structure
    Augmented by the following functions:

    1) Advanced data presentation functions
    2) Advanced data aggregation, consolidation, and classification functions
    3) Advanced computational functions
    4) Advanced data modeling functions

  2. Advanced database support
  3. Easy-to-use end-user interfaces
    OLAP Architecture

Three main architectural components:
1. Graphical user interface (GUI)
2. Analytical processing logic
3. Data-processing logic
Designed to use both operational and data warehouse data
In most implementations, data warehouse and OLAP are interrelated and complementary
OLAP systems merge data warehouse and data mart approaches
Relational OLAP

- Multidimensional data schema support within the RDBMS
- Data access language and query performance optimized for multidimensional data
- Support for very large databases (VLDBs)
SQL Extensions for OLAP
ROLLUP
- Used with GROUP BY clause to generate aggregates by different dimensions
- GROUP BY generates only one aggregate for each new value combination of attributes
- ROLLUP extension enables subtotal for each column listed except for the last one
- Last column gets grand total
- Order of column list important

  1. SELECT column1, colun2,
  2. FROM table1, table 2
  3. WHERE condition
  4. GROUP BY ROLLUP(column1, column2…)
  5. HAVING condition
  6. ORDER BY column1, column2

CUBE

  1. SELECT column1, colun2,
  2. FROM table1, table 2
  3. WHERE condition
  4. GROUP BY CUBE(column1, column2…)
  5. HAVING condition
  6. ORDER BY column1, column2

GROUPING

  1. RANKING (rank, percent_rank, row_number)
  2. WINDOWING (cumulative aggregate functions, moving averages, lag, lead)

STATISTICAL

Materialized Views: A dynamic table that contains SQL query command to generate rows(Also contains the actual rows)
- Created the first time query is run and summary rows are stored in table
- Automatically updated when base tables are updated

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注