@Sarah
2015-12-05T22:27:35.000000Z
字数 10398
阅读 771
SQL
Database Systems:
Design, Implementation, and Management
Tenth Edition
Chapter 13
Database Performance Tuning and Query Optimization
Objectives
About the relationship and differences between operational data and decision support data
About online analytical processing (OLAP)
How SQL extensions are used to support OLAP-type data manipulations
What are some of the biggest data challenges facing CIOs today?
Data
One of the biggest problems facing CIOs today is what to do with the growing amount of data they collect every day.
Big data is the new definitive source of competitive advantage across all industries.
Facts
YouTube users upload 48 hours of new video every minute of the day.
571 new websites are created every minute of the day.
The amount of data in the world today is equal to:
Every person in the US tweeting three tweets per minute for 26,976 years.
Every person in the world having more than 215m high-resolution MRI scans a day.
More than 200bn HD movies – which would take a person 47m years to watch.
Problems
Too few people with deep analytical skills to fill the demand of Big Data jobs in the U.S. by 2018.
Poor data can cost businesses 20%–35% of their operating revenue.
According to execs, the influx of data is putting a strain on IT infrastructure. 55 percent of respondents reporting a slowdown of IT systems and 47 percent citing data security problems.
Opportunities
Big data is a top business priority and drives enormous opportunity for business improvement.
Market research firm IDC has released a new forecast that shows the big data market is expected to grow from 16.9 billion in 2015.
Big data is the new definitive source of competitive advantage across all industries
Data
What is big data?
Velocity, variety and volume
How much data is there?
Where is it all stored?
Why is it important?
Can we handle all this data?
Who owns it all?
How much Data?
How much data do you produce every day?
2.5 quintillion every day
http://mashable.com/2012/06/22/data-created-every-minute/
How much data?
Databases
The 10 Largest Databases in the world (link)
How do you define Business Intelligence?
Business Intelligence
Business intelligence (BI) is an umbrella term that includes the
applications, infrastructure and tools, and best practices
enable access to and analysis of information to improve and optimize decisions and performance. (Gartner)
Comprehensive, cohesive, integrated tools and processes
Capture, collect, integrate, store, and analyze data
Generate information to support business decision making
Framework that allows a business to transform:
Data into information
Information into knowledge
Knowledge into wisdom
Business Intelligence
Comprehensive, cohesive, integrated tools and processes
Capture, collect, integrate, store, and analyze data
Generate information to support business decision making
Framework that allows a business to transform:
Data into information
Information into knowledge
Knowledge into wisdom
Business Intelligence Architecture
Composed of data, people, processes, technology, and management of components
Focuses on strategic and tactical use of information
Key performance indicators (KPI)
Measurements that assess company’s effectiveness or success in reaching goals
Multiple tools from different vendors can be integrated into a single BI framework
Business intelligence framework
Big Data
Unlocking the value of big data
The value of big data comes from the knowledge gained from it and what you do with it. Organizations are in need of solutions that combine both technology and business processes so they can take full advantage of their information. It can’t just be about content management, or the customer experience or information exchange. To get above the hype, CIOs must take a holistic approach to setting their big data strategy, aiming to manage and leverage all relevant information across the enterprise.
Business Intelligence Benefits
Main goal: improved decision making
Other benefits
Integrating architecture
Common user interface for data reporting and analysis
Common data repository fosters single version of company data
Improved organizational performance
Business Intelligence Technology Trends
Data storage improvements
Business intelligence appliances
Business intelligence as a service
Big Data analytics
Personal analytics
Operational Data versus Analytical Data
Why extract operational data for BI processing?
Security and control
Operational not structured for BI analysis
BI analysis degrades operational server performance
BI effectiveness depends on quality of data gathered at operational level
Operational data seldom well-suited for decision support tasks
Need reformat data in order to be useful for business intelligence
Operational Data vs.
Decision Support Data
Operational data
Mostly stored in relational database
Optimized to support transactions representing daily operations
Decision support data differs from operational data in three main areas:
Time span
Granularity
Dimensionality
Decision Support:
Database Requirements
Specialized DBMS tailored to provide fast answers to complex queries
Three main requirements
Database schema:
Aggregated and summarized data
Queries extract multidimensional time slices
Data extraction and loading
Supports different data sources (Flat files, Hierarchical, network, and relational databases)
Check for inconsistent data
Database size
DBMS must support very large databases (VLDBs)
The Data Warehouse
Integrated, subject-oriented, time-variant, and nonvolatile collection of data
Provides support for decision making
Usually a read-only database optimized for data analysis and query processing
Requires time, money, and considerable managerial effort to create
Data Marts
Small, single-subject data warehouse subset
More manageable data set than data warehouse
Provides decision support to small group of people
Typically lower cost and lower implementation time than data warehouse
Twelve Rules That Define
a Data Warehouse
Star Schemas
Data-modeling technique
Maps multidimensional decision support data into relational database
Creates near equivalent of multidimensional database schema from relational data
Easily implemented model for multidimensional data analysis while preserving relational structures
Four components: facts, dimensions, attributes, and attribute hierarchies
Star Schema
Star Schemas
FACTS
Numeric measurements that represent specific business aspect or activity
Fact table contains facts linked through their dimensions
Metrics are facts computed at run time
DIMENSIONS
Qualifying characteristics provide additional perspectives to a given fact
Decision support data almost always viewed in relation to other data
Study facts via dimensions
Dimensions stored in dimension tables
Attributes
Use to search, filter, and classify facts
Dimensions provide descriptions of facts through their attributes
No mathematical limit to the number of dimensions
Slice and dice: focus on slices of the data cube for more detailed analysis
Attribute Hierarchies
Provide top-down data organization
Two purposes:
Aggregation
Drill-down/roll-up data analysis
Determine how the data are extracted and represented
Stored in the DBMS’s data dictionary
Used by OLAP tool to access warehouse properly
Star Schema Representation
Many fact rows related to each dimension row
Primary key of fact table is a composite primary key
Fact table primary key formed by combining foreign keys pointing to dimension tables
Dimension tables are smaller than fact tables
Each dimension record is related to thousands of fact records
Performance-Improving Techniques for the Star Schema
Four techniques to optimize data warehouse design:
Normalizing dimensional tables
Maintaining multiple fact tables to represent different aggregation levels
Denormalizing fact tables
Partitioning and replicating tables
Dimension tables normalized to:
Achieve semantic simplicity
Facilitate end-user navigation through the dimensions
Denormalizing fact tables improves data access performance and saves data storage space
Partitioning splits table into subsets of rows or columns
Replication makes copy of table and places it in different location
Data Analytics
Subset of BI functionality
Encompasses a wide range of mathematical, statistical, and modeling techniques
Purpose of extracting knowledge from data
Tools can be grouped into two separate areas:
Explanatory analytics
Predictive analytics
Data Mining
Data-mining tools do the following:
Analyze data
Uncover problems or opportunities hidden in data relationships
Form computer models based on their findings
Use models to predict business behavior
Runs in two modes
Guided
Automated
Predictive Analytics
Employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools
Create actionable predictive models based on available data
Models are used in areas such as:
Customer relationships, customer service, customer retention, fraud detection, targeted marketing, and optimized pricing
Online Analytical Processing (OLAP)
Multidimensional analysis of business data, provides the ability for complex calculations, trend analysis, and sophisticated data modeling.
Three main characteristics:
Multidimensional data analysis techniques
Advanced database support
Easy-to-use end-user interfaces
Multidimensional Data Analysis Techniques
Data are processed and viewed as part of a multidimensional structure
Augmented by the following functions:
Advanced data presentation functions
Advanced data aggregation, consolidation, and classification functions
Advanced computational functions
Advanced data modeling functions
Advanced Database Support
Advanced data access features include:
Access to many different kinds of DBMSs, flat files, and internal and external data sources
Access to aggregated data warehouse data
Advanced data navigation
Rapid and consistent query response times
Maps end-user requests to appropriate data source and to proper data access language
Support for very large databases
OLAP Architecture
Three main architectural components:
Graphical user interface (GUI)
Analytical processing logic
Data-processing logic
Designed to use both operational and data warehouse data
In most implementations, data warehouse and OLAP are interrelated and complementary
OLAP systems merge data warehouse and data mart approaches
Relational OLAP
Relational online analytical processing (ROLAP) provides the following extensions:
Multidimensional data schema support within the RDBMS
Data access language and query performance optimized for multidimensional data
Support for very large databases (VLDBs)
SQL Extensions for OLAP
Proliferation of OLAP tools fostered development of SQL extensions
Many innovations have become part of standard SQL
All SQL commands will work in data warehouse as expected
Most queries include many data groupings and aggregations over multiple columns
SQL Extensions
ROLLUP
CUBE
GROUPING
RANKING (rank, percent_rank, row_number)
WINDOWING (cumulative aggregate functions, moving averages, lag, lead)
STATISTICAL
The ROLLUP Extension
Used with GROUP BY clause to generate aggregates by different dimensions
GROUP BY generates only one aggregate for each new value combination of attributes
ROLLUP extension enables subtotal for each column listed except for the last one
Last column gets grand total
Order of column list important
The CUBE Extension
CUBE extension used with GROUP BY clause to generate aggregates by listed columns
Includes the last column
Enables subtotal for each column in addition to grand total for last column
Useful when you want to compute all possible subtotals within groupings
Cross-tabulations are good candidates for application of CUBE extension
Materialized Views
A dynamic table that contains SQL query command to generate rows
Also contains the actual rows
Created the first time query is run and summary rows are stored in table
Automatically updated when base tables are updated
=== OLAP