[关闭]
@Sarah 2015-12-05T23:45:08.000000Z 字数 5559 阅读 899

data CH13 summary

SQL


What is big data? Velocity, variety and volume

Business intelligence is a term used to describe a comprehensive, cohesive and integrated set of tools and processes used to support business decision making.

Data > information > knowledge > wisdom

Collect & store operational data > aggregate operational data into decision support data > analyze decision support data to generate information >…

Traditional info system focus on operational automation and reporting

BI focus on strategic and tactical use of information

Business intelligence architecture is composed of data, people, processes, technology and the management of such components. Main goal: improved decision making

Operational data: mostly stored in relational db, optimized to support transactions representing daily operations

Operational data vs analytical data
- Extract operational data for BI processing.
- BI effectiveness depends on quality of data gathered at operational level
- Operational data seldom well-suited for decision support tasks
- Need reformat data in order to be useful for business intelligence

Operational vs Decision support data:

Requirements for a decision support DBMS:

  1. Db schema: aggregated and summarized data; queries extract multidimensional time slices
  2. Data extraction and loading: supports different data sources; check for inconsistent data
  3. Db size: dbms must support very large db
  4. End user analytical interface

Data warehouse:

Data marts

Star Schemas

Performance-Improving Techniques for the Star Schema

Four techniques to optimize data warehouse design:

  1. Normalizing dimensional tables(Achieve semantic simplicity &Facilitate end-user navigation through the dimensions)
  2. Maintaining multiple fact tables to represent different aggregation levels
  3. Denormalizing fact tables (improves data access performance and saves data storage space)
  4. Partitioning and replicating tables (Replication makes copy of table and places it in different location)
    Data Analytics(Subset of BI functionality)
    • Encompasses a wide range of mathematical, statistical, and modeling techniques
    • Purpose of extracting knowledge from data
    • Tools can be grouped into two separate areas: #

Explanatory analytics & Predictive analytics

Data Mining

Online analytical processing OLAP
Multidimensional analysis of business data, provides the ability for complex calculations, trend analysis, and sophisticated data modeling.
Three main characteristics:


  1. Multidimensional data analysis techniques
    Data are processed and viewed as part of a multidimensional structure
    Augmented by the following functions:

    1) Advanced data presentation functions
    2) Advanced data aggregation, consolidation, and classification functions
    3) Advanced computational functions
    4) Advanced data modeling functions

  2. Advanced database support
  3. Easy-to-use end-user interfaces
    OLAP Architecture

Three main architectural components:
1. Graphical user interface (GUI)
2. Analytical processing logic
3. Data-processing logic
Designed to use both operational and data warehouse data
In most implementations, data warehouse and OLAP are interrelated and complementary
OLAP systems merge data warehouse and data mart approaches
Relational OLAP

- Multidimensional data schema support within the RDBMS
- Data access language and query performance optimized for multidimensional data
- Support for very large databases (VLDBs)
SQL Extensions for OLAP
ROLLUP
- Used with GROUP BY clause to generate aggregates by different dimensions
- GROUP BY generates only one aggregate for each new value combination of attributes
- ROLLUP extension enables subtotal for each column listed except for the last one
- Last column gets grand total
- Order of column list important

  1. SELECT column1, colun2,
  2. FROM table1, table 2
  3. WHERE condition
  4. GROUP BY ROLLUP(column1, column2…)
  5. HAVING condition
  6. ORDER BY column1, column2

CUBE

  1. SELECT column1, colun2,
  2. FROM table1, table 2
  3. WHERE condition
  4. GROUP BY CUBE(column1, column2…)
  5. HAVING condition
  6. ORDER BY column1, column2

GROUPING

  1. RANKING (rank, percent_rank, row_number)
  2. WINDOWING (cumulative aggregate functions, moving averages, lag, lead)

STATISTICAL

Materialized Views: A dynamic table that contains SQL query command to generate rows(Also contains the actual rows)
- Created the first time query is run and summary rows are stored in table
- Automatically updated when base tables are updated

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注