Tuesday, May 8, 2012

Histograms for 500 Alex



Histograms are a neat little gem in the world of Oracle.  They are a great tool that the Optimizer uses with Oracle statistics.

Exert from PSOUG:
Oracle uses two types of histograms for column statistics: height-balanced histograms and frequency histograms.
The type of histogram is stored in the HISTOGRAM column of the *TAB_COL_STATISTICS views (USER and DBA). This column can have values of HEIGHT BALANCED, FREQUENCY, or NONE.In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same number of rows. The useful information that the histogram provides is where in the range of values the endpoints fall. In a frequency histogram, each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of occurrences of that single value. Frequency histograms are automatically created instead of height-balanced histograms when the number of distinct values is less than or equal to the number of histogram buckets specified. Frequency histograms can be viewed using the *TAB_HISTOGRAMS views.
There are a handful of views to take a look at histograms as specified above.  Here are some queries I found useful:

Checking to see if a column has a histogram:
SELECT OWNER, TABLE_NAME, COLUMN_NAME, HISTOGRAM, LAST_ANALYZED  FROM all_tab_col_statistics WHERE OWNER LIKE upper('%&owner%')   AND TABLE_NAME LIKE upper('%&table_name%')   AND COLUMN_NAME LIKE UPPER('%&column_name%') ORDER BY 1,2,3

Viewing the frequency histogram data:

BEGIN
  DBMS_STATS.GATHER_table_STATS (OWNNAME => 'OE', TABNAME => 'INVENTORIES', 
  METHOD_OPT => 'FOR COLUMNS SIZE 20 warehouse_id');
END;
/

SELECT column_name, num_distinct, num_buckets, histogram 
  FROM USER_TAB_COL_STATISTICS
 WHERE table_name = 'INVENTORIES' AND column_name = 'WAREHOUSE_ID';

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
WAREHOUSE_ID                              9           9 FREQUENCY

SELECT endpoint_number, endpoint_value 
  FROM USER_HISTOGRAMS
 WHERE table_name = 'INVENTORIES' and column_name = 'WAREHOUSE_ID'
  ORDER BY endpoint_number;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             36              1
            213              2
            261              3
            370              4
            484              5
            692              6
            798              7
            984              8

BEGIN
  DBMS_STATS.GATHER_table_STATS (OWNNAME => 'OE', TABNAME => 'INVENTORIES', 
  METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');
END;

SELECT column_name, num_distinct, num_buckets, histogram 
  FROM USER_TAB_COL_STATISTICS
 WHERE table_name = 'INVENTORIES' AND column_name = 'QUANTITY_ON_HAND';

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
QUANTITY_ON_HAND                        237          10 HEIGHT BALANCED

SELECT endpoint_number, endpoint_value 
  FROM USER_HISTOGRAMS
 WHERE table_name = 'INVENTORIES' and column_name = 'QUANTITY_ON_HAND'
  ORDER BY endpoint_number;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              0
              1             27
              2             42
              3             57
              4             74
              5             98
              6            123
              7            149
              8            175
              9            202 


Basically what the histogram tells you is how frequent a given value is.  Based on its findings it will figure out if it should use the index that you have on the column or if it would be better to do the FTS.

How a Histogram could mess the optimizer up:


Histograms work best with literal variables (select * from test_table where some_ind = 'Y';).  If you try using bind variables (select * from test_table where  some_ind  = :bindx;) Oracle should bind peak to check what they are, pick the plan based on those binds.  What could be bad is if you have a table with a indicator column that 99% is Y and 1% N.  If the bind peak happens on the N its more likely to use an index because that makes sense.  However the next time that is ran and is not hard parsed it will use the same plan even if the bind is Y, where a full table scan makes more sense there.

The default setting for anything 10g is setting histogram collection to auto.  This is getting better with each release but they are still working on it.  You can always collect a histogram on specific columns by hand if you want.

I could go on forever about histograms, but I think this is a good summary.


:::ROLE THE CREDITS:::
Sources:
Ask Tom Article
PSOUG Article
Random Blog that has good facts
- Oracle Documentation
- Rwave solution presentation

No comments:

Post a Comment