Как SQL-сервер разрабатывает предполагаемое количество строк?

Проверка:

[NotNull]
public string Property1 { get; set; }

[Length(Min = 10, Max = 20)]
public string Property2 { get; set; }

[Regex(Expression = @"[abc]{2}")]
public string Property3 { get; set; }
5
задан Justin 25 September 2009 в 11:23
поделиться

3 ответа

SQL Server splits each index into up to 200 ranges with the following data (from here):

  • RANGE_HI_KEY

    A key value showing the upper boundary of a histogram step.

  • RANGE_ROWS

    Specifies how many rows are inside the range (they are smaller than this RANGE_HI_KEY, but bigger than the previous smaller RANGE_HI_KEY).

  • EQ_ROWS

    Specifies how many rows are exactly equal to RANGE_HI_KEY.

  • AVG_RANGE_ROWS

    Average number of rows per distinct value inside the range.

  • DISTINCT_RANGE_ROWS

    Specifies how many distinct key values are inside this range (not including the previous key before RANGE_HI_KEY and RANGE_HI_KEY itself);

Usually, most populated values go into RANGE_HI_KEY.

However, they can get into the range and this can lead to the skew in distribution.

Imagine these data (among the others):

Key value Count of rows

1          1
2          1
3          10000
4          1

SQL Server usually builds two ranges: 1 to 3 and 4 to the next populated value, which makes these statistics:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
3             2           10000    1               2

, which means the when searching for, say, 2, there is but 1 row and it's better to use the index access.

But if 3 goes inside the range, the statistics are these:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
4             10002       1        3334            3

The optimizer thinks there are 3334 rows for the key 2 and index access is too expensive.

8
ответ дан 13 December 2019 в 19:30
поделиться

It uses statistics, which it keeps for each index.

(You can also create statistics on non-indexed columns)

To update all your statistics on every table in a Database (WARNING: will take some time on very large databases. Don't do this on Production servers without checking with your DBA...):

exec sp_msforeachtable 'UPDATE STATISTICS ?'

If you don't have a regular scheduled job to rebuild your most active indexes (i.e. lots of INSERTS or DELETES), you should consider rebuilding your indexes (same caveat as above applies):

exec sp_msforeachtable "DBCC DBREINDEX('?')"
3
ответ дан 13 December 2019 в 19:30
поделиться

Since you already updated the statistics, I'd try to eliminate any parameter sniffing:

CREATE PROCEDURE xyz
(
    @param1 int
    ,@param2 varchar(10)

)AS

DECLARE @param_1 int
       ,@param_2 varchar(10)

SELECT @param_1=@param1
      ,@param_2=@param2

...complex query here....
...WHERE column1=@param_1 AND column2=@param_2....

go
0
ответ дан 13 December 2019 в 19:30
поделиться
Другие вопросы по тегам:

Похожие вопросы: