程式師世界 >> 數據庫知識 >> MYSQL數據庫 >> MySQL綜合教程 >> MySQL中distinct語句的根本道理及其與group by的比擬

MySQL中distinct語句的根本道理及其與group by的比擬

編輯：MySQL綜合教程

MySQL中distinct語句的根本道理及其與group by的比擬。本站提示廣大學習愛好者：（MySQL中distinct語句的根本道理及其與group by的比擬）文章只能為提供參考，不一定能成為您想要的結果。以下是MySQL中distinct語句的根本道理及其與group by的比擬正文

DISTINCT 現實上和 GROUP BY 操作的完成異常類似，只不外是在 GROUP BY 以後的每組中只掏出一筆記錄罷了。所以，DISTINCT 的完成和 GROUP BY 的完成也根本差不多，沒有太年夜的差別。異樣可以經由過程松懈索引掃描或許是緊湊索引掃描來完成，固然，在沒法僅僅應用索引即能完成 DISTINCT 的時刻，MySQL 只能經由過程暫時表來完成。然則，和 GROUP BY 有一點差異的是，DISTINCT 其實不須要停止排序。也就是說，在僅僅只是 DISTINCT 操作的 Query 假如沒法僅僅應用索引完成操作的時刻，MySQL 會應用暫時表來做一次數據的“緩存”，然則不會對暫時表中的數據停止 filesort 操作。固然，假如我們在停止 DISTINCT 的時刻還應用了 GROUP BY 並停止了分組，並應用了相似於 MAX 之類的聚合函數操作，就沒法防止 filesort 了。

上面我們就經由過程幾個簡略的 Query 示例來展現一下 DISTINCT 的完成。

1.起首看看經由過程松懈索引掃描完成 DISTINCT 的操作：

sky@localhost : example 11:03:41> EXPLAIN SELECT DISTINCT group_id 
  -> FROM group_messageG

*************************** 1. row ***************************
      id: 1
 SELECT_type: SIMPLE
    table: group_message
     type: range
possible_keys: NULL
     key: idx_gid_uid_gc
   key_len: 4
     ref: NULL
     rows: 10
    Extra: Using index for group-by
1 row in set (0.00 sec)

我們可以很清楚的看到，履行籌劃中的 Extra 信息為“Using index for group-by”，這代表甚麼意思？為何我沒有停止 GROUP BY 操作的時刻，履行籌劃中會告知我這裡經由過程索引停止了 GROUP BY 呢？其實這就是於 DISTINCT 的完成道理相干的，在完成 DISTINCT的進程中，異樣也是須要分組的，然後再從每組數據中掏出一條前往給客戶端。而這裡的 Extra 信息就告知我們，MySQL 應用松懈索引掃描就完成了全部操作。固然，假如 MySQL Query Optimizer 如果可以或許做的再人道化一點將這裡的信息換成“Using index for distinct”那就更好更輕易讓人懂得了，呵呵。

2.我們再來看看經由過程緊湊索引掃描的示例：

sky@localhost : example 11:03:53> EXPLAIN SELECT DISTINCT user_id 
  -> FROM group_message
  -> WHERE group_id = 2G

*************************** 1. row ***************************
      id: 1
 SELECT_type: SIMPLE
    table: group_message
     type: ref
possible_keys: idx_gid_uid_gc
     key: idx_gid_uid_gc
   key_len: 4
     ref: const
     rows: 4
    Extra: Using WHERE; Using index
1 row in set (0.00 sec)

這裡的顯示和經由過程緊湊索引掃描完成 GROUP BY 也完整一樣。現實上，這個 Query 的完成進程中，MySQL 會讓存儲引擎掃描 group_id = 2 的一切索引鍵，得出一切的 user_id，然後應用索引的已排序特征，每改換一個 user_id 的索引鍵值的時刻保存一條信息，便可在掃描完一切 gruop_id = 2 的索引鍵的時刻完成全部 DISTINCT 操作。

3.上面我們在看看沒法零丁應用索引便可完成 DISTINCT 的時刻會是如何：

sky@localhost : example 11:04:40> EXPLAIN SELECT DISTINCT user_id 
  -> FROM group_message
  -> WHERE group_id > 1 AND group_id < 10G

*************************** 1. row ***************************
      id: 1
 SELECT_type: SIMPLE
    table: group_message
     type: range
possible_keys: idx_gid_uid_gc
     key: idx_gid_uid_gc
   key_len: 4
     ref: NULL
     rows: 32
    Extra: Using WHERE; Using index; Using temporary
1 row in set (0.00 sec)

當 MySQL 沒法僅僅依附索引便可完成 DISTINCT 操作的時刻，就不能不應用暫時表來停止響應的操作了。然則我們可以看到，在 MySQL 應用暫時表來完成 DISTINCT 的時刻，和處置 GROUP BY 有一點差別，就是少了 filesort。現實上，在 MySQL 的分組算法中，其實不必定非要排序能力完成份組操作的，這一點在下面的 GROUP BY 優化小技能中我曾經提到過了。現實上這裡 MySQL 恰是在沒有排序的情形下完成分組最初完成 DISTINCT 操作的，所以少了 filesort 這個排序操作。

4.最初再和 GROUP BY 聯合嘗嘗看：

sky@localhost : example 11:05:06> EXPLAIN SELECT DISTINCT max(user_id) 
  -> FROM group_message
  -> WHERE group_id > 1 AND group_id < 10
  -> GROUP BY group_idG

*************************** 1. row ***************************
      id: 1
 SELECT_type: SIMPLE
    table: group_message
     type: range
possible_keys: idx_gid_uid_gc
     key: idx_gid_uid_gc
   key_len: 4
     ref: NULL
     rows: 32
    Extra: Using WHERE; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

最初我們再看一下這個和 GROUP BY 一路應用帶有聚合函數的示例，和下面第三個示例比擬，可以看到曾經多了 filesort 排序操作了，恰是由於我們應用了 MAX 函數的原因。要獲得分組後的 MAX 值，又沒法應用索引完成操作，只能經由過程排序才行了。

mysql distinct和group by誰更好
1，測試前的預備

//預備一張測試表 
mysql> CREATE TABLE `test_test` ( 
 ->  `id` int(11) NOT NULL auto_increment, 
 ->  `num` int(11) NOT NULL default '0', 
 ->  PRIMARY KEY (`id`) 
 -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Query OK, 0 rows affected (0.05 sec)

mysql> delimiter || //轉變mysql敕令停止符為|| 
 
//建個貯存進程向表中拔出10W條數據 
mysql> create procedure p_test(pa int(11)) 
 -> begin 
 -> 
 -> declare max_num int(11) default 100000; 
 -> declare i int default 0; 
 -> declare rand_num int; 
 -> 
 -> select count(id) into max_num from test_test; 
 -> 
 -> while i < pa do 
 ->     if max_num < 100000 then 
 ->         select cast(rand()*100 as unsigned) into rand_num; 
 ->         insert into test_test(num)values(rand_num); 
 ->     end if; 
 ->     set i = i +1; 
 -> end while; 
 -> end||

Query OK, 0 rows affected (0.00 sec)

mysql> call p_test(100000)||

Query OK, 1 row affected (5.66 sec)

mysql> delimiter ;//轉變mysql敕令停止符為; 
mysql> select count(id) from test_test; //數據都出來了

+-----------+ 
| count(id) | 
+-----------+ 
|  100000 | 
+-----------+ 
1 row in set (0.00 sec)

mysql> show variables like "%pro%";  //檢查一下，記載履行的profiling是否是開啟動了，默許是不開啟的

+---------------------------+-------+ 
| Variable_name       | Value | 
+---------------------------+-------+ 
| profiling         | OFF  | 
| profiling_history_size  | 15  | 
| protocol_version     | 10  | 
| slave_compressed_protocol | OFF  | 
+---------------------------+-------+ 
4 rows in set (0.00 sec)

mysql> set profiling=1;      //開啟

Query OK, 0 rows affected (0.00 sec)

2，測試

//做了4組測試 
mysql> select distinct(num) from test_test; 
mysql> select num from test_test group by num; 
 
mysql> show profiles;  //檢查成果

+----------+------------+-------------------------------------------+ 
| Query_ID | Duration  | Query                   | 
+----------+------------+-------------------------------------------+ 
|    1 | 0.07298225 | select distinct(num) from test_test    | 
|    2 | 0.07319975 | select num from test_test group by num  | 
|    3 | 0.07313525 | select num from test_test group by num  | 
|    4 | 0.07317725 | select distinct(num) from test_test    | 
|    5 | 0.07275200 | select distinct(num) from test_test    | 
|    6 | 0.07298600 | select num from test_test group by num  | 
|    7 | 0.07500700 | select num from test_test group by num  | 
|    8 | 0.07331325 | select distinct(num) from test_test    | 
|    9 | 0.57831575 | create index num_index on test_test (num) | //在這兒的時刻，我加了索引 
|    10 | 0.00243550 | select distinct(num) from test_test    | 
|    11 | 0.00121975 | select num from test_test group by num  | 
|    12 | 0.00116550 | select distinct(num) from test_test    | 
|    13 | 0.00107650 | select num from test_test group by num  | 
+----------+------------+-------------------------------------------+ 
13 rows in set (0.00 sec)

下面的1-8是4組數據，而且是沒有加索引的，從中我們可以看出，distinct比group by 會好一點點
10-13是2組數據，是加了索引今後的，從中我們可以看出，group by 比distinct 會好一點點
普通情形，數據量比擬年夜的表，聯系關系字段都邑加索引的，，而且加索引後檢索時光只要之前的六分之一閣下。