程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
 程式師世界 >> 數據庫知識 >> MYSQL數據庫 >> MySQL綜合教程 >> 千萬級別mysql合並表快速去重簡析

千萬級別mysql合並表快速去重簡析

編輯:MySQL綜合教程

千萬級別mysql合並表快速去重簡析   mysql合並表去重 目標: 現有表a和b,把兩個表中的數據合並去重到c表中。其中a和b表中數據量大概在2千萬左右。 基本情況 操作系統版本:CentOS release 5.6 64位 操作系統內存:8G 數據庫版本:5.1.56-community 64位 數據庫初始化參數:默認   數據庫表和數據量 表a:  www.2cto.com   mysql> desc a2kw; +-------+-------------+------+-----+---------+-------+ | Field | Type        | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c1    | varchar(20) | YES  | MUL | NULL    |       | | c2    | varchar(30) | YES  |     | NULL    |       | | c3    | varchar(12) | YES  |     | NULL    |       | | c4    | varchar(20) | YES  |     | NULL    |       | +-------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec) 表b mysql> desc b2kw; +-------+-------------+------+-----+---------+-------+ | Field | Type        | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c1    | varchar(20) | YES  |     | NULL    |       | | c2    | varchar(30) | YES  |     | NULL    |       | | c3    | varchar(12) | YES  |     | NULL    |       | | c4    | varchar(20) | YES  |     | NULL    |       | +-------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec)   a和b表的數據概況如下 mysql> select * from a2kw limit 10; +-----------+-----------+------+----------+ | c1        | c2        | c3   | c4       | +-----------+-----------+------+----------+ | 662164461 | 131545534 | TOM0 | 20120520 | | 226662142 | 605685564 | TOM0 | 20120516 | | 527008225 | 172557633 | TOM0 | 20120514 | | 574408183 | 350897450 | TOM0 | 20120510 | | 781619324 | 583989494 | TOM0 | 20120510 | | 158872754 | 775676430 | TOM0 | 20120512 | | 815875622 | 631631832 | TOM0 | 20120514 | | 905943640 | 477433083 | TOM0 | 20120514 | | 660790641 | 616774715 | TOM0 | 20120512 | | 999083595 | 953186525 | TOM0 | 20120513 | +-----------+-----------+------+----------+ 10 rows in set (0.01 sec)   基本步驟  www.2cto.com   1、在B表上創建索引 mysql> select count(*) from b2kw; +----------+ | count(*) | +----------+ | 20000002 | +----------+ 1 row in set (0.00 sec) mysql> create index ind_b2kw_c1 on  b2kw(c1); Query OK, 20000002 rows affected (1 min 2.94 sec) Records: 20000002  Duplicates: 0  Warnings: 0 數據量為:20000002 ,時間為:1 min 2.94 sec 2、把a、b分別插入中間表temp表中   創建中間表 mysql> create table temp  select * from c2kw where 1=2; Query OK, 0 rows affected (0.00 sec) Records: 0  Duplicates: 0  Warnings: 0 插入數據 mysql> insert into temp  select * from a2kw; Query OK, 20000002 rows affected (13.23 sec) Records: 20000002  Duplicates: 0  Warnings: 0 mysql> insert into temp  select * from b2kw; Query OK, 20000002 rows affected (13.27 sec) Records: 20000002  Duplicates: 0  Warnings: 0    www.2cto.com   mysql> select count(*) from temp; +----------+ | count(*) | +----------+ | 40000004 | +----------+ 1 row in set (0.00 sec) 數據量為:40000004 ,時間為:26.50 sec 3、temp建立聯合索引,強制索引去掉重復數據 mysql> create index ind_temp_c123 on temp(c1,c2,c3); Query OK, 40000004 rows affected (3 min 43.87 sec) Records: 40000004  Duplicates: 0  Warnings: 0 查看執行計劃 mysql> explain select c1,c2,c3,max(c4) from temp FORCE INDEX
(ind_temp_c123) group by c1,c2,c3 ; +----+-------------+-------+-------+---------------+----------
-----+---------+------+----------+-------+ | id | select_type | table | type  | possible_keys | key      
    | key_len | ref  | rows     | Extra | +----+-------------+-------+-------+---------------+-------------
--+---------+------+----------+-------+ |  1 | SIMPLE      | temp  | index | NULL          | ind_temp_c123 | 71  
   | NULL | 40000004 |       | +----+-------------+-------+  www.2cto.com  -------+---------------+--------
-------+---------+------+----------+-------+ 1 row in set (0.05 sec)   mysql> insert into c2kw select c1,c2,c3,max(c4) from temp
FORCE INDEX (ind_temp_c123) group by c1,c2,c3 ; Query OK, 20000004 rows affected (2 min 0.85 sec) Records: 20000004  Duplicates: 0  Warnings: 0 實際大約花費實際為:6 min
  4、刪除中間表 mysql> drop table temp; Query OK, 0 rows affected (0.99 sec) 實際大約花費實際為:1 sec
  5、建立c索引 mysql> create index ind_c2kw_c1 on c2kw(c1); Query OK, 20000004 rows affected (49.74 sec) Records: 20000004  Duplicates: 0  Warnings: 0 mysql> create index ind_c2kw_c2 on c2kw(c2); Query OK, 20000004 rows affected (1 min 47.20 sec) Records: 20000004  Duplicates: 0  Warnings: 0 mysql> create index ind_c2kw_c3 on c2kw(c3); Query OK, 20000004 rows affected (2 min 42.02 sec) Records: 20000004  Duplicates: 0  Warnings: 0 實際大約花費實際為:5分鐘  www.2cto.com  
  6、清空a、b表 mysql> truncate table a2kw; Query OK, 0 rows affected (1.15 sec) mysql> truncate table b2kw; Query OK, 0 rows affected (1.34 sec) 實際大約花費實際為:3sec   一共花費的時間大概在15分鐘左右       作者 RuleV5

  1. 上一頁:
  2. 下一頁:
Copyright © 程式師世界 All Rights Reserved