程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
 程式師世界 >> 數據庫知識 >> MYSQL數據庫 >> MySQL綜合教程 >> 使用sqoop將mysql數據導入到hadoop

使用sqoop將mysql數據導入到hadoop

編輯:MySQL綜合教程

使用sqoop將mysql數據導入到hadoop


hadoop的安裝配置這裡就不講了。

Sqoop的安裝也很簡單。 完成sqoop的安裝後,可以這樣測試是否可以連接到mysql(注意:mysql的jar包要放到 SQOOP_HOME/lib 下): sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231 結果如下 \ 即說明sqoop已經可以正常使用了。 下面,要將mysql中的數據導入到hadoop中。 我准備的是一個300萬條數據的身份證數據表: \ 先啟動hive(使用命令行:hive 即可啟動) 然後使用sqoop導入數據到hive: sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table test_sfz --hive-import sqoop 會啟動job來完成導入工作。 \ \ 完成導入用了2分20秒,還是不錯的。 在hive中可以看到剛剛導入的數據表: \ 我們來一句sql測試一下數據: select * from test_sfz where id < 10; \ 可以看到,hive完成這個任務用了將近25秒,確實是挺慢的(在mysql中幾乎是不費時間),但是要考慮到hive是創建了job在hadoop中跑,時間當然多。
接下來,我們會對這些數據進行復雜查詢的測試: 我機子的配置如下: \ hadoop 是運行在虛擬機上的偽分布式,虛擬機OS是ubuntu12.04 64位,配置如下: \

TEST 1 計算平均年齡

測試數據:300.8 W 1. 計算廣東的平均年齡 mysql:select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%'; 用時: 0.877s hive:select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%'; 用時:25.012s 2. 對每個城市的的平均年齡進行從高到低的排序 mysql:select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc; 用時:2.949s hive:select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc; 用時:51.29s 可以看到,在耗時上面,hive的增長速度較mysql慢。

TEST 2

測試數據:1200W mysql 引擎: MyISAM(為了加快查詢速度) 導入到hive: \ 1. 計算廣東的平均年齡 mysql:select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%'; 用時: 5.642s hive:select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%'; 用時:168.259s 2. 對每個城市的的平均年齡進行從高到低的排序 mysql:select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc; 用時:11.964s hive:select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc; 用時:311.714s

TEST 3

測試數據:2000W mysql 引擎: MyISAM(為了加快查詢速度) 導入到hive: \ (這次用的時間很短!可能是因為TEST2中的導入時,我的主機在做其他耗資源的工作..) 1. 計算廣東的平均年齡 mysql:select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%'; 用時: 6.605s hive:select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%'; 用時:188.206s 2. 對每個城市的的平均年齡進行從高到低的排序 mysql:select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc; 用時:19.926s hive:select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc; 用時:411.816s

 

  1. 上一頁:
  2. 下一頁:
Copyright © 程式師世界 All Rights Reserved