程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Python uses Hive to query data

編輯:Python

背景

In the process of big data processing,The most popular open source frameworks today are nothing more thanhadoop體系,那說到Hadoop The two core parts are nothing more thanHDFS、MapReduce,The former is mainly used to store data,The latter is used to calculate and process data,On this basis, if you want to process massive data, you must create it based on thismapper、reduce、partition、jobdrive, etc.,這樣一套流程下來,Are developers too big?,為什麼呢 The work is tedious and the development efficiency is too low,So is there a better simple solution?? 還真有,推薦使用Facebook開源的Hive

Hive是什麼呢? 其實就是針對MapReduce進行了二次封裝,It is convenient for developers to use familiarSQLScripts and convenient operationHadoop集群上的數據,The basic workflow is developers write to manipulate the data in the first placeSQL,Next connectHive,然後Hivewill write youSQL轉為MapReduce任務,Finally these tasks will beHadoop集群上運行,具體流程如下:

具體實現

實現方案

This article mainly introduces the use ofpython開發語言實現,Other languages ​​can refer to the corresponding officialAPI文檔

基於此,我們首先需要安裝Hive的環境,Secondly, the client code needs to be importedhive驅動包
安裝HiveThe environment is not covered in this article,This article focuses on how the client code implements the operationhive的流程

引入hive包
$pip3 install pyhive

示例代碼

from pyhive import hive
class HiveUtils:
def query_count_data(self,query_count_sql,sql_config):
''' The total number of statistics in the table '''
logger.info('query_sql:%s',query_count_sql)
hive_conn = self.__get_connection(sql_config)
hive_cursor = hive_conn.cursor()
hive_cursor.execute(query_count_sql)
list_res = hive_cursor.fetchall()
total_count = list_res[0][0]
logger.info('get total_count:%s',total_count)
hive_cursor.close()
hive_conn.close()
return total_count
@retry(stop=stop_after_attempt(5), wait=wait_fixed(10))
def __get_connection(self,sql_config):
''' Retry if unable to connect5次,delay per retry10秒 '''
logger.info('start to connect to hive ...')
hive_connection = hive.Connection(host=sql_config.host
, port=sql_config.port
, username=sql_config.user_name
, database=sql_config.db_name)
logger.info('connect to hive successful!')
return hive_connection

  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved