程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

[Python] GDP analysis and forecast

編輯:Python

Bowen author wangzirui32
Like can give the thumbs-up Collection Pay attention to ~~
My number 162 An original work
This article was first published in CSDN, Reprint is prohibited without permission


hello, Hello everyone , I am a wangzirui32, Today we are going to learn how to analyze and forecast the GDP , Start learning !

1. Data sources

The data covers 2010-2021 Gross domestic product csv file , come from National bureau of statistics Official data :

Select the time in turn (2010-2021), download CSV file , Named as 2010.csv,2011.csv( year +.csv), give the result as follows :

2. Cleaning data file

2010 The contents of the data file for the year are as follows :

You can see , The real data is in the... Of the file 3 From the beginning of the line to the penultimate 6 That's ok , We need to extract the data , And carry out gbk Encoding conversion ( The source file is gbk code , Demand is UTF8 code ), Put the data file into datafiles In the folder , Then create a directory on the upper level of this directory Python file collate_data.py, Write code :

import os
import codecs
for i in os.listdir("datafiles"):
path = "datafiles/{}".format(i)
try:
# Put the data file gbk Code to utf-8
with codecs.open(path, "rb", "gb2312") as f:
content = f.read()
with codecs.open(path, "wb", "utf-8") as f:
f.write(content)
except: pass
# The introduction information of the data file is removed 
with codecs.open(path, "rb", "utf-8") as f:
new_content = f.readlines()[2:-5]
with codecs.open(path, "wb", "utf-8") as f:
f.writelines(new_content)

Execute this code , The data is cleaned .

3. analysis

With the help of pandas Reading data ,matplotlib Draw statistical chart for analysis , Install library command :

pip install pandas matplotlib

3.1 Broken line chart

The code is as follows :

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
# Extract the data 
gdp_Q1 = [] # first quarter gdp
gdp_Q2 = [] # The second quarter gdp
gdp_Q3 = [] # The third quarter gdp
gdp_Q4 = [] # In the fourth quarter gdp
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_Q1.append(df[' first quarter '][0])
gdp_Q2.append(df[' The second quarter '][0])
gdp_Q3.append(df[' The third quarter '][0])
gdp_Q4.append(df[' In the fourth quarter '][0])
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
Q1_line, = plt.plot(years, gdp_Q1, color="blue")
Q2_line, = plt.plot(years, gdp_Q2, color="pink")
Q3_line, = plt.plot(years, gdp_Q3, color="green")
Q4_line, = plt.plot(years, gdp_Q4, color="orange")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis and forecast ")
plt.xlabel(" year ")
plt.ylabel(" GDP ( One hundred million yuan )")
plt.xticks(years)
plt.legend([Q1_line, Q2_line, Q3_line, Q4_line, all_year_line],
[' first quarter ',' The second quarter ', ' The third quarter ', ' In the fourth quarter ', ' Annual total value '],
loc='upper right')
plt.show()

The effect is as follows :

3.2 Column statistical chart

The code is as follows :

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
# Extract the data 
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
plt.bar(years, gdp_all_year, width=0.5, label="numbers")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis ", loc="center")
plt.xlabel(" year ", fontsize=14)
plt.ylabel(" GDP ( One hundred million yuan )", fontsize=14)
plt.show()

The effect is as follows :

You can see ,2020 Because of the epidemic , The gross domestic product has declined , But in recent years, the overall situation is still on the rise .

4. Fit the linear regression equation

We will use sklearn Machine learning library to fit linear regression equation , Its installation command is as follows :

pip install scikit-learn

4.1 By the end of 1 Quarter as a parameter

Ideas as follows , We use sklearn Library fitting linear regression equation , By the end of 1 Quarter as a parameter , The equation that generates the prediction , The code is as follows :

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
from sklearn import linear_model
# Extract the data 
gdp_Q1 = [] # first quarter gdp
gdp_Q2 = [] # The second quarter gdp
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_Q1.append(df[' first quarter '][0])
gdp_Q2.append(df[' The second quarter '][0])
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
Q1_line, = plt.plot(years, gdp_Q1, color="blue")
Q2_line, = plt.plot(years, gdp_Q2, color="pink")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis and forecast ")
plt.xlabel(" year ")
plt.ylabel(" GDP ( One hundred million yuan )")
plt.xticks(years)
# Formulate the equation 
# Create a linear regression model 
model = linear_model.LinearRegression()
model.fit(list(zip(gdp_Q1)), gdp_all_year)
# Acquisition coefficient 
coef = model.coef_
# Acquisition intercept 
intercept = model.intercept_
# equation 
equation = "y = x*{} + {}".format(coef[0], intercept)
print(" Linear regression equation :", equation)
# Calculate equation data 
forecast_value = [i*coef[0]+intercept for i in gdp_Q1]
# Draw equation polyline 
forecast_line, = plt.plot(years, forecast_value, color="green")
plt.legend([Q1_line, Q2_line, all_year_line, forecast_line],
[' first quarter ',' The second quarter ', ' Annual total value ', ' Equation simulation '],
loc='upper right')
plt.show()

The effect is as follows :

You can see , Equation basic fitting curve , But because 2020 The GDP fell in the first quarter of the year , Furthermore, the predicted annual data also produced a large gap , So what do we do ?

4.2 By the end of 1,2 Quarter as a parameter

We can take the first 1,2 Quarter as a parameter , The code is as follows :

import pandas
import matplotlib
import matplotlib.pyplot as plt
import os
from sklearn import linear_model
# Extract the data 
gdp_Q1 = [] # first quarter gdp
gdp_Q2 = [] # The second quarter gdp
gdp_all_year = [] # Throughout the year gdp
years = [] # All years 
# Solve the Chinese garbled code 
matplotlib.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
# Read the data in each file 
for filename in os.listdir("datafiles"):
year = filename.split(".")[0]
path = "datafiles/{}".format(filename)
df = pandas.read_csv(path)
df.columns = [' indicators ',' In the fourth quarter ', ' The third quarter ', ' The second quarter ', ' first quarter ']
gdp_Q1.append(df[' first quarter '][0])
gdp_Q2.append(df[' The second quarter '][0])
gdp_all_year.append(df[' In the fourth quarter '][1])
years.append(year)
# Draw a statistical chart 
Q1_line, = plt.plot(years, gdp_Q1, color="blue")
Q2_line, = plt.plot(years, gdp_Q2, color="pink")
all_year_line, = plt.plot(years, gdp_all_year, color="red")
plt.title("2010-2021 GDP analysis and forecast ")
plt.xlabel(" year ")
plt.ylabel(" GDP ( One hundred million yuan )")
plt.xticks(years)
# Formulate the equation 
model = linear_model.LinearRegression()
model.fit(list(zip(gdp_Q1, gdp_Q2)), gdp_all_year)
coef = model.coef_
intercept = model.intercept_
equation = "y = x1*{} + x2*{} + {}".format(coef[0], coef[1], intercept)
print(" Linear regression equation :", equation)
# Calculate the predicted results 
forecast_value = [i[0]*coef[0]+i[1]*coef[1]+intercept for i in list(zip(gdp_Q1, gdp_Q2))]
forecast_line, = plt.plot(years, forecast_value, color="green")
plt.legend([Q1_line, Q2_line, all_year_line, forecast_line],
[' first quarter ',' The second quarter ', ' Annual total value ', ' Equation simulation '],
loc='upper right')
plt.show()

The effect is as follows :

You can see , The fitting result of this equation is very good , It can be used as a prediction equation .
The linear regression equation is :

y = x1*0.20405068090604006 + x2*3.8656156020304238 + 9671.424027125235

Equivalent to :

 GDP = GDP in the first quarter *0.20405068090604006 + GDP in the second quarter *3.8656156020304238 + 9671.424027125235

This completes the whole process of analysis and prediction .


Okay , That's all for today's lesson , I am a wangzirui32, You can collect and pay attention to what you like , See you next time !


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved