
Python data analysis (I): importing data, descriptive statistics, cross analysis, correlation analysis, linear regression analysis

Editor: Python

Contents

      • 1 Import the required packages
      • 2 Import data
        • (1) Import from an Excel file
      • 3 Create data manually
      • 4 Data sorting
      • 5 Simple calculation of data
      • 6 Min-max (0-1) standardization of the data
      • 7 Basic descriptive statistical indicators
      • 8 Grouping statistics
      • 9 Correlation analysis
      • 10 Draw a scatter plot
      • 11 Linear regression model

1 Import the required packages

The analysis in this article was done in Spyder, the IDE that ships with the Anaconda distribution.

import pandas
from sklearn.linear_model import LinearRegression
import matplotlib
import matplotlib.pyplot as plt

2 Import data

(1) Import from an Excel file

The screenshot below shows the data in data.xlsx, on a sheet named data1.

Run the following code:

# Use pandas' read_excel function (reading .xlsx files requires the openpyxl package).
# Two things to fill in: first, the file path (including the file name);
# second, which sheet of the workbook to import.
data = pandas.read_excel(
    'D:/7_science_and_technology/Data analysis/data.xlsx',
    sheet_name='data1'
)

The result is as follows:

3 Create data manually

# Create a DataFrame manually with pandas.DataFrame:
# each key is a column name, each value is the list of that column's values
data_2 = pandas.DataFrame({
    'catalog': ['A', 'B', 'C', 'D', 'E'],
    'percent': [0.1, 0.15, 0.4, 0.6, 0.9]
})

The result is as follows:

Use the plot.bar method to draw a bar chart:

data_2.plot.bar(x = 'catalog', y='percent')

The result is as follows:

4 Data sorting

# ascending: True means ascending order, False means descending order.
# Sort by math score (ascending); ties are broken by Chinese score (descending).
sortData = data.sort_values(
    by=['Math scores', 'Chinese scores'],
    ascending=[True, False]
)
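To see how the two sort keys interact, here is a small sketch on a hypothetical toy table (the names and scores are made up for illustration, not taken from data.xlsx). Rows are ordered by math score ascending, and rows that tie on math score are then ordered by Chinese score descending.

```python
import pandas

# Hypothetical toy data, for illustration only
toy = pandas.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid', 'Dee'],
    'Math scores': [80, 70, 80, 90],
    'Chinese scores': [60, 75, 95, 85],
})

# Primary key: math ascending; tie-breaker: Chinese descending
sorted_toy = toy.sort_values(
    by=['Math scores', 'Chinese scores'],
    ascending=[True, False],
)
print(sorted_toy['Name'].tolist())  # → ['Bob', 'Cid', 'Ann', 'Dee']
```

Ann and Cid both have 80 in math, so the second key decides: Cid (95) comes before Ann (60).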

The result is as follows:

5 Simple calculation of data

# Simple calculation: add the two score columns element-wise
data['Total score'] = data['Math scores'] + data['Chinese scores']
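Column arithmetic with + is element-wise, and a missing value in either operand makes that row's result missing. If missing scores should instead be skipped, DataFrame.sum(axis=1) ignores NaN by default. A small sketch on hypothetical toy data showing the difference:

```python
import numpy as np
import pandas

# Hypothetical toy data; the third student has no math score
toy = pandas.DataFrame({
    'Math scores': [80, 90, np.nan],
    'Chinese scores': [70, 85, 60],
})

# Element-wise "+" propagates NaN: the third total is NaN
toy['Total (+)'] = toy['Math scores'] + toy['Chinese scores']

# sum(axis=1) skips NaN by default (skipna=True): the third total is 60
toy['Total (sum)'] = toy[['Math scores', 'Chinese scores']].sum(axis=1)
print(toy)
```

Which behavior is right depends on whether a missing score should count as zero or make the total undefined.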

The result is as follows:

6 Min-max (0-1) standardization of the data

# 0-1 (min-max) standardization of the Chinese scores.
# round(x) with no digits would round every value to 0 or 1, so keep 2 decimals.
chinese = data['Chinese scores']
data['Chinese scores standardized'] = (
    (chinese - chinese.min()) / (chinese.max() - chinese.min())
).round(2)
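The formula applied is x' = (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1. A minimal sketch that applies it to every numeric column at once; the helper name min_max_scale and the toy data are hypothetical. It also guards against a constant column, where max - min is zero and the division would otherwise produce NaN.

```python
import pandas

def min_max_scale(df):
    """Hypothetical helper: 0-1 (min-max) scale every numeric column of df."""
    numeric = df.select_dtypes(include='number')
    span = numeric.max() - numeric.min()
    # A zero range would divide by zero; use 1 so constant columns become 0
    span = span.replace(0, 1)
    # DataFrame minus Series aligns the Series index with the columns
    return (numeric - numeric.min()) / span

# Hypothetical toy data, for illustration only
toy = pandas.DataFrame({
    'Math scores': [60, 75, 90],
    'Chinese scores': [50, 100, 75],
    'Class': [1, 1, 1],  # constant column
})
scaled = min_max_scale(toy)
print(scaled)
```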

The result is as follows:

7 Basic descriptive statistical indicators

# Basic descriptive statistics (count, mean, std, min, quartiles, max)
print(data['Total score'].describe())
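Each row of the describe() output is an ordinary Series statistic. The sketch below, on a made-up Series of total scores, prints the summary next to the individual method calls so the correspondence is visible. By pandas' defaults, std is the sample standard deviation (divided by n - 1) and the quartiles use linear interpolation.

```python
import pandas

# Made-up total scores, for illustration only
total = pandas.Series([150, 160, 170, 180, 190], name='Total score')

summary = total.describe()
print(summary)

# The same numbers from individual Series methods
print(total.count(), total.mean(), total.std())    # std uses ddof=1 (sample std)
print(total.quantile([0.25, 0.5, 0.75]).tolist())  # linear interpolation by default
print(total.min(), total.max())
```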

The result is as follows:

8 Grouping statistics

# Group statistics by gender: count the Chinese scores in each group
ga = data.groupby(by='Gender')['Chinese scores'].agg('count')
print(ga)
print(ga.sum())       # total number of records
print(ga / ga.sum())  # proportion of each gender
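The same proportions can be obtained in a single call with value_counts(normalize=True). One caveat: groupby(...).agg('count') counts the non-missing values of the selected score column, while value_counts counts rows per gender, so the two agree only when no score is missing. A sketch on hypothetical toy data:

```python
import pandas

# Hypothetical toy data, for illustration only
toy = pandas.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Chinese scores': [80, 90, 85, 70, 95],
})

# Approach used in the article: count per group, then divide by the total
counts = toy.groupby('Gender')['Chinese scores'].agg('count')
shares = counts / counts.sum()
print(shares)

# One-call alternative; sort_index() puts the groups in the same order
alt = toy['Gender'].value_counts(normalize=True).sort_index()
print(alt)
```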

The result is as follows:

9 Correlation analysis

# Correlation analysis between math and Chinese scores (Pearson by default)
corrMatrix = data[[
    'Math scores', 'Chinese scores'
]].corr()
print(corrMatrix)
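corr() computes the Pearson correlation coefficient by default: the sum of the products of both variables' deviations from their means, divided by the square root of the product of their sums of squared deviations. The sketch below computes it by hand with NumPy on hypothetical toy scores and compares the result with pandas.

```python
import numpy as np
import pandas

# Hypothetical paired scores, for illustration only
toy = pandas.DataFrame({
    'Math scores': [60, 70, 80, 90, 100],
    'Chinese scores': [65, 70, 72, 85, 88],
})

x = toy['Math scores'].to_numpy(dtype=float)
y = toy['Chinese scores'].to_numpy(dtype=float)

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas: method='pearson' is the default
r_pandas = toy.corr().loc['Math scores', 'Chinese scores']
print(r_manual, r_pandas)
```

The value always lies between -1 and 1. Values close to 1 mean the two scores tend to rise together.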

The result is as follows:

10 Draw a scatter plot

# Draw a scatter plot of math scores against Chinese scores
# (alternative: data.plot.scatter(x='Math scores', y='Chinese scores'))
plt.scatter(data['Math scores'], data['Chinese scores'])
plt.xlabel('Math scores')
plt.ylabel('Chinese scores')
plt.show()

The result is as follows:

11 Linear regression model

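# Linear regression: predict the Chinese score from the math score
x = data[['Math scores']]    # features must be 2-D: (n_samples, n_features)
y = data[['Chinese scores']]
lrModel = LinearRegression()
lrModel.fit(x, y)
print(lrModel.coef_)       # slope
print(lrModel.intercept_)  # intercept
# Coefficient of determination (R^2) on the training data, not a classification accuracy
print(lrModel.score(x, y))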
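With a single predictor, the fitted line has a closed form: slope = sum(dx * dy) / sum(dx^2) and intercept = mean(y) - slope * mean(x). score() returns R^2 = 1 - (residual sum of squares) / (total sum of squares). The sketch below checks both against scikit-learn on hypothetical toy data. Here y is passed as a 1-D Series, so coef_ is a 1-D array and intercept_ is a scalar. In the article, y is a one-column DataFrame, which is why coef_ and intercept_ print as nested arrays there.

```python
import numpy as np
import pandas
from sklearn.linear_model import LinearRegression

# Hypothetical paired scores, for illustration only
toy = pandas.DataFrame({
    'Math scores': [60, 70, 80, 90, 100],
    'Chinese scores': [65, 70, 72, 85, 88],
})
x = toy[['Math scores']]   # 2-D feature matrix
y = toy['Chinese scores']  # 1-D target

model = LinearRegression().fit(x, y)

# Closed-form simple linear regression (ordinary least squares)
xv = toy['Math scores'].to_numpy(dtype=float)
yv = y.to_numpy(dtype=float)
dx, dy = xv - xv.mean(), yv - yv.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = yv.mean() - slope * xv.mean()

# R^2 = 1 - SS_res / SS_tot, the quantity returned by model.score()
pred = intercept + slope * xv
r2 = 1 - ((yv - pred) ** 2).sum() / (dy ** 2).sum()

print(model.coef_[0], model.intercept_, model.score(x, y))
print(slope, intercept, r2)
```

For simple linear regression, this R^2 equals the square of the Pearson correlation between the two columns.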

