Basic use of Python pandas

About Pandas

Pandas is a data analysis toolkit built on NumPy. It provides efficient data analysis methods and can be used with large data sets.
The main Pandas data structures are Series (one-dimensional data) and DataFrame (two-dimensional data). These two structures are enough to handle most typical use cases in finance, statistics, social science, engineering, and similar fields.

A Series is an object similar to a one-dimensional array. It consists of a set of data (of any NumPy data type) and a set of associated data labels (the index).
A DataFrame is a tabular data structure similar to a two-dimensional array. It holds an ordered set of columns, and each column can have a different value type (numbers, strings, Boolean values). A DataFrame has both a row index and a column index, and it can be seen as a dictionary of Series that share one index.
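
The "dictionary of Series" view can be sketched as follows. This is a minimal illustration, not part of the original listing; the labels x, y, z and the variable names are made up for the example.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical example data).
ages = pd.Series([10, 12, 13], index=['x', 'y', 'z'])
sites = pd.Series(['Google', 'Runoob', 'Wiki'], index=['x', 'y', 'z'])

# A dict of Series becomes a DataFrame: the keys are the column names
# and the shared index becomes the row index.
df = pd.DataFrame({'Site': sites, 'Age': ages})
print(df)

# Each column is itself a Series with its own dtype.
print(type(df['Age']))
print(df.dtypes)
```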

Basic use

# Import Pandas
import pandas as pd
# Pandas data structure - Series
# A Pandas Series is like a single column in a table: a one-dimensional array that can hold any data type.
# A Series consists of an index and a column of values. The constructor is:
# pandas.Series(data, index, dtype, name, copy)
# 1. data: the values (for example a list, an ndarray, a dict, or a scalar).
# 2. index: the index labels. If not specified, they default to integers starting from 0.
# 3. dtype: the data type. By default it is inferred from the data.
# 4. name: a name for the Series.
# 5. copy: whether to copy the input data. The default is False.
# Create a Series
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s)
# Create an empty Series (an explicit dtype avoids the default-dtype warning that some pandas versions emit)
s = pd.Series(dtype=float)
print(s)
# Create a Series from a dictionary; the keys become the index
s = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(s)
# Create a Series from a list; the index defaults to 0, 1, 2
s = pd.Series([1, 2, 3])
print(s)
# Create a Series from a single string
s = pd.Series('hello')
print(s)
# Create a Series from a single number
s = pd.Series(1)
print(s)
# Create a Series from a single Boolean value
s = pd.Series(True)
print(s)
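# A short sketch of two related points, added for illustration: a scalar is
# repeated for every label when an index is supplied, and values can be read
# by label or by position. The name 'score' and the values are hypothetical.
```python
import pandas as pd

# With an index, the scalar 1 is broadcast to every label.
s = pd.Series(1, index=['a', 'b', 'c'])
print(s)

# A named Series with explicit labels.
t = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='score')
print(t['b'])      # label-based access -> 20
print(t.iloc[0])   # position-based access -> 10
```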
# Pandas data structure - DataFrame
# A DataFrame is a tabular, two-dimensional data structure with an ordered set of columns. Each column can have a different value type (numbers, strings, Boolean values). A DataFrame has both a row index and a column index, and it can be seen as a dictionary of Series that share one index.
# The DataFrame constructor is:
# pandas.DataFrame(data, index, columns, dtype, copy)
# data: the values (ndarray, Series, map, list, dict, and other types).
# index: the index values, also called row labels.
# columns: the column labels. The default is RangeIndex (0, 1, 2, …, n).
# dtype: the data type.
# copy: whether to copy the input data. The default is None, which copies dict input but not DataFrame or 2-D ndarray input.
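# Create a DataFrame from a list of rows
data = [['Google', 10], ['Runoob', 12], ['Wiki', 13]]
df = pd.DataFrame(data, columns=['Site', 'Age'])
# Convert only the numeric column to float; passing dtype=float to the constructor would also try to convert the Site strings, which can raise an error
df['Age'] = df['Age'].astype(float)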
print(df)
# Create an empty DataFrame
df = pd.DataFrame()
print(df)
# Create a DataFrame from a dictionary: one row per key/value pair (the variable is named mapping so the built-in dict is not shadowed)
mapping = {'a': 1, 'b': 2, 'c': 3}
data = pd.DataFrame(list(mapping.items()))
print(data)
# Create a DataFrame from a list: one column, one row per element
df = pd.DataFrame([1, 2, 3])
print(df)
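# A small sketch, added for illustration, of two ways to lay out the same
# dictionary. The column names 'key' and 'value' are hypothetical choices.
```python
import pandas as pd

mapping = {'a': 1, 'b': 2, 'c': 3}

# Long layout: one row per key/value pair, with explicit column names.
pairs = pd.DataFrame(list(mapping.items()), columns=['key', 'value'])
print(pairs)

# Wide layout: a single row, one column per key.
wide = pd.DataFrame([mapping])
print(wide)
```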
# Pandas CSV file
# CSV (Comma-Separated Values, sometimes called character-separated values because the separator does not have to be a comma) files store tabular data (numbers and text) as plain text.
# CSV is a common, relatively simple file format that is widely used by individuals, businesses, and scientists. Pandas makes it easy to work with CSV files.
# Read a CSV file (commented out; it needs an nba.csv file in the working directory)
# df = pd.read_csv('nba.csv')
# For a large DataFrame, print(df) shows only the first 5 and last 5 rows and omits the rest
print(df)
# to_string() returns every row
print(df.to_string())
# Three fields name, site, age
nme = ["Google", "Runoob", "Taobao", "Wiki"]
st = ["www.google.com", "www.runoob.com", "www.taobao.com", "www.wikipedia.org"]
ag = [90, 40, 80, 98]
# Build a dictionary of columns (named site_data so the built-in dict is not shadowed)
site_data = {'name': nme, 'site': st, 'age': ag}
df = pd.DataFrame(site_data)
# Save the DataFrame to a CSV file
df.to_csv('site.csv')
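# A hedged sketch of a CSV round trip that needs no file on disk. It assumes
# an in-memory io.StringIO buffer in place of a real file, and the two-row
# DataFrame is made up for the example.
```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Google', 'Runoob'], 'age': [90, 40]})

# Without a path, to_csv returns the CSV text; index=False leaves out the row index.
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv accepts any file-like object, so StringIO stands in for a file.
loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded)
```
# Writing with index=False avoids an extra unnamed index column when the file is read back.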
# Data processing
# head() returns the first 5 rows
print(df.head())
# tail() returns the last 5 rows
print(df.tail())
# info() prints a summary of the DataFrame (index, columns, non-null counts, dtypes); it prints directly and returns None
df.info()
# describe() returns descriptive statistics for the numeric columns
print(df.describe())
# shape is an attribute, not a method: a (rows, columns) tuple
print(df.shape)
# index attribute: the row index
print(df.index)
# columns attribute: the column names
print(df.columns)
# values attribute: the data as a NumPy array
print(df.values)
# dtypes attribute: the data type of each column
print(df.dtypes)
# isnull() returns a Boolean DataFrame that is True where a value is null
print(df.isnull())
# notnull() returns a Boolean DataFrame that is True where a value is not null
print(df.notnull())
# dropna() drops every row that contains at least one null value
print(df.dropna())
# dropna(how='all') drops only the rows in which every value is null
print(df.dropna(how='all'))
# dropna(thresh=2) keeps only the rows that have at least 2 non-null values
print(df.dropna(thresh=2))
# dropna(subset=['age']) only looks at the listed columns when checking for nulls
print(df.dropna(subset=['age']))
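# The df above has no null values, so those calls return it unchanged. The
# sketch below uses a made-up DataFrame with gaps to show how the options
# differ; the data is hypothetical.
```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'name': ['Google', 'Runoob', None, None],
    'site': ['www.google.com', 'www.runoob.com', None, None],
    'age': [90, nan, 80, nan],
})

any_null = df.dropna()                 # keeps row 0 only
all_null = df.dropna(how='all')        # drops row 3 only
two_or_more = df.dropna(thresh=2)      # keeps rows 0 and 1 (>= 2 non-null values)
age_known = df.dropna(subset=['age'])  # keeps rows 0 and 2

for result in (any_null, all_null, two_or_more, age_known):
    print(result.index.tolist())
```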
# fillna(value) replaces every null value with the given value
print(df.fillna(value=0))
# Forward fill: replace each null with the value before it. Older code writes this as fillna(method='ffill') or fillna(method='pad'); recent pandas releases deprecate the method argument in favor of ffill()
print(df.ffill())
# Backward fill: replace each null with the value after it (formerly fillna(method='bfill') or fillna(method='backfill'))
print(df.bfill())
# limit caps how many consecutive nulls are filled in each gap
print(df.ffill(limit=2))
print(df.bfill(limit=2))
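# A minimal sketch of the fill strategies on data that actually has gaps.
# It uses a made-up Series rather than the df above, and the ffill()/bfill()
# methods, which are the equivalents of fillna(method='ffill'/'bfill').
```python
import pandas as pd

nan = float('nan')
s = pd.Series([1.0, nan, nan, nan, 5.0])

print(s.fillna(0).tolist())       # [1.0, 0.0, 0.0, 0.0, 5.0]
print(s.ffill().tolist())         # [1.0, 1.0, 1.0, 1.0, 5.0]
print(s.bfill().tolist())         # [1.0, 5.0, 5.0, 5.0, 5.0]

# With limit=2 only the first two nulls of the gap are filled; the third stays null.
limited = s.ffill(limit=2)
print(limited.tolist())
```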
# Pandas JSON
# JSON (JavaScript Object Notation) is a text syntax for storing and exchanging data, similar in purpose to XML.
# JSON is usually more compact than XML and easier to parse. Pandas makes it easy to work with JSON data.
# Read a JSON file (commented out; it needs a sites.json file in the working directory)
# df = pd.read_json('sites.json')
# print(df.to_string())
data = [
    {"id": "A001", "name": "Novice tutorial", "url": "www.runoob.com", "likes": 61},
    {"id": "A002", "name": "Google", "url": "www.google.com", "likes": 124},
    {"id": "A003", "name": "TaoBao", "url": "www.taobao.com", "likes": 45},
]
df = pd.DataFrame(data)
print(df)
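# A hedged sketch of reading and writing JSON text without a file on disk.
# It assumes an in-memory io.StringIO buffer, and the two records are a
# shortened, made-up version of the data above.
```python
import io
import pandas as pd

json_text = '[{"id": "A001", "likes": 61}, {"id": "A002", "likes": 124}]'

# read_json accepts a file-like object; each JSON object becomes one row.
df = pd.read_json(io.StringIO(json_text))
print(df)

# orient='records' writes the rows back as a list of JSON objects.
records_json = df.to_json(orient='records')
print(records_json)
```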
# Pandas Data cleaning
# Data cleaning is the process of handling data that is unusable as it stands. Many data sets contain missing data, badly formatted data, wrong data, or duplicate data. To make the analysis more accurate, these records have to be dealt with first.
# Cleaning null values: to delete rows that contain empty fields, use the dropna() method. Its signature is:
# DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
# Cleaning malformed data
# Cells in the wrong format make data analysis difficult or even impossible. We can remove the rows that contain empty cells, or convert all cells in the column to the same format.
# df['Date'] = pd.to_datetime(df['Date'])
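# A minimal sketch of that conversion on made-up data. It assumes the dates
# are written as year/month/day and that unparseable cells should become
# null (NaT) rather than raise an error; both choices are illustrative.
```python
import pandas as pd

df = pd.DataFrame({'Date': ['2020/12/01', '2020/12/02', 'unknown']})

# errors='coerce' turns cells that do not match the format into NaT.
df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d', errors='coerce')
print(df)

# Rows that failed to parse can then be dropped like any other null.
clean = df.dropna(subset=['Date'])
print(clean)
```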
# Cleaning erroneous data
# Erroneous values are also common. We can replace them or remove the rows that contain them.
# df.loc[2, 'age'] = 30
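# Besides fixing one cell by position, a condition can select every bad row.
# The sketch below assumes a hypothetical rule that ages above 120 are entry
# errors; the data and the threshold are made up for the example.
```python
import pandas as pd

df = pd.DataFrame({'name': ['Google', 'Runoob', 'Taobao'],
                   'age': [90, 40, 12345]})

# Option 1: replace the bad values (on a copy, so df stays untouched).
capped = df.copy()
capped.loc[capped['age'] > 120, 'age'] = 120
print(capped)

# Option 2: keep only the rows that pass the check.
removed = df[df['age'] <= 120]
print(removed)
```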
# Cleaning duplicate data
# To clean up duplicate rows, use the duplicated() and drop_duplicates() methods. duplicated() returns True for each row that repeats an earlier row and False otherwise.
# print(df.duplicated())
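# A short sketch of both methods on a made-up DataFrame in which the last
# row repeats the one before it.
```python
import pandas as pd

df = pd.DataFrame({'name': ['Google', 'Runoob', 'Runoob'],
                   'age': [90, 40, 40]})

# Only the second occurrence is flagged as a duplicate.
print(df.duplicated().tolist())   # [False, False, True]

# drop_duplicates() keeps the first occurrence of each row.
deduped = df.drop_duplicates()
print(deduped)
```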