
Python Chapter 07: Data Cleaning and Preparation

Editor: Python

In data analysis and modeling, a significant amount of time is spent on data preparation: loading, cleaning, transforming, and reshaping. Such tasks are often reported to take up 80% or more of an analyst's time. Sometimes the way data is stored in files or databases is not in the right format for a particular task. Many researchers choose to do ad hoc processing of data from one form to another using a general-purpose programming language (such as Python, Perl, R, or Java) or UNIX text-processing tools (such as sed or awk). Fortunately, pandas, together with the built-in Python standard library, provides a high-level, flexible, and fast set of tools that lets you manipulate data into the form you need.

If you find a type of data manipulation that is not covered in this book or elsewhere in the pandas library, feel free to raise it on the mailing list or on the GitHub site. Indeed, much of the design and implementation of pandas has been driven by the needs of real-world applications.

In this chapter, I discuss tools for handling missing data, duplicate data, string manipulation, and some other analytical data transformations. In the next chapter, I focus on combining and reshaping datasets in various ways.

7.1 Handling Missing Data

Missing data occurs commonly in many data analysis applications. One of the goals of pandas is to make working with missing data as painless as possible. For example, all of the descriptive statistics on pandas objects exclude missing data by default.
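As a minimal sketch of that default behavior, the snippet below computes a few descriptive statistics on a small Series containing one missing value. The Series `values` and its contents are made up for illustration; `skipna` is the pandas parameter that controls whether missing values are excluded.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: three observations, one of them missing.
values = pd.Series([1.0, np.nan, 3.0])

# By default, descriptive statistics skip NaN entries.
mean_default = values.mean()             # averages 1.0 and 3.0 only
count_present = values.count()           # counts non-missing entries

# Passing skipna=False makes the missing value propagate instead.
mean_strict = values.mean(skipna=False)  # result is NaN

print(mean_default, count_present, mean_strict)
```

Setting `skipna=False` is useful when a missing value should invalidate the whole statistic rather than be silently ignored.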

The way missing data is represented in pandas objects is somewhat imperfect, but it is functional for most users. For numeric data, pandas uses the floating-point value NaN (Not a Number) to represent missing data. We call this a sentinel value, and it can be easily detected:

In [10]: string_data 
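A minimal sketch of sentinel detection follows. The contents given to `string_data` here are assumed for illustration and are not taken from the listing above; `isnull` is the pandas method that returns a boolean mask marking missing entries.

```python
import numpy as np
import pandas as pd

# Assumed example values; the third entry is the NaN sentinel.
string_data = pd.Series(["aardvark", "artichoke", np.nan, "avocado"])

# isnull() returns True wherever a value is missing.
missing_mask = string_data.isnull()
print(missing_mask)

# The mask can be summed to count the missing entries.
n_missing = int(missing_mask.sum())
print(n_missing)
```

The resulting mask can also be used for boolean indexing, for example `string_data[~missing_mask]` to keep only the present values.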
