
Python Machine Learning: 8 Projects for Beginners

Editor: Python


No amount of theory can replace hands-on practice.

Textbooks and courses can lull you into thinking you have mastered the material, because it is right there in front of you. But when you try to apply it, you may find it harder than it looks. Projects help you improve your applied ML skills quickly, and they give you the chance to explore interesting topics.

In addition, you can add projects to your portfolio, which makes it easier to land a job, find cool career opportunities, and even negotiate a higher salary.

In this article, we introduce 8 fun machine learning projects for beginners. You can complete any of them in a single weekend or, if you enjoy them, expand them into longer projects.

1. Machine Learning Gladiator

We affectionately call this project the "machine learning gladiator," but the idea is not new. It is one of the fastest ways to build practical intuition around machine learning.

The goal is to take out-of-the-box models and apply them to different datasets. This project is great for 3 main reasons:

First, you will build intuition for how well each model fits a given problem. Which models are robust to missing data? Which models handle categorical features well? Yes, you can dig through textbooks to find the answers, but you will learn better by seeing them in action.

Second, this project teaches you the valuable skill of rapid prototyping. In the real world, it is often hard to know which model will perform best without simply trying them.

Finally, this exercise helps you master the model-building workflow. For example, you will get to practice:

  • Importing data

  • Cleaning data

  • Splitting it into training/test or cross-validation sets

  • Preprocessing

  • Transformations

  • Feature engineering

Because you will use out-of-the-box models, you can focus on honing these critical steps.

See the sklearn (Python) or caret (R) documentation pages for instructions. You should practice regression, classification, and clustering algorithms.
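The workflow above can be sketched in a few lines of scikit-learn. This is only a minimal illustration, not a recommended setup: it assumes the small wine dataset bundled with scikit-learn, and the three models and their near-default settings are arbitrary choices made for the example.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Import data: a small dataset bundled with scikit-learn (no download needed).
X, y = load_wine(return_X_y=True)

# Split into training and test sets; the test set is only touched at the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Out-of-the-box models (assumed choices), each behind the same scaler.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
}

cv_scores = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    # 5-fold cross-validation on the training set only.
    scores = cross_val_score(pipeline, X_train, y_train, cv=5)
    cv_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")

# Refit the best model by CV score and check it once on the held-out test set.
best_name = max(cv_scores, key=cv_scores.get)
best_pipeline = make_pipeline(StandardScaler(), models[best_name])
best_pipeline.fit(X_train, y_train)
test_accuracy = best_pipeline.score(X_test, y_test)
print(f"best: {best_name}, test accuracy = {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the validation fold leaks into preprocessing. Swapping in another dataset or model only changes the first few lines.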

Tutorials

• Python: sklearn – the official tutorial for the sklearn package

• Predicting wine quality with Scikit-Learn – a step-by-step tutorial for training a machine learning model

• R: caret – a webinar given by the author of the caret package

Data sources

• UCI Machine Learning Repository – 350+ searchable datasets covering almost every subject. You are sure to find datasets that interest you.

• Kaggle Datasets – 100+ datasets uploaded by the Kaggle community. There are some really fun datasets here, including PokemonGo spawn locations and burritos in San Diego.

• data.gov – open datasets released by the U.S. government. A good place to look if you are interested in the social sciences.

2. Play Moneyball

In the book Moneyball, the Oakland A's revolutionized baseball through analytical player scouting. They built a competitive squad while spending only about 1/3 of what large-market teams such as the Yankees paid in salaries.

First, if you have not read the book yet, you should. It is one of our favorites!

Fortunately, the sports world has a ton of data to play with. Data for teams, games, scores, and players is tracked and freely available online.

There are plenty of fun machine learning projects here for beginners. For example, you could try:

• Sports betting… Predict box scores from the data available right before each new game.

• Talent scouting… Use college statistics to predict which players will have the best professional careers.

• General managing… Create clusters of players based on their strengths in order to build a well-rounded team.
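As a rough sketch of the player-clustering idea in the last bullet, the snippet below runs a from-scratch k-means on synthetic data. Everything about the data is an assumption made for illustration: the two features ("power" and "speed" scores), the three archetypes, and their values are invented, and real scouting data would need cleaning and scaling first. The deterministic farthest-first initialization is a simplification chosen for this example, not part of standard k-means.

```python
import numpy as np

def farthest_first_init(points, k):
    """Deterministic start: first row, then repeatedly the row farthest from all chosen rows."""
    chosen = [0]
    for _ in range(k - 1):
        dists = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(dists.min(axis=1).argmax()))
    return points[chosen].copy()

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def k_means(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical, synthetic scouting scores (power, speed) on a 0-10 scale.
rng = np.random.default_rng(1)
archetypes = {"slugger": (9, 1), "speedster": (1, 9), "all_rounder": (9, 9)}
players = np.vstack([
    rng.normal(center, 0.5, size=(20, 2)) for center in archetypes.values()
])

centroids, labels = k_means(players, k=3)
for j, center in enumerate(centroids):
    print(f"cluster {j}: {np.sum(labels == j)} players, "
          f"mean power={center[0]:.1f}, mean speed={center[1]:.1f}")
```

A well-rounded roster could then be assembled by drawing players from each cluster rather than stacking a single type.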

Sports is also an excellent domain for practicing data visualization and exploratory analysis. You can use these skills to help decide which types of data to include in your analyses.

Data sources

• Sports Statistics Database – sports statistics and historical data covering many professional sports and several college ones. Its clean interface makes web scraping easier.

• Sports Reference – another database of sports statistics. The interface is more cluttered, but individual tables can be exported as CSV files.

• cricsheet.org – ball-by-ball data for international and IPL cricket matches. CSV files are available for IPL and T20 international matches.

3. Predict Stock Prices

For any data scientist with an interest in finance, the stock market is like a candy store.

First, you have many types of data to choose from: prices, fundamentals, global macroeconomic indicators, volatility indices, and many more.

Second, the data can be very granular. You can easily get time series data by day (or even by minute) for each company, which lets you think creatively about trading strategies.

Finally, financial markets generally have short feedback cycles, so you can quickly validate your predictions on new data.

Some examples of beginner-friendly machine learning projects you could try include:

• Quantitative value investing… Predict 6-month price movements based on fundamental indicators from companies' quarterly reports.

• Forecasting… Build time series models, or even recurrent neural networks, on the difference between implied and actual volatility.

• Statistical arbitrage… Find similar stocks based on their price movements and other factors, and look for periods when their prices diverge.

An obvious disclaimer: building trading models to practice machine learning is simple. Making them profitable is extremely difficult. Nothing here is financial advice, and we do not recommend trading real money.
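To make the statistical arbitrage idea concrete, here is a small sketch on synthetic prices. It is a learning exercise under stated assumptions, not a trading strategy: the tickers AAA, BBB, and CCC are hypothetical, the prices are simulated random walks, and the 60-day window and threshold of 2 are arbitrary. It finds the most similar pair by correlation of daily price changes, then flags days when the pair's price spread sits far from its recent average.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(42)
n_days = 500

# Synthetic prices: AAA and BBB share a common random-walk factor, CCC does not.
shared = np.cumsum(rng.normal(0, 1, n_days))
prices = {
    "AAA": 100 + shared + rng.normal(0, 0.5, n_days),
    "BBB": 100 + shared + rng.normal(0, 0.5, n_days),
    "CCC": 100 + np.cumsum(rng.normal(0, 1, n_days)),
}
tickers = list(prices)

# Step 1: find the most similar pair by correlation of daily price changes.
changes = np.diff(np.vstack([prices[t] for t in tickers]), axis=1)
corr = np.corrcoef(changes)
pair = max(combinations(range(len(tickers)), 2), key=lambda p: corr[p])

def rolling_zscore(series, window):
    """Z-score of each value against the preceding `window` values only (no lookahead)."""
    z = np.full(len(series), np.nan)
    for t in range(window, len(series)):
        past = series[t - window:t]
        z[t] = (series[t] - past.mean()) / past.std()
    return z

# Step 2: flag days when the pair's spread is unusually far from its recent mean.
i, j = pair
spread = prices[tickers[i]] - prices[tickers[j]]
z = rolling_zscore(spread, window=60)
diverged_days = np.flatnonzero(np.abs(np.nan_to_num(z)) > 2.0)

print(f"most similar pair: {tickers[i]} / {tickers[j]} (corr = {corr[i, j]:.2f})")
print(f"days with |z| > 2: {len(diverged_days)} of {n_days}")
```

Computing the z-score from past values only matters here: using the full-sample mean would leak future information into each day's signal.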

Tutorials

• Python: sklearn for Investing – a YouTube video series on applying machine learning to investing.

• R: Quantitative Trading with R – detailed class notes for quantitative finance using R.

Data sources

• Quandl – a data marketplace that provides free (and premium) financial and economic data. For example, you can bulk download end-of-day stock prices for over 3000 U.S. companies, or economic data from the Federal Reserve.

• Quantopian – a quantitative finance community that offers a free platform for developing trading algorithms. Includes datasets.

• US Fundamentals Archive – 5 years of fundamentals data for 5000+ U.S. companies.

4. Teach a Neural Network to Read Handwriting

Neural networks and deep learning are two success stories of modern artificial intelligence. They have led to major advances in image recognition, automatic text generation, and even self-driving cars.

To get involved in this exciting field, you should start with a manageable dataset.

The MNIST handwritten digit classification challenge is the classic entry point. Image data is generally harder to work with than "flat" relational data. The MNIST data is beginner-friendly and small enough to fit on one computer.

Handwriting recognition will challenge you, but it does not require high computing power.

To start, we recommend the first chapter of the tutorial below. It teaches you how to build a neural network from scratch that solves the MNIST challenge with high accuracy.
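For a feel of what "from scratch" involves before opening the tutorial, here is a minimal one-hidden-layer network in NumPy. It is not the tutorial's code and it does not use MNIST: as a stand-in, it assumes three hand-drawn 5x5 digit templates with added noise, and the layer sizes, learning rate, and epoch count are arbitrary choices for this sketch. The same forward pass and backpropagation steps would apply to 28x28 MNIST images with larger arrays.

```python
import numpy as np

# Hypothetical 5x5 "digit" templates standing in for MNIST's 28x28 images.
TEMPLATES = {
    0: ["01110", "10001", "10001", "10001", "01110"],
    1: ["00100", "01100", "00100", "00100", "01110"],
    7: ["11111", "00001", "00010", "00100", "00100"],
}
DIGITS = sorted(TEMPLATES)  # class index -> digit label

def make_dataset(n_per_class, rng, noise=0.3):
    """Flattened noisy copies of each template, with integer class labels."""
    images, labels = [], []
    for class_idx, digit in enumerate(DIGITS):
        clean = np.array([[int(c) for c in row] for row in TEMPLATES[digit]], dtype=float)
        images.append(clean.ravel() + rng.normal(0, noise, size=(n_per_class, 25)))
        labels.append(np.full(n_per_class, class_idx))
    return np.vstack(images), np.concatenate(labels)

def forward(X, W1, b1, W2, b2):
    """tanh hidden layer followed by a softmax output layer."""
    hidden = np.tanh(X @ W1 + b1)
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return hidden, probs

rng = np.random.default_rng(0)
X_train, y_train = make_dataset(100, rng)
X_test, y_test = make_dataset(50, rng)

n_in, n_hidden, n_out = 25, 16, len(DIGITS)
W1, b1 = rng.normal(0, 0.2, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.2, (n_hidden, n_out)), np.zeros(n_out)
one_hot = np.eye(n_out)[y_train]

learning_rate, losses = 0.1, []
for epoch in range(1000):
    hidden, probs = forward(X_train, W1, b1, W2, b2)
    losses.append(-np.mean(np.log(probs[np.arange(len(y_train)), y_train])))
    # Backpropagation for softmax + cross-entropy, averaged over the batch.
    d_logits = (probs - one_hot) / len(y_train)
    d_hidden = (d_logits @ W2.T) * (1 - hidden ** 2)  # tanh derivative
    W2 -= learning_rate * hidden.T @ d_logits
    b2 -= learning_rate * d_logits.sum(axis=0)
    W1 -= learning_rate * X_train.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0)

_, test_probs = forward(X_test, W1, b1, W2, b2)
test_accuracy = np.mean(test_probs.argmax(axis=1) == y_test)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, test accuracy: {test_accuracy:.3f}")
```

This toy problem is far easier than real handwriting, so the accuracy it reaches says nothing about MNIST performance; the point is only to see every step of training written out.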

Tutorials

• Neural Networks and Deep Learning (online book) – Chapter 1 walks through how to write a neural network from scratch in Python to classify digits from MNIST. The author also gives a very good explanation of the intuition behind neural networks.

Data sources

• MNIST – MNIST is a modified subset of two datasets collected by the U.S. National Institute of Standards and Technology. It contains 70,000 labeled images of handwritten digits.

5. Investigate Enron

The Enron scandal and collapse was one of the largest corporate meltdowns in history.

In 2000, Enron was one of the largest energy companies in the United States. Then, after being exposed for fraud, it spiraled into bankruptcy within a year.

Fortunately for us, we have the Enron email database. It contains about 500,000 emails between 150 former Enron employees, mostly senior executives. It is also the only large public database of real emails, which makes it all the more valuable.

In fact, data scientists have been using this dataset for education and research for years.

Examples of beginner machine learning projects you could try include:

• Anomaly detection… Map the distribution of emails sent and received by hour, and try to detect abnormal behavior leading up to the public scandal.

• Social network analysis… Build network graph models between employees to find key influencers.

• Natural language processing… Analyze the message bodies together with the email metadata to classify emails by purpose.
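A first pass at the first two ideas needs only Python's standard library. The sketch below assumes each email is available as raw header-plus-body text; the four sample messages and their addresses are invented for illustration and are not taken from the Enron archive. It counts messages by hour of day and tallies sender-to-recipient edges, which are the raw ingredients for both an hourly activity baseline and a communication graph.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample messages in raw header + body form (not real Enron data).
RAW_MESSAGES = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\n"
    "Date: Mon, 14 May 2001 09:15:00 -0700\nSubject: Q2 forecast\n\nPlease review.",
    "From: bob@example.com\nTo: alice@example.com\n"
    "Date: Mon, 14 May 2001 09:40:00 -0700\nSubject: Re: Q2 forecast\n\nLooks fine.",
    "From: carol@example.com\nTo: alice@example.com\n"
    "Date: Mon, 14 May 2001 14:05:00 -0700\nSubject: Meeting\n\nCan we move it?",
    "From: dave@example.com\nTo: alice@example.com\n"
    "Date: Mon, 14 May 2001 23:30:00 -0700\nSubject: Urgent\n\nCall me.",
]

def summarize(raw_messages):
    """Return (messages per hour of day, message count per sender -> recipient edge)."""
    hourly = Counter()
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sent_at = parsedate_to_datetime(msg["Date"])
        hourly[sent_at.hour] += 1  # hour in the sender's stated time zone
        sender = getaddresses([msg.get("From", "")])[0][1]
        for _, recipient in getaddresses(msg.get_all("To", [])):
            edges[(sender, recipient)] += 1
    return hourly, edges

hourly, edges = summarize(RAW_MESSAGES)

# Crude "influence" proxy: how many messages each address receives.
received = Counter()
for (_, recipient), count in edges.items():
    received[recipient] += count

print("messages by hour:", sorted(hourly.items()))
print("most-contacted address:", received.most_common(1))
```

On the real archive, the hourly counts could be compared week over week to spot unusual activity, and the edge counts could be loaded into a graph library for centrality measures.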

Data sources

• Enron Email Dataset – the Enron email archive hosted by CMU.

• Description of Enron Data (PDF) – an exploratory analysis of the Enron email data that can help you get your bearings.

6. Write ML Algorithms from Scratch

Writing machine learning algorithms from scratch is an excellent learning tool for two main reasons.

First, there is no better way to build a true understanding of their mechanics. You will be forced to think through every step, and that leads to real mastery.

Second, you will learn how to translate mathematical instructions into working code. You will need this skill when adapting algorithms from academic research.

We suggest starting with an algorithm that is not too complex. Even the simplest algorithms require many subtle decisions. Once you are comfortable building simple algorithms, try extending them for more functionality. For example, try extending a plain logistic regression algorithm into a lasso or ridge variant by adding regularization parameters.
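As one possible shape for that exercise, the sketch below implements logistic regression with batch gradient descent and an optional L2 (ridge-style) penalty. The function names, the synthetic two-blob dataset, and the hyperparameters are all assumptions made for this example. The L1 (lasso-style) variant is not shown, since its non-smooth penalty needs a different update rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, l2=0.0, learning_rate=0.1, n_iter=2000):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        error = sigmoid(X @ w + b) - y
        grad_w = X.T @ error / n_samples + l2 * w  # penalty applies to weights only
        grad_b = error.mean()                      # the intercept is not penalized
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Synthetic two-class data: two Gaussian blobs (an assumption for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w_plain, b_plain = fit_logistic(X, y, l2=0.0)
w_ridge, b_ridge = fit_logistic(X, y, l2=1.0)

accuracy_plain = np.mean(predict(X, w_plain, b_plain) == y)
print(f"plain: ||w|| = {np.linalg.norm(w_plain):.2f}, accuracy = {accuracy_plain:.3f}")
print(f"ridge: ||w|| = {np.linalg.norm(w_ridge):.2f}")
```

The only change between the two fits is the `l2 * w` term in the gradient, which pulls the weights toward zero. Comparing the two weight norms shows the shrinkage effect directly.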

Finally, a tip every beginner should know: do not get discouraged when your algorithm is not as fast or as fancy as those in existing packages. Those packages are the fruits of years of development!

Tutorials

• Python: Logistic regression from scratch

• Python: k-nearest neighbors from scratch

• R: Logistic regression from scratch

7. Mine Social Media Sentiment

Thanks to the sheer amount of user-generated content, social media has almost become synonymous with "big data".

Mining this rich data can offer unprecedented ways to keep a pulse on opinions, trends, and public sentiment. Facebook, Twitter, YouTube, WeChat, WhatsApp, Reddit… the list goes on.

Furthermore, each generation spends more time on social media than its predecessors. This means social media data will become even more relevant to marketing, branding, and business as a whole.

While there are many popular social media platforms, Twitter is the classic entry point for practicing machine learning.

With Twitter data, you get an interesting blend of data (tweet contents) and metadata (location, hashtags, users, retweets, etc.) that opens up nearly endless paths for analysis.
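To illustrate combining tweet contents with metadata, here is a deliberately simple lexicon-based sentiment sketch. The tweets, the hashtags, and the tiny positive and negative word lists are all invented for this example, and no Twitter API calls are made. A real project would more likely train a classifier on labeled tweets, but the aggregation step (sentiment grouped by a metadata field) looks much the same.

```python
import re
from collections import defaultdict

# Tiny hypothetical sentiment lexicon; real lexicons contain thousands of words.
POSITIVE = {"love", "great", "awesome", "happy", "good"}
NEGATIVE = {"hate", "terrible", "awful", "sad", "bad"}

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

# Hypothetical tweets: content plus a metadata field (hashtags).
tweets = [
    {"text": "I love the new phone, the camera is great!", "hashtags": ["newphone"]},
    {"text": "Battery life is terrible and I hate the charger", "hashtags": ["newphone"]},
    {"text": "Awesome keynote today, so happy with the updates", "hashtags": ["keynote"]},
    {"text": "Shipping starts Monday", "hashtags": ["newphone"]},
]

# Aggregate content-based scores by a metadata field: mean sentiment per hashtag.
scores_by_hashtag = defaultdict(list)
for tweet in tweets:
    score = sentiment_score(tweet["text"])
    for tag in tweet["hashtags"]:
        scores_by_hashtag[tag].append(score)

mean_by_hashtag = {tag: sum(s) / len(s) for tag, s in scores_by_hashtag.items()}
for tag, mean in sorted(mean_by_hashtag.items()):
    print(f"#{tag}: mean sentiment = {mean:+.2f} over {len(scores_by_hashtag[tag])} tweets")
```

Word counting ignores negation and sarcasm ("not great" scores as positive), which is one reason the tutorials below move on to learned models.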

Tutorials

• Python: Mining Twitter Data – how to perform sentiment analysis on Twitter data

• R: Sentiment analysis with machine learning – a short and sweet sentiment analysis tutorial

Data sources

• Twitter API – the Twitter API is a classic source of streaming data. You can track tweets, hashtags, and more.

• StockTwits API – StockTwits is like Twitter for traders and investors. You can extend this dataset in many interesting ways by joining it to time series datasets on timestamp and ticker symbol.

8. Improve Health Care

Another industry undergoing rapid change thanks to machine learning is global health and health care.

In most countries, becoming a doctor requires many years of education. It is a demanding field with long hours, high stakes, and an even higher barrier to entry.

As a result, there has recently been a significant effort to use machine learning to lighten doctors' workloads and improve the overall efficiency of the health care system.

Use cases include:

• Preventive care… Predict disease outbreaks at both the individual and community levels.

• Diagnostic care… Automatically classify image data, such as scans and X-rays.

• Insurance… Adjust insurance premiums based on publicly available risk factors.

As hospitals continue to modernize patient records, and as we collect more granular health data, data scientists will find plenty of opportunities within easy reach.

Tutorials

• R: Building meaningful machine learning models for disease prediction

• Machine Learning for Healthcare – an excellent talk from Microsoft Research

Data sources

• Large Health Data Sets – a collection of large health-related datasets

• data.gov/health – health and health care datasets provided by the U.S. government.

• Health, Nutrition and Population Statistics – global health, nutrition, and population statistics provided by the World Bank.

