
Python Machine Learning: 8 Projects for Beginners




No amount of theory can replace hands-on practice.


Textbooks and courses can make you think you have mastered a topic because the material is right in front of you. When you try to apply it, though, you may find it harder than it looks. Projects help you improve your applied ML skills quickly, and they give you the chance to explore interesting topics.


You can also add projects to your portfolio, which makes it easier to find a job, discover interesting career opportunities, and even negotiate a higher salary.


This article introduces 8 fun machine learning projects for beginners. You can finish any of them in a single weekend or, if you enjoy them, extend them into longer projects.


1、 Machine Learning Gladiator


We affectionately call this project the "machine learning gladiator", but the idea is not new. It is one of the fastest ways to build practical intuition about machine learning.


The goal is to take out-of-the-box models and apply them to different datasets. This is a great project for 3 main reasons:


First, you will build intuition for how well a model fits a problem. Which models are robust to missing data? Which models handle categorical features well? You can dig through textbooks for the answers, but you will learn better by trying it yourself.


Second, this project teaches you the valuable skill of rapid prototyping. In the real world, it is often hard to know which model will perform best without simply trying them.


Finally, this exercise helps you master the model-building workflow. For example, you will get to practice…


• Importing data

• Cleaning data

• Splitting it into train/test or cross-validation sets

• Pre-processing

• Transformations

• Feature engineering


Because you will use out-of-the-box models, you can focus on honing these critical steps.


See the sklearn (Python) or caret (R) documentation pages for instructions. You should practice regression, classification, and clustering algorithms.
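The workflow steps listed above can be sketched in a few lines of scikit-learn. This is only an illustration under assumptions that are not from the article: it uses the small wine dataset bundled with scikit-learn, three arbitrarily chosen classifiers with default settings, and feature scaling as the only preprocessing step.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Import data: a small bundled dataset, chosen only for illustration.
X, y = load_wine(return_X_y=True)

# Split into train and test sets, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Out-of-the-box models; scaling is added where the model is distance- or
# gradient-based, and it lives inside the pipeline so it is fit on training folds only.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training set, then a final check on held-out data.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    model.fit(X_train, y_train)
    results[name] = {
        "cv_accuracy": cv_scores.mean(),
        "test_accuracy": model.score(X_test, y_test),
    }
    print(f"{name}: cv={results[name]['cv_accuracy']:.3f} "
          f"test={results[name]['test_accuracy']:.3f}")
```

Swapping in another dataset or estimator only changes the loading line or the `models` dictionary, which is what makes this setup convenient for quick comparisons.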


Tutorials


• Python: sklearn – The official tutorial for the sklearn package

• Predicting wine quality with Scikit-Learn – A step-by-step tutorial for training a machine learning model

• R: caret – A webinar given by the author of the caret package


Data Sources


• UCI Machine Learning Repository – 350+ searchable datasets spanning almost every subject. You are sure to find datasets that interest you.

• Kaggle Datasets – 100+ datasets uploaded by the Kaggle community. There are some really fun datasets here, including PokemonGo spawn locations and burritos in San Diego.

• data.gov – Open datasets released by the U.S. government. A good place to look if you are interested in social sciences.


2、 Play Money Ball


In the book Moneyball, the Oakland A's revolutionized baseball through analytical player scouting. They built a competitive squad while spending only 1/3 of what large-market teams such as the Yankees paid in salaries.


First, if you haven't read the book yet, you should. It's one of our favorites!


Fortunately, the sports world has a great deal of data to work with. Data on teams, games, scores, and players is tracked and freely available online.


There are plenty of fun machine learning projects for beginners here. For example, you could try…


• Sports betting… Predict box scores from the data available right before each new game.

• Talent scouting… Use college statistics to predict which players will have the best professional careers.

• General managing… Create clusters of players based on their strengths in order to build a well-rounded team.


Sports is also an excellent domain for practicing data visualization and exploratory analysis. You can use these skills to help you decide which types of data to include in your analyses.


Data Sources


• Sports Statistics Database – Sports statistics and historical data covering many professional sports and several college ones. The clean interface makes web scraping easier.

• Sports Reference – Another database of sports statistics. The interface is more cluttered, but individual tables can be exported as CSV files.

• cricsheet.org – Ball-by-ball data for international and IPL cricket matches. CSV files are available for IPL and T20 international matches.


3、 Predict Stock Prices


The stock market is like candy-land for any data scientist who is interested in finance.


First, you have many types of data to choose from: prices, fundamentals, global macroeconomic indicators, volatility indices, and much more.


Second, the data can be very granular. You can easily get time series data by day (or even by minute) for each company, which lets you think creatively about trading strategies.


Finally, financial markets generally have short feedback cycles, so you can quickly validate your predictions on new data.


Some examples of beginner-friendly machine learning projects you could try include…


• Quantitative value investing… Predict 6-month price movements based on fundamental indicators from companies' quarterly reports.

• Forecasting… Build time series models, or even recurrent neural networks, on the delta between implied and actual volatility.

• Statistical arbitrage… Find similar stocks based on their price movements and other factors, and look for periods when their prices diverge.


Obligatory disclaimer: building trading models to practice machine learning is simple. Making them profitable is extremely difficult. Nothing here is financial advice, and we do not recommend trading real money.
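As one possible starting point for the statistical arbitrage idea, the sketch below measures how similar two price series are over a formation window and then flags later days on which their log-price spread moves unusually far from its formation-window average. Everything here is an assumption made for illustration: the prices are synthetic, the 10% drop is injected by hand, and the window length and z-score threshold of 3 are arbitrary choices rather than recommended settings.

```python
import math
import random
from statistics import mean, stdev


def daily_returns(prices):
    """Simple day-over-day returns for a list of prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spread_zscores(prices_a, prices_b, formation_days):
    """Z-score of the log-price spread, scaled by formation-window statistics."""
    spread = [math.log(a) - math.log(b) for a, b in zip(prices_a, prices_b)]
    mu = mean(spread[:formation_days])
    sigma = stdev(spread[:formation_days])
    return [(s - mu) / sigma for s in spread]


# Synthetic data: stock B tracks stock A at about half the price, with small noise.
random.seed(0)
stock_a = [100.0]
for _ in range(249):
    stock_a.append(stock_a[-1] * (1 + random.gauss(0, 0.01)))
stock_b = [0.5 * p * math.exp(random.gauss(0, 0.003)) for p in stock_a]
for day in range(200, 205):
    stock_b[day] *= 0.90  # hand-injected divergence, for demonstration only

FORMATION_DAYS = 200  # hypothetical window used to judge similarity
similarity = pearson(
    daily_returns(stock_a[:FORMATION_DAYS]),
    daily_returns(stock_b[:FORMATION_DAYS]),
)
z = spread_zscores(stock_a, stock_b, FORMATION_DAYS)
divergent_days = [day for day in range(FORMATION_DAYS, len(z)) if abs(z[day]) > 3.0]

print(f"Return correlation in formation window: {similarity:.2f}")
print(f"Days flagged as divergent: {divergent_days}")
```

With real data, the same two functions could be applied to every pair of tickers to shortlist candidates, but the disclaimer above still applies: a flagged divergence is a prompt for investigation, not a trading signal.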


Tutorials


• Python: sklearn for Investing – A YouTube video series on applying machine learning to investing.

• R: Quantitative Trading with R – Detailed class notes for quantitative finance using R.


Data Sources


• Quandl – A data market that provides free (and premium) financial and economic data. For example, you can bulk download end-of-day stock prices for over 3,000 U.S. companies or economic data from the Federal Reserve.

• Quantopian – A quantitative finance community that offers a free platform for developing trading algorithms. Includes datasets.

• US Fundamentals Archive – 5 years of fundamentals data for 5,000+ U.S. companies.


4、 Teach a Neural Network to Read Handwriting


Neural networks and deep learning are two success stories of modern artificial intelligence. They have led to major advances in image recognition, automatic text generation, and even self-driving cars.


To get involved in this exciting field, you should start with a manageable dataset.


The MNIST Handwritten Digit Classification Challenge is the classic entry point. Image data is generally harder to work with than "flat" relational data. The MNIST data is beginner-friendly and small enough to fit on one computer.

Handwriting recognition will challenge you, but it does not require high computational power.
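To get a feel for how little compute a digit classifier needs, the following sketch trains a small neural network on a laptop-sized problem. Note the substitutions, which are assumptions for illustration: it uses the 8x8 digits dataset bundled with scikit-learn (1,797 images) rather than MNIST itself, and scikit-learn's `MLPClassifier` rather than a network written from scratch. The hidden layer size and iteration limit are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Load the bundled 8x8 digit images, already flattened to 64 features per image.
digits = load_digits()
X = digits.data / 16.0  # pixel intensities range from 0 to 16
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One hidden layer of 32 units; an arbitrary, deliberately small architecture.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

accuracy = net.score(X_test, y_test)
print(f"Test accuracy on held-out digits: {accuracy:.3f}")
```

The full MNIST images are 28x28, so the same approach carries over with a larger input layer and a longer training time.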


To start, we recommend the first chapter of the tutorial below. It teaches you how to build a neural network from scratch that solves the MNIST challenge with high accuracy.


Tutorials


• Neural Networks and Deep Learning (online book) – Chapter 1 walks through how to write a neural network from scratch in Python to classify digits from MNIST. The author also gives a very good explanation of the intuition behind neural networks.


Data Sources


• MNIST – MNIST is a modified subset of two datasets collected by the U.S. National Institute of Standards and Technology. It contains 70,000 labeled images of handwritten digits.


5、 Investigate Enron


The Enron scandal and collapse was one of the largest corporate meltdowns in history.


In 2000, Enron was one of the largest energy companies in the United States. Then, after being exposed for fraud, it spiraled into bankruptcy within a year.


Luckily for us, we have the Enron email database. It contains 500,000 emails between 150 former Enron employees, mostly senior executives. It is also the only large public database of real emails, which makes it all the more valuable.


In fact, data scientists have been using this dataset for education and research for years.


Examples of beginner machine learning projects you could try include…


• Anomaly detection… Map the distribution of emails sent and received by hour, and try to detect abnormal behavior leading up to the public scandal.

• Social network analysis… Build network graph models between employees to find key influencers.

• Natural language processing… Analyze the body messages together with the email metadata to classify emails by purpose.
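The anomaly detection idea can be prototyped before touching the real archive. The sketch below buckets message timestamps by hour and flags hours whose volume is far above average. The timestamps are synthetic, and the function names and the z-score threshold of 3 are hypothetical choices; parsing the actual Enron files is not shown.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, stdev


def hourly_counts(timestamps):
    """Count messages per calendar hour, including hours with zero messages."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    hours, current, end = [], min(buckets), max(buckets)
    while current <= end:
        hours.append((current, buckets.get(current, 0)))
        current += timedelta(hours=1)
    return hours


def flag_anomalies(hours, threshold=3.0):
    """Return (hour, count) pairs whose count is far above the mean."""
    counts = [count for _, count in hours]
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        return []
    return [(hour, count) for hour, count in hours if (count - mu) / sigma > threshold]


# Synthetic mailbox: two messages per working hour over five made-up days...
start = datetime(2001, 1, 1)
timestamps = []
for day in range(5):
    for hour in range(9, 18):
        base = start + timedelta(days=day, hours=hour)
        timestamps += [base + timedelta(minutes=m) for m in (5, 35)]

# ...plus one hand-injected late-night burst of 40 messages.
burst_hour = start + timedelta(days=3, hours=22)
timestamps += [burst_hour + timedelta(minutes=m) for m in range(40)]

hours = hourly_counts(timestamps)
anomalies = flag_anomalies(hours)
for hour, count in anomalies:
    print(f"Unusual volume: {count} messages in the hour starting {hour}")
```

A global mean is a crude baseline because email volume follows a daily rhythm; comparing each hour with the same hour on other days would be a natural next refinement.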


Data Sources


• Enron Email Dataset – The Enron email archive hosted by CMU.

• Description of Enron Data (PDF) – An exploratory analysis of the Enron email data that can help you get your bearings.


6、 Write ML Algorithms from Scratch


Writing machine learning algorithms from scratch is an excellent learning tool for two main reasons.


First, there is no better way to build a true understanding of their mechanics. You will be forced to think about every step, and this leads to real mastery.


Second, you will learn how to translate mathematical instructions into working code. You will need this skill when adapting algorithms from academic research.


We suggest picking an algorithm that is not too complex. Even the simplest algorithms require many subtle decisions. Once you are comfortable building simple algorithms, try extending them for more functionality. For example, try extending a vanilla logistic regression algorithm into a lasso/ridge regression by adding regularization parameters.
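As an illustration of that kind of extension, here is a minimal from-scratch sketch in plain Python: batch gradient descent for logistic regression, with an optional L2 penalty that gives the ridge-style variant (the lasso variant needs an L1 penalty and is not shown). The toy data, learning rate, epoch count, and penalty strength are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, l2=0.0, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent; l2=0 gives the vanilla model."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the ridge term l2 * w; the bias is not penalised.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def norm(w):
    return math.sqrt(sum(wj * wj for wj in w))


# Toy, linearly separable data (made up for this example).
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [0.4, 0.3],
     [1.6, 1.8], [1.9, 1.7], [1.8, 2.0], [2.1, 1.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_plain, b_plain = fit_logistic(X, y)
w_ridge, b_ridge = fit_logistic(X, y, l2=0.1)

print("vanilla weights:", [round(v, 2) for v in w_plain], "norm", round(norm(w_plain), 2))
print("ridge weights:  ", [round(v, 2) for v in w_ridge], "norm", round(norm(w_ridge), 2))
```

On separable data the unpenalised weights keep growing with more epochs, while the penalty holds them at a finite size. Seeing that difference is exactly the sort of subtle behavior that writing the algorithm yourself makes visible.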


Finally, here is a tip every beginner should know: do not get discouraged if your algorithm is not as fast or as fancy as those in existing packages. Those packages are the result of years of development!


Tutorials


• Python: Logistic Regression from Scratch

• Python: k-Nearest Neighbors from Scratch

• R: Logistic Regression from Scratch


7、 Mine Social Media Sentiment


Social media has become almost synonymous with "big data" because of the sheer amount of user-generated content.


Mining this rich data can provide unprecedented ways to keep a pulse on opinions, trends, and public sentiment. Facebook, Twitter, YouTube, WeChat, WhatsApp, Reddit… the list goes on.


Furthermore, each generation spends more time on social media than its predecessors. This means social media data will become even more relevant to marketing, branding, and business as a whole.


While there are many popular social media platforms, Twitter is the classic entry point for practicing machine learning.


With Twitter data, you get an interesting blend of data (tweet contents) and metadata (location, hashtags, users, retweets, etc.) that opens up nearly endless paths for analysis.


Tutorials


• Python: Mining Twitter Data – How to perform sentiment analysis on Twitter data

• R: Sentiment analysis with machine learning – A short and sweet sentiment analysis tutorial


Data Sources


• Twitter API – The Twitter API is a classic source of streaming data. You can track tweets, hashtags, and more.

• StockTwits API – StockTwits is like a Twitter for traders and investors. You can expand this dataset in many interesting ways by joining it to time series datasets using the timestamp and ticker symbol.
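The join described for StockTwits can be sketched with plain dictionaries. The records below are made up, and the field names (`symbol`, `created_at`, `sentiment`, `close`) are hypothetical; they are not taken from the StockTwits API, whose actual response format should be checked in its documentation.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical messages, each tagged with a ticker symbol and a sentiment score.
messages = [
    {"symbol": "AAA", "created_at": datetime(2020, 1, 2, 14, 30), "sentiment": 1},
    {"symbol": "AAA", "created_at": datetime(2020, 1, 2, 15, 10), "sentiment": -1},
    {"symbol": "AAA", "created_at": datetime(2020, 1, 3, 9, 45), "sentiment": 1},
    {"symbol": "BBB", "created_at": datetime(2020, 1, 2, 11, 0), "sentiment": -1},
    {"symbol": "BBB", "created_at": datetime(2020, 1, 4, 18, 20), "sentiment": 1},
]

# Hypothetical daily closing prices keyed by (symbol, trading day).
daily_close = {
    ("AAA", date(2020, 1, 2)): 10.0,
    ("AAA", date(2020, 1, 3)): 10.5,
    ("BBB", date(2020, 1, 2)): 20.0,
    ("BBB", date(2020, 1, 3)): 19.0,
}


def join_messages_to_prices(messages, daily_close):
    """Aggregate messages per (symbol, day) and attach that day's closing price."""
    grouped = defaultdict(list)
    for msg in messages:
        key = (msg["symbol"], msg["created_at"].date())
        grouped[key].append(msg["sentiment"])

    rows = []
    for key in sorted(grouped):
        if key not in daily_close:
            continue  # no price for that day, so the messages are dropped
        scores = grouped[key]
        rows.append({
            "symbol": key[0],
            "date": key[1],
            "messages": len(scores),
            "mean_sentiment": sum(scores) / len(scores),
            "close": daily_close[key],
        })
    return rows


rows = join_messages_to_prices(messages, daily_close)
for row in rows:
    print(row)
```

Dropping messages that fall on days without a price is only one option; assigning them to the next trading day is another, and the choice affects any model built on the joined table.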


8、 Improve Health Care


Another industry undergoing rapid change thanks to machine learning is global health and health care.


In most countries, becoming a doctor requires many years of education. It is a demanding field with long hours, high stakes, and an even higher barrier to entry.


As a result, there has recently been a significant effort to reduce doctors' workload and improve the overall efficiency of the health care system with the help of machine learning.


Use cases include:

• Preventive care… Predicting disease outbreaks at both the individual and the community level.

• Diagnostic care… Automatically classifying image data, such as scans and X-rays.

• Insurance… Adjusting insurance premiums based on publicly available risk factors.


As hospitals continue to modernize patient records and as we collect more granular health data, data scientists will have plenty of opportunities within easy reach.


Tutorials


• R: Building meaningful machine learning models for disease prediction

• Machine Learning in Health Care – An excellent presentation from Microsoft Research


Data Sources


• Large Health Data Sets – A collection of large health-related datasets

• data.gov/health – Datasets related to health and health care provided by the U.S. government.

• Health Nutrition and Population Statistics – Global health, nutrition, and population statistics provided by the World Bank.



