
Detailed explanation of SciPy hierarchical clustering parameters in Python

Editor: Python

A detailed look at the fcluster function for hierarchical clustering in Python

Example call:

import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt

# 20 random points in 4 dimensions (scipy.randn no longer exists, so NumPy is used)
points = np.random.randn(20, 4)
# 1. Hierarchical clustering
# Compute the condensed pairwise distance matrix, using Euclidean distance here:
disMat = pdist(points, 'euclidean')
# Run hierarchical clustering with average linkage:
Z = sch.linkage(disMat, method='average')
# Draw the result as a dendrogram and save it as plot_dendrogram.png
P = sch.dendrogram(Z)
plt.savefig('plot_dendrogram.png')
# Form flat clusters from the linkage matrix Z
# (criterion must be passed by keyword because it follows t=...):
cluster = sch.fcluster(Z, t=1, criterion='inconsistent')
print("Original cluster by hierarchy clustering:\n", cluster)

The function signature is as follows:

def fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None):

Z is the linkage matrix returned by linkage, which encodes the hierarchical clustering.
In the example call above, the distance matrix is computed with the Euclidean distance and clusters are merged using average linkage (method='average').
Other distance metrics and linkage methods can be used here.
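As a rough illustration of what Z contains, the sketch below builds two linkage matrices with different metrics and linkage methods and inspects their shape. The seeded random data and the variable names (dist_euclid, dist_city, Z_avg, Z_complete) are assumptions made for this example only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
points = rng.standard_normal((20, 4))   # 20 observations, 4 features

# Condensed distance matrices: one entry per unordered pair of points.
dist_euclid = pdist(points, metric='euclidean')
dist_city = pdist(points, metric='cityblock')

# Each row of a linkage matrix records one merge:
# [index of cluster a, index of cluster b, merge distance, size of new cluster]
Z_avg = linkage(dist_euclid, method='average')
Z_complete = linkage(dist_city, method='complete')

print(dist_euclid.shape)  # n*(n-1)/2 pairwise distances
print(Z_avg.shape)        # n-1 merges, 4 columns
print(Z_avg[:3])          # the first three merges
```

Either matrix can be passed to fcluster as Z; the choice of metric and linkage method changes the merge distances and therefore which thresholds t are meaningful.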

t is the threshold used to cut the tree into flat clusters. Its meaning depends on criterion.
For example, when criterion is 'inconsistent', t is a threshold on the inconsistency coefficient: a cluster node and all its descendants are placed in the same flat cluster if their inconsistency values are all less than or equal to t. The inconsistency coefficient of a link compares its height with the mean and standard deviation of the heights of the links below it (down to depth levels), so it is not a correlation and is not restricted to the range 0-1.
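To get a feel for how t behaves under the 'inconsistent' criterion, the following sketch counts the flat clusters produced for a few thresholds. The data, the seed, and the particular threshold values are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
points = rng.standard_normal((20, 4))
Z = linkage(pdist(points, 'euclidean'), method='average')

# Arbitrary thresholds; the last one is large enough to accept every merge.
thresholds = [0.5, 0.8, 1.0, 1.15, 1e9]
counts = []
for t in thresholds:
    labels = fcluster(Z, t=t, criterion='inconsistent')
    counts.append(len(np.unique(labels)))
    print(f"t={t}: {counts[-1]} flat clusters")
```

Raising t accepts more merges, so the number of flat clusters can only stay the same or shrink as t grows.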

depth is the maximum depth used when computing the inconsistency coefficients for the 'inconsistent' criterion. It has no effect for the other criteria. The default is 2.
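The sketch below shows one way to look at the inconsistency matrix directly and to check that passing depth is equivalent to precomputing the matrix and passing it as R. The data and the variable names (R2, R4, labels_depth, labels_R) are assumptions for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, inconsistent, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
points = rng.standard_normal((20, 4))
Z = linkage(pdist(points, 'euclidean'), method='average')

# Columns of the inconsistency matrix:
# [mean link height, std of link heights, number of links, inconsistency coefficient]
R2 = inconsistent(Z, d=2)   # looks 2 levels down (the fcluster default)
R4 = inconsistent(Z, d=4)   # looks 4 levels down

# Let fcluster compute the matrix itself, or hand it a precomputed one.
labels_depth = fcluster(Z, t=1.0, criterion='inconsistent', depth=4)
labels_R = fcluster(Z, t=1.0, criterion='inconsistent', R=R4)

print(R2[-3:])                                   # rows for the last three merges
print(np.array_equal(labels_depth, labels_R))    # both routes should agree
```

Precomputing R is mainly useful when the same inconsistency matrix is reused across several fcluster calls.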

criterion specifies the rule used to form flat clusters. The meaning of each value is explained below:
1. When criterion is 'inconsistent', t is a threshold on the inconsistency coefficient. If a cluster node and all its descendants have an inconsistency value less than or equal to t, all of its leaf descendants belong to the same flat cluster. If no non-singleton cluster meets this condition, every observation is assigned to its own cluster.
2. When criterion is 'distance', t is a cophenetic distance threshold: the observations in each flat cluster have a cophenetic distance of no more than t. In other words, merges at a distance of t or less are kept, and merges above t are separated.
3. When criterion is 'maxclust', t is the maximum number of flat clusters. With t set to 4, at most 4 clusters are returned: the smallest cophenetic distance threshold that yields no more than t flat clusters is used.
4. When criterion is 'monocrit', t is compared against a user-supplied monotonic statistic: a flat cluster is formed from the cluster node with index i when monocrit[i] <= t. For example, to threshold at 0.8 on the maximum inconsistency coefficient (column 3 of the inconsistency matrix R) found in each subtree, it can be written like this:

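from scipy.cluster.hierarchy import fcluster, inconsistent, maxRstat
R = inconsistent(Z)  # inconsistency matrix of the linkage matrix Z
MR = maxRstat(Z, R, 3)
fcluster(Z, t=0.8, criterion='monocrit', monocrit=MR)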

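5. When criterion is 'maxclust_monocrit', a flat cluster is formed from a non-singleton cluster node c when monocrit[i] <= r for every cluster index i below and including c, where r is the smallest threshold that yields no more than t flat clusters. monocrit must be monotonic.
For example, to minimize the threshold on the maximum inconsistency values so that no more than 3 flat clusters are formed: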

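from scipy.cluster.hierarchy import fcluster, inconsistent, maxinconsts
R = inconsistent(Z)  # inconsistency matrix of the linkage matrix Z
MI = maxinconsts(Z, R)
fcluster(Z, t=3, criterion='maxclust_monocrit', monocrit=MI)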
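The criteria described above can be compared side by side with a short script. This is only a sketch on seeded random data; the threshold values and the variable names (by_distance, by_maxclust, by_monocrit, by_maxclust_monocrit) are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import (
    linkage, fcluster, inconsistent, maxRstat, maxinconsts)
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
points = rng.standard_normal((20, 4))
Z = linkage(pdist(points, 'euclidean'), method='average')
R = inconsistent(Z)                     # inconsistency matrix, default depth 2

# 'distance': cut the tree at cophenetic distance 2.0 (arbitrary value).
by_distance = fcluster(Z, t=2.0, criterion='distance')

# 'maxclust': ask for no more than 4 flat clusters.
by_maxclust = fcluster(Z, t=4, criterion='maxclust')

# 'monocrit': threshold a monotonic statistic, here the maximum of
# column 3 of R (the inconsistency coefficient) within each subtree.
MR = maxRstat(Z, R, 3)
by_monocrit = fcluster(Z, t=0.8, criterion='monocrit', monocrit=MR)

# 'maxclust_monocrit': smallest threshold on MI giving at most 3 clusters.
MI = maxinconsts(Z, R)
by_maxclust_monocrit = fcluster(Z, t=3, criterion='maxclust_monocrit',
                                monocrit=MI)

results = {
    'distance': by_distance,
    'maxclust': by_maxclust,
    'monocrit': by_monocrit,
    'maxclust_monocrit': by_maxclust_monocrit,
}
for name, labels in results.items():
    print(f"{name}: {len(np.unique(labels))} flat clusters")
```

Each call returns an array with one cluster label per original observation, so the results can be compared directly or passed to downstream code.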
