
Using Python to crawl the latest ZuluJDK builds and set up a download mirror

It's me again!!! First, a picture!

Author: Mintimate

Blog: https://www.mintimate.cn (Mintimate's Blog, just to share with you)

Preface

For everyday JDK use, most people pick either OracleJDK or OpenJDK. OracleJDK's commercial license terms change often, so to be on the safe side it is better to develop projects on OpenJDK.

Among the many OpenJDK distributions, I prefer ZuluJDK: https://www.azul.com/downloads/

The slightly awkward part is that the ZuluJDK website is not very friendly to reach from where I am. Sometimes, even when the site loads, it is very slow, and I cannot download even a single JDK.

So I decided to use my Tencent Cloud lightweight application server in Hong Kong as a relay and set up my own mirror site. Once the mirror is built, it can provide direct download links for my lightweight application servers in Shanghai, Nanjing, and other regions. I can even let friends download JDKs from it and share the joy ( ´▽`)

[Figure: the server system and region used this time]

ZuluJDK

ZuluJDK is built on OpenJDK and is licensed under GPL v2 + Classpath Exception (GPL v2 + CE):

[Figure: the ZuluJDK license]

Functionally, ZuluJDK is essentially the same as OracleJDK, and it is not affected by changes to Oracle's license (ZuluJDK has always been GPL v2 + CE).

Whether software developed with OpenJDK must itself comply with GPLv2 and be open-sourced is a fairly heated debate. But note the Classpath Exception: my understanding is that the software you develop does not have to be released under the GPL. If I have this wrong, I would welcome corrections from the experts in the comments (・_・;

Design approach

To download the latest ZuluJDK builds, the plan is to use Python to parse the ZuluJDK download addresses, then download the files to the server with wget, and finally expose the directory with Nginx.

Environment dependencies

The environment requirements are simple. Hardware:

  • A Tencent Cloud lightweight application server running a Debian image: the Python script downloads files through the wget module. I have not tested whether this works on Windows.

Software:

  • Python 3.x: the core software, used to write the crawler.
  • Vim 8.2 with YCM: the text editor, used to write the Python script.
  • PAW: network API testing software; curl combined with grep can be used instead.

Python module dependencies:

requests==2.27.1
wget==3.2

Data acquisition

First, look at the page: https://www.azul.com/downloads/

The page loads its data from an interface:

[Figure: the data interface]

Copy the interface URL and test it with PAW or Postman:

[Figure: analysis of the response]

The response turns out to be plain JSON, and the returned data is well structured.

The design is really simple.

In Python, use the requests library to simulate the request. The URL and request headers:

URL = "https://www.azul.com/wp-admin/admin-ajax.php?action=bundles&endpoint=community&use_stage=false&include_fields%5B%5D=java_version&include_fields%5B%5D=os&include_fields%5B%5D=javafx&include_fields%5B%5D=latest&include_fields%5B%5D=ext"
HEADERS = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'x-csrf-token': '',
    'x-requested-with': 'XMLHttpRequest',
    'cookie': '',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}
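The long query string in URL is easier to read when it is built from its parts. Here is a minimal sketch using only the standard library's urllib.parse.urlencode. The names BASE_URL, QUERY, and build_url are made up for this illustration, and the parameter values are simply copied from the URL above:

```python
from urllib.parse import urlencode

# Endpoint and parameters copied from the URL above; the names are illustrative.
BASE_URL = "https://www.azul.com/wp-admin/admin-ajax.php"
QUERY = [
    ("action", "bundles"),
    ("endpoint", "community"),
    ("use_stage", "false"),
    # "include_fields[]" is repeated once per requested field;
    # urlencode percent-encodes "[]" as "%5B%5D".
    ("include_fields[]", "java_version"),
    ("include_fields[]", "os"),
    ("include_fields[]", "javafx"),
    ("include_fields[]", "latest"),
    ("include_fields[]", "ext"),
]


def build_url(base=BASE_URL, query=QUERY):
    """Join the base address and the percent-encoded query string."""
    return base + "?" + urlencode(query)


print(build_url())
```

Written this way, adding or removing a requested field is a one-line change instead of an edit in the middle of a long string.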

Send the request and parse the response as JSON:

import requests

def get_zulu_json():
    # Send the GET request and decode the response body as JSON
    response = requests.get(url=URL, headers=HEADERS).json()
    return response

[Figure: writing a test]

This retrieves the version information for every ZuluJDK build:

[Figure: all the information]

The awkward part is that I did not expect so many entries. The output flooded the console of my Tencent Cloud lightweight application server……

Poor little cloud server, don't worry!!! I will use PAW to take some of the pressure off~~~

PAW shows 4473 items; evidently all the previous builds are included.
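Rather than printing thousands of objects to the console, a short summary is usually enough to get a feel for the data. The sketch below counts entries per operating system and file extension with collections.Counter. The function name summarize and the three sample entries are invented for illustration; in the real script the input would be the list returned by get_zulu_json():

```python
from collections import Counter


def summarize(entries):
    """Count entries per (os, ext) pair instead of printing every object."""
    return Counter((e.get("os"), e.get("ext")) for e in entries)


# Three made-up entries shaped like the real response.
sample = [
    {"os": "macos", "ext": "zip"},
    {"os": "linux", "ext": "tar.gz"},
    {"os": "linux", "ext": "tar.gz"},
]
for (os_name, ext), count in summarize(sample).most_common():
    print(os_name, ext, count)
```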

Zulu's servers are huge…… 4473 JDKs/JREs…… that must take up more than 1 TB of storage.

So we need to filter. In short: take the objects apart ⁄(⁄ ⁄ ⁄ω⁄ ⁄ ⁄)⁄

Data processing

Processing is easy. Look at the properties of the JSON objects, for example:

{
    "abi": "any",
    "arch": "arm",
    "bundle_type": "jdk",
    "cpu_gen": ["v8"],
    "ext": "zip",
    "features": ["jdk"],
    "hw_bitness": "64",
    "id": 17865,
    "java_version": [19, 0, 0],
    "javafx": false,
    "jdk_version": [19, 0, 0, 6],
    "latest": false,
    "name": "zulu19.0.21-ea-jdk19.0.0-ea.6-macosx_aarch64.zip",
    "openjdk_build_number": 6,
    "os": "macos",
    "release_status": "ea",
    "sha256_hash": "85493b2ab7bc2cba742856684730ee42a8ee71d3d0a510770a5d0071a2622903",
    "support_term": "sts",
    "url": "https://cdn.azul.com/zulu/bin/zulu19.0.21-ea-jdk19.0.0-ea.6-macosx_aarch64.zip",
    "zulu_version": [19, 0, 21, 0]
}

You can see fields such as ext, latest, and name. A JDK is usually configured by hand, and the operating system is generally Windows, Linux, or macOS.

Nobody installs it with an installer, right? No, no, surely not…… If you install with an installer, you will not be able to find it later when you want to uninstall (。 ́︿ ̀。)

For macOS and Linux, the usual ZuluJDK package is a compressed archive (tar.gz). On Windows, the ZuluJDK packages are all zip files with os="windows".

So we filter in two passes. (Splitting the objects up twice, wuhu, aren't I bad)

macOS/Linux

So we filter the JSON:

def filter_by(zulu_info, latest=None, javafx=None, ext=None, os=None):
    # Collect the keyword arguments; any left as None are not used as filters
    params = locals()
    params.pop('zulu_info')
    for key in params:
        if params[key] is None:
            continue
        if zulu_info.get(key) != params[key]:
            return False
    return True

download_list = list(filter(lambda x: filter_by(x, javafx=True, ext="tar.gz"), zulu_json))

[Figure: editing the code]
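filter_by builds its criteria from locals(), which works but is easy to break when the function grows. As an alternative sketch (not the script's actual code), the same idea can be written with explicit keyword arguments. The name matches and the sample entries are invented for illustration:

```python
def matches(entry, **criteria):
    """Return True when entry has the given value for every criterion."""
    return all(entry.get(key) == value for key, value in criteria.items())


# Made-up entries with only the fields used for filtering.
sample = [
    {"name": "a.tar.gz", "javafx": True, "ext": "tar.gz", "os": "linux"},
    {"name": "b.zip", "javafx": True, "ext": "zip", "os": "windows"},
    {"name": "c.tar.gz", "javafx": False, "ext": "tar.gz", "os": "macos"},
]
selected = [e for e in sample if matches(e, javafx=True, ext="tar.gz")]
print([e["name"] for e in selected])
```

One difference to keep in mind: filter_by ignores a criterion that is None, while matches compares whatever it is given, so a criterion should simply be left out when it is not needed.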

Where:

  • zulu_json: the JSON obtained above (that is, the 4000+ objects ◡ ヽ(`Д´)ノ ┻━┻)

Compute the length:

[Figure: length]

Not bad, only 342 items.

But there are still JREs in there. Do you folks really need JREs???

Cut them out!!! You can also see that in this JSON, the nearer an entry is to the front, the newer its version. So each time we only cache and download the latest version:

import re

# Tracks the OS + major-version combinations already seen
# (filtered_list is the filtered list from the previous step)
temp_list = {}
for item in filtered_list:
    # Use a regular expression to check whether this is a JDK
    if re.search(r"jdk", item['name']) is None:
        # Not a JDK: skip this entry
        continue
    # Major JDK version (for example 11 or 17)
    jdk_version_code = str(item['jdk_version'][0])
    # Build a key: major version + platform part of the file name
    OS_version = jdk_version_code + item['name'].split("-")[-1]
    # If this combination has already been seen, skip it
    if OS_version in temp_list:
        continue
    else:
        temp_list[OS_version] = 1

This guarantees that each major version for each system is downloaded only once:

[Figure: test run, 342 items reduced to 20]

You can see that the 342 items are reduced to only 27 (macOS/Linux).
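The deduplication key is easier to follow with a concrete value. The sketch below pulls the key logic into a hypothetical helper, dedupe_key, and applies it to the name and jdk_version from the sample object shown earlier. keep_first_per_key is also an invented name, and it relies on the observation above that the list is ordered newest-first:

```python
def dedupe_key(item):
    """Major JDK version plus the last dash-separated part of the file name."""
    major = str(item["jdk_version"][0])
    platform_part = item["name"].split("-")[-1]
    return major + platform_part


def keep_first_per_key(items):
    """Keep only the first item seen for each key (the list is newest-first)."""
    seen = set()
    kept = []
    for item in items:
        key = dedupe_key(item)
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


# The name and jdk_version come from the sample object shown earlier.
example = {
    "name": "zulu19.0.21-ea-jdk19.0.0-ea.6-macosx_aarch64.zip",
    "jdk_version": [19, 0, 0, 6],
}
print(dedupe_key(example))  # prints 19macosx_aarch64.zip
```

Because the file extension is part of the key, a zip and a tar.gz for the same platform and major version count as two different entries.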

Windows

Filtering and downloading for Windows works the same way as for macOS and Linux:

download_list = list(filter(lambda x: filter_by(x, javafx=True, ext="zip", os="windows"), zulu_json))

As with Linux just now, filter out the JREs and keep only the latest version:

[Figure: using the code]
[Figure: the final results]

So it turns out there is not much to download (27 + 12).

Downloading the JDKs

Finally, we just download them. Here the data is downloaded with Python's wget module: https://pypi.org/project/wget/

This is not the GNU wget tool; it is a module for wget-style downloading from within Python.

Install the wget module:

pip3 install wget
[Figure: installing wget]

Define a helper that creates the download directory:

import os

def has_dir(path):
    # Create the directory (including parents) if it does not exist yet
    if not os.path.exists(path):
        os.makedirs(path)
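As a side note, os.makedirs accepts exist_ok=True, which folds the existence check and the creation into one call. Here is a minimal sketch; the name ensure_dir is made up, and the demonstration runs in a temporary directory so nothing is left behind:

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)


# Demonstrate inside a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "ZuluJDK_Mirror", "17")
    ensure_dir(target)
    ensure_dir(target)  # the second call is a no-op
    created = os.path.isdir(target)
print(created)
```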

Then complete the filtering code from earlier so that it also downloads:

import re
import time
import wget

def download_by_list(filtered_list):
    # Tracks the OS + major-version combinations already downloaded
    temp_list = {}
    for item in filtered_list:
        if re.search(r"jdk", item['name']) is None:
            continue
        jdk_version_code = str(item['jdk_version'][0])
        OS_version = jdk_version_code + item['name'].split("-")[-1]
        if OS_version in temp_list:
            continue
        else:
            temp_list[OS_version] = 1
        # Set the wget download directory
        save_path = "ZuluJDK_Mirror/" + jdk_version_code + "/"
        # Make sure the directory exists
        has_dir(save_path)
        print("Start downloading: " + item['name'])
        # Download the file
        wget.download(item['url'], out=save_path + item['name'])
        print("\n")
        # Pause between downloads
        time.sleep(20)
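The sample JSON object shown earlier carries a sha256_hash field, so a downloaded file can be checked against it. The sketch below is an optional extra, not part of the original script. It assumes every entry has that field, and the names sha256_of and is_intact are invented for illustration:

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, item):
    """Compare the file on disk with the item's sha256_hash field."""
    return sha256_of(path) == item["sha256_hash"].lower()


# Demonstration with a small throwaway file instead of a real JDK archive.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "demo.bin")
    with open(path, "wb") as handle:
        handle.write(b"demo")
    good_item = {"sha256_hash": hashlib.sha256(b"demo").hexdigest()}
    ok = is_intact(path, good_item)
    bad = is_intact(path, {"sha256_hash": "0" * 64})
print(ok, bad)
```

In the download loop, a check like this could run right after wget.download, with a failed check leading to a retry or a deleted file.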

The final result

Finally, the Python script is written:

[Figure: editing the code]

Run it with the python command:

[Figure: crawling and downloading in progress]

The downloaded files:

[Figure: the downloaded files]

Use Nginx to map the directory:

[Figure: directory mapping]

Of course, we could use Cron to periodically cache the latest ZuluJDK builds; I am not going to do that here.
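If the script were run on a schedule, it would help to skip files that are already in the mirror. Here is a rough sketch under that assumption. The names target_path and missing_items are hypothetical, the file names are made up, and the directory layout mirrors the save_path used in download_by_list:

```python
import os
import tempfile


def target_path(item, root="ZuluJDK_Mirror"):
    """Mirror layout used above: <root>/<major JDK version>/<file name>."""
    return os.path.join(root, str(item["jdk_version"][0]), item["name"])


def missing_items(items, root="ZuluJDK_Mirror"):
    """Return only the items whose target file is not on disk yet."""
    return [i for i in items if not os.path.exists(target_path(i, root))]


# Demonstration with made-up file names in a throwaway directory.
items = [
    {"name": "old.tar.gz", "jdk_version": [17, 0, 0, 0]},
    {"name": "new.tar.gz", "jdk_version": [17, 0, 1, 0]},
]
with tempfile.TemporaryDirectory() as root:
    existing = target_path(items[0], root)
    os.makedirs(os.path.dirname(existing), exist_ok=True)
    open(existing, "wb").close()  # pretend this file was downloaded earlier
    todo = [i["name"] for i in missing_items(items, root)]
print(todo)
```

This only checks that a file exists, so an interrupted download would be treated as complete; comparing checksums would be the stricter test.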

END

Finally, we can send the Nginx directory listing address to friends who want to download ZuluJDK.

Also…… I suddenly realized: I have already worked out ZuluJDK's direct links, so using Nginx as a reverse proxy…… seems more convenient; it would not take up server space!!!

Still, this way is better: it keeps the resources persistent~~~ And it makes full use of the disk on my Tencent Cloud lightweight application server in Hong Kong, China, while I enjoy the happiness of building my own "wheel" (**-**)

