
Seven key components of observability in Python


2022-08-02 11:57

Learn why observability matters for Python, and how to build it into your software development life cycle.

The applications you write execute a lot of code, and they do so in a way that is essentially invisible. So how do you know:

  • Is the code running?
  • Is it working properly?
  • Who is using it, and how?

Observability is the ability to look at data that tells you what your code is doing. In this article, the main concern is server code in a distributed system. That is not to say observability is unimportant for client applications; it is just that clients are often not written in Python. Nor is it to say observability is unimportant for data science; the tooling for observability in data science (mostly Jupyter and quick feedback) is simply different.

Why observability matters

So why does observability matter? It is a key part of the software development life cycle (SDLC).

Shipping an application is not the end; it is the beginning of a new cycle. The first stage of that cycle is confirming that the new version runs correctly. If it does not, you probably need to roll back. Which features work well? Which ones have subtle bugs? You need to know what is happening in order to know what to do next. Things fail in strange ways. Whether it is a natural disaster, a problem in the underlying infrastructure, or an application getting into a strange state, things can stop working at any time, for any reason.

Outside the standard SDLC, you need to know that everything is still running. If it is not, having a way to find out how it is failing is critical.

Feedback

The first part of observability is getting feedback. When code reports what it is doing, that feedback helps in many ways. In a staging or test environment, feedback helps you find problems and, more importantly, triage them faster. This improves the tooling and communication around the validation step.

When you do a canary deployment or change a feature flag, feedback is also what tells you whether to continue, wait longer, or roll back.

Monitoring

Sometimes you suspect that something has gone wrong. Maybe a service you depend on is having problems, or social media is barraging you with questions about your site. Perhaps a related system is carrying out a complicated operation, and you want to be sure your system handles it well. In these cases, you want to aggregate the data from your observability system into dashboards.

When you write an application, dashboards need to be part of the design criteria. They can only display data that your application shares with them.

Alerts

Watching a dashboard for more than 15 minutes is like watching paint dry. Nobody should be subjected to that. For this task, we have alerting systems. An alerting system compares the observability data with the expected data and sends a notification when they do not match. A full treatment of incident management is beyond the scope of this article. However, observable applications are alert-friendly in two ways:

  • They produce enough data, of good enough quality, that the alerts sent are high quality.
  • The alert carries enough data, or the receiver can easily get the data, to help find the source of the problem.

High-quality alerts have three properties:

  • Few false alarms: if there is an alert, there is definitely a problem.
  • Few missed alarms: if there is a problem, an alert is triggered.
  • Timeliness: the alert is sent quickly, to reduce time to recovery.

These three properties conflict with one another. You can raise the detection threshold to reduce false alarms, at the cost of more missed alarms. You can lower the detection threshold to reduce missed alarms, at the cost of more false alarms. You can reduce both false alarms and missed alarms by collecting more data, at the cost of timeliness.

Improving all three at once is harder. This is where the quality of observability data comes in: higher-quality data can improve all three properties together.
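The threshold trade-off can be illustrated with a small calculation. This is only a sketch: the readings, the labels, and the names `evaluate_threshold` and `AlertQuality` are made up for illustration and are not taken from any alerting product.

```python
from dataclasses import dataclass


@dataclass
class AlertQuality:
    false_alarms: int   # an alert fired, but there was no real problem
    missed_alarms: int  # there was a real problem, but no alert fired


def evaluate_threshold(samples, threshold):
    """Score an 'alert when value > threshold' rule.

    `samples` is a list of (observed_value, was_real_problem) pairs.
    """
    false_alarms = sum(
        1 for value, problem in samples if value > threshold and not problem
    )
    missed_alarms = sum(
        1 for value, problem in samples if value <= threshold and problem
    )
    return AlertQuality(false_alarms, missed_alarms)


# Hypothetical error-rate readings (percent), labelled after the fact.
SAMPLES = [
    (0.5, False), (1.2, False), (2.5, False),
    (2.0, True), (4.0, True), (7.5, True),
]

strict = evaluate_threshold(SAMPLES, threshold=3.0)  # high threshold
loose = evaluate_threshold(SAMPLES, threshold=1.0)   # low threshold
```

With these made-up readings, the high threshold produces no false alarms but misses one real problem, while the low threshold misses nothing but fires twice when nothing is wrong.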

Logging

Some people like to make fun of debugging with print. But in a world where most software does not run on your own machine, print debugging is all you can do. Logging is a formalized version of print debugging. For all its faults, the Python logging library provides standardized logging. More importantly, it means you can log from libraries.

The application is responsible for configuring where logs go. Ironically, after many years in which applications really were responsible for that configuration, this is less and less true. Modern applications in a modern container orchestration environment log to standard error and standard output, and trust the orchestration system to handle the logs properly.

However, you should not rely on this in libraries, or really anywhere else. If you want to let people know what is happening, use logging, not print.
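The division of labor described above might look like the following sketch. The library name `examplelib.cache` and the functions `lookup` and `configure_logging` are hypothetical; only the standard `logging` module is used.

```python
import io
import logging

# Library side: create a named logger and never configure handlers here.
LOGGER = logging.getLogger("examplelib.cache")  # hypothetical library name


def lookup(key):
    LOGGER.info("cache lookup for %s", key)
    return None


# Application side: decide once where the logs go.
def configure_logging(stream=None):
    handler = logging.StreamHandler(stream)  # defaults to sys.stderr
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    return handler


# An in-memory stream stands in for standard error in this demonstration.
stream = io.StringIO()
configure_logging(stream)
lookup("user:1")
output = stream.getvalue()
```

Calling `configure_logging()` with no argument sends the same lines to standard error, which is what a container orchestration system would normally collect.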

Logging levels

One of the most important features of logging is logging levels. Levels let you filter and route logs sensibly. But this only works if levels are used consistently. At the very least, you should keep them consistent across your own applications.

With a little help, libraries that chose incompatible semantics can be fixed after the fact by appropriate configuration at the application level. This only requires that they follow the most important universal convention in Python logging: getLogger(__name__).

Most reasonable libraries follow this convention. Filters can modify log records in place before they are emitted. You can attach a filter to a handler that rewrites messages, based on the logger name, so that they have appropriate levels.

import logging
LOGGER = logging.getLogger(__name__)
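As a sketch of the filter idea, the following demotes ERROR records from one library to WARNING. The class names `RelevelFilter` and `ListHandler` and the logger name `noisylib` are hypothetical; the `logging.Filter` and `logging.Handler` base classes are standard.

```python
import logging


class RelevelFilter(logging.Filter):
    """Rewrite the level of records from one logger subtree, in place."""

    def __init__(self, logger_name, level_map):
        super().__init__()
        self.logger_name = logger_name
        self.level_map = level_map  # e.g. {logging.ERROR: logging.WARNING}

    def filter(self, record):
        in_scope = (
            record.name == self.logger_name
            or record.name.startswith(self.logger_name + ".")
        )
        if in_scope and record.levelno in self.level_map:
            record.levelno = self.level_map[record.levelno]
            record.levelname = logging.getLevelName(record.levelno)
        return True  # keep the record; only its level was changed


class ListHandler(logging.Handler):
    """Collects emitted records in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(
            (record.name, record.levelname, record.getMessage())
        )


handler = ListHandler()
handler.addFilter(RelevelFilter("noisylib", {logging.ERROR: logging.WARNING}))

noisy = logging.getLogger("noisylib.client")  # hypothetical library logger
noisy.addHandler(handler)
noisy.propagate = False
noisy.error("retrying connection")
```

Two caveats apply to this sketch. The handler's own level check runs before its filters, so a record that has been re-leveled is not dropped by that check. The record is also changed in place, so any handler that processes it afterwards sees the new level.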

With this in mind, you now have to define the semantics of the logging levels. There are many options, but these are my favorite:

  • Error: sends an immediate alert. The application is in a state that needs operator attention. (This means Critical and Error are folded together.)
  • Warning: I like to call these "business hours alerts". Someone should look at this within one business day.
  • Info: emitted during normal operation. It helps people understand what the application is doing when they already suspect a problem.
  • Debug: by default, this should not be emitted in production. In staging or development it may or may not be emitted. It can also be turned on explicitly in production if more information is needed.
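The semantics in the list above can be written down as a routing rule. This is a minimal sketch; the function `route_for` and the destination names are hypothetical labels, not part of the logging library.

```python
import logging


def route_for(levelno, debug_enabled=False):
    """Map a numeric logging level to what should happen with the record."""
    if levelno >= logging.ERROR:  # Error and Critical folded together
        return "page-operator"
    if levelno >= logging.WARNING:
        return "business-hours-queue"
    if levelno >= logging.INFO:
        return "aggregate"
    # Debug is dropped unless it has been switched on explicitly.
    return "aggregate" if debug_enabled else "drop"
```

Writing the rule as one function keeps the level semantics in a single place, which makes it easier to stay consistent across applications.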

Under no circumstances should logs contain Personal Identifiable Information (PII) or passwords. This is true regardless of the logging level: levels change, debug levels get activated, and so on. Log aggregation systems are rarely PII-safe, especially as PII regulation (HIPAA, GDPR, and others) keeps evolving.

Log aggregation

Almost all modern systems are distributed. Redundancy, scaling, and sometimes jurisdictional requirements call for horizontal distribution. Microservices mean vertical distribution. Logging in to each machine to look at its logs is no longer realistic. It is also a bad idea for reasons of proper control: allowing developers to log in to machines gives them too many privileges.

All logs should be sent to an aggregator. There are commercial offerings, you can set up an ELK stack, or you can use any other database (SQL or NoSQL). As a really low-tech solution, you can write the logs to files and ship them to an object store. There are many solutions, but the most important thing is to pick one and aggregate everything.
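The low-tech option could be sketched as follows, using the standard `RotatingFileHandler`. Everything else is an assumption for illustration: the application name `exampleapp`, the helper names, and the `upload` callable, which stands in for whatever client your object store provides.

```python
import logging
import logging.handlers
import pathlib
import tempfile


def make_file_logger(log_path, max_bytes=200):
    """Log to `log_path`, rotating to app.log.1, app.log.2, ... as it fills."""
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=5
    )
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("exampleapp")  # hypothetical application name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger, handler


def ship_rotated(log_path, upload):
    """Hand every rotated file to `upload(name, data)`, then delete it."""
    shipped = []
    for rotated in sorted(log_path.parent.glob(log_path.name + ".*")):
        upload(rotated.name, rotated.read_bytes())
        rotated.unlink()
        shipped.append(rotated.name)
    return shipped


uploaded = {}  # a dict stands in for an object store bucket
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "app.log"
    logger, handler = make_file_logger(path)
    for i in range(20):
        logger.info("handled request %d", i)
    shipped = ship_rotated(path, uploaded.__setitem__)
    logger.removeHandler(handler)
    handler.close()
```

The tiny `max_bytes` value only exists to force rotation in a short demonstration. Only rotated files are shipped, so the file that is currently being written is never uploaded half-finished.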

Logging queries

Once everything is logged to one place, there are a lot of logs. The specific aggregator defines how queries are written, but whether you are grepping through storage or writing NoSQL queries, keeping a record of the queries that match on source and details is very useful.
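For the grep-style case, a saved query could be as simple as the sketch below. It assumes plain-text lines in the form "LEVEL logger-name message"; the function `grep_logs`, the `SAVED_QUERIES` table, and the sample lines are all hypothetical.

```python
import re

# Hypothetical saved queries: name -> (logger name prefix, message regex).
SAVED_QUERIES = {
    "payment-timeouts": ("exampleapp.payments", r"timed out"),
}


def grep_logs(lines, source, pattern):
    """Return lines logged by `source` (or its children) matching `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3:
            continue  # not in the assumed "LEVEL name message" format
        _level, name, message = parts
        from_source = name == source or name.startswith(source + ".")
        if from_source and regex.search(message):
            matches.append(line.rstrip("\n"))
    return matches


LINES = [
    "ERROR exampleapp.payments.gateway request timed out after 30s",
    "INFO exampleapp.payments charge accepted",
    "ERROR exampleapp.users lookup timed out",
]

source, pattern = SAVED_QUERIES["payment-timeouts"]
hits = grep_logs(LINES, source, pattern)
```

Keeping queries in a named table like this means the next person investigating payment timeouts does not have to rediscover which logger and which message text to look for.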

Metric scraping

Metric scraping is a server pull model. The metrics server connects to the application periodically and pulls the metrics.

At the very least, this means the server needs connectivity to, and discovery of, all the relevant application servers.

Prometheus as a standard

If your metrics aggregator is Prometheus, exposing an endpoint in the Prometheus format is useful. But it is also useful even if the aggregator is not Prometheus. Almost all systems include a compatibility shim for Prometheus endpoints.

Adding a Prometheus shim to your application with the client Python library allows it to be scraped by most metrics aggregators. Once Prometheus discovers a server, it expects to find a metrics endpoint. This is often part of the application routing, usually at the /metrics path. Whatever the platform of the web application, if you can serve a custom byte stream with a custom content type at a given endpoint, Prometheus can scrape it.
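To illustrate the "custom byte stream with a custom content type" point without any third-party library, here is a sketch of a WSGI application that serves the Prometheus text exposition format by hand. The metric `cache_hits_total`, its value, and the helper names are hypothetical; in practice the client Python library mentioned above does this rendering for you.

```python
# Content type of the Prometheus text exposition format.
CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"


def _escape(value):
    """Escape a label value for the text format."""
    text = str(value).replace("\\", "\\\\")
    return text.replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, kind, help_text, samples):
    """Render one metric family; `samples` is a list of (labels, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {kind}"]
    for labels, value in samples:
        label_text = ",".join(
            f'{key}="{_escape(val)}"' for key, val in labels.items()
        )
        if label_text:
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def metrics_app(environ, start_response):
    """WSGI application that answers scrapes at /metrics."""
    if environ.get("PATH_INFO") != "/metrics":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    body = render_metric(
        "cache_hits_total",            # hypothetical metric
        "counter",
        "Cache hits per endpoint.",
        [({"endpoint": "/users"}, 42)],
    ).encode("utf-8")
    headers = [
        ("Content-Type", CONTENT_TYPE),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can host `metrics_app`, for example `wsgiref.simple_server.make_server` from the standard library, or it can be mounted under an existing application's routing.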

For the most popular frameworks, there is usually a middleware plugin or something similar that collects some metrics automatically, such as latency and error rates. This is usually not enough. You also need to collect custom application data: for example, cache hit/miss rates per endpoint, database latency, and so on.

Using counters

Prometheus supports several data types. One important and subtle type is the counter. Counters always move forward, with one caveat.

When the application restarts, the counter goes back to zero. These "epochs" of a counter are managed by sending the counter's "creation time" as metadata. Prometheus knows not to compare counters from two different epochs.
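The idea of epochs can be sketched with a toy counter. The `Counter` class and the `increase` helper are hypothetical illustrations of why the creation time matters; they are not the client library's classes, and `increase` is not a description of how Prometheus itself computes rates.

```python
import time


class Counter:
    """A value that only goes up, plus the time this epoch began."""

    def __init__(self, clock=time.time):
        self.value = 0.0
        self.created = clock()  # sent as metadata alongside the value

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


def increase(previous, current):
    """Growth between two scrapes, each given as (created, value).

    If `created` changed, the application restarted and the counter began
    a new epoch at zero, so the whole current value is new growth.
    """
    prev_created, prev_value = previous
    cur_created, cur_value = current
    if cur_created != prev_created:
        return cur_value
    return cur_value - prev_value


requests = Counter(clock=lambda: 100.0)  # fixed clock for a repeatable demo
requests.inc()
requests.inc(2)
```

Without the creation time, a consumer comparing a value of 5 before a restart with a value of 1 after it would see the counter going backwards.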

Using gauges

Gauges are much simpler: they measure instantaneous values. Use them for measurements that go up and down: for example, total allocated memory, cache size, and so on.
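A gauge can be sketched in a few lines. The `Gauge` class here is a hypothetical stand-in, and the cache and in-flight request figures are made up.

```python
class Gauge:
    """An instantaneous value that may go up or down."""

    def __init__(self):
        self._value = 0.0
        self._function = None

    def set(self, value):
        self._value = float(value)

    def inc(self, amount=1.0):
        self._value += amount

    def dec(self, amount=1.0):
        self._value -= amount

    def set_function(self, function):
        """Compute the value lazily, at the moment it is collected."""
        self._function = function

    def collect(self):
        if self._function is not None:
            return float(self._function())
        return self._value


cache = {}
cache_size = Gauge()
cache_size.set_function(lambda: len(cache))  # always reflects the cache now

in_flight = Gauge()
in_flight.inc()   # a request starts
in_flight.inc()   # another request starts
in_flight.dec()   # one request finishes

cache["a"] = 1
cache["b"] = 2
```

Reading the value through a function at collection time suits quantities such as cache size, where the application already holds the truth and there is no need to update a separate number on every change.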

Using enums

Enums are useful for the state of the application as a whole, although they can also be collected on a more granular basis. For example, if you are using a feature-gating framework, a feature that can be in several states (for example, in use, disabled, shadowing) may be more useful to expose as an enum.
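One common way to expose such a state is one sample per possible state, with the current state set to 1 and the others to 0. The sketch below renders that by hand; the metric name `feature_gate_state`, the label names, and the feature name are assumptions made for illustration.

```python
import enum


class FeatureState(enum.Enum):
    IN_USE = "in_use"
    DISABLED = "disabled"
    SHADOWING = "shadowing"


def render_state(metric, feature, current):
    """Render one sample per state: 1 for the current state, 0 otherwise."""
    lines = [f"# TYPE {metric} gauge"]
    for state in FeatureState:
        flag = 1 if state is current else 0
        lines.append(
            f'{metric}{{feature="{feature}",state="{state.value}"}} {flag}'
        )
    return "\n".join(lines) + "\n"


text = render_state("feature_gate_state", "new_checkout", FeatureState.SHADOWING)
```

Emitting every state, not just the current one, lets a dashboard plot how long a feature spent in each state without having to know the full list of states in advance.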

Analytics

Analytics differ from metrics in that they correspond to coherent events. For example, in a network server, an event is one external request and the work that results from it. In particular, an analytics event cannot be sent until the event has finished.

An event contains specific measurements: latency, the number and possibly the details of the resulting requests to other services, and so on.
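The rule that an event is only sent once it has finished maps naturally onto a context manager. In this sketch the function `analytics_event`, the event fields, and the `send` callable are hypothetical; `send` stands in for whatever delivers events to your analytics system.

```python
import contextlib
import time


@contextlib.contextmanager
def analytics_event(name, send, clock=time.monotonic):
    """Collect measurements for one event and send them when it finishes."""
    event = {"event": name, "downstream_requests": []}
    start = clock()
    try:
        yield event
        event["outcome"] = "ok"
    except Exception as exc:
        event["outcome"] = type(exc).__name__
        raise
    finally:
        event["latency_seconds"] = clock() - start
        send(event)  # reached only after the work is done, or has failed


sent = []  # a list stands in for the analytics backend
with analytics_event("GET /users", sent.append) as event:
    event["downstream_requests"].append(
        {"service": "userdb", "latency_seconds": 0.012}
    )
    sent_while_running = len(sent)  # nothing has been sent yet
```

Sending from the `finally` clause means a failed request still produces an event, with the exception type recorded as its outcome.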

Structured logging

One currently viable option is structured logging. Sending an event is then just sending a log with a properly formatted payload. This data can be queried from the log aggregator, parsed, and ingested into a suitable system that gives you visibility into it.
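A minimal sketch of structured logging with the standard library follows. The `JsonFormatter` class, the `event` key passed through `extra`, and the payload fields are choices made for this example, not a fixed schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Structured payload passed by the caller via `extra`, if any.
            "event": getattr(record, "event", None),
        }
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for standard error
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("exampleapp.events")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "request finished",
    extra={"event": {"route": "/users", "latency_seconds": 0.042, "status": 200}},
)

# What a downstream consumer would do after querying the aggregator.
parsed = json.loads(stream.getvalue())
```

Nesting the payload under a single `event` key keeps caller-supplied fields from colliding with the standard ones such as `message` and `level`.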

Error tracking

You can use logs to track errors, and you can use analytics to track errors. But a dedicated error system is worthwhile. A system optimized for errors can afford to send more data per error, because errors are rare after all. It can send the right data, and it can do smart things with that data. Error-tracking systems in Python usually hook into a generic exception handler, collect data, and send it to a dedicated error aggregator.
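The hook-collect-send pattern might be sketched like this with `sys.excepthook`. The function `make_excepthook`, the report fields, and the `send` callable are hypothetical; a real error-tracking client collects far more context and also hooks into web frameworks and threads.

```python
import sys
import traceback


def make_excepthook(send, previous=sys.__excepthook__):
    """Build a hook that reports an exception, then defers to `previous`."""

    def hook(exc_type, exc, tb):
        report = {
            "type": exc_type.__name__,
            "message": str(exc),
            "traceback": traceback.format_exception(exc_type, exc, tb),
        }
        send(report)  # deliver to the dedicated error aggregator
        previous(exc_type, exc, tb)

    return hook


# Demonstration: call the hook directly instead of crashing the process.
reports = []  # a list stands in for the error aggregator
hook = make_excepthook(reports.append, previous=lambda *args: None)
try:
    1 / 0
except ZeroDivisionError:
    hook(*sys.exc_info())
```

In an application, the hook would be installed once at startup with `sys.excepthook = make_excepthook(send)`, keeping the default `previous` so that the traceback is still printed to standard error.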

Using Sentry

In many cases, running Sentry yourself is the right thing to do. When an error has occurred, something has gone wrong. Reliably removing sensitive data is not possible, because these are precisely the cases in which sensitive data may have ended up somewhere it should not be.

It is usually not a big load: exceptions are supposed to be rare. Finally, this is not a system that needs high-quality, high-reliability backups. Yesterday's errors have hopefully already been fixed, and if they have not, you will find out!

Fast, safe, repeatable: you need all three

Observable systems are faster to develop, because they give you feedback. They are also safer to run, because when something goes wrong they let you know sooner. Finally, because there is a feedback loop, observability lends itself to building repeatable processes around it. Observability gives you knowledge about your application. And knowing is half the battle.

Time spent sharpening the axe is not wasted

Building all the observability layers is hard work. It often feels like wasted work, or at best like "nice to have, but not urgent".

So can you build it later? Maybe, but you should not. Getting observability right speeds up every later stage of development: testing, monitoring, and even onboarding new people. In an industry with as much turnover as tech, reducing the effort of onboarding people is worth it.

The fact is that observability matters, so build it in early and maintain it throughout the process. In turn, it will help you maintain your software.

