您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

基於Python深度圖生成3D點雲

編輯：Python

文章目錄

前言
二維RGB圖像
成像原理
數據准備
圖片加載
算法實現
生成點雲
點雲顯示
完整代碼
總結

前言

廢話不多說，直接開造。這裡的話我們有兩個目標，第一個是如何把一個2維圖片上的點映射到3維空間。第二就是如何生成3D點雲。當然實際上這是一個大問題，因為只要解決了第一個問題，第二個問題就是“送分”

二維RGB圖像

在說到3D點雲之前我們就不得不說到RGB圖像，也就是一些二維圖像。

圖像以像RGB三個通道的形式進行存儲。也就是這樣：

平時我們看到的就是左邊的2D圖像，實際上是以右邊的形式存儲的。

那麼從我們的矩陣角度來看的話，大概是這個樣子的：

[
[[r,g,b],[r,g,b],[r,g,b],[r,g,b]],
[[r,g,b],[r,g,b],[r,g,b],[r,g,b]],
...
]

這裡面存在了兩組信息，
第一組就是我們最容易忽略的位置信息，也就是像素點的位置。
第二組就是我們的色彩信息。

顯然如果我們想要生成3D點雲的話，我們的色彩信息是必要的。

而位置信息，在二維平面上，是在3為空間上的映射

那麼要想得到3D點雲，那麼我們就必須得到完整的位置坐標，也就是x,y,z

成像原理

那麼說到這裡的話，我們就不得不說到，攝像機是如何把咱們的影像映射出來的了。

大概他是這樣成像的：

這裡咱們有三個東西需要注意一下。

首先是我們的攝像機位置
之後是相片的位置也就是中間那個
最後是我們物品實際位置

所以我們實際上的一個二維圖像，就是一個投影，在實際上的話，我們可以理解為深度為1的空間。
但是這個投影

所以的話，用數學矩陣的形式表示為：

其中u,v為二維平面上的任意點坐標，1為深度，Zc就是咱們的二維平面上的點。

其中R,T是外參矩陣，並且我們的世界坐標原點就是相機的坐標原點
所以，R，T可以取下面的矩陣，（具體原理的話比較復雜，我也不是很懂為什麼會這樣，我回頭再補補數學）

帶入得到：

最後轉換得到這個公式：

那麼接下來咱們就對這個公式進行處理，來生成我們的點雲

數據准備

前面通過公式我們可以了解到，一個二維圖像，具備了RGB信息，還有對應的那個像素點在二維空間的“投影”。我們需要將這個點重新還原到三維空間。所以這裡需要使用到深度信息，而這個是需要一些專業攝像機才有的，比如iphone的，或者機器人的：

有了這個深度圖片，我們就可以還原坐標了。其實說到這兒，我們其實可以發現，RGB圖片的話其實只是提供了一下色彩如果要還原為3D點雲的話，因為深度圖和對應的rgb的矩陣的位置是一一對應的。

圖片加載

說到這裡，我們就可以開始我們的正式編碼了。

首先是咱們的圖片加載階段，這裡的話主要是我們先需要一些信息，和讀取圖片。


def __init__(self, rgb_file, depth_file, save_ply, camera_intrinsics=[784.0, 779.0, 649.0, 405.0]):
self.rgb_file = rgb_file
self.depth_file = depth_file
self.save_ply = save_ply
self.rgb = cv2.imread(rgb_file)
self.depth = cv2.imread(self.depth_file, -1)
print("your depth image shape is:",self.depth.shape)
self.width = self.rgb.shape[1]
self.height = self.rgb.shape[0]
self.camera_intrinsics = camera_intrinsics
self.depth_scale = 1000

下面的這個東西，就是咱們公式當中的u0,v0,dx,dy

camera_intrinsics=[784.0, 779.0, 649.0, 405.0]

這個要注意的是，這個玩意呢，是不同的設備相機有不同的參數，要根據自己的設置。

一般來說相機會給這樣的矩陣：

我們對應起來就可以了。

之後是要注意的是這個深度圖是uint16的，所以讀取的時候加個-1

這裡還有個東西：
深度圖與比例因子（scale_factor）
在代碼中是：

self.depth_scale = 1000

深度圖對應的尺度因子是深度圖中存儲的值與真實深度（單位為m）的比例
通常情況下，深度值以毫米為單位存儲在16位無符號整數中(0~65535)，因此要獲得以米為單位的z值，深度圖的像素需要除以比例因子1000。不過不同相機的的比例因子可能不同，我這裡的話是1000，這個看自己的實際的。

算法實現

之後就是咱們的算法實現了，這個主要是位置換算，得到真實的x,y,z


depth = np.asarray(self.depth, dtype=np.uint16).T
# depth[depth==65535]=0
self.Z = depth / self.depth_scale
fx, fy, cx, cy = self.camera_intrinsics
X = np.zeros((self.width, self.height))
Y = np.zeros((self.width, self.height))
for i in range(self.width):
X[i, :] = np.full(X.shape[1], i)
self.X = ((X - cx / 2) * self.Z) / fx
for i in range(self.height):
Y[:, i] = np.full(Y.shape[0], i)
self.Y = ((Y - cy / 2) * self.Z) / fy

之後,x,y,z就算好了。

注意的的是我們計算完成後得到的x，y , x 的每一組向量都是寬×高

生成點雲

現在咱們已經得到了具體的坐標。那麼接下來是要生成點雲的，我們要把剩下的色彩信息搞上去。
代碼很簡單：

 data_ply = np.zeros((6, self.width * self.height))
data_ply[0] = self.X.T.reshape(-1)
data_ply[1] = -self.Y.T.reshape(-1)
data_ply[2] = -self.Z.T.reshape(-1)
img = np.array(self.rgb, dtype=np.uint8)
data_ply[3] = img[:, :, 0:1].reshape(-1)
data_ply[4] = img[:, :, 1:2].reshape(-1)
data_ply[5] = img[:, :, 2:3].reshape(-1)
self.data_ply = data_ply
t2 = time.time()
print('calcualte 3d point cloud Done.', t2 - t1)

之後就是保存文件了。
目前點雲的主要存儲格式包括：pts、LAS、PCD、.xyz 和. pcap 等

例如：
pts 點雲文件格式是最簡便的點雲格式，直接按 XYZ 順序存儲點雲數據，可以是整型或者浮點型。
LAS 是激光雷達數據（LiDAR），存儲格式比 pts 復雜，旨在提供一種開放的格式標准，允許不同的硬件和軟件提供商輸出可互操作的統一格式。LAS 格式點雲截圖，其中 C：class(所屬類)，F：flight(航線號)，T：time(GPS 時間)，I：intensity(回波強度)，R：return(第幾次回波)，N：number of return(回波次數)，A：scan angle(掃描角)，RGB：red green blue(RGB 顏色值)。

等等，我們這裡的是ply
全名為多邊形檔案（Polygon File Format）或史丹佛三角形檔案（Stanford Triangle Format）。. 該格式主要用以儲存立體掃描結果的三維數值，透過多邊形片面的集合描述三維物體，與其他格式相較之下這是較為簡單的方法。. 它可以儲存的資訊包含顏色、透明度、表面法向量、材質座標與資料可信度，並能對多邊形的正反兩面設定不同的屬性。

格式為

頭部

頂點列表

面片列表

其他元素列表

 start = time.time()
float_formatter = lambda x: "%.4f" % x
points = []
for i in self.data_ply.T:
points.append("{} {} {} {} {} {} 0\n".format
(float_formatter(i[0]), float_formatter(i[1]), float_formatter(i[2]),
int(i[3]), int(i[4]), int(i[5])))
file = open(self.save_ply, "w")
file.write('''ply format ascii 1.0 element vertex %d property float x property float y property float z property uchar red property uchar green property uchar blue property uchar alpha end_header %s ''' % (len(points), "".join(points)))
file.close()
end = time.time()
print("Write into .ply file Done.", end - start)

點雲顯示

這個的話可以使用軟件

也可以使用open3d模塊

完整代碼

import cv2
import numpy as np
import open3d as o3d
import time
class point_cloud_generator():
def __init__(self, rgb_file, depth_file, save_ply, camera_intrinsics=[784.0, 779.0, 649.0, 405.0]):
self.rgb_file = rgb_file
self.depth_file = depth_file
self.save_ply = save_ply
self.rgb = cv2.imread(rgb_file)
self.depth = cv2.imread(self.depth_file, -1)
print("your depth image shape is:",self.depth.shape)
self.width = self.rgb.shape[1]
self.height = self.rgb.shape[0]
self.camera_intrinsics = camera_intrinsics
self.depth_scale = 1000
def compute(self):
t1 = time.time()
depth = np.asarray(self.depth, dtype=np.uint16).T
# depth[depth==65535]=0
self.Z = depth / self.depth_scale
fx, fy, cx, cy = self.camera_intrinsics
X = np.zeros((self.width, self.height))
Y = np.zeros((self.width, self.height))
for i in range(self.width):
X[i, :] = np.full(X.shape[1], i)
self.X = ((X - cx / 2) * self.Z) / fx
for i in range(self.height):
Y[:, i] = np.full(Y.shape[0], i)
self.Y = ((Y - cy / 2) * self.Z) / fy
data_ply = np.zeros((6, self.width * self.height))
data_ply[0] = self.X.T.reshape(-1)
data_ply[1] = -self.Y.T.reshape(-1)
data_ply[2] = -self.Z.T.reshape(-1)
img = np.array(self.rgb, dtype=np.uint8)
data_ply[3] = img[:, :, 0:1].reshape(-1)
data_ply[4] = img[:, :, 1:2].reshape(-1)
data_ply[5] = img[:, :, 2:3].reshape(-1)
self.data_ply = data_ply
t2 = time.time()
print('calcualte 3d point cloud Done.', t2 - t1)
def write_ply(self):
start = time.time()
float_formatter = lambda x: "%.4f" % x
points = []
for i in self.data_ply.T:
points.append("{} {} {} {} {} {} 0\n".format
(float_formatter(i[0]), float_formatter(i[1]), float_formatter(i[2]),
int(i[3]), int(i[4]), int(i[5])))
file = open(self.save_ply, "w")
file.write('''ply format ascii 1.0 element vertex %d property float x property float y property float z property uchar red property uchar green property uchar blue property uchar alpha end_header %s ''' % (len(points), "".join(points)))
file.close()
end = time.time()
print("Write into .ply file Done.", end - start)
def show_point_cloud(self):
pcd = o3d.io.read_point_cloud(self.save_ply)
o3d.visualization.draw([pcd])
if __name__ == '__main__':
camera_intrinsics = [378.998657, 378.639862, 321.935120, 240.766663]
rgb_file = "data/1.jpg"
depth_file = "data/1.png"
save_ply = "data.ply"
a = point_cloud_generator(rgb_file=rgb_file,
depth_file=depth_file,
save_ply=save_ply,
camera_intrinsics=camera_intrinsics
)
a.compute()
a.write_ply()
a.show_point_cloud()

效果如下：

總結

這裡的話其實還是從一張深度圖到3d點雲，實際上有時候我們可能只需一組坐標然後還原，那麼這部分的話，要實現的話也不難，不過要重新做一點轉換，公式還是那個公式。這裡的話就不能在多說了，而且這裡還有一點值得一提的是這個通過這種方式得到的3d點雲其實怎麼說呢，並不是特別准，所以這邊還是要更加牛批的人工智能算法的，目前有個集成玩意可以玩玩的是middlepipe。這方面的話Google還是得是他。