
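In just two weeks, ChatGPT has become the center of public conversation. More than a million people have "talked" with OpenAI's new chatbot, getting it to write poems and college essays, generate creative recipes, create virtual machines, and more. It has also been used to write news articles and introductions for YouTube videos.

As a machine learning engineer at a computer vision company, I have spent the past few days trying to find the limits of what ChatGPT knows about CV. I wanted to understand what this language model means for the future (and the present) of the field.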

This post is divided into 5 parts:

What is ChatGPT?
Where ChatGPT excels
Where it struggles
Where extreme caution is needed
Why ChatGPT can empower CV engineers

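What is ChatGPT?

Released on November 30, 2022, ChatGPT is OpenAI's latest product. Like GPT-1, GPT-2, GPT-3, and InstructGPT before it, ChatGPT is a Generative Pretrained Transformer (GPT) model, a type of language model built around the concept of "self-attention", which lets the model flexibly decide which parts of the input matter.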
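To make the idea of self-attention a little more concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. It is only an illustration: the weight matrices are random, the sizes (4 tokens, 8 features) are made up, and a real GPT model stacks many such layers with learned weights, multiple heads, and masking.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (tokens, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended values and the (tokens, tokens) weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, 8 features (made-up sizes)
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Row i of weights says how strongly token i attends to every token in the input, which is the sense in which the model "decides which parts of the input matter".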

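Large language models (LLMs) are trained on large amounts of text data (such as books and articles) to learn the patterns and structure of human language. This enables them to generate text that sounds natural, which makes them very effective for tasks such as language translation and question answering.

Over the past few years, LLMs have grown rapidly. Model size has increased exponentially: the first Transformer model, introduced in 2017, had 65 million parameters, whereas GPT-3, still a leading model as of mid-2021, has 175 billion. As these models grow, so does their expressive power. ChatGPT was built on an updated version of GPT-3 called GPT-3.5. This enormous expressive power, together with the data it was trained on (presumably similar to GPT-3's), is what makes ChatGPT so capable of understanding inputs that span a variety of human and programming languages, mathematical equations, and musical chords.

As a generative model, ChatGPT is specifically designed to create new text based on the patterns it has learned from its training data. The chatbot can not only respond promptly but also incorporate feedback from the user to better accomplish the desired task. It is not perfect, of course, but this capability opens the door to a range of interactive applications.

FYI: three sentences in this section were written by ChatGPT. I will let you guess which ones!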

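Where ChatGPT excels

As far as I can tell, there are three main categories of computer vision tasks where ChatGPT is fairly reliable: commonplace code, dressed individual method calls, and clean concatenations of simple components. ChatGPT's answers to queries in any of these three categories are generally satisfactory. I will give examples of each.

Commonplace code

This should come as no surprise, but a generative model trained on a large corpus that includes both text and code is generally very good at generating code blocks that appear frequently, and with little variation, across the internet. When a code-based solution is essentially canonical (and probably ubiquitous in the training data), ChatGPT's probabilistic nature means it will, with high probability, generate the tried-and-true result.

Two examples of this are the Fast Fourier Transform (FFT), which is ubiquitous in signal processing and computer vision (among other fields):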

import numpy as np

def fft(x):
    """Compute the discrete Fourier Transform of the 1D array x"""
    x = np.asarray(x, dtype=float)
    N = x.shape[0]

    if N <= 1:  # base case
        return x

    # split the array into even and odd components and transform each half
    even = fft(x[0::2])
    odd = fft(x[1::2])

    # apply the twiddle factors to the odd half and combine the two halves
    T = [np.exp(-2j*np.pi*k/N)*odd[k] for k in range(N//2)]
    return np.concatenate([even + T, even - T])
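One caveat worth spelling out: the listing above casts its input to float and halves the array at every level, so it only behaves as expected for real-valued inputs whose length is a power of two. The sketch below is my own variant, not ChatGPT's output. It accepts complex input, rejects other lengths explicitly, and checks the result against np.fft.fft. The name fft_radix2 is a hypothetical helper.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return x
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    # twiddle factors applied to the odd-indexed half
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n) * odd
    return np.concatenate([even + twiddle, even - twiddle])

signal = np.random.default_rng(0).normal(size=16)
matches_numpy = np.allclose(fft_radix2(signal), np.fft.fft(signal))
```

Comparing against a trusted library implementation like this is a cheap way to sanity-check any generated numerical routine before relying on it.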

and the pooling layer, a staple component of convolutional neural networks (CNNs):

def min_pooling(inputs, pool_size):
    # Get the dimensions of the input array
    n_rows, n_cols, n_channels = inputs.shape

    # Calculate the size of the output array
    out_rows = n_rows // pool_size
    out_cols = n_cols // pool_size

    # Create an empty array to store the output
    outputs = np.zeros((out_rows, out_cols, n_channels))

    # Iterate over the input array and compute the output
    for r in range(out_rows):
        for c in range(out_cols):
            for k in range(n_channels):
                # Extract a patch of the input array
                patch = inputs[r*pool_size:(r+1)*pool_size, c*pool_size:(c+1)*pool_size, k]

                # Compute the min of the patch
                outputs[r, c, k] = np.min(patch)

    return outputs

Of course, these implementations are not always efficient. The min_pooling() function above, for example, uses nested for loops. With a semi-black-box AI model like ChatGPT, it is not always clear whether such a solution was chosen deliberately for simplicity or whether the model genuinely considers it optimal. In such cases, you can take advantage of ChatGPT's feedback-oriented nature to request improvements. In the pooling case, for instance, you can instruct ChatGPT to parallelize the code or to vectorize the for loops. Knowing which improvements to ask for is absolutely essential!
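As an illustration of the kind of improvement one might ask for, here is a sketch of a vectorized min pooling in NumPy. It is my own rewrite rather than ChatGPT's answer, and the name min_pooling_vectorized is hypothetical. It assumes non-overlapping windows and, like the loop version, ignores trailing rows and columns that do not fill a whole window.

```python
import numpy as np

def min_pooling_vectorized(inputs, pool_size):
    """Non-overlapping min pooling over an (H, W, C) array without Python loops."""
    n_rows, n_cols, n_channels = inputs.shape
    out_rows, out_cols = n_rows // pool_size, n_cols // pool_size
    # Drop any trailing rows/columns that do not fill a whole window.
    trimmed = inputs[:out_rows * pool_size, :out_cols * pool_size]
    # Split each spatial axis into (window index, offset within window).
    windows = trimmed.reshape(out_rows, pool_size, out_cols, pool_size, n_channels)
    # Reduce over the two "offset within window" axes.
    return windows.min(axis=(1, 3))

x = np.arange(4 * 6 * 2, dtype=float).reshape(4, 6, 2)
pooled = min_pooling_vectorized(x, 2)
```

The reshape trick only works because the windows do not overlap. Strided or overlapping pooling would need a different approach.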

Dressed method calls

Likewise, ChatGPT has a good grasp of all the common machine learning and computer vision libraries (up to 2021), including PyTorch, TensorFlow, Scikit-learn, PIL, Skimage, and OpenCV. The chatbot is at its best when it can "dress up" a method call from one of these libraries with the appropriate preprocessing steps, such as input/output handling, converting a color image to grayscale, and reshaping arrays.

Some examples of this, from simplest to most complex, are sharpening an image:

from PIL import Image, ImageFilter

# Open the image
im = Image.open('image.jpg')

# Apply a sharpening filter
sharpened_im = im.filter(ImageFilter.SHARPEN)

# Save the sharpened image
sharpened_im.save('sharpened_image.jpg')

applying Otsu thresholding:

import numpy as np
from skimage import io
from skimage.filters import threshold_otsu

# Load the image
image = io.imread('image.png')

# Convert the image to grayscale
gray_image = np.mean(image, axis=2)

# Calculate the Otsu threshold
threshold = threshold_otsu(gray_image)

# Apply the threshold to the image to create a binary image
binary_image = gray_image > threshold
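One detail in the listing above deserves a second look. np.mean(image, axis=2) assumes a three-dimensional array, so it fails on an image that is already grayscale, and it averages an alpha channel in with the colors if the PNG has one. A slightly more defensive conversion might look like the sketch below. The helper name to_grayscale is hypothetical, and the ITU-R BT.601 luma weights are one common choice, not the only one.

```python
import numpy as np

def to_grayscale(image):
    """Return a 2D float array from a grayscale, RGB, or RGBA image array."""
    image = np.asarray(image, dtype=float)
    if image.ndim == 2:  # already single-channel
        return image
    rgb = image[..., :3]  # ignore an alpha channel if present
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return rgb @ weights

rgba = np.zeros((2, 2, 4))
rgba[..., 0] = 1.0  # pure red
rgba[..., 3] = 1.0  # fully opaque
gray = to_grayscale(rgba)
```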

and computing the t-distributed stochastic neighbor embedding (t-SNE) of a set of images:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE
from PIL import Image

def tsne(images, perplexity=30.0, learning_rate=200.0, n_components=2):
    # Convert the images to grayscale
    grayscale_images = [np.array(image.convert('L')) for image in images]

    # Stack the grayscale images into a single array
    stacked_images = np.stack(grayscale_images)

    # Flatten the stacked images into a single array
    flattened_images = stacked_images.reshape(len(images), -1)

    # Compute the pairwise distances between the flattened images
    distances = squareform(pdist(flattened_images, 'euclidean'))

    # Compute the low-dimensional embedding of the images using t-SNE
    tsne = TSNE(perplexity=perplexity, learning_rate=learning_rate, n_components=n_components, init='random')
    embedding = tsne.fit_transform(distances)

    return embedding

Unfortunately, ChatGPT is not proficient in, or even familiar with, newer CV libraries such as FiftyOne.

Concatenating simple components

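Although ChatGPT is certainly not a rational agent, the model has basic logical capabilities and can synthesize multiple (simple) inputs into a composite output: just ask ChatGPT a question of the form

"What do X and Y have in common?"

For example, I asked ChatGPT to do the following:

"Write code to resize two images to (100, 200), augment the dataset with 6 images, blend all the input images into one image, and then plot all the images in a grid. Write this prompt on the third plot."

Its solution was: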

import matplotlib.pyplot as plt
from skimage import io, transform
from skimage.util import random_noise

# load and resize images
img1 = io.imread("image1.png")
img1 = transform.resize(img1, (100, 200))
img2 = io.imread("image2.png")
img2 = transform.resize(img2, (100, 200))

# augment dataset with 6 additional images
augmented_imgs = []
for i in range(6):
    augmented_imgs.append(random_noise(img1))
    augmented_imgs.append(random_noise(img2))

# blend images
blended_img = img1 * 0.5 + img2 * 0.5
for img in augmented_imgs:
    blended_img += img * 0.5 / len(augmented_imgs)

# plot all images in a grid
fig, ax = plt.subplots(nrows=3, ncols=3)
ax[0, 0].imshow(img1)
ax[0, 1].imshow(img2)
ax[0, 2].imshow(blended_img)
for i in range(6):
    ax[1 + i // 3, i % 3].imshow(augmented_imgs[i])

# add text to third plot
ax[0, 2].set_title("Resized, augmented, and blended images")
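Two small things in the generated solution are easy to miss. The loop appends two noisy copies per iteration, so it produces twelve augmented images rather than six, and the blend weights add up to 1.5, so the blended image can leave the [0, 1] range. The sketch below shows one way to tighten both points. It uses random arrays as stand-ins for the two resized images and additive Gaussian noise as a stand-in augmentation, and blend_equal is a hypothetical helper.

```python
import numpy as np

def blend_equal(images):
    """Average a list of same-shaped float images so the weights sum to 1."""
    stack = np.stack([np.asarray(img, dtype=float) for img in images])
    return stack.mean(axis=0)

rng = np.random.default_rng(0)
img1 = rng.random((100, 200, 3))  # stand-ins for the two resized images
img2 = rng.random((100, 200, 3))

# Alternate between the two sources to get exactly six noisy copies.
augmented = [np.clip(src + rng.normal(0, 0.1, src.shape), 0, 1)
             for src in (img1, img2) * 3]

blended = blend_equal([img1, img2] + augmented)
```

Because every input lies in [0, 1] and the weights sum to one, the blend stays in [0, 1] as well.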

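Other pleasant surprises

It was also able to complete other tasks without any trouble, including:

1. Performing segmentation on a 3D point cloud (using DBSCAN)

2. Restoring the color of old images with DeOldify

3. Performing object tracking in video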

Where it struggles

After going back and forth with ChatGPT on computer vision tasks, a few problems kept coming up: long-tail scenarios, mathy manipulations, and expansive code blocks. As in the previous section, I will give an example of each.

Long tail

In any field, there are a variety of tasks that are staples of certain subfields but are dwarfed by more common topics in the vast corpora used to train LLMs. ChatGPT has its fair share of trouble in these areas, and it is very sensitive to detail when prompted on niche topics.

One example is 3D computer vision, a small but strong subfield of computer vision that deals with three-dimensional spatial data. ChatGPT has a working knowledge of point clouds:

"A point cloud is a set of points in 3D space that represents the surface of an object. Each point in a point cloud is defined by its 3D coordinates (x, y, and z) and may also include other information, such as color or a normal vector. Point clouds are commonly used in computer vision and 3D modeling applications."

When I asked it to "write code to define a PointNet architecture that acts on point clouds", its reply included the symmetric max pooling function that allows the model to act directly on unstructured point cloud data.

import tensorflow as tf

# Define the input layer
input_layer = tf.keras.layers.Input(shape=(None, 3))

# Use a shared MLP to process the input point cloud
mlp1 = tf.keras.layers.Dense(64, activation="relu")
mlp2 = tf.keras.layers.Dense(64, activation="relu")
mlp3 = tf.keras.layers.Dense(64, activation="relu")
x = mlp1(input_layer)
x = mlp2(x)
x = mlp3(x)

# Use max pooling to get the features of the entire point cloud
global_features = tf.keras.layers.GlobalMaxPooling1D()(x)

# Use a fully connected layer to predict the output
output_layer = tf.keras.layers.Dense(units=1, activation=None)(global_features)

# Create the model
model = tf.keras.Model(inputs=input_layer, outputs=output_layer)

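"This code defines a PointNet architecture that takes a point cloud as input and uses a shared multi-layer perceptron (MLP) to process each point independently. The MLP output for each point is then passed through a global max pooling layer, which extracts the features of the entire point cloud. Finally, a fully connected layer predicts the output from the global features of the point cloud.

This is only one possible implementation of the PointNet architecture, and many different variations and modifications can be made to the architecture depending on the specific needs of the project."

When I prompted ChatGPT only to "write code for the PointNet model architecture" (omitting the explicit mention of point clouds), I got the following response: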

import torch
from torch import nn
import torch.nn.functional as F

class PointNet(nn.Module):
  def __init__(self):
    super(PointNet, self).__init__()

    # define the neural network layers
    self.fc1 = nn.Linear(3, 64)
    self.fc2 = nn.Linear(64, 64)
    self.fc3 = nn.Linear(64, 64)
    self.fc4 = nn.Linear(64, 128)
    self.fc5 = nn.Linear(128, 1024)
    self.fc6 = nn.Linear(1024, 512)
    self.fc7 = nn.Linear(512, 256)
    self.fc8 = nn.Linear(256, 128)
    self.fc9 = nn.Linear(128, 64)
    self.fc10 = nn.Linear(64, 10)

  def forward(self, x):
    # apply the neural network layers to the input point cloud
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = F.relu(self.fc3(x))
    x = F.relu(self.fc4(x))
    x = F.relu(self.fc5(x))
    x = F.relu(self.fc6(x))
    x = F.relu(self.fc7(x))
    x = F.relu(self.fc8(x))
    x = F.relu(self.fc9(x))
    x = self.fc10(x)
    return x

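"This code defines a PointNet class that extends nn.Module. The PointNet class defines a neural network with 10 fully connected layers, which are defined in the class's __init__ method. The forward method applies these layers to the input point cloud and returns the output of the model.

To use the PointNet model, we would create an instance of the PointNet class and then call the forward method on an input point cloud. The output of the model will be a tensor of shape (1, 10) containing the model's predictions for the input point cloud."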
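The difference between the two responses comes down to the symmetric pooling step. The NumPy sketch below is a toy illustration, not PointNet itself, with random weights and made-up layer sizes. It shows that a shared MLP alone returns one row per point, whereas a max over the point axis yields a single fixed-size vector that does not depend on the order of the points.

```python
import numpy as np

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 32))  # toy weights

def per_point_features(points):
    """Shared MLP: the same two layers are applied to every point independently."""
    hidden = np.maximum(points @ w1, 0.0)  # ReLU
    return np.maximum(hidden @ w2, 0.0)    # shape (num_points, 32)

def global_feature(points):
    """Symmetric max pooling over the point axis gives one vector per cloud."""
    return per_point_features(points).max(axis=0)

cloud = rng.normal(size=(128, 3))
shuffled = rng.permutation(cloud)  # same points, different order

without_pooling = per_point_features(cloud)  # one row per point
with_pooling = global_feature(cloud)         # one vector for the whole cloud
order_invariant = np.allclose(with_pooling, global_feature(shuffled))
```

Seen this way, a stack of ten linear layers with no pooling produces an output for each point, so the claimed (1, 10) output shape would only hold for a cloud containing a single point.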

Math manipulations get messy

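While ChatGPT can solve a surprising number of standalone math problems, and can even generate its own proofs, the chatbot struggles to string together multiple lines of mathematical code. For example, when asked to implement Otsu thresholding or the Hough transform without using computer vision libraries, ChatGPT fails.

Otsu thresholding: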

def otsus_method(img):
    # calculate histogram of grayscale values
    histogram = np.zeros(256)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            histogram[img[i, j]] += 1

    # normalize histogram
    histogram = histogram / img.size

    # calculate cumulative sum of normalized histogram
    cumulative_sum = np.cumsum(histogram)

    # calculate cumulative sum of squares of normalized histogram
    cumulative_sum_of_squares = np.cumsum(histogram ** 2)

    # initialize variables
    optimal_threshold = 0
    max_variance = 0

    # iterate over all possible threshold values
    for i in range(256):
        # calculate background and foreground probabilities
        p_background = cumulative_sum[i]
        p_foreground = 1 - p_background

        # calculate means
        mean_background = np.sum(np.arange(256) * histogram[:i+1]) / p_background
        mean_foreground = np.sum(np.arange(i+1, 256) * histogram[i+1:]) / p_foreground

        # calculate between-class variance
        variance = p_background * p_foreground * (mean_background - mean_foreground) ** 2

        # update optimal threshold if necessary
        if variance > max_variance:
            max_variance = variance
            optimal_threshold = i

    return optimal_threshold

Running the Otsu thresholding code generated by ChatGPT throws an error.
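In the listing above, the expression np.arange(256) * histogram[:i+1] pairs a length-256 array with a shorter slice, which is where the shapes stop lining up. For comparison, here is a sketch of a NumPy-only Otsu threshold that avoids the per-threshold slicing altogether by working with cumulative sums. It is my own version under the assumption of a 2D uint8 image, not a corrected ChatGPT answer, and otsu_threshold is a hypothetical name.

```python
import numpy as np

def otsu_threshold(img):
    """Return the Otsu threshold of a 2D uint8 image (NumPy only)."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)                 # P(pixel <= t)
    w1 = 1.0 - w0                        # P(pixel > t)
    cum_mean = np.cumsum(prob * levels)  # sum of level * prob up to t
    total_mean = cum_mean[-1]

    # Between-class variance for every t at once; thresholds that leave
    # one class empty keep a variance of 0.
    valid = (w0 > 0) & (w1 > 0)
    variance = np.zeros(256)
    variance[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(variance))

img = np.zeros((10, 10), dtype=np.uint8)
img[:, 5:] = 200  # half dark, half bright
threshold = otsu_threshold(img)
binary = img > threshold
```

The variance expression is the usual between-class variance w0 * w1 * (mean0 - mean1)^2 rewritten in terms of cumulative sums, which also sidesteps the division by zero that the loop version risks when one class is empty.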

Hough transform:

def hough_transform(img, theta_resolution=1, rho_resolution=1):
    # convert image to grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # detect edges in the image using Canny edge detection
    edges = cv2.Canny(img, 50, 200)

    # create accumulator array with dimensions determined by theta and rho resolutions
    theta = np.linspace(-90, 90, 180 // theta_resolution)
    rho = np.linspace(-np.sqrt(2) * img.shape[0], np.sqrt(2) * img.shape[0], 2 * np.sqrt(2) * img.shape[0] // rho_resolution)
    accumulator = np.zeros((len(theta), len(rho)))

    # find non-zero edge pixels in the image
    y_idxs, x_idxs = np.nonzero(edges)

    # convert x and y indices to cartesian coordinates
    points = np.stack([x_idxs, y_idxs], axis=1)

    # iterate over all edge points
    for x, y in points:
        # iterate over all theta values
        for i, t in enumerate(theta):
            # calculate rho for current theta value
            r = x * np.cos(t) + y * np.sin(t)

            # find closest rho value and increment corresponding accumulator bin
            j = np.argmin(np.abs(rho - r))
            accumulator[i, j] += 1

    return accumulator, theta, rho

Running the Hough transform code generated by ChatGPT raises an error.
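Two likely culprits stand out in the listing above: the bin count passed to np.linspace is a float, and theta is generated in degrees but passed straight to np.cos and np.sin, which expect radians. The sketch below is a NumPy-only accumulator written under my own assumptions: the input is already a 2D boolean edge map, and hough_lines and its parameters are hypothetical names. It is meant as an illustration, not as a drop-in replacement for a library routine.

```python
import numpy as np

def hough_lines(edges, theta_step_deg=1.0, rho_step=1.0):
    """Accumulate Hough votes for a 2D boolean edge map (NumPy only).

    Returns (accumulator, thetas_in_radians, rhos).
    """
    edges = np.asarray(edges, dtype=bool)
    height, width = edges.shape
    diag = float(np.hypot(height, width))  # largest possible |rho|

    thetas = np.deg2rad(np.arange(-90.0, 90.0, theta_step_deg))
    n_rho = int(np.ceil(2 * diag / rho_step)) + 1  # bin count must be an int
    rhos = np.linspace(-diag, diag, n_rho)

    ys, xs = np.nonzero(edges)
    # rho for every (edge pixel, theta) pair in one broadcast expression
    r = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)
    rho_idx = np.rint((r + diag) / (2 * diag) * (n_rho - 1)).astype(int)

    accumulator = np.zeros((len(thetas), n_rho), dtype=int)
    theta_idx = np.broadcast_to(np.arange(len(thetas)), rho_idx.shape)
    np.add.at(accumulator, (theta_idx, rho_idx), 1)  # handles repeated bins
    return accumulator, thetas, rhos

edges = np.zeros((20, 20), dtype=bool)
edges[:, 7] = True  # a vertical line at x = 7
acc, thetas, rhos = hough_lines(edges)
t_best, r_best = np.unravel_index(np.argmax(acc), acc.shape)
```

For the synthetic vertical line, the strongest accumulator cell should sit near theta = 0 and rho = 7, up to the bin resolution.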

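In many cases, the high-level logical flow is not necessarily wrong, or at least not far off. Instead, a detail on one line of code conflicts with the code around it: the Otsu thresholding code failed when it tried to broadcast together operands with incompatible shapes.

Some of these mathematical errors are easy to correct, whether by converting an int to a float or by rewriting a line of code. Other times, these surface-level errors can mask deeper problems. If you plan to use ChatGPT for these kinds of problems, proceed with caution. The more modular you can make the code you ask ChatGPT to write, the better.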

Expansive solutions breed errors

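This may not be too surprising, but the more code is generated, the more likely it is to contain an error (or several). After all, one could easily argue that humans are the same. For Transformer models, a well-known problem is that coherence (and model performance) degrades as the number of blocks in the output increases. Blocks containing computer vision code are no exception.

For example, when I asked ChatGPT to "implement optical character recognition (OCR) from scratch", the model failed irredeemably. The failure was less a series of understandable mathematical errors than complete chaos.

Of course, this does not mean that ChatGPT cannot perform OCR.

When I prompted the chatbot to "apply optical character recognition", it showed that it at least knows how to use the relevant methods in pytesseract: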

import cv2
import pytesseract

# Load the license plate image
image = cv2.imread("license_plate.png")

# Convert the image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply Otsu's thresholding to binarize the image
threshold, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)

# Use Tesseract to perform OCR on the license plate image
text = pytesseract.image_to_string(binary, lang="eng")

# Print the recognized text
print(text)

This problem often surfaces when ChatGPT builds neural networks in PyTorch, TensorFlow, or Sklearn, where it may string together a series of layers in a way that cannot accomplish any useful task.

Where extreme caution is needed

The previous three categories only scratch the surface of what can go wrong when using ChatGPT for computer vision. In some areas, the AI can also give you answers that are wildly wrong.

When I asked ChatGPT to give me "the inference times and years of the 10 best object detection models", I got the following response:

# Define the names and years of the 10 best object detection models
model_names = ["Faster R-CNN", "YOLOv3", "SSD", "RetinaNet", "Mask R-CNN", "R-FCN", "SSD with FPN", "YOLOv4", "Grid R-CNN", "FCOS"]
model_years = [2015, 2018, 2016, 2017, 2017, 2016, 2017, 2020, 2019, 2019]

# Define the inference times of the 10 best object detection models (in milliseconds)
model_inference_times = [100, 30, 20, 40, 60, 50, 25, 15, 35, 45]

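At a glance, this seems fairly plausible. The models are all well known in object detection, and "best" is somewhat subjective anyway. The years also check out. Where all of this falls apart is the inference times. These times come from different sources, each with different conditions and assumptions. ChatGPT's response gives the impression that they can be compared apples to apples, which is plainly wrong.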
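To be comparable at all, inference times need to be measured under the same conditions: same hardware, same input size, same batch size, and the same timing protocol. As a rough sketch of what such a protocol might look like, the snippet below times two stand-in functions on the same input with warm-up runs and a median over repeats. model_a and model_b are hypothetical placeholders rather than real detectors, and time_inference is an assumed helper name.

```python
import statistics
import time

def time_inference(predict, inputs, warmup=3, repeats=10):
    """Median wall-clock time in milliseconds for predict(inputs)."""
    for _ in range(warmup):  # discard cold-start effects
        predict(inputs)
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(inputs)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings_ms)

# Hypothetical stand-ins for two detectors, run on the same input.
def model_a(batch):
    return [sum(row) for row in batch]

def model_b(batch):
    return [max(row) for row in batch]

batch = [[float(i + j) for j in range(256)] for i in range(256)]
results = {name: time_inference(fn, batch)
           for name, fn in {"model_a": model_a, "model_b": model_b}.items()}
```

Numbers collected this way are only meaningful relative to each other and to the machine they were measured on, which is exactly the context a bare list of milliseconds leaves out.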

Asking a vague question about inference times is a poor start. But even better questions can produce wrong answers.

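Likewise, whenever ChatGPT produces specific numerical values that were not explicitly given as examples, you should probably be careful. Sometimes these values are appropriate, such as when it knows to normalize AlexNet inputs with the given means and standard deviations:

transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

When I asked ChatGPT to perform depth perception using stereo images, it included a line of code that read:

# Calculate the depth of each pixel
depth = 9.0 * 0.54 / (disparity + 0.00001)

When asked why it chose these values, it responded:

"This formula uses known parameters of the stereo camera system, such as the baseline (the distance between the two cameras) and the focal length, to convert the disparity value (the difference in the horizontal position of a pixel in the left and right images) into a depth value.

The specific values used in this formula (9.0 and 0.54) are just examples, and they may not apply to all stereo camera systems. In general, the formula for computing depth from disparity is:

depth = baseline * focal_length / (disparity + epsilon)

where baseline is the distance between the two cameras, focal_length is the focal length of the cameras, disparity is the disparity map computed from the stereo images, and epsilon is a small value used to avoid division by zero. The values of these parameters will depend on the specific stereo camera system you are using."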
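The general formula in that answer is easy to wrap in a small function whose parameters must be supplied by the caller rather than hard-coded. The sketch below assumes a rectified stereo pair, a baseline in metres, and a focal length and disparity both expressed in pixels. The calibration numbers in the usage lines are made up for illustration.

```python
import numpy as np

def depth_from_disparity(disparity, baseline_m, focal_length_px, epsilon=1e-5):
    """Convert a disparity map (pixels) to depth (metres) for a rectified stereo pair."""
    disparity = np.asarray(disparity, dtype=float)
    # epsilon only guards against division by zero where disparity is 0
    return baseline_m * focal_length_px / (disparity + epsilon)

# Made-up calibration values for illustration only.
disparity = np.array([[10.0, 20.0], [40.0, 80.0]])
depth = depth_from_disparity(disparity, baseline_m=0.5, focal_length_px=800.0)
```

Doubling the disparity halves the depth, which is a quick sanity check for any implementation of this relationship.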

Another area of concern is the evaluation of dataset and model quality. When asked to write code to "evaluate the quality and diversity of an image dataset", ChatGPT gave the following:

import os
import numpy as np
from PIL import Image
from sklearn.neighbors import NearestNeighbors

# Define the path to the dataset directory
dataset_dir = './dataset'

# Load the images in the dataset
images = []
for filename in os.listdir(dataset_dir):
    # Load the image
    image = Image.open(os.path.join(dataset_dir, filename))

    # Convert the image to grayscale and resize it to a fixed size
    image = image.convert('L').resize((128, 128))

    # Add the image to the list of images
    images.append(image)

# Flatten the images into a single array
flattened_images = np.stack([np.array(image) for image in images]).reshape(len(images), -1)

# Compute the nearest neighbors of the flattened images
nbrs = NearestNeighbors(n_neighbors=5).fit(flattened_images)
distances, indices = nbrs.kneighbors(flattened_images)

# Compute the mean and standard deviation of the distances to the nearest neighbors
mean_distance = np.mean(distances[:, 1:])
std_distance = np.std(distances[:, 1:])

print('Mean distance to nearest neighbor:', mean_distance)
print('Standard deviation of distances:', std_distance)

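It also included an explanation:

"In this example, the images variable is a list of the images in the dataset. The images are first loaded from files on disk, then converted to grayscale and resized to a fixed size using the convert and resize methods from the PIL.Image module. The images are then flattened into a single array using the stack and reshape methods from the numpy module. The nbrs variable computes the nearest neighbors of the images using the NearestNeighbors class from the sklearn.neighbors module. The mean_distance and std_distance variables compute the mean and standard deviation of the distances to the nearest neighbors, which can serve as a measure of the quality and diversity of the dataset. A lower mean distance and a higher standard deviation indicate a higher-quality and more diverse dataset."

While this approach is not necessarily wrong, or technically incorrect, it is at the very least overly simplistic.

When I asked ChatGPT to compute the distribution shift in an image dataset, I got a similarly disappointing answer, which used a t-test on raw pixel values (a test that only compares the means of two groups of samples). You can also see that ChatGPT tries to compute the mean squared error between the two sets of images, which only makes sense if the images are the same size, but we will give the chatbot the benefit of the doubt here and move on.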

# Import necessary libraries
import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import mean_squared_error

# Load the original and new datasets
original_data = ...
new_data = ...

# Calculate the mean squared error between the original and new images
mse = mean_squared_error(original_data, new_data)

# Use a t-test to compare the means of the original and new datasets
t, p = ttest_ind(original_data, new_data)

# Print the results
print("Mean squared error:", mse)
print("T-test results: t =", t, "p =", p)

# If the mean squared error is large and the t-test p-value is small,
# this indicates a significant distribution shift in the data

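Computing distribution shift in machine learning is a hard problem in general, but at least a basic understanding of the various types of possible shift is standard, and there are many approaches more nuanced than applying a t-test.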
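To illustrate why comparing means is not enough, the sketch below builds two synthetic sets of pixel values with the same mean but very different spread, then compares the gap between their means with a two-sample Kolmogorov-Smirnov statistic computed directly in NumPy. The data are simulated, ks_statistic is a hypothetical helper, and the KS statistic is just one of many possible choices, not a recommendation for image data in general.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
original = rng.normal(loc=0.5, scale=0.05, size=5000)  # low-contrast pixel values
shifted = rng.normal(loc=0.5, scale=0.25, size=5000)   # same mean, more spread

mean_gap = abs(original.mean() - shifted.mean())
ks_same = ks_statistic(original, original)
ks_shift = ks_statistic(original, shifted)
```

Here the gap between the means is tiny, so a test on means has little to work with, while the gap between the empirical distributions is large.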

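One final example is illuminating: I prompted ChatGPT to "compute how mistaken my model is given a set of ground truths, predictions, and prediction confidence scores". The crux of its answer was "you can use a metric like accuracy or precision to measure how well your model performs", which completely misses the point and makes no use of the confidence scores at all. For reference, in the open-source computer vision library FiftyOne, this type of quantity can be computed for a dataset named dataset by calling dataset.compute_mistakenness().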
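To show what "using the confidence scores" could mean in practice, here is a deliberately simple sketch of a per-sample score that is high when the model is confidently wrong and low when it is confidently right. This is my own illustration and is not FiftyOne's mistakenness algorithm. confidence_weighted_error is a hypothetical name, and the labels and confidences are made up.

```python
import numpy as np

def confidence_weighted_error(ground_truth, predictions, confidences):
    """Per-sample score in [0, 1]: high when the model is confidently wrong."""
    ground_truth = np.asarray(ground_truth)
    predictions = np.asarray(predictions)
    confidences = np.asarray(confidences, dtype=float)
    correct = ground_truth == predictions
    # A wrong answer costs its confidence; a right answer costs its doubt.
    return np.where(correct, 1.0 - confidences, confidences)

truth = ["cat", "dog", "cat", "bird"]
preds = ["cat", "cat", "cat", "dog"]
conf = [0.9, 0.95, 0.55, 0.6]

scores = confidence_weighted_error(truth, preds, conf)
accuracy = float(np.mean(np.asarray(truth) == np.asarray(preds)))
```

Accuracy collapses these four samples into a single number, while the per-sample scores single out the second sample, a wrong prediction made with 0.95 confidence, as the one most worth inspecting.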

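Why ChatGPT can empower CV engineers

Yet despite these shortcomings, ChatGPT can still be a force in computer vision. At least, that is one person's opinion.

You may have noticed a theme in the examples I have included. ChatGPT is a tool that can help computer vision engineers and practitioners do their work more efficiently. It is a very powerful tool, and it can also be quite "dangerous". Like any tool, how good it is depends on how it is used.

From a programming perspective, the principles that produce good machine-generated code are the same ones behind good human engineering: aim for modular design, break high-level tasks into parts where possible, and use existing libraries rather than reinventing the wheel.

Used properly, ChatGPT can help CV engineers write CV code faster, freeing people to focus on important tasks such as building high-quality datasets and evaluating machine learning models, using toolsets from companies such as Scale, V7, and Voxel51.

Computer vision toolsets

FiftyOne is an open-source machine learning toolset developed by Voxel51 that enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.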

· END ·


ChatGPT: The New CV Engineer
