<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

PyTorch Programming: Learning PyTorch with Examples


          2021-01-29 05:13

Before introducing PyTorch, we will first implement the network using numpy.

Numpy provides an n-dimensional array object, along with many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, deep learning, or gradients. However, we can easily use numpy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using numpy operations:

# -*- coding: utf-8 -*-
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

PyTorch: Tensors

Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy is not sufficient for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.

Unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on a GPU, you simply need to place it on the GPU device.
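
For instance (a minimal sketch, not part of the original example code), a Tensor can be created directly on the GPU or moved there afterwards:

import torch

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    a = torch.randn(3, 3, device=device)  # created directly on the GPU
    b = torch.randn(3, 3).to(device)      # created on the CPU, then moved to the GPU
    c = a.mm(b)                           # the matrix multiply runs on the GPU
    print(c.device)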

Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we need to manually implement the forward and backward passes through the network:

# -*- coding: utf-8 -*-

import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Autograd

PyTorch: Tensors and autograd

In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it can quickly get very hairy for large complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

This sounds complicated, but it is pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor that has x.requires_grad=True, then x.grad is another Tensor holding the gradient of x with respect to some scalar value.
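
As a quick illustration (a minimal sketch, not part of the original tutorial code), here is autograd applied to a tiny scalar expression:

import torch

# A leaf Tensor that autograd will track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar built from x; the forward pass records the computational graph.
y = (x ** 2).sum()

# Backpropagate: this populates x.grad with dy/dx = 2 * x.
y.backward()
print(x.grad)  # tensor([2., 4., 6.])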

Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:

# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,)
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value and computes the gradient of the input Tensors with respect to that same scalar value.

In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use our new autograd operator by constructing an instance and calling it like a function, passing Tensors containing input data.

In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network:

# -*- coding: utf-8 -*-
import torch

class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

nn module

PyTorch: nn

Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.

When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.

In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
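
As a quick illustration of the Module interface (a minimal sketch, not part of the original example code), a single nn.Linear layer can be used on its own:

import torch

# A Linear Module holds its learnable weight and bias Tensors internally.
layer = torch.nn.Linear(3, 2)

# Calling the Module on an input Tensor produces an output Tensor.
out = layer(torch.randn(5, 3))
print(out.shape)           # torch.Size([5, 2])
print(layer.weight.shape)  # torch.Size([2, 3])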

In this example we use the nn package to implement our two-layer network:

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

PyTorch: optim

Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters (with torch.no_grad() or .data to avoid tracking history in autograd). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers like AdaGrad, RMSProp, Adam, and so on.

The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
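
Because an optimizer only needs to be told which Tensors to update, switching algorithms is a one-line change. A minimal sketch (the single Linear layer here is just a stand-in model, not part of the original example):

import torch

model = torch.nn.Linear(10, 1)

# Any of these constructors can be swapped in without touching the training loop:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)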

In this example we will use the nn package to define our model as before, but we will optimize the model with the Adam algorithm provided by the optim package:

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

PyTorch: Custom nn Modules

Sometimes you will want to specify models that are more complex than a sequence of existing Modules; for these cases you can define your own Modules by subclassing nn.Module and defining a forward which receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.

In this example we implement our two-layer network as a custom Module subclass:

# -*- coding: utf-8 -*-
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

PyTorch: Control Flow + Weight Sharing

As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully-connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.

For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.

We can easily implement this model as a Module subclass:

# -*- coding: utf-8 -*-
import random
import torch

class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

