Understanding and Building GANs with PyTorch


Reference link: https://becominghuman.ai/understanding-and-building-generative-adversarial-networks-gans-8de7c1dc0e25

Original title: Understanding and building Generative Adversarial Networks(GANs)- Deep Learning with PyTorch


We’ll be building a Generative Adversarial Network that will be able to generate images of birds that never actually existed in the real world.

[Image: bird images generated purely by the deep learning model (GAN)]

Before we actually start building a GAN, let us first talk about the idea behind GANs. GANs were invented by Ian Goodfellow, who obtained his B.S. and M.S. in computer science from Stanford University and his Ph.D. in machine learning from the Université de Montréal. They are one of the most talked-about ideas in deep learning right now. Yann LeCun, the director of Facebook AI, said:

“Generative Adversarial Networks is the most interesting idea in the last ten years in Machine Learning.”

What are GANs? What are they used for?

Neural networks are good at classifying and predicting things. AI researchers wanted to make neural nets more human-like by allowing them to CREATE rather than just perceive, and Ian Goodfellow succeeded in inventing a class of deep learning model that can do that.

How do GANs work?

GANs contain two separate neural networks. Let us call one network “G”, which stands for Generator, and the other “D”, which stands for Discriminator. The Generator first generates random images; the Discriminator sees those images and tells the Generator how real they look.


Generator:

In the starting phase, the Generator takes random noise as input and produces a random, noisy image as output. Gradually, with the help of the Discriminator, it starts generating images of a particular class that look real.

Discriminator:

The Discriminator, the opponent of the Generator, is fed both the generated images and real images of a certain class, which lets it tell the Generator what real images look like.

After a certain point, the Discriminator is no longer able to tell whether a generated image is real or fake. That is when our Generator produces images of a certain class (the class the Discriminator was trained on) that never actually existed before.

Applications of GANs:

  • Super Resolution.

  • Assisting Artists.

  • Element Abstraction.

On to the code

NOTE: The explanation of the code below is not written for novice deep learning programmers; I expect you to be comfortable with deep learning idioms in Python.

Let us start by importing all the Python libraries required for building our GAN. Please make sure PyTorch is installed on your computer before you start.

# Importing required libraries
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
from torch.autograd import Variable  # deprecated no-op wrapper since PyTorch 0.4; kept to match the original code

Now let us set the hyperparameters, which in this case are the batch size and the image size:

# Setting hyperparameters
batchSize = 64  # number of images per training batch
imageSize = 64  # images are 64 x 64 pixels

In the first line, we set the batch size to 64. In the second line, we set the image size to 64 x 64, which is both the resolution the real images are resized to and the resolution of the images produced by the generator.


Then we create an object that performs the image transformations:

# Creating the transformations
transform = transforms.Compose([transforms.Resize(imageSize),  # named transforms.Scale in old torchvision releases
                                transforms.ToTensor(), 
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),])

These transformations make the images compatible with the input of the discriminator's neural network: they resize each image to imageSize, convert it to a tensor, and normalize each channel.
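
To make the effect of the normalization step concrete, here is a minimal plain-Python sketch of the per-channel arithmetic. It assumes ToTensor has already scaled the pixel values to [0, 1]; the helper name normalize_pixel is hypothetical and not part of torchvision.

```python
# Sketch of what Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) does to each
# channel value: output = (input - mean) / std.
def normalize_pixel(value, mean=0.5, std=0.5):
    """Map a pixel value in [0, 1] to the range [-1, 1]."""
    return (value - mean) / std

samples = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize_pixel(v) for v in samples]
print(normalized)
```

The resulting range of -1 to +1 is the same range that the Tanh at the end of the generator produces, so real and generated images live on the same scale.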


NOTE: To get the dataset, go to https://github.com/venkateshtata/GAN_Medium.git, clone that repository onto your local system, and replace the dcgan.py file with the Python file you are writing. The data folder contains the dataset.


Now let's load our dataset from its directory. The dataset we are using here is CIFAR-10, and we load it in batches. Keep the Python file you are writing in the same directory as the data folder so that the relative path ./data resolves without extra work.

# Loading the dataset
dataset = dset.CIFAR10(root = './data', download = True, transform = transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size = batchSize, shuffle = True, num_workers = 2)

We download the training set into the ./data folder and apply the previous transformations to each image. Then we use the DataLoader to get the images of the training set batch by batch. Most of the code above is self-explanatory; num_workers sets the number of worker subprocesses used to load the training data.
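
As a rough illustration of what batch-by-batch loading means in numbers, the sketch below computes how many batches one epoch contains. It assumes the CIFAR-10 training split holds 50,000 images and that the DataLoader keeps the final partial batch, which is its default behaviour.

```python
import math

# Assumption: the CIFAR-10 training split holds 50,000 images.
num_images = 50_000
batch_size = 64

# Number of batches per epoch, i.e. the value len(dataloader) reports.
num_batches = math.ceil(num_images / batch_size)
# Size of the final, partial batch.
last_batch = num_images - (num_batches - 1) * batch_size
print(num_batches, last_batch)
```

Because the last batch is smaller than batchSize, the training code reads the batch size from input.size()[0] instead of assuming 64.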


As we are dealing with two neural networks here, we define one shared function that initialises the weights of whichever neural network (NN) is passed to it.

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

The weights_init function above takes a module m as input and initialises its weights. It is not called on every training iteration: we pass it once to the apply method of each network, which calls it on every sub-module (layer) of that network.
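
The name-based dispatch inside weights_init can be tried without PyTorch. In the sketch below, the empty classes are stand-ins that only mimic the layer class names, and init_rule is a hypothetical helper that reports which branch a layer would take.

```python
# Stand-in classes: they only share their names with the real PyTorch layers.
class Conv2d: pass
class ConvTranspose2d: pass
class BatchNorm2d: pass
class ReLU: pass
class Tanh: pass

def init_rule(module):
    """Return a description of the branch weights_init would take."""
    name = module.__class__.__name__
    if name.find('Conv') != -1:
        return 'weights ~ normal(0.0, 0.02)'
    elif name.find('BatchNorm') != -1:
        return 'weights ~ normal(1.0, 0.02), bias = 0'
    return 'left unchanged'

layers = [ConvTranspose2d(), BatchNorm2d(), ReLU(), Conv2d(), Tanh()]
rules = {type(layer).__name__: init_rule(layer) for layer in layers}
for name, rule in rules.items():
    print(name, '->', rule)
```

Note that 'Conv' also matches ConvTranspose2d, so the layers of the generator and the discriminator are both covered by the first branch.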


img

Our first big step will be to define a class for our Generator neural network. We’ll start by creating a class that will be holding the architecture of the Generator, which will basically contain a sequence of layers that each input undergoes.

class G(nn.Module):
    def __init__(self):
        super(G, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(100, 512, 4, 1, 0, bias = False),
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias = False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias = False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias = False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias = False),
            nn.Tanh()
        )

Breaking down the above code:

  • We have created a class ‘G’, referring to the Generator neural network, which inherits from nn.Module. nn.Module contains all the tools required for building neural networks and lets us place different layers and connections inside a given network.
  • Then we create a meta module (nn.Sequential) that contains a sequence of modules such as convolutions, full connections, etc.
  • A great thing to observe from Fig 1.0 above is that the network structures of the Generator and the Discriminator mirror each other. In the Generator, the convolutions therefore run in the inverse direction, and the input is a random noise vector. Hence we start with an inverse (transposed) convolution using ConvTranspose2d.
  • Then we normalize all the features along the batch dimension and apply a ReLU rectification to break the linearity. See the PyTorch documentation for a more detailed explanation of the parameters used in these functions.
  • We repeat the above operations, changing the number of input channels from 100 to 512 and the number of feature maps from 512 to 256, and keeping the bias as False. [Note: The values I am choosing in the above code are choices made by researchers.]
  • In the final ConvTranspose2d we output 3 channels, as the output image of the generator is a 3-channel (RGB) image, and we apply a Tanh activation to break the linearity and keep the values between -1 and +1.
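
To see how these layers turn a 1 x 1 noise input into a 64 x 64 image, the following sketch applies the usual transposed-convolution size formula, out = (in - 1) * stride - 2 * padding + kernel, assuming no dilation and no output padding (the defaults). The helper name conv_transpose_out is hypothetical.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# (out_channels, kernel, stride, padding) of the five ConvTranspose2d layers in G
layers = [(512, 4, 1, 0), (256, 4, 2, 1), (128, 4, 2, 1), (64, 4, 2, 1), (3, 4, 2, 1)]

size = 1  # the noise input has spatial size 1 x 1
shapes = []
for channels, kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    shapes.append((channels, size, size))

for shape in shapes:
    print(shape)
```

The first layer expands 1 x 1 to 4 x 4, and every following layer doubles the resolution while reducing the number of feature maps, ending at 3 x 64 x 64.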

Now we need a forward function to propagate the signal through the Generator. It is a method of the G class:

    def forward(self, input):
        output = self.main(input)
        return output

The input of this function is a batch of random vectors of size 100, matching the 100 input channels of the first layer in the G class. It returns the output containing the generated images. At the start of training, these images are just noise.


Creating the generator :

netG = G() 
netG.apply(weights_init)

Here we create a generator object and initialise all the weights of its neural network.


Now, let's start defining our Discriminator class, which holds the architecture of the Discriminator.

class D(nn.Module):
    def __init__(self):
        super(D, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias = False),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(64, 128, 4, 2, 1, bias = False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(128, 256, 4, 2, 1, bias = False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(256, 512, 4, 2, 1, bias = False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(512, 1, 4, 1, 0, bias = False),
            nn.Sigmoid()
        )

Breaking down the Discriminator :

  • Similar to the G class, the D class inherits from nn.Module. The input of the Discriminator is an image, either a real one or one generated by the Generator, and the Discriminator returns a number between 0 and 1 as output.
  • Since it takes an image as input, the first operation is a convolution, hence we start with a convolution and apply LeakyReLU.
  • Observe that, unlike in the G class, we use LeakyReLU here with a negative slope of 0.2. This value comes from extensive experimentation by researchers, not by me.
  • We use BatchNorm2d to normalize all the features along the dimension of the batch.
  • At the end, we use the classic sigmoid function to break the linearity and keep the output between 0 and 1.
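
The two activation functions used in the Discriminator are simple enough to write out by hand. The following plain-Python sketch works on single numbers for illustration only; the real layers apply the same formulas element-wise to tensors.

```python
import math

def leaky_relu(x, negative_slope=0.2):
    """Pass positive values through; scale negative values by the slope."""
    return x if x >= 0 else negative_slope * x

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(leaky_relu(3.0), leaky_relu(-5.0))
print(sigmoid(-10.0), sigmoid(0.0), sigmoid(10.0))
```

Unlike ReLU, LeakyReLU keeps a small, non-zero gradient for negative inputs, and the sigmoid turns the final score into a value that can be read as the probability that the image is real.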

Now, to forward propagate the signal through the Discriminator, we need to define a forward method, which takes a batch of images (for example, the output of the generator) and passes it through the Discriminator:

    def forward(self, input):
        output = self.main(input)
        return output.view(-1)

In the final line we return the output, a value between 0 and 1 for each image. We flatten the result of the convolutions with view(-1) so that the predictions form a one-dimensional vector with the same shape as the target vector.
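
The need for flattening becomes clearer once the spatial sizes are traced through the network. The sketch below applies the standard convolution size formula, out = floor((in + 2 * padding - kernel) / stride) + 1, assuming no dilation; conv_out is a hypothetical helper name.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (out_channels, kernel, stride, padding) of the five Conv2d layers in D
layers = [(64, 4, 2, 1), (128, 4, 2, 1), (256, 4, 2, 1), (512, 4, 2, 1), (1, 4, 1, 0)]

size = 64  # input images are 64 x 64
sizes = []
for channels, kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
print(sizes)

# For a batch of 64 images, the last layer yields shape (64, 1, 1, 1);
# view(-1) flattens it to 64 values, one prediction per image.
batch = 64
flattened_length = batch * 1 * sizes[-1] * sizes[-1]
print(flattened_length)
```

Each convolution halves the resolution until the last one collapses the 4 x 4 feature map into a single value per image.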


Creating the Discriminator :

netD = D() 
netD.apply(weights_init)

We create the discriminator object of the above class D and initialize all the weights of its neural network.


Now it's time to train our Generative Adversarial Network. Before that, we need a criterion that measures the error of the discriminator's predictions. For that we use BCE loss (BCE means Binary Cross Entropy), which suits adversarial networks because the discriminator makes a binary real-or-fake decision. We also need optimisers for both the generator and the discriminator.

criterion = nn.BCELoss()
optimizerD = optim.Adam(netD.parameters(), lr = 0.0002, betas = (0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr = 0.0002, betas = (0.5, 0.999))

We start by creating a criterion object that measures the error between the prediction and the target. Then we create one optimiser for the discriminator and one for the generator.
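
For intuition about what the criterion measures, here is a minimal plain-Python sketch of binary cross entropy averaged over a batch, which is the default reduction of nn.BCELoss. The function name bce_loss and the sample predictions are made up for illustration.

```python
import math

def bce_loss(predictions, targets):
    """Mean binary cross entropy: -[t*log(p) + (1-t)*log(1-p)] averaged over the batch."""
    total = 0.0
    for p, t in zip(predictions, targets):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(predictions)

# Hypothetical discriminator outputs for two real images (target 1).
confident_right = bce_loss([0.9, 0.8], [1.0, 1.0])
confident_wrong = bce_loss([0.1, 0.2], [1.0, 1.0])
undecided = bce_loss([0.5], [1.0])
print(confident_right, confident_wrong, undecided)
```

The loss is small when the predictions are close to the targets and grows quickly as they move toward the wrong label.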

We use the Adam optimiser from the optim module, an advanced variant of stochastic gradient descent.
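
As a rough sketch of what a single Adam update does to one scalar parameter, the code below follows the standard Adam update rule with the learning rate and betas used above. The function adam_step is a hypothetical helper, not the optim.Adam implementation, and it leaves out options such as weight decay.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
param, m, v = adam_step(param, grad=3.0, m=m, v=v, t=1)
print(param)
```

On the first step the update is close to the learning rate regardless of the size of the gradient, because the gradient is divided by the square root of its own square.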


We’ll be training our neural nets for 25 epochs, hence:

for epoch in range(25):

Then we need to iterate over the batches of images in the dataset, hence:

    for i, data in enumerate(dataloader, 0):

The first step is to update the weights of the discriminator's neural network, so we reset the gradients of the discriminator with respect to its weights to 0:

        netD.zero_grad()

Our discriminator must be trained with both real and fake images. We train it with a batch of real images from the dataset first:

        real, _ = data
        input = Variable(real)
        target = Variable(torch.ones(input.size()[0]))
        output = netD(input)
        errD_real = criterion(output, target)

We take a batch of real images from the dataset and wrap it in a Variable. Then we forward propagate these real images through the discriminator to get the predictions (values between 0 and 1) and compute the loss between the predictions (output) and the target (equal to 1).


Now we train the discriminator with fake images generated by the generator:

        noise = Variable(torch.randn(input.size()[0], 100, 1, 1))
        fake = netG(noise)
        target = Variable(torch.zeros(input.size()[0]))
        output = netD(fake.detach())
        errD_fake = criterion(output, target)

Here, we first create a batch of random input vectors (noise) and forward propagate it through the generator to get fake generated images. Then we forward propagate those fake images through the discriminator to get the predictions (values between 0 and 1) and compute the loss between the predictions (output) and the target (equal to 0). Calling fake.detach() keeps the gradients of this loss from flowing back into the generator, since only the discriminator is updated in this step.


Back-propagating the total error:

        errD = errD_real + errD_fake
        errD.backward()
        optimizerD.step()

Here we compute the total error of the discriminator and back-propagate it, computing the gradients of the total error with respect to the weights of the discriminator. Finally, we apply the optimizer to update the weights according to how much each is responsible for the discriminator's loss.
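
The total discriminator error can be worked through with a couple of made-up predictions. In this plain-Python sketch, bce and discriminator_loss are hypothetical helpers that handle a single real and a single fake prediction rather than a full batch.

```python
import math

def bce(pred, target):
    """Binary cross entropy for a single prediction."""
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

def discriminator_loss(d_real, d_fake):
    """errD = errD_real (target 1) + errD_fake (target 0)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

# Hypothetical outputs: a sharp discriminator versus a fully fooled one.
sharp = discriminator_loss(d_real=0.95, d_fake=0.05)
fooled = discriminator_loss(d_real=0.5, d_fake=0.5)
print(sharp, fooled)
```

When the discriminator outputs 0.5 for everything, meaning it can no longer tell real from fake, the total loss equals 2 * ln 2, roughly 1.386. A Loss_D hovering around that value in the training log is therefore a sign that the generator is keeping up.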


The next step is to update the weights of the generator's neural network:

        netG.zero_grad()
        target = Variable(torch.ones(input.size()[0]))
        output = netD(fake)
        errG = criterion(output, target)
        errG.backward()
        optimizerG.step()

As before, we first reset the gradients of the generator with respect to its weights to 0. We then set the target to 1, forward propagate the fake generated images through the discriminator to get the predictions (values between 0 and 1), and compute the loss between those predictions and the target. The target is 1 because the generator wants the discriminator to judge its images as real. Finally, we back-propagate the loss, computing the gradients with respect to the weights of the generator, and apply the optimizer to update those weights according to how much each is responsible for the generator's loss.
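
With a target of 1, binary cross entropy reduces to -log(D(G(z))), so the generator loss depends only on how real the discriminator thinks the fake images are. The sketch below evaluates it for a few made-up discriminator outputs; generator_loss is a hypothetical helper for a single prediction.

```python
import math

def generator_loss(d_fake):
    """BCE against target 1 for one fake image: -log(D(G(z)))."""
    return -math.log(d_fake)

# Hypothetical discriminator outputs for a fake image, from "clearly fake" to "looks real".
losses = [generator_loss(p) for p in (0.1, 0.5, 0.9)]
print(losses)
```

The loss shrinks as the discriminator's output for the fake image approaches 1, which is exactly the direction the generator's weights are pushed in.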


Now, our final step is to print the losses and, every 100 steps, save the real images and the generated images of the mini-batch. The ./results directory must already exist, because save_image does not create it. This is done as follows:

        # .item() replaces the old .data[0] indexing, which fails on 0-dim tensors in current PyTorch
        print('[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f' % (epoch, 25, i, len(dataloader), errD.item(), errG.item()))
        if i % 100 == 0:
            vutils.save_image(real, '%s/real_samples.png' % "./results", normalize = True)
            fake = netG(noise)
            vutils.save_image(fake.data, '%s/fake_samples_epoch_%03d.png' % ("./results", epoch), normalize = True)
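
The format strings used for logging and for the file names can be tried on their own. The values below are made up for illustration; only the format strings come from the code above.

```python
# Made-up values standing in for the loop variables and the two losses.
epoch, i, num_batches = 3, 100, 782
loss_d, loss_g = 0.51234, 2.34567

line = '[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f' % (epoch, 25, i, num_batches, loss_d, loss_g)
filename = '%s/fake_samples_epoch_%03d.png' % ("./results", epoch)
print(line)
print(filename)
```

Because the file name depends only on the epoch, each save within the same epoch overwrites the previous one, so the folder ends up with one image of fake samples per epoch.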

Complete code:

from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
from torch.autograd import Variable

batchSize = 64 
imageSize = 64

# We create a list of transformations (resizing, tensor conversion, normalization) to apply to the input images.
# transforms.Resize was named transforms.Scale in old torchvision releases.
transform = transforms.Compose([transforms.Resize(imageSize), 
                                transforms.ToTensor(), 
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),])

dataset = dset.CIFAR10(root = './data', 
                       download = True, 
                       transform = transform) 

dataloader = torch.utils.data.DataLoader(dataset, 
                                         batch_size = batchSize, 
                                         shuffle = True, num_workers = 2) 

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

# Generator
class G(nn.Module):
    def __init__(self):
        super(G, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(100, 512, 4, 1, 0, bias = False),
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias = False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias = False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias = False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias = False),
            nn.Tanh()
        )

    def forward(self, input):
        output = self.main(input)
        return output

netG = G()
netG.apply(weights_init)


# Discriminator
class D(nn.Module):
    def __init__(self):
        super(D, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias = False),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(64, 128, 4, 2, 1, bias = False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(128, 256, 4, 2, 1, bias = False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(256, 512, 4, 2, 1, bias = False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(512, 1, 4, 1, 0, bias = False),
            nn.Sigmoid()
        )

    def forward(self, input):
        output = self.main(input)
        return output.view(-1)

netD = D()
netD.apply(weights_init)

# Create criterion
criterion = nn.BCELoss()
optimizerD = optim.Adam(netD.parameters(), lr = 0.0002, betas = (0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr = 0.0002, betas = (0.5, 0.999))

# Batch Training
for epoch in range(25):

    for i, data in enumerate(dataloader, 0):

        # Step 1: update the discriminator
        netD.zero_grad()

        # Train on a batch of real images (target 1)
        real, _ = data
        input = Variable(real)
        target = Variable(torch.ones(input.size()[0]))
        output = netD(input)
        errD_real = criterion(output, target)

        # Train on a batch of fake images (target 0)
        noise = Variable(torch.randn(input.size()[0], 100, 1, 1))
        fake = netG(noise)
        target = Variable(torch.zeros(input.size()[0]))
        output = netD(fake.detach())
        errD_fake = criterion(output, target)

        errD = errD_real + errD_fake
        errD.backward()
        optimizerD.step()

        # Step 2: update the generator (target 1: it wants its fakes judged real)
        netG.zero_grad()
        target = Variable(torch.ones(input.size()[0]))
        output = netD(fake)
        errG = criterion(output, target)
        errG.backward()
        optimizerG.step()

        # Step 3: log the losses; every 100 steps save real and generated images
        # (.item() replaces the old .data[0] indexing; ./results must already exist)
        print('[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f' % (epoch, 25, i, len(dataloader), errD.item(), errG.item()))
        if i % 100 == 0:
            vutils.save_image(real, '%s/real_samples.png' % "./results", normalize = True)
            fake = netG(noise)
            vutils.save_image(fake.data, '%s/fake_samples_epoch_%03d.png' % ("./results", epoch), normalize = True)

The code repository is here: https://github.com/venkateshtata/GAN_Medium


Author: lunyang
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source lunyang!