Variable.cuda() or Variable.grad

While reading through the code, I ran into something I had not seen before: `input_var = Variable(input).cuda()` and `target_var = Variable(target).cuda()`. Why wrap them like this?

    for epoch in range(2):
        logger.info("-- EPOCH: %s", epoch)
        running_loss = 0.0
        for i, data in enumerate(train_loader, 0):
            if i % 50 == 49: 
                logger.info("-- ITERATION: %s", i)
            input, target = data

            # wrap input + target into variables
            input_var = Variable(input).cuda()
            target_var = Variable(target).cuda()
            ...

Looking at other code, I saw the call written in the two different ways below.

    Variable(torch.randn(3,1,2).float().cuda(), requires_grad=True)  # or
    Variable(torch.randn(3,1,2).float(), requires_grad=True).cuda()

One of the answers said the following:

".cuda() creates another Variable that isn't a leaf node in the computation graph"

What does that mean?

์‹ค์€ x = Variable(torch.randn(3,1,2).float(), requires_grad=True).cuda() ๋ผ ํ•˜๋ฉด ์•„๋ž˜์˜ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„๋‹ค.

    y = Variable(torch.randn(3,1,2).float(), requires_grad=True)
    x = y.cuda()

Here it is `y` whose gradient gets computed, not `x`.
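
A minimal sketch to verify this (it assumes a CUDA device is available and uses the legacy `torch.autograd.Variable` API; since PyTorch 0.4, `Variable` simply returns a `Tensor`):

    import torch
    from torch.autograd import Variable

    y = Variable(torch.randn(3, 1, 2).float(), requires_grad=True)  # leaf, created by the user
    x = y.cuda()            # non-leaf: x is the result of an operation on y

    x.sum().backward()      # backprop flows through the copy-to-GPU op back to y

    print(y.grad)           # a tensor of ones: the gradient lands on the leaf y
    print(x.grad)           # None: gradients are not retained on non-leaf nodes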

์œ„์˜ ๋‹ต๋ณ€์„ ์ฐธ๊ณ ํ•ด ๋ณผ๋•Œ, ์•„๋ž˜ ์ฝ”๋“œ์—์„œ input_var๊ณผ target_var์€ gradient๊ฐ€ ๊ณ„์‚ฐ๋˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ฃผ์„๊นŒ์ง€ ์ฒจ๋ถ€ํ•ด ๋ณด์•˜๋‹ค.

    # wrap input + target into variables
    input_var = Variable(input).cuda()   # non-leaf Variable: result of an operation, so its gradient would be None
    target_var = Variable(target).cuda()

๋งŒ์•ฝ ๊ณ„์‚ฐ๋˜๊ฒŒ ํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด ์•„๋ž˜์˜ ์ฝ”๋“œ๋กœ ๋ฐ”๊พผ๋‹ค.

    # wrap input + target into variables
    input_var = Variable(input.cuda(), requires_grad=True)    # leaf Variable, created by the user
    target_var = Variable(target.cuda(), requires_grad=True)
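
A quick check of this pattern (a sketch with a made-up `input` tensor, again assuming a CUDA device):

    import torch
    from torch.autograd import Variable

    input = torch.randn(4, 3)

    # .cuda() runs on the raw tensor first, so the wrapped result is user-created
    input_var = Variable(input.cuda(), requires_grad=True)

    input_var.sum().backward()
    print(input_var.is_leaf)   # True: a leaf, not the result of a graph operation
    print(input_var.grad)      # a tensor of ones: gradients are retained on leaves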

However, the answers above do not resolve my question; if anything, they raise more. Why wrap in `Variable()` at all? And if we are not even going to compute gradients, why tack on `.cuda()`?

    # wrap input + target into variables
    input_var = Variable(input).cuda()
    target_var = Variable(target).cuda()

`Variable()` has been deprecated since PyTorch 0.4 (autograd state now lives on `Tensor` itself), so as of PyTorch 1.10 it is no longer used. The code above is therefore replaced with the following.
(In that case, on the newer PyTorch versions, are `input_var` and `target_var` leaf variables or non-leaf variables? See the check after the code below.)

    # move input + target to the GPU (no more Variable wrapping)
    input_var = input.cuda()
    target_var = target.cuda()
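
To answer the question in the parentheses above: by the autograd documentation's convention, tensors with `requires_grad=False` are always leaf tensors, so the DataLoader outputs remain leaves even after `.cuda()`. A quick check (assuming a CUDA device):

    import torch

    t = torch.randn(3, 1, 2)      # requires_grad=False, like a DataLoader batch
    print(t.cuda().is_leaf)       # True: requires_grad=False tensors are leaves by convention

    r = torch.randn(3, 1, 2, requires_grad=True)
    print(r.cuda().is_leaf)       # False: the result of an operation on a grad-requiring tensor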

The last remaining question is why `.cuda()` is called at all. Why should `input_var` and `target_var` differ depending on whether a GPU is present? I have posted a question on the forum; let's wait and see.
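
For what it's worth, a likely answer is that the model's parameters live on the GPU, and a forward pass requires its inputs on the same device as those parameters; mixing devices raises a RuntimeError. The modern idiom makes this explicit with a single `device` object. A self-contained sketch, with a toy model standing in for the real network:

    import torch
    import torch.nn as nn

    # Pick the GPU when available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(2, 1).to(device)   # toy model in place of the real network
    input = torch.randn(3, 2)            # toy batch in place of a DataLoader batch
    target = torch.randn(3, 1)

    # Inputs must live on the same device as the model's parameters,
    # which is why the code branches on GPU availability at all.
    input_var = input.to(device)
    target_var = target.to(device)
    loss = nn.functional.mse_loss(model(input_var), target_var)
    print(loss.item())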

[Reference]
https://discuss.pytorch.org/t/how-to-get-cuda-variable-gradient/1386
https://discuss.pytorch.org/t/strange-behavior-of-variable-cuda-and-variable-grad/1642