Gradient Descent

Gradient descent

: Gradient descent uses the slope obtained by differentiation to compare errors and move step by step in the direction where the error becomes smallest.

In other words, it searches for the point where the derivative is 0:
1. Compute the derivative at a1.
2. Move some distance in the direction opposite to that slope (the negative direction if the slope is positive, the positive direction if it is negative) to reach a2, and compute the derivative there.
3. If the derivative is not 0, repeat the process above.
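The three steps can be sketched on a one-variable function. This is a minimal illustration, not the code used later in these notes: the function name `gradient_descent_1d`, the tolerance `tol`, and the example function f(a) = (a - 3)^2 are all assumptions made for the example.

```python
def gradient_descent_1d(grad, start, lr=0.1, tol=1e-6, max_steps=1000):
    """Follow the negative derivative until it is (almost) zero."""
    a = start
    for _ in range(max_steps):
        g = grad(a)          # step 1: derivative at the current point
        if abs(g) < tol:     # step 3: stop once the derivative is close to 0
            break
        a = a - lr * g       # step 2: move against the sign of the slope
    return a

# Hypothetical example: f(a) = (a - 3)^2, whose derivative is 2 * (a - 3).
a_min = gradient_descent_1d(lambda a: 2 * (a - 3), start=10.0)
print(a_min)
```

In practice the derivative rarely becomes exactly 0, which is why the sketch stops at a small tolerance instead.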

Learning rate

: How far to move at each step has to be chosen carefully, and the value that sets this step size is the learning rate. If it is too large, the updates overshoot the minimum and can diverge; if it is too small, convergence is very slow. In other words, gradient descent treats the error as a quadratic curve over the parameter and, with a suitable learning rate, finds the point where the derivative is 0.
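The effect of the learning rate can be seen on the same kind of quadratic. A small sketch, assuming the hypothetical error curve f(a) = (a - 3)^2 and three learning rates picked only for illustration:

```python
def run_descent(lr, steps=20, start=10.0):
    """Run a fixed number of gradient descent steps on f(a) = (a - 3)^2."""
    a = start
    for _ in range(steps):
        a = a - lr * 2 * (a - 3)   # 2 * (a - 3) is the derivative of f
    return a

too_small = run_descent(0.01)   # moves toward 3, but very slowly
suitable = run_descent(0.3)     # ends up very close to the minimum at a = 3
too_large = run_descent(1.1)    # each step overshoots further, so it diverges

print(too_small, suitable, too_large)
```

Each step multiplies the distance to the minimum by (1 - 2 * lr), so for this particular function any rate above 1.0 makes the distance grow instead of shrink.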

๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ• ์ฝ”๋”ฉ

To find the minimum we differentiate a quadratic function, and that quadratic function comes from the mean squared error.

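The mean squared error (MSE) for the line $y = ax + b$ over $n$ data points is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr)^2$$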

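When differentiating this expression, note that the unknowns we care about are a and b. Because we differentiate with respect to one chosen variable at a time, we use partial derivatives, where $n$ is the number of data points:

$$\frac{\partial\,\mathrm{MSE}}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (a x_i + b)\bigr)$$

$$\frac{\partial\,\mathrm{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr)$$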
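As a sanity check on the partial derivatives, they can be compared with a numerical (finite-difference) estimate. This is only a sketch: the helper names `mse`, `analytic_grads`, and `numeric_grads` and the test point (a0, b0) are assumptions for illustration, while the data values are the four points used in this section's example.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])       # study hours
y = np.array([81.0, 93.0, 91.0, 97.0])   # grades

def mse(a, b):
    """Mean squared error of the line y = a*x + b."""
    return np.mean((y - (a * x + b)) ** 2)

def analytic_grads(a, b):
    """Partial derivatives of the MSE with respect to a and b."""
    error = y - (a * x + b)
    return -2 * np.mean(x * error), -2 * np.mean(error)

def numeric_grads(a, b, h=1e-5):
    """Central-difference approximation of the same derivatives."""
    da = (mse(a + h, b) - mse(a - h, b)) / (2 * h)
    db = (mse(a, b + h) - mse(a, b - h)) / (2 * h)
    return da, db

a0, b0 = 1.0, 50.0                        # arbitrary test point
grads_analytic = analytic_grads(a0, b0)
grads_numeric = numeric_grads(a0, b0)
print(grads_analytic, grads_numeric)
```

If the two pairs agree closely, the formulas are likely correct for this data.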

Implementing these partial derivatives in code:

import numpy as np
import matplotlib.pyplot as plt

# Create lists of study hours X and grades Y.
data = [[2, 81], [4, 93], [6, 91], [8, 97]]
x = [i[0] for i in data]
y = [i[1] for i in data]

# Show the data as a scatter plot.
plt.figure(figsize=(8,5))
plt.scatter(x, y)
plt.show()

# Convert the x and y lists to NumPy arrays
# (so that the calculations can be applied to each element).
x_data = np.array(x)
y_data = np.array(y)

# Initialize the slope a and the intercept b.
a = 0
b = 0

# Set the learning rate.
lr = 0.03 

# Set how many times to repeat.
epochs = 2001 

# Run gradient descent.
for i in range(epochs):                                   # repeat for the number of epochs
    y_hat = a * x_data + b                                # predicted y (hypothesis) for the current a and b
    error = y_data - y_hat                                # error between actual and predicted values
    a_diff = -(2 / len(x_data)) * np.sum(x_data * error)  # partial derivative of the error function with respect to a
    b_diff = -(2 / len(x_data)) * np.sum(error)           # partial derivative of the error function with respect to b
    a = a - lr * a_diff                                   # update a by the learning rate times its derivative
    b = b - lr * b_diff                                   # update b by the learning rate times its derivative
    if i % 100 == 0:                                      # every 100 epochs, print the current a and b
        print("epoch=%d, slope=%.4f, intercept=%.4f" % (i, a, b))


# Plot the fitted line using the slope and intercept found above.
y_pred = a * x_data + b
plt.scatter(x, y)
plt.plot(x_data, y_pred)  # plot predictions against x so the line is also correct for a negative slope
plt.show()
[Figure: gradientDescentGraph, the data points with the fitted regression line]
  • In this way, the desired values are obtained with the mean squared error and gradient descent, without using the method of least squares.
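One way to check the result is to compare it with a least-squares fit of the same data. The sketch below repeats the gradient descent loop compactly and uses `np.polyfit` as the least-squares reference; the choice of `np.polyfit` and the variable names `a_ls` and `b_ls` are assumptions for this comparison, not part of the original notes.

```python
import numpy as np

x_data = np.array([2, 4, 6, 8])       # study hours
y_data = np.array([81, 93, 91, 97])   # grades

# Gradient descent with the same settings as above.
a, b = 0.0, 0.0
lr = 0.03
n = len(x_data)
for _ in range(2001):
    error = y_data - (a * x_data + b)
    a -= lr * (-(2 / n) * np.sum(x_data * error))
    b -= lr * (-(2 / n) * np.sum(error))

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept].
a_ls, b_ls = np.polyfit(x_data, y_data, deg=1)

print("gradient descent:", a, b)
print("least squares:   ", a_ls, b_ls)
```

With enough epochs and a learning rate that converges, the two pairs of values should agree closely.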

Multiple linear regression

: Errors beyond what the model predicts arise because factors other than the one used for prediction also affect the outcome. To predict more accurately we need to supply additional information, and to compute new predictions from that information we increase the number of variables and build a multiple linear regression.

$$y = a_1 x_1 + a_2 x_2 + b$$
  • To find the slopes a1 and a2, gradient descent is applied in the same way as before.
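A minimal sketch of gradient descent with two input variables follows. The second feature `x2` is made-up data for illustration only (the notes do not give it), and the learning rate of 0.02 is likewise an assumption chosen so that the updates converge on this data.

```python
import numpy as np

x1 = np.array([2.0, 4.0, 6.0, 8.0])      # study hours (same values as the earlier example)
x2 = np.array([0.0, 4.0, 2.0, 3.0])      # hypothetical second feature, invented for this sketch
y = np.array([81.0, 93.0, 91.0, 97.0])   # grades

a1, a2, b = 0.0, 0.0, 0.0   # two slopes and one intercept
lr = 0.02                    # assumed learning rate
n = len(y)

for i in range(2001):
    y_hat = a1 * x1 + a2 * x2 + b
    error = y - y_hat
    a1_diff = -(2 / n) * np.sum(x1 * error)   # partial derivative with respect to a1
    a2_diff = -(2 / n) * np.sum(x2 * error)   # partial derivative with respect to a2
    b_diff = -(2 / n) * np.sum(error)         # partial derivative with respect to b
    a1 -= lr * a1_diff
    a2 -= lr * a2_diff
    b -= lr * b_diff

final_mse = np.mean((y - (a1 * x1 + a2 * x2 + b)) ** 2)
print(a1, a2, b, final_mse)
```

The only change from the single-variable case is one extra partial derivative and one extra update per added variable.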

[Figure: multiLinearGraph]
