Gradient Descent

Gradient Descent

: Gradient descent uses the derivative (the slope of the error curve) to compare errors and move the parameter, step by step, in the direction where the error gets smaller.

In other words, it searches for the point where the derivative is 0 (in practice, close enough to 0):
1. Compute the derivative at a1.
2. Move some distance in the direction opposite to the slope (in the negative direction if the slope is positive, in the positive direction if it is negative) to a new point a2, and compute the derivative there.
3. If the derivative is not 0, repeat the steps above.
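
The three steps above can be sketched on a one-variable error curve. This is only an illustration, not code from the original notes: the curve `f(a) = (a - 3)^2`, the starting point, the step size, and the stopping tolerance are all assumptions chosen for the example.

```python
# Hypothetical error curve f(a) = (a - 3)^2, chosen only for illustration.
def f_prime(a):
    """Derivative of f(a) = (a - 3)^2."""
    return 2 * (a - 3)


def descend(a, step=0.1, tol=1e-6, max_iter=10_000):
    """Repeat steps 1-3 until the derivative is (almost) 0."""
    for _ in range(max_iter):
        slope = f_prime(a)        # step 1: derivative at the current point
        if abs(slope) < tol:      # step 3: stop once the slope is close to 0
            break
        a = a - step * slope      # step 2: move against the slope
    return a


a_min = descend(a=0.0)
print(round(a_min, 4))
```

The loop stops near a = 3, where the derivative of this particular curve vanishes. The tolerance is needed because the derivative rarely becomes exactly 0 in floating-point arithmetic.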

Learning Rate

: How far to move at each step has to be chosen carefully, and the quantity that sets this step size is the learning rate. If it is too large, the update overshoots the minimum and can diverge; if it is too small, convergence is very slow. Put differently, gradient descent treats the error as a quadratic function of the parameter and, with a suitable learning rate, finds the point where the derivative is 0.
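
To make the effect of the learning rate concrete, the sketch below runs the same update on a hypothetical curve `f(a) = (a - 3)^2` with three different rates. The curve, the three rates, and the number of steps are assumptions picked for illustration only.

```python
def run(lr, a=0.0, steps=20):
    """Gradient descent on f(a) = (a - 3)^2 with a fixed learning rate."""
    for _ in range(steps):
        a = a - lr * 2 * (a - 3)   # 2 * (a - 3) is the derivative of f
    return a


too_small = run(lr=0.001)   # barely moves away from the starting point
reasonable = run(lr=0.1)    # approaches the minimum at a = 3
too_large = run(lr=1.1)     # overshoots further on every step and diverges

for name, value in [("too small", too_small),
                    ("reasonable", reasonable),
                    ("too large", too_large)]:
    print(f"{name:>10}: a = {value:.4f}")
```

With the smallest rate the parameter is still close to its start after 20 steps, while the largest rate moves it further from the minimum on each step.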

Coding Gradient Descent

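To find the minimum we differentiate a quadratic function, and that quadratic function comes from the mean squared error (MSE).

For the line y = ax + b fitted to n data points, the mean squared error is
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 $$

When differentiating this expression, keep in mind that the unknowns we care about are a and b. Because we differentiate with respect to one of them at a time while treating the other as a constant, we use partial derivatives:
$$ \frac{\partial MSE}{\partial a} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - (a x_i + b) \right) $$
$$ \frac{\partial MSE}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right) $$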
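
As a sanity check on the partial derivatives of the MSE for the line `y = a*x + b`, the sketch below compares the analytic gradient with a central finite-difference estimate. The helper names (`mse`, `analytic_grad`, `numeric_grad`) and the test point `a = 1.0, b = 2.0` are assumptions made for illustration; the four data points are the ones used in the code that follows.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([81.0, 93.0, 91.0, 97.0])


def mse(a, b):
    """Mean squared error of the line y = a*x + b on the data."""
    return np.mean((y - (a * x + b)) ** 2)


def analytic_grad(a, b):
    """Partial derivatives of the MSE with respect to a and b."""
    error = y - (a * x + b)
    n = len(x)
    return -(2 / n) * np.sum(x * error), -(2 / n) * np.sum(error)


def numeric_grad(a, b, h=1e-5):
    """Central finite-difference estimate of the same two derivatives."""
    da = (mse(a + h, b) - mse(a - h, b)) / (2 * h)
    db = (mse(a, b + h) - mse(a, b - h)) / (2 * h)
    return da, db


print(analytic_grad(1.0, 2.0))
print(numeric_grad(1.0, 2.0))
```

The two printed pairs should agree closely, which suggests the formulas used in the update loop are the derivatives of the MSE.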

Turning these results into code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build lists of study hours (x) and scores (y).
data = [[2, 81], [4, 93], [6, 91], [8, 97]]
x = [i[0] for i in data]
y = [i[1] for i in data]

# Plot the data.
plt.figure(figsize=(8,5))
plt.scatter(x, y)
plt.show()

# Convert the x and y lists to NumPy arrays
# (so that values can be accessed by index and computed element by element).
x_data = np.array(x)
y_data = np.array(y)

# Initialize the slope a and the intercept b.
a = 0
b = 0

# Set the learning rate.
lr = 0.03

# Set the number of iterations.
epochs = 2001

# Run gradient descent.
for i in range(epochs):                               # repeat for the number of epochs
    y_hat = a * x_data + b                            # predicted y for the current a and b
    error = y_data - y_hat                            # error between actual and predicted values
    a_diff = -(2/len(x_data)) * sum(x_data * (error)) # partial derivative of the MSE with respect to a
    b_diff = -(2/len(x_data)) * sum(error)            # partial derivative of the MSE with respect to b
    a = a - lr * a_diff                               # update a: subtract learning rate times derivative
    b = b - lr * b_diff                               # update b: subtract learning rate times derivative
    if i % 100 == 0:                                  # print the current a and b every 100 epochs
        print("epoch=%.f, slope=%.04f, intercept=%.04f" % (i, a, b))


# Plot the line given by the fitted slope and intercept.
y_pred = a * x_data + b
plt.scatter(x, y)
plt.plot([min(x_data), max(x_data)], [min(y_pred), max(y_pred)])
plt.show()

Output:
epoch=0, slope=27.8400, intercept=5.4300
epoch=100, slope=7.0739, intercept=50.5117
epoch=200, slope=4.0960, intercept=68.2822
epoch=300, slope=2.9757, intercept=74.9678
epoch=400, slope=2.5542, intercept=77.4830
epoch=500, slope=2.3956, intercept=78.4293
epoch=600, slope=2.3360, intercept=78.7853
epoch=700, slope=2.3135, intercept=78.9192
epoch=800, slope=2.3051, intercept=78.9696
epoch=900, slope=2.3019, intercept=78.9886
epoch=1000, slope=2.3007, intercept=78.9957
epoch=1100, slope=2.3003, intercept=78.9984
epoch=1200, slope=2.3001, intercept=78.9994
epoch=1300, slope=2.3000, intercept=78.9998
epoch=1400, slope=2.3000, intercept=78.9999
epoch=1500, slope=2.3000, intercept=79.0000
epoch=1600, slope=2.3000, intercept=79.0000
epoch=1700, slope=2.3000, intercept=79.0000
epoch=1800, slope=2.3000, intercept=79.0000
epoch=1900, slope=2.3000, intercept=79.0000
epoch=2000, slope=2.3000, intercept=79.0000
[Figure: the data points with the line fitted by gradient descent]
  • In this way, the slope and intercept are obtained with mean squared error and gradient descent, without using the method of least squares.
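
As a cross-check, the values the loop converges to (2.3 and 79) can be compared with a closed-form least-squares fit of the same four points. The sketch below uses NumPy's `np.polyfit` with degree 1; using it here is an illustrative choice, not part of the original notes.

```python
import numpy as np

x = np.array([2, 4, 6, 8])
y = np.array([81, 93, 91, 97])

# Degree-1 least-squares fit; polyfit returns the coefficients
# from highest degree to lowest, i.e. [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

Both approaches minimize the same squared error, so they should arrive at the same line; gradient descent simply gets there iteratively.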

Multiple Linear Regression

: Predictions contain error because factors other than the one used for the prediction also affect the outcome. To predict more accurately we have to supply additional information, and to compute a new prediction from that information we increase the number of variables, which gives multiple linear regression.

With two independent variables x1 and x2, the model becomes
$$ y = a_1 x_1 + a_2 x_2 + b $$
  • To find the two slopes a1 and a2, gradient descent is applied exactly as before.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d # library for drawing 3D plots

# Build lists of study hours (x1), private lessons (x2), and scores (y).
data = [[2, 0, 81], [4, 4, 93], [6, 2, 91], [8, 3, 97]]
x1 = [i[0] for i in data]
x2 = [i[1] for i in data]
# x1 and x2 are the two independent variables.
y = [i[2] for i in data]

# Check the data on a plot.
ax = plt.axes(projection='3d') # 3D axes
ax.set_xlabel('study_hours')
ax.set_ylabel('private_class')
ax.set_zlabel('Score')
ax.dist = 11 
ax.scatter(x1, x2, y)
plt.show()

# Convert the x and y lists to NumPy arrays
# (so that values can be accessed by index and computed element by element).
x1_data = np.array(x1)
x2_data = np.array(x2)
y_data = np.array(y)

# Initialize the slopes a1, a2 and the intercept b.
a1 = 0
a2 = 0
b = 0

# Set the learning rate.
lr = 0.05

# Set the number of iterations.
# (Counting starts at 0, so add 1 to the desired count so that epoch 2000 is reached and printed.)
epochs = 2001

# Run gradient descent.
# Note: the derivatives below use 1/n instead of 2/n; the constant factor of 2 is absorbed into the learning rate.
for i in range(epochs):                                  # repeat for the number of epochs
    y_pred = a1 * x1_data + a2 * x2_data + b             # predicted y for the current a1, a2 and b
    error = y_data - y_pred                              # error between actual and predicted values
    a1_diff = -(1/len(x1_data)) * sum(x1_data * (error)) # derivative of the error function with respect to a1
    a2_diff = -(1/len(x2_data)) * sum(x2_data * (error)) # derivative of the error function with respect to a2
    b_diff = -(1/len(x1_data)) * sum(error)              # derivative of the error function with respect to b
    a1 = a1 - lr * a1_diff                               # update a1: subtract learning rate times derivative
    a2 = a2 - lr * a2_diff                               # update a2: subtract learning rate times derivative
    b = b - lr * b_diff                                  # update b: subtract learning rate times derivative
    if i % 100 == 0:                                     # print the current a1, a2 and b every 100 epochs
        print("epoch=%.f, slope1=%.04f, slope2=%.04f, intercept=%.04f" % (i, a1, a2, b))

Output:
epoch=0, slope1=23.2000, slope2=10.5625, intercept=4.5250
epoch=100, slope1=6.4348, slope2=3.9893, intercept=43.9757
epoch=200, slope1=3.7255, slope2=3.0541, intercept=62.5766
epoch=300, slope1=2.5037, slope2=2.6323, intercept=70.9656
epoch=400, slope1=1.9527, slope2=2.4420, intercept=74.7491
epoch=500, slope1=1.7042, slope2=2.3562, intercept=76.4554
epoch=600, slope1=1.5921, slope2=2.3175, intercept=77.2250
epoch=700, slope1=1.5415, slope2=2.3001, intercept=77.5720
epoch=800, slope1=1.5187, slope2=2.2922, intercept=77.7286
epoch=900, slope1=1.5084, slope2=2.2886, intercept=77.7992
epoch=1000, slope1=1.5038, slope2=2.2870, intercept=77.8310
epoch=1100, slope1=1.5017, slope2=2.2863, intercept=77.8453
epoch=1200, slope1=1.5008, slope2=2.2860, intercept=77.8518
epoch=1300, slope1=1.5003, slope2=2.2858, intercept=77.8547
epoch=1400, slope1=1.5002, slope2=2.2858, intercept=77.8561
epoch=1500, slope1=1.5001, slope2=2.2857, intercept=77.8567
epoch=1600, slope1=1.5000, slope2=2.2857, intercept=77.8569
epoch=1700, slope1=1.5000, slope2=2.2857, intercept=77.8570
epoch=1800, slope1=1.5000, slope2=2.2857, intercept=77.8571
epoch=1900, slope1=1.5000, slope2=2.2857, intercept=77.8571
epoch=2000, slope1=1.5000, slope2=2.2857, intercept=77.8571
[Figure: 3D graph for the multiple linear regression example]
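
The same update can also be written in matrix form, which extends to any number of input variables. The sketch below is one possible formulation, not the original author's code: the design matrix `X` (with a column of ones for the intercept), the weight vector `w = [a1, a2, b]`, and the comparison against `np.linalg.lstsq` are assumptions added for illustration. It keeps the 1/n scaling and the learning rate of 0.05 used in the loop above.

```python
import numpy as np

data = np.array([[2, 0, 81], [4, 4, 93], [6, 2, 91], [8, 3, 97]], dtype=float)

# Design matrix with columns x1, x2 and a constant 1 for the intercept.
X = np.column_stack([data[:, :2], np.ones(len(data))])
y = data[:, 2]

w = np.zeros(3)   # [a1, a2, b], all initialized to 0
lr = 0.05

for _ in range(2001):
    error = y - X @ w                        # actual minus predicted values
    grad = -(1 / len(y)) * (X.T @ error)     # all three derivatives at once
    w = w - lr * grad                        # simultaneous update of a1, a2, b

# Closed-form least-squares solution for comparison.
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

print("gradient descent:", np.round(w, 4))
print("least squares:   ", np.round(w_exact, 4))
```

Both lines should show roughly the same three numbers as the final epoch above, since the gradient-descent loop is converging to the least-squares solution for this data.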
