Model Training Techniques
This post summarizes techniques for updating parameters in machine learning.
Choosing the learning rate
In machine learning, how you set the learning rate largely determines whether training succeeds, so it is extremely important. Among methods that set the learning rate automatically, two ideas are standard. One is to start with a large value and shrink the learning rate as training progresses. The other (in deep learning) is to use a different learning rate for each layer of the network rather than the same rate for all layers.
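The first idea, shrinking the rate over time, can be sketched with a simple decay schedule. The exponential form and its constants here are illustrative choices, not something from this article:

```python
# A hypothetical exponential-decay schedule: the rate starts large and
# shrinks as the step count grows (decay_rate and decay_steps are
# illustrative constants).
def decayed_learning_rate(initial_rate, step, decay_rate=0.96, decay_steps=100):
    return initial_rate * decay_rate ** (step / decay_steps)

rates = [decayed_learning_rate(0.1, s) for s in (0, 100, 500)]
# Early steps get a large rate; later steps a progressively smaller one.
assert rates[0] > rates[1] > rates[2]
```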
A well-known method that adapts the learning rate automatically is AdaGrad. AdaGrad keeps, for each parameter, a running sum of the squared gradients, and divides the base learning rate η by the square root of that sum. Writing g_{t,i} for the i-th component of the gradient at step t, the update for the i-th parameter can be expressed as:

θ_{t+1,i} = θ_{t,i} − (η / √(Σ_{τ=1..t} g_{τ,i}²)) · g_{t,i}
With this formula as-is, the accumulated sum is close to zero early in training, so the effective learning rate can become far too large. In practice, a small constant ε is therefore added to the denominator from the start, giving η / √(ε + Σ g²).
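The per-parameter update with the ε term can be sketched as follows; this is a minimal scalar version of the AdaGrad rule above, not TensorFlow's implementation:

```python
import math

def adagrad_update(param, grad, accum, learning_rate=0.1, epsilon=1e-8):
    """One AdaGrad step for a single scalar parameter.

    accum holds the running sum of squared gradients; epsilon keeps the
    effective rate from blowing up while accum is still near zero.
    """
    accum += grad ** 2
    param -= learning_rate * grad / math.sqrt(accum + epsilon)
    return param, accum

# Repeated identical gradients produce shrinking step sizes.
param, accum = 1.0, 0.0
steps = []
for _ in range(3):
    new_param, accum = adagrad_update(param, 1.0, accum)
    steps.append(param - new_param)
    param = new_param
assert steps[0] > steps[1] > steps[2]
```

Note how the step sizes decay automatically even though `learning_rate` itself never changes, which is exactly the behavior the formula predicts.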
Let's try this out in TensorFlow. Using the California housing-price dataset, we will train a linear model.
```python
from __future__ import print_function

import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

# Load the CSV data.
california_housing_dataframe = pd.read_csv(
    "https://storage.googleapis.com/mledu-datasets/california_housing_train.csv",
    sep=",")
california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
california_housing_dataframe["median_house_value"] /= 1000.0

# Shuffle the data, then feed it to the model one batch at a time.
def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Builds the input pipeline feeding (features, labels) batches to the model.

    Args:
      features: pandas DataFrame of features
      targets: pandas DataFrame of targets
      batch_size: Size of batches to be passed to the model
      shuffle: True or False. Whether to shuffle the data.
      num_epochs: Number of epochs for which data should be repeated.
        None = repeat indefinitely
    Returns:
      Tuple of (features, labels) for next data batch
    """
    # Convert pandas data into a dict of np arrays.
    features = {key: np.array(value) for key, value in dict(features).items()}

    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)

    # Shuffle the data, if specified.
    if shuffle:
        ds = ds.shuffle(buffer_size=10000)

    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
```
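Conceptually, the input function above just shuffles the examples and yields them in fixed-size batches for a number of epochs. A simplified pure-Python stand-in (a sketch of the idea, not the tf.data implementation) looks like this:

```python
import random

def input_batches(features, targets, batch_size=2, shuffle=True,
                  num_epochs=1, seed=0):
    """Simplified stand-in for the tf.data pipeline: optionally shuffle the
    example order, then yield (features, targets) slices of batch_size,
    repeating for num_epochs passes over the data."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    for _ in range(num_epochs):
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield ([features[i] for i in batch],
                   [targets[i] for i in batch])

batches = list(input_batches([1, 2, 3, 4], [10, 20, 30, 40], batch_size=2))
assert len(batches) == 2        # 4 examples / batch size 2 = 2 batches
assert len(batches[0][0]) == 2  # each batch holds batch_size features
```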
First, let's see what happens without AdaGrad.
```python
my_feature = california_housing_dataframe[["total_rooms", "median_income"]]
feature_columns = [tf.feature_column.numeric_column("total_rooms"),
                   tf.feature_column.numeric_column("median_income")]
targets = california_housing_dataframe["median_house_value"]

# The optimizer controls how parameters are updated.
# Use gradient descent as the optimizer for training the model.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0005)
# Clip gradients so their norm never exceeds the given value.
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

# Configure the linear regression model with our feature columns and optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

for i in range(10):
    _ = linear_regressor.train(
        input_fn=lambda: my_input_fn(my_feature, targets),
        steps=500
    )
    prediction_input_fn = lambda: my_input_fn(
        my_feature, targets, num_epochs=1, shuffle=False)
    predictions = linear_regressor.predict(input_fn=prediction_input_fn)
    predictions = np.array([item['predictions'][0] for item in predictions])

    # Compute the error on the training data.
    mean_squared_error = metrics.mean_squared_error(predictions, targets)
    root_mean_squared_error = math.sqrt(mean_squared_error)
    print("period:" + str(i))
    print("Mean Squared Error (on training data): %0.3f" % mean_squared_error)
    print("Root Mean Squared Error (on training data): %0.3f" % root_mean_squared_error)
```
As the results show, the error fluctuates rather than decreasing steadily.
```
period:0
Mean Squared Error (on training data): 42441.611
Root Mean Squared Error (on training data): 206.014
period:1
Mean Squared Error (on training data): 35264.325
Root Mean Squared Error (on training data): 187.788
period:2
Mean Squared Error (on training data): 38522.086
Root Mean Squared Error (on training data): 196.270
period:3
Mean Squared Error (on training data): 63698.451
Root Mean Squared Error (on training data): 252.386
period:4
Mean Squared Error (on training data): 57390.924
Root Mean Squared Error (on training data): 239.564
period:5
Mean Squared Error (on training data): 57320.471
Root Mean Squared Error (on training data): 239.417
period:6
Mean Squared Error (on training data): 42164.934
Root Mean Squared Error (on training data): 205.341
period:7
Mean Squared Error (on training data): 46558.398
Root Mean Squared Error (on training data): 215.774
period:8
Mean Squared Error (on training data): 38209.068
Root Mean Squared Error (on training data): 195.471
period:9
Mean Squared Error (on training data): 38180.557
Root Mean Squared Error (on training data): 195.398
```
Next, let's look at the results with AdaGrad. AdaGrad lowers the effective learning rate automatically, so the 0.0005 passed here is only the initial value.
```python
my_feature = california_housing_dataframe[["total_rooms", "median_income"]]
feature_columns = [tf.feature_column.numeric_column("total_rooms"),
                   tf.feature_column.numeric_column("median_income")]
targets = california_housing_dataframe["median_house_value"]

# Use AdaGrad as the optimizer.
my_optimizer = tf.train.AdagradOptimizer(learning_rate=0.0005,
                                         initial_accumulator_value=0.01)
# Clip gradients so their norm never exceeds the given value.
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

# Configure the linear regression model with our feature columns and optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

for i in range(10):
    _ = linear_regressor.train(
        input_fn=lambda: my_input_fn(my_feature, targets),
        steps=500
    )
    prediction_input_fn = lambda: my_input_fn(
        my_feature, targets, num_epochs=1, shuffle=False)
    predictions = linear_regressor.predict(input_fn=prediction_input_fn)
    predictions = np.array([item['predictions'][0] for item in predictions])

    # Compute the error on the training data.
    mean_squared_error = metrics.mean_squared_error(predictions, targets)
    root_mean_squared_error = math.sqrt(mean_squared_error)
    print("period:" + str(i))
    print("Mean Squared Error (on training data): %0.3f" % mean_squared_error)
    print("Root Mean Squared Error (on training data): %0.3f" % root_mean_squared_error)
```
Compared with the previous run, the error now appears to decrease steadily.
```
period:0
Mean Squared Error (on training data): 37073.315
Root Mean Squared Error (on training data): 192.544
period:1
Mean Squared Error (on training data): 32312.096
Root Mean Squared Error (on training data): 179.756
period:2
Mean Squared Error (on training data): 30123.440
Root Mean Squared Error (on training data): 173.561
period:3
Mean Squared Error (on training data): 28926.608
Root Mean Squared Error (on training data): 170.078
period:4
Mean Squared Error (on training data): 28210.436
Root Mean Squared Error (on training data): 167.960
period:5
Mean Squared Error (on training data): 27833.080
Root Mean Squared Error (on training data): 166.832
period:6
Mean Squared Error (on training data): 27677.242
Root Mean Squared Error (on training data): 166.365
period:7
Mean Squared Error (on training data): 27643.450
Root Mean Squared Error (on training data): 166.263
period:8
Mean Squared Error (on training data): 27708.330
Root Mean Squared Error (on training data): 166.458
period:9
Mean Squared Error (on training data): 27817.395
Root Mean Squared Error (on training data): 166.785
```