Reading Data from Files with TensorFlow

Hubring 2021. 3. 5. 12:09

[Reference] 모두를 위한 딥러닝 (Deep Learning for Everyone): a basic machine learning and deep learning course

Loading data from file

import numpy as np
xy = np.loadtxt('sample.csv', delimiter=',', dtype=np.float32)

x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

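Slicing: xy[:, 0:-1] takes every column except the last as the inputs, and xy[:, [-1]] takes only the last column as the labels.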
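To make the two slices concrete, here is a minimal sketch that runs the same loadtxt call on an in-memory CSV. The three rows of data are hypothetical stand-ins for sample.csv, which is not shown in this post, and y_flat is an extra name added only for comparison.

```python
import io
import numpy as np

# Hypothetical stand-in for 'sample.csv': three feature columns, then one label column.
csv_text = "73,80,75,152\n93,88,93,185\n89,91,90,180\n"

xy = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=np.float32)

x_data = xy[:, 0:-1]   # all rows, every column except the last
y_data = xy[:, [-1]]   # all rows, last column only; the list index keeps the result 2-D
y_flat = xy[:, -1]     # a plain integer index would drop the column axis instead

print(x_data.shape, y_data.shape, y_flat.shape)
```

Indexing with [-1] rather than -1 keeps y_data as a column with shape (rows, 1), which is usually the shape you want when the labels are later fed to a placeholder with one output column.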

Queue Runners

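Use this when the files are too large to load into memory all at once.
Multiple files are placed in a filename queue, and a Reader reads records from them.
A decoder parses each record into the required format (for example, splitting on commas).
The decoded examples go into an example queue, from which they are taken out in batches.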
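The four stages above can be sketched in plain Python generators. This is not TensorFlow's API, only an illustration of the data flow. The function names (read_lines, decode, batches) and the two small CSV files are hypothetical. Unlike a real queue runner, this sketch is single-threaded, makes one pass over the files, and yields a smaller final batch.

```python
import csv
import os
import tempfile
from itertools import islice

def read_lines(filenames):
    """Stands in for the filename queue plus Reader: yields one text line at a time."""
    for name in filenames:
        with open(name, newline='') as f:
            for line in f:
                yield line

def decode(lines):
    """Stands in for the decoder: splits on ',' and converts every field to float."""
    for row in csv.reader(lines):
        if row:  # skip blank lines
            values = [float(v) for v in row]
            yield values[:-1], values[-1:]  # (features, label)

def batches(examples, batch_size):
    """Stands in for the example queue: groups decoded examples into batches."""
    it = iter(examples)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        yield xs, ys

# Write two small hypothetical CSV files so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(["1,2,3\n4,5,9\n", "6,7,13\n"]):
    path = os.path.join(tmp_dir, "part-%d.csv" % i)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

all_batches = list(batches(decode(read_lines(paths)), batch_size=2))
for xs, ys in all_batches:
    print(xs, ys)
```

Because each stage is a generator, only one batch of rows is held in memory at a time, which is the point of the queue-based design.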

Reading data in batches

import tensorflow as tf
tf.set_random_seed(777)  # for reproducibility

filename_queue = tf.train.string_input_producer(
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# Collect the decoded CSV rows into batches of 10.
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

...


# Start populating the filename queue.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for step in range(2001):
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_batch, Y: y_batch})
    if step % 10 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)

# Ask the queue runner threads to stop, then wait for them to finish.
coord.request_stop()
coord.join(threads)

shuffle_batch

Use this when you want batches drawn in shuffled order rather than in the order the records were read.

min_after_dequeue = 10000
capacity = min_after_dequeue + 3*batch_size
example_batch, label_batch = tf.train.shuffle_batch(
    [example, label], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=min_after_dequeue)
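To show what capacity and min_after_dequeue control, here is a rough plain-Python sketch of a shuffle buffer. It is a simplified, single-threaded model and not how tf.train.shuffle_batch is implemented. The function shuffle_batches, its seed parameter, and the small numbers used below are assumptions made for illustration.

```python
import random

def shuffle_batches(examples, batch_size, capacity, min_after_dequeue, seed=None):
    """Yield batches drawn at random from a bounded buffer of examples."""
    rng = random.Random(seed)
    buffer = []
    batch = []
    for example in examples:
        buffer.append(example)
        # Once the buffer is full, dequeue random elements until only
        # min_after_dequeue remain, so later draws still mix well.
        if len(buffer) >= capacity:
            while len(buffer) > min_after_dequeue:
                batch.append(buffer.pop(rng.randrange(len(buffer))))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    # Input exhausted: drain whatever is left, still in random order.
    rng.shuffle(buffer)
    for example in buffer:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # smaller final batch

batch_size = 4
min_after_dequeue = 10
capacity = min_after_dequeue + 3 * batch_size

out = list(shuffle_batches(range(50), batch_size, capacity,
                           min_after_dequeue, seed=777))
flat = [x for b in out for x in b]
print(len(out), "batches; first batch:", out[0])
```

A larger min_after_dequeue means each draw is taken from a larger pool, so the order is mixed more thoroughly at the cost of more memory and a longer fill time before the first batch is produced.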