
The complete code is as follows:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"

# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=int,  # np.int was only an alias for int; removed in NumPy 1.24
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=int,
    features_dtype=np.float32)

# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3,
                                            model_dir="/tmp/iris_model")

# Fit model.
classifier.fit(x=training_set.data,
               y=training_set.target,
               steps=2000)

# Evaluate accuracy.
accuracy_score = classifier.evaluate(x=test_set.data,
                                     y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))

# Classify two new flower samples.
new_samples = np.array(
    [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
y = list(classifier.predict(new_samples, as_iterable=True))
print('Predictions: {}'.format(str(y)))

The state of the model is preserved in the classifier, so training can be split across several fit calls. That is,

classifier.fit(x=training_set.data, y=training_set.target, steps=2000)

is equivalent to

classifier.fit(x=training_set.data, y=training_set.target, steps=1000)
classifier.fit(x=training_set.data, y=training_set.target, steps=1000)
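
Why two shorter calls can match one long call is easy to see in a toy sketch in plain Python. ToyEstimator is a hypothetical stand-in and not part of tf.contrib.learn: it keeps one weight and a step counter on the instance, much as the real classifier keeps its state between fit calls. With a real network, random initialisation and shuffling mean the match is only approximate.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator that keeps state between fit() calls."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0        # model state lives on the instance
        self.global_step = 0     # total number of training steps so far
        self.learning_rate = learning_rate

    def fit(self, target, steps):
        # Plain gradient descent on the loss (weight - target) ** 2.
        for _ in range(steps):
            gradient = 2.0 * (self.weight - target)
            self.weight -= self.learning_rate * gradient
            self.global_step += 1
        return self


# One call of 2000 steps.
one_call = ToyEstimator().fit(target=3.0, steps=2000)

# Two calls of 1000 steps each on the same instance.
two_calls = ToyEstimator()
two_calls.fit(target=3.0, steps=1000)
two_calls.fit(target=3.0, steps=1000)  # resumes from the state left by the first call

print(one_call.global_step, two_calls.global_step)
print(one_call.weight == two_calls.weight)
```

The second fit call does not start from scratch; it continues from the weight and step count that the first call left behind.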

Workflow: load data ---> build network ---> train ---> evaluate (predict).

Building your own input functions with tf.contrib.learn

In the code above, the data is read by a function built into TensorFlow. The following describes how to write your own.

Convert feature data to Tensors

If the data is stored in NumPy arrays or a pandas DataFrame, it must first be converted to TensorFlow's own data form, Tensors:

feature_column_data = [1, 2.4, 0, 9.9, 3, 120]
feature_tensor = tf.constant(feature_column_data)

TensorFlow has another data form, SparseTensor, for feature data in which most values are 0. It requires indices, values, and dense_shape. Example code:

sparse_tensor = tf.SparseTensor(indices=[[0,1], [2,4]],
                                values=[6, 0.5],
                                dense_shape=[3, 5])

Written out densely, sparse_tensor has the following structure:

[[0, 6, 0, 0, 0]
 [0, 0, 0, 0, 0]
 [0, 0, 0, 0, 0.5]]
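
How the three arguments map onto this matrix can be checked without TensorFlow. The helper to_dense below is a hypothetical illustration in plain Python, not a TensorFlow API: it starts from an all-zero matrix of shape dense_shape and writes each value at its (row, column) index pair.

```python
def to_dense(indices, values, dense_shape):
    """Expand (indices, values, dense_shape) into a nested list of rows."""
    n_rows, n_cols = dense_shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    # Every position not listed in `indices` stays 0.
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense


dense = to_dense(indices=[[0, 1], [2, 4]],
                 values=[6, 0.5],
                 dense_shape=[3, 5])
for row in dense:
    print(row)
```

Only two of the fifteen entries are stored explicitly, which is the point of the sparse form.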

Passing input_fn Data to Your Model

The basic form is:

classifier.fit(input_fn=my_input_function_training_set, steps=2000)

Here my_input_function_training_set is a function that supplies both the features and the labels. input_fn must return the two together; supplying only one of them raises a ValueError.

my_input_function_training_set is defined as follows:

def my_input_function_training_set():
  return my_input_function(training_set)

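training_set is the raw data; my_input_function processes it and returns the result. You cannot write the call like this:

classifier.fit(input_fn=my_input_function(training_set), steps=2000)

This raises a TypeError, because input_fn must be a callable, whereas here the function has already been called and its return value is passed instead.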

There is, of course, another way to write it:

classifier.fit(input_fn=functools.partial(my_input_function,
                                          data_set=training_set), steps=2000)

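Here functools.partial (which needs import functools) binds training_set to the data_set argument of my_input_function and returns a new callable that takes no arguments.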

Alternatively, you can wrap the call in a lambda:

classifier.fit(input_fn=lambda: my_input_function(training_set), steps=2000)

Summary: there are three ways to pass data to input_fn:

  • define a wrapper function with the def keyword;
  • use functools.partial;
  • use a lambda.
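
The difference between passing a function and passing the result of calling it can be reproduced in plain Python. In this sketch fake_fit and the tiny training_set are hypothetical stand-ins (the real fit does far more). fake_fit mimics only the one relevant behaviour, namely that it calls input_fn with no arguments.

```python
import functools


def my_input_function(data_set):
    """Hypothetical input function: returns a (features, labels) pair."""
    return data_set["features"], data_set["labels"]


def fake_fit(input_fn):
    """Stand-in for fit(): calls input_fn() with no arguments."""
    features, labels = input_fn()
    return len(features), len(labels)


# Made-up data, for illustration only.
training_set = {"features": [[6.4, 3.2], [5.8, 3.1]], "labels": [1, 2]}


# 1. A named wrapper defined with def.
def my_input_function_training_set():
    return my_input_function(training_set)


via_def = fake_fit(my_input_function_training_set)

# 2. functools.partial binds data_set and returns a zero-argument callable.
via_partial = fake_fit(
    functools.partial(my_input_function, data_set=training_set))

# 3. A lambda defers the call until fake_fit invokes it.
via_lambda = fake_fit(lambda: my_input_function(training_set))

# Passing the call result hands over a tuple, which cannot be called.
try:
    fake_fit(my_input_function(training_set))
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(via_def, via_partial, via_lambda, error_name)
```

All three working variants hand over something that can be called later; the failing variant hands over data that has already been computed.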

Build input_fn function

import pandas as pd
import tensorflow as tf

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"

training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                       skiprows=1, names=COLUMNS)
prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                             skiprows=1, names=COLUMNS)


def input_fn(data_set):
  feature_cols = {k: tf.constant(data_set[k].values)
                  for k in FEATURES}
  labels = tf.constant(data_set[LABEL].values)
  return feature_cols, labels

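# regressor is assumed to be an estimator created beforehand,
# for example a tf.contrib.learn.DNNRegressor.
regressor.fit(input_fn=lambda: input_fn(training_set), steps=5000)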
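
To make the shape of input_fn's return value concrete, here is a TensorFlow-free sketch that uses only the standard library. The CSV text is made up, load_columns and input_fn_plain are hypothetical helpers, and plain lists stand in for the tf.constant tensors. Only the structure mirrors the code above: a dict that maps each feature name to its column, plus a separate label column.

```python
import csv
import io

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = COLUMNS[:-1]
LABEL = "medv"

# Made-up sample in the same layout as the CSV files: one header row, then data.
SAMPLE_CSV = """CRIM, ZN, INDUS, NOX, RM, AGE, DIS, TAX, PTRATIO, MEDV
0.1, 12.5, 7.9, 0.52, 6.0, 66.6, 5.6, 311, 15.2, 22.9
0.2, 0.0, 8.1, 0.54, 5.9, 61.8, 4.7, 307, 21.0, 20.4
"""


def load_columns(text):
    """Rough analogue of read_csv(skipinitialspace=True, skiprows=1, names=COLUMNS)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skiprows=1: drop the file's own header row
    rows = [[float(cell) for cell in row] for row in reader if row]
    # Transpose the rows into named columns.
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def input_fn_plain(data_set):
    """Same structure as input_fn, with lists in place of tensors."""
    feature_cols = {k: data_set[k] for k in FEATURES}
    labels = data_set[LABEL]
    return feature_cols, labels


feature_cols, labels = input_fn_plain(load_columns(SAMPLE_CSV))
print(sorted(feature_cols))
print(labels)
```

The label column medv is returned on its own and does not appear among the feature columns, which is what lets the estimator tell inputs from targets.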
