@zh350229319 2019-03-27T07:42:16.000000Z

TensorFlow Cluster Deployment

TensorFlow


Installing Python 3 on CentOS 7

Installing Python 3.6.5

    tar -zxvf Python-3.6.5.tgz
    cd Python-3.6.5/
    ./configure --prefix=/usr/local/python3
    make
    make install
    # Remove the existing python link and point it at python3
    rm /usr/bin/python
    ln -s /usr/local/python3/bin/python3 /usr/bin/python
    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip
    # Test: this should now start the Python 3.6.5 interpreter
    python
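To confirm that the python command now resolves to the new build, a small check script can help. This is only a sketch: the helper name meets_minimum and the (3, 6) threshold are my own choices, based on the 3.6.5 build above.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True if the (major, minor, ...) tuple is at least `minimum`.

    `minimum` defaults to (3, 6), an assumption taken from the 3.6.5 build.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Show which interpreter is actually running and whether it is new enough
print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])
print("meets minimum:", meets_minimum(sys.version_info))
```

Save it under any name and run it with python; if the interpreter path is not under /usr/local/python3, the symlink did not take effect.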

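Fixing the yum command

yum needs a Python 2 environment to run, so it breaks once /usr/bin/python points at Python 3.
Change the first line of /usr/bin/yum to #!/usr/bin/python2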
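If you would rather script the shebang change than edit the file by hand, something like the following could work. It is a minimal sketch: set_shebang is a hypothetical helper, and the demo runs against a temporary stand-in file rather than the real /usr/bin/yum.

```python
import os
import tempfile


def set_shebang(path, interpreter="/usr/bin/python2"):
    """Rewrite the first line of `path` to '#!<interpreter>'.

    Hypothetical helper; refuses to touch files that do not start with '#!'.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    if not lines or not lines[0].startswith("#!"):
        raise ValueError("%s does not start with a shebang line" % path)
    lines[0] = "#!" + interpreter + "\n"
    with open(path, "w") as f:
        f.writelines(lines)


# Demonstration on a temporary stand-in file, not on the real /usr/bin/yum
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("#!/usr/bin/python\nimport yum\n")
demo_path = tmp.name

set_shebang(demo_path)
with open(demo_path) as f:
    first_line = f.readline().rstrip("\n")
print(first_line)
os.remove(demo_path)
```

On a real host you would call set_shebang("/usr/bin/yum") as root. On many CentOS 7 systems /usr/libexec/urlgrabber-ext-down carries the same shebang and may need the same change; treat that as something to verify on your own machine.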

TensorFlow Cluster

Understanding the principles of distributed TensorFlow

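Installing TensorFlow

Because the CPUs are an older model, only version 1.5 can be installed (presumably because the prebuilt TensorFlow binaries from 1.6 onward are compiled with AVX instructions, which older CPUs lack).
On each node, run pip install tensorflow==1.5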
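To see whether a node is affected, you can look for the avx flag in /proc/cpuinfo. The sketch below assumes a Linux node; cpu_flags and has_avx are hypothetical helper names, and the sample text is illustrative rather than taken from the machines in this post.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """Return True if the exact flag 'avx' is listed (not just avx2 etc.)."""
    return "avx" in cpu_flags(cpuinfo_text)


# Illustrative sample only; a real file lists many more flags per processor
sample = "processor : 0\nflags : fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(has_avx(sample))

# On a real Linux node:
# with open("/proc/cpuinfo") as f:
#     print(has_avx(f.read()))
```

If has_avx returns False on a node, pinning tensorflow==1.5 as above is the simple way out; building a newer version from source is the other option.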

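Server configuration

On the first machine, run vi server.py and enter the script below.
On the other two machines, use the same script with task_index set to 1 and 2 respectively.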

    import tensorflow as tf

    # The host:port list must be identical on every machine in the cluster
    worker1 = "hbbw214:12000"
    worker2 = "hbbw112:12000"
    worker3 = "hbbw113:12000"
    worker_hosts = [worker1, worker2, worker3]
    cluster_spec = tf.train.ClusterSpec({"worker": worker_hosts})
    # task_index selects this machine's entry in worker_hosts
    server = tf.train.Server(cluster_spec, job_name="worker", task_index=0)
    server.join()  # block and serve requests

On the second machine, run vi server.py

    import tensorflow as tf

    worker1 = "hbbw214:12000"
    worker2 = "hbbw112:12000"
    worker3 = "hbbw113:12000"
    worker_hosts = [worker1, worker2, worker3]
    cluster_spec = tf.train.ClusterSpec({"worker": worker_hosts})
    # Second entry in worker_hosts
    server = tf.train.Server(cluster_spec, job_name="worker", task_index=1)
    server.join()

On the third machine, run vi server.py

    import tensorflow as tf

    worker1 = "hbbw214:12000"
    worker2 = "hbbw112:12000"
    worker3 = "hbbw113:12000"
    worker_hosts = [worker1, worker2, worker3]
    cluster_spec = tf.train.ClusterSpec({"worker": worker_hosts})
    # Third entry in worker_hosts
    server = tf.train.Server(cluster_spec, job_name="worker", task_index=2)
    server.join()
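Since the three files are meant to differ only in task_index, one option is a single server.py that reads the index from the command line. This is a sketch, not the setup used in this post: the --task_index flag, parse_task_index and main are my own names, and port 12000 (as in the second and third files) is assumed for all three workers.

```python
import argparse
import sys

# Assumed worker list: port 12000 on all three hosts
WORKER_HOSTS = ["hbbw214:12000", "hbbw112:12000", "hbbw113:12000"]


def parse_task_index(argv):
    """Read --task_index (hypothetical flag) and check it is in range."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task_index", type=int, required=True)
    args = parser.parse_args(argv)
    if not 0 <= args.task_index < len(WORKER_HOSTS):
        parser.error("task_index must be in 0..%d" % (len(WORKER_HOSTS) - 1))
    return args.task_index


def main(argv):
    # Imported here so the helpers above work without TensorFlow installed
    import tensorflow as tf  # TensorFlow 1.x API, as in the scripts above

    task_index = parse_task_index(argv)
    cluster_spec = tf.train.ClusterSpec({"worker": WORKER_HOSTS})
    server = tf.train.Server(cluster_spec, job_name="worker",
                             task_index=task_index)
    server.join()  # blocks, serving gRPC requests


# Only start a server when arguments were passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

With this version the same file is copied to every machine and started with python server.py --task_index 0, 1 or 2, so the host list cannot drift between copies.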

On any one of the machines, run vi client.py

    import numpy as np
    import tensorflow as tf

    # Synthetic data: y = 2x + 10 plus Gaussian noise
    train_X = np.linspace(-1, 1, 1000000)
    train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10

    # Device placement is decided while the graph is built, so the
    # tf.device scope wraps the op definitions rather than sess.run
    with tf.device("/job:worker/task:0"):
        X = tf.placeholder("float")
        Y = tf.placeholder("float")
        w = tf.Variable(0.0, name="weight")
        b = tf.Variable(0.0, name="reminder")
        cost_op = tf.square(Y - tf.multiply(X, w) - b)
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost_op)
        init_op = tf.global_variables_initializer()

    # Connect to the server on hbbw214; the port must match server.py (12000)
    with tf.Session("grpc://hbbw214:12000") as sess:
        sess.run(init_op)
        for i in range(10):
            # One training step per sample
            for (x, y) in zip(train_X, train_Y):
                sess.run(train_op, feed_dict={X: x, Y: y})
            # Print the fitted weight and bias after each pass over the data
            print(sess.run(w))
            print(sess.run(b))
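To sanity-check what client.py should print, the same per-sample update can be written in plain Python and run without a cluster. This is a rough sketch under my own assumptions: sgd_fit is a hypothetical helper, the data set is cut to 1000 points to keep it quick, and the random seed is arbitrary.

```python
import random


def sgd_fit(xs, ys, lr=0.01, epochs=10):
    """Per-sample gradient descent on the cost (y - w*x - b)**2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - w * x - b
            # d/dw err**2 = -2*err*x and d/db err**2 = -2*err,
            # so stepping against the gradient adds lr * 2 * err * (x or 1)
            w += lr * 2 * err * x
            b += lr * 2 * err
    return w, b


random.seed(0)  # arbitrary seed, only for repeatability
n = 1000        # far fewer points than the 1,000,000 in client.py
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
ys = [2 * x + random.gauss(0, 1) * 0.33 + 10 for x in xs]

w, b = sgd_fit(xs, ys)
print(w, b)
```

Both values should land close to the 2 and 10 used to generate the data, which is also what the distributed run is expected to approach.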

Startup

First run python server.py on each of the three machines.
Then run python client.py
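Before launching the client it can be handy to check that every server is actually listening. The sketch below only tests TCP reachability; parse_host_port, is_port_open and report are hypothetical helpers, and the addresses assume port 12000 on all three hosts.

```python
import socket


def parse_host_port(address):
    """Split 'host:port' into (host, int(port))."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames
        return False


# Assumed worker addresses; adjust to match your server.py
workers = ["hbbw214:12000", "hbbw112:12000", "hbbw113:12000"]


def report(addresses):
    """Print one up/DOWN line per worker address."""
    for address in addresses:
        host, port = parse_host_port(address)
        state = "up" if is_port_open(host, port) else "DOWN"
        print("%s %s" % (address, state))


# report(workers)  # uncomment on a machine that can resolve the hostnames
```

An open port only shows that some process is listening there, not that it is the TensorFlow server, so treat it as a quick pre-flight check.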
