@Arslan6and6 2016-08-29T02:24:01Z · 8908 characters · 626 views

[Assignment 22] Yin Jie

Chapter 12: Oozie, a Big Data Collaboration Framework

--- Practicing Workflow Actions in Oozie and Using the Coordinator

Assignment description:

Based on the class lectures and the official reference, practice and test the common Actions in an Oozie Workflow, and read the official Oozie documentation carefully to build up English reading skills. The specific requirements are:

1) Key points of MapReduce Action configuration, and how to configure it well

2) Caveats when configuring a Shell Action, which is commonly used in industry

3) How to schedule a Workflow with the Oozie Coordinator

1) Key points of MapReduce Action configuration, and how to configure it well

Key configuration points

job.properties
---------defines where the workflow application lives

workflow.xml
---------start
---------action
---------mapreduce / shell
---------error
---------ok
---------kill
---------end

lib
---------holds the resources (jar files) the job needs
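The layout above can be sketched locally. This is only an illustration (the `/tmp/oozie-app-demo` path and app name are made up); the real application directory is built in the steps below.

```shell
# Build the skeleton of an Oozie workflow application (illustrative paths)
APP=/tmp/oozie-app-demo/my-mr-app
mkdir -p "$APP/lib"

# job.properties points Oozie at the workflow; workflow.xml defines the DAG;
# lib/ holds the jars the job needs
touch "$APP/job.properties" "$APP/workflow.xml"

ls -R "$APP"
```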

Detailed configuration steps:

1. Copy /opt/modules/oozie-4.0.0-cdh5.3.6/examples/apps/map-reduce into a newly created directory /opt/modules/oozie-4.0.0-cdh5.3.6/oozie-apps, so the map-reduce template from examples can be reused.

2. Create /user/beifeng/oozie-apps on HDFS.

3. Copy the MapReduce jar to be used into oozie-apps/map-reduce/lib/:

```shell
cp /home/beifeng/jar/swordcount.jar /opt/modules/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce/lib/
```

4. Configure job.properties; make sure the directories match:

```
nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
jobTracker=hadoop-senior.ibeifeng.com:8032
queueName=default
examplesRoot=oozie-apps/map-reduce
oozie.wf.application.path=${nameNode}/user/beifeng/${examplesRoot}/workflow.xml
outputDir=map-reduce-swc
```
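The `${...}` references in job.properties compose into the final application path. A rough local simulation of that substitution (the resolver below is plain shell, not Oozie's EL engine):

```shell
# Simulate how the EL variables in job.properties compose
nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
examplesRoot=oozie-apps/map-reduce

# oozie.wf.application.path=${nameNode}/user/beifeng/${examplesRoot}/workflow.xml
appPath="${nameNode}/user/beifeng/${examplesRoot}/workflow.xml"
echo "$appPath"
# → hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/oozie-apps/map-reduce/workflow.xml
```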

5. Configure workflow.xml

Set the workflow name and the input directory; the output directory is left unchanged here.

```xml
<!-- workflow name; keep name="..." under 20 characters, or an error is reported -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="swc-map-reduce-wf">
<!-- delete the output directory before each run -->
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
<!-- input directory -->
<property>
    <name>mapred.input.dir</name>
    <value>/user/${wf:user()}/${examplesRoot}/input-data</value>
</property>
```

Following mapred.input.dir, create an input-data directory under /opt/modules/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce and copy the test file sort.txt into it, so that mapred.input.dir has a directory and data to read.
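Locally the preparation looks roughly like this (the sample words in sort.txt are made up, and a /tmp path is used here so the sketch runs anywhere; the real path is /opt/modules/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce):

```shell
# Create the input directory and drop a sample file into it
APP=/tmp/map-reduce-demo
mkdir -p "$APP/input-data"
printf 'hadoop oozie hadoop\nworkflow oozie\n' > "$APP/input-data/sort.txt"
wc -l "$APP/input-data/sort.txt"
```

The whole map-reduce folder, input-data included, is uploaded to HDFS in one step later on.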


Mapper configuration

```xml
<property>
    <name>mapred.mapper.class</name>
    <value>org.apache.oozie.example.SampleMapper</value>
</property>
```

Replace the original mapred.mapper.class setting above and enable the new API, following the parameters used when the jar was run standalone beforehand.

```xml
<!-- enable the new API to avoid compatibility errors -->
<property>
    <name>mapred.mapper.new-api</name>
    <value>true</value>
</property>
<!-- replace mapred.mapper.class, matching the parameters of the pre-tested jar -->
<property>
    <name>mapreduce.job.map.class</name>
    <value>org.apache.hadoop.oozietest.WordCountMapReduce$WordCountMapper</value>
</property>
```

Add mapreduce.map.output.key.class and mapreduce.map.output.value.class:

```xml
<property>
    <name>mapreduce.map.output.key.class</name>
    <value>org.apache.hadoop.io.Text</value>
</property>
<property>
    <name>mapreduce.map.output.value.class</name>
    <value>org.apache.hadoop.io.IntWritable</value>
</property>
```

Reducer configuration

```xml
<property>
    <name>mapred.reducer.new-api</name>
    <value>true</value>
</property>
```


```xml
<property>
    <name>mapreduce.job.reduce.class</name>
    <value>org.apache.hadoop.oozietest.WordCountMapReduce$WordCountReduce</value>
</property>
```


```xml
<property>
    <name>mapreduce.job.output.key.class</name>
    <value>org.apache.hadoop.io.Text</value>
</property>
<property>
    <name>mapreduce.job.output.value.class</name>
    <value>org.apache.hadoop.io.IntWritable</value>
</property>
```
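The job wired in here is a standard word count; its map/reduce logic is equivalent to this shell pipeline (tokenize, shuffle via sort, reduce via uniq), shown on made-up sample input:

```shell
# Word count, the shell way: map (one word per line), shuffle (sort), reduce (uniq -c)
printf 'hadoop oozie hadoop\nworkflow oozie\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}'
# → hadoop 2
#   oozie 2
#   workflow 1
```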

The complete workflow.xml:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="swc-map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- mapper -->
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>org.apache.hadoop.oozietest.WordCountMapReduce$WordCountMapper</value>
                </property>
                <property>
                    <name>mapreduce.map.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.map.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <!-- reducer -->
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>org.apache.hadoop.oozietest.WordCountMapReduce$WordCountReduce</value>
                </property>
                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <!-- input/output directories -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/${wf:user()}/${examplesRoot}/input-data</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Running the job

Upload the prepared map-reduce folder to the oozie-apps working directory on HDFS, and delete the default jar oozie-examples-4.0.0-cdh5.3.6.jar under oozie-apps/map-reduce/lib:

```shell
bin/hdfs dfs -put /opt/modules/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce/ oozie-apps
```

Derive the new command from the one used for the examples:

```shell
# original examples command
bin/oozie job -oozie http://hadoop-senior.ibeifeng.com:11000/oozie -config examples/apps/map-reduce/job.properties -run
# new command
bin/oozie job -oozie http://hadoop-senior.ibeifeng.com:11000/oozie -config oozie-apps/map-reduce/job.properties -run
```

Check the execution result (screenshots of the Oozie console and the output omitted).

Summary of how the configuration entries correspond (screenshot omitted).

2) Caveats when configuring a Shell Action, which is commonly used in industry

1. Create a shell script:

```shell
vi meminfo.sh
```

```shell
#!/bin/bash
/usr/bin/free -m >> /tmp/meminfo
```
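A quick local check that the script does what we expect, appending to a demo file instead of /tmp/meminfo (this sketch assumes a Linux host; it falls back to /proc/meminfo in case `free` is not on the PATH):

```shell
# Exercise the script's logic against a demo output file
cat > /tmp/meminfo-demo.sh <<'EOF'
#!/bin/bash
# same idea as meminfo.sh, but writing to a demo file
if command -v free >/dev/null; then
    free -m >> /tmp/meminfo-demo.out
else
    grep -i '^Mem' /proc/meminfo >> /tmp/meminfo-demo.out
fi
EOF
chmod +x /tmp/meminfo-demo.sh
rm -f /tmp/meminfo-demo.out
/tmp/meminfo-demo.sh
head -2 /tmp/meminfo-demo.out
```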

2. Copy the shell folder under examples/apps into the working directory oozie-apps, renaming it mem-shell for convenience.

3. Configure job.properties:

```
nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
jobTracker=hadoop-senior.ibeifeng.com:8032
queueName=default
examplesRoot=oozie-apps/mem-shell
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/
EXEC=meminfo.sh
```

4. Configure workflow.xml
① Set name="mem-shell-wf"
② Remove the decision and kill (name="fail-output") nodes, the argument tags, and the other unused elements
③ Follow the syntax from the official page http://oozie.apache.org/docs/4.0.0/DG_ShellActionExtension.html

```xml
<exec>${EXEC}</exec>
<argument>A</argument>
<argument>B</argument>
<file>${EXEC}#${EXEC}</file> <!--Copy the executable to compute node's current working directory -->
```

Change exec and file. Arguments are usually set inside the shell script itself; setting them here would only increase the amount of rework when they change.
For the file tag: per "Copy the executable to compute node's current working directory", the path to ${EXEC} cannot be a bare reference; it must be written as an absolute path, or the task fails.

```xml
<exec>${EXEC}</exec>
<file>/user/beifeng/oozie-apps/mem-shell/${EXEC}#${EXEC}</file>
```
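The `path#linkname` form follows Hadoop's distributed-cache convention: the file at the absolute HDFS path is localized into the task's working directory under the name after `#`. Locally that behaves like a symlink, roughly as below (all /tmp paths are made up for illustration):

```shell
# Emulate file localization: the "uploaded" script...
mkdir -p /tmp/oozie-cache-demo/app
printf '#!/bin/bash\necho ok\n' > /tmp/oozie-cache-demo/app/meminfo.sh
chmod +x /tmp/oozie-cache-demo/app/meminfo.sh

# ...gets linked into the task's working dir under the fragment name after '#'
mkdir -p /tmp/oozie-cache-demo/workdir
ln -sf /tmp/oozie-cache-demo/app/meminfo.sh /tmp/oozie-cache-demo/workdir/meminfo.sh

# the action can now run it by its bare name, as <exec>${EXEC}</exec> does
cd /tmp/oozie-cache-demo/workdir && ./meminfo.sh
```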

Point the ok transition at end:

```xml
<ok to="end"/>
```

④ Upload the mem-shell folder to HDFS:

```shell
bin/hdfs dfs -put /opt/modules/oozie-4.0.0-cdh5.3.6/oozie-apps/mem-shell/ oozie-apps
```

⑤ Run the job:

```shell
bin/oozie job -oozie http://hadoop-senior.ibeifeng.com:11000/oozie -config oozie-apps/mem-shell/job.properties -run
```

3) How to schedule a Workflow with the Oozie Coordinator

1. Confirm that all servers in the cluster agree on the time. Oozie works in Coordinated Universal Time (UTC) by default; for this purpose UTC is the same as Greenwich Mean Time (GMT).

```shell
date -R
Mon, 06 Jun 2016 20:38:57 +0800
```

+0800 is UTC+8 (China Standard Time), so the system time zone here is correct. If it is not, add the following to oozie-site.xml:

```xml
<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>
```

To fix the system time zone:

```shell
# run as root
rm -rf /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
```

Adjust the web console's time zone in $OOZIE_HOME/oozie-server/webapps/oozie/oozie-console.js:

```javascript
function getTimeZone() {
    Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
    return Ext.state.Manager.get("TimezoneId", "GMT+0800");
}
```

2. Reuse the workflow.xml from the mem-shell example:

```shell
rm -rf oozie-apps/cron/workflow.xml
cp oozie-apps/mem-shell/workflow.xml oozie-apps/cron/
```

3. Edit job.properties:

```
nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
jobTracker=hadoop-senior.ibeifeng.com:8032
queueName=default
examplesRoot=oozie-apps/cron
oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/
start=2016-06-06T21:30+0800
end=2016-06-06T23:00+0800
workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/
EXEC=meminfo.sh
```
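Worth noting: if oozie.processing.timezone had not been set to GMT+0800 as above, Oozie would interpret start/end in UTC, so the local +0800 timestamps would need shifting back 8 hours. With GNU date (assumed available) the conversion looks like:

```shell
# Convert the coordinator's local start time (+0800) to UTC
date -u -d '2016-06-06T21:30+0800' '+%Y-%m-%dT%H:%MZ'
# → 2016-06-06T13:30Z
```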

4. Edit coordinator.xml:

```xml
<coordinator-app name="cron-coord" frequency="${coord:minutes(2)}" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
```

or, using cron syntax:

```xml
<coordinator-app name="cron-coord" frequency="*/2 * * * *" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
```

5. Because coordinator.xml schedules the job every 2 minutes, which is faster than the 5-minute minimum frequency Oozie enforces by default, that check must be disabled. oozie-default.xml documents the relevant property; add it to oozie-site.xml with its value set to false:

```xml
<property>
    <name>oozie.service.coord.check.maximum.frequency</name>
    <value>false</value>
    <description>
        When true, Oozie will reject any coordinators with a frequency faster than 5 minutes. It is not recommended
        to disable this check or submit coordinators with frequencies faster than 5 minutes: doing so can cause
        unintended behavior and additional system stress.
    </description>
</property>
```

6. Run the job and check the result.

After restarting Oozie:

```shell
bin/oozie job -oozie http://hadoop-senior.ibeifeng.com:11000/oozie -config oozie-apps/cron/job.properties -run
```
```shell
tail -f /tmp/meminfo
             total       used       free     shared    buffers     cached
Mem:           988        938         49          0          1         50
-/+ buffers/cache:        887        101
Swap:         1983        788       1195
             total       used       free     shared    buffers     cached
Mem:           988        940         48          0          1         57
-/+ buffers/cache:        881        107
Swap:         1983        790       1193
             total       used       free     shared    buffers     cached
Mem:           988        928         60          0          0         51
-/+ buffers/cache:        876        112
Swap:         1983        799       1184
             total       used       free     shared    buffers     cached
Mem:           988        939         49          0          0         60
-/+ buffers/cache:        877        111
Swap:         1983        797       1186
             total       used       free     shared    buffers     cached
Mem:           988        938         50          0          1         50
-/+ buffers/cache:        886        102
Swap:         1983        790       1193
             total       used       free     shared    buffers     cached
```

A new sample is appended every 2 minutes, confirming the coordinator schedule.
