@EVA001 2017-12-02

Spark Study Notes (1)



Basic Overview

Summary

Spark is fast
- Extends the MapReduce model
- Computes in memory (intermediate results are kept in memory rather than on disk)
Spark is general-purpose
- Batch processing (like Hadoop MapReduce)
- Iterative computation (e.g., machine learning systems)
- Interactive queries (like Hive)
- Stream processing (like Storm)
Spark is open
- Python API
- Java/Scala API
- SQL API
- Integrates well with Hadoop and Kafka

Main Contents

History

2009: started in UC Berkeley's RAD Lab, introducing in-memory storage
2010: open-sourced
2011: the AMP Lab (successor to the RAD Lab); work on Spark Streaming
2013: donated to the Apache Software Foundation (became a top-level project in February 2014)

主要组件

Spark Core:
- 包括spark的基本功能,任务调度、内存管理、容错机制
- 内部定义RDDs(弹性分布式数据集)
- 提供APIs来创建和操作RDDs
- 为其他组件提供底层服务
Spark SQL:
- 处理结构化数据的库,类似于HiveSQL、Mysql
- 用于报表统计等
Spark Streaming:
- 实时数据流处理组件,类似Storm
- 提供API来操作实时数据流
- 使用场景是从Kafka等消息队列中接收数据实时统计
Spark Mlib:
- 包含通用机器学习功能的包,Machine Learning Lib
- 包含分类、聚类、回归、模型评估、数据导入等
- Mlib所有算法均支持集群的横向扩展(区别于python的单机)
GraphX:
- 处理图数据的库,并行的进行图的计算
- 类似其他组件,都继承了RDD API
- 提供各种图操作和常用的图算法,PageRank等
Spark Cluster Managers:
- 集群管理,Spark自带一个集群管理调度器
- 其他类似的有Hadoop YARN,Apache Mesos
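
To make the "tightly integrated stack" concrete, here is a minimal sketch (the object name and the local[*] master are illustrative, not from the original notes) showing how Spark Core's RDD API and Spark SQL share a single SparkSession in Spark 2.x:

  import org.apache.spark.sql.SparkSession

  object StackDemo extends App {
    // One SparkSession is the entry point to Core, SQL, Streaming, MLlib and GraphX
    val spark = SparkSession.builder()
      .appName("StackDemo")
      .master("local[*]")   // illustrative: run locally using all cores
      .getOrCreate()

    // Spark Core: the RDD API
    val nums = spark.sparkContext.parallelize(1 to 100)
    println(nums.reduce(_ + _))

    // Spark SQL: DataFrames on the same session
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")
    df.groupBy("key").count().show()

    spark.stop()
  }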

Advantages of tight integration:

Hadoop use cases

Spark use cases

Doug Cutting's view:

They form one ecosystem, with each component doing its own job
Spark relies on HDFS for persistent storage

Setting Up the Runtime Environment

Base environment:

Steps:

See http://dblab.xmu.edu.cn/blog/spark-quick-start-guide/ for details
The main steps are:
1. Install Hadoop (not covered here)
2. Unpack Spark to its target location, then add SPARK_DIST_CLASSPATH to spark-env.sh (e.g. export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath))
3. Verify with run-example SparkPi, which should now run the example successfully
Notes:

Spark directory layout:

Spark shell:

Scala shell: ./bin/spark-shell

Note: in the session below, spark-shell was started while HDFS (hadoop01:9000) was not running, so Hive session initialization fails with a ConnectException and neither spark nor sc gets created:

  [hadoop@hadoop01 bin]$ ./spark-shell
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/usr/local/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  Setting default log level to "WARN".
  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  17/06/30 12:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  17/06/30 12:17:49 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
  17/06/30 12:17:49 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
  java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1053)
  at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
  at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:129)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:126)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:938)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:938)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:938)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:97)
  ... 47 elided
  Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.net.ConnectException: Call From hadoop01/192.168.146.130 to hadoop01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused;
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:193)
  at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
  at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1050)
  ... 61 more
  Caused by: java.lang.RuntimeException: java.net.ConnectException: Call From hadoop01/192.168.146.130 to hadoop01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  ... 70 more
  Caused by: java.net.ConnectException: Call From hadoop01/192.168.146.130 to hadoop01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
  at org.apache.hadoop.ipc.Client.call(Client.java:1479)
  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
  at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
  at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 84 more
  Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
  at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
  at org.apache.hadoop.ipc.Client.call(Client.java:1451)
  ... 104 more
  <console>:14: error: not found: value spark
  import spark.implicits._
  ^
  <console>:14: error: not found: value spark
  import spark.sql
  ^
  Welcome to
  ____ __
  / __/__ ___ _____/ /__
  _\ \/ _ \/ _ `/ __/ '_/
  /___/ .__/\_,_/_/ /_/\_\ version 2.2.0
  /_/
  Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
  Type in expressions to have them evaluated.
  Type :help for more information.
  scala>
  scala> val lines = sc.textFile("/home/hadoop/look.sh")
  <console>:17: error: not found: value sc
  val lines = sc.textFile("/home/hadoop/look.sh")
  ^
After starting HDFS (Hadoop) and relaunching spark-shell, sc is created normally. Note that a bare path is resolved against HDFS by default, so reading a local file requires the file:// prefix:

  scala> val lines = sc.textFile("/home/hadoop/look.sh")
  lines: org.apache.spark.rdd.RDD[String] = /home/hadoop/look.sh MapPartitionsRDD[1] at textFile at <console>:24
  scala> lines.count()
  org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://hadoop01:9000/home/hadoop/look.sh
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
  ... 48 elided
  scala> val lines = sc.textFile("file:///home/hadoop/look.sh")
  lines: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/look.sh MapPartitionsRDD[3] at textFile at <console>:24
  scala> lines.count()
  res1: Long = 26
  scala> lines.first()
  res2: String = #!/bin/bash
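
As a quick illustration of the RDD API in the same shell (a sketch, not part of the original session), the classic word count can be expressed directly against this lines RDD:

  scala> val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  scala> counts.take(5).foreach(println)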

Setting Up the Development Environment

Install the Scala environment

Note:

First Scala program: WordCount

Note:
As with Hadoop, if the development environment is not inside the cluster, for example when developing in IDEA on your own PC (the same applies when the cluster runs in a VM), there are two ways to run a job: locally, or by submitting it to the cluster.
In essence both ways package the code first and then run it (locally or on the cluster), so the workflow is the same. However, the spark-core dependency on the PC plays a different role in each case: when submitting to the cluster, it only provides auxiliary services such as syntax checking and resolving class/method calls; when running locally, it additionally performs the computation itself, effectively turning the PC into a node with its own compute capability.

Full steps:
Install Scala on the PC, install IDEA, and install the Scala plugin in IDEA

1. Running locally

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.rdd.RDD

  object WordCount extends App {
    // Read a local file
    val path = "C:\\Users\\msi\\Desktop\\xiaomi2.txt"
    // Local debugging: run inside the IDE with a local master
    val conf = new SparkConf().setAppName("SparkDemo").setMaster("local")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(path)
    // Split each line into words and drop empty tokens
    val words = lines.flatMap(_.split(" ")).filter(word => word != "")
    val pairs = words.map(word => (word, 1))
    val wordscount: RDD[(String, Int)] = pairs.reduceByKey(_ + _)
    wordscount.collect.foreach(println)
    sc.stop()
  }

Output:
Note the IP address and the file: path below: the job really does run locally, and it is the imported spark-core dependency that does the work

  D:\Java\jdk1.8.0_77\bin\java "-javaagent:D:\JetBrains\IntelliJ IDEA
  ...
  17/11/28 00:40:21 INFO Executor: Starting executor ID driver on host localhost
  17/11/28 00:40:21 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58570.
  17/11/28 00:40:21 INFO NettyBlockTransferService: Server created on 192.168.230.1:58570
  17/11/28 00:40:21 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
  17/11/28 00:40:21 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.230.1, 58570, None)
  17/11/28 00:40:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.230.1:58570 with 1992.9 MB RAM, BlockManagerId(driver, 192.168.230.1, 58570, None)
  ...
  17/11/28 00:40:22 INFO HadoopRDD: Input split: file:/C:/Users/msi/Desktop/xiaomi2.txt:0+903
  17/11/28 00:40:22 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1111 bytes result sent to driver
  17/11/28 00:40:22 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 183 ms on localhost (executor driver) (1/1)
  ...
  (小米客服那些事,1)
  (贤艾森秋t4krP0,1)
  (北京IHG向,1)
  17/11/28 00:40:22 INFO SparkContext: Invoking stop() from shutdown hook
  17/11/28 00:40:22 INFO SparkUI: Stopped Spark web UI at http://192.168.230.1:4040
  ...
  Process finished with exit code 0

2. Submitting to the cluster

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.rdd.RDD

  object WordCount extends App {
    // Read a file from HDFS
    val path = "hdfs://192.168.146.130:9000/spark/look.sh"
    // Remote debugging: point at the cluster's master and ship the packaged jar
    val conf = new SparkConf()
      .setAppName("scalasparktest")
      .setMaster("spark://192.168.146.130:7077")
      .setJars(List("I:\\IDEA_PROJ\\ScalaSparkTest\\out\\scalasparktest_jar\\scalasparktest.jar"))
    val sc = new SparkContext(conf)
    val lines = sc.textFile(path)
    // Same word count as the local version
    val words = lines.flatMap(_.split(" ")).filter(word => word != "")
    val pairs = words.map(word => (word, 1))
    val wordscount: RDD[(String, Int)] = pairs.reduceByKey(_ + _)
    wordscount.collect.foreach(println)
    sc.stop()
  }
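
If the result should stay on the cluster rather than being collected to the driver, it can be written back to HDFS instead. A hedged sketch (the output path is illustrative, and the directory must not already exist):

  // Instead of collect(), write the counts back to HDFS
  wordscount.saveAsTextFile("hdfs://192.168.146.130:9000/spark/wordcount_output")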

[Screenshots omitted: configuring the jar artifact in IDEA]
After clicking OK, choose the output path of the packaged jar

Commands used:
Start the master: ./sbin/start-master.sh
Start a worker: ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.146.130:7077
The following must be set in spark-env.sh (if these are set to localhost, remote connections no longer work):
export SPARK_MASTER_HOST=192.168.146.130
export SPARK_LOCAL_IP=192.168.146.130
Note that after changing the configuration, both the master and the worker must be restarted for it to take effect (in a single-machine setup where both run on the same host)

Errors encountered:
Error: java.io.FileNotFoundException: Jar I:\IDEA_PROJ\ScalaSparkTest\out\scalasparktest.jar not found
Fix: correct the jar path passed to setJars

Error: Could not connect to spark://192.168.146.130:7077
Fix: restart the worker and master, provided the master/worker addresses in spark-env.sh are set correctly

Error: Exception: Call From msi-PC/192.168.230.1 to 192.168.146.130:8020 failed on connection exception: java.net.ConnectException: Connection refused: no further information;
Fix: wrong HDFS port; many tutorials use port 8020, but my HDFS runs on port 9000, so the URL had to be corrected

Error: Invalid signature file digest for Manifest main attributes
Fix: the packaged jar was huge because all dependencies were bundled (90+ MB instead of the usual 10+ MB); remove the unneeded dependencies and mark the spark-core dependency as provided in sbt, as sketched below
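
A minimal build.sbt sketch of that fix (the versions match the Spark 2.2.0 / Scala 2.11.8 banner shown earlier; the project name follows the jar path above and is otherwise illustrative):

  // build.sbt (sketch)
  name := "ScalaSparkTest"
  scalaVersion := "2.11.8"
  // "provided": the cluster supplies Spark at runtime, so spark-core is not bundled into the jar
  libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"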

Error: the following warning appears repeatedly

  17/11/28 20:20:52 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  17/11/28 20:21:07 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  17/11/28 20:21:22 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Fix: this appears to happen after the VM is suspended and resumed (as the warning says, it generally means no registered worker has sufficient resources); restarting both Hadoop and Spark resolves it

Result of running on the cluster (note the IP and port: the job really was submitted to the cluster/VM, and the result came back from there):
The whole process happens inside IDEA, achieving the full workflow of debugging locally, automatically uploading to the cluster, and getting the result back

  D:\Java\jdk1.8.0_77\bin\java "-javaagent:D:\JetBrains\IntelliJ IDEA
  ...
  17/11/28 02:09:39 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20170630223625-0006/0 on worker-20170630215502-192.168.146.130-50762 (192.168.146.130:50762) with 1 cores
  17/11/28 02:09:39 INFO StandaloneSchedulerBackend: Granted executor ID app-20170630223625-0006/0 on hostPort 192.168.146.130:50762 with 1 cores, 1024.0 MB RAM
  17/11/28 02:09:39 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20170630223625-0006/0 is now RUNNING
  ...
  17/11/28 02:09:43 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.146.130:47071 with 413.9 MB RAM, BlockManagerId(0, 192.168.146.130, 47071, None)
  ...
  17/11/28 02:09:50 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
  17/11/28 02:09:50 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.146.130, executor 0, partition 0, ANY, 4853 bytes)
  ...
  17/11/28 02:09:55 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.146.130, executor 0, partition 1, ANY, 4853 bytes)
  ...
  17/11/28 02:09:55 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
  17/11/28 02:09:55 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, 192.168.146.130, executor 0, partition 0, NODE_LOCAL, 4625 bytes)
  ...
  17/11/28 02:09:56 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, 192.168.146.130, executor 0, partition 1, NODE_LOCAL, 4625 bytes)
  ...
  (-ef|grep,1)
  ($Jarstr,1)
  ([[,1)
  (do,1)
  (YES,1)
  (while,1)
  ("$Jarinfo",1)
  (echo,1)
  (#!/bin/bash,1)
  17/11/28 02:09:56 INFO SparkContext: Invoking stop() from shutdown hook
  ...
  Process finished with exit code 0