Spark Streaming Consuming TBDS Kafka: A Code Walkthrough

Kafka Development Guide

Preparation before Kafka development

  • Obtain the authentication credentials (secure id and secure key) from the TBDS portal UI


  • Create the authentication configuration file

    Build the Kafka authentication file:

    echo "security.protocol=SASL_TBDS" >> /opt/client_tbds.properties

    echo "sasl.mechanism=TBDS" >> /opt/client_tbds.properties

    echo "sasl.tbds.secure.id=dfksxxxxxxxxxx1k5" >> /opt/client_tbds.properties

    echo "sasl.tbds.secure.key=eU3xxxxxxvxxxxx" >> /opt/client_tbds.properties

    echo "group.id=testgroup" >> /opt/client_tbds.properties

  • Create a topic:

    cd /data/bigdata/tbds/usr/hdp/2.2.0.0-2041/kafka/bin
    ./kafka-topics.sh --create --zookeeper ..**:2181 --replication-factor 3 --partitions 2 --topic web_click_topic
    ./kafka-topics.sh --describe --topic web_click_topic --zookeeper tbds-10-3-0-13:2181
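
    To push a few test messages into the new topic, the console producer can reuse the same authentication file. A minimal sketch, again with a placeholder broker address:

    ./kafka-console-producer.sh --broker-list <broker-host>:6667 --topic web_click_topic --producer.config /opt/client_tbds.properties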

  • Modify the topic's access permissions (set in the TBDS portal UI)


Key coding points

  • Consuming directly with Java code

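    The original page showed the Java consumer only as a screenshot. In its place, here is a minimal sketch of a plain Java consumer built on the Kafka client API, reusing the TBDS settings from the authentication file above. The class name is hypothetical, the broker address is a placeholder, and the credentials are the masked values obtained from the portal:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TbdsKafkaConsumerDemo {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "172.xx.xx.xx:6667"); // placeholder broker list
        props.put("group.id", "testgroup");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "latest");
        // TBDS authentication, mirroring /opt/client_tbds.properties
        props.put("security.protocol", "SASL_TBDS");
        props.put("sasl.mechanism", "TBDS");
        props.put("sasl.tbds.secure.id", "dfksxxxxxxxxxx1k5");  // masked value from the portal
        props.put("sasl.tbds.secure.key", "eU3xxxxxxvxxxxx");   // masked value from the portal

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        try {
          consumer.subscribe(Collections.singletonList("web_click_topic"));
          while (true) {
            // poll(long) is used for compatibility with the 0.10-era client
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
              System.out.printf("offset=%d, key=%s, value=%s%n",
                  record.offset(), record.key(), record.value());
            }
          }
        } finally {
          consumer.close();
        }
      }
    }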

  • Consuming Kafka messages from Spark

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    
    /**
      * Demo of integrating Spark Streaming with Kafka.
      */
    object SparkAndKafkaDemo {
    
      /**
        * Main entry point.
        * @param args command-line arguments (unused)
        */
      def main(args: Array[String]): Unit = {
        // application name shown in the Spark UI
        val appName = "sparkKafkaDemo"
        val groupId = "test_group"
        val brokers = "172.xxxxxx:6667"
        // initialize the Spark configuration; the master is supplied at submit time
        val sparkConf = new SparkConf().setAppName(appName)
        val ssc = new StreamingContext(sparkConf, Seconds(5))
        // topics to subscribe to
        val topics = Set("test01")
        // Kafka consumer parameters, including the TBDS authentication settings
        val kafkaParams = collection.Map[String, Object](
          "bootstrap.servers" -> brokers,
          "key.deserializer" -> classOf[org.apache.kafka.common.serialization.StringDeserializer],
          "value.deserializer" -> classOf[org.apache.kafka.common.serialization.StringDeserializer],
          "group.id" -> groupId,
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (true: java.lang.Boolean),
          "security.protocol" -> "SASL_TBDS",
          "sasl.mechanism" -> "TBDS",
          "sasl.tbds.secure.id" -> "cxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
          "sasl.tbds.secure.key" -> "27GOxxxxxxxxxxxxxxxxxxxxxxxx"
        )
    
        // consumer strategy; offset handling is left at the defaults
        val consumerStrategy = ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
    
        // create the direct stream via KafkaUtils
        val lines = KafkaUtils.createDirectStream(ssc, LocationStrategies.PreferConsistent, consumerStrategy)
    
        // split each message value into words on spaces, pairing each word with a count of 1
        val words = lines.flatMap(_.value().split(" ")).map(x => (x, 1))
        // sum the counts per word and print each batch's result
        words.reduceByKey(_ + _).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }
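
    To run the demo, package it into a jar and submit it with spark-submit. A minimal sketch, assuming a hypothetical jar name and a YARN deployment; the spark-streaming-kafka-0-10 integration jar and its Kafka client dependencies must be on the classpath:

    spark-submit \
      --class SparkAndKafkaDemo \
      --master yarn \
      --deploy-mode client \
      spark-kafka-demo.jar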
    

Code demos

Kafka Java API example with TBDS authentication:
https://github.com/TBDSUDC/TBDSDemo/tree/master/%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C%E9%9C%80%E8%A6%81%E7%94%A8%E5%88%B0%E7%9A%84%E7%A4%BA%E4%BE%8B%E7%A8%8B%E5%BA%8F/Kafka%20java%E6%8E%A5%E5%8F%A3%E6%A1%88%E4%BE%8B/TBDS%E8%AE%A4%E8%AF%81

https://github.com/TBDSUDC/TBDSDemo/tree/master/%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C%E9%9C%80%E8%A6%81%E7%94%A8%E5%88%B0%E7%9A%84%E7%A4%BA%E4%BE%8B%E7%A8%8B%E5%BA%8F/spark-streaming%E6%B6%88%E8%B4%B9kafka%E7%A4%BA%E4%BE%8B