I am experiencing data loss (skipping chunks of time in data) when I am pulling data off a Kafka topic as a source and putting it into an HDFS file (DataStream) as a sink. The pattern seems to be in 10, 20 or 30 minute blocks of data skipping. I have verified that the skipped data is in the topic's .log file that is being generated by Kafka. (The original data is coming from a syslog, going through a different Flume agent and being put into the Kafka topic - the data loss isn't happening there.) I find it interesting and unusual that the blocks of skipped data are always 10, 20 or 30 mins and happen at least once an hour in my data.

The relevant parts of the agent configuration:

    a1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
    a1.sources.kafka-source.zookeeperConnect = 10.xx.x.xx:xxxx
    a1.sources.kafka-source.channels = memory-channel
    a1.channels.memory-channel.capacity = 100000
    a1.channels.memory-channel.transactionCapacity = 1000
    a1.sinks.hdfs-sink.hdfs.fileType = DataStream
    a1.sinks.hdfs-sink.hdfs.path = /topics/%/%m-%d-%Y
    a1.sources.kafka-source.topic = firewall1

I also cannot find any indication in the Kafka logs or Flume logs of any data loss.

Thanks, but actually, while syslog is our original source, it's not the source for the HDFS sink.

The data loss isn't occurring, but as I specified in my other question, I'm still getting two tmp files. I'm trying to discover if I could have more than one Flume agent running, but I don't see more than one Flume agent running. I thought maybe, since we have two masters in our cluster, I would stop one of them and let the other run, to see if somehow a Flume agent on master1 was creating one tmp file and a Flume agent on master2 was creating the other - and I'm getting unusual results. For instance, it seemed to indicate this was the issue when I stopped m1 and then only one tmp file appeared. When I stopped m2, the other resolved itself. Oddly enough, though, when I started m1 again, two tmp files appeared. Then I started m2 again and a new tmp file appeared! And when I stopped it, only one resolved itself! I'm completely baffled. I don't see how m2 could be writing into HDFS, as in the configuration file we never mention the IP address of m2, only of m1. I am starting to think there is a concept of the clustering that is causing this that I don't understand.
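The two tmp files are consistent with how the Flume HDFS sink writes: each running sink instance keeps its current output file open with an in-use suffix (".tmp" by default) and only renames the file once a roll condition closes it, so two agents writing under the same path will each hold one open .tmp file. Below is a minimal sketch of the full agent definition under that reading - the property keys and the component names kafka-source, memory-channel and hdfs-sink are inferred placeholders, the roll values are illustrative, and the %{topic} escape is a guess at the truncated escape in the path above:

    # Hedged sketch, not the actual file: component names and roll
    # values are assumptions layered on the surviving settings.
    a1.sources = kafka-source
    a1.channels = memory-channel
    a1.sinks = hdfs-sink

    # Kafka source (Flume 1.6-era, ZooKeeper-based consumer)
    a1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
    a1.sources.kafka-source.zookeeperConnect = 10.xx.x.xx:xxxx
    a1.sources.kafka-source.topic = firewall1
    a1.sources.kafka-source.channels = memory-channel

    # In-memory buffer between source and sink
    a1.channels.memory-channel.type = memory
    a1.channels.memory-channel.capacity = 100000
    a1.channels.memory-channel.transactionCapacity = 1000

    # HDFS sink: the in-progress file carries hdfs.inUseSuffix until a
    # roll condition (interval, size or count) closes and renames it
    a1.sinks.hdfs-sink.type = hdfs
    a1.sinks.hdfs-sink.channel = memory-channel
    a1.sinks.hdfs-sink.hdfs.fileType = DataStream
    a1.sinks.hdfs-sink.hdfs.path = /topics/%{topic}/%m-%d-%Y
    a1.sinks.hdfs-sink.hdfs.inUseSuffix = .tmp
    a1.sinks.hdfs-sink.hdfs.rollInterval = 300
    a1.sinks.hdfs-sink.hdfs.rollSize = 0
    a1.sinks.hdfs-sink.hdfs.rollCount = 0

Since an open .tmp file belongs to whichever sink process created it, counting the open .tmp files under the sink path effectively counts the HDFS sink instances writing there - which would explain why stopping one master consistently resolves exactly one of the two files.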
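On the upstream leg mentioned in the parenthetical above - syslog feeding the Kafka topic through a separate Flume agent - here is a hedged sketch of what that first agent commonly looks like in the same Flume 1.6-era style; every name, host and port in it is an illustrative assumption rather than the original setup:

    # Hypothetical upstream agent: syslog in, Kafka out.
    a2.sources = syslog-source
    a2.channels = mem-channel
    a2.sinks = kafka-sink

    # TCP syslog listener; host and port are placeholders
    a2.sources.syslog-source.type = syslogtcp
    a2.sources.syslog-source.host = 0.0.0.0
    a2.sources.syslog-source.port = 5140
    a2.sources.syslog-source.channels = mem-channel

    a2.channels.mem-channel.type = memory
    a2.channels.mem-channel.capacity = 10000

    # Kafka sink publishing into the topic the downstream agent consumes
    a2.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
    a2.sinks.kafka-sink.brokerList = 10.xx.x.xx:9092
    a2.sinks.kafka-sink.topic = firewall1
    a2.sinks.kafka-sink.channel = mem-channel

If the time gaps were introduced on this leg, they would be missing from the topic's log segments as well, so the verification above (the skipped data is present in the topic) is what rules this agent out.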