Stream processing - Beam - /s (strangemonad's notes)

# Beam

## Questions

- How to use the Flink runner on KDA?
- Can you use beam-sql on KDA?
- Flink vs Apex?
  - https://stackoverflow.com/questions/45861918/apache-apex-vs-apache-flink
  - Apex is fundamentally focused on YARN (i.e. a Hadoop-based streaming engine)
- Consuming Beam app outputs in Akka gracefully?
  - Akka Streams?
- Kafka Streams
  - KStream: immutable (possibly unbounded?) log
  - KTable: mutable materialized view
- Where does RxJava fit in?

## Beam Overview

- A unified API (i.e. unifies batch and streaming)
- Specifies a programming model (via an API) exposed in multiple languages
- Doesn't specify an execution engine (operates with many execution engines via specialized runners, e.g. the Flink runner or the Spark runner)
- Like Flink, inspired by the MillWheel and Dataflow papers
- How Beam runs on top of Flink http://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html
  - Beam (core) + a Beam runner impl translates the Beam program into one compatible with the execution engine
- Beam: Apache Flink runner https://beam.apache.org/documentation/runners/flink/
- Reasons for Beam on Flink
  - Beam unifies batch and streaming
  - Beam smoothly supports multiple programming languages + native ecosystem libs (e.g. numpy, pandas, tensorflow)
  - Leverage the best of the Flink engine (e.g. Flink's exactly-once semantics and management tools)

## Beam Runner compatibility

- https://beam.apache.org/documentation/runners/capability-matrix/

# Apps with queryable state

- Managing Streaming And Queryable State In Spark, Akka Streams, Kafka Streams, And Flink https://www.lightbend.com/blog/managing-streaming-and-queryable-state-in-spark-akka-streams-kafka-streams-flink

## Thoughts

- Make the pipeline smart
  - Similar to pushing intelligence to the edge
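
## Sketch: stream vs table

A rough illustration of the KStream / KTable distinction from the questions above: the stream is an append-only log that keeps every record, while the table is a view that keeps only the latest value per key. This is plain Python, not the Kafka Streams API. The `Record` shape, the `materialize` helper, and the sample data are all hypothetical names made up for this sketch.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for this sketch: (key, value).
Record = Tuple[str, int]


def materialize(log: Iterable[Record]) -> Dict[str, int]:
    """Fold an append-only log into a latest-value-per-key view."""
    table: Dict[str, int] = {}
    for key, value in log:
        # A later record for the same key overwrites the earlier one,
        # which is what makes the view "mutable".
        table[key] = value
    return table


# Stream-like: every record is kept, in arrival order.
log: List[Record] = [("alice", 1), ("bob", 5), ("alice", 3)]

# Table-like: one entry per key, holding the most recent value.
view = materialize(log)

print(len(log))  # number of records in the log
print(view)      # latest value per key
```

Replaying the same log always rebuilds the same view, so the log can serve as the source of truth with the table derived from it. Whether that maps cleanly onto Beam's model is still an open question for these notes.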