Schedule a Real-Time Analytic Task & a Source that emits events

We will now configure a Source to emit data into the Kafka brokers. A real-time analytic task using Spark Streaming will then consume the data and write the results to the spatiotemporal-store. The spatiotemporal-store uses Elasticsearch to efficiently index observations by space, time, and all of the other attributes of the event. The JavaScript map app periodically queries Elasticsearch to reflect the latest state of observations on a map.
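Conceptually, each real-time analytic task follows the standard Spark Streaming consume-and-index pattern sketched below. This is a minimal sketch, not the actual rat01 code: the broker address, Elasticsearch endpoint, batch interval, and index/type names are assumptions (only the 'taxi' topic name comes from Step 8).

```scala
// Minimal sketch of the consume-and-index pattern (NOT the actual rat01 code).
// Assumes spark-streaming-kafka and elasticsearch-spark on the classpath.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import org.elasticsearch.spark._

object RatSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("rat01-sketch")
      .set("es.nodes", "<elasticsearch-host>:9200") // assumed spatiotemporal-store endpoint
    val ssc = new StreamingContext(conf, Seconds(3)) // batch interval is an assumption

    // Broker list as reported by `dcos kafka connection` (Step 9).
    val kafkaParams = Map("metadata.broker.list" -> "<broker-host>:<port>")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("taxi"))

    // Index each event so Elasticsearch can serve the map app's queries;
    // the "taxi/observations" index/type is illustrative.
    stream.map { case (_, value) => Map("event" -> value) }
      .foreachRDD(rdd => rdd.saveToEs("taxi/observations"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```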

Step 1: We will now review the real-time analytic task Marathon configuration found at spatiotemporal-esri-analytics/rat01.json. Breaking the Marathon app configuration file down (a sketch of such a file follows the list):

  • deploys 3 instances of 'rat01', each running as a mesosphere/spark-1.6.1.6 Docker container
  • each container is allocated 4 CPU shares & 2 GB of memory
  • each container starts up with the spark-submit command and a number of application-specific parameters
  • the --class is bootstrapped in via a URI that is downloaded before each container starts
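For reference, a Marathon app definition of this shape looks roughly like the following; the values are reconstructed from the bullets above, and the actual rat01.json will differ (notably in its cmd and URIs):

```json
{
  "id": "/rat01",
  "instances": 3,
  "cpus": 4,
  "mem": 2048,
  "cmd": "./spark/bin/spark-submit --class <analytic-task-class> <application-specific parameters>",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mesosphere/spark-1.6.1.6" }
  },
  "uris": ["<URI of the JAR that provides the --class>"]
}
```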



Step 2: To schedule spatiotemporal-esri-analytics/rat01.json onto the DC/OS cluster, issue the following DC/OS CLI command:
  • dcos marathon app add spatiotemporal-esri-analytics/rat01.json



Step 3: Open the Marathon dashboard to view the deployment progress of rat01:



Step 4: Click on the rat01 application to see more details, including which hosts and ports it was scheduled to:



Step 5: Open the Mesos dashboard to view the active tasks of rat01:



Step 6: For each rat01 instance, click on its 'Sandbox' and open the stdout file to monitor the verbose printouts of rat01:



Step 7: The three stdout files of the associated rat01 instances show that they are saving 0 records to Elasticsearch. This is because we have not yet enabled a Source that emits events.



Step 8: To partition events sent to Kafka evenly, we will create a topic whose partition count matches the number of brokers we have deployed. The Source (producer) partitions events in a consistent manner (see how SpatiotemporalFileEventSource.scala uses SimplePartitioner.scala), so the same taxi ids always go to the same partition while the load is evenly distributed across the configured partitions of the topic. A sketch of the partitioning idea appears after the command.
  • dcos kafka topic create taxi --partitions=3 --replication=1
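The idea behind SimplePartitioner.scala is plain key-based partitioning: hash the taxi id and reduce it modulo the partition count, so a given taxi always lands on the same partition while distinct taxi ids spread evenly. A minimal sketch using the Kafka 0.8 Scala producer API (the actual SimplePartitioner.scala may differ):

```scala
// Minimal sketch of consistent key-based partitioning; the real
// SimplePartitioner.scala may differ. Uses the Kafka 0.8 Scala producer API.
import kafka.producer.Partitioner
import kafka.utils.VerifiableProperties

class SimplePartitionerSketch(props: VerifiableProperties) extends Partitioner {
  // Same key (taxi id) -> same partition; the +numPartitions step keeps
  // the result non-negative even when hashCode is negative.
  override def partition(key: Any, numPartitions: Int): Int = {
    val h = key.toString.hashCode
    ((h % numPartitions) + numPartitions) % numPartitions
  }
}
```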



Step 9: The Source has runtime parameters that specify the deployment hosts & ports of the Kafka brokers. To learn this information, use the DC/OS CLI and issue the following command:
  • dcos kafka connection
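The command reports the broker endpoints as JSON. The exact addresses and ports depend on where the brokers were scheduled, so the output below only illustrates its shape:

```json
{
  "address": ["10.0.3.1:9557", "10.0.3.2:9299", "10.0.3.3:9173"],
  "dns": ["broker-0.kafka.mesos:9557", "broker-1.kafka.mesos:9299", "broker-2.kafka.mesos:9173"]
}
```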



Step 10: We will now review the source task Marathon configuration found at spatiotemporal-event-source/source01.json. Breaking the Marathon app configuration file down (a sketch of such a file follows the list):
  • deploys 1 instance of 'source01', running as an amollenkopf/spatiotemporal-event-source Docker container
  • the container is allocated 1 CPU share & 5 GB of memory (needed for the large simulation file)
  • the container starts up with the java command and a number of application-specific parameters (including the Kafka broker hosts & ports)
  • the --class comes as part of the amollenkopf/spatiotemporal-event-source Docker image
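The shape mirrors the rat01 definition; again the values are reconstructed from the bullets above, and the actual source01.json will differ in its cmd:

```json
{
  "id": "/source01",
  "instances": 1,
  "cpus": 1,
  "mem": 5120,
  "cmd": "java <application-specific parameters, including the Kafka broker hosts & ports>",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "amollenkopf/spatiotemporal-event-source" }
  }
}
```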



Step 11: To schedule a Source that emits events into a Kafka topic's partitions on the DC/OS cluster, issue the following DC/OS CLI command:
  • dcos marathon app add spatiotemporal-event-source/source01.json



Step 12: Open the Marathon dashboard to view the deployment progress of source01 (it will take 1-2 minutes to deploy, as the Docker image is large due to the size of the simulation file):



Step 13: Click on the source01 application to see more details, including which host and port it was scheduled to:



Step 14: Open the Mesos dashboard to view the active task of source01:



Step 15: Click on the 'Sandbox' of the source01 instance and open the stdout file to monitor the verbose printouts of source01:



Step 16: The stdout file of the associated source01 instance shows that it is emitting events to the Kafka topic partitions every 3 seconds:



Step 17: The three stdout files of the associated rat01 instances now show that they are consuming these events evenly, as each is subscribed to a unique Kafka topic partition:



Step 18: Go back to the browser tab that has the map app and hit the refresh button. You should now see taxi content appearing on the map as geohash aggregations that are auto-updated as new data appears in Elasticsearch:
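Geohash aggregation is standard Elasticsearch functionality: the map app can issue a geohash_grid aggregation over the indexed observations and render one symbol per grid cell. A hedged sketch of such a query follows; the field name and precision are assumptions, not the map app's actual request:

```json
{
  "size": 0,
  "aggs": {
    "taxi-cells": {
      "geohash_grid": { "field": "geometry", "precision": 5 }
    }
  }
}
```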



Step 19: The map app has the ability to enable 'Replay' of the spatiotemporal observations. To enable this, flip the dial to on and use the time slider in the bottom-left corner to specify the time window you want to replay:



Step 20: Stepping forward through the replay, we can see the counts (labels on the geohash aggregations) increasing:



Step 21: The map app also supports generating a client-side heatmap from the content queried from Elasticsearch:



Step 22: Using the time slider, we can see how the density changes over time:



Step 23: Disabling both the Heatmap and Replay capabilities returns us to a near real-time view of the observations:



Step 24: Reviewing the stdout files of the associated real-time analytic tasks we can see that they are continuing to process events in a distributed fashion: