Recently, we migrated one of our web apps to Webpack 4, which decreased build time and reduced chunk sizes through the Split Chunks plugin. The plugin uses heuristics to automatically identify which modules should be split into separate chunks. This blog post covers our efforts to understand the somewhat mysterious Split Chunks plugin.

The Problem

The problem we faced with the default Split Chunks config was that a large module (550 KB) was duplicated across 4 async chunks. Our goal was to decrease the bundle size and use a better code splitting mechanism in the app.

Our Webpack configuration file looks like this:

// Filename: webpack.config.js

const webpack = require('webpack');
module.exports = {
   //...
   optimization: {
      splitChunks: {
         chunks: 'all'
      }
   }
};

We used webpack-bundle-analyzer to get a nice view of our problem.
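
If you haven't set it up before, wiring the analyzer into the build is a one-liner. The sketch below assumes the webpack-bundle-analyzer package is installed as a dev dependency:

// Filename: webpack.config.js (analyzer wiring only, for illustration)

const { BundleAnalyzerPlugin } = require('webpack-bundle-analyzer');
module.exports = {
   //...
   plugins: [
      // Opens an interactive treemap of all emitted chunks after the build
      new BundleAnalyzerPlugin()
   ]
};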

Observation

By default, the Split Chunks plugin only affects on-demand (async) chunks, and it splits chunks based on the following conditions (summarized in the config sketch after this list):

  1. A new chunk can be created if it would be shared between chunks or its modules come from the node_modules folder.
  2. The new chunk should be bigger than 30 KB.
  3. The maximum number of parallel requests when loading chunks on demand should be lower than or equal to 5.
  4. The maximum number of parallel requests at initial page load should be lower than or equal to 3.
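
For reference, these conditions map roughly to the following defaults (a sketch based on the webpack 4 documentation, not our project's config):

// Approximate webpack 4 defaults behind the conditions above

module.exports = {
   //...
   optimization: {
      splitChunks: {
         chunks: 'async',        // only on-demand chunks are affected by default
         minSize: 30000,         // condition 2: a new chunk must be bigger than ~30 KB
         minChunks: 1,           // condition 1: shared by at least this many chunks
         maxAsyncRequests: 5,    // condition 3
         maxInitialRequests: 3   // condition 4
      }
   }
};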

In our case, a separate chunk containing the large library was not being created.

What’s the reasoning behind this?

The library satisfies the first and second conditions: it is used in 4 chunks and its size (550 KB) is bigger than 30 KB, so it should qualify for a new chunk. But it does not satisfy the third condition, because 5 chunks were already being created at each dynamic import, which is the maximum limit for async requests. We observed that the first 4 of those chunks contained modules shared among 7, 6, 5, and 5 async chunks respectively, and the fifth was the dynamically imported chunk itself. Modules shared by the largest number of async chunks are given priority, and since our library is required by only 4 async chunks, a chunk containing it was never created.

When we run yarn build to build our assets, a chunk named vendors~async.chunk.1~async.chunk.2~async.chunk.3~async.chunk.4 is not found in the output:

Solutions

We can take more control over this behaviour by changing the default configuration in either or a combination of the following ways:

  1. Increasing maxAsyncRequests results in more chunks. A large number of requests normally degrades performance, but this is not a concern with HTTP/2 thanks to request and response multiplexing. So, this configuration should be preferred only when HTTP/2 is in use.

    Now let’s take a look at the Webpack configuration file after this change:

        // Filename: webpack.config.js

        const webpack = require('webpack');
        module.exports = {
            //...
           optimization: {
              splitChunks: {
                 chunks: 'all',
                 maxAsyncRequests: 20
              }
           }
        };
    
  1. Increasing minSize also gives the desired result. Modules that are used heavily in our app but are smaller than minSize are no longer pulled into separate chunks, since they now violate the second condition. For example, with a minSize of 100 KB only modules larger than 100 KB are considered, which leaves more of the async request budget for creating chunks that contain the large modules.

    Now let’s take a look at the Webpack configuration file after this change:

        // Filename: webpack.config.js

        const webpack = require('webpack');
        module.exports = {
            //...
           optimization: {
              splitChunks: {
                 chunks: 'all',
                 minSize: 100000
              }
           }
        };
     

Experiment

Steps:

  1. We picked two async chunks that share the large third-party library (550 KB). Let’s call these chunks async.chunk.1 and async.chunk.2, and assume that each chunk’s name matches the name of its corresponding route.
  2. Loaded async.chunk.1 route first and calculated the total content size loaded.
  3. Then navigated from async.chunk.1 route to async.chunk.2 route and calculated the content size again.

Results with the first approach (varying the maxAsyncRequests property):

|   MaxAsyncRequests   |           async.chunk.1          |        async.chunk.2       |
|----------------------|----------------------------------|----------------------------|
|          5           |            1521.6 KB             |          758 KB            |
|          10          |            1523.76 KB            |          79.1 KB           |
|          15          |            1524 KB               |          79.1 KB           |
|          20          |            1524.3 KB             |          79.1 KB           |

After this change our bundles look like this:

With this configuration, a separate chunk named vendors~async.chunk.1~async.chunk.2~async.chunk.3~async.chunk.4 is created which is shown below:

Results with the second approach (varying the minSize property):

|       MinSize       |          async.chunk.1           |        async.chunk.2       |
|---------------------|----------------------------------|----------------------------|
|        30 KB        |            1521.6 KB             |          758 KB            |
|        50 KB        |            1521.6 KB             |          188 KB            |
|        100 KB       |            1521.4 KB             |          78.4 KB           |

After this change our bundles look like this:

In this case too, the large library is extracted into a separate chunk named vendors~async.chunk.1~async.chunk.2~async.chunk.3~async.chunk.4, as shown below:

Note: with the 50 KB minSize configuration the async.chunk.2 chunk is 188 KB, whereas with the 100 KB minSize configuration it shrinks to 78.4 KB. This is because one more module of 146 KB, shared among four other chunks, gets extracted into a separate chunk, reducing the chunk size to 78.4 KB (Awesome!).

Conclusion

Increasing either minSize or maxAsyncRequests decreases the size of the async.chunk.2 chunk.

The second approach can result in multiple large chunks, each containing duplicated small modules. The first approach, on the other hand, results in a large number of small chunks with no duplicated modules. Loading many small chunks increases page load time over HTTP/1.x, but with HTTP/2 it works efficiently.

Finally, we achieved what we wanted: the big library is now separated from our bundles and lazy loaded on demand. Thanks to Dinkar Pundir for helping me solve the above problem. If you have any doubts, feel free to drop a comment or tweet us at @wingify_engg.

Happy Chunking… !!


Heatmaps record visitor clicks on the live state of your website, which can be used to interpret user behavior on elements like modal boxes, pages behind logins, and dynamic URLs.

VWO Heatmap in action on vwo.com

But here comes a question: how do we verify heatmaps end-to-end using automation? How do we check that clicks are being plotted correctly? How do we check that there is no data loss while plotting the clicks?

The answer to the above questions is the HTML canvas. Since VWO heatmaps are rendered on an HTML canvas, we decided to leverage it to verify heatmaps end-to-end as well. The best part of using the canvas is that it integrates easily with your existing Selenium scripts.

How can Canvas be used for Heatmap Automation?

There are two phases to verifying whether heatmaps are working.

  1. The first phase is to plot clicks on the test page and store the click coordinates. This can be done easily using Selenium.
     //get elements location from the top of DOM
     element.getLocation().then(function (location) {
         //get elements height and width
         element.getSize().then(function (size) {
             //store element’s center coordinates w.r.t. top left corner of DOM in array    
             clickDataArray.push(new Coordinates(Math.floor(location.x + size.width / 2), Math.floor(location.y + size.height / 2)));
         });
     });   
    

    In this snippet, we simply find the center coordinates of the element we clicked and store them in an array. These stored coordinates are later used, together with the canvas functions below, to check whether the clicks were plotted.

  2. The second phase is to leverage canvas functions and the stored coordinate data to verify that the heatmap is plotted correctly. We first check whether the heatmap canvas is empty; if it is, we don’t check any further.
     exports.isCanvasEmpty = function () {
         browser.wait(EC.presenceOf(element(by.tagName('canvas'))), 5000);
         return browser.executeScript(function () {
             var canvas = document.getElementsByTagName('canvas')[0];
             var imgWidth = canvas.width || canvas.naturalWidth;
             var imgHeight = canvas.height || canvas.naturalHeight;
             // true if all pixels Alpha equals to zero
             var ctx = canvas.getContext('2d');
             var imageData = ctx.getImageData(0, 0, imgWidth, imgHeight);
             //alpha channel is the 4th value in the imageData.data array that’s why we are incrementing it by 4
             for (var i = 0; i < imageData.data.length; i += 4) {
                 if (imageData.data[i + 3] !== 0) {
                     return false;
                 }
             }
             return true;
         });
     };
    

In this function, we get the 2D context of the canvas and then iterate over the image data to check whether the alpha channel of any pixel is greater than zero. The alpha channel is an 8-bit layer in a graphics format that expresses translucency (transparency), which means that if a pixel's alpha value is zero, nothing has been plotted over that pixel.

If the alpha value of any pixel is greater than zero, the canvas is not empty, which means clicks have been plotted onto the heatmap.

Once we are sure that the canvas is not empty, we can check that the clicks are plotted on the canvas at the correct positions, i.e. exactly where we clicked using Selenium.

exports.checkCanvasPlotting = function (coordinates) {
    'use strict';
    browser.wait(EC.presenceOf(element(by.tagName('canvas'))), 5000);
    return browser.executeScript(
        function () {
            var coord = arguments[0];
            var canvas = document.getElementsByTagName('canvas')[0];
            // check the alpha of the single pixel at the click coordinates
            var ctx = canvas.getContext('2d');
            if (ctx.getImageData(coord.x, coord.y, 1, 1).data[3] === 0) {
                return false;
            }
            return true;
    }, coordinates);
};

In this function, we use the same canvas API to get the imageData and check that, at the coordinate where a click was made, the alpha value is greater than zero.

The above function can be easily called as below:

exports.validateHeatmapPlotting = function (coordinateArray) {
    'use strict';
    for (var i = 0; i < coordinateArray.length; i++) {
        expect(canvasUtils.checkCanvasPlotting(coordinateArray[i])).toBe(true);
    }
};
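
Putting both phases together, a spec might look roughly like the sketch below. The .trackable locator, the /heatmap-report URL, and the canvasUtils module name are ours, purely for illustration; Coordinates is just a tiny value object holding an x and a y:

// Hypothetical end-to-end flow: click elements, open the heatmap report, verify the plotting
function Coordinates(x, y) {
    this.x = x;
    this.y = y;
}

var clickDataArray = [];

it('plots every click on the heatmap canvas', function () {
    // Phase 1: click a few elements on the test page and record their center coordinates
    element.all(by.css('.trackable')).each(function (el) {
        return el.getLocation().then(function (location) {
            return el.getSize().then(function (size) {
                clickDataArray.push(new Coordinates(
                    Math.floor(location.x + size.width / 2),
                    Math.floor(location.y + size.height / 2)));
                return el.click();
            });
        });
    }).then(function () {
        // Phase 2: open the heatmap report and verify the canvas
        browser.get('/heatmap-report'); // hypothetical report URL
        expect(canvasUtils.isCanvasEmpty()).toBe(false);
        canvasUtils.validateHeatmapPlotting(clickDataArray);
    });
});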

Conclusion

  • Canvas utility functions and Selenium can easily be leveraged to verify basic heatmap functionality through automation.
  • These can easily be extended to verify the number of clicks on an element and the plotting intensity; a rough sketch of an intensity check follows this list.
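
As a rough illustration of the intensity idea (not VWO's actual check), the same getImageData call can compare the alpha value at a point against a threshold:

// Hypothetical sketch: assumes a more frequently clicked point is painted with a higher alpha
exports.isPlottingIntense = function (coordinates, minAlpha) {
    'use strict';
    browser.wait(EC.presenceOf(element(by.tagName('canvas'))), 5000);
    return browser.executeScript(function () {
        var coord = arguments[0];
        var threshold = arguments[1];
        var canvas = document.getElementsByTagName('canvas')[0];
        var ctx = canvas.getContext('2d');
        // alpha is the 4th channel of the single pixel we read
        return ctx.getImageData(coord.x, coord.y, 1, 1).data[3] >= threshold;
    }, coordinates, minAlpha);
};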

We hope this post serves as a good reference for writing end-to-end automation scripts for heatmap testing. If you have any questions, let us know via the comments.


This article is inspired by Animating Vue JS by Sarah Drasner at JS Channel 2017.

Problem Statement - Why Animation?

Website UI development is not just about making things beautiful; it's also about website performance and customer experience. Studies from Amazon and Walmart found that conversion rate and revenue drop as user interaction time increases, because users feel interrupted during the interaction. Another study found that a customised animated loader led users to tolerate a longer wait and abandon less often than a generic one, because they felt more engaged with it.

In a nutshell, the animations in your application should make it more interactive and engaging for the user: think of a cinema booking application, or a form inside a location tag, for example.

What is VueJS?

For those familiar with Angular and ReactJS, VueJS is a progressive JavaScript framework that supports features such as:

  • A virtual DOM
  • Declarative Rendering
  • Computed properties
  • Reactive components
  • Conditional rendering … to name a few

Some of these features are quite similar to what Angular and ReactJS already provide. You can check VueJS's comparison with other frameworks for the details.

Todo List Example

Let’s take a simple example of a todo list: a list of tasks with the ability to add a task to the list and remove one from it.

This will be our view in the HTML file, assuming you’ve already included VueJS in a script tag.

<div id="app">
    <input type="text" v-model="task"/>
    <input type="button" value="Add" v-on:click="addTaskToList"/>
    <ul>
        <li v-for="(todo, index) in todoList">
            {{ todo }}
            <input type="button" value="Remove" v-on:click="removeTaskFromList(index)"/>  
        </li>
    </ul>
</div>

Meanwhile, our JS file looks like this.

var app = new Vue({
    el: '#app',
    data: {
        task: 'my first task',
        todoList : []
    },
    methods : {
        addTaskToList : function(){
            this.todoList.push(this.task);
        },
        removeTaskFromList : function(index){
            this.todoList.splice(index, 1);
        }
    }
});

The code is self-explanatory: it adds a task to the todoList using the addTaskToList method and removes one from the list using removeTaskFromList.

The event binding and loop syntax in the HTML looks similar to what you see in AngularJS. However, the way variables and methods are declared in VueJS is different, and is reminiscent of the private variables and public methods you may have written in C++. You can view the demo.

Let’s add more interaction: when removing a task, a confirmation pop-up should appear with OK and Cancel options, and whichever option is chosen, the pop-up should close afterwards.

In HTML, let’s modify the list element

<li v-for="(todo, index) in todoList">
    {{ todo }}
    <input type="button" value="Remove" v-on:click="onRemoveTask(index)"/>
</li>

And add a new pop-up element

<div v-show="isPopupOpen">
    Are you sure you want to remove this from Todo List?<br/>
    <input type="button" value="OK" v-on:click="confirmRemove()"/>
    <input type="button" value="Cancel" v-on:click="cancelRemove()"/>
</div>

Meanwhile, in the JS, initialize new variables inside the data object

data: {
    isPopupOpen : false,
    currentIndex: -1
}

And also, add some methods

methods : {
    onRemoveTask : function(index) {
        this.isPopupOpen = true;
        this.currentIndex = index;
    },
    confirmRemove : function() {
        this.removeTaskFromList(this.currentIndex);
        this.isPopupOpen = false;
    },
    cancelRemove : function() {
        this.isPopupOpen = false;
    }
}

Let’s add some animation into it.

To fade the pop-up in and out, you need to wrap it inside a transition tag.

<transition name="fade">
    <div v-show="isPopupOpen">
        … Pop-up element content
    </div>
</transition>

This element takes care of the transition logic; you don’t need to worry about when to start or stop the transition. All you have to specify is what kind of transition you want and how long it should last. This is done using the CSS classes provided by VueJS.

.fade-enter-active, .fade-leave-active {
    transition: opacity 0.5s ease-out;
}

.fade-enter, .fade-leave-to {
    opacity: 0;
}

Note: the fade prefix used in these classes should match the name attribute of the transition component.

To blur the form and the list elements once the pop-up appears, they should be wrapped inside a container whose class is bound conditionally using the v-bind attribute.

<div v-bind:class="[isPopupOpen ? 'disabled' : '', 'container']">
    … Form and Todo List element content
</div>

And add the required CSS

.container {
    transition: all 0.05s ease-out;
}

.disabled {
    filter: blur(2px);
    opacity: 0.4;
    pointer-events: none;  /* This makes sure that nothing other than the pop-up options can be clicked */
}

You can check the complete code and view the demo.

Advantages

  • Clean
  • Semantic
  • Maintainable

This is how you can create applications and build animations in a simpler, more semantic way. You will need an intermediate knowledge of HTML, CSS, and JavaScript. If you think VueJS looks promising, go ahead and try it out; there is much more you will love to learn about. Check out the official documentation.


SASS is a CSS preprocessor that provides features like variables, nesting, mixins, inheritance, and other nifty goodies, making CSS clean and easy to maintain.

The @extend directive in SASS lets us easily share styles between selectors, but it can have adverse effects in bigger projects. Let's see how.

VWO’s SASS code has more than 50 files. The need to remove inheritance arose when the code started becoming unpredictable and difficult to debug. Because debugging was hard, we simply overrode the CSS whenever a new requirement came in; doing it properly would have required a lot of time spent understanding the existing inheritance before starting, just to make sure a new rule didn't break the existing CSS. That's how the need to remove @extend came about.

Here are the reasons why we discarded @extend.

High maintenance cost

.title {
	text-transform: uppercase;
	font-size: 11px;
}

label {
	@extend .title;
	font-size: 13px;
}

…and somewhere near the end of the file, adding:

.title {
	font-size: 12px;
}

If you open this file and look up the label rules, you would expect the font size to be 13px, but in reality it will be 12px: <label>I will always be 12px</label>

This is because on compilation the result looks like this:

.title , label {
	text-transform: uppercase; 
}

label {
	font-size: 13px; 
}

.title , label {
	font-size: 12px; 
}

label also shares the rules of the last definition of .title, so the 12px declared there wins.

If someone overrides .title without knowing that it has been extended by another selector, they may unintentionally change that selector's rules as well.

Difficult debugging

Debugging becomes difficult when the project's CSS is large, because you need to keep track of every extended class. In the label and .title example above, looking at the CSS in the browser makes it hard to figure out why label's font-size is 12px. Debugging such code takes a lot of time, especially when you have multiple SASS files.

Increased file size

After we removed @extend from all our SASS files, the compiled CSS size dropped from 164 KB to 154 KB.

Distributed Code

The code for one class should live in one place rather than being distributed across many places. Classes or placeholders extended in the name of maintainability actually make the code untidy and hard to understand when you have multiple CSS files or long CSS. Here's an example:

.font--13 {
	font-size: 13px;
}

.tile {
	display: inline-block;
	border: 1px solid;
	@extend .font--13;
}

%size--200 {
	width: 200px;
	height: 200px;
}

.tile--200 {
	@extend .tile;
	@extend %size--200;
	font-size: 14px;
}

.circle--200 {
	@extend %size--200;
}

Generated Code:

.font--13, .tile, .tile--200 {
	font-size: 13px;
}

.tile, .tile--200 {
	display: inline-block;
	border: 1px solid;
}

.tile--200, .circle--200 {
	width: 200px;
	height: 200px;
}

.tile--200 {
	font-size: 14px;
}

The generated code is hard to read and not at all lucid. The rules for the single class .tile--200 are staggered across 4 different places.

Solution to @extend

We solved these problems by using mixins, or by writing the rule directly if it is a one-liner.

For example, the SASS for the code above would be:

.font--13 {
	font-size: 13px;
}

@mixin tile {
	display: inline-block;
	border: 1px solid;
	font-size: 13px;
}

.tile {
	@include tile;
}

@mixin size--200 {
	width: 200px;
	height: 200px;
}

.tile--200 {
	@include tile;
	@include size--200;
	font-size: 14px;
}

.circle--200 {
	@include size--200;
}

Generated CSS code will be:

.font--13 {
	font-size: 13px;
}

.tile {
	display: inline-block;
	border: 1px solid;
	font-size: 13px;
}

.tile--200 {
	display: inline-block;
	border: 1px solid;
	font-size: 13px;
	width: 200px;
	height: 200px;
	font-size: 14px;
}

.circle--200 {
	width: 200px;
	height: 200px;
}

This code keeps the rules for every class in just one place, making it easier to understand, which in turn makes debugging easier and maintenance lighter.

All these reasons pushed us to remove @extend from our SASS, and our code and coders lived happily ever after!

Cheers!


I have been working with Apache Kafka for more than 4 years now and have seen it evolve from a basic distributed commit log service (very similar to a transaction log or operation log) into a full-fledged data pipelining tool and the backbone of data collection platforms. For those who don't know Kafka: it was developed by LinkedIn and open sourced in early 2011. It is a distributed pub-sub messaging system designed to be fast, scalable, and durable. Like other pub-sub messaging systems, Kafka maintains streams of messages in topics. Producers are processes that write data to topics, while consumers read from topics, typically to store the data or extract meaningful information from it at a later stage. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes. Kafka lets you store streams of messages in a fault-tolerant way and process these streams in near real time.

Apache Kafka has gone through various design changes since its inception. Kafka 0.9 came out with a new high-level consumer API, which removed the consumer's dependency on Apache ZooKeeper. ZooKeeper is now used only to manage the metadata of topics created in Kafka and, when a Kafka node goes down or a rebalance is triggered by the addition of new nodes, to run leader election in a fair and consistent manner. In versions before 0.9, ZooKeeper was also used to manage consumer group offsets. Offset management is the mechanism that tracks how many records have been consumed from a partition of a topic by a particular consumer group. Kafka 0.10 came out with out-of-the-box support for stream processing. This streaming platform captures the flow of events and the changes caused by those events, and stores them in other data systems such as an RDBMS, a key-value store, or a warehouse, depending on the use case. I took it for a run by doing some counting aggregations; the aggregation was fast and I hardly had to write 50 lines for it. I was impressed with the results: I streamed around 2 million events in about a minute on my laptop with just a couple of instances. But I didn't get a chance to use it in production for a year or so.

Around 3 months back, when our team started stress testing our backend stores by generating a lot of data, the stores started to give up under the high number of inserts and updates. Adding more hardware wasn't an option, as we were already using a lot of resources and wanted a solution that fit our current bill. Our data team had a lot of discussions, and I heard people suggest things like Apache Samza, Apache Spark, and Apache Flink. Since we have a small team, adding another component to the technology stack wasn't a good idea, and I didn't want the team spending time learning these technologies with a product release around the corner. Since our data pipeline is built around Kafka, I started playing around with the data. The idea was to collapse multiple updates to the backend stores into a single update or insert, reducing the number of hits our DB takes. Because we process a lot of data, we thought about windowing events by time and aggregating them. I started working on it, and in a matter of hours my streaming application was ready. We started with a 1-minute window and were surprised by the result: we were able to reduce DB hits by 70%. YES, 70 PERCENT!

Here are the screenshots from one of our servers that show the impact of window aggregation.

Before Aggregation
After Aggregation

With streaming capabilities built in, Apache Kafka has become one of the most powerful tools for storing and aggregating data at insane speed, and we'll see its adoption grow in the coming years.

Let’s see how Kafka Streams works

Kafka Streams lets us perform stateful stream processing, which requires some sort of internal state management. This internal state is kept in state stores backed by RocksDB. A state store can be lost on failure and restored in a fault-tolerant way afterwards. The default implementation used by the Kafka Streams DSL is a fault-tolerant state store built from:

  • An internally created and compacted changelog topic (for fault-tolerance)
  • One (or multiple) RocksDB instances (for cached key-value lookups). Thus, when applications are started or stopped, or when streams are rewound and reprocessed, this internal data needs to be managed correctly.

KStream and KTable

KStream is an abstraction of a record stream of key-value pairs. So, if you have a click stream coming in and you are trying to aggregate session-level information, the key will be the session id and the rest of the information will be the value. Similarly, for URL-level aggregation, a combination of URL and session will be the key.

KTable is an abstraction of a changelog stream from a primary-keyed table; each record in the stream is an update to the table, with the record key as the primary key. Aggregation results are stored in a KTable. Intermediate aggregations use a RocksDB instance as a key-value state store that also persists to local disk; flushing to disk happens asynchronously to keep processing fast and non-blocking. An internal compacted changelog topic is also created, and the state store sends changes to it in batches, either when a default batch size or the commit interval is reached. A pictorial representation of what happens under the hood is given below

Kafka Streams Internal Functioning
*Image is taken from Apache Kafka documentation

Kafka Streams commits the current processing progress at regular intervals (controlled by the commit.interval.ms setting). When a commit is triggered, all state stores flush their data to disk, i.e., all internal topics are flushed to Kafka; finally, all current topic offsets are committed to Kafka. On failure and restart, the application can resume processing from its last commit point.

Let’s understand this with the help of an example.

Imagine a stream of such events arriving at the server of a very high-traffic website. Let's assume there is a big web gaming platform where 50K-80K concurrent users generate about 80K-120K events per second, and we need to find the following:

  • The number of clicks a user has made in a session
  • The total pages the user has viewed in a session
  • The total amount of time the user has spent in a session

Let the JSON structure be as follows:

{
  "uuid":"user id",
  "session_id": "some uuid",
  "event": "click/page_view",
  "time_spent":14
}

Ingesting events into a DB at the pace mentioned above, or even just ensuring that the events get stored, is a challenge in itself; a lot of hardware would be required to cope with this traffic as it is. So it doesn't make sense to write the data directly to the DB. A streaming application is a very good fit here, because it can exploit the fact that, for most users, clicks and page views are concentrated within a time window. Suppose that within 5 minutes a user produces x clicks and y page views on average. We can introduce a 5-minute window and club these requests into a single equivalent DB request, reducing (x + y) hits to 1 hit per window, i.e. to 1/(x + y) of the earlier traffic.
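
To make the clubbing idea concrete, here is a rough sketch in JavaScript (only an illustration; the actual pipeline uses the Kafka Streams DSL shown later). It buffers events of the shape shown above per session and flushes one aggregated record per 5-minute window; writeToDb stands in for whatever persistence call you use:

// Windowed aggregation sketch: many events in, one DB write per session per window
var WINDOW_MS = 5 * 60 * 1000;
var buckets = {}; // session_id -> aggregate for the current window

function onEvent(event) {
    var agg = buckets[event.session_id] ||
        (buckets[event.session_id] = { session_id: event.session_id, clicks: 0, page_views: 0, time_spent: 0 });
    if (event.event === 'click') { agg.clicks += 1; }
    if (event.event === 'page_view') { agg.page_views += 1; }
    agg.time_spent += event.time_spent || 0;
}

// Once per window, flush each session's aggregate as a single DB write
setInterval(function () {
    Object.keys(buckets).forEach(function (sessionId) {
        writeToDb(buckets[sessionId]); // hypothetical persistence call: 1 hit instead of (x + y)
    });
    buckets = {};
}, WINDOW_MS);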

I have written a Sample Kafka Streams Project to make this easier to understand. Let's take a look at the sequence diagram below, which shows how the various components of the sample project interact with each other.

Kafka Streams Sequence Diagram

All of this flow is defined with the help of the Kafka Streams DSL; the code snippet is given below

//Defining Source Streams from multiple topics.
KStream<String, ClickStream> clickStream = kStreamBuilder.stream(stringSerde, clickStreamSerde,
     Main.TOPIC_PROPERTIES.getProperty("topic.click.input").split(","));

//Kafka Streams DSL in action with filtering and cleaning logic and passing it through aggregation collector
clickStream
     .filter((k,v) -> (v!=null))
     .map((k, v) ->
           new KeyValue<>(v.getSessionId(),v))
     .through(stringSerde, clickStreamSerde, Main.TOPIC_PROPERTIES.getProperty("topic.click.output"))
     .groupBy((k, v) -> k, stringSerde, clickStreamSerde)
     .aggregate(ClickStreamCollector::new, (k, v, clickStreamCollector) -> clickStreamCollector.add(v),
           TimeWindows.of(1 * 60 * 1000), collectorSerde,
           Main.TOPIC_PROPERTIES.getProperty("topic.click.aggregation"))
     .to(windowedSerde, collectorSerde, new ClickStreamPartitioner(), Main.TOPIC_PROPERTIES.getProperty("topic.click.summary"));

It’s worth noting that for each step we need to define a serializer and deserializer (a "serde"). In the above code snippet:

  • stringSerde: Defines the Serialization and Deserialization for String
  • clickStreamSerde: Defines the Serialization and Deserialization for Raw click Data
  • collectorSerde: Defines the Serialization and Deserialization for RocksDB intermediate storage.
  • windowedSerde: Defines the serialization and Deserialization for Kafka Windowed Aggregation storage

It's very easy to implement streams over Kafka, and they can be leveraged to reduce DB traffic and for other applications where windowing or sessionization makes sense. You can play around with this project, and if you want to reach out to me or have any doubts, please drop your queries in the comments section.

Happy Streaming..!