We are excited to announce our sponsorship of
The Fifth Elephant - a popular conference around
the Big Data ecosystem. The conference will be held in Bangalore, India
from 23rd to 26th July.
Our engineers will be present at the conference. If you are interested in our
work, want to know more about what we are doing,
want to work with us (we’re hiring), want some
cool goodies or just want to say hi, please visit our booth (B7) or catch any
of our team members. We’d love to talk to you!
In November last year, I started developing an infrastructure that would allow us to
collect, store, search and retrieve high volume data. The idea was
to collect all the URLs on which our homegrown CDN
would serve JS content. Based on our current traffic, we were looking to collect some 10k URLs per
second across four major geographic regions where we run our servers.
In the beginning we tried MySQL, Redis, Riak, CouchDB, MongoDB and ElasticSearch, but
none of them could keep up with writes at that rate. We also wanted our
system to respond very quickly, in under 40ms between
internal servers on a private network. This post talks about how we were able to
make such a system using C++11, RocksDB and Thrift.
First, let me start by sharing the use cases of such a system in VWO; the
following screenshot shows a feature where users can enter a URL to check if VWO
Smart Code was installed on it.
VWO Smart Code checker
The following screenshot shows another feature where users can see a list of URLs
matching a complex wildcard pattern, regex pattern, string rule etc. while
creating a campaign.
VWO URL Matching Helper
several opensource databases but none of them would fit our requirements except
Cassandra. In a clustered deployment, however, reads from Cassandra were too slow and became slower
as the data size grew. After understanding how Cassandra works under the
hood, such as its log-structured storage similar to LevelDB's, I started playing with opensource
embeddable databases that use a similar approach, such as LevelDB and Kyoto Cabinet.
At the time, I found an embeddable persistent key-value store
library built on LevelDB called RocksDB.
It was opensourced by Facebook and had a fairly active developer community, so I
went ahead with it. I read the project wiki,
wrote some working code and joined their Facebook group to ask questions around
prefix lookup. The community was helpful, especially Igor and
Siying who gave me enough hints
around prefix lookup,
using custom extractors
and bloom filters which helped me
write something that actually worked in our production environment for the first time.
Explaining the technology and jargon is out of scope for this post, but I would like
to encourage readers to read
about LevelDB and to read the RocksDB wiki.
RocksDB FB Group
For capturing the URLs with peak velocity up to 10k serves/s, I reused our
distributed queue based infrastructure.
For storage, search and retrieval of URLs I wrote a custom datastore service
called HarvestDB, using C++, RocksDB and Thrift. Thrift
provided the RPC mechanism
for implementing this system as a distributed service accessible by various
backend sub-systems. The backend sub-systems use client libraries
generated by the Thrift compiler for communicating
with the HarvestDB server.
The HarvestDB service implements five remote procedures - ping, get,
put, search and purge. The following Thrift IDL
describes this service.
Clients use ping to check HarvestDB server connectivity before executing
other procedures. RabbitMQ consumers consume collected URLs and put them to
HarvestDB. The PHP based application backend uses a custom Thrift based client
library to get (read) and to search URLs.
A Python program runs as a periodic cron job and uses the purge procedure to purge old entries
based on timestamp, which makes sure we don’t exhaust our storage
resources. The system has been in production for more than five months now and is
capable of handling a (benchmarked) workload of up to 24k writes/second while consuming
less than 500MB RAM. Our future work will be on replication, sharding and fault
tolerance of this service. The following diagram illustrates this architecture.
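As a rough illustration of the five procedures described above, here is a toy in-memory model in Python. It is only a sketch and not HarvestDB itself: the class name, method signatures and the sorted-list storage are assumptions made for illustration, standing in for the C++/RocksDB service and its Thrift interface.

```python
import bisect
import time


class ToyUrlStore:
    """Hypothetical in-memory stand-in for a sorted key-value store."""

    def __init__(self):
        self._keys = []    # URLs kept in sorted order, like keys in RocksDB
        self._stamps = {}  # URL -> last-seen timestamp in seconds

    def ping(self):
        return "pong"

    def put(self, url, timestamp=None):
        if url not in self._stamps:
            bisect.insort(self._keys, url)
        self._stamps[url] = time.time() if timestamp is None else timestamp

    def get(self, url):
        return self._stamps.get(url)

    def search(self, prefix, limit=10):
        # Seek to the first key >= prefix, then scan while keys share the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(matches) >= limit:
                break
            matches.append(key)
        return matches

    def purge(self, older_than):
        # Drop every entry whose timestamp is older than the cutoff.
        stale = [url for url, ts in self._stamps.items() if ts < older_than]
        for url in stale:
            del self._stamps[url]
        self._keys = [key for key in self._keys if key in self._stamps]
        return len(stale)


store = ToyUrlStore()
store.put("example.com/pricing", timestamp=100)
store.put("example.com/blog/post-1", timestamp=200)
store.put("example.org/home", timestamp=300)
print(store.search("example.com/"))
```

Because the keys are kept sorted, a prefix search becomes a seek followed by a short scan, which is the general idea behind prefix lookups in ordered key-value stores.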
e2e or end-to-end or UI testing is a methodology used to test whether the flow of an application is performing as designed from start to finish. In simple words, it is testing of your application from the user endpoint where the whole system is a blackbox with only the UI exposed to the user.
It can become quite an overhead if done manually and if your application has a large number of interactions/pages to test.
In the rest of the article I’ll talk about webdriverJS and Jasmine to automate your e2e testing, a combination which isn’t talked about much on the web.
What is WebDriverJS?
This was something which took me quite some time to get my head around, and I feel this was more or less due to the various available options for everything related to WebDriver.
So let’s take it from the top and see what it’s all about.
Selenium, with support for almost all major browsers, is a very good option for automating our tests in the browser.
So whatever you do in the browser while testing your application, like navigating to pages, clicking a button, writing text in input boxes, submitting forms etc, can be automated using Selenium.
WebDriver (or Selenium 2) basically refers to the language bindings and the implementations of the individual browser controlling code.
WebDriver introduces a JSON wire protocol for various language bindings to communicate with the browser controller.
For example, to click an element in the browser, the binding will send a POST request to /session/:sessionId/element/:id/click
So, at one end there is the language binding and a server, known as Selenium server, on the other. Both communicate using the JSON wire protocol.
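To make the wire protocol a little more concrete, here is a small Python sketch of what a binding does for the click example above: it fills in the path template and sends the request to the server. The function name and the session and element ids are made up for illustration, and the sketch only builds the request rather than sending it.

```python
def click_command(session_id, element_id):
    """Build the HTTP method and path for clicking an element."""
    path = "/session/{}/element/{}/click".format(session_id, element_id)
    return "POST", path


# Hypothetical ids; real ones are handed out by the server.
method, path = click_command("abc123", "7")
print(method, path)  # prints "POST /session/abc123/element/7/click"
```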
So as you might guess, WebDriverJS is simply a wrapper over the JSON wire protocol exposing high level functions to make our life easy.
Now if you search for WebDriverJS on the web, you’ll come across 2 different bindings, namely selenium-webdriver and webdriverjs (yeah, lots of drivers), both available as node modules. You can use either one, though we’ll stick to the official one, i.e. selenium-webdriver.
Done! You can now require the package and with a lil’ configuration you can open any webpage in the browser:
To run your test file, all you do is:
Note: In addition to the npm package, you will need to download the WebDriver implementations you wish to utilize. As of 2.34.0, selenium-webdriver natively supports the ChromeDriver. Simply download a copy and make sure it can be found on your PATH. The other drivers (e.g. Firefox, Internet Explorer, and Safari), still require the standalone Selenium server.
Difference from other language bindings
WebDriverJS has an important difference from other bindings in any other language - It is asynchronous.
So if you had done the following in python:
But it doesn’t stop here. Even with promises, the above code would have become:
Do you smell callback hell in there? To make it neater, WebDriverJS has a wrapper for Promise called ControlFlow.
In simple words, this is how ControlFlow prevents callback hell:
It maintains a list of scheduled actions.
The exposed functions in WebDriverJS do not actually do their stuff, instead they just push the required action into the above mentioned list.
ControlFlow puts every new entry in the then callback of the last entry of the list, thus ensuring the sequence between them.
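The three points above can be modelled in a few lines. The sketch below is written in Python with asyncio tasks standing in for JavaScript promises, so it is not WebDriverJS code; the class and function names are hypothetical. Each scheduled entry waits for the previous entry before running its own action.

```python
import asyncio


class ToyControlFlow:
    """Hypothetical model of a control flow that serializes async actions."""

    def __init__(self):
        self._tail = None   # the most recently scheduled entry
        self.finished = []  # action names, in the order they completed

    def schedule(self, name, action):
        previous = self._tail

        async def entry():
            if previous is not None:
                await previous  # run only after the previous entry is done
            result = await action()
            self.finished.append(name)
            return result

        self._tail = asyncio.get_running_loop().create_task(entry())
        return self._tail


async def pause(seconds, value):
    # Stands in for a browser command that takes some time to complete.
    await asyncio.sleep(seconds)
    return value


async def demo():
    flow = ToyControlFlow()
    flow.schedule("open page", lambda: pause(0.03, None))
    flow.schedule("click button", lambda: pause(0.0, None))
    title = await flow.schedule("read title", lambda: pause(0.01, "My Page"))
    return flow.finished, title


finished, title = asyncio.run(demo())
print(finished, title)
```

Even though the first action is the slowest, the actions complete in the order they were scheduled, and the calling code stays flat.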
And so, it enables us to simply do:
Isn’t that awesome!
ControlFlow also provides an execute function to push your custom function into the execution list; the return value of that function is used to resolve or reject that particular execution. So you can use promises and do any asynchronous thing in your custom code:
Combining WebDriverJS with Jasmine
Our browser automation is set up with Selenium. Now we need a testing framework to handle our tests. That is where Jasmine comes in.
If we were to convert our earlier testfile.js to check for correct page title, here is what it might look like:
Now the above file needs to be run with jasmine-node, like so:
This will fire the browser and do the mentioned operations, but you’ll notice that Jasmine won’t give any results for the test. Why?
Well… that happens because Jasmine finishes executing before any expect statement runs, since the expectation sits inside the asynchronous callback of the getTitle function.
To solve such asynchronicity in our tests, jasmine-node provides a way to tell that a particular it block is asynchronous. It is done by accepting a done callback in the specification (it function) which makes Jasmine wait for the done() to be executed. So here is how we fix the above code:
Quick tip: You might want to tweak the time allowed for tests to complete in Jasmine like so:
Bonus for Angular apps
The Angular framework has been very testing focused since the very beginning. Needless to say, its team has devoted a lot of time to e2e testing as well.
Protractor is a library by the Angular team which is a wrapper on WebDriverJS and Jasmine and is specifically tailored to make testing of Angular apps a breeze.
Check out some of the neat add-ons it gives you:
Apart from querying elements based on id, css selector, xpath etc, it lets you query on the basis of binding, model, repeater etc. Sweet!
It has Jasmine’s expect function patched to accept promises. So, for example, in our previous test where we were checking for title:
can be refactored to a much cleaner:
And more such cool stuff to make end-to-end testing for Angular apps super-easy.
In the end
e2e testing is important for the apps being written today, and hence it is important for it to be automated and at the same time fun and easy to perform. There are numerous tools available for you to choose from, and this article talks about one such tool combination.
Hope this helps you get started. So what are you waiting for, let’s write some end-to-end tests!
Using an e2e testing stack you want to share? Let us know in the comments.
Our home-grown geo-distributed architecture
latencies possible. Using the same architecture we do data acquisition as well.
Over the years we’ve made a lot of changes to our backend. This post talks
about some scaling and reliability aspects and our recent work on building a fast and
reliable data acquisition system using message queues, which has been in production for
about three months now. I’ll start by giving some background on our previous
architecture.
Web beacons are widely used for data
acquisition. The idea is to have a webpage send us data using an HTTP request,
to which the server responds with some valid object. There are many ways to do this. To keep
the size of the returned object small, for every HTTP request we
return a tiny 1x1 pixel gif image, and our geo-distributed architecture along with
our managed Anycast DNS service helps us do this with very low latencies;
we aim for less than 40ms. When an HTTP request hits one of our data acquisition servers, OpenResty
handles it and our Lua based code processes the request in the same process thread.
OpenResty is an nginx bundle which, among many things, includes LuaJIT; this allows
us to write URL handlers in Lua, and the code runs within the web server. Our Lua code
does some quick checks and transformations, and writes the data to a Redis
server which is used as a fast in-memory data sink. The data stored in Redis is
later moved, processed and stored in our database servers.
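The request flow described above can be sketched roughly as follows. This is a Python illustration only; our actual handler is Lua running inside OpenResty. The query parameter names, the rule that a request must carry an "a" parameter, and the in-memory list standing in for Redis are all assumptions made for the example.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# A commonly used 1x1 transparent GIF, decoded once at startup.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

sink = []  # stands in for a Redis list used as the in-memory data sink


def handle_beacon(request_url):
    """Record the query parameters, then answer with the tiny pixel."""
    params = parse_qs(urlsplit(request_url).query)
    # Quick check (hypothetical rule): only keep requests that carry "a".
    if "a" in params:
        record = {key: values[0] for key, values in params.items()}
        sink.append(json.dumps(record, sort_keys=True))
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL_GIF)),
    }
    return 200, headers, PIXEL_GIF


status, headers, body = handle_beacon("/collect?a=42&u=example.com%2Fpricing")
print(status, headers["Content-Type"], sink)
```

The response never depends on the data store doing anything slow, which is what keeps this kind of endpoint fast.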
This was the architecture when I joined
Wingify a couple of months ago. Things were going smoothly, but the problem was we were
not quite sure about data accuracy and scalability. We used Redis as a fast
in-memory data storage sink, which our custom written PHP based queue infrastructure
would read from; our backend would then process the data and write it to our database servers.
The PHP code was not scalable, and after about a week of hacking and exploring options
we found a few bottlenecks and decided to re-do the backend queue infrastructure.
We explored many options and decided to use RabbitMQ.
We wrote a few proof-of-concept backend programs in Go, Python and PHP and
did a lot of testing, benchmarking and real-world load testing.
Ankit, Sparsh and I discussed how we should move forward and we finally
decided to explore two models in which we would replace the home-grown PHP queue
system with RabbitMQ. In the first model, we wrote directly to RabbitMQ from the
Lua code. In the second model, we wrote a transport agent which moved data from Redis
to RabbitMQ. And we wrote RabbitMQ consumers in both cases.
There was no Lua-resty library for RabbitMQ, so I wrote one using cosocket APIs
which could publish messages to a RabbitMQ broker over the STOMP protocol. The library
has been opensourced for the hacker community.
Later, I rewrote our Lua handler code using this library and ran a loader.io
load test. This model failed the test due to very low throughput;
we performed the load test on a small 1G DigitalOcean instance for both models.
For us, the STOMP protocol
and the slow RabbitMQ STOMP adapter were performance bottlenecks. RabbitMQ was not
as fast as Redis, so we decided to keep Redis and work on the second
model. For our requirements, we wrote a proof-of-concept Redis to RabbitMQ transport
agent called agentredrabbit to leverage Redis as a fast in-memory storage sink and
use RabbitMQ as a reliable broker. The POC worked well in terms of performance,
throughput, scalability and failover. In the next few weeks we were able to write a
production level queue based pipeline for our data acquisition system.
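The core idea of the second model, moving data in chunks from a fast list to a reliable broker, can be sketched in Python as below. This is not how agentredrabbit is implemented: the deque stands in for a Redis list, the publish callable stands in for a RabbitMQ publish, and the put-back-on-failure behaviour is an assumption made for illustration.

```python
from collections import deque


def move_chunk(source, publish, chunk_size=100):
    """Move up to chunk_size items from source to the broker via publish."""
    chunk = []
    while source and len(chunk) < chunk_size:
        chunk.append(source.popleft())
    if not chunk:
        return 0
    try:
        publish(chunk)
    except Exception:
        # Put the chunk back in its original order so nothing is lost.
        source.extendleft(reversed(chunk))
        raise
    return len(chunk)


redis_list = deque("event-%d" % i for i in range(5))  # hypothetical events
broker_queue = []
moved = move_chunk(redis_list, broker_queue.extend, chunk_size=2)
print(moved, broker_queue, list(redis_list))
```

Publishing in chunks keeps the number of broker round trips low, while the list absorbs bursts of writes from the web servers.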
For about a month, we ran the new pipeline in production against the existing one,
to A/B test our backend :) To do that we modified our Lua code to write to two
different Redis lists: the original list was consumed by the existing pipeline, and the other was
consumed by the new RabbitMQ based pipeline. The consumer would process and write
data to a new database. This allowed us to compare realtime data from the two
pipelines. During this period we tweaked our implementation a lot, rewrote the
producers and consumers thrice and had two major phases of refactoring.
A/B testing of existing and new architecture
Based on the load test results on a 1G DigitalOcean instance, like the one used
for the first model, and on the realtime A/B comparison with the existing pipeline,
we migrated to the new pipeline based on RabbitMQ. Other issues of HA,
redundancy and failover were addressed in this migration as well.
The new architecture ensures no single point of failure and has mechanisms to
recover from failure and fault.
Queue (RabbitMQ) based architecture in production
We’ve opensourced agentredrabbit
which can be used as a general purpose fast and reliable transport agent for
moving data in chunks from Redis lists to RabbitMQ with some assumptions and queue
name conventions. The flow diagram below has hints on how it works; check out the
README for details.
When I got an opportunity to intern with the engineering team at Wingify, I was ecstatic: on my first visit for an interview I had come across an exciting office with fascinating transparent walls full of geeky stuff, and of course Wingify is becoming a buzzword in the IT industry.
On my first day I was a bit nervous, dressed and prepared as I believed anyone working from 10:00 am to 7:00 pm would be. When I reached the office only the office boy was present. Honestly speaking, I had a feeling that I was at the wrong place, because this was not how I expected a software company to look at 10:30 in the morning on a working day. After a while I was surrounded by smiling people in shorts, denims and t-shirts having friendly chats.
Working at Wingify gave me an entirely new set of skills, such as software design patterns and maintenance, that are going to be invaluable for my future. My work here mainly included front-end optimization and internationalization.
I worked on a template based engine for translating web pages into different languages.
I also worked with the marketing team, and this interdependent relationship added another dimension to my work. I spent time researching and learning different methods and technologies for various things such as process automation. All these roles and responsibilities taught me to manage time and to be attentive and organized, and enhanced my problem-solving abilities.
At Wingify you have the solidarity and independence of your own space, and an atmosphere where interns like me do not hesitate to ask questions, as they are answered and explained by the highly skilled and dedicated engineering team sitting next to you, which makes it easy to get work done. Awesome appreciation mails boost your confidence. Personally, I couldn’t have imagined a better internship experience.
Interning with Wingify provides you with a wonderful learning experience. In a nutshell, it is a great place to work and party \m/