About PyData

I recently got an opportunity to speak at PyData Delhi. PyData is a tech community, with chapters in New Delhi and other regions, where Python enthusiasts share their ideas and projects related to data analysis and machine learning.

Talks at PyData

There were three talks at PyData: Machine Learning using TensorFlow, Data Layer at Wingify, and mine, Learning Data Analysis by Scraping Websites. All the talks were thorough and excellent! In his talk, Data Layer at Wingify, Manish Gill 🤓 spoke about how we handle millions of requests at Wingify.

Some images of the PyData Meetup hosted by Wingify.

Background About My Talk

Let me give you a little background. It was the Friday before the PyData meetup. Our engineering team was going about its daily tasks, and I had just grabbed a coffee to shake off my laziness. Suddenly, our engineering lead came over and asked whether anyone could present a topic at the PyData meetup that we were organising the very next day: a speaker who had confirmed earlier had backed out at the last moment because he had fallen sick. I could see that most of the team tried to avoid volunteering at such short notice, probably also because the next day was a Saturday (though this is my personal opinion). But I had something different on my mind, and amid this planning and confusion I volunteered 🤓. I had a project that I had built back when I was learning Python, so I offered to present it. He agreed and asked me to keep the presentation ready.

Preparing the Project & Slides

That Friday night, I started searching for the old files I had used. Finally, I found all of them on my website, downloaded them and ran the code. It worked like a charm 😍. Yeah! I quickly created the slides around it, and after finishing, smiled and went to sleep at 4.30 am.

A Little About the Basics of My Talk

The presentation that I gave was on Learning Data Analysis by Scraping Websites. During my college days, we heavily used the BeautifulSoup library in Python to scrape websites for many personal projects. During one such project, I got the idea to scrape data from websites that aggregate movie-related data. By doing that, I thought I could create a list of all the movies that I must definitely watch. The movies had to satisfy the following criteria:

  1. Release date >= 2000
  2. Rating > 8

It was probably not the best idea at the time to scrape websites and then analyse the data in a DataFrame, but I learned a lot by scraping data from the website using BeautifulSoup, analysing it using pandas, visualising it using Matplotlib (a Python plotting library), and finally coming to a conclusion about my movie recommendations.

Coming back to the objective - finding and sorting the movies released between 2000 and 2017 in order of relevance (I didn’t want to watch movies released before 2000). Below is the code to scrape IMDb for movie data from 2000-2017.

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 replacement for urllib2

def main():
    print("** ======  Data Extracting Lib -- by Promode  ===== **")
    # IMDb advanced search: top-1000 titles released between 2000 and 2017
    testUrl = ("http://www.imdb.com/search/title?at=0&count=100&"
               "groups=top_1000&release_date=2000,2017&sort=moviemeter")
    pageSource = urlopen(testUrl).read()
    soupPKG = BeautifulSoup(pageSource, 'lxml')
    titles = soupPKG.findAll("div", class_='lister-item mode-advanced')
    mymovieslist = []
    for t in titles:
        mymovies = {}
        mymovies['name'] = t.findAll("a")[1].text
        mymovies['year'] = t.find("span", "lister-item-year").text.strip()
        # drop the last three characters of the rating text to keep the number
        mymovies['rating'] = float(t.find("span", "rating-rating").text.strip()[0:-3])
        mymovies['runtime'] = t.find("span", "runtime").text
        mymovieslist.append(mymovies)
    print(mymovieslist)

if __name__ == "__main__":
    main()

Click here to have a look at the full source code.
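The analysis step itself isn't shown in the snippet above, but here is a rough sketch of how it could look (my reconstruction, not code from the talk): load the scraped list of dictionaries into a pandas DataFrame, apply the two criteria listed earlier, and plot the trend with Matplotlib.

import pandas as pd
import matplotlib.pyplot as plt

# Assumes `mymovieslist` is the list of dicts built by the scraper above.
movies = pd.DataFrame(mymovieslist)

# The scraped year is text, so pull out the 4-digit number and make it an int.
movies['year'] = movies['year'].str.extract(r'(\d{4})', expand=False).astype(int)

# Apply the criteria: released in 2000 or later, rated above 8.
watchlist = movies[(movies['year'] >= 2000) & (movies['rating'] > 8)]

# "Maximum Rating - Sorted by Rating": rating as the index, highest first.
print(watchlist.sort_values('rating', ascending=False).set_index('rating'))

# "Year Vs Rating Trend"
movies.plot.scatter(x='year', y='rating')
plt.show()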

From the resulting DataFrame, you can see trends like Maximum Rating (sorted by rating) and the Year vs Rating trend:

DataFrame - Rating is Set as Index


Maximum Rating - Sorted by Rating


Year Vs Rating Trend

Takeaways from the Talk

With this method, you can pull out the “winners” from a data set. For example, suppose you want to create a cricket team (IPL T20) with the maximum probability of winning a match: you could parse the IPL T20 website for the last 5 years’ data and select the top 5 batsmen and 6 bowlers 😎.

Conclusion

I totally understand that this may not be the best project for data analysis. I am still learning, and I showed what I had done. I believe it served my purpose.

I will be doing more research on data analysis in Python. Thanks for reading. Below are my talk slides:



Introduction

This article deals with the issues we face with the current API architecture (mostly REST) and why demand-driven APIs seem like a perfect replacement for it. We will also talk briefly about GraphQL and how it is a feasible way of implementing demand-driven applications.

Note: This article is inspired by Demand driven Applications with GraphQL by Vinci Rufus at JS Channel 2017.

Why Demand-driven API? What’s wrong with REST?

Let’s take a simple example of authors & articles. If we are given a requirement to develop an API to fetch authors or articles, it will most probably go like this if we follow REST:

  • GET /authors/:authorId
  • GET /articles/:articleId

Let’s take an example where we have to show an article snippet on my website’s dashboard. We would need its title, description & author name. So we hit the latter endpoint and it gives a response like:

{
  title: 'Demand Driven APIs Using GraphQL',
  createdAt: '2017-04-25',
  updatedAt: '2017-08-25',
  articleId: '96',
  authorId: 50,
  status: 'published',
  description: 'Lorem Ipsum...'
}

There are two problems with this response:

1) Extra information: We only needed the title & description, but we got everything related to the article, and we cannot get rid of this extra payload as the extra information might be getting consumed on some other page, e.g. the Edit Article page.

2) Missing information: We were expecting the author name, but instead we got authorId. This is bad, and to solve it we would probably make another network call to the former endpoint to get the author name. It’s an overhead making 2 network calls just to fetch 3 fields, don’t you think? Also, it will only get more complex as we include more resources, e.g. comments, images, etc.

How Do Demand-driven Applications Work?

Now that we understand a few issues with REST-based APIs, we need a smarter system which can give us exactly the information required instead of partial or extra information. This can be achieved if the client demands what it actually needs and the server gives it only that piece of information. This can be done using GraphQL.

Let’s try to solve our problem using GraphQL. The exact information that our client needs can be represented in GraphQL as:

{
  article (id: articleId)
  {
    title,
    description,
    author {
      name
    }
  }
}

The server can have a single endpoint with the following schema:

type Article(id: Integer) {
  title: String,
  description: String,
  status: String,
  createdAt: Date,
  updatedAt: Date,
  author: Author
}

type Author(id: Integer) {
  name: String,
  email: String,
  photo: Picture,
  followers: [User]
}

type Picture(id: Integer) {
  imgPath: String,
  imgHeight: Integer,
  imgWidth: Integer
}

And each field in our schema can have a resolver function to fetch that piece of information. In our case:

  function Article(id) {
    return Article.find(id);
  }

  function Article_title(article) {
    return article.title;
  }

  function Article_description(article) {
    return article.description;
  }

  function Article_author(article) {
    return article.author;
  }

  function Author_name(author) {
    return author.name;
  }
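The snippets above are language-agnostic pseudocode. As a purely illustrative sketch (not part of the original article), roughly the same schema, resolver, and query could be wired up in Python with the graphene library; the hard-coded article below stands in for a real data layer:

import graphene

class Author(graphene.ObjectType):
    name = graphene.String()
    email = graphene.String()

class Article(graphene.ObjectType):
    title = graphene.String()
    description = graphene.String()
    status = graphene.String()
    author = graphene.Field(Author)

class Query(graphene.ObjectType):
    article = graphene.Field(Article, id=graphene.Int(required=True))

    def resolve_article(root, info, id):
        # Hypothetical lookup; a real resolver would hit a database here.
        return Article(title='Demand Driven APIs Using GraphQL',
                       description='Lorem Ipsum...',
                       status='published',
                       author=Author(name='Sahil Batla'))

schema = graphene.Schema(query=Query)

result = schema.execute('{ article(id: 1) { title, description, author { name } } }')
print(result.data)  # only the requested fields come back

Running this prints only the requested fields, mirroring the HTTP example that follows.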

For example, on querying the data:

curl -XGET http://myapp/articles -d "query={
  article(id: 1) {
    title,
    description,
    author {
      name
    }
  }
}"

We will get a response like this:

{
  title: 'Demand Driven APIs Using GraphQL',
  description: 'Lorem Ipsum...',
  author: {
    name: 'Sahil Batla'
  }
}

This is what we needed. Now we can keep the endpoint the same and just tweak the fields required to display the relevant information on any page of our website.

Advantages of Demand-driven APIs

1) A single endpoint for serving any piece of information.

2) Less data payload, as no extra information is served.

3) Versioning of APIs becomes simpler, as we can control the exact information required.

Disadvantages of Demand-driven APIs

1) Latency may increase, as a single endpoint handles all the querying of data.

2) No lazy loading is possible, as a single call contains all the data.

Try it Out

If you think GraphQL is promising, go ahead and try it out. There is much more to it that you will love learning. Check out its official documentation. It has been implemented in all the well-known languages, and you can find them all here.


Have you ever seen a bunch of geeks lock themselves up in a room, hacking throughout the day? That’s exactly what happened when Wingify had its very first Capture The Flag battle.

Capture the Flag (CTF) is a special kind of information security competition which provides a safe and legal way to try your hand at hacking challenges. We learn a lot of computer science and security concepts in classes and by reading articles, but participating in a CTF actually teaches you how to break into things when they are not implemented properly, which happens all the time in the real world. All you need to do is find a flag, which is proof that you solved the puzzle; submitting it to the platform earns your team points. Flags are typically chosen to look very distinctive, so that when you see one, you’ll know that it’s a flag and that you’ve solved the puzzle. For example, flag{congr4tz_y0u_found_1t}.

Preparation

Some time back, Facebook open-sourced a platform to host Jeopardy-style CTF competitions, and we couldn’t resist using it. It’s simply amazing and sleek. It took around 2-3 weeks to prepare for the event, and we had fun brainstorming and creating the problem set. Creating the problems required thinking of real-world scenarios from the fields of software development and security and combining them with references like Mr. Robot, Snowden, etc. A few ideas were taken from prior experience participating in online CTFs and wargames.

Event

Wingify CTF was an internal event and the very first of its kind. Bonus points were offered for teaming up with someone from a non-engineering role, and we saw some great participation from the customer support, customer success & marketing teams as well. To bring everyone on the same page, participants were asked to register for the event by solving a teaser: finding a flag in the registration form. You’d be surprised to hear that the form was made using Google Forms 😮.

It was an 8-hour online event with 45 participants across 16 teams. There were 12 challenges worth between 40 and 400 points each, depending on difficulty, with 1,840 points available in total. The set of challenges included problems in web application security and forensics. There was another teaser to be solved before starting off the real game. Early in the CTF everyone was doing pretty well, especially teams Matrix and Hunters. Halfway through, quite a few hackers had already finished all the problems except the two most difficult ones. When teams Rootcon and Hustlers solved the challenge worth 400 points, they were the clear winners in everyone’s mind. But as they say, it’s not over till it’s over. When team RSS captured that big flag at the last minute and took first place, it felt like a dramatic last-minute goal in football. 👏

Challenges

I’d like to mention some of the interesting challenges.

  1. XSS - When we talk about frontend security, cross-site scripting is the first vulnerability that comes to everyone’s mind. One of the challenges was to detect an XSS vulnerability and exploit it by stealing cookies. The key challenge while creating this problem was using PhantomJS, a headless WebKit browser, to check whether the XSS payload was successfully triggered: shell_exec('phantomjs fake-browser.js --url ' . $url . ' --password ' . getenv('FLAG'));

  2. S3 Secrets/Credentials - This problem was based on the fact that credentials, such as Amazon S3 keys, GitHub tokens, and passwords, are often included in published GitHub repositories. Once you have put sensitive data in a Git repository, it is going to stay in the repo’s history forever (there are ways to avoid this).

  3. Encryption - One of my personal favorites was the problem requiring teams to calculate the MD5 of a given string. Sounds pretty straightforward, right? The challenge is right here in front of you. Can you capture the flag and send it to [email protected]? 😊 (A minimal MD5 snippet follows this list.)
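For reference, and with a placeholder string rather than the actual challenge input, computing an MD5 hash in Python is a one-liner with the standard hashlib module:

import hashlib

# Placeholder input; the real challenge string is hidden in the problem itself.
print(hashlib.md5("some challenge string".encode("utf-8")).hexdigest())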

Winners

  1. Team RSS - Rachit Gulati, Sahil Batla, and Sandeep Singh

  2. Team ROOTCON - Gaurav Nanda, Aakansh Gulati, and Ankita Gupta

  3. Team HUSTLERS - Rahul Kumar, Arun Sori, and Dinkar Pundir

Each participant from the top two teams was given a YubiKey and a Bluetooth speaker, respectively.

Chhavi and I were able to pull off the event successfully. It turned out to be great and everyone had fun hacking together. I would highly recommend doing something like this for your organization; it will surely increase the breadth of security knowledge.

Mini CTF (External)

Last week, Wingify hosted a PyData Meetup and attendees played a quick round of CTF. You can find the pictures below.

If you would like to practice for such events, you should definitely participate in online CTFs. You can find a list of long-running CTFs. And if you like playing CTFs, we are hiring for a Security Engineer position 😍 🙂.


I am a frontend developer at Wingify and I am building a really awesome product, PushCrew. Last month, we had a hackathon. The idea was to ‘Solve Daily Problems’, interesting right? 😃

I am an avid reader and I read a lot of stuff on the web, but I often find myself copying parts of different articles and pasting them into my notepad. I always thought that it would be great to have all my summaries in a single place: a platform that could show all the highlighted parts of the articles I have liked, without me having to juggle between different tabs. So instead of waiting for an app like this to be built, I went ahead and created a micro bookmarker at the hackathon.

My idea was simple, and I knew that I could build it alone. So I was a one-person team (Obviously me! 😛 ).

The idea was not just to build something, but also to learn something new, because that’s the whole purpose of attending a hackathon, right? Since I had never built a Chrome extension before, I started reading about how to build one and took some guidance from our in-house frontend God, a.k.a. chinchang 😛. I devoted a good chunk of time to deciding my strategy for building the product.

So, after spending an entire night on coke and pizzas, I was able to build a beautiful extension which worked and solved, at least, my problem of highlighting parts of articles that I liked on the web. I really hope it helps a lot of other people (read: readers) as well.

Download this awesome application now 🤘 .

Here are a few glimpses of my hack.

Sum It Up demo
Another Sum It Up demo

Prohibited content (Only for geeks):

As soon as the user selects some text on a page and right-clicks on it, (s)he is shown a ‘Save to Sum It Up’ option in the context menu. On clicking the option, Sum It Up saves the highlighted data (color, text, DOM node, page URL, timestamp, etc.) in JSON format to the browser’s local storage (so no breach of privacy). The main challenge was maintaining the highlight for a partial DOM selection, which I solved by wrapping a custom span tag around all the elements that fall inside the selected area.

Some features that you might find useful are:

  1. (High) light it up.
  2. Collect your notes.
  3. Email them.
  4. Searching made easy.
  5. Tweet your note.
  6. Are you a markdown lover? Yes, you can export in markdown too.
  7. Directly jump to the micro section of the website.

Sum It Up got featured on Product Hunt too! Yippee :) (My very first submission on Product Hunt, and it got featured; it felt like a Diwali bonus 😀)

PS: This is my first blog post so please be kind to me. I am open to any feedback 😀


Recently, Wingify organised a 24-hour internal hackathon where developers from Wingify created a lot of awesome projects for daily use. The theme was “Solve Daily Problems”. Be it a generic problem or an internal team problem, hackers from Wingify tried to solve many problems over the night. So, Pramod Dutta and I created a Google Chrome extension, “VWO X-Ray” (one of the winners), which has proved to be helpful to our internal teams.

VWO X-Ray was created to make it easy to debug the VWO smart code on a website. Whether it’s a developer, a Customer Happiness Engineer, or a client, they all need some basic information about VWO running on a particular page. This Google Chrome extension enables the user to view the account ID, the running VWO campaigns and the cookies created by VWO on that page. The basic features of the extension are:

  1. View account ID on the page and impersonate into it directly.
  2. The Home Tab shows all the campaigns on the page and their information, such as whether the campaigns are running, which segmentations passed, etc.
  3. Directly open a specific campaign in the VWO app with a single click.
  4. Directly copy the “Share report link” of the campaign and share it with anyone.
  5. View VWO cookies’ information in a detailed and clear view.
  6. Notification feature when any campaign variation is applied on the page or any goal has been triggered.
  7. The Full Data Tab will give you a glimpse of the app dashboard. You can change the account ID to get any other account’s data.
  8. The Session Data Tab will show current session’s information (Track and Analyse), various campaigns’ data and goals’ data (which ones have been triggered and which ones have not been).
  9. The Impersonate Tab enables you to impersonate into any account and campaign directly. Just enter the account ID and, optionally, the campaign ID.
  10. By default, this extension sets a 100% sampling rate for Track and Analyse campaigns (the most wanted feature for our QA team and Customer Happiness Engineers).

Here are some screenshots of the VWO X-Ray extension running on our vwo.com website:

The various campaigns running on the page and their statuses


A clear view of the session data information

We will also be releasing this to our clients shortly, so that they too can get this basic information just by using the extension.

Here is the demo of VWO X-Ray: