Wednesday, November 27, 2013

Web Analytics - What Happens Behind the Scenes!

I have always been familiar with the saying "data is huge and endless", and slowly I am beginning to realize the truth behind it. After 12 weeks of an extremely insightful BI course, I am amazed by the breadth and depth of what it offered. Web analytics is my favorite of the topics taught in the BI class, since I have been a web enthusiast since my high school days. I have always worked with web technologies such as HTML, JSP and jQuery, and I loved working with them. I have always been involved in developing websites, and I was amazed to learn that there are tools available which help websites optimize their traffic.

I was first exposed to the concept of website optimization this summer, when I worked with the Digital Marketing organization of Adobe Systems on a product called Test & Target. It was previously an Omniture product and came to Adobe with its acquisition of Omniture in 2009. I was actively involved in automating the simplified Test & Target back-end, which eventually reduced the time to market for the product. The goal of most e-commerce companies is to increase their conversion rate while optimizing website traffic. Test & Target helps companies create multiple campaigns and evaluate how each one performs. The evaluation can be broken down by audience, browser, demographics and so on. This type of website optimization is called A/B testing, and it is widely used. It gives companies an idea of how each version of the website performs against different metrics, which in turn helps them increase revenue.
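To make the idea of comparing versions concrete, here is a minimal sketch in Python of how a conversion rate could be computed per version. The visitor and conversion counts are made up for illustration, and this is not how Test & Target itself computes its reports.

```python
# Hypothetical visitor and conversion counts for two versions of a page.
variants = {
    "A": {"visitors": 1000, "conversions": 50},
    "B": {"visitors": 1000, "conversions": 65},
}


def conversion_rate(visitors, conversions):
    """Return conversions as a fraction of visitors (0.0 when there are no visitors)."""
    return conversions / visitors if visitors else 0.0


# Conversion rate for each version, and the version with the highest rate.
rates = {name: conversion_rate(**counts) for name, counts in variants.items()}
best = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"Version {name}: {rate:.1%}")
print("Best performing version:", best)
```

A real test would also check whether the difference between the versions is statistically significant before declaring a winner.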

I am sure most people who have used this tool to optimize their website never see what happens behind the scenes. Since I was involved in automating its back-end, I have a good idea of how such tools work, which I will try to explain in the simplest way possible. Most operations performed by the tool have a corresponding Application Programming Interface (API), which takes the data from the user interface and passes it to the web server. An API is an interface that lets software components interact with each other efficiently. These APIs communicate with the server either through XML messages (SOAP) or through URLs (REST). They use HTTP operations such as GET, POST, PUT and DELETE to interact with the server. The final computed result is passed back through the API, and the required output is rendered on the user interface. Companies use this output to track their conversion rate and determine whether their goals have been reached.
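As a rough illustration of the REST style described above, the Python sketch below builds one request for each of the four HTTP operations without sending anything. The base URL, the campaign id and the payload fields are hypothetical and are not the real Test & Target API.

```python
import json
from urllib.request import Request

# Hypothetical base URL; not a real Test & Target endpoint.
BASE_URL = "https://api.example.com/campaigns"


def build_request(method, campaign_id=None, payload=None):
    """Build (but do not send) an HTTP request for a campaign resource."""
    url = BASE_URL if campaign_id is None else f"{BASE_URL}/{campaign_id}"
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


requests_to_send = [
    build_request("GET", campaign_id=42),                            # read a campaign
    build_request("POST", payload={"name": "Homepage banner test"}),  # create one
    build_request("PUT", campaign_id=42, payload={"name": "Renamed test"}),  # update it
    build_request("DELETE", campaign_id=42),                         # delete it
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Passing each request to `urllib.request.urlopen` would actually send it to the server, which is the step an automated test suite would add.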

My work involved automating the APIs that carried out multiple tasks, such as testing how the tool behaves against various possible metrics and dimensions. Done manually, this would have taken a lot of time, which the automation eliminated. I was able to appreciate the power of the tool more after understanding the importance of web analytics in the BI class. As part of the course requirements, I used Google Analytics, which helps in analyzing website traffic and provides recommendations. This tool is also excellent, seamless and user-friendly. After using two different tools that cater to different needs, I am truly amazed by the concept of web analytics and the potential it offers.

Tuesday, November 12, 2013

Network Analysis of Facebook Groups

A very interesting concept which I learnt in the Business Intelligence class was Network Analysis. The insights into its applications and metrics were extremely interesting. As part of our curriculum we were required to use a very user-friendly network analysis tool called Gephi. This tool takes its input in the form of a .gdf file and creates a network consisting of nodes and edges. Various measures can be calculated and the network can be analyzed visually. For my homework, I performed the analysis on an undirected, unweighted network. To gain more insight into the tool and understand the metrics better, I wanted to perform the analysis on a directed, weighted network. This blog post describes the interesting insights I observed after performing that analysis.
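For readers who have not seen a .gdf file, the Python sketch below writes a tiny directed, weighted network in that format as I understand it: a `nodedef>` header followed by one line per node, then an `edgedef>` header followed by one line per edge. The node names, labels and weights are made up for illustration.

```python
def build_gdf(nodes, edges):
    """Return GDF text for (name, label) nodes and (source, target, weight) edges."""
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += [f"{name},{label}" for name, label in nodes]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE,directed BOOLEAN")
    lines += [f"{src},{dst},{weight},true" for src, dst, weight in edges]
    return "\n".join(lines) + "\n"


# Hypothetical users and posts; an edge points from a user to a post.
nodes = [("u1", "User 1"), ("u2", "User 2"), ("p1", "Post 1")]
edges = [("u1", "p1", 2.0), ("u2", "p1", 1.0)]

gdf_text = build_gdf(nodes, edges)
print(gdf_text)
```

Saving `gdf_text` to a file with a .gdf extension should give something Gephi can open, although the columns a real export contains will differ.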

For my analysis, I picked the Eller MIS at the University of Arizona group, as it is common to most of us. I extracted the Eller MIS Facebook group data using Netvizz and imported it into Gephi. After importing the data, defining the properties and applying the Yifan Hu Multilevel layout algorithm, the visualization below was obtained



where the green nodes are the users, the red nodes are posts made by the group and the blue nodes are posts made by users of the group.

When we analyze this portion of the network, we can clearly see that many users are connected to the post at the center node. On closer examination, we can see that this node represents the post about the US News ranking of the MIS department. It is the most prominent node and has a high in-degree.
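In-degree is simply the number of edges pointing at a node. A minimal Python sketch, assuming edges run from a user to a post and using made-up names rather than the real group data:

```python
from collections import Counter

# Hypothetical directed edges (user, post).
edges = [
    ("user1", "post_ranking"),
    ("user2", "post_ranking"),
    ("user3", "post_ranking"),
    ("user1", "post_tuition"),
]

# Count how many edges point at each target node.
in_degree = Counter(target for _, target in edges)
top_node, top_in_degree = in_degree.most_common(1)[0]

print(top_node, top_in_degree)
```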

Another interesting fact we can observe is that one user has posted on a particular post multiple times (the edge has a high weight). On closer examination of the node, the post turned out to be about the tuition fee at Eller, which suggests that this user is interested in knowing the exact tuition fee. Another post that a single user posted on repeatedly is about problems accessing a particular page of the Eller website. From this we can say that users treat the group as a forum to discuss various questions.
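One way to picture an edge weight is as the number of times the same user interacted with the same post. The Python sketch below counts repeated (user, post) pairs under that assumption; the names are hypothetical, and the tool that exported the data may compute weights differently.

```python
from collections import Counter

# Hypothetical interaction log: one (user, post) entry per interaction.
interactions = [
    ("user7", "post_tuition"),
    ("user7", "post_tuition"),
    ("user7", "post_tuition"),
    ("user2", "post_website_issue"),
    ("user2", "post_website_issue"),
    ("user5", "post_tuition"),
]

# The weight of an edge is how often that exact pair occurs.
edge_weights = Counter(interactions)
heaviest_edge, heaviest_weight = edge_weights.most_common(1)[0]

print(heaviest_edge, heaviest_weight)
```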
On running the various metrics, the values below were obtained:

We can see that the density of the graph is very low. This clearly indicates that the people in this group are not closely connected to one another. The modularity is high, which shows that even though the network as a whole is sparse, the individual modules are densely connected. Another interesting fact is that the network diameter is just 1. This means the longest shortest path between any two reachable nodes is a single hop, which is consistent with edges running directly from users to posts rather than forming longer chains. It also fits the fact that the group is small. The graph has 158 strongly connected components. A strongly connected component is a set of nodes that can all reach one another along directed edges, so a large number of components suggests that few pairs of nodes are mutually reachable, which agrees with the low density.
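As a reminder of what the density figure measures, the Python sketch below uses the usual definition for a directed graph: the number of edges divided by the number of possible edges, n * (n - 1). The toy node and edge lists are made up and are far smaller than the real group network.

```python
def directed_density(num_nodes, num_edges):
    """Density of a directed graph without self-loops: edges / (n * (n - 1))."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# Hypothetical toy network: 5 nodes and 4 directed edges.
nodes = ["user1", "user2", "user3", "post1", "post2"]
edges = [
    ("user1", "post1"),
    ("user2", "post1"),
    ("user3", "post1"),
    ("user1", "post2"),
]

density = directed_density(len(nodes), len(edges))
print(f"Density: {density:.2f}")
```

A density close to 0 means that only a small share of the possible connections actually exists.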

From the above analysis, it can be concluded that the Eller MIS group on Facebook has a sparsely connected network, with the main emphasis on posts related to the program, and that it also serves as a forum to discuss concerns and queries.