Tuesday, December 17, 2013

My learning from the MIS 587 course

The semester has finally come to an end with our last exam and project report. It's unbelievable that four months went by so quickly. In these four months I have learnt a lot, both in theory and in practice. On the very first day of class I learnt how data has become an integral part of every business. I had always been confused about what exactly Big Data is, and the course cleared that up completely. In every class I came to appreciate a different concept and its practical use. Data warehousing, the difference between OLTP and OLAP, ETL processing and the star schema were all new concepts to me. Applying these concepts to a problem (HW-1) and implementing dashboards using OBIEE and MicroStrategy was truly a great learning experience. The network analysis of Twitter data was a learning experience in an entirely different direction. I plan to use all this knowledge and experience to solve business problems using technology.

Wednesday, November 27, 2013

Web Analytics - What happens behind the scenes !!

I was always familiar with the saying "DATA is huge and is endless". Slowly I am beginning to realize the truth behind it. After 12 weeks of an extremely insightful BI course, I am amazed by the breadth and depth of what the course offers. Web analytics is my favorite among the topics taught in the BI class, since I have been a web enthusiast from my high school days. I have always worked with web technologies such as HTML, JSP and jQuery, and I love working with them. I have always been involved in developing websites, and I was amazed to learn that there are tools available which help websites optimize their traffic.

I was first exposed to the concept of website optimization this summer. I worked with the Digital Marketing organization of Adobe Systems on a product called Test & Target. It was originally an Omniture product and came to Adobe when Adobe acquired Omniture in 2009. I was actively involved in automating the simplified Test & Target back-end, which eventually reduced the time to market for the product. I am sure the goal of most e-commerce companies is to increase the conversion rate while optimizing website traffic. The Test & Target tool helps companies create multiple campaigns and evaluate how each campaign performs. The evaluation can be performed across different audiences, browsers, demographics etc. This type of website optimization is called A/B testing and is widely used. It gives companies an idea of how each version of the website performs against different metrics, which helps them increase revenue.
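To make the idea of comparing two versions concrete, here is a minimal sketch in Python. The visitor and conversion numbers are made up for illustration, and the two-proportion z-statistic is just one common way to compare conversion rates; I am not claiming this is how Test & Target evaluates campaigns.

```python
import math

def conversion_rate(conversions, visitors):
    """Fraction of visitors who converted."""
    return conversions / visitors

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_error

# Hypothetical campaign results (not real data)
visitors_a, conversions_a = 1000, 50   # version A of the page
visitors_b, conversions_b = 1000, 70   # version B of the page

rate_a = conversion_rate(conversions_a, visitors_a)
rate_b = conversion_rate(conversions_b, visitors_b)
lift = (rate_b - rate_a) / rate_a      # relative improvement of B over A
z = two_proportion_z(conversions_a, visitors_a, conversions_b, visitors_b)

print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  lift: {lift:.0%}  z: {z:.2f}")
```

A larger absolute z value suggests the difference between the two versions is less likely to be due to chance alone.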

I am sure that many people who have used this tool to optimize their website never get to see what happens behind the scenes. Since I was involved in automating the back-end of this tool, I have a good idea of how such tools work, which I will try to explain in the simplest way possible. Most operations performed by the tool have a related Application Programming Interface (API) which takes the data from the user interface and passes it to the web server. An API is an interface which helps software components interact with each other efficiently. These APIs interact with the server either by exchanging XML messages (SOAP) or through resource URLs (REST). They carry out most of the HTTP operations, such as GET, POST, PUT and DELETE, to interact with the server. The final computed result is passed back through the API, and the required output is rendered on the user interface. Companies use this output to track their conversion rate and determine whether their goals are being reached.
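The REST style described above can be sketched with Python's standard library. The base URL, the `/campaigns` path and the payload fields are hypothetical names invented for this illustration; they are not the actual Test & Target API. The sketch only builds the requests and does not send them.

```python
import json
from urllib.request import Request

# Hypothetical base URL, used only for illustration
BASE_URL = "https://api.example.com/v1"

def build_request(method, path, payload=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)

# One request per HTTP operation on a hypothetical "campaign" resource
requests_to_send = [
    build_request("POST", "/campaigns", {"name": "homepage-banner-b"}),    # create
    build_request("GET", "/campaigns/42"),                                 # read
    build_request("PUT", "/campaigns/42", {"name": "homepage-banner-c"}),  # update
    build_request("DELETE", "/campaigns/42"),                              # delete
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would actually send it. In an automation suite, each such call would typically be followed by a check on the response.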

My work involved automating the APIs which carried out multiple tasks, such as testing the tool against various possible metrics and dimensions. Done manually, this would have taken a lot of time, which the automation eliminated. I was able to appreciate the power of the tool more after understanding the importance of web analytics in the BI class. As a part of the course requirements, I used Google Analytics, which helps in analyzing website traffic and provides recommendations. This tool is also excellent, seamless and user-friendly. After using two different tools which cater to different needs, I am truly amazed by the concept of web analytics and the potential it offers.

Tuesday, November 12, 2013

Network Analysis of Facebook Groups

A very interesting concept which I learnt in the Business Intelligence class was Network Analysis. The insights into its applications and metrics were extremely interesting. As a part of our curriculum we were required to use a very user-friendly network analysis tool called Gephi. This tool takes its input in the form of a .gdf file and creates a network consisting of nodes and edges. Various measures can be calculated and the network can be analyzed visually. For my homework, I analyzed an undirected, unweighted network. To gain more insight into the tool and understand the metrics better, I wanted to analyze a directed, weighted network. This blog post describes the interesting insights I observed after performing the analysis.

For my analysis, I picked the Eller MIS at the University of Arizona group, as it is common to most of us. I extracted the Eller MIS Facebook group data using Netvizz and imported it into Gephi. After importing the data, defining the properties and applying the Yifan Hu Multilevel layout algorithm, the visualization below was obtained



where the green nodes are the users, the red nodes are posts made by the group and the blue nodes are posts made by users of the group.

When we analyze this portion of the network, we can clearly see that the center node has many users connected to it. On closer examination, we can see that the node represents the post about the US News ranking of the MIS department. This is the most prominent node, and it has a high in-degree.

Another interesting fact we can observe is that one user has posted a particular post multiple times (the edge has a high weight). On closer examination of the node, that post turns out to be about the tuition fee at Eller. This shows that this particular user is interested in knowing the exact tuition fee. Another post which a single user posted repeatedly is about problems accessing a particular page of the Eller website. From this we can say that users treat this group as a forum to discuss various questions.
On running the various metrics, the values below were obtained:

We can see that the density of the graph is very low. This indicates that the people in this group are not closely connected to one another. The modularity is high, which shows that even though the network as a whole is sparse, the individual modules are densely connected. Another interesting fact is that the network diameter is just 1. This shows that the network is not very spread out: any node that can be reached at all is reached in a single hop. This fits the fact that the group is small. The graph also has 158 strongly connected components. In a directed network, such a high count means that very few pairs of nodes can reach each other in both directions, which is consistent with edges that mostly run one way, from users to posts.
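To show how metrics like these are defined, here is a minimal sketch in plain Python on a tiny made-up directed, weighted network. The node names, edges and weights are invented for illustration and are not the Eller MIS data. The strongly connected components are found with Kosaraju's algorithm, which is one standard approach and not necessarily what Gephi uses internally.

```python
from collections import defaultdict

# Hypothetical edges: (source, target, weight). Users point at posts.
edges = [
    ("user_a", "post_ranking", 1),
    ("user_b", "post_ranking", 1),
    ("user_c", "post_ranking", 1),
    ("user_c", "post_tuition", 3),   # same user posting repeatedly
    ("user_d", "post_website", 2),
]
nodes = sorted({n for s, t, _ in edges for n in (s, t)})

def density(n_nodes, n_edges):
    """Directed density: actual edges divided by possible edges n*(n-1)."""
    return n_edges / (n_nodes * (n_nodes - 1))

def in_degrees(edge_list):
    """Number of incoming edges per node."""
    counts = defaultdict(int)
    for _, target, _ in edge_list:
        counts[target] += 1
    return dict(counts)

def strongly_connected_components(node_list, edge_list):
    """Kosaraju's algorithm with an iterative depth-first search."""
    forward, backward = defaultdict(list), defaultdict(list)
    for s, t, _ in edge_list:
        forward[s].append(t)
        backward[t].append(s)

    def dfs(start, graph, seen, out):
        # Appends nodes to `out` in order of completion (post-order)
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    seen, order = set(), []
    for node in node_list:
        if node not in seen:
            dfs(node, forward, seen, order)

    seen, components = set(), []
    for node in reversed(order):
        if node not in seen:
            component = []
            dfs(node, backward, seen, component)
            components.append(component)
    return components

graph_density = density(len(nodes), len(edges))
degrees = in_degrees(edges)
components = strongly_connected_components(nodes, edges)
heaviest_edge = max(edges, key=lambda e: e[2])

print("density:", round(graph_density, 3))
print("in-degrees:", degrees)
print("strongly connected components:", len(components))
print("heaviest edge:", heaviest_edge)
```

In this toy network every edge runs from a user to a post and never back, so no two nodes can reach each other in both directions and each node forms its own strongly connected component.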

From the above analysis, it can be concluded that the Eller MIS group on Facebook is a sparsely connected network whose main emphasis is on posts related to the program, and that it also serves as a forum to discuss concerns and queries.

Monday, October 14, 2013

INSITE Big Data Symposium - Wonderful Learning Experience !!

I can't believe that I am already 8 weeks into the semester. I must say time flies very fast. The Business Intelligence course has been a really good learning experience, with engaging lectures by Dr. Sudha Ram. The Big Data symposium was mentioned on the first day of class itself. I had been looking forward to it since then, and the day (October 10th, 2013) finally arrived. It was a nice and bright Thursday morning. I reached the South Ballroom of the UofA Student Union fifteen minutes before 8 AM. Before entering, I was handed my name badge and the agenda for the day. It was a really long agenda with several talented and erudite speakers from big companies such as IBM, SAP, Macy's etc. I was eagerly looking forward to their sessions.



The sessions started with opening remarks by our professor, Dr. Sudha Ram, who introduced the speakers and laid out the major goal of the symposium. The first session was by Brian Gentile, who spoke about the rise of Big Data, the myths regarding Big Data and Big Data transformation. Concepts such as the 4th V, or veracity, of data were extremely interesting and intriguing. The wide range of applications which JasperSoft offers was another nice learning. The second session, by Brenda Dietrich from IBM, was equally interesting, and the term analytics was explained very clearly. After a short break came a session on how Big Data is used to make better business decisions. The speakers were Darren Stoll and Kerem Tomak from Macy's. Several novel concepts, such as the layers of the Big Data ecosystem, were discussed in detail, along with how they can be used to make better business decisions. The idea that analytics is a pillar of the business was emphasized with real-world examples. Finally, Tim Hood from SAP presented the SAP HANA tool and its uses before the lunch break. The entire morning was a fruitful one in which I learnt a lot of novel and intriguing concepts. I was looking forward to more insightful presentations.

The afternoon sessions were equally interesting and useful. Important applications of Big Data, such as how it is used to address security concerns, were discussed by David Cowart. These concepts and ideas were new to me, and it was a very good learning experience. After security, applications of Big Data in the healthcare industry were discussed. Healthcare has always been my passion, and it was amazing to learn how Big Data is used to predict patterns and detect outliers from clusters. After security and healthcare, applications of Big Data in the dynamic pricing of tickets were discussed by Zaheer Benjamin. Finally, the sessions ended with a flawless presentation by our professor, Dr. Sudha Ram, who discussed the Big Data research being performed at the university using smart card data. The visuals were extremely appealing and inspiring. The day ended on a very good closing note, and it was truly an extremely informative event.

Lessons learnt from the symposium

  • Big Data is not simply structured or unstructured. It can contain more than one type of data: structured, semi-structured and unstructured. This can be referred to as multi-structured data
  • Big Data is more than data from social media. The classic factors of production were land, labor and capital, but in today's world time and speed make up a major portion
  • Big Data is undergoing a series of transformations where the focus is moving towards predictive analysis, 100% of the users being data users, data being controlled by systems (where concerns regarding privacy may arise) and extremely low cost
  • Even though external data is becoming more prevalent in today's world, internal data does not cease to matter. It has its own importance
  • Data used to make business decisions should be real-time, or the time taken to process the data should be minimized
  • The major reason for the emergence of Big Data is lower costs with increased efficiency
  • Data is not useful unless there is a clear goal for how it should be utilized. Using Big Data within a framework that keeps the focus on the customer is encouraged
  • Big Data serves as a glue between different parts of the organization, which enables better execution
The symposium was indeed a great learning experience. I would like to thank my professor Dr. Sudha Ram for organizing this symposium and imparting great knowledge of today's view of Big Data.


Wednesday, September 25, 2013

Applications of BigData in Health Care Industry

In my previous post, I discussed Big Data and how it is being used in today’s world. In this post, I will discuss how Big Data is being used in one particular industry, healthcare. These days terms such as NoSQL, MongoDB and CouchDB are used very frequently. How many of us have wondered why they are used or what their benefits are? As discussed in my last post, data from external sources such as social media is becoming more prominent than the internal data which companies relied on 10 years back. It is a challenging task to restructure the database schema and data warehouse to fit the external data. It is not a feasible option either, since external data comes from various sources and does not have any specific format. How can this data be tracked and stored to analyze results? To store this kind of data, we require a non-relational database.


MongoDB and CouchDB are non-relational databases, which means they do not store data in the relational structure of tables. Healthcare is a major industry which uses Big Data and the necessary applications to track and analyze patient records. While many companies use data warehouses and star schemas to perform prediction and reporting, a few use traditional NoSQL databases. One example that illustrates this point is the use of the MUMPS database in healthcare IT companies such as Epic Systems.

MUMPS is a traditional database which dates back to the 1960s. It lost ground with the rise of SQL and relational database management systems (RDBMS). But recently, as external data has been increasing, these traditional systems have regained their importance. MUMPS is a hierarchical database, unlike a relational database. It is very useful in the healthcare industry since it helps to maintain the data efficiently without the constraint that the database should be in 3NF. To explain this point more clearly, suppose a hospital wants to track the behavior of patients along with the frequency and cause of their visits. Using a relational schema, we could have a relationship between patient and visit, with each visit related to a procedure. A patient can have more than one visit, and each visit can have multiple procedures. Maintaining these in a relational database will require the tables to be in the 3rd Normal Form. Because of this constraint, it becomes difficult to maintain multi-valued details such as the phone numbers of the patients. Creating separate attributes for the phone numbers can solve the issue, but maintaining the NULL values for patients who do not have multiple phone numbers becomes cumbersome. A hierarchical database can be a better option since it does not require the data to be in 1NF.
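The contrast can be sketched in Python. This is only an illustration using a nested dictionary, not MUMPS syntax, and the patient, phone numbers and visits are made up. The hierarchical record keeps repeated values together, while the relational view needs one row per value in separate tables.

```python
# Hypothetical hierarchical record: multi-valued fields nest under the patient
patient = {
    "patient_id": "P001",
    "name": "Jane Doe",
    "phones": ["520-555-0101", "520-555-0102"],
    "visits": [
        {"date": "2013-08-01", "cause": "fever", "procedures": ["blood test"]},
        {"date": "2013-09-10", "cause": "fracture", "procedures": ["x-ray", "cast"]},
    ],
}

def flatten_phones(record):
    """Relational view: one (patient_id, phone) row per phone number."""
    return [(record["patient_id"], phone) for phone in record["phones"]]

def flatten_procedures(record):
    """Relational view: one (patient_id, visit_date, procedure) row per procedure."""
    rows = []
    for visit in record["visits"]:
        for procedure in visit["procedures"]:
            rows.append((record["patient_id"], visit["date"], procedure))
    return rows

phone_rows = flatten_phones(patient)
procedure_rows = flatten_procedures(patient)

print(phone_rows)
print(procedure_rows)
```

A patient with a single phone number simply has a shorter list in the hierarchical record, so there are no NULL-filled columns to maintain.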

This post gave a brief overview of why traditional databases such as MUMPS are widely used in the healthcare industry. In my following posts, I will discuss more on Big Data, Business Intelligence and their applications.


Sunday, September 8, 2013

BIG DATA - "The buzzword"

I am sure most of us have heard the term “Big Data”. But do we all really know what Big Data is? For most of us it is just a term which signifies a “lot of data”. But there is a lot more to Big Data than just a “lot of data”. Big Data is a buzzword used to describe a massive amount of data, both structured and unstructured. This data is so huge that it is difficult to process using traditional database and software techniques.

Now some of you might be wondering how Big Data is different from another famous buzzword, “Business Intelligence”. Well, 10 years back data was not as massive as it is today. Thus conventional techniques such as querying and reporting on internal data formed a major part of Business Intelligence. But now the world is moving to the Web and social media, where we have huge volumes of data, mostly external and with no defined structure. Processing this data using conventional Business Intelligence techniques is not possible. This massive external data forms a part of Big Data. The science of pre-processing, storing and analyzing data and predicting patterns from it is called “Data Science” or "Business Analytics".

Having given a broad picture of what Big Data is, the next question is how Big Data is useful. Who are the users, what are the business needs, and what are its applications in the real world? As mentioned in the previous paragraph, data is increasing massively. Companies like Google, which leads the search engine market, deal with a large amount of data on the web. To query and provide search results against petabytes (1 petabyte = 1,024 terabytes) or exabytes of data quickly and efficiently, a lot of intelligence needs to be applied. Google came up with the MapReduce programming model to manage its data, and that model in turn inspired the open-source Hadoop framework and its variations. This process is continuous and challenging, as the algorithms have to be updated to manage the exponentially growing data.
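The MapReduce idea mentioned above can be sketched as a word count in plain Python. This is a single-machine toy with made-up input, meant only to show the map, shuffle and reduce steps; it is not how Google or Hadoop implement the model across a cluster.

```python
from collections import defaultdict

# Made-up input "documents"
documents = ["big data is big", "data is growing"]

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_counts)
```

In a real cluster the map and reduce steps run in parallel on many machines, each handling a slice of the data, which is what lets the approach scale to very large inputs.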

Apart from Google, there are various other companies moving towards Big Data. The healthcare industry has to maintain a large amount of data consisting of doctor information, patient records and insurance details. A simple conventional database will not serve the purpose. To increase revenue, these companies are using Data Science techniques to predict patterns in the diseases which can occur and the prevention required. E-commerce companies such as Amazon, eBay and PayPal are also using Big Data techniques to improve their business. The recommendations which appear after one purchases a product from these e-commerce sites are examples of how Big Data is being used. Apart from web-based companies, other companies which had focused on standalone applications are moving towards Big Data and analytics. The best example of this would be Adobe Systems, whose business was focused on Flash and Flex around 2010. Now even they are moving towards the Big Data industry, capitalizing on digital marketing (the acquisition of Omniture in 2009) and offering their applications (Adobe Reader, Adobe Photoshop) on Creative Cloud.

We now have a general idea of what Big Data is and how it is used in today's world. But the knowledge one possesses about data can never be complete, as data is growing endlessly. It is a very interesting and challenging space to conquer. Stay tuned for my next post, where I will discuss more about Big Data and its related applications.
