Tuesday, August 20, 2013

Business Intelligence: Using Mining Structures, Models, Cubes & Dashboards

In a previous article I discussed the design of the data warehouse database. This is the first article in a series of articles that discuss ways to access and use data warehouse data to achieve corporate goals. Topics discussed in this article include mining structures and models, cubes, scorecards and dashboards.

I remember when I was first exposed to SQL Analysis Services. That was when it was launched for SQL Server 2000. Business Intelligence (BI) has come a very long way since then; and, Microsoft has (over the years) continuously updated its suite of BI tools. With SQL Server 2000 Analysis Services accessing data warehouse information took more effort and provided less options than today.

Today BI developers can use the Business Intelligence Development Studio or Visual Studio 2010 (or higher) to build mining structures and models that help sales and marketing teams launch more effective strategies to sell products or services. And, let's face it, nearly every company has a product or service it wants to sell. And, most want marketing to be more efficient and effective in this sluggish economy. What BI can do is answer the question "who" is most likely to buy my product or service. Instead of sending marketing materials to every single prospect; companies can narrow the list down to the people most likely to make the purchase. BI developers can also build cubes; and, ultimately develop scorecards so companies can monitor sales progress; or, determine an organizations progress towards meeting established targets. 

Getting Started


Before a BI developer can build scorecards and dashboards; there are preparation steps that must be followed. The developer must for create a business intelligence project to access and prepare the data in the data warehouse. But cubes are not the only power-tool offered through BI. Mining structures and models give access to predictions that can save time and money as well as boost sales. However, Before a mining structure and model or cube can be created the following three tasks must be completed from within SQL Server Business Development Studio; or, a BI project created with Visual Studio 2010 or higher:
  • Create a Data Source: A connection must be made to the database that contains the data to be accessed.
  • Create Data Source View(s): A view of the data, to be accessed, must be created. The Data Source View uses the data source to access the database so developers can select the tables/views that contain the data. If a developer is to use some (not all) of the columns in a table the developer can create a Named Query using the Data Source View. Although the project is Analysis Services, Transact-SQL (T-SQL) is used to build the Named Query because the query directly accesses the database objects. The T-SQL select statement can be used to select the desired table and fields, as shown in the following picture.


Likewise, if a New Named Calculation is created, from the Data Source View, T-SQL expressions are also used.



  • Create Dimension(s): Data that presents meaningful attributes to support a measurable event, within an organization, are modeled in business intelligence as dimensions. Dimensions provide the data that helps companies learn more about itself. For example, a company's purchases are measurable events as are its sales. Details, or attributes, associated with sales might include products, promotions, customers, region, etc. The details, which are the data stored in dimension tables, provide a way for companies to answer questions about  the internal events. For example, the sale of products might lead the company to ask what products brought in the most money? what products aren't selling? If companies spent money on promotions that company may want to know what ads were tied to peaks in sales? A company may even ask who are our top 10 customers? Or, in what region is our product most popular?  Dimension details is what provides the details that gives companies answers to these questions. And, in a Visual Studio 2010 BI project the dimension(s) are defined before mining structures and models or cubes are created.

Mining Structures and Models


Any company that has historical promotional, orders/sales, demographic and other data has a data goldmine. BI developers can create a mining structure that defines the input data for mining models. The mining models can use demographic (such as gender, age, region, marital status, etc.) and other data to identify the makeup of and drivers for customers who have purchased a product or service. This information can then be used to identify the people most likely to buy. The Microsoft SQL Server 2012 Tutorials - Analysis Services Data Mining tutorials provide BI developers a great foundation to learn how to build a business intelligence project and work with mining structures and models.

Through the tutorials the developers use data in the AdventureWorksDW2012 database to build three mining models: the Decision-Tree, Cluster and Naive Bayes. In addition, the tutorial teaches developers to test the accuracy of each model's predictions to identify the most accurate mining model by creating a Mining Accuracy Chart. Developers begin by selecting the models to be compared.




The Lift Chart's property window (bottom right corner in the following picture) displays the Model Name, Score (which conveys the effectiveness of the model), Population Correct  (a value that defines the percentage of population data that is correct) and Predict Probability (a value that tells how accurate the predictions will be).  There are several useful resources that provide additional information on Lift Charts as follows:  http://technet.microsoft.com/en-us/library/ms175428.aspx and http://msdn.microsoft.com/en-us/library/ab77eca1-bd48-4fef-b27f-ff5b648e0501%28v=sql.90%29 . In the following Lift Chart, the TM_Decision_Tree model is closest to the Ideal Model and will, therefore, provide the most accurate predictions regarding who will buy the product (which in this example is a bike).




Once the company has the profile of the persons most likely to buy the product; marketing materials and promotions can be designed for the targeted audience. The following paragraphs provide additional information about Microsoft models used in the above chart.

Decision Tree Model


The Decision Tree and other mining models, used by Microsoft BI tools, apply advanced statistical methods that have been used for years to analyze data. 

The Decision tree model was developed to make predictions. In the following example, the decision tree results originate from an All node for bike buyers (also taken from Microsoft SQL Server 2012 Tutorials - Analysis Services Data Mining). The light blue represents the group that has a 50.59% probability of buying a bike versus the pink group that has a 49.41% probability of not buying a bike (as shown in the properties window in the bottom right corner of the picture below).


The decision tree presents the attributes as nodes that can be expanded or collapsed. From the All root node the decision tree shows bike buyers in four groups of people based on the number (0, 1, 2, 3 and 4), which defines the number of cars owned.


From there users can see the people grouped by income, age, commuting distance, marital status, etc.


Users can mouse over a node to see the number of cases broken down by the light blue (represented by 0), pink (represented as a 1) and cases with missing values (represented as the word missing).


If the drill-down option is selected when the mining model is created; users can select the drill-down option to view the cases associated with a node.




The Dependency Network tab of the Mining Model Viewer provides access to the All Links slider. This view shows all of the factors relevant in predicting whether someone will buy a bike.


Users can move the slider down towards the Strongest Links option to see the factors that that have the most impact on predicting whether someone is or is not a bike buyer. According to the decision tree model (shown below) age, region and number of cars owned are among the strongest influences. Notice, in addition to the predictions the influencing factors also represent information someone is not likely to conclude without a model.


The following paragraphs present additional information on the two other models that were compared to the decision tree model to determine the most accurate model in this scenario.

Microsoft Clustering


The following diagram shows the data modeled using the Clustering algorithm. Notice the clustering mining model presents the same data in a way that looks very different from the decision tree model. The data is grouped into clusters.


The Cluster Profiles view, show below, provides details on how the clusters are divided into the same attributes (income, age, commute distance, etc.) as the decision tree.


Microsoft Naive Bayes


With the Naive Bayes model the user is first presented the Dependency Network tab. The All Links slider is moved to view the influencing factors used to identify bike buyers.



The Naive Bayes model also has an Attribute Profiles view and an Attributes Characteristics view, as shown below. The following view sow the probability based on attributes.


Querying Multidimensional Data


When creating the inputs for a model the BI Developer uses the Mining Model Prediction tab, which is available from the Mining Structure design window. The mining model is built from the mining structure, which uses Multidimensional Expressions (MDX) as shown in the picture below. The Mining Model Prediction window has an option that allows the developer to manually build the MDX query; or, the developer can select the input table(s)/field(s) and the system will build the MDX query.  Follow are a few resources to learn more about MDX:  Basic MDX Query article; Key MDX concepts article: http://technet.microsoft.com/en-us/library/ms144884.aspx; and MDX Reference: http://technet.microsoft.com/en-us/library/ms145506.aspx .

Cubes


While mining models aide in predictions, cubes provide a way for developers to select views or tables that will be used as measures, which are numeric values pulled from a fact table. The developer can then select one or more dimensions (discussed in the beginning of this article) to be associated with the measure. The following picture shows the Internet Sales table selected as the measure. Geography, Customer, Product, DimPromotion are all dimensions associated with the measure.


Once the measures and dimensions are created users can access the cube's Browse tab and select a dimension. From within the Browse tab users can view the dimension data associated with the measure. In the following example the Customer dimension was selected, therefore; the Browse details show sales data for each customer.


SharePoint 2010 PerformancePoint Dashboards


SharePoint 2010 extends the capability of BI cubes. Developers can make use of SharePoint's business intelligence capabilities by creating a SharePoint site collection from the Business Intelligence Center template. SharePoint creates a Business Intelligence site collection that provides a way to manage external connections, create reports, scorecards and dashboards. (The following article provides useful information on the difference between a scorecard and a dashboard:  http://office.microsoft.com/en-us/sharepoint-server-help/what-is-the-difference-between-a-dashboard-and-a-scorecard-HA101772797.aspx)


Developers can use the Dashboard Designer, accessed from within SharePoint, to define color-coded thresholds that reveal values below, above or at business targets (for performance measurement) and perform other tasks.



Scorecards let managers and executives view data, in real-time, to remain abreast of an organization's progress towards reaching established goals. Excel charts (discussed in an upcoming post) and reports can be added to the business intelligence center to build dashboards that can help the entire organization improve customer service, sales, productivity and more (see the following tutorial on balanced scorecards:  http://technet.microsoft.com/en-us/library/hh750382%28v=office.14%29.aspx).



With SharePoint 2010 organizations not only manage documents; but also achieve piece of mind by knowing where a company stands with every aspect of its key business operations. The following link provides access to a tutorial that includes instructions on how to build a dashboard and scorecards:  http://msdn.microsoft.com/en-us/sql10r2byfbi-trainingcourse_sql10r2byfbi08-hol-01.aspx