SAS Predictive Modelling Interview Questions

What is Predictive Modelling?

Predictive modeling uses historical data and statistical techniques to estimate the likelihood of future outcomes. It is one of the most sought-after skills today and is used in almost every domain, from finance and retail to manufacturing. Businesses treat it as a method for solving complex problems and driving growth, e.g. a predictive acquisition model or an optimization engine that solves a network problem.

What are the essential steps in a predictive modeling project?

It consists of the following steps –

Establish business objective of a predictive model
Pull Historical Data – Internal and External
Select Observation and Performance Window
Create newly derived variables
Split Data into Training, Validation and Test Samples
Clean Data – Treatment of Missing Values and Outliers
Variable Reduction / Selection
Variable Transformation
Develop Model
Validate Model
Check Model Performance
Deploy Model
Monitor Model
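
The data split step in the list above could be sketched as follows. This is a minimal illustration in Python rather than SAS; the function name `split_rows`, the 60/20/20 proportions and the fixed seed are assumptions made for the example, not part of the original answer.

```python
import random


def split_rows(rows, train=0.6, valid=0.2, seed=42):
    """Shuffle rows and cut them into training, validation and test samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],  # whatever is left becomes the test sample
    )


# Hypothetical data: 100 row identifiers standing in for customer records.
train_set, valid_set, test_set = split_rows(range(100))
print(len(train_set), len(valid_set), len(test_set))
```

The model is fitted on the training sample, tuned on the validation sample, and the test sample is held back for the final performance check.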

Explain the problem statement of your project. What are the financial impacts of it?

Cover the objective or main goal of your predictive model. Compare the monetary benefits of using the predictive model against using no model. Also highlight the non-monetary benefits (if any).

Difference between Linear and Logistic Regression?

Two main differences are as follows –

Linear regression requires the dependent variable to be continuous, i.e. numeric values (no categories or groups). Binary logistic regression requires the dependent variable to be binary, with two categories only (0/1). Multinomial or ordinal logistic regression can have a dependent variable with more than two categories.

Linear regression is based on least squares estimation, which chooses the regression coefficients that minimize the sum of the squared distances between each observed response and its fitted value. Logistic regression is based on Maximum Likelihood Estimation, which chooses the coefficients that maximize the probability of the observed Y given X (the likelihood).
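
The two estimation ideas can be compared in a small sketch. It is written in plain Python for illustration only: the data are made up, the function names are hypothetical, and the logistic fit uses simple gradient ascent rather than the Newton-type routines that statistical packages typically use.

```python
import math


def fit_linear(xs, ys):
    """Least squares for y = a + b*x: minimises the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def log_likelihood(a, b, xs, ys):
    """Log-likelihood of binary ys under P(y=1|x) = 1 / (1 + exp(-(a + b*x)))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, rate=0.1, steps=5000):
    """Maximum likelihood by plain gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p          # derivative with respect to the intercept
            grad_b += (y - p) * x    # derivative with respect to the slope
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b


# Made-up data: one predictor, one continuous target, one binary target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_linear(xs, sales)
a_hat, b_hat = fit_logistic(xs, bought)
print(intercept, slope)
print(a_hat, b_hat, log_likelihood(a_hat, b_hat, xs, bought))
```

The linear fit has a closed-form answer, while the logistic fit has to search for the coefficients that make the observed 0/1 outcomes most probable.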

How to treat outliers?

There are several methods to treat outliers –

Percentile Capping
Box-Plot Method
Mean plus or minus 3 Standard Deviations
Weight of Evidence
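
The first three methods in the list above can be sketched with the Python standard library. The 1st/99th percentile cut-offs and the 1.5 multiplier for the box-plot rule are common conventions assumed here, and the function names and data are hypothetical.

```python
import statistics


def percentile_cap(values, lower_pct=1, upper_pct=99):
    """Percentile capping: clip values to the chosen lower and upper percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


def box_plot_bounds(values, k=1.5):
    """Box-plot method: flag values beyond k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def three_sigma_bounds(values):
    """Mean plus or minus 3 standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - 3 * sd, mean + 3 * sd


# Made-up data: the values 1..100 plus one extreme value.
values = list(range(1, 101)) + [1000]

capped = percentile_cap(values)
box_low, box_high = box_plot_bounds(values)
sigma_low, sigma_high = three_sigma_bounds(values)
print(max(capped), (box_low, box_high), (sigma_low, sigma_high))
```

All three rules place the value 1000 outside their upper bound; capping replaces it, while the other two only supply bounds that can be used to flag, cap or remove it.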

What is multicollinearity and how do you deal with it?

Multicollinearity implies high correlation between independent variables. Linear and logistic regression both assume that it is absent. It can be identified by looking at the variance inflation factor (VIF) of each variable. As a rule of thumb, VIF > 2.5 implies a moderate collinearity issue and VIF > 5 is considered high collinearity.

It can be handled by an iterative process: remove the variable with the highest VIF, then recompute the VIF of the remaining variables. If any remaining VIF is still above 2.5, repeat the same step until every VIF is <= 2.5.
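
The iterative procedure can be sketched as below. The sketch relies on the identity that the VIF of each variable equals the corresponding diagonal entry of the inverse of the correlation matrix; the variable names and data are made up, and the helper functions are hypothetical rather than taken from any library.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fails on exact collinearity)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns):
    """VIF per variable = diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def drop_high_vif(columns, threshold=2.5):
    """Drop the variable with the highest VIF until all are <= threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vif_scores(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return kept


# Made-up data: spend tracks income closely, age does not.
data = {
    "income": [30, 35, 40, 45, 50, 55, 60, 65, 70, 75],
    "spend": [16, 17, 21, 22, 26, 27, 31, 32, 36, 37],
    "age": [25, 52, 33, 47, 29, 61, 38, 44, 56, 31],
}
kept = drop_high_vif(data)
print(sorted(kept), vif_scores(kept))
```

Only one of the two nearly duplicate variables is removed; which one depends on the data, so in practice business relevance is usually weighed alongside the VIF value.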

Explain collinearity between continuous and categorical variables.

Collinearity between categorical and continuous variables is very common. The choice of reference category for the dummy variables affects multicollinearity, so changing the reference category can reduce it. Pick the reference category with the highest proportion of cases.

What are the applications of predictive modeling?

Predictive modeling is mostly used in the following areas –

Acquisition – Cross Sell / Up Sell
Retention – Predictive Attrition Model
Customer Lifetime Value Model
Next Best Offer
Market Mix Model
Pricing Model
Campaign Response Model
Probability of customers defaulting on a loan
Segment customers based on their homogeneous attributes
Demand Forecasting
Usage Simulation
Underwriting
Optimization – Optimize Network

Is VIF a correct method to compute collinearity between continuous and categorical variables?

VIF is not a correct method when a continuous variable is compared with a categorical (dummy) variable. VIFs should only be run for continuous variables. A t-test can be used to check collinearity between a continuous variable and a dummy variable.
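
A t-test check of this kind can be sketched as follows. The example uses Welch's form of the two-sample t statistic, which is one possible choice and an assumption here; the variable names and data are made up.

```python
import statistics


def welch_t(continuous, dummy):
    """Two-sample t statistic comparing the continuous variable across dummy = 1 and 0."""
    group1 = [v for v, d in zip(continuous, dummy) if d == 1]
    group0 = [v for v, d in zip(continuous, dummy) if d == 0]
    standard_error = (
        statistics.variance(group1) / len(group1)
        + statistics.variance(group0) / len(group0)
    ) ** 0.5
    return (statistics.fmean(group1) - statistics.fmean(group0)) / standard_error


# Made-up data: income (continuous) and a home-owner flag (dummy).
income = [30, 32, 35, 38, 40, 60, 62, 65, 68, 70]
owns_home = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

t_stat = welch_t(income, owns_home)
print(t_stat)
```

A large absolute t statistic means the continuous variable's mean differs strongly between the two dummy levels, which signals that the two variables carry overlapping information.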

Difference between Factor Analysis and PCA?

The three main differences between these two techniques are as follows –

In Principal Components Analysis, the components are calculated as linear combinations of the original variables. In Factor Analysis, the original variables are defined as linear combinations of the factors.

Principal Components Analysis is used as a variable reduction technique whereas Factor Analysis is used to understand what constructs underlie the data.

In Principal Components Analysis, the goal is to explain as much of the total variance in the variables as possible. The goal in Factor Analysis is to explain the co-variances or correlations between the variables.
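
The Principal Components side of this comparison can be illustrated with a short sketch. It finds only the first component, using power iteration on the covariance matrix, which is a simplification chosen for brevity; the function name and data are hypothetical, and the starting vector of ones is assumed not to be orthogonal to the leading direction.

```python
def first_component(columns, iterations=200):
    """Return weights, scores and variance share of the first principal component."""
    n, k = len(columns[0]), len(columns)
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(k)] for i in range(k)]
    weights = [1.0] * k
    for _ in range(iterations):
        # Power iteration: repeated multiplication converges to the top eigenvector.
        weights = [sum(cov[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = sum(w * w for w in weights) ** 0.5
        weights = [w / norm for w in weights]
    # The component is a linear combination of the (centered) original variables.
    scores = [sum(weights[i] * centered[i][r] for i in range(k)) for r in range(n)]
    component_var = sum(s * s for s in scores) / (n - 1)
    total_var = sum(cov[i][i] for i in range(k))
    return weights, scores, component_var / total_var


# Made-up data: two positively correlated variables.
x1 = [2.0, 3.1, 4.2, 5.0, 6.1, 7.3]
x2 = [1.9, 3.4, 3.9, 5.2, 5.8, 7.6]

weights, scores, share = first_component([x1, x2])
print(weights, share)
```

The returned share is the fraction of total variance explained by the single component, which is the quantity Principal Components Analysis tries to make as large as possible.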

November 26, 2019
