What is Logistic Regression Object Create

Hypothesis 3

Now we want to include travel class in our model and thus examine Hypothesis 3. Class is also a factor, with 3 levels. We look at the levels:

The three classes are coded 1, 2 and 3, with 3rd class as the reference category (since it is listed first among the levels). We have to take this into account when interpreting the results. Let's look at the results:

This time, all predictors are significant, including age. Before we move on to the interpretation, we want to test whether adding class as a whole is significant. The output shows that each dummy variable has significant predictive power on its own, but we want a single significance decision for the variable travel class as a whole, so we use the likelihood ratio test (\(\chi^2\) difference test).

Model comparison

We already know this procedure from the session on hierarchical regression. Here, however, we are not dealing with restricted likelihoods, so we can omit the corresponding argument! We are also not testing any variances here, so we can interpret the \(p\) values as they are printed.

The test yields a significant likelihood difference, \(\chi^2(df = 2) = 102.674\), \(p < .001\). The test has 2 degrees of freedom because two dummy variables were added to the model. Consequently, class membership improves the prediction of the probability of surviving the Titanic disaster.
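The reported \(p\) value can be checked by hand. The following is a minimal Python sketch, not the code used in this session: the function names `lr_statistic` and `lr_p_value_df2` are illustrative, and the closed form it uses holds only for 2 degrees of freedom, where the upper tail probability of the \(\chi^2\) distribution is \(\exp(-x/2)\).

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic: twice the difference of the log-likelihoods."""
    return 2.0 * (loglik_full - loglik_reduced)


def lr_p_value_df2(chi2_stat):
    """Upper tail probability of a chi-square statistic with exactly 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2),
    so no statistics library is required.
    """
    return math.exp(-chi2_stat / 2.0)


chi2_stat = 102.674  # value reported in the text
p_value = lr_p_value_df2(chi2_stat)
print(f"chi2(df = 2) = {chi2_stat}, p = {p_value:.3g}")
```

The resulting \(p\) value is many orders of magnitude below .001, in line with the reported decision. For other degrees of freedom a general \(\chi^2\) distribution function would be needed.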

Interpretation of results

In order to interpret the results in a meaningful way, let's look at the odds ratios again:

The intercept is interpreted at the point where all predictors take the value 0. This is the case when age and all dummy variables are 0, i.e. we are in the reference category of every categorical predictor. Thus, for a male newborn (age 0) in 3rd class, the odds of survival are 0.265: surviving is 0.265 times as likely as dying. Holding all other predictors in the model constant, the odds of survival are multiplied by 0.964 (i.e. they decrease) for each additional year of age. Holding all other predictors constant, the odds of survival increase by a factor of 12.463 when women are compared to men. Holding all other predictors constant, the odds of survival increase by a factor of 13.205 when a person in 1st class is compared to a person in 3rd class. Holding all other predictors constant, the odds of survival increase by a factor of 3.564 when a person in 2nd class is compared to a person in 3rd class.
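Because odds ratios act multiplicatively, the odds for any person can be assembled from the values above. The following Python sketch illustrates this; the constant and function names are hypothetical, and the odds ratios are the rounded values from the text, so the result only approximately matches the model's own predictions.

```python
# Odds ratios as reported in the text (rounded to three decimals).
ODDS_INTERCEPT = 0.265  # odds for a male, 3rd class, age 0
OR_AGE = 0.964          # per additional year of age
OR_FEMALE = 12.463      # women compared to men
OR_CLASS1 = 13.205      # 1st class compared to 3rd class
OR_CLASS2 = 3.564       # 2nd class compared to 3rd class


def survival_odds(age, female, pclass):
    """Multiply the intercept odds by the odds ratio of every predictor that applies."""
    odds = ODDS_INTERCEPT * OR_AGE ** age
    if female:
        odds *= OR_FEMALE
    if pclass == 1:
        odds *= OR_CLASS1
    elif pclass == 2:
        odds *= OR_CLASS2
    return odds


def odds_to_probability(odds):
    """Convert odds p / (1 - p) back into the probability p."""
    return odds / (1.0 + odds)


# A 28-year-old man in 1st class.
p = odds_to_probability(survival_odds(age=28, female=False, pclass=1))
print(f"predicted survival probability: {p:.3f}")
```

For this 28-year-old man in 1st class the sketch gives a probability of roughly 0.56. The difference from a value computed with the unrounded coefficients is due solely to the rounding of the odds ratios.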

Graphic illustration

We want to take another look at the same results graphically. To do this, we again have to generate predictions from our model. Computing the predictions and appending them to the data set works in the same way as above:

survived  pclass  sex  age   logit_m2    odds_m2       p_m2    logit_m3     odds_m3       p_m3
1              1    2   28  -1.340572  0.2616959  0.2074160   0.2186443   1.2443886  0.5544444
0              3    2   36  -1.383979  0.2505794  0.2003707  -2.6578631   0.0700979  0.0655060
1              2    1    3   1.260995  3.5289329  0.7791974   2.3562576  10.5513898  0.9134303
1              2    1   40   1.060237  2.8870558  0.7427359   0.9878028   2.6853277  0.7286537
0              3    2   32  -1.362276  0.2560774  0.2038707  -2.5099221   0.0812746  0.0751655
0              2    2   34  -1.373128  0.2533135  0.2021150  -1.3130666   0.2689939  0.2119741
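The three prediction columns per model are linked by two simple transformations: the odds are the exponentiated logit, and the probability is odds / (1 + odds). A short Python sketch (the function names are illustrative) reproduces the model 3 columns for the first three rows of the table:

```python
import math


def logit_to_odds(logit):
    """Odds are the exponentiated logit."""
    return math.exp(logit)


def odds_to_probability(odds):
    """Probability p recovered from the odds p / (1 - p)."""
    return odds / (1.0 + odds)


# (logit_m3, odds_m3, p_m3) copied from the first three rows of the table.
rows = [
    (0.2186443, 1.2443886, 0.5544444),
    (-2.6578631, 0.0700979, 0.0655060),
    (2.3562576, 10.5513898, 0.9134303),
]

for logit, odds_table, p_table in rows:
    odds = logit_to_odds(logit)
    p = odds_to_probability(odds)
    print(f"logit {logit:+.4f}: odds {odds:.4f} (table {odds_table:.4f}), "
          f"p {p:.4f} (table {p_table:.4f})")
```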

However, we still have to merge group membership into a single variable. Two variables indicate group membership: gender and travel class. We want a total of 6 separate lines to be drawn. To do this, we use a trick: we transform gender and class back into numbers. We do this in two steps, because a factor is recoded internally so that the reference category is always the 1st level. If we simply converted the factor to numbers directly, its levels would be numbered from 1 to the number of categories, in the order in which the levels are listed. For this reason, we first convert the factor into character strings and only then transform these strings into numbers. We look at this using gender as an example by displaying the first 6 elements:
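The pitfall can be made concrete with a small emulation. The following Python sketch represents a factor as a list of level labels plus integer codes. It is an illustration only, not the session's code, and the level order `["2", "1"]` (reference category listed first) is an assumption made for the example.

```python
# Emulated factor: an ordered list of level labels plus one integer code per observation.
# Assumption for illustration: the reference category "2" is listed first.
levels = ["2", "1"]

# Gender as coded in the data for the six example rows.
labels = ["2", "2", "1", "1", "2", "2"]

# Internally, each observation stores the position of its label in `levels` (counted from 1).
codes = [levels.index(label) + 1 for label in labels]

# Direct conversion to numbers exposes the internal codes, not the original coding.
direct = list(codes)

# The detour via strings looks up the label first and only then converts it to a number.
via_strings = [int(levels[code - 1]) for code in codes]

print(direct)       # [1, 1, 2, 2, 1, 1]
print(via_strings)  # [2, 2, 1, 1, 2, 2]
```

Only the second result matches the coding in the data, which is why the conversion is done in two steps.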

After converting the factors back into numbers, we multiply gender by 100, so that 100 stands for women and 200 for men. If we now add class membership, we get 6 different numbers: 101 = women in 1st class, 102 = women in 2nd class and 103 = women in 3rd class. Accordingly, 201, 202 and 203 stand for men in 1st, 2nd and 3rd class. If we now declare this variable as a factor again, we can use it to generate the desired graphic:

survived  pclass  sex  age   logit_m2    odds_m2       p_m2    logit_m3     odds_m3       p_m3  class_sex
1              1    2   28  -1.340572  0.2616959  0.2074160   0.2186443   1.2443886  0.5544444        201
0              3    2   36  -1.383979  0.2505794  0.2003707  -2.6578631   0.0700979  0.0655060        203
1              2    1    3   1.260995  3.5289329  0.7791974   2.3562576  10.5513898  0.9134303        102
1              2    1   40   1.060237  2.8870558  0.7427359   0.9878028   2.6853277  0.7286537        102
0              3    2   32  -1.362276  0.2560774  0.2038707  -2.5099221   0.0812746  0.0751655        203
0              2    2   34  -1.373128  0.2533135  0.2021150  -1.3130666   0.2689939  0.2119741        202
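The construction of the grouping variable amounts to one line of arithmetic. The Python sketch below reproduces the last column of the table from the gender and class columns. The function name `class_sex_code` and the label dictionary are illustrative, and the reading 1 = women, 2 = men is inferred from the predicted probabilities in the table.

```python
def class_sex_code(sex, pclass):
    """Combine gender (assumed: 1 = women, 2 = men) and class (1 to 3) into one code."""
    return 100 * sex + pclass


# Readable labels for the six possible codes (illustrative wording).
GROUP_LABELS = {
    101: "women, 1st class",
    102: "women, 2nd class",
    103: "women, 3rd class",
    201: "men, 1st class",
    202: "men, 2nd class",
    203: "men, 3rd class",
}

# (sex, pclass) for the six rows of the table.
rows = [(2, 1), (2, 3), (1, 2), (1, 2), (2, 3), (2, 2)]

codes = [class_sex_code(sex, pclass) for sex, pclass in rows]
print(codes)  # [201, 203, 102, 102, 203, 202]
for code in codes:
    print(code, "=", GROUP_LABELS[code])
```

Multiplying by 100 keeps the two pieces of information in separate digits, so each code can still be read at a glance.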

Of course, there are other ways to construct this grouping variable. You have now experienced first-hand how carefully the data must be prepared so that a graphic can be created easily! If you are interested in more, you can also take a look at the session on graphics by Martin Schultze (this is of course purely voluntary!). Now we can create the graphics in the same way as above; we only have to adjust the names for the color, the model and the title a little:

All three graphs clearly show the main effects of the analysis: the probability of surviving the Titanic disaster decreases with age. Women have a higher probability of survival than men (of comparable age) and, descriptively speaking, people who traveled in 1st class had a higher probability of survival than those in 2nd and 3rd class, and those in 2nd class had a higher probability of survival than those in 3rd class (always for comparable age and gender).

The qualifier "for comparable age" is essentially the same as "with all other predictors in the model held constant". It is hard to compare the survival probability of a twenty-year-old man in 1st class with that of a sixty-year-old woman in 3rd class, because we would not know whether the probabilities differ because of gender, because of class, because of age, or because of a combination of these variables. What we can do, however, is compare the 6 lines at the same age!

In addition, the plots of the logit and of the probability show that women who traveled in 3rd class had approximately the same probability of survival as men in 1st class (again: in each case for comparable age!). Within each sex, however, the classes were ordered in the same way: \(1 > 2 > 3\). Overall, this suggests that the motto "Women and children first" was obeyed.

Finally, we can conclude that all three hypotheses are supported: age, gender and class all influenced the probability of surviving the Titanic disaster.

You can download the entire code used in this session here.