Exercise: Passengers of the Titanic

The Titanic, with over 2000 people on board, including hundreds of emigrants to the US as well as some of the world's richest, sank in 1912. The seaborn library provides a smaller, anonymized data set of the Titanic's passengers. Without identifying information, we can't tell the poor immigrant from the wealthy, yet the data manages to tell a story in other ways. Your task in this exercise is to answer a series of questions from the data, beginning with the mundane and ending with who survived.

In [1]:
import numpy as np
import pandas as pd
import seaborn 
In [2]:
t = seaborn.load_dataset('titanic')
t.head()
Out[2]:
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True

Tasks: The exercise is to answer the following questions.

  • How many passengers are described in the data set?
  • How many distinct values are in the `who` column?
  • How many missing values do you find in each data column?
  • Does the data contain passengers over 60 years old? How many?
  • What is the passenger age distribution? (Plot it.)
  • What are the 3-quantiles of the passenger age distribution?

(Finite samples are divided into $q$ subsets of nearly equal sizes by $q$-quantiles. The 2-quantile is the median.)
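
As a hint for the questions above, here is a minimal sketch on a small made-up frame. The name `toy` and all of its values are illustrative assumptions, not the Titanic data, so the sketch shows the technique without giving away the answers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic frame `t`.
toy = pd.DataFrame({
    "who": ["man", "woman", "child", "man", "woman", "man"],
    "age": [22.0, 38.0, 4.0, np.nan, 61.0, 70.0],
})

n_rows = len(toy)                         # number of rows (one per passenger)
n_who = toy["who"].nunique()              # number of distinct values in a column
n_missing = toy.isna().sum()              # missing values per column (a Series)
n_over_60 = int((toy["age"] > 60).sum())  # NaN > 60 is False, so NaN is not counted

# 3-quantiles: the two cut points that split the sample into three parts
# of nearly equal size. Series.quantile skips NaN and, by default,
# interpolates linearly between neighboring sorted values.
terciles = toy["age"].quantile([1/3, 2/3])

print(n_rows, n_who, n_over_60)
print(n_missing)
print(terciles)
```

For the distribution plot, `Series.plot.hist` or `seaborn.histplot` are natural starting points. Note that other quantile conventions exist, so cut points computed by hand may differ slightly from the interpolated values pandas returns.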

  • How will you drop all passengers with no embarked data?
  • What are the average, minimum, and maximum fares paid by the passengers?
  • What are the proportions of passengers in different classes?
  • What is the female-to-male ratio in each travel class?
  • What fraction survived?

(This fraction is sometimes called the survival rate, although the name is improper: there is no "rate" to speak of here, and the question asks for a dimensionless fraction.)
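
One possible approach to this group of questions is sketched below, again on a made-up frame. The column names mirror the Titanic data, but `toy` and every value in it are assumptions chosen for illustration. The key observation for the last question is that a column coded as 0/1 has a mean equal to the fraction of ones.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Titanic frame, values are made up.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "pclass":   [3, 1, 3, 1, 2, 3, 2, 2],
    "sex":      ["male", "female", "female", "male",
                 "female", "male", "male", "female"],
    "fare":     [7.0, 70.0, 8.0, 50.0, 15.0, 6.0, 14.0, 30.0],
    "embarked": ["S", "C", None, "S", "Q", "S", "S", "C"],
})

# Drop only the rows whose `embarked` entry is missing;
# missing values in other columns do not cause a row to be dropped.
with_port = toy.dropna(subset=["embarked"])

# Several summary statistics of one column in a single call.
fare_stats = toy["fare"].agg(["mean", "min", "max"])

# Proportion of rows in each class (the fractions sum to 1).
class_share = toy["pclass"].value_counts(normalize=True).sort_index()

# Female-to-male ratio per class: count each sex within each class, then divide.
counts = toy.groupby(["pclass", "sex"]).size().unstack(fill_value=0)
f_to_m = counts["female"] / counts["male"]

# `survived` is coded 0/1, so its mean is the fraction that survived.
survived_fraction = toy["survived"].mean()

print(len(with_port), survived_fraction)
print(fare_stats)
print(class_share)
print(f_to_m)
```

`dropna` returns a new frame and leaves `toy` unchanged, which makes it easy to compare row counts before and after.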

  • Are the survival rates of male and female passengers different?
  • Are the survival rates of first, second, and third class passengers different?
  • How can one print a table showing how the survival rate depends on class and gender?
  • How can one print a table with the number of survivors and the average fare for each gender and cabin?
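
The grouped questions above can be approached with `groupby` and `pivot_table`. The sketch below is one way to do it, on a made-up frame named `toy` whose values are assumptions for illustration only. It also assumes that "cabin" in the last question refers to the travel class; the `deck` column would be another reasonable reading.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Titanic frame, values are made up.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "class":    ["Third", "First", "Third", "First",
                 "Second", "Third", "Second", "Second"],
    "sex":      ["male", "female", "female", "male",
                 "female", "male", "male", "female"],
    "fare":     [7.0, 70.0, 8.0, 50.0, 15.0, 6.0, 14.0, 30.0],
})

# Survival fraction per group: the mean of a 0/1 column within each group.
by_sex = toy.groupby("sex")["survived"].mean()
by_class = toy.groupby("class")["survived"].mean()

# Two-way table: rows are classes, columns are sexes,
# and each cell holds the survival fraction of that subgroup.
rate_table = toy.pivot_table(values="survived", index="class",
                             columns="sex", aggfunc="mean")

# Named aggregation: number of survivors (sum of 0/1 values)
# and mean fare for each (sex, class) pair.
summary = toy.groupby(["sex", "class"]).agg(
    survivors=("survived", "sum"),
    mean_fare=("fare", "mean"),
)

print(by_sex)
print(by_class)
print(rate_table)
print(summary)
```

In the real data set the `class` column is categorical rather than plain strings, so depending on the pandas version you may want to pass `observed=True` to `groupby` to keep only the category combinations that actually occur.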