Raj Gopalan's Analytics and Machine Learning projects using Python

Python ML Project 1:

Please click on the title below the picture to go to the GitHub repo containing the Python code in the form of an IPython Notebook. The notebook includes Markdown explanations of how I broke down the problem and arrived at a conclusion.

Titanic disaster image

This project uses the Titanic data set and tries to predict which passengers in a test data set survived, by learning from a training data set. I used multivariate logistic regression in Python to make these predictions.
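For reference, the standard form of a logistic regression model with several predictors $x_1, \dots, x_k$ is sketched below. The predictors could be features such as passenger class, sex or age; the exact set used in the notebook is not listed here, so treat them as placeholders. The model passes a linear combination of the predictors through the sigmoid function to get a survival probability $p$, with the coefficients $\beta_j$ estimated from the training set. For a binary target, scikit-learn's `LogisticRegression.predict` labels a passenger as a survivor when $p$ is above 0.5.

```latex
p = P(\text{Survived} = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}},
\qquad
\hat{y} =
\begin{cases}
  1 & \text{if } p > 0.5 \\
  0 & \text{otherwise}
\end{cases}
```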


I begin by framing questions that may help me analyze the data systematically, such as:

  • Do passenger ID, ticket, fare and name have anything to do with the survival rate?
  • Can I ignore the cabin data, since so few passengers have a recorded cabin? How are cabin and Pclass (passenger class) related to the survival rate?
  • How do demographic attributes such as gender, family, socio-economic status (SES) and age relate to the survival rate?
  • Where did passengers board the Titanic, and does the survival rate depend on the port of embarkation?

  • I then apply classic regression techniques from Python's scikit-learn (sklearn) ML library to predict survival.
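A minimal sketch of this workflow, assuming pandas and scikit-learn are installed and that the data uses the column names of Kaggle's Titanic `train.csv` (`Survived`, `Pclass`, `Sex`, `Age`, `Cabin`, `Embarked`). The eight rows below are invented toy values, not the real data. The feature choices (a hypothetical `HasCabin` flag, median imputation for `Age`, one-hot encoding of `Sex` and `Embarked`) are illustrative assumptions rather than the notebook's exact steps. The sketch first computes grouped survival rates for a few of the questions above, then fits a logistic regression and scores one hypothetical passenger.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy rows mimicking a few columns of Kaggle's Titanic train.csv.
# These values are invented for illustration; they are NOT the real data.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Pclass":   [3, 1, 3, 1, 3, 2, 2, 3],
    "Sex":      ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Age":      [22, 38, 26, 35, None, 54, 14, 2],
    "Cabin":    [None, "C85", None, "C123", None, None, None, None],
    "Embarked": ["S", "C", "S", "S", "Q", "S", "C", "S"],
})

# Exploratory step: survival rate by gender, class, port and cabin presence.
# HasCabin is a hypothetical flag: 1 if a cabin is recorded, else 0.
train["HasCabin"] = train["Cabin"].notna().astype(int)
rates = {col: train.groupby(col)["Survived"].mean()
         for col in ["Sex", "Pclass", "Embarked", "HasCabin"]}
for col, series in rates.items():
    print(f"Survival rate by {col}:\n{series}\n")

# Modeling step: fill missing ages with the median, one-hot encode categoricals.
features = train[["Pclass", "Sex", "Age", "Embarked", "HasCabin"]].copy()
features["Age"] = features["Age"].fillna(features["Age"].median())
X = pd.get_dummies(features, columns=["Sex", "Embarked"], drop_first=True)
y = train["Survived"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Score a hypothetical test passenger; reindex aligns its dummy columns
# with the training columns, filling categories it lacks with 0.
test = pd.DataFrame({"Pclass": [1], "Sex": ["female"], "Age": [30],
                     "Embarked": ["C"], "HasCabin": [1]})
X_test = (pd.get_dummies(test, columns=["Sex", "Embarked"])
          .reindex(columns=X.columns, fill_value=0))
pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]
print("Predicted survival:", pred[0], "probability:", round(float(prob[0]), 3))
```

The `reindex(columns=X.columns, fill_value=0)` call matters because a small test set may not contain every category seen in training (here `male` and the ports `Q` and `S` are absent), and scikit-learn expects the same feature columns at prediction time as at fit time.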

    More to come ... Stay tuned!