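# Import the libraries

import pandas as pd
import numpy as np
from sklearn import tree

# Load the training data (the file name "train.csv" is an assumption; the original post does not show this step)
train = pd.read_csv("train.csv")

print(train.describe())
print(train.shape)

(891, 12)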


How many people in your training set survived the sinking of the Titanic? To find out, use the value_counts() method together with standard bracket notation, which selects a single column of a DataFrame:

# Passengers that survived vs passengers that passed away

survived_num = train["Survived"].value_counts()
print(survived_num)

0    549
1    342
Name: Survived, dtype: int64


# As proportions

percentage = train["Survived"].value_counts(normalize=True)
print(percentage)

0    0.616162
1    0.383838
Name: Survived, dtype: float64


# Males that survived vs males that passed away

male_survived = train["Survived"][train["Sex"] == 'male'].value_counts()
print(male_survived)

0    468
1    109
Name: Survived, dtype: int64


# Females that survived vs females that passed away

female_survived = train["Survived"][train["Sex"] == 'female'].value_counts()
print(female_survived)

1    233
0     81
Name: Survived, dtype: int64


# Normalized male survival

nor_male = train["Survived"][train["Sex"] == 'male'].value_counts(normalize = True)
print(nor_male)

0    0.811092
1    0.188908
Name: Survived, dtype: float64


# Normalized female survival

nor_female = train["Survived"][train["Sex"] == 'female'].value_counts(normalize = True)
print(nor_female)

1    0.742038
0    0.257962
Name: Survived, dtype: float64
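
The per-sex proportions above can also be obtained in a single call. The following is a minimal sketch on a toy DataFrame (the values are invented for illustration, and the names `toy` and `rate_by_sex` are not from the original post). It relies on Survived being coded 0/1, so the group mean equals the survival rate.

```python
import pandas as pd

# Toy stand-in for the Titanic training data (invented values)
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female", "male"],
    "Survived": [0, 0, 1, 1, 1, 0],
})

# Survived is 0/1, so the mean within each group is the share of survivors
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```

On the real data, the same expression applied to `train` should give the survivor shares shown in the normalized outputs above (about 0.19 for males and 0.74 for females).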


# Does age play a role?

Another variable that could influence survival is age, since children were probably saved first. You can test this by creating a new categorical column, Child. Child takes the value 1 when age is less than 18 and 0 when age is greater than or equal to 18.

To add this new variable you need to do two things: (i) create a new column, and (ii) provide the values for each observation (i.e., row) based on the age of the passenger.

Adding a new column with pandas is easy and uses the following syntax:

train["Child"] = float('NaN')


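# Assign 1 to passengers under 18, 0 to those 18 or older. Print the new column.
# .loc is used instead of chained indexing, which may silently write to a copy.

train.loc[train["Age"] < 18, "Child"] = 1
train.loc[train["Age"] >= 18, "Child"] = 0
print(train["Child"])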
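
As an alternative sketch, the Child column can be built in one expression. The toy frame below uses invented ages, and `toy` is a hypothetical name. `Series.where` keeps a value only where the condition holds, so passengers with an unknown age stay NaN rather than being counted as adults.

```python
import numpy as np
import pandas as pd

# Toy ages, including one missing value (invented for illustration)
toy = pd.DataFrame({"Age": [4.0, 17.0, 18.0, 35.0, np.nan]})

# 1.0 for under 18, 0.0 otherwise; rows with a missing Age are reset to NaN
toy["Child"] = (toy["Age"] < 18).astype(float).where(toy["Age"].notna())
print(toy)
```

Keeping the missing ages as NaN matters here, because the survival rates by Child are computed only over passengers whose age is known.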


# Print normalized Survival Rates for passengers under 18

print(train["Survived"][train["Child"] == 1].value_counts(normalize=True))
1    0.539823
0    0.460177
Name: Survived, dtype: float64


# Print normalized Survival Rates for passengers 18 or older

print(train["Survived"][train["Child"] == 0].value_counts(normalize=True))
0    0.618968
1    0.381032
Name: Survived, dtype: float64


# Convert the male and female groups to integer form

# map() replaces every label in one step and avoids chained assignment
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})

# Impute the Embarked variable

train["Embarked"] = train["Embarked"].fillna("S")


# Convert the Embarked classes to integer form

# map() replaces every label in one step and avoids chained assignment
train["Embarked"] = train["Embarked"].map({"S": 0, "C": 1, "Q": 2})
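
One way to sanity-check a label-to-integer encoding like this is sketched below on a toy frame. The values are invented, including a deliberately unexpected port code "X", and the names `toy` and `unmapped` are hypothetical. With `Series.map` and a dictionary, any label missing from the dictionary becomes NaN, so stray categories are easy to count.

```python
import pandas as pd

# Toy data with one missing port and one unexpected code "X" (invented values)
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", None, "Q", "X"],
})

# Impute the missing port first, then encode both columns
toy["Embarked"] = toy["Embarked"].fillna("S")
toy["Sex"] = toy["Sex"].map({"male": 0, "female": 1})
toy["Embarked"] = toy["Embarked"].map({"S": 0, "C": 1, "Q": 2})

# Labels absent from the dictionary come out as NaN
unmapped = int(toy["Embarked"].isna().sum())
print(toy)
print("Unmapped Embarked values:", unmapped)
```

A non-zero count after encoding would suggest that the data contains a category the dictionary does not cover.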


# Print the Sex and Embarked columns

print(train["Sex"])
print(train["Embarked"])


# Print the train data to see the available features

print(train)


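# Age contains missing values, which older scikit-learn versions reject, so fill them with the median first

train["Age"] = train["Age"].fillna(train["Age"].median())

# Create the target and features numpy arrays: target, features_one

target = train["Survived"].values
features_one = train[["Pclass", "Sex", "Age", "Fare"]].values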


# Fit your first decision tree: my_tree_one

my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one,target)


# Look at the importance and score of the included features

print(my_tree_one.feature_importances_)
print(my_tree_one.score(features_one,target))
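
Note that the score above is computed on the same rows the tree was fitted on, so it tends to be optimistic. The following is a rough sketch of how a held-out estimate could be obtained, using synthetic data in place of the Titanic features. The names `X`, `y`, `clf`, the 25% split and the random seeds are all assumptions made for illustration, not part of the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature matrix and target (not Titanic data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# The label depends on the first feature plus noise, so it is not perfectly learnable
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Hold out a quarter of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# An unrestricted tree usually fits its own training rows almost perfectly
train_acc = clf.score(X_train, y_train)
# Accuracy on unseen rows is the more honest estimate
test_acc = clf.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

The same pattern could be applied to `features_one` and `target`; a large gap between the two scores would be a sign of overfitting.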


05-05-2016


### Nguyễn Cảnh Hiếu
