After we have finished with the theoretical discussion of the Neural Networks it is time to work on our first model.

We start by loading the dataset into Jupyter notebook

#Load the data
data = pd.read_csv('atlas-higgs-challenge-2014-v2.csv')

This is a Higgs ML challenge. For anyone interested in detailed data analysis please read Lecture I – Exploratory Data Analysis of these lecture series.

We start by dropping all the NaN values from the dataset. It is not right to do so in this particular situation as these missing values are not random and most of them are due to the number of jets produced after the particle decay. However, to make things easier we will not deal with this issue here.

#Remove missing data
#Ideally split by jets here
data = data.replace(-999.0, np.nan)
data.dropna(inplace = True)

Now, we split our dataset into the training and testing parts. We are not splitting into the validation set as this is done automatically by the algorithm that trains the neural net

from sklearn.model_selection import train_test_split
#Split the data into training, validation and test set
X = data[range(1,31)]
y = data['Label']
y[y=='s'] = 1
y[y=='b'] = 0
w = data['KaggleWeight']
sum_weights = sum(w)
N = X.shape[0]
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(X, y, w, test_size=0.2, random_state=42)

On the way we rename the states values to 0 and 1 and isolate the weights as a

separate variable. The split is 80% train set and 20% test set.

As we know already, the key metric for Higgs ML challenge is AMS – Absolute Median Significance of discovery.

def ams(s, b):
return sqrt(2 * ((s + b + 10) * log(1.0 + s/(b + 10)) - s))
def get_ams_score(W, Y, Y_pred):
multiplier = sum_weights/sum(W) #normalization of weight
s = W * (Y == 1) * (Y_pred == 1)
b = W * (Y == 0) * (Y_pred == 1)
s = np.sum(s)
b = np.sum(b)
return ams(s, b)
#Check for random prediction
y_pred = np.random.binomial(1, 0.5, y_test.shape[0])
print(get_ams_score(w_test, y_test, y_pred))

We have tested how much we score in the AMS by randomly assigning the labels to the input variables. The value of AMS is roughly .

We are in position now to traing our first model.

#Now we may train the Neural Network
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(early_stopping = True)
mlp.fit(X_train, np.array(y_train, dtype='int'))

and predict the values of labels on the test set.

y_pred = mlp.predict(X_test)
print(get_ams_score(w_test, y_test, y_pred))
print(1.*sum(y_pred == y_test)/y_test.shape[0])

As a result we score and quess the right labels ~80% of the time.

Before we proceed with data transformation I will describe a trick that can help make the AMS slightly better. As we know, MLP (multi-layer perceptron) will optimize the cross-entropy not directly the AMS. The labels to observations are assigned according to the condition . Since the fraction of Higgs bosons is much less than 50 percent, it might be advantageous to label only those particles as Higgs which has much higher probability .

def AMS_optim(theta):
y_pred = (y_proba > theta)
return get_ams_score(w_test, y_test, y_pred)
theta = 0.8
y_proba = mlp.predict_proba(X_test)[:,1]
y_pred = (y_proba > theta)
print(get_ams_score(w_test, y_test, y_pred))
print(1.*sum(y_pred == y_test)/y_test.shape[0])
Theta = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
scores = map(AMS_optim, Theta)
plt.plot(Theta, scores)
plt.show()

As a result we get the following dependence between the AMS and

We have thus increased the value of AMS to about *1.3*

Let us now improve the model by applying the transformation of the input set:

X_mod = X
columns_negative = X_mod.apply(lambda x: (x<=0).sum(), axis = 0)
print columns_negative
X_mod['DER_mass_MMC'] = np.log(X_mod['DER_mass_MMC'])
X_mod['DER_mass_MMC'] = np.log(X_mod['DER_mass_MMC'])
X_mod['DER_mass_vis'] = np.log(X_mod['DER_mass_vis'])
X_mod['DER_pt_h'] = np.log(X_mod['DER_pt_h'])
X_mod['DER_mass_jet_jet'] = np.log(X_mod['DER_mass_jet_jet'])
X_mod['DER_deltar_tau_lep'] = np.log(X_mod['DER_deltar_tau_lep'])
X_mod['DER_pt_tot'] = np.log(X_mod['DER_pt_tot'])
X_mod['DER_sum_pt'] = np.log(X_mod['DER_sum_pt'])
X_mod['DER_pt_ratio_lep_tau'] = np.log(X_mod['DER_pt_ratio_lep_tau'])
X_mod['PRI_tau_pt'] = np.log(X_mod['PRI_tau_pt'])
X_mod['PRI_lep_pt'] = np.log(X_mod['PRI_lep_pt'])
X_mod['PRI_met'] = np.log(X_mod['PRI_met'])
X_mod['PRI_met_sumet'] = np.log(X_mod['PRI_met_sumet'])
X_mod['PRI_jet_leading_pt'] = np.log(X_mod['PRI_jet_leading_pt'])
X_mod['PRI_jet_subleading_pt'] = np.log(X_mod['PRI_jet_subleading_pt'])

apply the modifications and rerun the model.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_mod = scaler.fit_transform(X_mod)
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(X_mod, y, w, test_size=0.2, random_state=42)
mlp = MLPClassifier(early_stopping = True)
mlp.fit(X_train, np.array(y_train, dtype='int'))
y_pred = mlp.predict(X_test)
print(get_ams_score(w_test, y_test, y_pred))
print(1.*sum(y_pred == y_test)/y_test.shape[0])
y_proba = mlp.predict_proba(X_test)[:, 1]
scores = map(AMS_optim, Theta)
plt.plot(Theta, scores)
plt.show()

finally getting to the value of AMS of more than 2.3.