Doubling for the win.

For many years I have been debating the use of martingale, or doubling, strategies. In this strategy the trade size is doubled after each loss until the first win. The hope is that after the first win the sequence nets exactly one dollar, as justified by the formula 1 + 2 + 2^2 + \ldots + 2^{k-1} = 2^k - 1: the accumulated losses of the first k trades are recovered, plus one dollar, by a winning trade of size 2^k. In real trading such a strategy usually produces a very smooth, nice equity curve that eventually ends in a crash. Despite this fact, I know many knowledgeable people who still use this system and defend it.
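The arithmetic behind this claim is easy to check directly. A minimal sketch (the function name is my own, for illustration):

```python
# Net result of one full doubling sequence: lose trades of size
# 1, 2, ..., 2**(k-1), then win a trade of size 2**k.
def martingale_net(k):
    losses = sum(2**i for i in range(k))  # equals 2**k - 1
    win = 2**k
    return win - losses

print([martingale_net(k) for k in range(6)])  # [1, 1, 1, 1, 1, 1]
```

No matter how long the losing streak, the first win leaves the sequence exactly one dollar ahead.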

In this post I will try to show, from at least one perspective, why trading the random walk in the usual way may be preferable to trading a martingale.

We start with the definition of the underlying process. We assume that the trader's capital is a Markov chain whose state is the current capital. We say we trade a random walk when the transition probabilities are P(X_{t+1} = x+1 | X_t = x) = p and P(X_{t+1} = x-1 | X_t = x) = q = 1 - p, where p > q.

We start with capital x and want to compute the probability of hitting zero capital. It is not hard to prove the following proposition.

Proposition. The probability of hitting zero starting from capital x is given by (\frac{q}{p})^x.

In particular, if p = q = 1/2 it is well known that bankruptcy is certain. What nevertheless makes such a strategy tradable is that the expected time to the crash is infinite.

One important consequence of this result is that when the starting capital is large, the chance of going bankrupt approaches zero.
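A quick Monte Carlo check of the ruin probability (\frac{q}{p})^x is straightforward. This is a sketch; the values p = 0.55 and x = 5, and the cutoff of 2000 steps, are illustrative assumptions:

```python
import random

def ruin_probability(x0, p, trials=5000, max_steps=2000):
    """Estimate the probability that the walk started at x0 ever hits zero."""
    ruined = 0
    for _ in range(trials):
        x = x0
        for _ in range(max_steps):
            x += 1 if random.random() < p else -1
            if x == 0:
                ruined += 1
                break
    return ruined / trials

p, x0 = 0.55, 5
print(ruin_probability(x0, p))   # simulated estimate
print(((1 - p) / p) ** x0)       # theoretical (q/p)^x, about 0.367
```

Paths that survive 2000 steps have drifted far above zero, so truncating them barely biases the estimate.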

The doubling strategy is a completely different story. One can prove the following.

Proposition. For a doubling strategy, the probability of going bankrupt is equal to one.

From this perspective, trading a martingale is a sure loss.
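To illustrate, here is a sketch of a doubling gambler with finite capital; the win probability p = 0.5, the starting capital of 100, and the horizon of 1000 rounds are illustrative assumptions of mine, not part of the proof:

```python
import random

def survives(capital, p=0.5, rounds=1000):
    """Play the doubling strategy; return False if a required bet cannot be covered."""
    bet = 1
    for _ in range(rounds):
        if bet > capital:
            return False   # the next doubled bet exceeds remaining capital: ruin
        if random.random() < p:
            capital += bet
            bet = 1        # win: pocket one unit, restart the sequence
        else:
            capital -= bet
            bet *= 2       # loss: double the stake

    return True

trials = 2000
ruined = sum(not survives(100) for _ in range(trials))
print(ruined / trials)  # close to 1 over a long horizon
```

A losing streak long enough to exhaust the capital is rare per sequence, but over many sequences it becomes nearly certain, which is exactly why the equity curve looks smooth right up to the crash.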

Lecture IV – Our first model.

Now that we have finished the theoretical discussion of neural networks, it is time to build our first model.

We start by loading the dataset into a Jupyter notebook:

#Load the data
import pandas as pd
import numpy as np

data = pd.read_csv('atlas-higgs-challenge-2014-v2.csv')

This is the Higgs ML challenge dataset. For a detailed data analysis, see Lecture I – Exploratory Data Analysis of this lecture series.

We start by dropping all rows with missing values. Strictly speaking, this is not the right thing to do in this particular situation: the missing values are not random, and most of them are determined by the number of jets produced in the particle decay. However, to keep things simple we will not deal with this issue here.

#Remove missing data 
#Ideally split by jets here
data = data.replace(-999.0, np.nan)
data.dropna(inplace = True)

Now we split our dataset into training and testing parts. We do not create a separate validation set, since the algorithm that trains the neural net carves one out automatically.

from sklearn.model_selection import train_test_split

#Split the data into features, labels and weights
X = data.iloc[:, 1:31]
y = data['Label'].map({'s': 1, 'b': 0})
w = data['KaggleWeight']
sum_weights = sum(w)

X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, w, test_size=0.2, random_state=42)

Along the way we recode the label values as 0 and 1 and keep the event weights as a
separate variable. The split is 80% training set and 20% test set.
As we already know, the key metric for the Higgs ML challenge is the AMS – Approximate Median Significance of discovery.
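Written out, with s the weighted count of true positives, b the weighted count of false positives, and the challenge's regularization term b_r = 10, the metric is

\mathrm{AMS} = \sqrt{2\left((s + b + b_r)\,\ln\!\left(1 + \frac{s}{b + b_r}\right) - s\right)}, \qquad b_r = 10.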

from math import sqrt, log

def ams(s, b):
    b_reg = 10  # regularization term from the challenge definition
    return sqrt(2 * ((s + b + b_reg) * log(1.0 + s/(b + b_reg)) - s))

def get_ams_score(W, Y, Y_pred):
    multiplier = sum_weights/sum(W)  # renormalize the test weights to the full dataset
    s = np.sum(multiplier * W * (Y == 1) * (Y_pred == 1))  # weighted true positives
    b = np.sum(multiplier * W * (Y == 0) * (Y_pred == 1))  # weighted false positives

    return ams(s, b)

#Check for random prediction
y_pred = np.random.binomial(1, 0.5, y_test.shape[0])

print(get_ams_score(w_test, y_test, y_pred))

We have tested the AMS score obtained by assigning labels at random. The resulting value is roughly AMS = 0.5.

We are now in a position to train our first model.

#Now we may train the Neural Network
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(early_stopping=True)
mlp.fit(X_train, np.array(y_train, dtype='int'))

and predict the values of labels on the test set.

y_pred = mlp.predict(X_test)
print(get_ams_score(w_test, y_test, y_pred))
print(1.*sum(y_pred == y_test)/y_test.shape[0])

As a result we score AMS = 1.21 and guess the right labels about 80% of the time.

Before we proceed with the data transformation, I will describe a trick that can make the AMS slightly better. As we know, the MLP (multi-layer perceptron) optimizes the cross-entropy, not the AMS directly. Labels are assigned to observations according to the condition P(y=1|x) > 0.5. Since the fraction of Higgs events is far below 50 percent, it can be advantageous to label as Higgs only those events with a much higher probability, P(y=1|x) > \theta for some \theta > 0.5.

import matplotlib.pyplot as plt

def AMS_optim(theta):
    y_pred = (y_proba > theta)
    return get_ams_score(w_test, y_test, y_pred)

theta = 0.8
y_proba = mlp.predict_proba(X_test)[:,1]
y_pred = (y_proba > theta)
print(get_ams_score(w_test, y_test, y_pred))
print(1.*sum(y_pred == y_test)/y_test.shape[0])

Theta = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
scores = list(map(AMS_optim, Theta))

plt.plot(Theta, scores)

As a result we get the following dependence of the AMS on \theta:


We have thus increased the value of the AMS to about 1.3.

Let us now improve the model by transforming the input features:

X_mod = X.copy()
columns_negative = X_mod.apply(lambda x: (x <= 0).sum(), axis=0)
print(columns_negative)

#Log-transform the strictly positive, skewed columns
log_columns = ['DER_mass_MMC', 'DER_mass_vis', 'DER_pt_h', 'DER_mass_jet_jet',
               'DER_deltar_tau_lep', 'DER_pt_tot', 'DER_sum_pt',
               'DER_pt_ratio_lep_tau', 'PRI_tau_pt', 'PRI_lep_pt', 'PRI_met',
               'PRI_met_sumet', 'PRI_jet_leading_pt', 'PRI_jet_subleading_pt']
for col in log_columns:
    X_mod[col] = np.log(X_mod[col])

We apply the modifications and rerun the model.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_mod = scaler.fit_transform(X_mod)

X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(X_mod, y, w, test_size=0.2, random_state=42)

mlp = MLPClassifier(early_stopping=True)
mlp.fit(X_train, np.array(y_train, dtype='int'))

y_pred = mlp.predict(X_test)
print(get_ams_score(w_test, y_test, y_pred))
print(1.*sum(y_pred == y_test)/y_test.shape[0])

y_proba = mlp.predict_proba(X_test)[:, 1]
scores = list(map(AMS_optim, Theta))
plt.plot(Theta, scores)


finally reaching an AMS value of more than 2.3.