Wednesday, December 23, 2015

How to interpret beta?

Beta is the sensitivity of the expected excess asset return, E(Ri) - Rf, to the expected excess market return, E(Rm) - Rf.
So, under CAPM, beta = (E(Ri) - Rf) / (E(Rm) - Rf)

In other words, beta tells you how many times your instrument should return above the risk-free rate (rfr) for a given market return above the rfr.

say the market return = 5%, the rfr = 2%, and your stock has a beta of 1.5
beta tells you that your instrument should be getting 2% + 1.5 * (5% - 2%) = 6.5% return
if your instrument returns above 6.5%, ALPHA!!  :D
if below, then why are you investing in that instrument rather than the market portfolio?


how to calculate beta?

beta = Cov(Ri, Rm) / Var(Rm)
     = (corr(Ri, Rm) * sigma_i * sigma_m) / sigma_m^2
     = corr(Ri, Rm) * sigma_i / sigma_m
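a minimal Python sketch of that calculation, assuming you already have daily return series for the instrument and the market (the numbers below are made up):

import numpy as np

# made-up daily returns for the market and an instrument
np.random.seed(0)
market = np.random.normal(0.0004, 0.01, 250)
stock = 1.5 * market + np.random.normal(0, 0.005, 250)

# beta = Cov(Ri, Rm) / Var(Rm)
beta = np.cov(stock, market)[0, 1] / np.var(market, ddof=1)

# equivalently, corr(Ri, Rm) * sigma_i / sigma_m
beta_alt = np.corrcoef(stock, market)[0, 1] * stock.std(ddof=1) / market.std(ddof=1)

print(beta, beta_alt)  # both should come out close to 1.5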


putting it back into CAPM
Ri = rfr + beta * (Rm - rfr)
Ri - rfr = corr(Ri, Rm) * sigma_i * (Rm - rfr) / sigma_m
(Ri - rfr) / sigma_i = corr(Ri, Rm) * (Rm - rfr) / sigma_m

now we can interpret corr(Ri, Rm) as
the ratio between the Sharpe ratio of the instrument and the Sharpe ratio of the market


what is variance?
avg squared deviation from mean
= E[(X-E[X])^2]

can also be thought of as the covariance with itself
= Cov(X,X)
= E[(X-E[X])(X-E[X])]

which brings us to the covariance of two different variables
Cov(X,Y)
= E[(X-E[X])(Y-E[Y])]

Tuesday, December 8, 2015

Set up a Linux VM on a 64-bit Windows host without installation

Download the binary from lassauge.free.fr/qemu and extract it

Create a 40G hard disk by running
qemu-img create hd.img 40G

Install the Linux OS with
qemu-system-x86_64 \
-drive file=hd.img,index=0,media=disk,format=raw \
-L Bios -m 1024 \
-cdrom fedora-install.iso


Start your VM with the SSH port forwarded with
qemu-system-x86_64 \
-drive file=hd.img,index=0,media=disk,format=raw \
-L Bios -m 1024 \
-redir tcp:2222::22

ssh -p 2222 localhost
to get to your vm

Tuesday, November 24, 2015

Excel color scheme css

body
{
font-family:Verdana;
font-size:small;
}
table,td,th
{
border-color:#95B3D7 white;
border-collapse:collapse;
font-size:small;
}
th
{
background-color:#DBE5F1;
color:black;
padding:5px;
}
td
{
text-align:right;
padding:5px;
}
td.left
{
text-align:left;
padding:5px;
}

Tuesday, November 17, 2015

python pandas groupby agg percentiles?

use one of the approaches described on Stack Overflow

or 

simply use describe:

df.groupby([col1, col2]).describe(percentiles=[.75, .95])


Optionally, you may want to display the aggregated values horizontally and round the numbers by appending:

df.groupby([col1, col2]).describe(percentiles=[.75, .95]).unstack().apply(lambda x:np.round(x,0))
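a self-contained sketch, with made-up column names and data:

import numpy as np
import pandas as pd

# made-up data
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'] * 25,
                   'col2': ['x', 'y'] * 50,
                   'val': np.random.randint(0, 100, size=100)})

# per-group summary stats, including the 75th and 95th percentiles
print(df.groupby(['col1', 'col2'])['val'].describe(percentiles=[.75, .95]))

note that on recent pandas versions, describe() on a groupby already lays the stats out as columns (one row per group), so the .unstack() step may not be needed.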


Friday, September 18, 2015

How to read command line output directly into pandas dataframe?

cmd = r"zgrep abc application.log | perl -pe 's/pattern/subs/'"
# python 2
pd.read_csv(StringIO.StringIO(subprocess.check_output(cmd, shell=True)))
# python 3
pd.read_csv(BytesIO(subprocess.check_output(cmd, shell=True)))

Wednesday, August 26, 2015

From MySQL to pandas df with Python 3

Install the MySQL connector


# http://conda.pydata.org/docs/faq.html#id1
conda install -n <your python 3 env> mysql-connector-python

Access MySQL from Python 3 with the MySQL connector and put the result into a pandas DataFrame

# http://dev.mysql.com/doc/connector-python/en/connector-python-tutorial-cursorbuffered.html

import mysql.connector
import pandas as pd

# Connect to the MySQL server
cnx = mysql.connector.connect(user='scott', database='employees')

# note that we have to set dictionary=True to get the column names into pandas, then fetchall afterwards
cur = cnx.cursor(buffered=True, dictionary=True)
cur.execute('SELECT now() from dual')
pd.DataFrame(cur.fetchall())

Sunday, August 16, 2015

ipython / jupyter - how to switch kernel?

With IPython and Python 2.7 installed using Anaconda, how do I switch the kernel to use Python 3.x?

$ conda create -n py34 python=3.4 anaconda      # create a Python 3.4 environment
$ source activate py34                          # switch to it
$ ipython kernelspec install-self --user        # register its kernel for the notebook
$ ipython notebook --profile=nbserver --script  # start the notebook server


Tuesday, May 12, 2015

eigenvalue and eigenvector of a matrix (and why we bother)

These 2 links give a good review on it:
http://tutorial.math.lamar.edu/Classes/DE/LA_Eigen.aspx
https://www.math.hmc.edu/calculus/tutorials/eigenstuff/

say we have a matrix A; if we can satisfy
A * v_e = lambda * v_e
then
v_e = an eigenvector of matrix A
lambda = the corresponding eigenvalue of matrix A


example
|  2  7 |   | -1 |         | -1 |
| -1 -6 | * |  1 | = -5 *  |  1 |
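a quick way to check this (and to find eigenvalues/eigenvectors in general) is numpy:

import numpy as np

A = np.array([[2, 7],
              [-1, -6]])

# eigenvalues and (column) eigenvectors of A
vals, vecs = np.linalg.eig(A)
print(vals)   # one of them should be -5
print(vecs)   # columns are eigenvectors (numpy normalizes them to unit length)

# verify A @ v = lambda * v for the first pair
v = vecs[:, 0]
print(A @ v, vals[0] * v)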


why do we even need this?
see http://math.stackexchange.com/questions/23312/what-is-the-importance-of-eigenvalues-eigenvectors
in a nutshell, it lets us transform from the standard basis, which can be computationally intensive to work in, to a different basis that simplifies the necessary calculations

taylor's series application

say we wanna know f(x1), but we only know
  • x1-x0 is small,
  • f(x0),
  • f'(x0), ie first derivative
  • f''(x0), ie 2nd derivative
  • higher-order derivatives, etc.
, what do we do?

using taylor's series, we can estimate it by
f(x1) = f(x0) + (x1-x0)*f'(x0)/1! + (x1-x0)^2*f''(x0)/2! + ...

:D

a concrete example, say,

  • with the current underlying price, x0,
  • we calculate an option's value, f(x0),
  • the associated delta, f'(x0),
  • gamma, f''(x0)

if the underlying price moves a little bit from x0 to x1, how do we estimate the new option price, f(x1), without going through the option pricing model?
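a minimal sketch of that delta-gamma approximation, with made-up numbers for the option value and Greeks:

# made-up inputs: option value and Greeks at the current underlying price x0
x0, x1 = 100.0, 101.5   # underlying moves from 100 to 101.5
f_x0 = 5.00             # option value at x0
delta = 0.45            # f'(x0)
gamma = 0.03            # f''(x0)

dx = x1 - x0

# second-order Taylor approximation of the new option value
f_x1_approx = f_x0 + dx * delta + 0.5 * dx**2 * gamma
print(f_x1_approx)      # 5.00 + 1.5*0.45 + 0.5*1.5^2*0.03 = 5.70875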

Saturday, May 9, 2015

Quantitative methods

Variance
Avg sq dev from mean

Bayes
P(A|B)*P(B) =P(B|A)*P(A)

Binomial distribution
Prob of x successes in n trials
Mean np
Var np (1-p)

Sampling distribution
Population of 1k bonds
Randomly pick 100 to get mean
Pick another 100 to get mean
Repeat x times
Now we have x means forming sampling distribution of the mean

Central limit theorem
The sampling distribution of the mean is approximately normal for large n, with
Population mean = mean of sample means
Population variance = n * variance of sample means, with n = sample size

Std err of the sample mean
= Std dev of the sample meanS
= population std dev / sqrt(n)
Think about it: the more the observations vary, the less accurate your estimate of the mean;
the bigger the sample size, the more accurate it gets.

Well, we usually don't have the population std dev, so we use the std dev of the sample (NOT of the sample means)

Putting the above together, we get a point estimate of the population mean from sampling.
We can put a confidence interval around our point estimate using the std error.

Depending on the availability of the population variance and the sample size, we may use the t distribution instead of the normal, ie z.
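a small sketch of the point estimate and confidence interval, using made-up sample data:

import numpy as np

# made-up sample of 250 observations
np.random.seed(1)
sample = np.random.normal(0.001, 0.0025, 250)

n = len(sample)
mean = sample.mean()
std_err = sample.std(ddof=1) / np.sqrt(n)   # std dev of sample / sqrt(n)

# 95% confidence interval using the normal (z) approximation
lo, hi = mean - 1.96 * std_err, mean + 1.96 * std_err
print(mean, (lo, hi))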

Hypothesis testing
Is daily option return = 0?
Sample size of 250 days
Mean return = .1%
Sample std dev of return =.25%

Null hypothesis: population daily option return = 0
If the difference between the sample mean and the hypothesized population mean is big enough, then we can reject the null hypothesis and say mean return != 0.
How do we quantify whether it's big enough?
We have to look at how accurate the sample mean is, ie how close the sample mean is to the population mean.
Say the sample mean were 100% accurate, ie sample size = population size; then ANY difference between the sample mean and the hypothesized population mean would be sufficient for us to reject the null hypothesis.
We quantify the accuracy of the sample mean with the std err, ie the std dev of sample meanS, = sample std dev / sqrt(n) (standing in for the population std dev)
= .25% / sqrt(250) = .000158

.1% divided by the above gives a t-stat of 6.33
That tells us the sample mean is 6.33 standard errors away from 0, which is very unlikely under the null:
the 5% significance level is at +/- 1.96 sd with the z distribution
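the same arithmetic in a quick sketch, using the numbers above:

import math

n = 250          # sample size (trading days)
mean = 0.001     # sample mean daily return, 0.1%
sd = 0.0025      # sample std dev of daily returns, 0.25%
h0_mean = 0.0    # hypothesized population mean

std_err = sd / math.sqrt(n)          # ~0.000158
t_stat = (mean - h0_mean) / std_err  # ~6.33
print(std_err, t_stat)

# |t_stat| > 1.96, so reject the null at the 5% level (z approximation)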

Regression
R^2 = coefficient of determination
= explained variation / total variation
= (total - unexplained) / total
With
Total = sum of squared deviations from the mean
Unexplained = sum of squared deviations from the predicted values

Testing a regression coefficient for significance
Compare t = coefficient / std err of the coefficient against a critical t value. That critical value is a function of the degrees of freedom, n - k - 1, ie sample size - number of independent variables - 1.
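a rough sketch of both calculations (R^2 and the coefficient t-stat) on made-up data, plain numpy with a single regressor:

import numpy as np

# made-up data: y roughly linear in x plus noise
np.random.seed(2)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + np.random.normal(0, 1.0, 50)

n, k = len(x), 1
X = np.column_stack([np.ones(n), x])

# OLS coefficients and residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# R^2 = (total - unexplained) / total
total = ((y - y.mean()) ** 2).sum()
unexplained = (resid ** 2).sum()
r2 = (total - unexplained) / total

# std err and t-stat of the slope coefficient, with n - k - 1 degrees of freedom
dof = n - k - 1
sigma2 = unexplained / dof
se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_slope = beta[1] / se_slope
print(r2, beta[1], t_slope)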

Handbook of Exchange Rates - FX Options and Volatility Derivatives: An Overview from the Buy-Side Perspective

Handbook of Exchange Rates

24 FX Options and Volatility Derivatives: An Overview from the Buy-Side Perspective

24.1 Introduction
24.2 Why Would One Bother with an Option?
0.4 rule of thumb: ATM call ~= ATM put ~=
UL price * 0.4 * vol * sqrt(t)
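a quick sanity check of the rule of thumb against Black-Scholes (assuming zero rates and an at-the-money strike; the spot/vol/expiry numbers are made up):

import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, vol, t, r=0.0):
    # standard Black-Scholes call price
    d1 = (math.log(s / k) + (r + 0.5 * vol**2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

s, vol, t = 100.0, 0.20, 0.5            # made-up spot, vol, time to expiry in years

print(bs_call(s, s, vol, t))            # exact ATM call with zero rates, ~5.64
print(0.4 * s * vol * math.sqrt(t))     # 0.4 rule of thumb, ~5.66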

delta = how fast option price changes as UL price changes
gamma = how fast delta changes as UL price changes (always + for options)
vega = how fast option price changes as vol changes

an option, in theory, can be replicated by continuously hedging delta, but with the replication approach, during mkt crashes no one is willing to take the other side of the UL trades, and the replicator will suffer

gamma hedging (http://investorplace.com/2010/01/long-gamma-position/)
ex.  you're long gamma, ie you own options, since gamma is always + for options
1) stock rallies, you get longer delta due to +gamma  (so you sell the extra delta at a higher price to remain delta neutral)
2) stock drops, you get shorter delta due to +gamma  (so you buy back the lost delta at a lower price to remain delta neutral)
buy low + sell high = $$$
there's no free lunch.  by owning options, you pay theta every day.  you'd better hope realized vol is high so that you can make $ by gamma hedging.

24.3 Market for FX Options
assumptions to make:
expected realized vol
expected skewness
+ive skew - skew to the right - right has longer tail
+ive risk reversal: built from 2 OTM options - the 25 risk reversal is the vol of the 25-delta call less the vol of the 25-delta put (the 25-delta put is the put whose strike has been chosen such that its delta is -25%).  it shows how much demand there is for upside relative to downside

can't we just use put-call parity? no, b/c the call and put have different strikes
c + x/(1+rfr)^t = p + ul
at expiry t:  c + x = p + ul
call ITM:  (ul - x) + x = 0 + ul
call OTM:  0 + x = (x - ul) + ul
expected kurtosis
A high-kurtosis distribution has a sharper peak and fatter tails, ie extreme occurrences happen with a probability greater than normal; a low-kurtosis one is flatter with thinner tails.
expected term structure
PRINCIPAL COMPONENT ANALYSIS shows 3 main components:
- parallel shift - high demand for a specific strike/maturity shifts the whole surface
- steepener - changes the relative price of short-term vs long-term vol
- gull - changes the relative vol of mid-term vs short- and long-term
or diff strikes?


24.4 Volatility
variance swap - pays the diff b/w future realized variance & a predetermined variance strike

how does gamma affect the spot mkt?
say the mkt implied vol is getting higher than historical vol; the ones expecting it to come back closer to the historical level will sell options and thus take on a -ive gamma position
as the UL goes up, they get less delta and have to buy spot to hedge, which amplifies the trend in the mkt.
this only applies when you assume the option buyers aren't gonna hedge.  that's a valid assumption if the buyers, like equity PMs, are buying options to insure an existing spot position, rather than option MMs.
so, if we can identify the buyers, we can formulate a trend-following strategy?
on the flip side, if implied vol is a lot lower than historical and we expect implied vol to revert, we'll buy options and thus hold a +ive gamma position.
as the UL goes down, we have less delta and will buy spot, which effectively dampens the spot trend.

black swan strategy
target fat tail risks

24.5 FX Options from the Buy-Side Perspective
corr swap - pays the difference between realized correlation and a strike correlation, multiplied by a notional

Thursday, March 26, 2015

dividend yield

@end of q3, a company announces a div, payable @end of q4.
(let's simplify the question by assuming
* ex-div date = payable date
* company pays div only once a year)

1) is it better to buy later, towards the end of q4?
some may think it's better to buy later, toward the end of q4, since they'd be getting the same div while holding for a shorter period of time and thus increasing the return.
it's actually not the case.  let's see why.
@end of q3, a company announces a div of $10.
We wait 'til the day right before ex-div @end of q4 and buy the stock at mkt price of $200.
On ex-div, we'll have
$10 div
the share of stock

it's a win, right?
No, it's not..
by definition, on ex-div open, the stock's price will drop by that div amount, trading at $190.  (think about it, you own a company worth $200.  after it gives away $10, it's only worth $190, right?)
so basically, you pay $200 to get something that's worth $190 and $10 cash div.
It's actually a loss since you have to pay brokerage, etc.

2) what kinda return am I actually getting anyway if I buy sometime between div announcement and ex-div?

let's look at an example.  a company announces a $5 div at t3 (end of q3), payable at t4.  Say this company yields 5% for that year.
in an ideal world with simple interest, the stock price should go like this:
t0    100
t1    101.25
t2    102.5
t3    103.75
t4    105    => 100 + 5

in reality, at t3, that's when it announces a div of $5.  say, coincidentally, it's trading at 103.75.
how SHOULD we calculate the div yield of the co for that year?
1 way is to:
    div / (current price - div*(3/4))    # 3/4 of the year has passed
=    5 / (103.75 - 5 * 3/4)
=    5%

in reality,
    we just do
    div / current price
=    5 / 103.75
=    4.82%
the thinking (probably) is that, ignoring the timing of the actual cash flows, we assume the stock will pay the same amount of div over the next 12 months
thus, the div yield will be 4.82% instead
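both conventions in a quick sketch, using the numbers above:

div = 5.0
price_t3 = 103.75
elapsed = 3 / 4          # fraction of the dividend year already passed at t3

# 1) strip out the dividend already accrued in the price
yield_adjusted = div / (price_t3 - div * elapsed)

# 2) the usual convention: just divide by the current price
yield_simple = div / price_t3

print(yield_adjusted, yield_simple)   # 0.05 and ~0.0482, ie 5% and ~4.82%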



Monday, March 9, 2015

where clause to match string (char list)

// if you use column = "xxx"
// you are gonna get
// ERROR: length
// incompatible lengths
// Try:

select from table where column like "xxx"

Sunday, March 1, 2015

top n per group with python pandas

In [11]: df = pd.DataFrame({'cat':pd.Categorical(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']),
   ....:                    'v1':np.random.randint(100, size=(9)),
   ....:                    'v2':np.random.randint(10, size=(9)) })

In [12]: df
Out[12]:
  cat  v1  v2
0   A  79   7
1   A  97   5
2   A  81   9
3   B  75   3
4   B  43   7
5   B  27   8
6   C  47   6
7   C  23   9
8   C  53   0

In [13]:

In [13]: # top n per group

In [14]: # 1) sort by the "top n" column (s)

In [15]: df.sort(['v1', 'v2'])
Out[15]:
  cat  v1  v2
7   C  23   9
5   B  27   8
4   B  43   7
6   C  47   6
8   C  53   0
3   B  75   3
0   A  79   7
2   A  81   9
1   A  97   5

In [16]: # 2) group by column of your choice

In [17]: # 3) select the top n of it

In [18]: df.sort(['v1', 'v2']).groupby('cat').head(2)
Out[18]:
  cat  v1  v2
7   C  23   9
5   B  27   8
4   B  43   7
6   C  47   6
0   A  79   7
2   A  81   9

In [19]: # 4) optionally sort the output nicely

In [20]: df.sort(['v1', 'v2']).groupby('cat').head(2).sort(['cat', 'v1', 'v2'])
Out[20]:
  cat  v1  v2
0   A  79   7
2   A  81   9
5   B  27   8
4   B  43   7
7   C  23   9
6   C  47   6

In [21]:
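note: in newer pandas versions, DataFrame.sort() was removed in favor of sort_values(), so the same recipe would look like this (a sketch, not verified against the original pandas version):

(df.sort_values(['v1', 'v2'])
   .groupby('cat')
   .head(2)
   .sort_values(['cat', 'v1', 'v2']))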

Wednesday, January 28, 2015

SQL database / stdout to kdb

// using mysql as an example
// perhaps a good way to just have parsed output (say from perl) to directly go into kdb

q)(::)test:("*TH"; enlist "\t") 0:system "mysql -u root -e 'select \"string\" str, time(now()) time, 99 short from dual' "
str      time         short
---------------------------
"string" 15:55:11.000 99  
 

q)meta test
c    | t f a
-----| -----
str  | C   
time | t   
short| h    

Wednesday, January 21, 2015

kdb load csv into splayed table with column type of character string (not symbol)

// using string, ie list of char, instead of symbol for some feedcode
// since domain is not limited with growing expiries
// http://code.kx.com/wiki/JB:KdbplusForMortals/splayed_tables#1.2.7.3_Symbols_vs._Strings
// note that the entire symbol file is loaded into memory

// To load csv into table of list type, use *
(::)Cols:("SNHS*SH"; "|") 0:`$"/text/file.pipe"
//(::)t: flip ColNames!Cols
// .Q.en enumerates the symbol columns against the sym file under /kdb
// (ColNames is assumed here to be a symbol list of column names matching Cols)
`:/kdb/2000.01.01/table/ set .Q.en[`:/kdb;] flip ColNames!Cols
delete Cols from `.

// load the splayed table
\l /kdb