By the numbers...: 2014

Thursday, December 25, 2014

bayes theorem - an intuitive explanation

There are good attempts at explaining Bayes' theorem, such as Yudkowsky (long), betterexplained.com (shorter) and wiki. Here, I'll try to show how we can apply it (or rather, avoid the pitfall of not applying it), building on those.

Let's get to a simplified version of a common example - test detecting drug use.

In HK, some high schools are considering a mandatory test to detect drug use.

HYPOTHETICALLY, in HK, 0.1% of students are drug users. Let's say we have a test that can detect drug use with 90% accuracy. I have a student here who tested positive (meaning the test says this student is a drug user). What's the chance of this student being a drug user?

90%, right? The test is 90% accurate, so that's gotta be the answer, right?

No. No. No.

The 1st objective of this post is to prevent you from ever giving this wrong answer again, even if u aren't interested in learning how to get the right answer.

To let u recognize how absurd that answer was, imagine I now have 100 students and I know 99 of them are drug users.

If I randomly pick a student out of this group, we can all agree that there's a 99% chance that this student is a drug user.

Now, if I randomly pick a student out of this group to have him take the same drug test as before and the test result is positive, what's the chance of this student being a drug user?

It's 90% with the rationale (well, the test is 90% accurate) from the previous answer.

U see now how absurd the answer is, right?

Basically,

we know the student has a 99% chance of being a drug user
the drug test (90% accurate) he takes confirms it
yet, we actually DECREASE his probability of being a drug user from 99% to 90% after the test!

That's absurd!

INTUITIVELY, given that we know the student has a 99% chance of being a drug user (prior), a positive result from a reasonably accurate test (90% in this case) confirming the prior should SLIDE the probability even higher (say, 99.xx%) and we should be more sure of this student being a drug user with the test result.

Hopefully, this example will prevent us from giving another seemingly logical, but completely absurd answer again.

Essentially, this is saying

P(tested +| drug user) != P(drug user | tested +)
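As a quick numeric check of the two scenarios above, here is a minimal sketch of Bayes' theorem. It ASSUMES "90% accurate" means the test is right 90% of the time for both users and non-users (90% sensitivity and 90% specificity); the function and variable names are made up for illustration.

```python
from fractions import Fraction


def posterior_given_positive(prior, sensitivity, specificity):
    """P(drug user | tested +) via Bayes' theorem."""
    true_pos = prior * sensitivity               # user AND tests positive
    false_pos = (1 - prior) * (1 - specificity)  # non-user BUT tests positive
    return true_pos / (true_pos + false_pos)


# ASSUMPTION: "90% accurate" = 90% sensitivity and 90% specificity
acc = Fraction(9, 10)

rare = posterior_given_positive(Fraction(1, 1000), acc, acc)    # 0.1% of students are users
common = posterior_given_positive(Fraction(99, 100), acc, acc)  # 99 of 100 students are users

print(f"prior 0.1% -> posterior {float(rare):.2%}")    # about 0.89%
print(f"prior 99%  -> posterior {float(common):.2%}")  # about 99.89%
```

Under that assumption, the positive test moves the 0.1% prior up to under 1% (nowhere near 90%), and moves the 99% prior up to about 99.89%, which is the "99.xx%" intuition above.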

Will elaborate further in next post.

Sunday, December 14, 2014

how do we valuate a call option?

(post from old site)

1) by establishing a hedged portfolio (ie a portfolio whose value will not be affected by the underlying price) and working backward!!
2) risk-neutral valuation

1)
for example, we have
an underlying with price @t0 = 100
a call option that can be exercised @t1 at 100
what is the call option worth at t0?

hedged portfolio: buy a fraction of the underlying and sell the call
(hedge ratio)*(underlying price) - (call price)
h*S - c (S = underlying price @t0)

let's assume @t1, underlying price will either go up by 15 to 115 (u=1.15) or down by 5 to 95 (d=0.95)
if up
underlying is worth 115
call option is worth 15
if down
underlying is worth 95
call option is worth 0

h=? if we want the portfolio to have the same value whether the underlying goes up or down?

h*100*1.15-15 = h*100*.95-0
h = (15-0)/[100(1.15-0.95)]
h= 15/20 = 0.75

=> it means that if we establish our portfolio at t0 by buying 0.75 share of the underlying and selling the call option @t0, then
@t1, our portfolio will be worth 71.25 regardless of the underlying price going up or down
up = 0.75*100*1.15 -15 = 71.25
down = 0.75*100 *0.95 = 71.25

ok, so what does the above have to do with the call price at t0?
since the hedged portfolio will be worth $71.25 @t1 without RISK (ie, no matter whether the price goes up or down), the portfolio has to earn the risk-free rate under no-arbitrage pricing

say the risk-free rate is 5% and t1-t0 = 1 year, we discount the hedged portfolio value at t1 with the risk-free rate
hedged portfolio value @t0 = 71.25/1.05 ~= 67.857
solve for c
67.857 = h*S - c
= 0.75*100 - c
c = 7.143

in general, h = (cu - cd)/[S(u-d)]
= (call value if up - call value if down)/[share price @t0 * (up factor - down factor)]

in the example h= (15-0)/[100(1.15-0.95)]

note that the call price is independent of the likelihood of the underlying going up or down!!!!!!!! (that's quite counter intuitive if u think about it!)
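A minimal sketch of approach 1 with the numbers from the example (the function name and argument names are made up for illustration). Note that no probability of an up or down move appears anywhere in the inputs.

```python
def hedged_call_price(s0, up, down, strike, rate):
    """One-step binomial call price via the riskless hedged portfolio h*S - c."""
    c_up = max(s0 * up - strike, 0.0)      # call payoff if the underlying goes up
    c_down = max(s0 * down - strike, 0.0)  # call payoff if the underlying goes down
    h = (c_up - c_down) / (s0 * (up - down))  # hedge ratio
    value_t1 = h * s0 * down - c_down      # portfolio value @t1 (same in both states)
    value_t0 = value_t1 / (1 + rate)       # riskless, so discount at the risk-free rate
    call_t0 = h * s0 - value_t0            # solve value_t0 = h*S - c for c
    return h, call_t0


h, c = hedged_call_price(s0=100.0, up=1.15, down=0.95, strike=100.0, rate=0.05)
print(round(h, 4), round(c, 3))  # roughly 0.75 and 7.143
```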

2) risk-neutral approach
continuing with the above example, we assumed that risk-free rate=5% and the underlying/Share we use will either go up by 15% or down by 5% from t0 to t1

so what is the implied probability of it going up and going down?
=> ALL ASSETS should be expected to make the risk-free return in a risk-neutral world. if not, arb opportunities exist.
so the Share should make 5% too =>
5% = 15%(prob of going up) + -5%(prob of going down)
= 0.15(p)+ -0.05(1-p)
p = 50% <- implied prob of going up
1-50% = 50% <- implied prob of going down

going back to the call option, with 50% going up (worth $15) and 50% going down (worth 0), the call option should be worth @t1:
0.5*$15+0.5*0= 7.5
discounting w/ rfr to get call value @t0 => 7.5/1.05 = 7.143

guess what? c is worth the same as with approach 1!! (whew...)
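The risk-neutral steps can be sketched the same way (again, the function name is made up for illustration): back out the implied probability p from the requirement that the share earns the risk-free rate, then discount the expected payoff.

```python
def risk_neutral_call_price(s0, up, down, strike, rate):
    """One-step binomial call price via the risk-neutral (implied) probability."""
    # p solves: p*up + (1 - p)*down = 1 + rate
    p = ((1 + rate) - down) / (up - down)
    payoff_up = max(s0 * up - strike, 0.0)
    payoff_down = max(s0 * down - strike, 0.0)
    expected_payoff = p * payoff_up + (1 - p) * payoff_down
    return p, expected_payoff / (1 + rate)  # discount at the risk-free rate


p, c_rn = risk_neutral_call_price(s0=100.0, up=1.15, down=0.95, strike=100.0, rate=0.05)
print(round(p, 4), round(c_rn, 3))  # roughly 0.5 and 7.143
```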

---------------------------------------------------
wait a second! we assumed that the S price will only go from 100 up to 115 or down to 95. that doesn't sound realistic!

u're right. the above simplifies everything with a simple one-step binomial tree. to model the real world, we will have to create multiple-step binomial trees. with the above one-step binomial tree, we've got only 2 possible values of the S price @t1. As we increase the number of steps (n -> infinity), the binomial distribution of the (log) price change approaches a continuous normal distribution.

Game Theory

"Game theory is a study of strategic decision making. More formally, it is "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers."
Think "Prisoner's dilemma".

Application of game theory won Alvin E. Roth the 2012 Nobel prize in economics:
http://en.wikipedia.org/wiki/Alvin_E._Roth#Case_Study_in_Game_theory

Stable marriage problem is very interesting. Algo is clear and easy to understand. Interesting part is that there can be multiple solutions and stable may not = optimal:
http://en.wikipedia.org/wiki/Stable_marriage_problem#Optimality_of_the_solution
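To make "multiple solutions" concrete, here is a minimal sketch of the Gale-Shapley deferred-acceptance algorithm for the stable marriage problem. The tiny 2x2 preference lists are made up for illustration, and the sketch assumes equal-sized groups with complete preference lists.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching {proposer: receiver}; proposers go down their lists."""
    # rank[r][p] = position of proposer p in receiver r's list (lower = preferred)
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index each proposer will try
    engaged_to = {}                               # receiver -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # receiver was free: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # receiver trades up
            free.append(current)
        else:
            free.append(p)               # rejected: proposer tries the next choice later
    return {p: r for r, p in engaged_to.items()}


# Hypothetical preferences: everyone's first choice ranks them last.
men = {"A": ["X", "Y"], "B": ["Y", "X"]}
women = {"X": ["B", "A"], "Y": ["A", "B"]}

men_propose = gale_shapley(men, women)    # men get their first choices
women_propose = gale_shapley(women, men)  # women get their first choices
print(men_propose, women_propose)
```

Both matchings are stable, but they differ: whichever side proposes gets its best stable outcome, so "stable" does not mean "optimal for everyone".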

Forward Price

(post from old site)

What is Forward Price?
the AGREED price to buy something in the FUTURE.

Why is Forward Price > Spot price?
Intuitively:
say the underlying is worth $100. if you enter a forward contract to buy that underlying at $100 in 1 year, u will have the freedom to spend/invest that $100 this year (and make interest). this advantage is priced into the forward price so that's why forward price > spot price.

Mathematically:
Arbitrage.
if that's not the case, you'd:
@t=0
1) enter into the forward contract
2) short sell the underlying at spot price
3) lend the $ from the short sale
@t=1
1) buy the underlying as per the forward contract at the forward price determined @t=0
2) use that underlying to close out the short sale
3) receive the $ + interest (the proceeds pay for the purchase in 1) above)
Say the forward price = spot price at t=0: you'd execute the above and make a risk-free profit (the interest from lending the $)!! :D
So the forward price has to = spot price * (1 + r) so that no arbitrage can happen.
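The trade above can be sketched as a one-line cash flow calculation. The $100 spot comes from the example; the 5% rate, the function name and the no-dividend, no-cost setting are assumptions for illustration.

```python
def short_and_lend_profit(spot, forward, rate):
    """Profit @t=1 from: short the underlying at spot, lend the cash, buy back via the forward."""
    cash_at_t1 = spot * (1 + rate)  # short-sale proceeds plus interest
    return cash_at_t1 - forward     # pay the forward price, return the underlying


mispriced = short_and_lend_profit(spot=100.0, forward=100.0, rate=0.05)  # forward = spot
fair = short_and_lend_profit(spot=100.0, forward=105.0, rate=0.05)       # forward = spot*(1+r)
print(round(mispriced, 2), round(fair, 2))  # about 5.0 (free money) vs about 0.0
```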

Put call parity

(post from old site)

C = max{S-K,0}
P = max{0,K-S}

C + Kd(0,T) = S + P
@maturity
If S > K,
S-K + K = S + 0
If K > S,
0 + K = S + K-S

Graphically,
lhs is call curve shift up by K
Rhs is stock curve (45 degree) with left tail shifted up and flattened.
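A minimal sketch checking the two maturity cases above over a range of stock prices (at maturity the discount factor d(T,T) is 1; the function name and the strike of 100 are just for illustration).

```python
def parity_gap_at_maturity(s, k):
    """(C + K) - (S + P) at maturity; should be 0 for any S."""
    call = max(s - k, 0)
    put = max(k - s, 0)
    return (call + k) - (s + put)


# Check both cases (S > K and K > S) for a hypothetical strike of 100.
gaps = [parity_gap_at_maturity(s, 100) for s in range(0, 201)]
print(all(gap == 0 for gap in gaps))
```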

Option Early Exercise

(post from old site)

American call

What happens when you exercise?

you give up the option (whatever that's worth)
realize the cash flow (call: St - K; put: K-St)

What is that european option worth approximately?

some upper bound >= option >= some lower bound
St >= european call >= max{St - Kd(t,T), 0}
Kd(t, T) >= european put >= max{Kd(t, T) - St, 0}

When NOT to exercise?

If we end up with CF lower than the minimum value that a given option is worth, we definitely do NOT wanna exercise.

Minimum value of an American Call

@maturity T, Call_Eu is worth max{S-K, 0}

@anytime t, Call_Eu + Kd(t, T) = St + Put_Eu

So, Call_Eu = St + Put_Eu - Kd(t, T)

>= max{St - Kd(t,T), 0} (since Put_Eu >= 0 and Call_Eu >= 0)

Call_A >= Call_Eu since American offers early exercise.

so, Call_A >= max{St - Kd(t,T), 0}

@time t

if we don't exercise, we hold on to something that's worth at least St - Kd(t,T)

if we do, we realize St - K.

since K > Kd(t,T) (assuming a positive interest rate), Call_A >= max{St - Kd(t,T), 0} >= max{St - K, 0}, with strict > whenever St > Kd(t,T)

we'd realize a cash flow max{St - K, 0} that's less than what it's worth at the minimum ie max{St - Kd(t,T), 0} if we exercise at any time t. That's why we don't ever wanna early exercise a call if there's no dividend.
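A quick numeric illustration of the comparison above, with made-up numbers (stock at 110, strike 100, one year to expiry at a 5% rate, so d(t,T) = 1/1.05). The function name is hypothetical.

```python
def exercise_vs_hold_floor(spot, strike, discount_factor):
    """Cash flow from exercising a call now vs the minimum value of holding it."""
    exercise_value = max(spot - strike, 0.0)                   # max{St - K, 0}
    floor_if_held = max(spot - strike * discount_factor, 0.0)  # max{St - Kd(t,T), 0}
    return exercise_value, floor_if_held


exercise_value, floor_if_held = exercise_vs_hold_floor(110.0, 100.0, 1 / 1.05)
print(round(exercise_value, 2), round(floor_if_held, 2))  # about 10.0 vs about 14.76
```

Exercising realizes about 10, while the option is worth at least about 14.76 if held (or sold), so early exercise throws money away in this no-dividend setting.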

Note that the above applies to stock options only, not futures options.
"Remember that the result about it never being optimal to early exercise an American call option on a non-dividend paying stock only applies to ... stocks. A futures contract is not a stock. In fact, as I said in one of the lecture you can think of a futures contract as being a security that is always worth zero but that it pays a (sometimes negative) "dividend" in every period."

Minimum value of American put?

@maturity T, Put_Eu is worth max{K-ST, 0}

@anytime t, Put_Eu = Call_Eu + Kd(t, T) - St

So, Put_Eu

>= max{Kd(t, T) - St, 0} (since Call_Eu >= 0 and Put_Eu >= 0)

Put_A >= Put_Eu since American offers early exercise.

so, Put_A >= max{Kd(t, T) - St, 0}

@time t

if we don't exercise, we hold on to something that's >= Kd(t, T) - St

if we do, we realize K - St.

since K > Kd(t,T), we'd realize a cash flow that's greater than what it's worth at the minimum if we exercise at any time t. It means that we cannot eliminate the potential opportunity to exercise.

Merely beating the minimum does NOT mean it's optimal. Say you own an ATM put that's yet to expire. If you exercise it, you give up the option and you get nothing for CF. That option is yet to expire and is certainly worth something, so you do not wanna exercise at that point just because it beats the minimum.

So, when do we wanna early exercise? TBD...

St. Petersburg paradox

(post from old site)

Coin toss game. Pay u 2^n if your first head comes on the nth toss. How much would u pay to play this game?

U'd probably like to calculate the expected value of this game...
basically, for each possible outcome, u calculate the outcome multiplied by the associated probability. Then u sum it all up.
50% chance to get head on the first toss to give u $2, which gives u an expected value of $1.
within the remaining 50% chance, u have half chance to make $4 on the 2nd toss, which gives 0.5/2 *2^2 = 1.
as we continue, the nth toss would give an expected value of (1/2^n)*2^n = 1.
if we sum all possible outcomes, we'll be adding 1 for each possible outcome, which will be infinity.

would u even pay $1k for this game for an expected value of infinity ?

That's quite counter intuitive.

supposedly, the paradox is solved with the introduction of a utility function, ie diminishing returns (each extra dollar is worth less to u than the previous one).
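A minimal sketch of both calculations, capping the game at a finite number of tosses. The log (base 2) utility is an ASSUMPTION used for illustration (it is the classic choice, but any concave utility makes the same point), and the function names are made up.

```python
from fractions import Fraction


def truncated_expected_value(max_tosses):
    """Expected payoff if the game is capped at max_tosses tosses: adds 1 per toss."""
    return sum(Fraction(1, 2**n) * 2**n for n in range(1, max_tosses + 1))


def truncated_expected_log2_payoff(max_tosses):
    """Expected log2(payoff) under the same cap: sum of n / 2^n, which converges to 2."""
    return sum(Fraction(n, 2**n) for n in range(1, max_tosses + 1))


print(truncated_expected_value(10), truncated_expected_value(1000))  # grows without bound
print(float(truncated_expected_log2_payoff(50)))                     # very close to 2
```

With this assumed log utility the expected utility settles at 2, ie the game "feels" like a sure payoff of 2^2 = $4, even though the expected dollar value keeps growing with the cap.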

I guess 1 thing to learn is that We should be happy to pay 1k for one game of this rather than playing lottery over and over again...

What?! Option value doesn't depend on the likelihood of the underlying going up or down?

(post from old site)

WHAT?!

yes, I'm saying, @t0, these options are worth the same:

Call option with strike price 100, expiring @t1, on underlying A, which is at $100 now
Call option with strike price 100, expiring @t1, on underlying B, which is at $100 now

even though

underlying A has 99% chance of closing at $110 and 1% at $90 on expiry
underlying B has 1% chance of closing at $110 and 99% at $90 on expiry

Seriously, wouldn't you wanna put your $ on the call option (A) that has a 99% chance of expiring in the $, instead of the one (B) that only has a 1% chance of expiring in the money?

HOW?! How are those options worth the same now?!!

2 words: replicating portfolio

Let's have a portfolio such that it'll replicate the cash flow of the option by investing in the underlying and the cash account, assuming risk free rate = 1%, such that

@t1, the portfolio is worth

110x + 1.01y when underlying =110
90x + 1.01y when underlying = 90

So,

110x + 1.01y = 10 (the option expiring in the $)
90x + 1.01y = 0 (the option expiring out of $)

Solving the equations,

x =0.5
y = -44.55

meaning

long half unit of the underlying, and
short (ie borrow) 44.55 in the cash account at risk free rate

With this portfolio, you'll end up with

$10 if the underlying closes at $110 when option expires
$0 if the underlying closes at $90 when option expires

exactly replicating the option CF.

So, what's this portfolio worth @t0?

0.5*100-44.55(1) = $5.45

2 things to note:

both options, A & B, are worth $5.45, regardless of the different probabilities of the up move and down move.
if either option is priced higher / lower, you can arbitrage with the replicating portfolio.
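A minimal sketch of the replicating portfolio calculation above (function and argument names are made up for illustration). The 99%/1% probabilities are not inputs at all, which is the whole point.

```python
def replicate_call(s_up, s_down, payoff_up, payoff_down, s0, rate):
    """Solve x*S + (1+rate)*y = payoff in both states; return (x, y, value @t0)."""
    x = (payoff_up - payoff_down) / (s_up - s_down)  # units of the underlying
    y = (payoff_down - x * s_down) / (1 + rate)      # cash account @t0 (negative = borrow)
    return x, y, x * s0 + y


x, y, price = replicate_call(s_up=110, s_down=90, payoff_up=10, payoff_down=0, s0=100, rate=0.01)
print(round(x, 2), round(y, 2), round(price, 2))  # roughly 0.5, -44.55, 5.45
```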

=======================================

with all that said, think about it..

underlying A is not really worth 100 @t0, given the prob & prices @t1...

underlying B is not really worth 100 @t0, given the prob & prices @t1...

Early exercise on a dice game

(post from old site)

Rules:
you get to throw a fair die up to 3 times
the number of dollars you're gonna get is determined by the number you get on your last throw
How much would you pay to play this game?

=====================================================
First of all, if the game is free, what would your strategy be?

1st throw -----> 2nd throw -------------------> 3rd throw
|-> early exercise |-> early exercise

The above will be the decisions you'd have to make
after first throw, do you
go for a 2nd throw, or
early exercise (ie taking the $ shown as on your first throw)?
Similarly, do you make the 3rd throw if you have done the 2nd one?

Let's determine the expected value (EV) for only one toss. easy enough => 3.5.

part A
If my 2nd throw gets me 1, 2, or 3, I'll go for the 3rd throw. Take the $ (exercise) otherwise.

After my 1st throw, it's a bit trickier. Obviously, I'll go for the 2nd throw if my first throw is 1, 2, or 3, applying the same logic above. If my first throw is 6, I will not go for the 2nd throw as I would have already achieved the highest payoff possible.

How about if you get 5 on your first toss? intuitively, I'd look at my chance of getting a 6 on either the 2nd throw or the 3rd throw => 1 - p(1 to 5)*p(1 to 5), which essentially gives the chance of NOT (having the 2nd throw result in 1 TO 5 AND the 3rd throw result in 1 TO 5)
you only have an 11/36 chance to beat 5. So, if your first toss gives 5, you'd early exercise

How about if you get 4 on your first toss?
similar calculation, you'd have 20/36 to beat getting 4. In this case, you'd NOT exercise and go for the 2nd throw.

part b
So, that's the strategy. Now, how much would you pay to play this game?

Again, working backward:
right before 2nd throw,
50% chance to go for 3rd throw, which will result in EV of 3.5
50% of getting 4, 5, or 6, giving an EV of 5
=> before 2nd throw, the game is worth 4.25

right before first throw,
4/6 (~67%) chance of getting 1 to 4 and going for the 2nd throw, which gives an EV of 4.25, as shown above
2/6 (~33%) chance of getting 5 or 6 and exercising, giving an EV of 5.5
=> the game is worth $4.66666666...

So, if the game is offered to you below that price, go for it. Or you can offer that game above that price (casino!).

One more note, we could have gotten the strategy by calculating the EV alone. ie, part b alone will be sufficient. since we get the EV of 2nd throw = $4.25. If our first throw is above that number, we exercise. If not, we play on.
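The "work backward" calculation in part b can be sketched as a short recursion (the function name is made up): with k throws left, keep a face only if it beats the value of the game with k-1 throws left.

```python
from fractions import Fraction


def game_value(throws_left):
    """Value of the dice game with optimal early exercise, by backward induction."""
    if throws_left == 0:
        return Fraction(0)  # no throws left: nothing more to gain
    continuation = game_value(throws_left - 1)  # value of playing on after this throw
    # For each face, take the better of exercising (the face) and continuing.
    return sum(max(Fraction(face), continuation) for face in range(1, 7)) / 6


for k in (1, 2, 3):
    print(k, game_value(k), float(game_value(k)))
```

This gives 7/2, 17/4 and 14/3 for one, two and three throws, matching the 3.5, 4.25 and 4.666... above.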

How does interest rate affect option price?

(post from old sites)

not as simple as many web sites suggest.

interest rate affects option price in 2 ways:

forward price of the spot stock

as interest rate increases, the forward price of the underlying stock, aka the no-arbitrage price of the underlying stock at option expiry, increases, which in turn increases call value and decreases put value

cost of carrying the option

as interest rate increases, the cost of carrying the option increases, which in turn decreases the option value
by a small amount, as the option price is a fraction of the underlying price

FAQ:

how about a futures underlying?

Stock options are priced with the spot contract as the underlying. The spot underlying and the interest rate determine the forward price, aka the no-arbitrage price of the underlying stock at option expiry.

Futures options are priced with the FUTURES contract as the underlying. The futures price is effectively the equivalent of the forward price of the spot underlying, and that's why when the interest rate changes, the forward of the futures doesn't change. In fact and again, we use the futures as the underlying, not the forward.

An increase in interest rate does not drive up the forward price of the futures, but it should drive up the futures price because of no arbitrage, right?

Yes, but the change in futures price is in fact a change in the UNDERLYING price. Perhaps another way to think of it is that the change in interest rate directly affects the underlying price of the futures option, but does not affect the futures option itself.

By contrast, a change in interest rate does not directly affect the underlying of a stock option, aka the spot price of the stock (economists may beg to differ), but directly affects the option itself as it changes the ATM forward.

How about far month futures options?

Futures options use a simple offset mode instead of the forward price. Far month futures options still use the front-month futures as the base contract, which is the most liquid. To account for the difference in expiry, traders put in an offset for different months, which embeds the interest and dividends?

Reference:

Option volatility & pricing by Natenberg

Note:
confirmed using Orc. push up interest rate, then

both index call and put TV go down
stock call TV goes up and put TV goes down

CAPM & Efficient frontier "results in error-maximizing investment-irrelevant portfolios"?

(post from old site)

How? parameter estimation error in expected return and covariance.

Example: with a 2-asset portfolio, let's say you overestimate A1's return by e and underestimate A2's by e. Your average error is 0, which looks pretty good.

With modern portfolio theory, you would invest a lot more in A1 and less in A2, thus maximizing your error...

Quick review on CAPM, efficient frontier, etc.

In a nutshell, modern portfolio theory with CAPM picks a portfolio of risky assets (efficient frontier) and risk-free asset and maximizes the expected return for any given level of risk (volatility) / minimizes the risk for any given level of expected return.

How to maximize expected return? by investing more in assets with higher expected return.

How to minimize risk? by diversifying among assets that have low correlation.

example: if you invest in an oil company and an airline company, you can minimize your risk due to oil price fluctuation. While both companies expect to make money, if the oil price increases, the earnings of the oil company are expected to increase and those of the airline company are expected to decrease due to higher cost, and vice versa.

How to improve the accuracy of prediction based on past performance?

(post from old site)

One way is, instead of using the expected value, to calculate the confidence interval of the expected value and use the lower end of the value.

Another way is to find a period (period A) in the past when the conditions and the value were similar to the present, then use the period A+1 data to estimate the next period.

As usual, back test with out-of-sample data.

Value at Risk vs. Conditional value at risk

(post from old site)

VaR
95% VaR is the 95th percentile of the loss distribution
Say it's = $1 mil. then it means there's a 95% chance the loss is under $1 mil.

CVaR
Aka expected shortfall or tail conditional expectation
With the above 95% VaR, there is a 5% probability that the loss will be >= 1 mil.
Let's say if we fall into that 5%, what's the loss that we should expect?
Now, that's what CVaR is.

Think about it. 95% VaR is essentially the MINIMUM loss of the portfolio for the 5% worst case scenario.
CVaR will actually tell u the mean (expected) loss for the 5% worst case scenario.
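A minimal sketch on a toy sample of losses, following the description above: VaR is the MINIMUM loss within the worst 5% of outcomes, and CVaR is the MEAN loss within them. The sample (losses of 1 to 100) and the function name are made up, and this simple "worst k observations" convention is only one of several empirical quantile conventions.

```python
def var_and_cvar(losses, tail_pct=5):
    """Empirical VaR and CVaR from a sample of losses, using the worst tail_pct percent."""
    ordered = sorted(losses)
    tail_size = max(1, len(ordered) * tail_pct // 100)  # number of worst-case observations
    tail = ordered[-tail_size:]                         # the worst tail_pct% of losses
    var = tail[0]                                       # minimum loss in the tail
    cvar = sum(tail) / len(tail)                        # mean loss in the tail
    return var, cvar


toy_losses = list(range(1, 101))  # hypothetical losses: 1, 2, ..., 100
var95, cvar95 = var_and_cvar(toy_losses)
print(var95, cvar95)
```

For this toy sample the worst 5% are the losses 96 to 100, so the 95% VaR is 96 and the CVaR is 98: CVaR is always at least as large as VaR because it averages over the tail instead of taking its smallest value.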

Monty Hall, goats and 3 prisoners

(posts from old site)

Welcome to the game show! I'm Monty Hall. As you can see, there are 3 doors in front of you, A, B and C.

You are in for a prize! Behind one of the doors is $1 mil! For the other 2 doors, there is a goat behind each. :D

Now, pick a door!

... Say you pick door A...

Ok, with door B and C, let me open a door with a goat! (Monty opened door C.)

Now, contestant, let me give you 2 choices. You may stick with door A or switch to door B for the $1 mil.

What do you do?

------------------------------------------

There are 3 condemned prisoners, with 2 to be executed and 1 to be pardoned.

Prisoner A begs the warden to tell him which of the other prisoners, B & C, will be executed, arguing that this reveals no information about his own fate, but secretly thinking that it'll increase his odds of being pardoned from 1/3 to 1/2.

what do u think?
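If u want to check your answer, here is a minimal sketch that enumerates the Monty Hall game exactly. It ASSUMES the prize is equally likely behind each door, Monty always opens a goat door other than your pick, and he chooses at random when he has two goats to choose from. The 3 prisoners puzzle has the same structure (pardon = prize, the warden naming an executed prisoner = Monty opening a goat door).

```python
from fractions import Fraction

doors = ["A", "B", "C"]
pick = "A"  # the contestant's first pick, as in the story

stick_wins = Fraction(0)
switch_wins = Fraction(0)
for prize in doors:  # prize location, probability 1/3 each
    openable = [d for d in doors if d != pick and d != prize]  # goat doors Monty may open
    for opened in openable:
        weight = Fraction(1, 3) / len(openable)  # Monty picks at random if he has a choice
        other = next(d for d in doors if d not in (pick, opened))  # the door u could switch to
        if prize == pick:
            stick_wins += weight
        if prize == other:
            switch_wins += weight

print("stick:", stick_wins, "switch:", switch_wins)
```

Under these assumptions sticking wins 1/3 of the time and switching wins 2/3, so prisoner A's odds stay at 1/3 rather than rising to 1/2.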

PHILOSOPHY in pricing derivatives

(posts from old site)

Specify a model under the Q(theta)-dynamics

theta is a vector of parameters, e.g. volatility, drift, etc.
Q() is the risk neutral framework

Price all securities at time t by discounting the next period (t+1) risk neutral prices
Calibrate the model by choosing theta so that market prices of appropriate liquid securities agree with model prices of those securities

this calibration procedure to market prices will incorporate the factors not specified in the model, e.g. policy risk, market expectation, economic outlook, etc.

Friday, December 5, 2014

HK property return forecast - Heya Delight

Regression analysis shows that there's a strong negative relationship (-0.81, presumably the correlation coefficient R, since R^2 cannot be negative) between the annualized cumulative real return of HK property (40 months ahead) and the difference between the US 10-year treasury yield and the property rental yield. Basically, the more the rental yield exceeds the US 10-year treasury yield, the better a buy the property is.

The regression result aligns with general investment valuation; as generated cash flows exceed risk-free rate further, valuation goes up.

Let's use a new development project, Heya Delight, as an example to see how the difference changes as the rent per sq. ft. changes:

US has been rumored to increase interest rate for quite a while, but it's been delayed and delayed again. Let's assume interest rate remains relatively stable in the next 40 months.

The rent per square foot around the area of comparable quality gets as high as $35 / sq ft per month. Each line above shows different rental income.

It looks like a very good buy for the unit with the lowest price per sq ft at $9,790. Even for the unit with the highest price per sq ft, it's still okay if you can get a high enough rent.

I guess many people share the same views. (Well, it doesn't hurt to price the unit at "up to 40 per cent lower than a nearby project.") There are 130 units available for that project with more than 4000 applications submitted for ballot.

Unluckily for me, I didn't get a good draw... I'm the 3374th person out of 4127 to get to pick a unit. (-_-")

Saturday, November 22, 2014

1+2+3+4+5+6+...= -1/12 ?!!!

1 + 2 + 3 + 4 + 5 + 6 + ... = ?
gotta be infinity, right?

let
S = 1 + 2 + 3 + 4 + 5 + 6 + ...
S1 = 1 - 1 + 1 - 1 + 1 - 1 + ...
S2 = 1 - 2 + 3 - 4 + 5 - 6 + ...

add S1 to itself, with the 2nd copy shifted one term to the right:
S1 *2 = 1 - 1 + 1 - 1 + 1 - 1 + ...
      +     1 - 1 + 1 - 1 + 1 - ...
      = 1
so, S1 = 1/2

add S2 to itself, again with the 2nd copy shifted one term to the right:
S2 *2 = 1 - 2 + 3 - 4 + 5 - 6 + ...
      +     1 - 2 + 3 - 4 + 5 - ...
      = 1 - 1 + 1 - 1 + 1 - 1 + ...
      = S1
      = 1/2
so, S2 = 1/4

subtract S2 from S
S - S2 = 1 + 2 + 3 + 4 + 5 + 6 + ...
-[1 - 2 + 3 - 4 + 5 - 6 + ...]
= 0 + 4 + 0 + 8 + 0 + 12 + ...
= 4 *(1 + 2 + 3 + 4 + 5 + 6 + ...)
= 4 * S
S - 1/4 = 4 * S
-1/4 = 3S

so, S = -1/12

this is nuts...
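A side note, not part of the derivation above: all three series diverge in the ordinary sense, so the manipulations are not regular sums. One standard way to give the three values a precise meaning is Abel summation for S1 and S2, and the analytic continuation of the Riemann zeta function for S:

```latex
\begin{align*}
S_1 &:\quad \lim_{x \to 1^-} \sum_{n=0}^{\infty} (-1)^n x^n
      = \lim_{x \to 1^-} \frac{1}{1+x} = \frac{1}{2} \\
S_2 &:\quad \lim_{x \to 1^-} \sum_{n=0}^{\infty} (-1)^n (n+1)\, x^n
      = \lim_{x \to 1^-} \frac{1}{(1+x)^2} = \frac{1}{4} \\
S &:\quad \zeta(-1) = -\frac{1}{12},
      \quad \text{where } \zeta(s) = \sum_{n=1}^{\infty} n^{-s} \text{ for } \operatorname{Re}(s) > 1,
      \text{ extended by analytic continuation}
\end{align*}
```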

Here are the professors explaining it:

Monday, September 15, 2014

Interactive Charts from R

rCharts at
http://ramnathv.github.io/rCharts/

Monday, September 1, 2014

R 3D plot

suppressMessages(library("plot3D"))

## assumes a data frame df (columns xSeq, ySeq) and a fitted model named glm.fit already exist
## 30 evenly spaced grid points across the observed range of each predictor
xSeq <- seq(min(df$xSeq), max(df$xSeq), length.out = 30)
ySeq <- seq(min(df$ySeq), max(df$ySeq), length.out = 30)
## persp requires a matrix for z: predicted response over the 30 x 30 grid
zSeq <- outer(xSeq, ySeq, function(a,b) predict(glm.fit, newdata=data.frame(xSeq=a, ySeq=b), type="response"))

## same surface from three viewing angles
persp(x=xSeq, y=ySeq, z=zSeq, ticktype="detailed", theta=40, phi=15)
persp(x=xSeq, y=ySeq, z=zSeq, ticktype="detailed", theta=0, phi=0)
persp(x=xSeq, y=ySeq, z=zSeq, ticktype="detailed", theta=90, phi=0)

see http://pj.freefaculty.org/guides/Rcourse/plot-3d/plots-3d.pdf for details.

Wednesday, August 20, 2014

R: what is the column type?

str(d)

## 'data.frame':    5922 obs. of  610 variables:
##  $ Timestamp            : POSIXt, format: "2010-01-04 17:30:00" "2010-01-04 17:35:00" ...
##  $ Variable142OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable142HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable142LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable142LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable143OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable143HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable143LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable143LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable144OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable144HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable144LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable144LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable145OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable145HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable145LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable145LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable146OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable146HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable146LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable146LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable147OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable147HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable147LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable147LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable148OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable148HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable148LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable148LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable149OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable149HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable149LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable149LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable150OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable150HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable150LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable150LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable151OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable151HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable151LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable151LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable152OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable152HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable152LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable152LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable153OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable153HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable153LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable153LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable154OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable154HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable154LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable154LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable155OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable155HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable155LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable155LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable156OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable156HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable156LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable156LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable157OPEN      : num  5.6 5.53 5.52 5.48 5.39 ...
##  $ Variable157HIGH      : num  5.6 5.55 5.52 5.48 5.42 ...
##  $ Variable157LOW       : num  5.53 5.51 5.46 5.39 5.39 ...
##  $ Variable157LAST      : num  5.53 5.52 5.48 5.39 5.42 ...
##  $ Variable158OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable158HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable158LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable158LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable159OPEN      : num  289 290 290 290 290 ...
##  $ Variable159HIGH      : num  290 290 290 290 290 ...
##  $ Variable159LOW       : num  289 290 290 290 289 ...
##  $ Variable159LAST      : num  290 290 290 290 290 ...
##  $ Variable160OPEN      : num  289 290 290 290 290 ...
##  $ Variable160HIGH      : num  290 290 290 290 290 ...
##  $ Variable160LOW       : num  289 290 290 290 289 ...
##  $ Variable160LAST      : num  290 290 290 290 290 ...
##  $ Variable161OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable161HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable161LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable161LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable162OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable162HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable162LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable162LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable163OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable163HIGH      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable163LOW       : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable163LAST      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##  $ Variable164OPEN      : num  9.56 9.48 9.5 9.54 9.54 ...
##  $ Variable164HIGH      : num  9.57 9.5 9.54 9.55 9.54 ...
##  $ Variable164LOW       : num  9.46 9.48 9.5 9.53 9.51 ...
##  $ Variable164LAST      : num  9.48 9.5 9.54 9.54 9.52 ...
##  $ Variable165OPEN      : num  9.98 10.02 10.02 9.98 10.04 ...
##  $ Variable165HIGH      : num  10.1 10.1 10 10 10 ...
##  $ Variable165LOW       : num  9.97 9.99 9.98 9.98 10.01 ...
##  $ Variable165LAST      : num  10.02 10.02 9.98 10.04 10.02 ...
##  $ TargetVariable       : logi  TRUE TRUE TRUE TRUE FALSE TRUE ...
##  $ Variable167OPEN      : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
##   [list output truncated]

Monday, August 4, 2014

kdb ipc - set remote variable = value of local variable

// opening conn
h: hopen `::5000

// assigning a fixed value to a remote variable is easy
h "rv:10"

// local variable = 12
lv:12

// assigning rv = lv is trickier: send a lambda that does the global assignment (::) on the remote side, with lv as its argument
h ({rv::x};lv)

Sunday, August 3, 2014

kdb: how to select the max of 2 columns

with sql, we just simply

select max(col1, col2) from t

with kdb, we will need to do something like:

q)select [5] from t
time sym bid ask
-----------------------------
09:30:00.386 IBM 50.12 51.69
09:30:00.754 IBM 50.69 51.02
09:30:00.871 AAPL 50.2 51.4
09:30:01.548 AAPL 50.84 51.28
09:30:01.921 GOOG 50 51.82
q)select [5] from update greater:max each flip (bid; ask) from t
time sym bid ask greater
-------------------------------------
09:30:00.386 IBM 50.12 51.69 51.69
09:30:00.754 IBM 50.69 51.02 51.02
09:30:00.871 AAPL 50.2 51.4 51.4
09:30:01.548 AAPL 50.84 51.28 51.28
09:30:01.921 GOOG 50 51.82 51.82

Friday, July 18, 2014

R ggplot coord_cartesian(ylim()) vs ylim()

The former only limits what the plot displays (a zoom), while the latter filters the data before any statistics are computed, so it changes the analysis result.

Let's use boxplot as an example.

require("ggplot2")

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.0.3

df = data.frame(y = c(0, 50, 100, 501, 600))
median(df$y)

## [1] 100

p <- ggplot(df, aes(y = y)) + geom_boxplot(aes(x = factor(1)))

coord_cartesian(ylim) will

analyze the entire dataset to produce the plot
then, limit the plot to the range specified

p + coord_cartesian(ylim = c(0, 200))

(plot: p + coord_cartesian(ylim = c(0, 200)))

ylim alone will

limit the data set within the range
then, analyze the leftover to produce the plot

p + ylim(c(0, 200))

## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

(plot: p + ylim(c(0, 200)))

as shown above,

the warning tells you that 2 rows were left out, and
the median with ylim alone is 50, rather than 100.

Tuesday, July 15, 2014

HSI daily return with kdb+q

select Date, Open2CloseReturn: (Close-Open)%Open, OvernightGap: (Open-prev AdjClose)%prev AdjClose from t

select Date, LnOpen2CloseReturn:log Close%Open, LnOvernightGap:log Open%prev AdjClose from t