In my current job, I needed to find avatars for a bunch of web pages, but bigger than favicon size. I was at a loss until I remembered that Facebook makes avatars available in a whole range of sizes by API, so I decided to write a Python app using the OpenGraph API.

Pass the script a text file containing tab-separated values of the website name and domain, and it will search for each name and download the matching avatar, saved under a name like www_domain_com.png.

#!/usr/bin/python
# Copyright 2012 Mark Perdomo
 
 
import os
import re
import sys
import urllib
from urlparse import urlparse
import csv
import time
import json
 
"""Facebook Avatar Downloader
This program, given a list of URLs, searches
for the Facebook UID of the FQDN and downloads
the avatar.
"""
 
def read_urls(filename):
    """Returns a list of titles and FQDNs from a
    file full of URLs.  Screens out duplicates
    and returns the titles and URLs in
    increasing order as a tuple."""
 
    url_list = []
 
    #The csv module lets us quickly read tab-separated values
    with open(filename) as tsv:
        for line in csv.reader(tsv, dialect="excel-tab"):
            # Use urlparse to cut up any urls
            try:
                url = line[1]
                o = urlparse(url)
                match = o.netloc
                #If there is a valid URL, append the title/FQDN tuple
                if len(match) > 1:
                    url_list.append((line[0],match))
            except IndexError:
                print str(line) + ' --> NOT MATCHED!'
 
    return sorted(set(url_list))  #set() screens out duplicate entries
 
 
def lookup_uid(term):
    """Does a FB opengraph search for a supplied
    string and returns the first Facebook UID in
    search"""
 
    search_url_head = 'https://graph.facebook.com/search?q='
    search_url_tail = '&type=page'
    #URL-encode the search term, since titles may contain spaces
    search_url = search_url_head + urllib.quote(term) + search_url_tail
 
    response = json.load(urllib.urlopen(search_url))
    if response:
        try:
            return response['data'][0]['id']
        except (KeyError, IndexError):
            print 'Bad JSON response'
            return 'null'
    else:
        print 'No response'
        return 'null'
 
 
def download_image(item, dest_dir, file_name):
    """Constructs an FB OpenGraph query and downloads
    the images to the directory.  Provide a tuple with
    (FQDN,Facebook uid) and the destination directory"""
 
    url_head = 'https://graph.facebook.com/'
    url_tail = '/picture?type=large'
    img_type = '.png'
    download_url = url_head + item + url_tail
 
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)
 
    if item != 'null':
        print 'Retrieving...', download_url
        urllib.urlretrieve(download_url, os.path.join(dest_dir, file_name))
    else:
        print 'There was a problem getting ', file_name
 
def check_missing(site_list, dest_dir):
    """This function double-checks your work and tells
    you if a domain does not have an image"""
    error = []
    for item in site_list:
        file_name = re.sub('\.','_',item[1]) + '.png'
        if not os.path.exists(os.path.join(dest_dir, file_name)):
            error.append(item[1])
    if error:
        print '\n\nNo image for these domains:'
        for url in error:
            print url
    else:
        print '\n\nAll domains accounted for'
 
def main():
    args = sys.argv[1:]
 
    if not args:
        print 'Please specify a file input'
        sys.exit(1)
 
    todir = ''
    if args[0] == '--todir':
        todir = args[1]
        del args[0:2]
 
    items = read_urls(args[0])
 
 
    for item in items:
        site_title = item[0]
        site_url = item[1]
        file_name =  re.sub('\.','_', site_url) + '.png'
        if  not os.path.exists(os.path.join(todir, file_name)):
            site_id = lookup_uid(site_title)
            download_image(site_id, todir, file_name)
            time.sleep(.75)
 
 
    check_missing(items, todir)
 
if __name__ == '__main__':
    main()

The input file is a text file that looks like this:
Title Root Domain
ABC News http://abcnews.go.com/
Al Jazeera http://www.aljazeera.com/
Al-Monitor http://www.al-monitor.com/
The Atlantic http://www.theatlantic.com/
The Atlantic Politics http://www.theatlantic.com/politics/
The Atlantic Wire http://www.theatlanticwire.com/

The code uses the FQDN as a "unique identifier," so re-running the script skips domains that already have an image and you won't generate too many API calls. I also throttle the requests using time.sleep(). This script saved me a lot of time; now I only have to sort out bad matches and missing domains.

For our investments class, we had to conceive and test a trading strategy using technical analysis. As a lover of R, I decided to reference some code I had seen earlier on Modern Toolmaking via R-bloggers: Backtesting a Simple Stock Trading Strategy. The example provided R code that was very helpful in getting me to understand the math behind the testing part (as opposed to the conceptually easy trading rules).

Only one problem stood out: our professor wanted us to use SAS!

SAS is still an industry leader and will surely be seen in our future jobs, so I took the opportunity to take the concept behind the R code and translate it into SAS's language. Though I didn't have time to implement the extra features found in the excellent PerformanceAnalytics package for R, I did manage to replicate the basic rule selection and the calculation of cumulative return and win/loss. With that said, here is the R script I started from:

#inspired by code here: http://moderntoolmaking.blogspot.com/2011/09/backtesting-simple-stock-trading.html
#Set important variables
ticker <- '^GSPC'
start <- as.Date('2005-01-01')
end <- as.Date('2012-04-01')
period <- 10
tradingCost <- .001
riskFree <- .035
#End set important variables
 
library(quantmod)
library(PerformanceAnalytics)
 
bias <- function(x, period){
  ((x - SMA(x, n=period)) / SMA(x, n=period)) * 100
}
 
tradingRule <- function(stock, numDays) {
  close <- Cl(stock)
  volume <- Vo(stock)
  ifelse((bias(close, numDays) + bias(volume, numDays)) > 0, 1, -1)
}
 
 
stock <- getSymbols(ticker, from=start, to=end, auto.assign=FALSE)
position <- tradingRule(stock,period)
 
 
underlyingReturn <- dailyReturn(Cl(stock),type='arithmetic')
 
ruleReturn <- ifelse(Lag(position,k=1)==Lag(position,k=2),underlyingReturn * Lag(position,1),underlyingReturn * Lag(position,1) - tradingCost)
 
 
ruleReturn[1:(period+1)] <- 0;
 
 
names(underlyingReturn) <- 'SP500'
names(ruleReturn) <- 'Trading'
 
charts.PerformanceSummary(cbind(ruleReturn,underlyingReturn), colorset=redfocus)

The R code is shown minimized since it is not the focus here. Keep in mind that R lets us download data from Yahoo, so I use quantmod to do that rather than including the data in a data step as I did in SAS. Here is the SAS code, minus the data step containing the OHLC data, to conserve screen space:

/*Set length of moving average and costs of trading*/
%let n=10;
%let cost=.001;
 
/*Find closing moving average*/
data work.GSPC (drop = s);
	retain s;
	set work.GSPC;
	s = sum(s,Close,-lag&n(Close));
	closeMA = s / &n;
run;
 
/*find volume moving average*/
data work.GSPC (drop=s);
	retain s;
	set work.GSPC;
	s = sum(s,Volume,-lag&n(Volume));
	volumeMA = s / &n;
run;
 
/*find cumulative returns*/
data work.GSPC;
	set work.GSPC;
	dailyReturn = (close - lag(close))/lag(close);
	retain cumulativeReturn 1;
	if _N_ >1 then cumulativeReturn = cumulativeReturn*(1+dailyReturn);
	if dailyReturn > 0 then assetWinLoss = 1;
	else assetWinLoss = 0;
run;
 
/*run trading strategy*/
data work.GSPC (drop=temp);
	set work.GSPC;
	retain flag 0;
	retain tradingReturn 1;
	retain tradingwinloss 0;
	temp = (lag(close)-lag(closeMA))*(lag(volume)-lag(volumeMA)); 
 
	/*Find Portfolio Return*/
	if _N_ > &n then
		if temp > 0 then flag = 1;
		else flag = -1;
	if _N_ >1 then tradingReturn = tradingReturn * (1 + (dailyReturn * flag));
	if lag(flag) ~= flag and _N_ >1 then tradingReturn = tradingReturn * (1-&cost);
 
	/*Make a portfolio to test alpha/beta*/
	portfolioValue = 1000 * tradingReturn;
 
	/*Find win/loss of our strategy*/
	if (dailyReturn* flag) < 0 then tradingWinLoss = 0;
	else tradingWinLoss = 1;
run;
 
 
proc means;
	var tradingWinLoss assetWinLoss;
run;

First of all, our data is entirely self-contained within the SAS code: we use a data step and input the cards directly, rather than downloading quotes programmatically. In the R version, quotes are downloaded straight from Yahoo Finance using the quantmod add-on package. Example:

data GSPC;
	input Date :mmddyy8. Open High Low Close Volume;
	format Date mmddyy10.;
datalines;
1/3/05 1211.92 1217.8 1200.32 1202.08 1510800000
1/4/05 1202.08 1205.84 1185.39 1188.05 1721000000
1/5/05 1188.05 1192.73 1183.72 1183.74 1738900000

To get a moving average, we use a macro variable and retain a rolling sum of the last &n observations. We then divide this total by &n to get the simple moving average, and drop the temporary variable used to hold the rolling sum:

/*Set length of moving average and costs of trading*/
%let n=10;
%let cost=.001;
 
/*Find closing moving average*/
data work.GSPC (drop = s);
	retain s;
	set work.GSPC;
	s = sum(s,Close,-lag&n(Close));
	closeMA = s / &n;
run;

After that, we use the same technique to find the simple moving average of volume. Once those data steps are done, we find the daily and cumulative returns of the underlying asset and its win/loss percentage. We do all of these in separate data steps for ease of debugging.

/*find cumulative returns*/
data work.GSPC;
	set work.GSPC;
	dailyReturn = (close - lag(close))/lag(close);
	retain cumulativeReturn 1;
	if _N_ >1 then cumulativeReturn = cumulativeReturn*(1+dailyReturn);
	if dailyReturn > 0 then assetWinLoss = 1;
	else assetWinLoss = 0;
run;

Our next data step runs the trading strategy. We assume that we can buy at the close instantaneously if our rule is triggered at the end of the trading day. We then set a flag that equals our position for the next day: 1 for long, -1 for short, and 0 for no position. We multiply the underlying asset's return by the flag variable to get our return, then do the same calculation of cumulative return and win/loss percentage:

/*run trading strategy*/
data work.GSPC (drop=temp);
	set work.GSPC;
	retain flag 0;
	retain tradingReturn 1;
	retain tradingwinloss 0;
	temp = (lag(close)-lag(closeMA))*(lag(volume)-lag(volumeMA)); 
 
	/*Find Portfolio Return*/
	if _N_ > &n then
		if temp > 0 then flag = 1;
		else flag = -1;
	if _N_ >1 then tradingReturn = tradingReturn * (1 + (dailyReturn * flag));
	if lag(flag) ~= flag and _N_ >1 then tradingReturn = tradingReturn * (1-&cost);
 
	/*Make a portfolio to test alpha/beta*/
	portfolioValue = 1000 * tradingReturn;
 
	/*Find win/loss of our strategy*/
	if (dailyReturn* flag) < 0 then tradingWinLoss = 0;
	else tradingWinLoss = 1;
run;

Finally, we run a PROC MEANS to report the win/loss percentages for our trading rule and the underlying asset:

proc means;
	var tradingWinLoss assetWinLoss;
run;

Returns Distribution:

In my investments class, we have to produce charts and perform technical analysis. Though quantmod has the mucho excellente chartSeries() function, I can't leave well enough alone and decided to try to write some functions that will draw a chart using ggplot and add technical indicators.

I got basic functionality down, but I want to continue adding things to the function. Call ggChartSeries() and provide an OHLC object from quantmod, along with start and end dates in as.Date() form. It calculates the moving averages first and trims the data series afterward, unlike chartSeries(), which has issues here since it takes pre-trimmed data as its input.

require(quantmod)
require(ggplot2)
 
getSymbols('AAPL')
x<-AAPL
start <- Sys.Date()-200
end <- Sys.Date()
 
#Pass an OHLC object into this function
#also pass two dates formatted as.Date()
ggChartSeries <- function(x, start, end){
 
# the below is done redundantly for ease of maintenance later on
#First, strip OHLC data (need to vectorize)
  date <- as.Date(time(x))
  open <- as.vector(Op(x))
  high <- as.vector(Hi(x))
  low <- as.vector(Lo(x))
  close <- as.vector(Cl(x))
 
#Then build the data frame
  xSubset <-data.frame('date'=date,'open'=open,'high'= high,'low'=low,'close'=close)
 
#We want to construct our candlesticks  
  xSubset$candleLower <- pmin(xSubset$open, xSubset$close)
  xSubset$candleMiddle <- NA
  xSubset$candleUpper <- pmax(xSubset$open, xSubset$close)
  xSubset$fill <- ''
  xSubset$fill[xSubset$open < xSubset$close] = 'white'
  xSubset$fill[xSubset$fill ==''] = 'red'
 
#Add Moving Averages
  xSubset$ma200 <- SMA(xSubset$close, 200)
  xSubset$ma50 <- SMA(xSubset$close, 50)
 
#Trim Data
  xSubset <-subset(xSubset, xSubset$date > start & xSubset$date < end)
 
#Graphing Step
  g <- ggplot(xSubset, aes(x=date, lower=candleLower, middle=candleMiddle, upper=candleUpper, ymin=low, ymax=high)) 
  g <- g + geom_boxplot(stat='identity', aes(group=date, fill=fill))
  g <- g + scale_fill_identity() #use the 'white'/'red' strings as the actual fill colors
  g <- g + geom_line(aes(x=date, y=ma50))+ geom_line(aes(x=date, y=ma200))
  g 
}
 
#call our graphing function
ggChartSeries(AAPL, start, end)

Todo list:

  • Add titles and labeling
  • Add more TA indicators
  • Tweak colors
  • Add/refine options to the function
  • Add volume bars at the bottom

Working on my personal project today, I had to figure out how to sort a data frame, and very little straightforward information exists about doing this, so I ended up piecing it together myself from a gaggle of incomplete internet posts.

The key to doing this is using both order() and with(). order() returns the row indices that put the data in sorted order, and with() lets you refer to the data frame's columns by name while building the new data frame.

My code ended up looking like this:

 

country.trim <- country.trim[with(country.trim, order(-income)),]

I started with my original variable country.trim, a data frame containing the variables country, income, and debt, and I wanted to sort the countries by income in descending order. So what we do is sort of "play back" the existing data frame into its sorted form. Within with(country.trim, ...), we take the order(). order() only sorts in ascending order, so we negate income. order() gives us a vector of the row indices in their new order, and we play that back into the new country.trim variable.

In the more general case, this is what you want to do:

dataFrame.sorted <- dataFrame.original[ with(dataFrame.original, order(sortCriteria1, sortCriteria2,....)) , ]

Each sortCriteria is a column of the data frame; thanks to with(), you refer to it by its bare name, which is why my code above uses order(-income) rather than order(-country.trim$income). You can have as many sort criteria as you want, and remember to prepend a minus sign to any variable that should be sorted in descending order.
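
Here is a self-contained toy example (made-up data, not from my project) showing the same pattern with two sort keys:

#Made-up data frame for illustration
countries <- data.frame(country = c('A', 'B', 'C', 'D'),
                        income  = c(50, 20, 50, 35),
                        debt    = c(10, 40, 5, 25))
#Sort by income descending, breaking ties by debt ascending
countries.sorted <- countries[with(countries, order(-income, debt)), ]
countries.sorted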

The other week, I posted a simple algorithm to figure out Aumann-Serrano riskiness. The algorithm is slow and not very inventive, so I have been brainstorming all week how to improve it.

Convergence of the A-S riskiness calculation for weekly AAPL returns

Since we know exactly the value we are trying to reach (the expectation must equal 1) and the bounds on the output, I figured we could converge on the solution from both sides and arrive at it much more quickly.

Thus, I redesigned the algorithm to bounce back and forth between the max and min values, halving the search interval on each iteration. Here is the source code for my redesigned version of asRisk(). As always, feed it a vector of possible returns. Read the rest of this entry »
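
In outline, the bisection loop looks something like this (a simplified sketch of the approach with hypothetical names, not the full function behind the cut):

#Sketch of a bisection-based asRisk(); names and tolerances are illustrative
asRiskBisect <- function(x, tol = 1e-8, maxIter = 200) {
  #Same guards as the original: positive expected value and at least one loss
  if (mean(x) <= 0 | min(x) >= 0) return(0)
  lo <- 1e-12                                 #E[exp(-x/R)] is huge for tiny R
  hi <- 1
  while (mean(exp(-x / hi)) > 1) hi <- hi * 2 #grow hi until the root is bracketed
  for (i in seq_len(maxIter)) {
    mid <- (lo + hi) / 2
    #E[exp(-x/R)] crosses 1 exactly once as R grows, so keep the root bracketed
    if (mean(exp(-x / mid)) > 1) lo <- mid else hi <- mid
    if ((hi - lo) < tol) break
  }
  (lo + hi) / 2
}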

Recently I read an article by Megan McArdle about income inequality, in which she speculated that the income share of the top 1% gets worse during a recession. She posted a graph:
Income share of the top 1% in the US from 1913
I wasn't a fan of the graph. The ticks distracted from the data being presented, and recessions were not highlighted on the graph, as they are on graphs from places like FRED. Fortunately, the data behind the graph is available, so we can take a run at it using R and ggplot. Read the rest of this entry »

I completed a preliminary function to calculate Aumann-Serrano riskiness in R:

asRisk <- function(x){
  if (mean(x) <= 0 | min(x) >= 0){
    return(0) #If expected value is not positive or there are no negatives, return 0
  } else {
    asNumber <- 0.00001
    total <- 2
    while (total > 1){
      total <- mean(exp(-x/asNumber)) #E[exp(-g/R)]
      asNumber <- asNumber + .00001
    }
    return(sprintf("%.5f", asNumber))
  }
}

To use this function, input a vector of returns. If AS riskiness cannot be calculated, the function will return "0". If the gambles qualify, it will calculate AS riskiness to 5 decimal places. If more or fewer are desired, you can change the step size (.00001) and the format string in sprintf().

Generally, to use this, I would recommend using one of the functions in quantmod such as weeklyReturn() or dailyReturn(). An example of this would be

asRisk(dailyReturn(AAPL['2010']))

This example will return the AS riskiness of AAPL stock in 2010. quantmod builds on the xts package, which allows a lot of quick and powerful date-based subsetting.
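
For example, the same bracket notation handles years, months, and ranges (assuming AAPL has already been loaded with getSymbols()):

AAPL['2010']             #all of 2010
AAPL['2010-06']          #June 2010 only
AAPL['2010-01::2010-06'] #January through June 2010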

In the future I will add errors/warnings, and maybe make precision adjustable or switch to a solver package. I also need to revise the function to meet a few more of my design parameters. This seems to be a good start and perfect for my research project next semester though!

Currently, I am in need of a function that solves for Aumann-Serrano riskiness. The AS riskiness paper is a favorite academic paper of mine; it establishes a new measure of riskiness developed in response to the financial crisis.

AS riskiness supposes that for each gamble g, there exists a unique positive number R(g) that satisfies \mathrm{E}\left[e^{-g/R(g)}\right] = 1

There are a few conditions though:

  • Gambles must have a positive expected value (you can't use it at Vegas)
  • Gambles must include one negative outcome (you must have skin in the game)
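
As a quick sanity check on the definition, consider a made-up gamble paying +120 or -100 with equal probability; its R(g) can be found numerically (a sketch using base R's uniroot(), not my eventual function):

g <- c(120, -100)          #hypothetical gamble: +120 or -100, equal odds
f <- function(R) mean(exp(-g / R)) - 1
uniroot(f, c(1, 1e6))$root #the unique positive root is R(g)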

Read the rest of this entry »

HTC from 12-2011 to 2-2012

 

When doing research on foreign equities, I always use quantmod and R to get quotes. Google does not usually support CSV downloads of foreign quotes, but in almost every case, Yahoo does. The getSymbols() function in quantmod is fully equipped for this, except for one crucial problem: foreign exchanges often use numbers rather than alphabetical identifiers for ticker symbols, especially in Asia. Examples of this are HTC in Taiwan (2498.TW), NCSoft in Korea (036570.KS), and Ping An in Hong Kong (2318.HK). Read the rest of this entry »
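
One way around this (a sketch of the general idea, not necessarily the approach in the full post) is to bypass getSymbols()'s auto-assignment, since a name like 2498.TW is not a valid R object name:

require(quantmod)
#Return the xts object instead of auto-assigning it to a '2498.TW' variable
#(assumes Yahoo still serves this ticker)
htc <- getSymbols('2498.TW', src = 'yahoo', auto.assign = FALSE)
head(Cl(htc))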

Recently, in my financial statements analysis class, I had to perform a valuation of Apple Inc. with a number of different valuation methods.  One of the things that made valuation simpler is the lack of long-term debt on Apple's balance sheet.  This simple fact means that Apple's WACC is equal to the cost of equity.

To find the cost of equity, I use CAPM, which states

E(R_i) = R_f + \beta_i (E(R_m) - R_f)

where E(R_i) is the expected return on the asset after accounting for the market risk premium.  To find the component pieces R_f, E(R_m), and \beta_i, I will use R with the quantmod package, and I will also use the PerformanceAnalytics package, although I will show you how to avoid using it if you choose.

The source code for the project:

#Packages required
require(PerformanceAnalytics)
require(quantmod)
require(car)
 
#Here we get the symbols for the SP500 (GSPC), AAPL, and 5yr Treasuries (GS5)
getSymbols("^GSPC", src = "yahoo", from = as.Date("2008-01-01"), to = as.Date("2011-12-31"))
getSymbols("AAPL", src = "yahoo", from = as.Date("2009-01-01"), to = as.Date("2011-12-31"))
getSymbols("GS5", src = "FRED", from = as.Date("2008-12-01"), to = as.Date("2011-12-31"))
 
#Market risk R_m is the arithmetic mean of SP500 from 2009 through 2011
#Riskfree rate is arithmetic mean of 5yr treasuries
marketRisk<- mean(yearlyReturn(GSPC['2009::2011']))
riskFree <- mean(GS5['2009::2011'])/100 #GS5 is quoted in percent, so convert to a decimal
 
#My professor advised us to use weekly returns taken on wednesday
#so I take a subset of wednesdays and use the quantmod function
#weeklyReturn()
AAPL.weekly <- subset(AAPL,weekdays(time(AAPL))=='Wednesday')
AAPL.weekly <- weeklyReturn(AAPL.weekly['2009::2011'])
GSPC.weekly <- subset(GSPC,weekdays(time(GSPC))=='Wednesday')
GSPC.weekly <- weeklyReturn(GSPC.weekly['2009::2011'])
 
#Here I use PerformanceAnalytics functions for alpha+beta
#Then we calculate Cost of equity using our calculated figures
AAPL.beta <- CAPM.beta(AAPL.weekly,GSPC.weekly)
AAPL.alpha <- CAPM.alpha(AAPL.weekly,GSPC.weekly)
AAPL.expectedReturn <- riskFree + AAPL.beta * (marketRisk-riskFree)
 
#For my graph, I want to show R^2, so we get it from the
#lm object AAPL.reg
AAPL.reg<-lm(AAPL.weekly~GSPC.weekly)
AAPL.rsquared<-summary(AAPL.reg)$r.squared
 
#Lastly, we graph the returns and fit line, along with info
scatterplot(100*as.vector(GSPC.weekly),100*as.vector(AAPL.weekly), smooth=FALSE, main='Apple Inc. vs. S&P 500 2009-2011',xlab='S&P500 Returns', ylab='Apple Returns',boxplots=FALSE)
text(5,-10,paste('y = ',signif(AAPL.alpha,digits=4),' + ',signif(AAPL.beta,digits=5),'x \n R^2 = ',signif(AAPL.rsquared,digits=6),'\nn=',length(as.vector(AAPL.weekly)),sep=''),font=2)

The code is commented, but I will make some additional comments on specific sections to explain the process for those unsure. I apologize for my unstandardized variable names as well!

First of all, I use the getSymbols() function, which supports a few data sources. In this example, I use Yahoo data for the equities and FRED for information on 5yr Treasuries. For reference, the ticker for retrieving the S&P 500 on Yahoo is "^GSPC", and the FRED code for 5yr Treasuries is "GS5". The other symbols should be self-explanatory.

Next is the issue of regression parameters. To find alpha and beta, I use the CAPM functions of PerformanceAnalytics, but to find R^2 I read it out of the regression object using

AAPL.reg <- lm(AAPL.weekly~GSPC.weekly)
AAPL.rsquared <- summary(AAPL.reg)$r.squared

It is possible to do the same for beta and alpha; I did not, because I did not originally set out to find R^2 and turned to PerformanceAnalytics out of convenience.
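
For completeness, this is roughly how it would look (a short sketch; coef() on the lm fit returns the intercept, which is alpha, followed by the slope, which is beta):

#Hypothetical names, to avoid clobbering the PerformanceAnalytics results
AAPL.coefs <- coef(AAPL.reg)
AAPL.alpha2 <- AAPL.coefs[1] #intercept = alpha
AAPL.beta2 <- AAPL.coefs[2]  #slope = beta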

Finally, I graphed the results and regression line for the benefit of my teacher, the results of which can be seen here:

S&P500 vs. Apple, 2009-2011