Thursday, May 23

Basics of R

Objects - variables

Use c() to create a collection of values and assign it to a variable.

metallicaNames<-c("Lars", "James", "Kirk", "Rob")
metallicaAges<-c(47, 47, 48, 46)

Dataframes

A dataframe is similar to a spreadsheet: it is an object that contains several other objects. To combine different objects into a dataframe:

metallica<-data.frame(Name=metallicaNames, Age=metallicaAges)

To add a column to a dataframe

metallica$childAge<-c(12, 12, 4, 6)

To see the column names of a dataframe

names(metallica)

Create a list: a collection of separate objects


> metallica2<-list(metallicaNames, metallicaAges)
> metallica2
[[1]]
[1] "Lars"  "James" "Kirk"  "Rob"

[[2]]
[1] 47 47 48 46

> metallica2[1]
[[1]]
[1] "Lars"  "James" "Kirk"  "Rob"

> metallica2[2]
[[1]]
[1] 47 47 48 46

Dates

To create date objects, use as.Date()

birth_data<-as.Date(c("1977-07-03", "1969-05-24", "1973-06-21", "1970-07-16", "1949-10-10", "1983-11-05", "1987-10-08", "1989-09-16", "1973-05-20", "1984-11-12"))

Coding variable

A coding variable indicates which group a participant belongs to, such as a "Tablet" or a "Phone" group. First we create a collection of numbers, one number per group, and then turn it into a factor with labelled levels using factor().


> job<-c(rep(1, 5), rep(2, 5))
> job
 [1] 1 1 1 1 1 2 2 2 2 2
> job<-factor(job, levels=c(1:2), labels=c("Lecturer", "Student"))
> job
 [1] Lecturer Lecturer Lecturer Lecturer Lecturer Student  Student  Student
 [9] Student  Student
Levels: Lecturer Student

An alternative way to create coding variables

job<-gl(2, 5, labels=c("Lecturer", "Student"))

Importing data

csv -> dataframe
lecturerData2<-read.csv("Lecturer Data.csv", header=TRUE)

dat or txt -> dataframe
lecturerData2<-read.delim("Lecturer Data.dat", header=TRUE)

To change the working directory, use setwd("xx/xx")

Manipulating data

Select part of the data

newDataf <- oldDataf[rows, columns]

lecturerPersonality <- lecturerData[, c("friends", "alcohol", "neurotic")]
lecturerOnly <- lecturerData[lecturerData$job=="Lecturer",]
alcoholPersonality <- lecturerData[lecturerData$alcohol > 10, c("friends", "alcohol", "neurotic")]

Stack the dataframe (wide -> long)

Select the columns to be stacked on top of each other


satisfactionStacked<-stack(satisfactionData, select=c("Satisfaction_Base", "Satisfaction_6_Months", "Satisfaction_12_Months", "Satisfaction_18_Months"))



Sunday, October 7

Measure cache parameters

1. Use gettimeofday()
Note from the Unix manual: if you use it in C, you have to write "struct" before "timeval xx" (that is, struct timeval xx). Also, include <sys/time.h> instead of <time.h>, because the former is where struct timeval gets defined.

2.

Tuesday, October 2

Step by step https client/server building

Server side with Apache, MySQL and Python scripting on an EC2 Ubuntu Server 12.04 instance; client side with Android 2.3 and its built-in SQLite. Both sides support secure TLS connections.

I will go directly into the topic, starting with the server side. For a basic http server on ec2 using apache, refer here. Note that the environment in that article is actually different: instead of building apache from source, this time I install it directly with the command apt-get install apache2. It surprisingly takes good care of all the details and works well, at least for now. The configuration differs between the two installations, and I suggest you use apt-get.

When the installation is done, apache is already up and running. The default http configuration file is located at /etc/apache2/sites-available/default. If you open http://localhost you should see the "It works" page. This page, index.html, is located in /var/www, which serves as your site: you can put all your html files there, and if your computer has a public ip address, others can see your site by accessing that address. An easier way to test whether the server is working is the curl command: curl http://localhost returns the response of the server, which in this case is the default index.html. Curl is handy for testing server responses because you don't need any other client to fire the request.

Now let's go into tls. I assume you already have the key file and the cert file on your server. Put them into /etc/ssl/private and /etc/ssl/certs respectively; these are the default directories where apache looks for key and cert files. Then follow this excellent doc to set up the ssl module for apache. There is a default ssl configuration file you can customize, /etc/apache2/sites-available/default-ssl, which includes the document directory for ssl requests and so on. The default directory is the same as for plain http, which is /var/www. Put a different index.html in it and test with curl -k -3 https://localhost, where -k tells curl to skip verifying the server certificate and -3 forces SSL version 3. This will return that page, so you know you are on an https connection.

Ok, now we have this ssl server working pretty well. There is one more step on the server side, which is to add a handler to deal with different requests. So far all we can request is the default page. We want more. In particular, I need a handler that takes a POST request, extracts its data, and puts the data into a table in the mysql db on the server.

First, install mysql: apt-get install mysql-server mysql-client. Note that on ec2 you can run sudo su directly, without typing any password, to become root. This just makes things easier, because most of the configuration and commands here need root. Then I install the mysql python interface, python-mysqldb; you could install whatever you like, php, etc.

Next we will see how to use a python script to handle http/https requests. We use CGI (Common Gateway Interface), which is a way to make executable files such as scripts requestable from the client side. The default directory for cgi scripts is /usr/lib/cgi-bin. Put your scripts there, make them executable, and they should be immediately available for http requests. Here is my echo python script:

Basically, for a POST request with several key/value pairs, it prints the number of pairs and then every pair. It uses a python module called cgi; cgitb is another module that enables debugging output. Note that line 9 is necessary because it tells the server and client that what follows is valid html text; otherwise the client would probably throw an "invalid response" error. In fact CGI is not the best way to serve script requests, and it gets highly unstable when scripts get complex, but it is the easiest way to get going. Now run curl --data "key1=value1&key2=value2" https://localhost/cgi-bin/yourscript.py and it should return the result of the script. Note that --data is how you send a POST request via curl.
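The original script is not reproduced above, so here is a minimal sketch of what such an echo handler could look like. It is not the author's script: it assumes Python 3, and because the cgi module was removed from the standard library in Python 3.13, it parses the form body with urllib.parse instead. The function name render_echo and the exact HTML it prints are made up for illustration.

```python
#!/usr/bin/env python3
import html
import os
import sys
from urllib.parse import parse_qsl


def render_echo(body):
    """Build a CGI response reporting the number of pairs and every pair."""
    pairs = parse_qsl(body, keep_blank_values=True)
    lines = [
        "Content-Type: text/html",  # tells server and client that HTML follows
        "",                         # a blank line ends the headers
        "<p>%d pairs received</p>" % len(pairs),
    ]
    for key, value in pairs:
        # Escape the values so they cannot inject markup into the page.
        lines.append("<p>%s = %s</p>" % (html.escape(key), html.escape(value)))
    return "\n".join(lines) + "\n"


# Under CGI the server sets REQUEST_METHOD, passes the POST body on stdin,
# and gives its size in CONTENT_LENGTH.
if __name__ == "__main__" and os.environ.get("REQUEST_METHOD") == "POST":
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    sys.stdout.write(render_echo(sys.stdin.read(length)))
```

Keeping the formatting in a plain function means it can be tried from a python prompt without a running server, for example render_echo("key1=value1&key2=value2").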

At this point the server setup should have reached a happy ending. Now let's look at the client side. The very first thing you need to do in your android project is to include the cert file of your server, maybe in /res/raw; it is required for the tls connection. Details could be found here.

Assuming you know how to use httpClient in android, your server and client should now be able to connect. Have fun!






Monday, September 24

Apache http server on ec2

Now I am getting a little bit involved with the server side: I have to set up a server for our projects. While the machine is not ready, I decided to first try setting up the server on the aws ec2 platform, just to have a taste.

I chose Apache because I know nothing about running a server, and this is the word I hear most of the time when people talk about server stuff. It is not hard to install at all, but I took some wrong turns, turns where some guidance would have been appreciated.

Therefore, the following is a walk through setting up httpd on ec2.

The first package to download is of course apache httpd itself, 2.4.3 to date. Extract it into a folder, say ~/httpd. Note there is a subfolder named ~/httpd/srclib, which will be used later.

Second, we need APR and APR-Util. They are required for the httpd installation, and ec2 doesn't have them (at least ubuntu server doesn't). Extract them into the subfolders ~/httpd/srclib/apr and ~/httpd/srclib/apr-util respectively. This tells httpd to install them along the way if the system does not have them already.

One side note: during the install process the system will probably ask you for the root password, which you don't have if you are using an ec2 instance. Don't worry, just set it with the command sudo passwd root. Set your password and you are good to go.

Before going into the installation, install PCRE (Perl-Compatible Regular Expressions library). If you are using the same server as me, just type sudo apt-get install libpcre3-dev.

Now do the old trick: ./configure, make, sudo make install. Note that you should add --with-included-apr to ./configure so that it looks in the srclib folder we prepared for apr and apr-util.

The make and install commands will take some time, so relax and waste your time on some stupid videos, like this one, which I quite like.

After installation, use apachectl -k start and apachectl -k stop to test the server. If the install went correctly, then once the server is started, curl http://localhost will get you the 'it works' html page, which tells you everything is good. Use locate if you cannot find apachectl.

Thanks for watching. I am talking about the video...

Friday, September 7

http/tls connection in python, android, EC2

I have a side project to build a standard tls package for the team. I've never tried sockets, so I start with python just to get a feel. The following is my own experiment, connecting server side code on EC2 with client side code on my local laptop.

1. simple http
I use the sample code from the official doc. It is really simple. All you need to do other than the code is to configure the port for the EC2 instance.

For the instance you are running, configure its security group so that the specific port you want to communicate on is open (2727 in my case).
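The sample code itself is not shown above, so here is a minimal sketch of the same kind of experiment using the standard socket module: an echo server and a client talking over a plain TCP connection. To stay self-contained it runs both ends on 127.0.0.1 and lets the OS pick a free port; the function names are made up for illustration. On EC2 the server would instead bind to a fixed port (such as 2727) that the security group allows.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client and echo back everything it sends."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # client finished sending
                break
            conn.sendall(data)


def echo_roundtrip(message, host="127.0.0.1", port=0):
    """Start a one-shot echo server, send message to it, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind((host, port))   # port 0: let the OS choose a free port
        server_sock.listen(1)
        actual_port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        chunks = []
        with socket.create_connection((host, actual_port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # signal that we are done sending
            while True:
                data = client.recv(1024)
                if not data:
                    break
                chunks.append(data)
        worker.join()
    return b"".join(chunks)


reply = echo_roundtrip(b"Hello, world")
```

In the real setup the two halves live in separate scripts, one on the EC2 instance and one on the laptop, and the client connects to the public address of the instance.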

2. simple https
Things get rough with security. Basically, what I know about https, i.e. http over tls, is that it uses public key cryptography to secure the http communication. The server has a private key, which is known only to itself. It also has a corresponding public key, ready to distribute to anyone who needs to communicate with it. In order for the other side to trust it, the server has to have its public key certified by a trusted 3rd party, called a Certificate Authority. The same goes for the client side.

However, if we just want a connection between our own server and client, we can generate the keys and certificates ourselves without paying for a CA cert file. This is called a self-signed certificate; a root CA certificate is self-signed in the same way.


openssl req -new -x509 -days 365 -nodes -out cert.pem -keyout cert.pem


If you have openssl installed on your computer, you can use it to generate the keys. In this case, I generate the private key and the certificate in the same file. Then I just copy it to the other side, so both sides use the same keys. This keeps things simple here; in most real cases you would want a proper authority to certify the key for you.

On both sides I again use the sample scripts from the official doc. Note that for the https connection you also have to open the port on the ec2 instance.

3. simple https with android
With android things are a bit more complex with certificates. I have this cert.pem file, which is not enough for android. Bouncy Castle encryption is well supported by android, so it is the one we are going to use to generate the client side key file.

First, install Bouncy Castle. Note that android uses a different version of it, version 145, not the 146 from the official site. Find one and download it; it is a jar file. Put it in the directory '/usr/libexec/java_home/lib/ext', which on mac should be '/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/ext'. Second, add the following line to the java.security file, also located in the lib folder:
Then put it on the classpath; I did this in eclipse. Done.

With keytool on your machine, do the following with the cert.pem file:
Now you will have the mykeystore.bks file in the raw directory. I use a der file here because android returns a 'wrong version of certificate' error. To generate the der file from the pem:


We are almost done here. Just grab any sample code for an https connection in android, using either HttpURLConnection or HttpClient, put the correct password and file name in place, and everything should be fine.



Wednesday, August 22

iterate, iterate, iterate!

Gather around and learn some iterate stuff again. Great great tutorials here, here, and especially here.

1. iter()
This is a function that takes in an object and returns its iterator. It corresponds to the __iter__() method in the class definition. We use it as:
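The snippet is missing from the post as it stands, so here is a small sketch of the call, assuming Python 3 (where the built-in next() is used to pull values out of the iterator). The list contents are just sample data.

```python
names = ["Lars", "James", "Kirk", "Rob"]

it = iter(names)       # equivalent to calling names.__iter__()
first = next(it)       # "Lars"
second = next(it)      # "James"
```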

An iterator supports the next() method (spelled __next__() in Python 3, where you call it through the built-in next()), which returns the next element of the object. An object that supports both __iter__() and next() is an iterator; an object that only supplies __iter__() is an iterable. There are two different scenarios when we call __iter__() on an object:

  1. the class itself has implemented next() and __iter__(). In this case __iter__() would most likely just return the object itself: the object is the iterator;
  2. __iter__() returns an object of another class, which should be an iterator class. In this case, the object on which __iter__() is called does not implement next(); only the iterator class does.
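The two scenarios can be sketched with a pair of toy classes. This assumes Python 3, so the method is named __next__(); the class names Countdown, Band and BandIterator are invented for the example.

```python
class Countdown:
    """Scenario 1: the object is its own iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                    # the object itself is the iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class BandIterator:
    """Iterator class used by Band in scenario 2."""

    def __init__(self, members):
        self._members = members
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._members):
            raise StopIteration
        member = self._members[self._index]
        self._index += 1
        return member


class Band:
    """Scenario 2: iterable only; __iter__() hands out a separate iterator."""

    def __init__(self, members):
        self.members = list(members)

    def __iter__(self):
        return BandIterator(self.members)


countdown = Countdown(3)
band = Band(["Lars", "James", "Kirk", "Rob"])
```

One practical difference: band can be walked again and again, because each for loop gets a fresh BandIterator, while countdown is used up after one pass.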

2. iterator
As described above, an iterator is an object that supports next(). Note that objects that support __iter__() are called iterable, for being able to return an iterator.

Before getting to the iterator, let's meet the container. Container most of the time refers to an abstract data type, meaning a collection of arbitrary objects. Lists, dictionaries and arrays could all be called containers. So what does an iterator do? As far as I know, an iterator is an efficient way to walk through all the objects in one container.

Instead of handing over all the objects at once, an iterator picks up only one object at a time, each time next() is called, until it hits the end of the container, where it raises a StopIteration exception. The for statement in python also creates an iterator automatically.

The life of an iterator is limited. It can only be iterated once, after which it raises StopIteration whenever next() is called.
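Roughly what the for statement does behind the scenes can be written out by hand. This is a sketch assuming Python 3; the helper name walk is made up.

```python
def walk(container):
    """Do by hand what `for item in container` does automatically."""
    seen = []
    iterator = iter(container)
    while True:
        try:
            item = next(iterator)
        except StopIteration:      # end of the container reached
            break
        seen.append(item)
    return seen, iterator


ages, spent = walk([47, 47, 48, 46])

# The iterator is now used up: asking again just raises StopIteration.
try:
    next(spent)
    still_alive = True
except StopIteration:
    still_alive = False
```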

3. generator
The common case for an iterator would be: the iterator returns one object at a time, then the program does some stuff with that object, until the objects are exhausted. In fact, most built-in data types in Python support returning an iterator, as described here.

A generator, on the other hand, could be considered an iterator with the ability to do stuff to the objects: it picks up one object at a time from a container, then instead of returning it directly, it does some stuff and returns the result of that process. It makes the code more compact when you want to do complicated stuff to the objects.
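A quick way to see the difference is to put a plain iterator next to a generator expression over the same list. This is a sketch assuming Python 3, with sample data; upper-casing stands in for "doing some stuff".

```python
names = ["Lars", "James", "Kirk", "Rob"]

plain = iter(names)                           # hands back each object untouched
shouted = (name.upper() for name in names)    # processes each object first

first_plain = next(plain)        # "Lars"
first_shouted = next(shouted)    # "LARS"
rest_shouted = list(shouted)     # the remaining names, upper-cased on demand
```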

What is yield?

4. yield
Most generators are created using yield. Yield often appears where a return would otherwise sit. Instead of simply returning whatever follows it, yield does something more interesting: when the running code first meets yield, it hands back the value of the expression following it; at that point the generator code is suspended, i.e. paused, but its whole environment and all its variables survive; the next time a value is requested from the generator, it resumes where it left off and continues until it hits yield again; this repeats until the container is exhausted.

Take a square generator consumed by a for statement: every time the loop asks for a value, execution goes into the generator, runs until yield, hands back the squared result, and gets back to the for statement.
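The original listing is not shown here, so this is a sketch of such a square generator, assuming Python 3. The log list is only there to make the back-and-forth between the loop and the generator visible.

```python
log = []


def square(numbers):
    for n in numbers:
        log.append("generator: squaring %d" % n)
        yield n * n      # pause here; n and the loop position survive
        log.append("generator: resumed after %d" % n)


for result in square([1, 2, 3]):
    log.append("for loop: got %d" % result)
```

Reading log afterwards shows the two pieces of code taking turns: the generator runs up to yield, the loop body runs, and then the generator picks up right after the yield.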

The advantage the generator brings is that the values it generates are created on the fly. This is different from first creating all the objects and then returning them all at once, or returning them one at a time (iterator!). A generator is dynamic. Sometimes you don't know what to create until you actually run into the situation (gimme an example!).

One last note: both iterators and generators can only be walked once. Here is a subtle bug from misusing a generator.
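One common form of this bug, which may or may not be the one in the linked example, is consuming the same generator twice. A sketch assuming Python 3:

```python
def squares(numbers):
    for n in numbers:
        yield n * n


values = squares([1, 2, 3])

total = sum(values)                    # walks the generator to the end
largest = max(values, default=None)    # nothing left: silently gets the default

# Fix: keep the results in a list if they are needed more than once.
kept = list(squares([1, 2, 3]))
fixed_total = sum(kept)
fixed_largest = max(kept)
```

No exception is raised on the second pass, which is what makes the bug subtle: the generator simply looks empty.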

5. why bother?
So why use iterators and generators? One major concern is the case where a container holds so many objects that putting them all in memory would overflow it, or be super slow. If you will only use each object once, or you only want one particular object, iterators and generators are more efficient. They also produce clearer and more concise code.

By the way, I just learned that this is actually the predecessor of python. Looking good.
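A rough way to see the memory argument, sketched under the assumption of Python 3 (CPython). Note that sys.getsizeof only measures the list object itself, not the integers it points to, so the real gap is even larger than the numbers suggest.

```python
import sys


def squares(limit):
    for n in range(limit):
        yield n * n


as_list = [n * n for n in range(100000)]    # every value exists in memory
as_generator = squares(100000)              # values are produced on demand

list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)

# Wanting one particular object: stop as soon as it is found.
first_big = next(value for value in squares(100000) if value > 50)
```

The last line only computes squares up to the first match and never touches the rest of the range.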

Tuesday, July 24

Euler project: Ulam spiral

Spiral like this:

This spiral is famous because prime numbers tend to line up along the diagonals of the square.

The Euler problem asks for the sum of all the numbers on the two main diagonals, like the red ones above. The prime property cannot be used, because not all diagonal positions hold prime numbers. The hint is simply that the length of each straight run between turns in this spiral goes like: 1, 1, 2, 2, 3, 3, 4, 4,...

Since every turn also lies on a diagonal, except the first one, 2, using this length growth to build the sum is easy:

The first turn of every odd length needs special care, because the actual diagonal number is the one just before the turn number. Also, note that the last turn is not included, so that the spiral forms the required square.
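The original code is not shown above, so here is a sketch of the turn-counting idea as described, assuming Python 3 and a clockwise spiral that starts with 1 in the centre and first moves right. The function name is made up.

```python
def spiral_diagonal_sum(side):
    """Sum both diagonals of a side x side number spiral (side must be odd)."""
    if side < 1 or side % 2 == 0:
        raise ValueError("side must be a positive odd number")
    last = side * side        # largest number in the square
    total = 0
    turn = 1                  # number at the current corner, starting at the centre
    length = 1                # run lengths go 1, 1, 2, 2, 3, 3, ...
    while True:
        for leg in range(2):  # every length is used for two runs
            turn += length
            # The first run of each odd length overshoots the diagonal by one.
            if length % 2 == 1 and leg == 0:
                diagonal = turn - 1
            else:
                diagonal = turn
            if diagonal > last:   # the next corner falls outside the square
                return total
            total += diagonal
        length += 1


small = spiral_diagonal_sum(5)   # 1+3+5+7+9+13+17+21+25 = 101
```

The very first "turn" handled this way is 2, whose corrected value is the centre 1, so the centre is counted exactly once.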