Bites to reality: June 2012

Thursday, June 28

Django auth pt1

Nowadays, to use a Google API properly in your web application, you most likely have to go through its own authentication scheme, OAuth 2.0. They say it is simple and easy; to a web dev newbie it is not. Anyway, I failed to integrate it with my web app, so I decided to learn auth from scratch, and I hope one day my web app will be popular enough to need an authentication system (I mean it!).

Django has its own built-in auth package, which is ideal for learning the concepts. Most importantly, built-in packages like this come with built-in views, which can save you a lot of time when getting started.

Let's see how to use the built-in views: it is really simple.

In urls.py, we define three extra patterns besides the admin ones. One is for the index view, the home page, which requires login to see its content. The other two use the built-in views for login and logout. Note that for the login view you have to define a template to make it work, unless you want to copy the default template from the Django package; for logout I use the logout_then_login view, which is pretty self-explanatory. Since it redirects to the login page automatically, we don't have to add any template for this view.

Now take a look at views.py:

One thing to know: the @login_required decorator indicates that this view needs authentication. If the user is not currently logged in, it takes the user to the login view we defined above; if already logged in, it renders the index view as requested.

I don't include the template it uses because you can find it in the Django docs. Only one thing is a little tricky: the default login view passes several variables to the template, including one called next. This contains the URL that the app will redirect to after a successful login, which should be the URL that triggered the login (in our case "/" for the index view). To make the redirect happen, the template has to send next back along with the login data, like this if your login page uses a form: <input type="hidden" name="next" value="{{ next }}"/>.

Lastly, remember to add a link to the logout URL in your index page to complete the whole login/logout process.

All done.

Euler project 12: triangle number

I am writing this only because I find this simple math trick is like magic.
A triangle number is the sum of all natural numbers up to n, like the additive analog of the factorial: Tn = 1 + 2 + 3 + ... + n. The problem asks: what is the first triangle number with more than 500 divisors?

There are several ways to generate a triangle number. One is to sum the terms directly; another is to use the closed-form formula Tn = n*(n+1)/2. However, since what we need here is to iterate over all triangle numbers starting from 1 (or some bigger yet still small number), I add one step each time using Tn = Tn-1 + n. I think the latter two should run at about the same speed.

The other part is counting the divisors of a number. You only need to iterate from 1 to sqrt(n), because divisors come in pairs: if d divides n, so does n/d, and one of the two is at most sqrt(n). So you add two every time you find a divisor, but subtract one if the divisor happens to be exactly sqrt(n).

Codes:
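As a minimal sketch of the approach described above (the function names count_divisors and first_triangle_with_more_than are illustrative, not taken from the original solution, and math.isqrt assumes Python 3.8 or later):

```python
from math import isqrt


def count_divisors(n):
    """Count the divisors of n by testing candidates up to sqrt(n)."""
    count = 0
    root = isqrt(n)
    for d in range(1, root + 1):
        if n % d == 0:
            count += 2          # d and n // d form a pair
    if root * root == n:
        count -= 1              # sqrt(n) was counted twice
    return count


def first_triangle_with_more_than(limit):
    """Return the first triangle number with more than `limit` divisors."""
    n, triangle = 1, 1
    while count_divisors(triangle) <= limit:
        n += 1
        triangle += n           # T_n = T_(n-1) + n
    return triangle


print(first_triangle_with_more_than(5))   # 28, the example from the problem statement
```

Calling first_triangle_with_more_than(500) answers the actual question; it takes noticeably longer because every candidate is scanned up to its square root.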

Monday, June 25

GRE helper project update 1

UPDATE: I cut this project. It turns out giving her a list generated by a Python script is much faster. Not everything needs a web app.

I started yet another Django project on Heroku today. It's a rather simple one, just to help my girlfriend memorize vocabulary and prepare for her GRE test.

The GRE is difficult for people whose first language is not English. The test involves so much non-everyday English. You have to memorize thousands of words, as if hardcoding them in your brain, just to understand what the test is talking about, not to mention the other requirements. A typical GRE preparation involves at least 4 months of memorizing words over and over. I have been through this before; it's a very stressful and boring period, especially when you look back after several weeks and realize you don't remember one damn word. Sadly I can't turn it into a happy thing; however, I think I can make it efficient and maybe shorten its duration somehow.

What I want it to do is let you type in how many vocabulary lists you have (do you want to know how many words Chinese students have to memorize for just a single test?), then generate a calendar filled with each list as a task, distributed according to a predefined memory curve. Basically a customizable repetitive task generator hooked up to Google Calendar. I know it is still dull, but organized and well-executed repetition can really cut the time you need to reach your goal. As for the calendar, I don't want to reinvent such a good calendar system; besides, the Google Calendar API has good Python support.
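To make the idea concrete, here is a rough sketch of the task generator in plain Python, without the Google Calendar part. The review offsets in REVIEW_OFFSETS are an assumption made up for illustration (the actual memory curve is not specified here), as are the names build_schedule and REVIEW_OFFSETS.

```python
from datetime import date, timedelta

# Hypothetical review offsets, in days after a list is first studied.
REVIEW_OFFSETS = (0, 1, 2, 4, 7, 15)


def build_schedule(num_lists, start, offsets=REVIEW_OFFSETS):
    """Map each date to the vocabulary lists due on that day.

    List k (1-based) is first studied on day k - 1 after `start`,
    then reviewed again at every offset.
    """
    schedule = {}
    for k in range(1, num_lists + 1):
        first_day = start + timedelta(days=k - 1)
        for offset in offsets:
            day = first_day + timedelta(days=offset)
            schedule.setdefault(day, []).append(k)
    return schedule


for day, lists in sorted(build_schedule(3, date(2012, 6, 25)).items()):
    print(day.isoformat(), lists)
```

Each (date, lists) pair would then become one calendar event; starting one new list per day is just the simplest possible pacing.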

Start with the model. It is pretty simple and common, but I intend to add maybe a list as an instance variable so that every vocabulary list can have a list of dates on which it needs to be memorized. For the progress field, which is supposed to stay within 0 to 100, I add two validators from django.core.validators to keep it in range.

I make just one page, with an input area where you type in how many lists you have, and a display area to show the calendar. The input's submit button issues an HTTP POST request to the server, along with some parameters, including the number of lists. Note that Django rejects POST requests that lack a valid CSRF token, to prevent CSRF attacks. To make your POST form work:

add 'django.middleware.csrf.CsrfViewMiddleware' to MIDDLEWARE_CLASSES in your settings.py (this should be there by default in the latest version)
right after any <form> tag that involves a POST action, add {% csrf_token %}
in the view associated with this POST request, add context_instance=RequestContext(request) as a parameter when building your HTTP response.

In the view, on receiving a POST request, I generate List objects according to the POST parameters. Note that when you create and save a Django model object, Django automatically generates a unique id as its primary key, so you don't have to worry about it. When you create objects for your model in the admin site, you also don't have to give their ids explicitly. Therefore, if you want to control the ids of your objects, you have to set them explicitly in the constructor or wherever you create them.

TBD...

Sunday, June 24

Getting to Google Drive from inside China, add line numbers for gist

[I decide to keep some tips and tricks in my blog in case I forget them. So if it is in Chinese, then it means you don't have to care about it if you are not one (I mean it).]

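1. Getting over the wall to use Drive
It is simple: for now, editing the hosts file is enough. Open the hosts file for your system and append "74.125.224.231 drive.google.com" to the end.

Location of the hosts file (first line is Windows, second line is Mac):

Keep it low-key.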

2. add line numbers for gist
OK, this is handy. The solution is here: just paste the code into your CSS file. Also, here is the CSS code that lets others copy your code without the line numbers.

Lab project update 3: asyncTask, db file export

This is a minor update that adds exporting the local db file to the SD card. The reason we export instead of just dragging the db file out with some file explorer app is that, when testing on a device, unless you root your phone you cannot access your db files from other apps.

Looking through the Android docs (btw, the newest official site looks absolutely sexy) and Stack Overflow, I decided to use the AsyncTask class for this background job. It looks like Service, but it is easier to use when communicating with the main UI thread. Unlike Service, where you have to take care of when to create, start, handle messages for, and stop the thread, this class provides methods that wrap up those details: one that runs before the task starts (onPreExecute), one that does the background job (doInBackground), one that updates the main thread if you want (onProgressUpdate), and one that delivers results after the task finishes (onPostExecute). Details can be found in the docs.

Normally you have to override at least doInBackground, and most of the time onPostExecute too. My onPostExecute is nothing new: it just shows a Toast to indicate whether the file was exported correctly. My doInBackground:

Let's go through each step. Line 3 locates the db file you want to export. I use the Environment class to obtain path info. Note that in Android, the db file of an app is created at "/data/data/your.package.name/databases/your_db_name.db". The method getDataDirectory() returns the first "/data", so for LOCTABLE_PATH you only need to add the part of the path after it. Line 5 gets the external storage state, which I use in line 6 to check whether the SD card is writable, i.e. whether the state is MEDIA_MOUNTED. If the SD card is not available, I just show a Toast and finish the task.

Line 7 calls getExternalStorageDirectory() to obtain the SD card dir, which should be "/mnt/sdcard"; you can define the dir where you would like to save your db file as EXPORT_PATH. The following if statement checks whether the path you want already exists, and creates it otherwise. Line 11 creates a File object at your given path. Note that at this point you haven't created an actual file; you have just created an object that is ready to generate one. Also, in order to write to the external storage, you have to add the following permission (WRITE_EXTERNAL_STORAGE) in your AndroidManifest.xml:

Lastly, the try block creates the file and copies your db file into it. The copyfile method is one you define yourself, according to what kind of file you want. I recommend just copying the raw content into a .db file and then opening it in a db browser like this. A raw file copying method in Java can be found here.

Saturday, June 23

RSpec: TDD 1

Test-driven development (TDD) means writing tests for the desired functions of your program before you write the actual code. The general steps are:

write test code that fails when executed (since you haven't implemented your function yet)
write the simplest function code that makes the test pass
after all tests pass, flesh out and refactor your function code

This way, your tests lead you through development as you write your functions. Assuming your tests are consistent with your requirements, TDD reduces bugs in your code because the tests ensure that you are building the program correctly.

In Ruby, RSpec is used for TDD. Actually it is also used in BDD, which is why I think TDD and BDD should work together. In fact I am not 100% sure what is TDD and what is BDD. Anyway, in RSpec, tests mostly look like this:

This is a test for the function that searches movie titles in TMDb. Line 3 tells us it is a test for MoviesController, a controller file. Line 4 names our desired function, to "search TMDb". Then lines 8, 18 and 21, the sentences that start with "it", are the desired behaviors of our function. They introduce the three blocks shown, and each block contains a series of smaller tests. Note that at this point we don't have any code that implements these behaviors. It looks a lot like Cucumber syntax, but more concise, and more focused on general input/output than on detailed steps. Lines 5-7 are a block that is executed before each of the following three blocks, similar to background steps in Cucumber. Inside the before block, we create a fake result using the mock method, which creates 2 fake Movie objects. I think this is another case of convention over configuration, because the method does not explicitly mention Movie; maybe, since this is a test for MoviesController, it automatically creates Movie objects. These fake objects are used to test whether a method gets called.

Lines 8-12 are the first test, to "call the model method that performs TMDb search". As I said above, to pass the test we need a model method that returns something when we pass it some parameters; that is basically what lines 9-10 say. Note that although this is a model method, we are testing our controller. Thus, the controller has to explicitly call a model method (which does not even need to exist in the model file yet). What is more, even if you do have a model method with this name in the model file, RSpec overrides it during the test. All we want here is to pass the test. Line 11 is the action: making a POST request with the given parameters.

After the first block, you can see that lines 13-24 are a nested block containing 2 tests. They share common steps, so we nest them to avoid duplicate code. Note that in both tests, the requirement changes from should_receive to stub. stub also creates a model method, but it does not require the method to be called. We use it because in the latter 2 tests we don't care whether the method is called or not (that is the 1st test's job). In the 2nd test, for example, we only care whether the corresponding search TMDb template is rendered. For the 3rd test, we use the assigns method to get whatever the program puts in the instance variable @movies (again, convention), because we want to know if the search results are correctly passed to the template.

At last, to run the tests we use rspec spec_file_name, or run autotest at the project root so that all tests are executed automatically every time you change code that could affect the results. Also make sure there is a database for testing; TDD of course belongs to the test environment.

The code is not hard, but some concepts are tricky. I will continue tomorrow.

Friday, June 22

Lab project update 2

First, I didn't convert GET to POST as I said I would. It turns out to be pretty difficult to send a 3rd party POST request to a Django app: Django blocks it by default for safety reasons, particularly CSRF attacks. It would take a while to re-enable it. Maybe another time I will sit down and get it done.

Today I found another thing though. The Google Places API supports sorted results, i.e., it returns a list of places sorted by either 'prominence' or 'distance'. Sorting by distance is just what I need, because I need to predict where the user is and of course the geographically nearest place is a good start. However, when you use this feature, you are required to supply at least one of three other options: keyword, name or types. The first two do not fit, since I don't know where the user is; the last one makes sense only if we include all the types it supports (it has a list). Well, unless there is another way to sort by distance, I decided to include all types. This took me 5 minutes using a regular expression. But I have not updated the function in the Android app, so the places it stores are still sorted by prominence (the default).
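A rough sketch of what such a distance-sorted request URL could look like, built with the standard library only. The endpoint in PLACES_URL, the sensor parameter, and the two example types are assumptions for illustration and should be checked against the current Places documentation; nearby_by_distance_url is a made-up helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed Nearby Search endpoint; verify against the official docs.
PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def nearby_by_distance_url(lat, lng, types, api_key):
    """Build a Places search URL whose results are ranked by distance."""
    params = {
        "location": f"{lat},{lng}",
        "rankby": "distance",        # no radius parameter with this ranking
        "types": "|".join(types),    # one of keyword/name/types is required
        "sensor": "true",            # assumed: expected by the API at the time
        "key": api_key,
    }
    return PLACES_URL + "?" + urlencode(params)


url = nearby_by_distance_url(40.11, -88.24, ["cafe", "library"], "YOUR_API_KEY")
print(url)
print(parse_qs(urlsplit(url).query)["types"])
```

Including every supported type simply means passing the full list as `types`.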

Previously the db in my web app only had fields for time, latitude and longitude. Since I am gonna find the nearest place for each pair of coordinates, I decided to add three fields to my db: place name, place latitude and place longitude. This makes it the best time to learn South. South is a db migration tool for Django. Db migration lets you modify the db schema without wiping your current data. Using South is very simple, and it has great docs. One thing though: Heroku has three environments, and each one needs its own migration, yet you are always syncing between them. Therefore, you have to make sure all the dbs are at the same stage as you develop. Otherwise, things can get pretty ugly.

The last thing is that I finished the web app, with functions to draw the original points and the nearest places obtained from Google Places, to remove points and their places, and with some nice UI from Twitter Bootstrap. Twitter Bootstrap is such an awesome project that you don't realize its awesomeness until you render your site and play with it. Currently my web app is just look-able; I will dig more into Bootstrap later.

Thursday, June 21

Lab project update 1

Today I worked on the project for my summer internship in the lab. Previously, I built the app so that it periodically records and stores the user's location in the local db in the background, and made it work with the Google Places API and the 4sq API. We are aiming to "guess" the user's semantic location (in a shop, on a campus, etc.) from their coordinates (from GPS, cellular or wifi). Although we haven't come up with a cool guessing algorithm yet, it would be good to draw our initial guesses on a map so we can get a feel for them.

Instead of integrating a Google map into the current Android app, I chose to build a web app and let the Android app send data to it, so that the web app draws the map. The major reason is that we don't need a map, live or not, in the app; it would be overkill. The map is for analyzing and improving our guessing algorithm, not part of the implementation. Besides, I am more confident writing algorithms in Python than in Java.

Therefore, my second Heroku app is born with a duty. I still use Django. This app is expected to receive data from HTTP requests (generated by the Android app), store it in its own db, then draw markers on a Google map using that data. That's it. I might improve it later, but first let's get it going.

For the HTTP communication part, I looked into receiving HTTP requests first. It's easy in a web framework because frameworks are made to receive HTTP requests. For the data itself, coordinates specifically, I chose to put them into the request URL on the sender side. Then when the web app receives the URL it parses it and extracts the data:
This method does the job described above. Track is the model I created to store coordinate data. After parsing, it redirects to the home page, because the request is only for transmitting data, not for displaying a web page. Yes, you can yell at me for using GET instead of POST to transmit data. But again, first I want to get it going, and GET is fast to implement. I promise I will fix it and give you a long, long post about how good and safe it is to use POST instead of GET.
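The parsing step can be sketched framework-free like this. The parameter names lat and lng and the helper name parse_track are assumptions for illustration; inside a Django view the same values would normally come from request.GET rather than from the raw URL.

```python
from urllib.parse import urlsplit, parse_qs


def parse_track(url):
    """Extract (lat, lng) from a URL like /track/?lat=40.1&lng=-88.2.

    Returns None when a coordinate is missing or not a number.
    """
    query = parse_qs(urlsplit(url).query)
    try:
        return float(query["lat"][0]), float(query["lng"][0])
    except (KeyError, ValueError):
        return None


print(parse_track("http://example.com/track/?lat=40.1&lng=-88.2"))
print(parse_track("/track/?lat=oops"))
```

Returning None for bad input keeps the storing code simple: it only saves a Track when the parse succeeds.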

On the Android side, there are a lot of tutorials on how to set up an HTTP request in Android/Java using the bundled Apache libs. Anyway, I put the request function in my service, so that every time it updates the coordinates it also sends them to my web app.

So that's the first part. How to draw them on the map? After some googling I realized that there is simply no easy, once-and-for-all way to create a Google map in Python, so I just use the JS API in my Django app. Luckily Google has a detailed dev guide. Another piece of good news: as long as you keep your JS inside the HTML file, you can use the Django template language to reference your app's variables in the script; this does not work in a standalone JS file. The rest of the story is easy: just loop over the db to pull out each pair of coordinates and create a new marker.

I am gonna test it maybe this weekend, after I make some improvements.

Wednesday, June 20

heroku migration

UPDATE: I set up a template project for bootstrapping Django projects on Heroku; take a look and fork it.

Finally got my original site up. Phew...

Today I continued my migration work. The major problem is, again, serving static files. I am not sure if other web frameworks have the same problem, but this one really bugs me, every time.

Anyway, I searched a lot without finding much help, and was stuck for quite some time. Along the way I noticed something odd: people use different project structures for Django. By default, when you run the startproject command, you get a project like this:

project_name:
|-- manage.py
|-- project_name:
    |-- __init__.py
    |-- settings.py
    |-- wsgi.py
    |-- urls.py

It creates a subdir inside your project dir, with the same name, to store configuration files. Somehow people tend to move these files up to the project dir and delete the subdir. I am not sure why, but in my opinion it is better to stay with the original layout, because it separates project configuration files from miscellaneous ones like Procfile. Some of the posts I found about serving static files use this "flat" structure, and that leads to a difference in settings.py. In that file we have to give the paths for static files, templates and media so that Django knows where to find them. You could of course hardcode them if you know every bit of the path, but an easier way is to use the built-in Python module os:

import os

SITE_ROOT = os.path.dirname(os.path.realpath(__file__))

STATICFILES_DIRS = (
    ...
    os.path.join(SITE_ROOT, '../static'),
)

TEMPLATE_DIRS = (
    ...
    os.path.join(SITE_ROOT, '../templates'),
)

The expression above returns the absolute path of the directory that contains this particular file, and we can then append specific folder names for different uses. This is especially useful in web dev, because you will likely have at least two environments, development and production. Hardcoding paths for two different environments would be tedious. Now just let the script do the job.

You will notice the .. before each folder name. Because I use the default project structure, my static and templates folders are outside the folder that contains settings.py. To locate them correctly, I first have to go up one level, then find those folders.
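To see what the .. does, here is a small standalone check. The concrete SITE_ROOT below is a made-up stand-in for the value computed from __file__ inside settings.py.

```python
import os

# Stand-in for os.path.dirname(os.path.realpath(__file__)) in settings.py;
# the directory names are hypothetical.
SITE_ROOT = os.path.join(os.sep, "srv", "project_name", "project_name")

static_dir = os.path.join(SITE_ROOT, "../static")
print(static_dir)                     # still contains the ".." segment
print(os.path.normpath(static_dir))   # ".." collapsed: one level above SITE_ROOT
```

The un-normalized path works fine when handed to the filesystem; normpath is only used here to make the resulting location visible.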

The project structure is not a trivial thing, because it gets messy if you have a big project with multiple apps and possibly many external apps. I also found some great tutorials about complex project structures; I might need them some day.

Anyway, I keep this original structure, and I found a great post about using it with Amazon S3 to serve static files for a Django project on Heroku. It is a step-by-step tutorial and, most importantly, it works (at least for me). Using S3 is surprisingly simple; now I know why Django users all like it. Once the environment is set up, every time you add new static files you run the Django command collectstatic locally, which automatically uploads the new ones to your S3 bucket to be served for your Heroku app. It also collects the admin static files for you, as long as you keep them in a child folder admin in your static folder.

After solving the static file issue, things got faster. I copied some code, moved some files. Then, I've got to say, after my site being down for several days, it is good to see it back live. I mean, I know few people would ever stumble across it, but for me it is a place I built from scratch. Now I have this free power of Heroku; let's build more.

Tuesday, June 19

Learning Python: pyDoc, scope, arguments

Today's first topic is documentation in Python. Python provides a series of tools for understanding objects, functions, etc. Here I write down some common shortcuts.

import sys
dir(sys)

This displays all the attributes (including functions and data items) of a particular module. Some have double underscores at both the beginning and the end, meaning they are special names used by the interpreter, such as operator overloading methods.

The second is help. Use it the same way as dir, but it displays all the documentation/long comments available for a module.
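A quick way to see the difference between the two kinds of names, and where the help text comes from (the variable names special and public are just for illustration):

```python
import sys

names = dir(sys)

# Names wrapped in double underscores are the special ones.
special = [n for n in names if n.startswith("__") and n.endswith("__")]
# Everything without a leading underscore is the module's public interface.
public = [n for n in names if not n.startswith("_")]

print(special[:5])
print(public[:5])
print(sys.getsizeof.__doc__)   # the raw docstring that help() formats for display
```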

Defining a function in Python is really two steps: create a function object, then assign it to a name, just like a normal assignment. Python also supports creating functions at runtime, like this:

if a:
  x = 1
  def func2():
    pass  # do something

Polymorphism in Python lives in the functions: what a function does depends entirely on the type of input it receives each time. In my algorithm course, my professor told us that most of the time an algorithm goes wrong because of a type mismatch: you give it the wrong type of input. Now the book tells us not to put any type restrictions on a function; let it shine, and let it raise an error if something goes wrong. In theory, to prevent type mismatches you have to be clear about what kinds of input a function can deal with, which runs against the freedom of Python. Maybe Python is trying to find a balance: instead of strictly declaring input types (like C) or accepting whatever you give it, Python coders keep a clear idea of the several input types a function can handle. But,

"If it does, it will be limited to working on just the types you anticipated when you wrote it, and it will not support other compatible object types that may be coded in the future. "

What? Anyway, this needs to be figured out sometime.
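A tiny illustration of what the book seems to mean: a function with no type checks works for every type that supports the operation it uses, and fails with an error for the rest. The function double is made up for this example.

```python
def double(x):
    """Works for any object that supports `x * 2`; no type checks at all."""
    return x * 2


print(double(21))        # 42
print(double("ab"))      # abab
print(double([1, 2]))    # [1, 2, 1, 2]

try:
    double(None)         # None does not support *, so Python raises for us
except TypeError as err:
    print("unsupported input:", err)
```

Had double started with an isinstance(x, int) check, the str and list calls would have been rejected even though they work perfectly well, which is the limitation the quote warns about.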

Scope is a boring topic. Everyone is trying to figure out which name belongs to whom. Python defines its own scope rule, LEGB: a name is looked up first in the Local scope, then in Enclosing functions, then in the Global (module) scope, and finally among the Built-ins.

Only E confuses me somewhat: as the book says, it lets nested def or lambda functions find variables that are not local to them but live in an enclosing function; the lookup goes outward until it finds the first match.

"a function object that remembers values in enclosing scopes, even though those scopes may not be around any more. "

This behaves similarly to blocks in Ruby, but a little bit differently. The book states that this should not be a common case in everyday Python programming.
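A minimal sketch of such a function that remembers its enclosing scope (make_counter is a made-up name, and nonlocal assumes Python 3):

```python
def make_counter(start):
    count = start              # lives in the enclosing (E) scope of counter()

    def counter():
        nonlocal count         # rebind the enclosing variable, not a new local
        count += 1
        return count

    return counter


tick = make_counter(10)
print(tick(), tick())          # 11 12
```

make_counter has already returned by the time tick is called, yet count is still there: each returned function keeps its own copy of the enclosing variables.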

Moreover, outer variables can be hidden by inner ones with the same name. Consider the following case:

def hider():
  open = 'spam'     # Local variable, hides built-in
  ...
  open('data.txt')  # This won't open a file; open is now the local string

To explicitly use a global variable inside a local scope, say in a function, you can use the global statement. It declares that, OK, x here is the global one, not any other, so assigning to x inside the function changes the global.
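A short sketch of the difference the global statement makes (the names counter, bump_wrong and bump_right are made up for this example):

```python
counter = 0


def bump_wrong():
    counter = 1        # plain assignment creates a new local; the global is untouched
    return counter


def bump_right():
    global counter     # from here on, "counter" means the module-level name
    counter += 1
    return counter


bump_wrong()
print(counter)         # 0
bump_right()
print(counter)         # 1
```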

Next topic: function arguments, mainly how they are passed. Python supports 4 ways of passing arguments, letting us pass an exact or an arbitrary number of arguments, with or without default values:

def func(a, b, c=3):  # parameters with defaults must come after those without
  pass  # do something

>>> func(1, 2, 3)
>>> func(a=1, c=3, b=2)
>>> func(2, 3)

def func(a, *args):
  pass  # do more

>>> func(1, [1, 2, 3], "A")
>>> func(1)

With *, the extra positional arguments are collected into a tuple; replace * with ** and the extra keyword arguments are collected into a dictionary instead. Note that with **, each extra argument must be passed with an explicit key (name=value).

Python has a very clear series of rules on how to use and understand each of these ways:

Assign nonkeyword arguments by position.
Assign keyword arguments by matching names/keys.
Assign extra nonkeyword arguments to *name tuple.
Assign extra keyword arguments to **name dictionary.
Assign default values to unassigned arguments in header.
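The matching rules above can be watched in action with one function that uses all four forms and simply returns what it received (report is a made-up name):

```python
def report(a, b=2, *args, **kwargs):
    """Return the arguments exactly as Python matched them."""
    return a, b, args, kwargs


print(report(1))                # (1, 2, (), {})
print(report(1, 5, 6, 7))       # (1, 5, (6, 7), {})
print(report(1, x=9))           # (1, 2, (), {'x': 9})
print(report(b=3, a=0, y=4))    # (0, 3, (), {'y': 4})
```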

Lastly, how arguments get passed: for immutable ones, the function effectively receives just the value, since it cannot change the caller's object; mutable ones like lists are passed, as the book puts it, "by pointer", which means they can be changed in place inside the function. There is no magic here; it works just like normal variable assignment.
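A small demonstration of both cases (the function change and its parameter names are made up for this example):

```python
def change(number, items):
    number += 1          # rebinds the local name; the caller's int is unaffected
    items.append(99)     # mutates the list object shared with the caller
    items = [0]          # rebinding the local name again leaves the caller alone


n, data = 1, [1, 2]
change(n, data)
print(n, data)           # 1 [1, 2, 99]
```

To protect a mutable argument, pass a copy instead, e.g. change(n, data[:]).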

Using Google places API in Android

The Google Places API is a service that provides information about places. Basically, it takes coordinates and additional parameters (like city, limits, etc.) as input, and returns related information such as the names, addresses and types of nearby places. Think about Google Maps and its street-view cars; the accuracy of this API should not be bad. I take it as an alternative to the Foursquare API for obtaining place information. Actually, I expect it to outperform 4sq, because most places in 4sq are created by users, which inevitably leads to noise.

Anyway, I looked into Java libraries for the Places API since I need it in my Android app. The bad news is that there isn't any dedicated Java lib; moreover, the 3rd party Java libs I found on GitHub are poorly supported. The good news is that Google provides a general "Google API console" and a Java lib to go with it, google-api-java-client. Although this lib does not support the Places API specifically, building a wrapper with it around the Places API is sufficient for me. More good news: I found this awesome blog and its corresponding sample on GitHub. However, the blog was written a year ago, so some methods it uses are deprecated in today's version. I modified the sample to make it compatible with the latest Java client; find it on my GitHub.

Since the blog tells us everything we need to know about the Places API, I will skip the basics and just write down how the process works in case I forget. Generally, to build your own application using this API, you will keep coming back to three web pages: the blog, the Java doc of the Google API client, and the official doc of the Places API.

Basically there are two steps: request place information using the Places API with selected parameters, then parse the output into the strings we need. In the first step, we use the HttpFactory to generate an HTTP request; the HttpFactory object carries a JSON parser and an arbitrary header. Then we use request.getUrl() to set every option we want on the request, and send it with request.execute(). The second step runs automatically when the server returns the result. The result is in JSON, and the JSON parser in the HttpFactory parses it into a Java model. Here the blog proposes a very clever method (at least to me, a Java newbie): create some classes to catch specific strings from the models. With the "@Key" annotation we can define which parts of the result we want, e.g. name, types, etc. This is really efficient. Then you do whatever you want with the result.

The Places API provides three search URLs (I only looked into the search part): general search (returns a list of nearby places), search detail (returns the detailed information of a place), and search autocomplete (predicts and returns places based on input). I only need the first two, but the third one is really cool: with the appropriate processing added, it could become a real-time predictive search just like Google Instant.

Finally, there is one thing that has held me back several times: the AndroidManifest.xml file. NOW REMEMBER: you have to declare the following components explicitly there if you add them to your project:

permissions, internet, gps, etc;
service, including intent services;
content providers

Every time I use these components I forget to declare them and then get stuck for a while. Now it is written down here and I will never make the same mistake again. Also, any networking (HTTP GET/POST, for example) is not allowed on the main thread; you have to use a service or a similar technique.

Monday, June 18

Heroku migration pt1

I got an app working on Heroku yesterday. But that is just part of the story. Let's dig a little bit deeper.

Some useful commands first. You should always use them to check the status of your app before you modify anything. The first one, run in the root dir, shows the basic information of your app:

(your_virtualenv_name)your_device:hellodjango your_username$  heroku info
=== afternoon-sword-7524
Addons:        Shared Database 5MB
Database Size: 296k
Git URL:       git@heroku.com:afternoon-sword-7524.git
Owner Email:   your_email
Repo Size:     85M
Slug Size:     12M
Stack:         cedar
Web URL:       http://afternoon-sword-7524.herokuapp.com/

This lists the name, add-ons, database, etc., and gives you a general overview.

Second is to check all the processes of this app, or as Heroku calls them, dynos. There are two types of dynos: web dynos, which deal with web requests, and worker dynos, which deal with the background/scheduled jobs of your app. More web dynos means better performance and faster responses when there are a lot of requests to your app; more worker dynos should speed up your background jobs (IMO). When you create a web app on Heroku it automatically assigns you one web dyno for free. After that, they charge approximately $35/mo for each additional dyno. You can see I only have 1 web dyno assigned to my app:

(your_virtualenv_name)your_device:hellodjango your_username$ heroku ps
=== web: `gunicorn hellodjango.wsgi -b 0.0.0.0:$PORT`
web.1: up for 39m

Third is to check the log of your web app, to see what it did:

(your_virtualenv_name)your_device:hellodjango your_username$  heroku logs
...
2012-06-18T20:23:33+00:00 heroku[web.1]: State changed from starting to up
2012-06-18T20:23:35+00:00 heroku[web.1]: Process exited with status 0
2012-06-18T20:23:42+00:00 heroku[run.1]: State changed from created to starting
2012-06-18T20:23:44+00:00 heroku[run.1]: Starting process with command `python manage.py syncdb`
2012-06-18T20:23:44+00:00 heroku[run.1]: Awaiting client
2012-06-18T20:23:45+00:00 heroku[run.1]: State changed from starting to up
2012-06-18T20:23:47+00:00 heroku[run.1]: Process exited with status 0
2012-06-18T20:23:47+00:00 heroku[run.1]: State changed from up to complete
...

Then we have heroku keys/apps/addons to check our ssh keys, all available apps, and the add-ons for this particular app, respectively. In general, it is easy and convenient to check the basic information of your apps using the heroku CLI tool.

Now for the django project: the tutorial gives the details so I will not repeat them. There is just one problem I came across: working with a local database using PostgreSQL.

As mentioned in the tutorial, we should use PostgreSQL as the production database, and actually when we create the app Heroku automatically creates that db for us. The problem is how to let the app know this is the db it should use. The tutorial uses the dj_database_url package, which sadly did not work for me. I find this postgresify package very easy to use. It only needs two lines to set the production db:

from postgresify import postgresify

DATABASES = postgresify()

Now we have the production db, but how about the local development db? It seems postgresify does not detect a default one; when I used it alone I got various errors. According to this post, it turns out I have to override the production db with the local db when django finds it is running in the development environment. I used the code from the post and it finally works.

So far I still have not been able to restore my site on Heroku; it might take a while longer.
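The override idea can be sketched roughly like this. This is not the code from that post: the helper name pick_databases, the local database name, and the use of a DATABASE_URL environment variable as the "am I on Heroku" signal are all my assumptions for illustration.

```python
import os

# Hypothetical local settings; database name and credentials are placeholders.
LOCAL_DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'hellodjango_dev',
        'USER': '',
        'PASSWORD': '',
        'HOST': 'localhost',
        'PORT': '',
    }
}


def pick_databases(build_production, environ=None):
    """Use the production config only when DATABASE_URL is set (assumed signal)."""
    environ = os.environ if environ is None else environ
    if 'DATABASE_URL' in environ:
        # build_production is only called in production, e.g. postgresify
        return build_production()
    return LOCAL_DATABASES
```

In settings.py this would be used as DATABASES = pick_databases(postgresify), so that postgresify is never called on the development machine.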

Sunday, June 17

Hello heroku

Heroku is a web app hosting platform introduced by the SaaS course I am taking. It's a lot like GAE, but somewhat different. For now all I know is that it works like a git remote: every time you update/deploy your web app you just push it to the heroku remote. One of its biggest features is scalability: you can start with a free account, 5MB/app for storage, no dyno (a work unit in its world), and then choose whatever combination you like to scale up during development.

I absolutely start with the free account. Heroku has a very thorough setup post for django apps. But before django I came across this blog about the python development environment. It suggests using a virtual environment for every python project I do, because different projects might require different versions of the same package. Makes sense. Therefore I used the virtualenv package and followed the excellently written blog to build a virtual environment for my django project. Virtualenv is very easy to use and it comes with pip: every time you set up a completely clean environment, then add the packages you want for this particular project using pip.

With the virtual environment configured, I started on the heroku app. Detailed steps are in Heroku's tutorial post. During the setup I hit a problem with ssh keys. It killed me. I already had an ssh key for github, but I didn't realize that its owner had been changed to root. So when I tried to git push using this key I kept getting an error saying Permission denied (without sudo; for ssh you normally don't need sudo). It took me a while to trace the root problem from the fact that I could not regenerate the key. Anyway, finally I chowned the key and the ~/.ssh folder back to my user and regenerated the key.

After that I got a strange problem saying:

-----> Heroku receiving push
 !     Heroku push rejected, no Cedar-supported app detected

I found I had put my git repository in a subdir of my django project instead of its root. Also, the requirements.txt generated from the virtualenv should be in the git root dir. Now everything is done. My first Heroku app.

In the heroku tutorial I found this link to a collection of .gitignore files on github, which are used to ignore useless/auto-generated files of your project during git commit. It provides ignore files for different languages, very handy.

Heroku uses this push strategy, which makes me feel like I am coding at home, because I own the environment. Now I can install whatever python packages I like and then push them all up. Besides, it has a series of very detailed tutorials on how to get started with different app frameworks and how to use the heroku toolset. I think hosting on heroku is not a bad idea. Tomorrow I will try to get my site back on it.

See you AMO

Today I cancelled my tiny plan on the web hosting service A Small Orange. Let's sum up and move on.

To be honest, AMO provides a decent price, a complete toolset, and especially, a very responsive help desk. You know the pain of setting up a django project in production (and now I know too). I came across several problems during the setup, and the guys at AMO gave me a lot of help. Thank you all.

But back to the setup: compared with WebFaction, my first try, AMO is not optimized for django. It does have a step-by-step wiki for us (which is very detailed and clear), but it does not cover the whole setup. I mean there is a chance you could succeed by following the wiki, but every case is different. Moreover, it does not address the static file issue. Django itself does not tell the server where to find the static files it needs: js, css, pics, etc. The final solution I used was to put all static files under the default dir where I used to serve my static pages. It does not seem intuitive because it is separated from the django project. Anyway, I am still new to django so we will see.

Finally, to anyone who would like to build their own website I still recommend AMO. Its tiny plan is a great start ($35/yr for 250MB storage), not to mention their super helpful support guys. Maybe when I have tried all the services online, I will go back to them.

Friday, June 15

Learning Python 5: if and loops

Some control flows.

Before that, more about booleans. In Python, we can use and and or to connect two boolean expressions, as in if a == 3 and b == 4. It turns out they have one more use. If we simply type in 2 or 3, the interpreter returns 2; for 2 and 3, it returns 3. How it works:

1. for an or test, Python evaluates objects from left to right and returns the first one that is true, or the last one if none is true;
2. for an and test, Python evaluates objects from left to right and returns the first one that is false, or the last one if all are true.

This comes in handy when we have a situation like: if a exists then return a, else return b. Using or/and tests we can reduce the code.
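A small sketch of these rules; the display_name helper and its 'anonymous' default are made up for illustration.

```python
# "or" returns the first true operand, or the last operand if none is true.
or_result = 2 or 3            # 2
or_all_false = 0 or ''        # ''

# "and" returns the first false operand, or the last operand if all are true.
and_result = 2 and 3          # 3
and_first_false = [] and 5    # []


def display_name(name):
    # Fall back to a default when name is empty or None.
    return name or 'anonymous'
```

One thing to watch: name or default also replaces legitimate false values such as 0 or an empty list, so the trick only fits when every false value should be treated as "missing".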

Moreover, Python does not support switch/case control flow; for that you have to use either multiple if/else, or a dictionary. How to use a dictionary?

a = {1:'ham', 2:'pie', 3:'cheese'}
print a.get(2, 'choice is not available')

I think it's nice.

1. for/while

In Python, it looks like while is never a popular choice. The book constantly states that although a problem could be solved with while, it is often more efficient to use a for loop. Note that neither while nor for loops require indexing, so those who are used to relying on an index inside the loop have to adjust. Most of the time programmers do not need to worry about index details themselves, but if you need to get your hands on the index you can use range(). There is also a built-in function called enumerate that comes to the rescue:

>>> S = 'spam'
>>> for (offset, item) in enumerate(S):
...     print item, 'appears at offset', offset
...
s appears at offset 0
p appears at offset 1
a appears at offset 2
m appears at offset 3

It turns out else can be combined with a while/for loop to ease our coding job in several situations. An else block after a loop means: assuming there is a break statement in the loop, if the break does not get executed, i.e. the loop finishes normally, then the else block is executed. One major use case is replacing a flag that checks whether something happened; now it looks like this:

x = y / 2
while x > 1:
    if y % x == 0:
        print y, 'has factor', x
        break
    x = x - 1
else:
    print y, 'is prime'

Previously you had to set a flag and change it when the if condition was met; now that turns into an else after the loop.
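To make the comparison concrete, here is a sketch of both styles side by side. The function names are mine, and I return strings instead of printing so the two versions are easy to compare.

```python
def describe_with_flag(y):
    # Classic style: remember in a flag whether the loop hit the break.
    found = False
    x = y // 2
    while x > 1:
        if y % x == 0:
            found = True
            break
        x -= 1
    if found:
        return '%d has factor %d' % (y, x)
    return '%d is prime' % y


def describe_with_else(y):
    # Loop-else style: the else block runs only if the loop never hit break.
    x = y // 2
    while x > 1:
        if y % x == 0:
            result = '%d has factor %d' % (y, x)
            break
        x -= 1
    else:
        result = '%d is prime' % y
    return result
```

Both functions give the same answer for any y; the second one just has no flag to keep in sync.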

2. iterator

An iterator is an object with a next method, used to traverse a container. An iterator for reading a file:

for line in open('your/file', 'r'):
    print line

The book considers this the best way to read a file, because it does not load the whole file into memory at once.

A dictionary lets you iterate over D.keys(), and it is also iterable itself: for key in D. Other tools work with iterators as well, like list comprehensions, sorted(), any(), all(), map and zip. From the book, I am not so sure about what is an iterator and what is not. Will update this one later.
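As far as I understand it, one way to tell the two apart is that an iterable hands out an iterator through iter(), while an iterator is the thing that next() is called on. A minimal sketch with the built-in iter() and next() functions:

```python
L = [1, 2, 3]
it = iter(L)              # a list is iterable; iter() gives back an iterator
first = next(it)          # 1
second = next(it)         # 2
third = next(it)          # 3
try:
    next(it)
    exhausted = False
except StopIteration:     # raised when nothing is left; a for loop catches this
    exhausted = True

list_is_its_own_iterator = iter(L) is L        # False: a list is only iterable
iterator_is_its_own_iterator = iter(it) is it  # True: iterators return themselves

D = {'a': 1, 'b': 2}
keys = sorted(key for key in D)   # looping over a dictionary yields its keys
```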

Learning Python 4: other types, expressions

There are some other types, sets, files, etc.

For files, you can open/read/write/close as usual. Just note that you have to convert everything you put into a file into a string first, including your objects. And when you read things back from the file, they are strings too. There is a way to keep their types: use the pickle module. Say you have a list a = [1,2,3] and you want to store it in a file:

import pickle
a = [1, 2, 3]
aFile = open("path/to/your/file", "wb")  # binary mode is the safe choice for pickle
pickle.dump(a, aFile)
aFile.close()
aFile = open("path/to/your/file", "rb")  # reopen the file for reading
b = pickle.load(aFile)
aFile.close()

Then b is still a list. What pickle does is take care of this conversion so you don't have to worry about it; it's called object serialization.

Then there are booleans. Actually in Python, every object has a boolean property: it is interpreted as either True or False. Two general rules: (1) numbers are true if nonzero; (2) other objects are true if nonempty. As a result, in a block started with if "", nothing would be executed. However, "" == False returns False because "" is not False; it is just interpreted as false. It turns out that bool, the boolean type of Python, is a subclass of int, so if you type 1 == True, the interpreter returns True.
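A quick sketch of these truth rules, using bool() to show how each object is interpreted:

```python
values = [0, 1, '', 'spam', [], [0], {}, None]
truth = [bool(v) for v in values]
# [False, True, False, True, False, True, False, False]

empty_equals_false = ('' == False)    # False: '' is only *interpreted* as false
one_equals_true = (1 == True)         # True: bool is a subclass of int
bool_is_int = isinstance(True, int)   # True
two = True + True                     # 2, since True behaves like the integer 1
```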

Overall, here is a type hierarchy I captured from the book:

Then about syntax and expressions in Python: one thing is that indentation only has to stay consistent within each block, which means if one block of yours uses tabs as indentation and another uses spaces it is still fine. To be honest, since I learned that this rule exists, I follow it whenever and wherever I can. It just makes for more readable code.

For expressions, there are some new tricks: A < B < C, equivalent to A < B and B < C; a += 1, which can be faster than a = a + 1 because a is only evaluated once in the former.

Thursday, June 14

Learning Python 3: String&Tuples

Short one. Strings and tuples.

1. String

A backslash is used to escape characters like newline, tab or a single quote. This mechanism can be turned off by adding the letter r before your string. Double quotes and single quotes here really do the same thing: create a string.

Most of us use triple quotes to write long comments. But triple quotes are really meant to create a multi-line string. Therefore, when you write a comment using triple quotes, although it has no effect during execution, you actually create a string object.

You might think a dynamic language like Python would automatically convert one of the two operands in an operation like "43" + 9. No; Python is actually very strict about types. There are cases like using + to concatenate strings, but those are type-strict too, just at a higher level. In Python, built-in data types are classified once more into three major categories:
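For example, with a made-up Windows-style path the difference between a normal and a raw string shows up in the length:

```python
normal = 'C:\new\text.dat'   # \n and \t are read as newline and tab
raw = r'C:\new\text.dat'     # the r prefix keeps the backslashes as they are

normal_length = len(normal)  # 13
raw_length = len(raw)        # 15

same_string = ('spam' == "spam")   # True: the quote style makes no difference
```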

Numbers: Support addition, multiplication, etc.

Sequences: Support indexing, slicing, concatenation, etc.

Mappings: Support indexing by key, etc.

Now it becomes clear: all types in the Sequence category, such as list and string, support those operations. It turns out this strictness about types is actually a major characteristic of Python.

Since strings are immutable, to change one you have to build a new string with slicing, concatenation or the replace method. Or you can turn the string into a list (list), change it, then join it back into a string (''.join).
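A short sketch of that strictness: mixing a string and a number with + fails, and you have to convert one side yourself.

```python
try:
    result = "43" + 9          # no automatic conversion
except TypeError:
    result = None              # Python refuses to guess which type you meant

as_number = int("43") + 9      # 52: convert the string, then add
as_text = "43" + str(9)        # '439': convert the number, then concatenate
```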
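The ways of "changing" a string mentioned above, as a sketch; each one builds a new string and leaves the original alone.

```python
s = 'spam'

try:
    s[0] = 'x'                       # in-place change is not allowed
    changed_in_place = True
except TypeError:
    changed_in_place = False

by_slicing = s[:1] + 'l' + s[2:]     # 'slam'
by_replace = s.replace('pa', 'XY')   # 'sXYm'

chars = list(s)                      # ['s', 'p', 'a', 'm']
chars[0] = 'c'
by_list = ''.join(chars)             # 'cpam'
```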

2. Tuple

Tuples exist partly because the creator of Python believes they are a valid math model, partly because we want an immutable list. Yes, a tuple acts just like a list, with two differences: it is created by () instead of [], and it cannot be changed in place.

========
I feel this process is great. I learn some basic stuff without wasting time on things I already know. Also, because I am somewhat familiar with Python, I find reading this book is not tedious. However, today I came across this question on Stack Overflow about how to level up from an apprentice to a guru in Python. It offers several good ways to advance in Python, and I think I will choose one or two of them to do in the future when I am done with this book.

Learning Python 2: List&Dictionary

THEY ARE MUTABLE. Big difference. So some methods used on them would change the value of themselves instead of creating new objects.

1. List

Slice assignment (a[1:2] = [4,5]) can be thought of as two steps: (1) delete the section referred to by the left side; (2) insert the items on the right side at the place where the elements were deleted. But this is not what really happens: if the steps really ran in that order, a[2:5] = a[3:6] would be invalid, because the right side would be read after the deletion. Python evaluates the right side first.

The append method only accepts a single object. This matters because even if you write something like a.append([1,2]), Python inserts the list into a as a single object instead of adding two elements. To append a series of elements, one can use concatenation directly. Since lists are mutable, methods like append and sort change the list in place rather than creating new objects, so when you call them Python only returns None.

Python builds in these data structures to immensely reduce the work people need to do. For example, with a list we can implement a stack simply using the append and pop methods. For a queue, append and pop(0) are enough. Bye bye low level.

Note there are several ways to remove one or more elements from a list: a.remove(value), del a[index], a.pop(index) (index defaults to -1, the last element), a[offset_left:offset_right] = [].
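Two small slice assignments, including the overlapping case, with values I picked for illustration:

```python
a = [0, 1, 2, 3, 4, 5, 6]
a[1:2] = [4, 5]      # one element out, two elements in
# a is now [0, 4, 5, 2, 3, 4, 5, 6]

b = [0, 1, 2, 3, 4, 5, 6]
b[2:5] = b[3:6]      # b[3:6] is copied to a new list [3, 4, 5] first
# b is now [0, 1, 3, 4, 5, 5, 6]
```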
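A sketch of the difference. Besides concatenation, lists also have an extend method that adds the elements one by one in place.

```python
a = [1, 2]
a.append([3, 4])         # the whole list goes in as ONE element
# a is now [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])         # each element is added separately, in place
# b is now [1, 2, 3, 4]

c = [1, 2] + [3, 4]      # concatenation builds a new list instead

sort_result = b.sort()   # None: sort changes b itself and returns nothing
```

This is also why a = a.sort() is a classic mistake: it replaces the list with None.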
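A minimal sketch of both structures on top of a plain list:

```python
# Stack: last in, first out.
stack = []
stack.append('a')
stack.append('b')
top = stack.pop()        # 'b', the most recently added item

# Queue: first in, first out.
queue = []
queue.append('a')
queue.append('b')
front = queue.pop(0)     # 'a', the oldest item
```

For large queues, pop(0) has to shift every remaining element; the standard library's collections.deque has a popleft() method that avoids this.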
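The four ways applied one after another to the same list (the values are arbitrary):

```python
a = [10, 20, 30, 40, 50, 60]

a.remove(30)      # by value:  a is now [10, 20, 40, 50, 60]
del a[0]          # by index:  a is now [20, 40, 50, 60]
last = a.pop()    # by index, returning the item: last is 60, a is [20, 40, 50]
a[0:2] = []       # by slice:  a is now [50]
```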

2. Dictionary

A dictionary uses key/value pairs instead of indexes to manipulate data. Dictionaries are used in many, many fields where Python applies, such as CGI.

Dictionaries are unordered; every time you print out a dictionary its items might be in a different order. The book says this is for performance: fast lookup needs keys placed at pseudo-random positions in memory. Makes sense, like hashing. Anyway, items in a dictionary never have an order by nature.

You can explicitly get the key list or value list of a dictionary with a.keys() and a.values(). But if you use the dictionary directly in a place like a loop, for element in a, it yields its keys one by one.

Dictionaries also support the pop method; calling it on a dictionary with a key deletes that key/value pair and returns the value.

In order to be a key in a dictionary, an object has to be immutable, i.e. its type has to be something other than list or dictionary. But the values of a dictionary can be anything, mutable or not.

There are two novel uses of dictionaries: one is to create a dynamic list, the other is to represent sparse data structures. A list in Python, though it can grow dynamically through the append method, does not accept an assignment to a position beyond its end: with len(a) = 10, you cannot assign a[100] = 1. A dictionary can do exactly that. Also, suppose you have a sparse matrix: even if it can be done with lists, the result would be a mess with tons of zeros. However, with a dictionary you can just assign a[(2, 4, 5)] = 1, and then treat any other index as zero using if...else. The zero elements are not stored in this structure at all, pretty clean.
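Looping and pop on a small made-up dictionary:

```python
D = {'ham': 1, 'eggs': 2}

# Looping over the dictionary itself yields its keys; sorted() fixes the order.
keys_from_loop = sorted(key for key in D)   # ['eggs', 'ham']

value = D.pop('ham')    # removes the pair and hands back the value 1
# D is now {'eggs': 2}
```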
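A quick check of the key rule: a tuple works as a key, a list does not, and values are unrestricted.

```python
D = {}
D[(1, 2)] = 'tuple key works'    # tuples are immutable, so they can be keys
D['spam'] = [1, 2, 3]            # values may be mutable objects

try:
    D[[1, 2]] = 'list key'
    list_key_ok = True
except TypeError:                # lists are mutable and therefore unhashable
    list_key_ok = False
```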
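Both uses can be sketched like this. The cell helper is my own name, and it uses get with a default of 0 in place of the if...else test.

```python
# A list cannot be assigned beyond its end...
lst = [0] * 10
try:
    lst[100] = 1
    list_assignment_ok = True
except IndexError:
    list_assignment_ok = False

# ...but a dictionary accepts any key right away.
a = {}
a[100] = 1

# Sparse 3-D matrix: only the nonzero cells are stored, keyed by a tuple.
matrix = {}
matrix[(2, 4, 5)] = 1
matrix[(7, 8, 9)] = 3


def cell(m, index):
    # Any index that was never assigned counts as zero.
    return m.get(index, 0)
```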

BDD and Cucumber

Behavior Driven Development (BDD), to me, means we develop the software to achieve specific behaviors that are derived from customers' requirements. It is popular because it makes the goal of development clear: what you need to do is make the software behave as required.

There are generally three steps.

1. engineers and customers work together to create user stories. They often look like this:

Feature: Add a movie to Rotten Potatoes

As a movie fan [a kind of stakeholder]

So that I can share a movie with other movie fans [achieve some goals]

I want to add a movie to Rotten Potatoes database [by doing some tasks]

They focus on "what could the software do" but with more details. Actually there is a theory called SMART user story that requires participants to create stories that are Specific, Measurable, Achievable, Relevant and Timeboxed.

2. expand user stories into steps that involve several scenarios and lo-fi UIs. The former is specific to Cucumber, the BDD tool used with Rails. In Cucumber, a user story is defined by several steps like:

Feature: User can manually add movie

Scenario: Add a movie

Given I am on the RottenPotatoes home page

When I follow "Add new movie"

Then I should be on the Create New Movie page

When I fill in "Title" with "Men In Black"

And I select "PG-13" from "Rating"

And I press "Save Changes"

Then I should be on the RottenPotatoes home page

And I should see "Men In Black"

A feature could behave differently in different scenarios, which is why we can have multiple scenarios in one feature. Lo-fi UIs are sketches that serve two goals: (1) to get a look for this feature; (2) to connect this feature with others.

3. Run Cucumber repeatedly and improve your code until all steps pass.

The SaaS course actually teaches us BDD before TDD, with the result that we are building features based on what we have coded instead of building code based on features. Therefore in this post, I only write about how Cucumber works.

Cucumber is actually a gem that needs to be included in your Rails project. Including it gives us a new directory called features, in which we store .feature files. The features directory has a subfolder called step_definitions, in which we store .rb files that define every step we will take. A step definition uses a regular expression to match a step in the feature file, and then defines the action that tests this step.

Some notes:

1. Before running Cucumber for the first time, you have to create a test database with rake db:test:prepare. Cucumber runs in the test environment of Rails.

2. To reuse and shorten the code, when different scenarios share several identical steps, those steps can be extracted into a new section called Background under that feature.

3. In Cucumber you are not dealing with objects and data you created for production or development environment anymore. Every time it runs it generates data it needs.

4. To debug in cucumber, set debugger flag as usual in rb files. In debug mode, use page.body, which is the HTML page generated by Capybara (the test tool used by Cucumber) to check your data instead of model object.

Wednesday, June 13

Learning Python 1: Internal, Object, Types

Having used and loved python for 2+ years, my python knowledge is still missing a lot of basic points. At this uncertain intersection of job hunting and continuing my PhD pursuit, I decided to learn it again, trying to build a solid foundation. btw I moved my blog from tumblr to blogger, simply because I realized tumblr is really a place to share pictures and vids, not long, dull self-rants. Blogger is simply more like a blog.

1. Internal

The way python works is different from compile/static language like C. This is a great illustration:

There is a Python interpreter that does the whole job; in the major Python implementation we use, which is actually called CPython, this interpreter is written in C. It first compiles the .py file into a .pyc file, called compiled Python source (byte code). Then the Python Virtual Machine (PVM) runs this file. Saving the compiled file speeds up later runs, like a cache. The thing here is, although the interpreter also compiles the file, it only compiles it to the byte code level, which the PVM then interprets instruction by instruction. Compared with C, which compiles code into machine code that the chip runs directly, Python loses the speed race mainly because of this.
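If you want to peek at the byte code yourself, the built-in compile() function and the standard dis module can show it. A minimal sketch; the exact instructions printed differ between Python versions, so I don't list them here.

```python
import dis

source = 'a = 3\n'
code = compile(source, '<example>', 'exec')   # source text -> code object

dis.dis(code)         # prints the byte code instructions the PVM will run

namespace = {}
exec(code, namespace)  # run the compiled code; namespace['a'] is now 3
```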

2. Everything is object

Python shares this idea with Ruby (it sounds like I am a Ruby ace trying to learn Python; sadly that is not the case). In Ruby, if I remember correctly, everything you use is an object. a.b is just calling method b on object a; even a = 3 is calling method = on object a, passing argument 3 (I thought this was pretty cool when I first saw it). Ruby is like an extreme case: in its world, only objects and their methods exist.

However, python is a little bit different. For a = 3, what python actually does is: (1) create an object (and allocate some memory) to represent the value 3; (2) create the variable a if it does not already exist; (3) point a to the object 3. Now the difference is that a is not an object, but a pointer to an object. Similarly, in b = [1,2,3,4], [1,2,3,4] is a list object and b is just a variable pointing to this object. So in python when you type a = 3, you are not calling a method, but linking the two. Still, this gives Python a lot of dynamism: you don't declare the type of a variable; it is decided only by the object that the variable points to.

The thing is, when you do something like this:

a = 1233445678
a = 'spam'

Now the variable is pointing to the new object, so where did the first object go? The answer: it is reclaimed by the automatic garbage collection mechanism in Python. For every object created, this mechanism keeps a counter of the number of pointers (variables) that refer to the object. When the counter drops to zero, the object is immediately erased and the memory it takes up is reclaimed. There is an exception: Python keeps small integers and some strings cached in memory instead of erasing them, so that they can be reused. So even if their counters drop to zero they are still there. Considering those objects are used more often, this is reasonable.

There is a thing called shared reference that is worth mentioning. It is also a result of pointers.
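One way to watch an object being reclaimed is a weak reference from the standard weakref module, which does not count as a pointer keeping the object alive. A sketch, assuming CPython (other implementations may reclaim the object later); the Spam class is just a stand-in.

```python
import weakref


class Spam(object):
    pass


obj = Spam()
watcher = weakref.ref(obj)               # observes the object without owning it

alive_before = watcher() is not None     # True: obj still points to it

obj = 'spam'                             # rebind the only variable, counter hits zero

alive_after = watcher() is not None      # False in CPython: object was reclaimed
```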

From above you can see, until the second line of code, a and b are still referring to the same object 3, but when we assign a new object to a, b is still pointing to 3. Also, if you do something like this:

a = 3
b = a
b += 2

b turns into 5, but a is still 3. It is as if you started with one object 3 and, after 3 lines of code, you have 2 objects, 3 and 5, pointed to by a and b respectively.

Lastly, one more thing about objects. There are two ways to compare two variables in Python: == and is. The first one checks whether their values are the same; the latter checks whether they point to the same object. Thus, you would expect the following two situations to give different results:

L = [1, 2, 3]
R = [1, 2, 3]   # or: R = L
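A sketch that contrasts the integer case above with a mutable list, where a change through one name is visible through the other:

```python
a = 3
b = a
shared_at_first = a is b     # True: both names point to the same object
b += 2                       # integers are immutable, so b gets a NEW object 5
# a is still 3, b is 5

L1 = [1, 2, 3]
L2 = L1
L2.append(4)                 # lists are mutable: the shared object itself changes
# L1 is now [1, 2, 3, 4] as well
still_shared = L1 is L2      # True
```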
L == R
L is R

However, if L and R refer to the same small integer or short string, then == and is return True in both cases. That is because CPython caches small integers and some strings, so that for one value there is only one object.
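The two list situations written out in full:

```python
L = [1, 2, 3]
R = [1, 2, 3]                 # a second, separate list with the same value
equal_when_separate = (L == R)       # True: same value
same_when_separate = (L is R)        # False: two different objects

R = L                         # now both names point to one list
equal_when_shared = (L == R)         # True
same_when_shared = (L is R)          # True
```

Because the caching of integers and strings is an implementation detail, it is safer to compare values with == and keep is for checks like x is None.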

3. Built-in types

Numbers, Strings, Lists, Dictionaries, Tuples, Files and others. Of these, only lists and dictionaries are mutable, i.e. when you manipulate them, you are actually changing the object that the variable refers to; for the other types python just creates a new object to hold the new value obtained from your manipulation. Even an integer, e.g. 3, is immutable. Say you have a = 3: if you type in a = 5, object 3 is still there, you just create a new object 5.

Python has a mechanism called dynamic typing. Variables themselves do not have types, only objects do. So you don't declare the type of a variable; its type is simply the type of whatever object it currently refers to. After a = 3, for example, a refers to an integer.

But Python has a strict rule on which methods can be used by which type of object. You cannot call methods that do not belong to the type. Some operations span several types, like [], +, etc. But types also have their type-specific methods, and you cannot use those on other types.
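Both points in one sketch: the type follows the object, and type-specific methods stay with their type.

```python
a = 3
type_before = type(a)        # int
a = 'spam'                   # same variable, now pointing to a string
type_after = type(a)         # str

# + is shared by several sequence types...
joined_lists = [1, 2] + [3]        # [1, 2, 3]
joined_text = 'sp' + 'am'          # 'spam'

# ...but upper() belongs to strings only.
upper = 'spam'.upper()             # 'SPAM'
try:
    (3).upper()
    int_has_upper = True
except AttributeError:
    int_has_upper = False
```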

Tuesday, June 12

Bites to Blogger

Is this a new home?
