Friday, July 6

Learning postgresql pt1

My graduate course database system teaches me many interesting things, like working in a push environment just like that in a real company, building web app, using AWS, but not database itself. In fact, until now I am still unable to write down a complex SQL query and assure you it is correct. Considering building web app alone requires me constantly manipulate database, its models and write queries,  it is time for me to roll back and study database from the scratch. I know MySQL is just a popular choice; but since Heroku uses postgresql, I might as well start with it.

As far as I know, it is an open source object relational database management system, using SQL. Relational database actually means database with tables. Other database systems like noSQL, does not require tables. ORDMS should work like: columns -> rows -> tables -> databases -> database cluster, which is stored on the database server.

Assuming you know what is relational database, what is "select xx from xx where xx", and other basics in database, let's begin. Most of codes below is from the official postgresql manual. It is great and easy to read, by the way.

Create a new table using standard sql:
CREATE TABLE cities (
city varchar(80) primary key,
location point
);
CREATE TABLE weather (
city varchar(80) references cities(city),
temp_lo int,
temp_hi int,
prcp real,
date date
);
view raw creattable.sql hosted with ❤ by GitHub

Every column has a type, int, varchar, real (for floating number), etc. Brackets indicate the maximum number of this type of value could be put in this column; dashing provides description.

Line 2 indicates the primary key of the weather table. Primary key is just a constraint, which is equal vent to UNIQUE NOT NULL, i.e. the value of this column should be unique and should not be blank. By indicating this constraint, the table gives us this column as the unique identifier to identify any single row in it. One table should have at most one primary key; but a key could be a combination of several columns, as long as they do not violate the constraint. Also, it is optional for a table.

Line 7 indicates a relationship between two tables, called a foreign key. A relationship is another constraint. Now weather table will refer to the cities table for available city name. For example, now if you create a row in weather table with a city name that is not in the cities table, it will throw out an error.

Now we have two empty tables. To insert values into them:
INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
COPY weather FROM '/home/user/weather.txt';
COPY weather TO '/home/user/weather.txt';
view raw populating.sql hosted with ❤ by GitHub

Note now you have to first create a city in cities table and then refer it in weather table. If you have a large amount of data, you could format it in a file and use COPY command to import them into the table at once. Similarly, use COPY * TO file you could export your data in the table to a local file. Note first is the file should be publicly accessible, and second this is only allowed in the server side, i.e. you could not use COPY in a client query.

There are various ways to query data from a table:
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
SELECT DISTINCT * FROM weather
WHERE city = 'San Francisco' AND prcp > 0.0;
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
SELECT max(temp_lo) FROM weather;
SELECT city FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
UPDATE weather
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
WHERE date > '1994-11-28';
DELETE FROM weather WHERE city = 'Hayward';
view raw querying.sql hosted with ❤ by GitHub

They should be pretty self-explanatory. Note the 3rd and 4th queries join two tables to fetch data, which means they are querying multiple tables or multiple rows in one table at the same time. The 4th one is a special join, which means every row in the table mentioned on the left side of the join operator will be displayed at lease once. The output should be something like this:
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)
view raw join.sql hosted with ❤ by GitHub

Note the first row, it does not meet the constraint that "city = name" but still present, because it is in the left table. There are also some aggregate functions you could use up and go like max, min, count. However they are not allowed in the WHERE clause unless it is nested in a subquery like line 15.

To remove data from your table, you could indicate rows you want to delete by WHERE clause in a DELETE FROM query like line 22. If you don't assign any constraints, it will simply delete all rows in the table.

TBD.

No comments:

Post a Comment