Sunday, December 18, 2016

Notes on California Sights

A Tale of Two Cities 2015/07/26-07/30

The flight from Guangzhou to Los Angeles with my parents took 12 hours. Dad's first impression at the LA airport was of barren land stretching for miles... We stayed at a hotel behind the Dolby Theatre; the surroundings were nice, and on the first evening we strolled along the Walk of Fame. The second day was an LA day tour: City Hall, the Catholic cathedral, Hollywood, and Beverly Hills. On the morning of the third day we took a Vietnamese long-distance bus on an eight-hour trek to San Francisco; the auntie from Nanhai sitting next to us became a running joke in our family memory. That evening we went to the legendary 岭南小馆, which really was excellent, and we happened to run into Mr. Dong and his family. On the fourth day we visited the Golden Gate Bridge and Chinatown, went back to 岭南小馆 in the evening, and ran into one of Dad's friends.

Highway 1 Trip 2015/12/12-12/13

At 6 a.m. Lao Li drove us to Newark airport for the conference trip. On the plane I read a chapter of Limits of Thought and watched two movies, The Graduate and Life of Pi. By the time we landed and picked up the rental car it was not even 2 p.m. We drove straight from SFO to Bixby Creek Bridge and kept going toward Big Sur; the sunset was simply gorgeous, the red and green succulents rich with color in the afterglow. After the sunset we backtracked to Monterey for the night. The next day on the 17-Mile Drive we were caught in a violent storm right on the beach, which left a deep impression. After the rain I met an old classmate at a restaurant; we toured Google together and then went on to Stanford.
On this conference trip we ate at 岭南小馆 twice more. After the conference ended we visited Berkeley.

Coast Trip 2016/12/16-12/17

On Saturday morning I drove with Zeng, Long, and Miao to Point Reyes National Seashore. The green coastline, red succulents, herds of cattle, and the lighthouse: it was just beautiful. Honestly, I like it here more than Big Sur.

In the evening we wandered around the little town and had Thai food; the Airbnb was clean and comfortable too. On the way back to San Francisco the next day I finally got my wish to visit Muir Woods and take photos with the giant trees. Muir is the father of the national parks.

Monday, December 5, 2016

[wget] only the directory and subfolders

wget -r -np -nH --cut-dirs=(N-1) url -P path
where N is the depth of the target directory in the url path.
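A concrete example (hypothetical URL): to grab only http://example.com/a/b/data/ and its subfolders into ./mydir, the directory depth is 3, so:

$ wget -r -np -nH --cut-dirs=2 http://example.com/a/b/data/ -P mydir

Here -r recurses, -np refuses to ascend to the parent, -nH drops the example.com host directory, and --cut-dirs=2 strips a/b so only data/... is created under mydir.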

Monday, November 21, 2016

Thematic Reading and Writing

This semester I took my advisor's seminar course and got a much better feel for how to study a single topic in depth. It suddenly struck me that this thematic way of studying is probably more effective than passive, unfocused reading. Here is a workflow that might work:

  1. Come up with a topic (mind map, divergent thinking)
  2. Do a quick survey of the topic
    1. YouTube
    2. TED talks
    3. Book discussion boards on Zhihu and Douban
  3. Compile a book list and a film list for inspectional reading
  4. Pick out individual chapters and films for deep reading and viewing
  5. Write (blog posts, film reviews)
As for scheduling, I could spread the time evenly over the evenings. (Which reminds me of my workout, meditation, and Jump start plans, sigh.) Saturday morning is when I am most easily bored, so I should use it for steps 3, 4, and 5. The pity is that these days I do not even have time to read the papers and books in my own field.

In any case, life should not be frittered away. The unexamined life is not worth living!

Sunday, November 20, 2016

[CDO] CDO installation

Struggling with the anaconda cdo package, I tried to install CDO on my own, following this website: http://www.studytrails.com/blog/install-climate-data-operator-cdo-with-netcdf-grib2-and-hdf5-support/

Package downloads:
  1. Download CDO from https://code.zmaw.de/projects/cdo/files.
  2. Download NetCDF from http://www.unidata.ucar.edu/downloads/netcdf/index.jsp. Use the C version.
  3. Download Grib API from https://software.ecmwf.int/wiki/display/GRIB/Releases.
  4. Download Jasper from http://www.ece.uvic.ca/~frodo/jasper/#download.
  5. Download HDF5 and zlib from ftp://ftp.unidata.ucar.edu/pub/netcdf/netcdf-4.
Building the programs one by one:
  • Create a directory that will hold the installed libs and include files. In this case, DIR=/home/Programs/cdo-1.7.2-install.
  • Install zlib:
  1. ./configure --prefix=$DIR
  2. make, make check, make install
  • Install HDF5:
  1. ./configure --with-zlib=$DIR --prefix=$DIR CFLAGS=-fPIC --enable-shared --enable-hl --enable-threadsafe --enable-unsupported
  2. make, make check, make install
  • Install NetCDF:
  1. CPPFLAGS=-I$DIR/include LDFLAGS=-L$DIR/lib ./configure --prefix=$DIR CFLAGS=-fPIC
  2. make, make check, make install
  • Install Jasper:
  1. ./configure --prefix=$DIR CFLAGS=-fPIC
  2. make, make check, make install
  • Install Grib API:
  1. ./configure --prefix=$DIR CFLAGS=-fPIC --with-netcdf=$DIR --with-jasper=$DIR
  2. make, make check, make install
  • Install CDO:
  1. ./configure --prefix=$DIR CFLAGS=-fPIC --with-netcdf=$DIR --with-jasper=$DIR --with-hdf5=$DIR --with-grib_api=$DIR
  2. make, make check, make install
Finally, add $DIR/bin to your PATH (or define an alias) so that cdo can be called globally.
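One way to do that last step (bash syntax, using the $DIR above):

$ export PATH=$DIR/bin:$PATH      # or: alias cdo=$DIR/bin/cdo

Adding this line to ~/.bashrc makes it permanent.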

Monday, July 11, 2016

How to make ipython notebook slideshows?


1. In the notebook, turn on the slideshow cell toolbar (View > Cell Toolbar > Slideshow) and set a slide type for each cell
2. In terminal: ipython nbconvert file.ipynb --to slides --post serve
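With newer versions of Jupyter the same command is spelled:

jupyter nbconvert file.ipynb --to slides --post serve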

Tuesday, June 14, 2016

[pygrads] beware of timestamp

I ran into a really silly bug these past two days. I have always used pygrads to download data in dods/opendap format, and it has always worked fine, but recently ga.export kept failing no matter what. pygrads has never had much documentation (I suffer in silence...), so I could only trace the problem clue by clue, and it turned out the dates were wrong. The data provider is rather stubborn and insists on its own convention: everyone else timestamps records at midnight (00Z), while they set everything to 12 noon. At first I thought only the condition check in my extract-data-record function was failing; only later did I realize that the mismatched time also raises an error inside pygrads, and correcting the timestamp fixed it.

Monday, May 30, 2016

[Linux] server

1. Mount the server (log in as root first):
$ su
$ mount /home/{hostname}
2. Unmount the server:
$ umount /home/{hostname}
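Note that mount with only a mount point works because the share is described in /etc/fstab. A minimal sketch of such an NFS entry (server and path names are hypothetical):

fileserver:/export/home/myhost  /home/myhost  nfs  defaults  0 0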

[python] install python and R using miniconda

Two years ago, in the spring semester of my first year, I was transitioning from Matlab to Python. I am not a programming person, and even learning Matlab during my undergrad was a pain. People who use Matlab, according to a TED talk speaker, are conservative and take the defaults as they are, something like being stuck in the mud. It is true: in Matlab you can always click a button if you forget a command, while typing in a terminal (Linux/bash) is quite different from the Windows OS and something I still cannot really get used to, even today. So it took me a long time to go from being comfortable with Matlab to falling in love with Python.

Dependencies:
libcom_err.so <- kerberos (conda install pykerberos)
scipy.misc.imread <- pil (conda install pil)
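For the record, the basic Miniconda steps for Python plus R (a sketch; the installer URL is the one current as of this writing and may move):

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ conda install numpy scipy matplotlib
$ conda install -c r r-essentials    # pulls in R and common R packages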

Thursday, May 12, 2016

Papers · Usage Notes (continuously updated)

Papers is astonishingly comprehensive and smooth; anyone who has used it knows. That alone beats every other reference manager.

Update 2016/05/12

An update on the Papers experience (after using this software for so long, I figure I have earned some say):
1) For the two years up until about six months ago, the automatic article download worked very well; the earlier failures to import or download had largely been fixed.
2) Over the past six months, however, quite a few publishers (Wiley, Springer, even the AMS journals) have rolled out an interface called epdf. Flashy enough, I suppose, but it broke automatic downloads outright... Every link now opens a page asking whether you want the standard PDF or the enhanced PDF. Since I already have my own local library, I see no point in building another one inside each publisher's world; it wastes time and energy, and what would be the use?
Today, in a fit of annoyance, I took a careful look at epdf. It has actually improved a lot over these six months: although annotations and notes cannot be saved, features like jumping straight to figures and references are more useful than a plain PDF.
To summarize:

  • epdf is a format that reads more efficiently than PDF
  • epdf only works through a web page
  • Different publishers have different flavors of the epdf interface, and presumably the libraries are not shared
For now I have not found a particularly good approach; these are all forced choices...

  • Search for an article inside Papers, open the epdf view, and browse the figures and results first.
    • For ordinary articles, skip the spdf download and just save the epdf view.
    • For important articles, save the epdf view and also download the PDF, then start annotating it.

Maybe this is an opportunity: loosen my attachment to the spdf, and stop feeling that an article only truly counts as collected when it is black-on-white PDF.

3) Removing duplicate papers and duplicate authors both have to be done by hand. Duplicate papers: show duplicate papers and then merge. Duplicate authors: go to the authors tab and drag one duplicate onto the other to merge. Still not very convenient, but tolerable.
4) I have not re-tested the Dropbox sync issue.


Wednesday, May 11, 2016

Coding Lessons (to be continued)

Murphy's law.
When code goes wrong, the bug is almost always in the part you were least sure about and wrote most carelessly and vaguely. Even if it does not bite this time, it will sooner or later!

My advisor pinpointed the bug and fixed it in a couple of strokes.
Beyond being dumbstruck with admiration, what I mostly thought about was how I should improve.
When the data look wrong, my advisor traces the chain backwards: from the previous step to this one, is anything off? No jumping around with wild guesses.
Debugging is exactly this mindset: run the program up to the step just before the failure and see what causes the error.
Code should be easy to read and easy to change. The critical parts, such as the core computation, the time step, and the I/O, should stand out, and the key steps are best summarized in a header, which makes later changes easier.

Since taking APC 524 I have started to modularize my code: anything that performs the same task becomes a function, and anything that reads the same batch of data goes into one class, as in the sketch below. This not only makes debugging faster, it also lets mature code be reused piece by piece.
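A minimal sketch of what I mean (hypothetical names, not actual research code):

import numpy as np

class DatasetReader:
    # group every read of the same batch of data in one class
    def __init__(self, path):
        self.path = path            # root directory of the dataset

    def load_year(self, year):
        # the one obvious place where I/O happens
        return np.load(f"{self.path}/{year}.npy")

def annual_mean(field):
    # one task, one function: average over the time axis
    return field.mean(axis=0)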

Finally, practice makes perfect: write more, revise more.

[Python] Regrid/Remap using python/cdo/gdal

Spatial resampling, sometimes called regridding, remapping, or even interpolation depending on whether we upscale or downscale the grid cells, is something we do quite often when dealing with large-scale datasets.

If we want to use scipy, here are the functions I found relevant, though not good enough, because they are disconnected from their geographic coordinates:
scipy.ndimage.map_coordinates
scipy.interpolate.RectBivariateSpline
I used the second one and found several issues. First, the latitudes and longitudes seem to come out wrong. The other problem is worse: the boundaries are wrong.
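For reference, a minimal sketch of regridding a toy global field with RectBivariateSpline (hypothetical data; note the spline knows nothing about spherical geometry, and the 0/360 longitude seam is not wrapped, which is exactly where the boundary problems show up):

import numpy as np
from scipy.interpolate import RectBivariateSpline

# a toy 2-degree global field
lat = np.arange(-89.0, 90.0, 2.0)           # 90 points
lon = np.arange(0.0, 360.0, 2.0)            # 180 points
field = np.cos(np.deg2rad(lat))[:, None] * np.sin(np.deg2rad(lon))[None, :]

# fit on the coarse grid, evaluate on a 1-degree grid
spline = RectBivariateSpline(lat, lon, field)
new_lat = np.arange(-89.5, 90.0, 1.0)       # 180 points
new_lon = np.arange(0.5, 360.0, 1.0)        # 360 points
regridded = spline(new_lat, new_lon)        # shape (180, 360)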

Another workaround is CDO's remap operators. CDO has a wrapper around SCRIP (Spherical Coordinate Remapping and Interpolation Package, a Fortran library from Los Alamos National Laboratory that can be found online). I strongly recommend this functionality of CDO, a powerful and fast tool. It offers bilinear, bicubic, distance-weighted average, nearest-neighbor, conservative (box-average), and largest-area-fraction interpolation.
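For example, bilinear remapping onto a regular 1-degree global grid (file names are hypothetical; swap remapbil for remapcon, remapnn, remapdis, remapbic, or remaplaf to change the method):

$ cdo remapbil,r360x180 input.nc output.nc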

I haven't really looked at gdal, but I guess it would take more time to figure out the commands from the unfriendly gdal manual...

Barometric Law
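For the record, the law itself under the isothermal-atmosphere assumption:

p(z) = p0 exp(-M g z / (R T)) = p0 exp(-z / H)

where p0 is the surface pressure, M the molar mass of air, g gravity, R the universal gas constant, T temperature, and H = RT/(Mg), roughly 8 km, the scale height.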


Wednesday, April 13, 2016

copula

I have heard of copulas (bivariate statistics) many times, but rarely understood what people were calculating when they talked about "fitting a copula". Fortunately I had picked up a basic understanding of empirical distributions before finishing the copula homework. Now I have done the "quantiles" part (see the last log), which I assume will be enough, maybe... My goal is to understand how a copula works, by which I mean the principles, and then apply it to the precipitation-temperature case, about which I have read a few articles.

First things first: what do we use a copula for, and what is a copula function?
A copula describes how two marginal distributions are linked together to form the joint distribution, i.e., the dependence structure between the marginal variables. The idea is to separate the fitting of the copula from the fitting of the univariate probability distributions. In other words, a copula is used to construct a multivariate probability distribution from univariate probability distributions.
According to Sklar's theorem, there exists a copula function C(u, v) such that
F_XY(x, y) = C(F_X(x), F_Y(y)) = C(u, v)
1. C(u, v) is a 2D function defined on [0, 1]^2, since u = F_X(x) and v = F_Y(y) are values of CDFs.
2. C(u, v) does not depend on the marginal distributions.
3. C(u, v) is a joint CDF whose marginals have been rescaled to uniform distributions, u = F_X(x), v = F_Y(y).

What are the steps for fitting a copula distribution? (Gao et al., 2007)
1. Fit some parametric univariate distribution to each variable: F_X(x), F_Y(y).
2. Fit the copula: C(F_X(x), F_Y(y)).
3. Sample from the copula using Monte Carlo simulation.
3.1 Unconditional:
      (1) draw samples of u uniformly from [0, 1]
      (2) draw v|u using the inverse conditional CDF C_{v|u}^(-1)
      (3) map u, v back to x, y through the inverse marginal CDFs
3.2 Conditional: fix u = F_X(x) at the known value instead of drawing it, then apply steps (2) and (3) above.
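The recipe above is parametric; as a concrete illustration, here is a minimal empirical Gaussian-copula sketch in Python (toy data standing in for precipitation and temperature; all names are hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# toy correlated pair standing in for precipitation and temperature
x = rng.gamma(2.0, 1.0, 1000)
y = 0.5 * x + rng.normal(0.0, 1.0, 1000)

# step 1 (empirical variant): rescale each margin to uniform via ranks
u = stats.rankdata(x) / (len(x) + 1.0)
v = stats.rankdata(y) / (len(y) + 1.0)

# step 2: fit a Gaussian copula, i.e., the correlation of the normal scores
z = stats.norm.ppf(np.column_stack([u, v]))
rho = np.corrcoef(z, rowvar=False)[0, 1]

# step 3.1: unconditional Monte Carlo sampling from the fitted copula
zs = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
us, vs = stats.norm.cdf(zs).T        # uniforms carrying the copula dependence
# map back to the data scale with empirical inverse marginal CDFs
xs = np.quantile(x, us)
ys = np.quantile(y, vs)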

Friday, February 12, 2016

[python] pygrib installation

pygrib is the Python interface to the Grib API (ECMWF).

1. install Grib-API.
(1) Download the source for Grib-API (https://software.ecmwf.int/wiki/display/GRIB/Releases).
(2) Follow GRIB API Autotools to install Grib-API (https://software.ecmwf.int/wiki/display/GRIB/GRIB+API+Autotools+installation)
path: ~/local

2. install pygrib
(1) Originally I wanted to install it using pip, but that failed, so I installed from source.
(2) Download the source for pygrib (https://pypi.python.org/pypi/pygrib)
(3) Set environment variables:
$ export GRIBAPI_DIR=/usr/local/grib_api_dir
$ export JASPER_DIR=/usr
(4) Go to the download folder:
$ tar zxvf pygrib-x.x.x.tar.gz
$ cd pygrib-x.x.x
$ python setup.py install --user
$ python test.py
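Once it is installed, a quick sanity check (a sketch; the GRIB file name is hypothetical):

import pygrib

grbs = pygrib.open('sample.grib')     # any GRIB1/GRIB2 file
for grb in grbs:                      # print an inventory of the messages
    print(grb)
grbs.seek(0)
data = grbs.message(1).values         # decode the first message into a numpy array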