numpy.fromfile (also available through pylab) is a built-in function for loading raw binary files from local disk, e.g. *.bin files.
What it does is read the machine-level bytes straight into a NumPy array.
(1) read file
Before you read your binary file, you should figure out what format it is in.
I am not an expert on data types (see: Numpy Data Types). There are many ways to test the format.
In my case, I open the file directly (in MATLAB, Python, or any quick tool) and inspect the raw values. The first thing to check is whether the data are strings, integers, or floats (double or single precision). After that, I estimate whether each value is 8, 16, 32, or 64 bits from the range of the data or from the file size.
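For example, one quick sanity check, assuming you already know how many values the file should hold (here a hypothetical 12-month 180x360 grid): divide the file size by that count to get the bytes per value.
import os

n_values = 12 * 180 * 360                  # hypothetical: 12 monthly 180x360 grids
n_bytes = os.path.getsize("filename.bin")
print(n_bytes / n_values)                  # 2 -> 16-bit, 4 -> 32-bit, 8 -> 64-bit values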
(A) It becomes rather easy as long as you know the format. Say it is float32,
> from pylab import *
> data = fromfile("filename.bin", dtype='float32')
# the extension doesn't matter; it can be .bin, .dat, whatever, even no extension.
(B) You can also use a list comprehension to make the reading more compact.
In: data = [fromfile("filename_%s_%s.bin" % ("tas", year), dtype='float32').reshape(12, 180, 360)[3:6, :, :] for year in range(1900, 2001)]
Out: [array([...], dtype=float32), ..., 101 arrays, one per year]
The example reads global temperature data for 1900-2000 and keeps only April through June of each year (months 3:6 along the first axis). The result is a list of arrays, which may not be what you want.
> combined = np.concatenate(data, axis=0)
which joins all the arrays in the list along the first (time) axis into a single (303, 180, 360) array. (np.hstack would join these 3-D arrays along the latitude axis instead, which is not what we want here.)
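If you would rather keep the years as a separate dimension, np.stack is an alternative (assuming every per-year array has the same (3, 180, 360) shape):
> by_year = np.stack(data)    # shape (101, 3, 180, 360): year, month, lat, lon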
(C) We can still use a traditional loop to read the data.
# forcing is a list of variable names; styr and edyr are the start and end years
data = np.empty((0, 180, 360), dtype='float32')   # empty array to concatenate onto
for var in forcing:
    for year in range(styr, edyr + 1):
        arr = fromfile('%s_%s.bin' % (var, str(year)), 'float32').reshape(-1, 180, 360)
        arr = arr[::-1]                            # reverse along the first (record) axis
        # use concatenate; np.append(data, arr, axis=0) also works,
        # but without axis= it would flatten the result
        data = np.concatenate((data, arr))
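Note that concatenating inside the loop copies the whole accumulated array on every iteration. A common workaround, sketched under the same assumptions about forcing, styr, and edyr, is to collect the pieces in a list and concatenate once at the end:
pieces = []
for var in forcing:
    for year in range(styr, edyr + 1):
        arr = fromfile('%s_%s.bin' % (var, str(year)), 'float32').reshape(-1, 180, 360)
        pieces.append(arr[::-1])           # same reversal as above
data = np.concatenate(pieces)              # one copy at the end instead of one per iteration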
(2) write file
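Writing is the mirror image of (1): ndarray.tofile dumps the raw bytes back to disk. A minimal sketch (the array and filename here are placeholders):
> data = np.zeros((12, 180, 360), dtype='float32')   # placeholder array
> data.tofile("output.bin")                          # raw bytes only; dtype and shape are not stored
np.save("output.npy", data) is an alternative that records the dtype and shape in the file header for you.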
2. NetCDF file: use the netCDF4 package
(1) Install
Use Anaconda (conda) to install the netCDF4 package:
$ conda install netcdf4
(2) read file
> from netCDF4 import Dataset
> data = Dataset("path/filename.nc").variables["variable_name"][:]
This returns a numpy masked array, with the missing values (the variable's fill value) masked automatically.
So far, I use the .data attribute to get the raw underlying array, and then set the original missing value to np.nan myself:
> data = Dataset("path/filename.nc").variables["variable_name"][:].data
> data[data == -9999] = np.nan
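Hardcoding -9999 only works when that happens to be the file's fill value and the variable is floating point. A sketch that leans on the masked array instead (same hypothetical file and variable name):
> from netCDF4 import Dataset
> import numpy as np
> var = Dataset("path/filename.nc").variables["variable_name"][:]   # masked array
> data = np.ma.filled(var.astype('float64'), np.nan)                # masked entries become NaN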
3. String TXT file:
3.1 readlines
3.2 pandas read_csv
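Two quick sketches for the text-file case (the filenames and column layout below are made up):
> lines = open("filename.txt").readlines()                   # list of raw strings, one per line
> values = [float(line.split()[1]) for line in lines[1:]]    # e.g. skip a header, take the 2nd column

> import pandas as pd
> df = pd.read_csv("filename.csv")           # use sep='\s+' for whitespace-delimited text
> df["variable_name"].values                 # back to a numpy array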