When using Geomatica’s Platform Python API for custom processing of an image file (reading, manipulating, writing), it can be beneficial or even required to process the image by pieces at a time. That is to say, to access a particular subset, manipulate it, write it, then move on to the next subset of the image, and continue this until all pixels in the image are processed. This tutorial will demonstrate how simple this process is with Geomatica’s Platform Python API.
When working with large files, process all the pixels at once can cause your system to slow down as you are taking away memory resources from other applications or it may cause an out of memory error. Processing an image by iterating over a subset of pixels in the image can ensure that you do not run out of memory when processing very large files or do not negatively affect the performance of other applications that the user may be working with.
For example, you could define that files should be processed using 1024x1024 tile dimensions, which for 3 channels at 8-bit per pixel, would only require about 1024 x 1024 x 3 x 8 = 25165824 bits (3MB) of memory.
- Geomatica 2016 or later installed
- Python 101 (basic understanding of python coding concepts)
- How to access modules, classes, attributes and methods
- How to call a function
- How to use with statements
- Some experience programming with Geomatica (recommended)
- Some experience with numpy (recommended)
- Click Here to download the data required for this tutorial
To demonstrate how to process rasters by iterating over tiles or subsets of the input raster, we will perform the following: Access a Landsat-7 MS image (in .pix format), read the Landsat-7 pixels for the blue and green channels using 256x256 tiles, add the pixel values of the blue channel with the associated pixel values in the green channel and then write that tile to file. This process will be repeated until all pixels in the input image are processed.
1. Import necessary modules & setup inputs/outputs
The first step is to import the Geomatica’s python modules we will be calling in this tutorial and setting up the variables that will point to our input file(s) and output file(s).
In the above code block, line 2 is used to import numpy, which is an open source library for storing and performing computations on large complex arrays. Ultimately, the custom calculations will be performed with numpy. Line 3 imports gobs and datasource modules from Geomatica’s platform api. These modules are used to read in and write out the raster pixels, geocoding information and coordinate system.
On line 6, working_dir is a variable that contains the path to the directory where your tutorial data is stored. You will need to change this to the directory on your computer where your tutorial data is saved.
2. Open input and output datasets for reading and writing
In this step we will use the with-statement to open the input and create the output datasets, read in the first two channels, coordinate system and geocoding information, and define the tile size for processing.
On line 11 we use the with-statement and call the ds.open_dataset() function and pass the input_l7 variable as its only argument, and use the as-statement so that we refer to it as read_dataset in the following block. On the same line, we also call the ds.new_dataset() function and use the as-statement so that we refer to it as write_dataset in the following block, while passing the following arguments:
- out_file variable as its first argument, variable holding the path to the output file
- ‘PCIDSK’ string as the second argument, which defines the file format
- ‘’ an empty string as the final argument, which could be used to define file format options
On line 12, we create an object called reader_l7 by calling ds.BasicReader() class. Pass the read_dataset variable as the first argument and [1, 2] as the second argument, which defines the image channels we want to read in. Line 13 and 14 are used to get the coordinate system and geocoding information of the input file, which are assigned to in_coord_sys and in_geocode variables, respectively. These will be used later in the script.
On line 16, we create a raster object called in_raster by calling the read_raster() function and defining the dimensions as arguments. The first two arguments are the offset, which are both set to 0. The last two arguments are the width and height of the raster in pixels, which are set to reader_l7.width and reader_l7.length, which means that it should read in the entire input file.
On line 19, we define the tile size that should be used to read pixels from the input file and process them, by defining the reader_l7.tile_size attribute. In this case we set the tile size to 256 by 256.
3. Create the writer
At this point we will stay in the same with-statement block (same indent level) and create the writer object and output file.
On line 22 we create an object called writer by calling ds.BasicWriter() and passing write_dataset as the only argument. On line 25, we will define the necessary information for creating the output file by creating an object called writer_info by instantiating the class gobs.RasterInfo(). The arguments are as follows:
- width: defines the width of the file in pixels
- height: defines the height of the file in pixels
- 1: the number of channels in the output file
- DT_16U: the data type for the output channels
On line 26, we create the output file by calling writer.create() function and pass the output file information held in the writer_info object as its only argument.
4. Iteratively process each tile
Still within the original with-statement block (same indent level), we will create a for-loop to iteratively process each 256x256 tile.
On line 28, we create a variable i and set it to 1, which will be used to keep track of each iteration. This is only done to help illustrate the concept and is not required in your actual code.
On line 30, we create a for-loop, where we will loop through each tile in the reader_l7 list.
On line 31, we create a print-statement that will print the tile number and dimensions of the tile being processed, which is only done to help illustrate the concept and is not required in your code.
On line 34, we access the first element in the in_raster object (in_raster.data[:, :, 0]), which is the blue channel and assign it to the blue_chan variable. On line 35 we convert the blue_chan numpy array to a 16bit unsigned integer, using astype. Line 36 and 37 do the same thing as above, but for the NIR channel.
Note: Instead of using astype, consider using vectorize to optimize performance
On line 40, we perform the addition calculation and assign the resulting array to the sum_array variable.
Line 43 creates a raster object for the sum_array called sum_raster, by calling the gobs.array_to_raster() method. This formats the array so that it can be written to an output file by Geomatica’s platform API.
Note: array_to_raster() does not copy the values and is optimized for performance
5. Iteratively write each tile to file
In this final step, we will stay within the for-loop block (same indent level) and write the results of our addition calculation to an output file, one tile at a time.
On line 46, we call the write_raster() method to write the pixel values to the file held in the sum_raster raster object. On lines 47 and 48, we call the writer object’s crs and geocoding attributes to write the coordinate system and geocoding information from the input file (held in in_coord_sys and in_geocode variables) to the output file.
Finally, on line 50, we add a value of 1 to the variable i for counting (i += 1). This is only done to help illustrate the concept and is not required in your actual code.
When running the code, the following will be printed to screen as each tile is iteratively written to file.
After the code has finished executing you will have created a single output file called landsat7_sum.pix with the output results stored in channel 1.