Moran’s I in PostGIS

Just thinking out loud here.  I’ve always been bothered by the complexity of Moran’s I.  Actually, it’s not complex, it’s just math.  And, it’s really nothing more than Pearsons Correlation Coefficient tricked into a spatial context.  So, even calculating Pearsons by hand is a pain.  But really, it is nothing more than simply performing a correlation on two arrays.  In this case, the arrays are the values of adjacent features.  Nowadays, we have great tools for calculating the correlation coefficient, so if you can get two arrays representing the adjacent data, you simply wrap you query into that.  Take a look here as I revisit the Figure 15.4 of my textbook An Introduction to Statistical Problem Solving in Geography:

SELECT corr(a.pctwhite, b.pctwhite)
FROM cleveland AS a, cleveland AS b
WHERE st_touches(a.geometry, b.geometry)
AND a."OID" < b."OID"

We are simply finding those census tracts that are adjacent to one another and obtaining their respective pctwhite values.  That returns two columns, which we pass into the correlation function (corr).

The results are nearly identical to ESRI’s Morans I index.

What do our statistician friends think?

If you want to learn how to write spatial SQL code, work with Postgres, or understand statistics and geography, check out my courses here

Finding the farthest distance inside a polygon

On the Manifold GIS discussion board, someone asked about updating a polygon layer with the distance of the farthest pair of vertices for each polygon.  I thought that was a fun little conundrum to try out.  You can see my Manifold SQL answer here.

That made me think about how to write the same query in PostGIS.  It’s actually fairly easy, as the code below shows:

SELECT max(ST_Distance(A.g,B.g)) , A.gid
(SELECT gid, (ST_Dump(geom)).geom as g
FROM floodzone) AS A,
(SELECT gid, (ST_Dump(geom)).geom as g
FROM floodzone) AS B
WHERE A.gid = B.gid

Now, I am doing it for a layer called floodzone.  The first thing you have do, as illustrated in lines 3-6 is get the gid and geometry field from the floodzone.  We do this twice, because we have to fake Postgres into thinking their are two tables we are selecting from (A and B).  The ST_Dump function dumps out all the coordinates for each polygon as a geometry dump.  So, the .geom reference turns the geometry dump into actual geometries. So, at the end of lines 3 and 4, you have a table with the gid of the polygon and all the individual point geometries. Continue reading

QGIS GPU Processing Update

Today my students got to present some of their work at the Maryland Geographic Information Committee Quarterly meeting.  On Monday we head to the University of Maryland at College Park to present our work with the Computer Science Department.

If you recall, I outlined our objective for the summer here, and provided some updated information here.  Also, recall that we’ve posted the code on Github here.  We’ve made good progress this week, including modifying the code to utilize the GDAL libraries for reading the raster DEM, and also have things working as a valid QGIS plug-in as well as a command line function.  Stop by the github site and have a look at the code, or try running it on your own data.

If  you recall, my concern was that GIS processes are big on data volume, but light on calculations per data element. With that concern in the back of my mind, we decided to generate a hillshade as it appears to be more computationally expensive than slope.

Well, some early results shed some light on what is going on.   As you can see from Table 1, the CUDA version of our slope computation is around 1.5x faster than the serial version.  That is good, but nothing to write home about.  You’ll notice that the Total Time is less than the sum of the parts because we are not only using the GPU to perform parallel processing for the slope computation, but are also using multi-threading techniques to simultaneously read in data while we are ALSO performing the computations and writing out the data – so, this QGIS plug-in is doing massive GPU parallel processing and multi-core processing! Continue reading

More QGIS/GPU progress – it’s getting faster…

This summer I am working with three talented undergraduates as part of a National Science Foundation (NSF) program called Research Experience for Undergraduates (REU): Alex Fuerst (Xavier University), Charlie Kazer (Swarthmore College), and Billy Hoffman (Salisbury University).  This post continues to present their work in performing parallel processing with QGIS.

Today I want to continue discussing our project to create a QGIS plug-in for terrain analysis.  In my previous post, I discussed our goals and some early results.  Our first cut showed that pyCUDA was running about as fast as a serial C++ version of our code.  And, at that time, QGIS was still about as fast as our own serial C++ code and our pyCUDA code when generating slope.

My students decided to rewrite the code to create a scheduler to manage the I/O, CPU and GPU activities.  The schedule is divided into three modules:

dataLoader – The dataLoader takes in an ASCII  GeoTIFFinput file and sends data one line at a time to a buffer it shares with gpuCalc.

gpuCalc – The gpuCalc grabs data from the shared buffer with dataLoader and moves it into CUDA pagelocked memory. When the memory is full, the program transfers the data to the GPU and executes the CUDA kernel for the calculation.  When the calculation completes, the program writes the results to a second buffer it shares with dataSaver.

dataSaver – dataSaver gets the data from shared buffer in gpuCalc and writes the results to an ASCII file GeoTIFF.

An example of the architecture we are using is shown below:


The results indicated that the GPU was running faster for slope, but still not enough to make me too happy. It was about 25% faster, and I wasn’t going to be satisfied until we were an order of magnitude faster.

We tried running our algorithms with ESRI ASCII files because it was easier to read than a GeoTIFF. Unfortunately, the input time to read the file was horrendous (like 15 minutes!). So, we spent a little time writing the algorithm to work with GeoTIFFs (much thanks to some generous souls on who helped Alex figure out the GeoTiff read/write), and found them to run substantially faster.  Also, we decided to run the Hillshade algorithm which includes many more computations than a simple slope or aspect.  In this case, the results are shown below.

We had a breakthrough with PyCUDA, so I want to wait a another day or so to rewrite the serial version in C++, but for now, we’ll use QGIS and the terrain plugin to calculate the hillshade on a 8.3GB raster file (1.5GB as a GeoTiff):

                                                      PyCUDA                                          QGIS
Input                                                8:30                                                 2:30
Output                                             3:48                                                 5:00
Computation                                  5:09                                                 9:00
TOTAL TIME*                               7:03                                               15:30

We’ve placed a version of the code up on gitHub here.  I hope you get a chance to try it out, and maybe even collaboratively help us make this a legitimate plug-in for QGIS.

* the PyCUDA total time is not the sum of its parts because as we are using multi-threading in our code so that while we are reading data in, we are also writing data out in another thread, and also performing the computations using another thread.



6 Online GIS Courses for $30 or less

Just wanted to announce that there are now 6 online courses for learning GIS along with the coupon codes for a discounted price –  you can get all of them for $30 or less!  Where else are you going to find comprehensive online courses in statistics and geography, enterprise GIS with Postgres, Internet mapping with Geoserver, spatial SQL with PostGIS, Manifold GIS, and spatial data processing with GDAL?  Stop on over at the site and check out the courses.

The student evaluations have been incredible, so I am psyched to add more courses – let me know what courses you want me to develop next (and yes, there will be a Manifold 9 course when the product is released)?

Parallel Processing with QGIS

Once again, I am continuing my role as a mentor in a National Science Foundation (NSF) Research Experience for Undergraduate program.  This year we’ve decided to build a QGIS plug-in for terrain analysis, as it is embarrassingly parallel (slope, aspect, etc.).   We are doing four things as we generate slope for different size digital elevations models:

  1. A pure python implementation (for an easy plug-in)
  2. A serial-based C++ implementation (as a baseline for a non-parallel solution)
  3. A pyCUDA implementation (using the GPU for parallel processing)
  4. A C++ based parallel solution using the GPU

We plan to put our results on a shared GitHub site (we are in the process of cleaning up the code) so that people can start experimenting with it, and use our example to begin generating more parallel solutions for QGIS (or GDAL for that matter).

Here are some early results: Continue reading

Some comments from my open source GIS class

We are now into our 4th week of my Open Source GIS class at the University of Maryland at College Park.  We’ve covered desktop GIS with QGIS, and worked on basic meat-and-potato vector analysis (overlay, buffer, selections, etc.), spatial interpolation (TIN, IDW), cartography, and multi-criteria map algebra.

The last two weeks we’ve covered Enterprise GIS with Postgres, PostGIS, and QGIS.  The students built their own enterprise GIS, with different users, software platforms, triggers, constraints, groups and roles.  They even tested out simultaneous multi-user editing.

So, 4 weeks in, what do the students have to say?  Here are a couple of samples: Continue reading

Advanced Editing Capabilities with PostGIS

Today I want to introduce you to another one of my really great undergraduate students: Jeff Scarmazzi.  Jeff participated in the SUSRC conference that I posted about previously, but unfortunately, his photo and poster got deleted from my camera, so I never posted information about his project – I may eventually get around to posting Jeff’s poster, but for now it will have to wait.


Jeff is an exceptional student, and at this point can run circles around me with PostGIS, Leaflet, and Postgres – at the moment, I am still reserving the right to say that my spatial SQL is better than his, but that may not be true.  And, now that Jeff secured a prestigious internship with NASA over the summer, once the Fall rolls around it definitely won’t be true!

What I wanted to show you was a short presentation that Jeff gave on his work with Postgres and PostGIS in my Advanced GIS class.  For this project I had students create a geodatabase with ArcGIS that includes domains, subtypes, topology rules, etc.  Jeff decided to extend his work on this assignment and recreate the same kind of features using Postgres and PostGIS.  This presentation is just a small piece of what Jeff and his fellow students (Austin Barefoot, Carl Flint, and Jordan Roose) created, and focuses specifically on writing triggers and creating roles to automate and quality check digitizing procedures using Postgres.

The presentation is only about 15 minutes long, and you’ll be amazed to see what you can do with PostGIS to creating robust editing rules.

Warning Shameless plug: In case you wanted to learn how to implement PostGIS in an enterprise setting, check out my online course here.  Also, I have another course on how to write spatial SQL with PostGIS here.  

The Web Duel: part 3, Open Source

The Open Source Implementation

This is a continuation of Mark Balwanz’s blog posts on his creation of web mapping sites using both ESRI and Geoserver.  Today he will talk about his experience creating the site using open source technology.

After completing the ESRI version of my project (Web GIS Duel: Part 2), I turned my attention to the open source version of my project.  I have some experience with Open Source GIS from previous graduate courses and have spent some time working with Leaflet in the past, but this project was definitely going to be a learning experience for me.   Below you will find a list of all the technology I used to create this web mapping application.

Continue reading

Open Source GIS Course at UMD – the Syllabus

UMD provided me with enrollment numbers, and we are getting a pretty full class.  However, a few people on the “outside” have asked for a little more depth into what will be covered.  So, I put together a quick outline on each of our class meetings.

Keep in mind, in addition to the class meetings, there will be weekly discussions and short assignments.  Each class session is 2.5 hours long.  Those sessions colored orange are classes that I will be on the campus live (please note that this is tentative, and the dates may slide around a bit), giving the lecture and assisting students in the lab.  The other class sessions are a combination of online video lectures or in laboratory exercises.  For those with full time jobs, most of the material can be performed at home on you own devices, although being in the lab with the TA will be helpful.

Again, please feel free to email to get more information about signing up for the class. Continue reading