Scotland GIS tour – please join me!

I’m getting very excited about my upcoming trip to Scotland. I’ve got 3 talks scheduled:

  1. A seminar on spatial SQL at Stirling University (October 31)
  2. A lightning talk at the 6th Annual QGIS User Group Meeting (November 3)
  3. A seminar on parallel processing at Edinburgh University (November 4)

The seminar at Stirling University is a talk I’ve given in the past on spatial SQL as a language for geographers. However, this time it will focus more on biology and natural resources, which is a better fit for the audience.

I’m really excited to attend the QGIS User Group Meeting – looking over the topics, I think this is going to be one of those conferences where I get more than I give. My talk will be more technical and get into some of the code we developed to create a QGIS plug-in that uses GPU technology for terrain analysis. We made some great progress, but we also discovered we have a lot more to learn :-)

The talk at Edinburgh University should be a fun one. I’ve titled it Exploring the potential of using your teenager’s gaming computer as a high performance computing (HPC) GIS workstation, and it will cover not only the GPU work I’ve been involved with, but also the multiprocessing work I’ve done with SpatialHadoop and the Python multiprocessing module. The seminar is free and open to the public, and I understand there will be drinks afterwards.

I’ve also scheduled some time with a couple of professors at the University of St Andrews on November 1.

If any of my GIS friends are near the UK during that time, it would be great to get together. I’m also going to take advantage of my down time (although if I agree to any more talks, there won’t be much down time left!) and do a lot of sightseeing in these beautiful cities.

Finally, I don’t know if the new version of Manifold will be out by then, but if it is, I will try to squeeze in some time to talk with people about it.

Do video lectures help students? Yes!

Have you ever sat in a lecture and been totally confused? I know I have. I wished the professor would either slow down, or that I had a time machine to revisit the lecture – the pace made comprehension really difficult.

Recently, I’ve cut down on my peer-reviewed publications and focused more on undergraduate teaching materials. This includes a very readable, applied textbook on statistics and geography for under $50, and a supplementary workbook for under $20.

Last year I decided to create online video lectures for my course Statistical Problem Solving in Geography through Udemy. These are abbreviated lectures for all of my book chapters, and they also include hands-on demonstrations of certain calculations. I offer these lectures free of charge to my students at Salisbury University so that they can supplement the class lectures (I warn them that they are not an excuse for skipping lecture!). You can see examples of the lectures here.

Over the last six years, the first exam in my course has been a bit of an eye-opener for students: the class average ranged anywhere from 62% to 71%. For the last two semesters, I have provided my students with the supplementary lectures, and the averages for the first exam were 82% and 84%! That is a huge improvement.

Also, when giving my final practicum, I found that these sophomore-level undergraduates were able to perform multivariate regression analysis: they not only knew how to eliminate explanatory variables that were not statistically significant, they were also able to identify multicollinearity and, using the sequential sums of squares, determine which explanatory variable should be tossed out. To be honest, I couldn’t even do that in graduate school!

So yes, this is a great way to supplement student learning, and I really believe that understanding statistical principles is critical for any geographer wanting to perform quantitative analysis or even GIS.

* By the way, if you are interested in learning the material covered in my college-level quantitative geography course, you can take Statistical Problem Solving in Geography for the same $25 price I offer it to college students outside of my university by clicking here (the course is normally $75, and I often provide coupons for $30). You can also check out discounts on my other online geospatial courses here.

Breaking a vector file up based on a unique attribute

It’s been a while since I’ve done any Manifold stuff, so I thought I would show you how to break up a vector file based on a unique attribute. A question was posted on the Manifold GIS forum asking how to create new drawings (for those of you in the ESRI world – think shapefiles) based on the unique attributes in the database.

This is pretty easy to do, and I think it might work really nicely within a Python paradigm (especially if you use psycopg2, which allows you to connect to Postgres).
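To sketch that idea, here is a minimal psycopg2 version of the same pattern against Postgres – splitting one table into a table per unique attribute value. The connection settings and the roads/tp names are assumptions chosen to mirror the Manifold example below:

import psycopg2
from psycopg2 import sql

# hypothetical connection details
conn = psycopg2.connect(dbname="gisdb", user="postgres")
cur = conn.cursor()

# get the unique attribute values, just like the GROUP BY query below
cur.execute("SELECT tp FROM roads GROUP BY tp")
for (road_type,) in cur.fetchall():
    # create one new table per unique value of tp
    cur.execute(
        sql.SQL("CREATE TABLE {} AS SELECT * FROM roads WHERE tp = %s")
        .format(sql.Identifier(road_type)),
        (road_type,))
conn.commit()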


Sub Main
  Set roads = document.ComponentSet("D")   ' the original drawing
  Set qry = document.NewQuery("Q", true)   ' query component for the unique values
  Set qry2 = document.NewQuery("Q2", true) ' query component for the selections
  qry.text = "SELECT tp FROM D GROUP BY tp"
  For Each rec In qry.Table.RecordSet
    thetype = rec.Data("tp")
    Set newdrw = document.NewDrawing(thetype, roads.CoordinateSystem, true)
    qry2.text = "UPDATE D SET [Selection (I)] = true WHERE tp = " & chr(34) & thetype & chr(34)
    qry2.runEx true
    roads.copy(true)     ' copy the selected features
    newdrw.paste         ' paste them into the new drawing
    roads.selectnone     ' clear the selection
  Next
End Sub

So what’s happening here? Lines 1 and 15 just start and stop the subroutine.

Lines 2 – 4 set up three objects: a drawing called “D” that we assign to a variable called roads, a query component called “Q”, and another query component called “Q2”. In Manifold, a query component allows you to write SQL queries.

Line 5 – this is a basic SQL query where we select a field called “tp” that holds the names of the road types (e.g., Primary, Secondary, etc.). The GROUP BY clause returns the unique road type names, so the result is a table of the unique road types.

Line 6 – This is the start of a for loop. When we say qry.Table.RecordSet, what we are actually doing is retrieving an object that holds the result of the query we issued, so we are going to loop over that list of names. The object we return is called rec, and it represents the individual record in the list.

Line 7 – now that we are in the loop, the first thing we do is grab the current unique road type by getting the data from the rec object. So, for that record, we grab the value of the “tp” field.

Line 8 – This line creates a new drawing, using the name we got from the record in line 7, and assigns it the coordinate system of the original file we are working on.

Lines 9 and 10 – These lines create a query that selects all the records in the original drawing that have the unique name in the “tp” field, and then run the query. The result is an updated table where the selected records are highlighted.

Lines 11 – 13 – Here we do a little cut-and-paste gymnastics. Line 11 copies the selected features in the original drawing (remember, these are just the features that have the “tp” name we are interested in for this pass of the for loop). Line 12 pastes those features into the new drawing we created in line 8, and line 13 deselects the features.

Line 14 – When we get here, we’ve just created our first new drawing with the features that match the first road type.  The Next directive sends us back to the beginning of the for loop and we then process the next road type name.

That’s it.  It is really easy with Manifold scripting.  I’d post the Python solution, but I’m teaching GIS Programming this semester, and I think I’ll leave that to my students to do!

If you are interested in learning how to program with Manifold, Postgres/PostGIS, or build an enterprise GIS, check out my online courses here.


Moran’s I in PostGIS

Just thinking out loud here.  I’ve always been bothered by the complexity of Moran’s I.  Actually, it’s not complex, it’s just math.  And, it’s really nothing more than Pearson’s correlation coefficient tricked into a spatial context.  So, even calculating Pearson’s by hand is a pain.  But really, it is nothing more than simply performing a correlation on two arrays.  In this case, the arrays are the values of adjacent features.  Nowadays, we have great tools for calculating the correlation coefficient, so if you can get two arrays representing the adjacent data, you simply wrap your query around that.  Take a look here as I revisit Figure 15.4 of my textbook An Introduction to Statistical Problem Solving in Geography:

SELECT corr(a.pctwhite, b.pctwhite)
FROM cleveland AS a, cleveland AS b
WHERE st_touches(a.geometry, b.geometry)
AND a."OID" <> b."OID"

We are simply finding those census tracts that are adjacent to one another and obtaining their respective pctwhite values.  That returns two columns, which we pass into the correlation function (corr).
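If you’d rather sanity-check this outside the database, you can pull the same two columns into Python and run them through numpy – a quick sketch, where the query is the one above and the connection details are hypothetical:

import numpy as np
import psycopg2

conn = psycopg2.connect(dbname="gisdb", user="postgres")  # hypothetical
cur = conn.cursor()
cur.execute("""
    SELECT a.pctwhite, b.pctwhite
    FROM cleveland AS a, cleveland AS b
    WHERE st_touches(a.geometry, b.geometry)
    AND a."OID" <> b."OID" """)
pairs = np.array(cur.fetchall(), dtype=float)

# Pearson's r on the two adjacency arrays -- the same value corr() returns
print(np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1])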

The results are nearly identical to ESRI’s Moran’s I index.

What do our statistician friends think?

If you want to learn how to write spatial SQL code, work with Postgres, or understand statistics and geography, check out my courses here.

Finding the farthest distance inside a polygon

On the Manifold GIS discussion board, someone asked about updating a polygon layer with the distance of the farthest pair of vertices for each polygon.  I thought that was a fun little conundrum to try out.  You can see my Manifold SQL answer here.

That made me think about how to write the same query in PostGIS.  It’s actually fairly easy, as the code below shows:

SELECT max(ST_Distance(A.g, B.g)), A.gid
FROM
(SELECT gid, (ST_DumpPoints(geom)).geom AS g
FROM floodzone) AS A,
(SELECT gid, (ST_DumpPoints(geom)).geom AS g
FROM floodzone) AS B
WHERE A.gid = B.gid
GROUP BY A.gid

Now, I am doing it for a layer called floodzone.  The first thing you have to do, as illustrated in lines 3-6, is get the gid and geometry field from the floodzone.  We do this twice because we have to fake Postgres into thinking there are two tables we are selecting from (A and B).  The ST_DumpPoints function dumps out all the vertices of each polygon as a geometry dump, and the .geom reference turns the geometry dump into actual point geometries.  So, at the end of lines 3 and 4, you have a table with the gid of the polygon and all the individual point geometries.

QGIS GPU Processing Update

Today my students got to present some of their work at the Maryland Geographic Information Committee Quarterly meeting.  On Monday we head to the University of Maryland at College Park to present our work with the Computer Science Department.

If you recall, I outlined our objective for the summer here, and provided some updated information here.  Also, recall that we’ve posted the code on GitHub here.  We’ve made good progress this week, including modifying the code to use the GDAL libraries for reading the raster DEM, and we also have things working as a valid QGIS plug-in as well as a command-line function.  Stop by the GitHub site and have a look at the code, or try running it on your own data.
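If you are curious what the GDAL-based read boils down to, the core is only a few lines – a simplified sketch rather than our actual plug-in code, with a placeholder filename:

from osgeo import gdal
import numpy as np

ds = gdal.Open("dem.tif")                    # placeholder filename
band = ds.GetRasterBand(1)
dem = band.ReadAsArray().astype(np.float64)  # the DEM as a 2D numpy array
cellsize = ds.GetGeoTransform()[1]           # pixel width from the geotransform
nodata = band.GetNoDataValue()
print(dem.shape, cellsize, nodata)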

If you recall, my concern was that GIS processes are big on data volume, but light on calculations per data element. With that concern in the back of my mind, we decided to generate a hillshade, as it appears to be more computationally expensive than slope.
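For a sense of the extra work, here is a rough numpy sketch of the illumination step that hillshade adds on top of the slope and aspect grids – the standard formula, with the function name and the radians convention being my own assumptions:

import numpy as np

def hillshade(slope, aspect, azimuth=315.0, altitude=45.0):
    # shaded relief from slope/aspect grids (both in radians);
    # beyond the two gradients slope needs, hillshade adds several
    # trig operations per cell -- hence the extra computational cost
    zenith = np.radians(90.0 - altitude)  # sun angle from the vertical
    az = np.radians(azimuth)              # assumes aspect is clockwise from north
    shaded = (np.cos(zenith) * np.cos(slope) +
              np.sin(zenith) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(255.0 * shaded, 0, 255).astype(np.uint8)

# e.g., on small random test grids:
print(hillshade(np.radians(np.random.rand(3, 3) * 60),
                np.radians(np.random.rand(3, 3) * 360)))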

Well, some early results shed some light on what is going on.  As you can see from Table 1, the CUDA version of our slope computation is around 1.5x faster than the serial version.  That is good, but nothing to write home about.  You’ll notice that the Total Time is less than the sum of the parts because we are not only using the GPU to perform parallel processing for the slope computation, but are also using multi-threading techniques to simultaneously read in data, perform the computations, and write out the data – so, this QGIS plug-in is doing massive GPU parallel processing and multi-core processing!

More QGIS/GPU progress – it’s getting faster…

This summer I am working with three talented undergraduates as part of a National Science Foundation (NSF) program called Research Experience for Undergraduates (REU): Alex Fuerst (Xavier University), Charlie Kazer (Swarthmore College), and Billy Hoffman (Salisbury University).  This post continues to present their work in performing parallel processing with QGIS.

Today I want to continue discussing our project to create a QGIS plug-in for terrain analysis.  In my previous post, I discussed our goals and some early results.  Our first cut showed that pyCUDA was running about as fast as a serial C++ version of our code.  And, at that time, QGIS was still about as fast as our own serial C++ code and our pyCUDA code when generating slope.

My students decided to rewrite the code to create a scheduler to manage the I/O, CPU, and GPU activities.  The scheduler is divided into three modules:

dataLoader – The dataLoader takes in a GeoTIFF input file and sends the data one line at a time to a buffer it shares with gpuCalc.

gpuCalc – The gpuCalc grabs data from the buffer it shares with dataLoader and moves it into CUDA page-locked memory. When the memory is full, the program transfers the data to the GPU and executes the CUDA kernel for the calculation.  When the calculation completes, the program writes the results to a second buffer it shares with dataSaver.

dataSaver – The dataSaver gets the data from the buffer it shares with gpuCalc and writes the results to a GeoTIFF file.

An example of the architecture we are using is shown below:

[Figure: the dataLoader → gpuCalc → dataSaver scheduler architecture]
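To make the structure concrete, here is a stripped-down sketch of the same producer/consumer pattern using Python threads and queues. The function names mirror the modules above; the buffer sizes, filename, and the pass-through standing in for the CUDA kernel are placeholders so the skeleton runs anywhere:

import threading
import queue
from osgeo import gdal

in_buf = queue.Queue(maxsize=64)   # shared buffer: dataLoader -> gpuCalc
out_buf = queue.Queue(maxsize=64)  # shared buffer: gpuCalc -> dataSaver
DONE = object()                    # sentinel marking the end of the stream

def data_loader(path):
    ds = gdal.Open(path)
    band = ds.GetRasterBand(1)
    for row in range(ds.RasterYSize):
        # send the raster down one line at a time
        in_buf.put(band.ReadAsArray(0, row, ds.RasterXSize, 1))
    in_buf.put(DONE)

def gpu_calc():
    while True:
        data = in_buf.get()
        if data is DONE:
            out_buf.put(DONE)
            break
        out_buf.put(data)          # placeholder for the CUDA kernel call

def data_saver(results):
    while True:
        data = out_buf.get()
        if data is DONE:
            break
        results.append(data)       # the real module writes a GeoTIFF here

results = []
threads = [threading.Thread(target=data_loader, args=("dem.tif",)),
           threading.Thread(target=gpu_calc),
           threading.Thread(target=data_saver, args=(results,))]
for t in threads:
    t.start()
for t in threads:
    t.join()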

The results indicated that the GPU was running faster for slope, but still not enough to make me too happy. It was about 25% faster, and I wasn’t going to be satisfied until we were an order of magnitude faster.

We tried running our algorithms with ESRI ASCII files because they are easier to read than a GeoTIFF. Unfortunately, the input time to read the file was horrendous (like 15 minutes!). So, we spent a little time rewriting the algorithm to work with GeoTIFFs (much thanks to some generous souls on gis.stackexchange.com who helped Alex figure out the GeoTIFF read/write), and found them to run substantially faster.  Also, we decided to run the hillshade algorithm, which includes many more computations than a simple slope or aspect.  In this case, the results are shown below.

We had a breakthrough with PyCUDA, so I want to wait another day or so before rewriting the serial version in C++. For now, we’ll use QGIS and the terrain plugin to calculate the hillshade on an 8.3GB raster file (1.5GB as a GeoTIFF):

                 PyCUDA      QGIS
Input              8:30      2:30
Output             3:48      5:00
Computation        5:09      9:00
TOTAL TIME*        7:03     15:30

We’ve placed a version of the code up on GitHub here.  I hope you get a chance to try it out, and maybe even help us collaboratively make this a legitimate plug-in for QGIS.

* The PyCUDA total time is not the sum of its parts because we are using multi-threading in our code: while we are reading data in one thread, we are also writing data out in another thread and performing the computations in a third.


6 Online GIS Courses for $30 or less

Just wanted to announce that there are now 6 online gisadvisor.com courses for learning GIS, along with coupon codes for a discounted price – you can get each of them for $30 or less!  Where else are you going to find comprehensive online courses in statistics and geography, enterprise GIS with Postgres, Internet mapping with GeoServer, spatial SQL with PostGIS, Manifold GIS, and spatial data processing with GDAL?  Stop on over at the gisadvisor.com site and check out the courses.

The student evaluations have been incredible, so I am psyched to add more courses – let me know what courses you want me to develop next (and yes, there will be a Manifold 9 course when the product is released).

Parallel Processing with QGIS

Once again, I am continuing my role as a mentor in a National Science Foundation (NSF) program called Research Experience for Undergraduates (REU).  This year we’ve decided to build a QGIS plug-in for terrain analysis, as it is embarrassingly parallel (slope, aspect, etc.).  We are doing four things as we generate slope for different-size digital elevation models:

  1. A pure Python implementation (for an easy plug-in – see the sketch below)
  2. A serial C++ implementation (as a baseline for a non-parallel solution)
  3. A pyCUDA implementation (using the GPU for parallel processing)
  4. A C++-based parallel solution using the GPU

We plan to put our results on a shared GitHub site (we are in the process of cleaning up the code) so that people can start experimenting with it, and use our example to begin generating more parallel solutions for QGIS (or GDAL for that matter).
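As a taste of item 1, here is a minimal slope calculation using Horn’s method on a numpy array – a sketch under my own naming, not our actual plug-in code:

import numpy as np

def slope_degrees(dem, cellsize):
    # pad the edges so every cell has a full 3x3 neighbourhood
    z = np.pad(dem, 1, mode="edge")
    # Horn's method: weighted differences across the 3x3 window
    a, b, c = z[:-2, :-2], z[:-2, 1:-1], z[:-2, 2:]
    d, f = z[1:-1, :-2], z[1:-1, 2:]
    g, h, i = z[2:, :-2], z[2:, 1:-1], z[2:, 2:]
    dzdx = ((c + 2*f + i) - (a + 2*d + g)) / (8.0 * cellsize)
    dzdy = ((g + 2*h + i) - (a + 2*b + c)) / (8.0 * cellsize)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

# every output cell depends only on its own 3x3 neighbourhood --
# exactly what makes the problem embarrassingly parallel
print(slope_degrees(np.random.rand(5, 5) * 100.0, cellsize=30.0))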

Here are some early results:

Some comments from my open source GIS class

We are now into the 4th week of my Open Source GIS class at the University of Maryland at College Park.  We’ve covered desktop GIS with QGIS, and worked on basic meat-and-potatoes vector analysis (overlay, buffer, selections, etc.), spatial interpolation (TIN, IDW), cartography, and multi-criteria map algebra.

The last two weeks we’ve covered Enterprise GIS with Postgres, PostGIS, and QGIS.  The students built their own enterprise GIS, with different users, software platforms, triggers, constraints, groups and roles.  They even tested out simultaneous multi-user editing.

So, 4 weeks in, what do the students have to say?  Here are a couple of samples: