-
How different is urban GIS?
- A bit of an artificial separation
- Physical sciences / processes – raster?
- SocSci / human processes – vector?
- Not exactly, and besides, urban processes intersect both.
- But still, there are some unique issues / challenges to urban GIS
-
Commonly used data sources
- Census
- Land records
- Cadastral data
- Property records
- Utilities / infrastructure
- Image data
-
Parcels are used for:
- Researching title
- Assessing historical patterns in land administration, land use, etc.
- Public service delivery and record keeping (taxation, schooling, zoning, public health, etc).
- Emergency services
-
How are parcels established?
- 1. Approval by various gov’t agencies
- 2. Record with Register of Deeds, County Recorder, etc.
- 3. Add identifying information for tax assessor
- PIN, address, map sheet #, etc.
- 4. Update all related hard copy and digital records in local gov’t system
-
What attribute information is associated with parcel boundaries?
Dimensions
- Administrative classifications (ID, zoning, land use, etc)
- Taxable value, estimated resale value
Characteristics of structures on it (size, condition, type, address)
People who live there or are linked to it (i.e. registered taxpayer)
Ownership history, transfers, prices
-
Challenges associated with parcel data for GIS applications
Format (may not always be digital)
Availability (varies widely, as do rules/costs for obtaining)
- Maintenance (hard to keep current in large urban areas!)
- Quality (see above – highly variable)
Augmentation (adding our own additional information to parcel boundaries?)
Confidentiality (lots of potential issues!)
-
Parcels: Special challenges
Condos & PINs
Separated rights (i.e. mining, transportation authorities, air or subsurface water rights)
Encumbrances (roads, utilities, easement between sidewalk and street)
Zoning vs. Land use
Re-platting & other changes
-
Why are addresses so complicated?
Associated with many types of features in real world
Associated with different spatial objects (with different geometries) in GIS
Relationships between addresses and real world phenomena may be one-to-one or many-to-one
-
Complex associations with addresses:
- Addresses (might)
- Have sub-address
- Have zone
- Be assigned to a building/parcel (or more than one)
- Be assigned to a landmark
All of these associations describe relationships between different entities and attributes
Solution: Relational Databases!
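A minimal sketch of the relational idea, using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical. A junction table (address_parcel) lets one address link to several parcels and one parcel carry several addresses.

```python
import sqlite3

# In-memory database; schema and rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id   INTEGER PRIMARY KEY,
    house_number INTEGER,
    street_name  TEXT,
    street_type  TEXT,
    sub_address  TEXT,   -- e.g. unit or apartment
    zone         TEXT    -- e.g. ZIP code
);
CREATE TABLE parcel (pin TEXT PRIMARY KEY, land_use TEXT);
CREATE TABLE address_parcel (
    address_id INTEGER REFERENCES address(address_id),
    pin        TEXT REFERENCES parcel(pin),
    PRIMARY KEY (address_id, pin)
);
""")
conn.executemany("INSERT INTO address VALUES (?,?,?,?,?,?)", [
    (1, 100, "Main", "St", "Unit A", "53201"),
    (2, 100, "Main", "St", "Unit B", "53201"),
    (3, 200, "Oak", "Ave", None, "53202"),
])
conn.executemany("INSERT INTO parcel VALUES (?,?)", [
    ("P-001", "residential"), ("P-002", "commercial"), ("P-003", "commercial"),
])
conn.executemany("INSERT INTO address_parcel VALUES (?,?)", [
    (1, "P-001"), (2, "P-001"),   # two condo units on one parcel
    (3, "P-002"), (3, "P-003"),   # one address spanning two parcels
])

def parcels_for(address_id):
    """All parcel PINs linked to one address."""
    rows = conn.execute(
        "SELECT pin FROM address_parcel WHERE address_id = ? ORDER BY pin",
        (address_id,))
    return [r[0] for r in rows]

def addresses_on(pin):
    """All address IDs linked to one parcel."""
    rows = conn.execute(
        "SELECT address_id FROM address_parcel WHERE pin = ? ORDER BY address_id",
        (pin,))
    return [r[0] for r in rows]
```

Landmarks or buildings could be handled the same way, each with its own table and its own junction table to address.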
-
Elements of addressing systems
House or building number
Street name
Street type
Directional component
- Zone information
- City, state, ZIP, postal code, province, etc.
-
Why do we need zones, anyway?
Multiple cities may have same address
Sort/Index large data set
Search/Processing efficiency
(Sometimes) used to geocode or join
Tip: Zones are a great place to set a domain range in an ArcGIS geodatabase!
-
Structuring address ranges on streets:
To/from node
Or, in U.S. Census & TIGER files:
From Address Right
To Address Right
From Address Left
To Address Left
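A rough sketch of how the four address-range fields can place a house number along a street segment. The dictionary keys, the coordinates, and the odd/even parity rule for choosing a side are assumptions for illustration; real reference data may break any of them.

```python
def interpolate_address(number, segment):
    """Return (x, y, side) for a house number, or None if it is out of range.

    Assumes each side of the street holds numbers of a single parity and
    that addresses are spread evenly between the from-node and to-node.
    """
    for side in ("left", "right"):
        lo, hi = segment[f"from_{side}"], segment[f"to_{side}"]
        same_parity = number % 2 == lo % 2
        if same_parity and min(lo, hi) <= number <= max(lo, hi):
            # Fraction of the way from the from-node to the to-node.
            t = 0.5 if hi == lo else (number - lo) / (hi - lo)
            (x0, y0), (x1, y1) = segment["from_xy"], segment["to_xy"]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0), side)
    return None

# Hypothetical segment: odd numbers on the left, even numbers on the right.
segment = {
    "from_xy": (0.0, 0.0), "to_xy": (100.0, 0.0),
    "from_left": 101, "to_left": 199,
    "from_right": 100, "to_right": 198,
}
```

The result sits on the centerline itself; a production geocoder would usually also offset the point to the correct side of the street.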
-
Geocoding
Descriptive location (i.e. address) -> exact geographic reference
- The process involves:
- Input (the location you are geocoding)
- Reference data set (‘base map’)
- Processing algorithm (math / geometry)
- Output (usually an x/y pair, not always)
-
Geocoding process:
Normalizes & standardizes input
- Searches reference data set
- User-determined tolerances
Returns geocoded output OR user notification of failure to match
[If no match: adjust tolerance, try again]
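The steps above can be sketched with the standard library alone. This is a toy under stated assumptions: the abbreviation table and the reference dictionary are made up, and string similarity from difflib stands in for whatever matching score a real geocoder uses.

```python
import difflib
import re

# Hypothetical standardization table.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "HIGHWAY": "HWY",
                 "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}

def normalize(address):
    """Uppercase, strip punctuation, and standardize common terms."""
    tokens = re.sub(r"[^\w\s]", "", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def geocode(address, reference, tolerance=0.9):
    """Return (matched_key, (x, y)), or None when nothing meets the tolerance."""
    key = normalize(address)
    matches = difflib.get_close_matches(key, list(reference), n=1,
                                        cutoff=tolerance)
    if not matches:
        return None   # caller may relax the tolerance and try again
    return matches[0], reference[matches[0]]

# Hypothetical reference data set: normalized address -> coordinate.
reference = {"123 N MAIN ST": (10.0, 20.0), "500 CICERO AVE": (30.0, 40.0)}
```

For example, `geocode("500 Cicero Hwy", reference)` finds no match at the default tolerance but does match "500 CICERO AVE" at `tolerance=0.75`, which also shows how a relaxed tolerance can produce a wrong match.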
-
Sources of Error- Geocoding
- Tolerance is set too low/relaxed
- Input is matched to the wrong location
- Reference dataset is inaccurate
- Output is not in the correct ‘real world’ location, even though it is correctly located on the map.
- Parcels not same size / address range is wrong
- Output is in the wrong place on the map & the assigned geographic coordinate is not the ‘real world’ location
-
How much error is too much?- Geocoding
What is your application? How will you use the output data set?
Are you combining output data set with other data?
What is the scale of your analysis and/or map?
-
Special challenges- Geocoding
Informal settlements
PO boxes & other ‘non-standard’ addresses
Multi-family dwellings (apts, condos)
Industrial / commercial areas
-
Geocoding in U.S. cities: Reference data options
- TIGER/Line files (street centerlines)
- Full US coverage, free
- Lower accuracy, esp for local applications
- Street centerlines from municipal gov’t
- More current than TIGER
- May not cover whole study area
- Metadata & accuracy vary widely
- Parcel address points
- Highly accurate, but not available everywhere
- Expensive
-
New geocoding capabilities
At first, addresses only.
- Now, ZIP codes, parcels, ‘natural language’ descriptions
- But, parcels are less accurate / reliable!
- Why?
- More (& better) digital reference data sets
- More robust interpolation algorithms
But… declining gov’t funding for development of reference data sets
-
Practical tips and common problems- Geocoding
- Spelling errors or typos
- May be in input or in reference layer
- Missing segments
- Your address not in existing ranges
- Street is named differently along a portion
- “Cicero” vs. “Hwy 50”
Try a ‘select by attribute’ to highlight the entire street
Have a close look at your attribute table
-
Why study administrative aspects of spatial data?
Most everyday work of gov’t uses spatial data
- Practices of creating, sharing, using spatial data vary widely by:
- Scale of gov’t (local, regional, state, national, etc)
- National context
The social and political construction of data affects how we perceive the world to be, and how we try to act to change it.
-
The ‘social construction’ of data
Spatial data are not purely ‘technical’
The data themselves, as well as their access, use, and implications, are determined by more than just the technologies themselves
Data <-> Society
-
For ex: data development, use, impacts affected by:
- Political contexts
- e.g. access to gov’t data, U.S. vs E.U.
- Cultural contexts
- Rules, beliefs, preferences, accepted behaviors
- e.g. norms about data privacy
- Organizational / institutional contexts
- Budgets, methods, staffing, technologies, ‘political capital’
- GIS adoption in Milwaukee WI, vs. Cochise Cty, AZ
-
Understanding data sharing practices:
Context matters – type or size of organization, its attitude toward sharing, its national or regional situation
Social and political connectivity and relationships matter
Formal codes and rules matter, but so do informal rules, unwritten expectations, and individual quirks
Policies often have unexpected consequences
-
Spatial data infrastructures (SDIs)
The human, technological, and informational resources used to manage and share large collections of spatial data
Includes data, metadata, networks, regulations, people, and organizations
SDIs have been the predominant model for governmental spatial data handling in the US
Most SDIs are national level – local SDIs have proved harder to implement
-
Technical aspects of data sharing: Interoperability
Compatibility between different data or computing environments that allow us to integrate or move between them.
- Data, systems, network
- In GIS, data interop is biggest challenge
Also called ‘semantic interoperability’
- Difficult to achieve because of high levels of semantic heterogeneity
- ‘green’, ‘green’, ‘green’.
-
Sources of semantic heterogeneity
Data collection methods
- Purposes of data collection
- Institutional differences
- Classification schemes
- Even the same classification scheme may be applied differently…
-
Socio-economic analysis with GIS
Commonly uses census data
Thematic mapping, spatial statistics
- Used for
- Defining political districts
- Allocating public funds
- Policy making / programming decisions
-
Censuses change over time…
Questions asked, not asked
- Changing terminologies / definitions
- Race; ‘head of HH’ vs ‘householder’
- Change in how you can answer the Qs
- 2000 – multiple race identifiers accepted
- Enumeration units
- Watch out if you are doing historical analysis
-
Contemporary census data issues
- 2000 was the first fully digital/GIS-able census (but only in some countries!)
- But early digital spatial data: DIME, 1970s
Devolution of gov’t = devolution of data
- Growing concern about the data
- Mistrust, privacy
- Undercounts
- Concern about attributes and their definitions (i.e. who is a household?)
-
What you need to know to find & use US Census data with GIS:
Data collection & aggregation
Census geographies
- Data organization / structure
- Tables & variables
-
Data collection – pre 2010
- Short form (all)
- Race, Hispanic origin, age, sex, housing tenure, household relationships
- Long form (1 in 6)
- Income, language, work status, home values, telephone, etc etc…
-
Data collection – 2010 and on:
- Short form (all)
- Race, Hispanic origin, age, sex, housing tenure, household relationships
- No more long form!
- American Community Survey to obtain detailed data
-
Data aggregation
Original data aggregated at many different summary levels
- Governmental units
- “Seattle”, “Guam”, etc.
- Statistical units
- “Tract”, “Block Group”, “MSA”
- At block level, short form data only
- Why??
-
Census geographies: Statistical units
US->Region->Division
State->County->County subdivision
Place->Census tract->Block group->Block
-
Finding specific census areas & linking census data: FIPS Codes
- “Federal Information Processing Standard”
- Example: 060710036021003
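A small sketch of how a 15-digit block-level code like the example splits into nested parts (2-digit state, 3-digit county, 6-digit tract, 4-digit block, with the block group given by the first digit of the block). The function name is hypothetical.

```python
def parse_block_geoid(geoid):
    """Split a 15-digit census block code into its nested components."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit block-level code")
    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11],   # first digit of the block number
        "block": geoid[11:15],
    }

parts = parse_block_geoid("060710036021003")
```

Keeping the code as text rather than a number matters when joining tables to TIGER/Line geography, since a numeric field would drop the leading zero.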
-
Other census geographies you might encounter
MSAs
But in New England = NECTA
TAZs
Voting districts
Tribal lands
[PUMA – public use microdata areas]
-
Census geographies are provided as TIGER/Line Files
These are the ‘spatial data’ for the census
Roads, census administrative units, street addresses, points of interest, other administrative units, hydrography, physical features.
Organized as line, landmark, polygon features
Key feature: TOPOLOGY!
-
Topological rules in TIGER system
Must not overlap
Must be covered by (class of)
Must cover each other
Can be used to describe ALL relationships between census geographies!
Features are defined as 0-cell (point), 1-cell (line), 2-cell (polygon)
-
Elements of TIGER/Line topology
- General case
- From-Node
- To-Node
- Left-Polygon
- Right-Polygon
->
Specific TIGER terms
- From Address Right
- To Address Right
- From Address Left
- To Address Left
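One reason the general case is useful: if every line stores its left and right polygon, adjacency between areas can be read straight from the attribute table with no geometry processing. A minimal sketch follows; the edge records and field names are hypothetical, not actual TIGER/Line field names.

```python
from collections import defaultdict

# Hypothetical edge records; None means no polygon on that side.
edges = [
    {"id": "e1", "from_node": "n1", "to_node": "n2", "left": "A", "right": "B"},
    {"id": "e2", "from_node": "n2", "to_node": "n3", "left": "A", "right": None},
    {"id": "e3", "from_node": "n3", "to_node": "n1", "left": "A", "right": "C"},
]

def polygon_neighbors(edges):
    """Map each polygon to the set of polygons it shares an edge with."""
    neighbors = defaultdict(set)
    for edge in edges:
        left, right = edge["left"], edge["right"]
        if left is not None and right is not None and left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)

adjacency = polygon_neighbors(edges)
```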
-
TIGER/Line “features”
- Line
- Landmark
- Polygon
- But: these are not ‘features’ as we talk about them in a GIS sense! Loosely, a feature here is something that has a real-world identity.
-
Locating the right data, Step 1: Look in the correct Summary File
A summary file is a collection of data from the Census
SF1 = all short form data
SF2 = all short form data, by race
SF3 = all long form data
SF4 = all long form data, by race
- PUMS = raw data, long form
- Requires special permission to access
-
Locating the right data, Step 2: Find the right column heading
Each attribute has a numerical code (i.e. “H903633”), not a logical name.
Pxxxxxx is always population
Hxxxxxx is always housing
Pxxxxx1 is almost always total pop
Hxxxxx1 is almost always total number of households
Census metadata tells you what the codes stand for!
-
Census 2010 – Breaking news…
Short form only - no long form
- American Community Survey (ACS) to collect info formerly on long form
- Continuously, rather than every 10 yrs
2 new questions to try to get at short-term household members (i.e. ‘anyone who sometimes lives somewhere else?’)
This will have implications for how Summary Files are organized/released!
-
Data quality issues & solutions
- 10-year gap between censuses
- American Community Survey
- Long form, annually, to sample of HHs
- More follow-up with respondents
- Undercounts
- Demographic data (births, deaths, estimates of undoc’d immigrants, etc.)
- Post census surveys (Post-Enumeration Survey)
-
Can we do better than Census data? High resolution socio-economic data
- Integrate other administrative data
- IRS, County Registrar, Postal Service, etc.
Harvest & assemble online personal data
- Barriers / Challenges
- Legal / societal
- Technical
-
Census is supposed to be a full count, but not everyone responds…
Response rate typically about 66%, based on master address file (MAF)
In-person enumeration brings this to 97%
So how do we come up with the other 3%?
Were all items completed by the other 97%? Probably not…
-
Imputation
Use of statistical methods to adjust for missing and inconsistent responses
- “Hot deck imputation” - sampling a given set of responses from a particular geographic area over and over and using those responses to do several things:
- Correct for non-response
- Perform edits
- Ensure confidentiality
Depends on assumptions about homogeneity (near things are more similar than far…)
- Two major types:
- Allocation (some missing values entered based on other reported information for the person or household, or similar others)
- Substitution (all information for a person or household is created from others with similar characteristics)
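A toy illustration of allocation by hot deck, assuming a simple sequential variant: a missing item is filled with the most recent reported value from the same area. This is not the Census Bureau's actual procedure; the record layout and field names are hypothetical.

```python
def hot_deck_impute(records, field, area_key="tract"):
    """Fill missing values of `field` from the last donor in the same area.

    Records are processed in order; a record with no earlier donor in its
    area is left missing. An 'imputed' flag marks filled values.
    """
    last_seen = {}   # area -> most recent reported value
    result = []
    for record in records:
        record = dict(record)          # do not modify the caller's data
        area = record[area_key]
        record["imputed"] = False
        if record[field] is None:
            if area in last_seen:
                record[field] = last_seen[area]
                record["imputed"] = True
        else:
            last_seen[area] = record[field]
        result.append(record)
    return result

# Hypothetical responses; None marks an unanswered item.
responses = [
    {"tract": "A", "tenure": "own"},
    {"tract": "A", "tenure": None},
    {"tract": "B", "tenure": "rent"},
    {"tract": "B", "tenure": None},
    {"tract": "C", "tenure": None},
]
filled = hot_deck_impute(responses, "tenure")
```

The sketch makes the homogeneity assumption visible: the second household in tract A is simply assumed to resemble its neighbor.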
-
Imputation rates are uneven, socially and spatially:
Less likely to respond to Census: highly transitory HHs, non-English speakers, undocumented individuals.
Not all parts of US have similar rates of imputation (pop and race more likely to have been allocated in SW)
What does this mean for you as a user of US Census data?
-
Simple network-based applications
- Shortest path
- Shortest/fastest way to get home from school?
- Closest facility
- Which post office is closest to my house?
- Service area
- If people who can get to our store within 30 minutes will shop here, what is our service area?
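All three applications can be built on one least-cost search. The sketch below uses Dijkstra's algorithm on a small, made-up network stored as a dictionary of travel times in minutes; the node names and costs are assumptions for illustration.

```python
import heapq

def travel_costs(graph, source):
    """Dijkstra: least cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue   # stale queue entry
        for neighbor, cost in graph.get(node, {}).items():
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist

def closest_facility(graph, source, facilities):
    """Facility with the lowest network cost from source, or None."""
    dist = travel_costs(graph, source)
    reachable = [f for f in facilities if f in dist]
    return min(reachable, key=dist.get) if reachable else None

def service_area(graph, source, max_cost):
    """All nodes reachable from source within max_cost."""
    return {n for n, d in travel_costs(graph, source).items() if d <= max_cost}

# Hypothetical two-way street network, costs in minutes.
graph = {
    "home":   {"a": 5, "b": 12},
    "a":      {"home": 5, "b": 4, "school": 20},
    "b":      {"home": 12, "a": 4, "school": 10},
    "school": {"a": 20, "b": 10},
}
```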
-
Network analysis allows us to:
- Analyze transit "cost"
- How long does it take to navigate through this network?
- Select optimal routes
- What is the least cost path through this network (time, distance, traffic…)?
- Solve resource allocation problems
- What is the portion of the network that should be considered the ‘territory’ from a particular starting point (e.g. each fire station…)?
-
Case #1: Do neighborhoods have different levels of access to parks?
Context: Gov’t sets benchmark for green space per capita; urban and regional planners must meet this standard
But this may not reflect ability of population to travel to available sites
Buffering the sites or calculating straight-line distance doesn’t represent how people actually travel to the sites
A network-based solution is required
-
Do different n’hoods have different access? How to solve this problem in GIS:
Define “access points” for the green spaces
Establish centroids for the census polygons (or whatever areas you are using as the ‘source areas’ that people are coming from)
For all points in demographics layer, calculate distance to nearest point in parks layer, following the network.
Assess how the "nearest park" distances may differ for different population groups
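A sketch of these steps, assuming centroids and park access points have already been snapped to network nodes. A single multi-source Dijkstra run started from all access points gives every node its distance to the nearest park. The network, the group labels, and the unweighted mean are illustrative assumptions.

```python
import heapq

def nearest_park_distance(graph, park_nodes):
    """Network distance from every node to its nearest park access point."""
    dist = {p: 0 for p in park_nodes}
    heap = [(0, p) for p in park_nodes]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue   # stale queue entry
        for neighbor, length in graph.get(node, {}).items():
            if d + length < dist.get(neighbor, float("inf")):
                dist[neighbor] = d + length
                heapq.heappush(heap, (d + length, neighbor))
    return dist

def mean_distance_by_group(centroids, dist):
    """Average nearest-park distance per population group (unweighted)."""
    by_group = {}
    for node, group in centroids.values():
        by_group.setdefault(group, []).append(dist[node])
    return {g: sum(v) / len(v) for g, v in by_group.items()}

# Hypothetical two-way street network, lengths in meters.
graph = {
    "n1": {"n2": 300},
    "n2": {"n1": 300, "n3": 200},
    "n3": {"n2": 200, "n4": 400},
    "n4": {"n3": 400, "n5": 500},
    "n5": {"n4": 500},
}
# Hypothetical tract centroids: tract -> (network node, group label).
centroids = {
    "t1": ("n1", "low_income"), "t2": ("n2", "high_income"),
    "t3": ("n5", "low_income"), "t4": ("n4", "high_income"),
}
dist = nearest_park_distance(graph, ["n3"])
access = mean_distance_by_group(centroids, dist)
```

Starting from the parks rather than from each centroid only works when links can be traveled in both directions, as assumed here.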
-
Representing and Analyzing Networks in GIS
Network: a system of linear features connected at intersections and interchanges
A network is composed of a set of nodes and the links that connect them
Networks might be used to represent: roads, streams, airline flight paths, railroads and so on.
-
A square is a rectangle, but not all rectangles…
Not all sets of lines in a GIS constitute a network
- Defining a network involves adding additional information that tells us:
- "Cost" to travel each link
- Connectivity between links
- Direction that may be traveled on each
- Limits on transfer from one link to another
-
All network applications start with the network itself:
- If it is a road network you need systems for:
- Representing overpass/underpass
- Accounting for time taken in a turn
- All possible turns at an intersection
- One-way and other street direction limitations
-
To create network, you need to create:
- Overpass & underpass
- a) Can simply cross the arcs with no node (easy)
- b) Can insert two nodes with elevation values to show which is the overpass and which is the underpass (harder)
-
For more realistic modeling, you also need
- Link impedance – "cost" (time) of traversing a "link" (a segment separated by 2 nodes)
- To a certain degree, length of link is used
- May depend on speed limits, traffic conditions, etc
Turn impedance – ‘cost’ of transitioning from one arc to another in the network (a turn). Requires a ‘turn table’, because different possible turns may have different impedance (right vs. unprotected left, for instance).
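A sketch of how a turn table changes routing. The search state is the link just traveled rather than the node alone, so the cost of a turn can depend on where you came from. The network, the costs, and the convention that None marks a prohibited turn are assumptions for illustration.

```python
import heapq
from itertools import count

def least_cost(links, turn_table, start, goal):
    """Least total link + turn impedance from start to goal, or None."""
    tie = count()                       # tie-breaker so the heap never compares nodes
    best = {(None, start): 0}           # (previous node, node) -> best cost
    heap = [(0, next(tie), None, start)]
    while heap:
        cost, _, prev, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get((prev, node), float("inf")):
            continue                    # stale queue entry
        for nxt, link_cost in links.get(node, {}).items():
            # Turns missing from the table are assumed to cost nothing.
            turn_cost = 0 if prev is None else turn_table.get((prev, node, nxt), 0)
            if turn_cost is None:       # prohibited turn
                continue
            new_cost = cost + link_cost + turn_cost
            if new_cost < best.get((node, nxt), float("inf")):
                best[(node, nxt)] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), node, nxt))
    return None

# Hypothetical one-way links with link impedance in minutes.
links = {"A": {"B": 2, "C": 3}, "B": {"D": 2}, "C": {"D": 3}}
# Hypothetical turn table: (from, via, to) -> turn impedance.
turn_table = {("A", "B", "D"): 5,   # unprotected left
              ("A", "C", "D"): 1}   # right turn
```

With an empty turn table the route through B wins (4 minutes); once the left turn costs 5, the route through C becomes cheaper (7 versus 9).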
-
More complicated network applications
- Modeling events that happen along the network
- Clusters of traffic accidents, flight delays
- Multiple modes of travel on the network
- Bike to the train station, train to work
- Attributes of the network
- Risk of a toxic spill, traffic volumes
-
More complicated impedance models:
- Distance and speed are not the only factors influencing route selection
- Road, amenities, terrain/view, weather, elevation, etc.
Conventional impedance models limited for transportation planning
- But, additional attributes can be handled as impedance factors, combined
- Sadeghi-Niaraki et al. 2011.
-
Dynamic segmentation
A data model built on lines of a network
Uses relational database to store network geometry (intersections, streets) and info about traversing the network (turn tables, link impedance tables, etc)
You can add other attribute data to your model (pavement conditions - dry, wet, icy; risk of an accident; frequency of monsoon flooding, etc.)
-
Using more complicated network applications for urban research & policy
Modeling multi-mode travel
Dispatching ‘toxic trucks’ along less risky routes
Evacuation planning
-
Existing trip-based models are limited
- One kind of transportation
- Single trip only
- Different start/stop points
- Many travel behaviors not covered by this kind of model…
-
One solution: Super-networks
Multiple copies of networks, one for each mode of travel (driving, public transit, walking, etc.)
The networks intersect at nodes where you can switch from one mode to another (transition links)
Locations on network have attributes based on mode of travel and type of trip.
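A minimal sketch of the super-network idea: each node becomes a (mode, place) pair, each mode keeps its own copy of the links, and transition links connect copies of the same place. The mode networks, transfer costs, and function names are all hypothetical.

```python
import heapq

def build_supernetwork(mode_networks, transfers):
    """Combine per-mode networks into one graph keyed by (mode, place)."""
    graph = {}
    for mode, network in mode_networks.items():
        for a, neighbors in network.items():
            for b, cost in neighbors.items():
                graph.setdefault((mode, a), {})[(mode, b)] = cost
    # Transition links: switch mode while staying at the same place.
    for place, from_mode, to_mode, cost in transfers:
        graph.setdefault((from_mode, place), {})[(to_mode, place)] = cost
    return graph

def shortest_path(graph, source, target):
    """Dijkstra returning (cost, path), or None if target is unreachable."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[node]:
            continue   # stale queue entry
        for neighbor, cost in graph.get(node, {}).items():
            if d + cost < dist.get(neighbor, float("inf")):
                dist[neighbor], prev[neighbor] = d + cost, node
                heapq.heappush(heap, (d + cost, neighbor))
    return None

# Hypothetical one-way travel times in minutes for each mode.
mode_networks = {
    "bike":  {"home": {"station": 10}},
    "train": {"station": {"work": 15}},
    "walk":  {"home": {"station": 30}, "station": {"work": 120}},
}
transfers = [("station", "bike", "train", 3), ("station", "walk", "train", 1)]
supernet = build_supernetwork(mode_networks, transfers)
trip = shortest_path(supernet, ("bike", "home"), ("train", "work"))
```

The node count grows with the number of modes, which is the size problem that super-network models run into.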
-
Application issues and limits: Super Networks
Size of network model is untenable
Simplify with user input selecting parts of the network?
- Place-based differences in actual travel options and patterns
- Areas w/o public transit
-
Another case example: Routing ‘toxic trucks’
How do we model risks along a road network so we can route vehicles on the least risky path?
Involves multi-criteria modeling
- Assessing ‘risk’ is tricky, esp in GIS
- How much is too much? Costs?
- Does risk depend on toxicity, likelihood of event, # of people affected, something else?
-
Assessing risk along the network
- Created measures for these variables
- Population density, traffic flow, traffic speed, emergency response time, sparse population…
Each variable given an ordinal rank
Scores are weighted to calculate composite risk; risk score assigned to each segment
Routing algorithm uses criteria scores to determine optimal route
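A toy version of the scoring steps, assuming three variables ranked 1 (low risk) to 5 (high risk) and made-up weights. The segment ranks, weights, and candidate routes are hypothetical, and comparing two pre-defined routes stands in for a full routing algorithm.

```python
# Hypothetical weights; they sum to 1 so the composite stays on the 1-5 scale.
WEIGHTS = {"pop_density": 0.4, "traffic_flow": 0.3, "response_time": 0.3}

def composite_risk(ranks, weights=WEIGHTS):
    """Weighted sum of ordinal ranks for one road segment."""
    return sum(weights[name] * ranks[name] for name in weights)

# Hypothetical ordinal ranks per segment.
segments = {
    "s1": {"pop_density": 5, "traffic_flow": 4, "response_time": 2},
    "s2": {"pop_density": 1, "traffic_flow": 2, "response_time": 3},
    "s3": {"pop_density": 2, "traffic_flow": 1, "response_time": 1},
}
# Hypothetical candidate routes, each a list of segment IDs.
routes = {"downtown": ["s1"], "bypass": ["s2", "s3"]}

def route_risk(segment_ids):
    """Total composite risk along a route."""
    return sum(composite_risk(segments[s]) for s in segment_ids)

def least_risky(routes):
    """Name of the candidate route with the lowest total risk."""
    return min(routes, key=lambda name: route_risk(routes[name]))

best_route = least_risky(routes)
```

In a network model the composite score would act as the link impedance, so a least-cost search returns the least risky path rather than the shortest one.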
-
Assessing impacts/harm along the network
Buffer the ‘at risk’ road segments
Identify vulnerable facilities w/in the buffers (kids, elderly, high pop density facilities)
Clip / interpolate socio-economic data using the buffers
-
Example 3: Evacuation planning
Networks, as part of mathematical and spatial models.
- Challenges:
- Real-time data? [sensors]
- Uneven info access / unpredictable behavior
- Damage to network
- Multi-institutional data collection and sharing
-
Related applications: Can we deliver information to people based on location?
Location-based services
GPS-enabled devices (mostly cell phones, also cars)
- Where is the device in the network, where is it relative to other locations in the network?
- Location-based services
-
All LBS need to be able to model movements / locations in space
- Most use street networks as a basis:
- Where is person or object X in relation to street network? How near or far?
- What is the shortest route between two places on this network? Fastest?
- How many people/objects can move along this network under certain conditions?