Kenneth M. Cruikshank
Powershell Script to Process XML/JSON files

Kenneth M. Cruikshank
Kinect Projects

Many times there are Internet data files that are updated regularly that you may want some information from. These files are often in XML or JSON formats. Here I show a simple Powershell script to get an XML or JSON file, parse the file and display what we are interested in. The illustration works with USGS earthquake information, and one with WaterML

Step 1: Know your file formats

JSON and XML are 'popular' text file formats for data records. That is, they contain multiple sets of related information. An earthquake, for example, has a location, time, depth, and other pieces of information. Each earthquake record contains of the same set of data with different values. A simple layout of the data is efficient to use. One could use a spreadsheet like file format (a comma or tab delimited file), but there the data can be out of context. XML and JSON help keep data organized and in context. Google XML or JSON to see more about the file formats in general.

The structure of the XML or JSON file is determined by whom put it together. You can work out the format on the fly (I often do), but it is wide to look at the data file in an appropriate editor, I often use my IDE (Visual Studio) for this. Some formats we are interested in are:

Step 2: Know where the file is

We want to get a file in a specific format (here JSON or XML, and so we need first the file format documentation, and secondly the URL (Uniform Resource Location) of the file. The URL here is a web address.

Step 3: Get the file as a .NET object

Below is an example of getting a JSON and XML format data files for the last 4 days of earthquake data. The USGS has depreciated the QuakeML format, so you may only be able to get the GeoJson format file...

# Get the raw feed (JSON format), and convert to Powershell object
$feed=(new-object system.net.webclient).downloadstring("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson") 
$data = $feed -join "`n" | ConvertFrom-Json 
# Get the raw feed (XML format), and convert to Powershell object
$feed=[xml](new-object system.net.webclient).downloadstring("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.quakeml")          

Step 4: Work with data in the file

Now that your have a $feed variable containing all the file information, you can use a script window to examine the contents of the file. It can be a good way to learn the structure of the JSON or XML file.You can think of $feed as being a databse table that you can query to get selected data).

XML has well defined time conversions, the JSON time format is a little awkward (I have seen a fair amount of grumbling on the net about that. Anyway, once the file, XML or JSON is in the $feed variable, we can extract and print what we want

The entire scripts are below. For the JSON example, I have added some comments on other fields you can display.

We simply display it here. The material could be moved into a different file format, or even into a database. If you are transforming XML files, you should see my example of using XSLT with Powershell to transform an XML file.

JSON

clear
# =========================================================================================
# Get the raw feed (JSON format), and convert to Powershell object
$feed=(new-object system.net.webclient).downloadstring("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson") 
$data = $feed -join "`n" | ConvertFrom-Json

# =========================================================================================
# Time/Date Conversion Factors
# JSON date is number of milliseconds from January 1, 197O UTC
# .NET uses 100 nanosecond clock ticks, since the year 0 UTC
#
# Compute the number of clock ticks until the start of the the JSON time
$my1970 = Date("1970/01/01 00:00:00")
$offsetTicks = $my1970.Ticks
# We also need to re-scale the JSON ticks to .NET ticks
$JSONtoNETticks = 10000

# =========================================================================================
# Header Information
" "
$myLine = $data.metadata.title
$generated = Date($data.metadata.generated * $JSONtoNETticks + $offsetTicks)
$myLine += " (Updated: {0} UTC)" -f $generated
$myLine

# =========================================================================================
# Repeating data -- the earthquakes ...
$data.features | ForEach-Object { 
$eventTime = Date($_.properties.time * $JSONtoNETticks + $offsetTicks)
$myLine =  "{0,3} {1,3} {2,20:yyyy/MM/dd hh:mm:ss} UTC -- {3,-50} "         -f $_.properties.mag, $_.properties.magType, $eventTime, $_.properties.place
$myLine += "({0,7:N2}°E {1,7:N2}°N {2,8:N2} km Depth)" -f $_.geometry.coordinates
$myLine
}
# Wait for user input
" "
" "
$wait = read-host -prompt "Press any key to exit"

# =========================================================================================
# Used parameters
#$data | Get-Member
# Non - repeating data
#    $data.metadata
#    $data.bbox
#    $_.id, 
#    $_.type
#    $_.geometry.type
#    $_.properties.types       # List of main data fields in .properties
#    $_.properties.updated
#    $_.properties.tz
#    $_.properties.url
#    $_.properties.detail
#    $_.properties.felt
#    $_.properties.cdi
#    $_.properties.mmi
#    $_.properties.alert
#    $_.properties.status
#    $_.properties.tsunami
#    $_.properties.sig
#    $_.properties.net
#    $_.properties.code
#    $_.properties.ids
#    $_.properties.sources
#    $_.properties.nst
#    $_.properties.dmin
#    $_.properties.rms
#    $_.properties.gap
#    $_.properties.type
#    $_.properties.title
             

 

XML

clear
$m2Km = 0.001;  # Converts metres to kilometers

# =========================================================================================
# Get the raw feed (XML format), and convert to Powershell object
$feed=[xml](new-object system.net.webclient).downloadstring("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.quakeml")

# =========================================================================================
# Print out header information 
" "
"4.5 or greater magnitude earthquakes in the last day"

$date = $feed.quakeml.eventParameters.creationInfo.creationTime
	"Updated: {0} ({1} Local time)" -f 
	([System.DateTime]$date).ToUniversalTime().ToString("R"), 
	([System.DateTime]$date).ToString("g")
	" "

# =========================================================================================
# Make a list of earthquake information
$day = [System.Int16]0
$feed.quakeml.eventParameters.event | ForEach-Object { 
	$date = Date($_.origin.time.value)
	$mag = $_.magnitude.mag.value
	$magType = $_.magnitude.type 
	"{0,20:yyyy/MM/dd hh:mm:ss} UTC -- {1,7:N2}°E {2,7:N2}°N {3,8:N2} km Depth -- Magnitude: {4} {5}" -f 
	$date, #([System.DateTime]$date).ToUniversalTime().ToString("R"), 
	[float]$_.origin.longitude.value, 
	[float]$_.origin.latitude.value,
	(([float]$_.origin.depth.value) * $m2Km),
	$mag,
	$magType
}

# =========================================================================================
# Wait for user input
" "
" "
$wait = read-host -prompt "Press any key to exit"


 

 

Geology Department
http://www.pdx.edu/geology

Copyright © 1994-2015 · K.M. Cruikshank ·
http://geomechanics.research.pdx.edu