common_recipes.md

March 27, 2019 ยท View on GitHub

  1. Setup phony targets

    .PHONY: all clean 
    
    all: $(GENERATED_FILES) 
    
    clean: 
        rm -Rf finished/*
    
  2. Downloading a zip directory

    parcels.zip: 
        wget --no-use-server-timestamps \ 
        http://maps.indiana.edu/download/Reference/Land_Parcels_County_IDHS.zip -O $@
    

Notice the use of --no-use-server-timestamps. If you didn't use this argument, this file would have a last-touched timestamp of the file on the server. By using this argument, the file will have a timestamp of when it was downloaded.

  1. Unzipping a zip directory

    .INTERMEDIATE: chicomm.shp
    chicomm.shp: chicomm.zip 
        unzip -o $<
    
  2. Converting excel to csv

    .INTERMEDIATE: parcel_survey.csv
    parcel_survey.csv: parcel_survey.xlsx 
        in2csv $< > $@
    
  3. Grabbing select columns from an excel doc, & creating a csv with a new header

    school_id_lookup.csv: School_data_8-3-14.xlsx 
        in2csv $< |\ 
        csvcut -c "1,2" |\ 
        (echo "school_id,school_name"; tail -n +2) > finished/$(notdir $@)
    
  4. Join csvs, using an implicit rule

    %hourly.joined.csv: %hourly.csv stations.csv 
        csvjoin -c "3,4" $< $(word 2,$^) > finished/$(notdir $@)
    
  5. Substitute many versions of the same thing, such as a different URLs for each year of an annual report, into a common recipe

    BASE_URL=www.mydatais.cool
    URL_2010=$(BASE_URL)/2010/summary.csv
    URL_2011=$(BASE_URL)/2011/summery.csv
    URL_2012=$(BASE_URL)/2012/data-summary.csv
    
    YEARS=2010 2011 2012
    
    COOL_DATA=$(patsubst %, data_%.csv, $(YEARS))
    
    
    data_%.csv : 
        wget --no-use-server-timestamps -O $@ $(URL_$*)
    

Call make $cool_data or set $(COOL_DATA) as a dependency of another target to automagically run your pattern recipe for all defined URLs.

This pattern is also useful when grabbing data from large datasets where column names change over time.

Read more on patsubst in the Make docs.