Configuring the API service

August 19, 2024

The Raw Data API can be set up using either of two configuration options. Choose whichever is more convenient for you.

  • config.txt : Follow config.txt.sample in the project root and the documentation below to set your configuration.
  • .env : Alternatively, use OS environment variables. Export your variables (they have the same names as the config options, without the section blocks) and the API will pick them up automatically.
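The two options can be pictured as a single lookup. The following is a minimal sketch, not the API's actual loader: the helper name `get_setting` is hypothetical, and the precedence (environment variable first, then config.txt, then a default) is an assumption made for illustration.

```python
import os
from configparser import ConfigParser

# Hypothetical sample; a real deployment would read config.txt from the project root.
SAMPLE_CONFIG = """
[DB]
PGHOST=localhost
PGPORT=5432
"""


def get_setting(config, section, key, default=None, environ=os.environ):
    """Return the environment value if set, else the config.txt value, else the default.

    The precedence (environment first) is an assumption made for this sketch.
    """
    if key in environ:
        return environ[key]
    return config.get(section, key, fallback=default)


config = ConfigParser()
config.read_string(SAMPLE_CONFIG)

# The environment variable has the same name as the config option, without the [DB] block.
print(get_setting(config, "DB", "PGHOST", environ={"PGHOST": "db.internal"}))  # db.internal
print(get_setting(config, "DB", "PGPORT", environ={}))                         # 5432
print(get_setting(config, "DB", "PGUSER", default="postgres", environ={}))     # postgres
```

Passing `environ` explicitly keeps the sketch easy to try without touching the real process environment.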

What do you need to start?

The default configuration file is an ini-style text file named config.txt in the project root.

Users Table

The users table is defined in backend/sql/users.sql. Make sure you have created it before moving forward:

psql -a -f backend/sql/users.sql

Then add your admin's OSM ID with the admin role:

INSERT INTO users (osm_id, role) VALUES (1234, 1);

Sections

The following sections are recognised.

  • [DB] - For database connection information. Required.
  • [OAUTH] - For connecting to OpenStreetMap using an OAuth2 app. Required.
  • [CELERY] - For task queues on Redis. Required.
  • [API_CONFIG] - API service related configuration. Required.
  • [EXPORT_UPLOAD] - For external file hosts like S3. Optional.
  • [SENTRY] - Sentry monitoring configuration. Optional.
  • [HDX] - HDX Exports related configuration. Optional.

The following are the different configuration options that are accepted.

| Config option | ENVVAR | Section | Defaults | Description | Required? |
|---|---|---|---|---|---|
| PGHOST | PGHOST | [DB] | none | PostgreSQL hostname or IP | REQUIRED |
| PGPORT | PGPORT | [DB] | 5432 | PostgreSQL connection port | OPTIONAL |
| PGUSER | PGUSER | [DB] | none | PostgreSQL user/role | REQUIRED |
| PGPASSWORD | PGPASSWORD | [DB] | none | PostgreSQL user/role password | REQUIRED |
| PGDATABASE | PGDATABASE | [DB] | none | PostgreSQL database name | REQUIRED |
| OSM_CLIENT_ID | OSM_CLIENT_ID | [OAUTH] | none | Client ID of OSM OAuth2 application | REQUIRED |
| OSM_CLIENT_SECRET | OSM_CLIENT_SECRET | [OAUTH] | none | Client Secret of OSM OAuth2 application | REQUIRED |
| OSM_PERMISSION_SCOPE | OSM_PERMISSION_SCOPE | [OAUTH] | read_prefs | OSM access permission for OAuth2 application | OPTIONAL |
| LOGIN_REDIRECT_URI | LOGIN_REDIRECT_URI | [OAUTH] | none | Redirect URL set in the OAuth2 application | REQUIRED |
| APP_SECRET_KEY | APP_SECRET_KEY | [OAUTH] | none | High-entropy string generated for the application | REQUIRED |
| OSM_URL | OSM_URL | [OAUTH] | https://www.openstreetmap.org | OSM instance base URL | OPTIONAL |
| LOG_LEVEL | LOG_LEVEL | [API_CONFIG] | debug | Application log level; info, debug, warning, error | OPTIONAL |
| RATE_LIMITER_STORAGE_URI | RATE_LIMITER_STORAGE_URI | [API_CONFIG] | redis://redis:6379 | Redis connection string for rate-limiter data | OPTIONAL |
| RATE_LIMIT_PER_MIN | RATE_LIMIT_PER_MIN | [API_CONFIG] | 5 | Number of requests per minute before being rate limited | OPTIONAL |
| EXPORT_PATH | EXPORT_PATH | [API_CONFIG] | exports | Local path to store exports | OPTIONAL |
| EXPORT_MAX_AREA_SQKM | EXPORT_MAX_AREA_SQKM | [API_CONFIG] | 100000 | Maximum area in sq. km supported for rawdata input | OPTIONAL |
| USE_CONNECTION_POOLING | USE_CONNECTION_POOLING | [API_CONFIG] | false | Enable psycopg2 connection pooling | OPTIONAL |
| ALLOW_BIND_ZIP_FILTER | ALLOW_BIND_ZIP_FILTER | [API_CONFIG] | true | Enable zip compression for exports | OPTIONAL |
| EXTRA_README_TXT | EXTRA_README_TXT | [API_CONFIG] | (empty) | Append extra string to export readme.txt | OPTIONAL |
| ENABLE_TILES | ENABLE_TILES | [API_CONFIG] | false | Enable tile output (PMTiles and MBTiles) | OPTIONAL |
| ENABLE_SOZIP | ENABLE_SOZIP | [API_CONFIG] | false | Enables SOZip compression | OPTIONAL |
| DEFAULT_QUEUE_NAME | DEFAULT_QUEUE_NAME | [API_CONFIG] | raw_daemon | Option to define default queue name | OPTIONAL |
| ONDEMAND_QUEUE_NAME | ONDEMAND_QUEUE_NAME | [API_CONFIG] | raw_ondemand | Option to define daemon queue name for scheduled and long exports | OPTIONAL |
| ENABLE_POLYGON_STATISTICS_ENDPOINTS | ENABLE_POLYGON_STATISTICS_ENDPOINTS | [API_CONFIG] | False | Enables the polygon statistics endpoints, which report approximate buildings and road length in the passed polygon | OPTIONAL |
| ENABLE_CUSTOM_EXPORTS | ENABLE_CUSTOM_EXPORTS | [API_CONFIG] | False | Enables custom exports endpoint and imports | OPTIONAL |
| POLYGON_STATISTICS_API_URL | POLYGON_STATISTICS_API_URL | [API_CONFIG] | None | API URL used to fetch the polygon statistics metadata; currently tested with the GraphQL query endpoint of Kontur. Only required if ENABLE_POLYGON_STATISTICS_ENDPOINTS is enabled | OPTIONAL |
| POLYGON_STATISTICS_API_RATE_LIMIT | POLYGON_STATISTICS_API_RATE_LIMIT | [API_CONFIG] | 5 | Rate limit per minute applied to the statistics endpoint; defaults to 5 requests per minute | OPTIONAL |
| WORKER_PREFETCH_MULTIPLIER | WORKER_PREFETCH_MULTIPLIER | [CELERY] | 1 | Number of tasks a worker can prefetch at a time | OPTIONAL |
| DEFAULT_SOFT_TASK_LIMIT | DEFAULT_SOFT_TASK_LIMIT | [API_CONFIG] | 7200 | Soft task time limit for Celery workers, in seconds. It gently reminds Celery to finish up the task and terminate. Defaults to 2 hours | OPTIONAL |
| DEFAULT_HARD_TASK_LIMIT | DEFAULT_HARD_TASK_LIMIT | [API_CONFIG] | 10800 | Hard task time limit for Celery workers, in seconds. It immediately kills the Celery task. Defaults to 3 hours | OPTIONAL |
| USE_DUCK_DB_FOR_CUSTOM_EXPORTS | USE_DUCK_DB_FOR_CUSTOM_EXPORTS | [API_CONFIG] | False | Enable this setting to use DuckDB; by default DuckDB is disabled and PostgreSQL is used | OPTIONAL |
| CELERY_BROKER_URL | CELERY_BROKER_URL | [CELERY] | redis://localhost:6379/0 | Redis connection string for the broker | OPTIONAL |
| CELERY_RESULT_BACKEND | CELERY_RESULT_BACKEND | [CELERY] | redis://localhost:6379/0 | Redis/PostgreSQL connection string for the result backend, e.g. db+postgresql://username:password@localhost:5432/db_name | OPTIONAL |
| FILE_UPLOAD_METHOD | FILE_UPLOAD_METHOD | [EXPORT_UPLOAD] | disk | File upload method; allowed values: disk, s3 | OPTIONAL |
| BUCKET_NAME | BUCKET_NAME | [EXPORT_UPLOAD] | none | AWS S3 bucket name | CONDITIONAL |
| AWS_ACCESS_KEY_ID | AWS_ACCESS_KEY_ID | [EXPORT_UPLOAD] | none | AWS Access Key ID for S3 access | CONDITIONAL |
| AWS_SECRET_ACCESS_KEY | AWS_SECRET_ACCESS_KEY | [EXPORT_UPLOAD] | none | AWS Secret Access Key for S3 access | CONDITIONAL |
| SENTRY_DSN | SENTRY_DSN | [SENTRY] | none | Sentry Data Source Name | OPTIONAL |
| SENTRY_RATE | SENTRY_RATE | [SENTRY] | 1.0 | Sample rate for shipping errors to Sentry; allowed values between 0 (0%) and 1 (100%) | OPTIONAL |
| ENABLE_HDX_EXPORTS | ENABLE_HDX_EXPORTS | [HDX] | False | Enables HDX-related endpoints and imports | OPTIONAL |
| ENABLE_METRICS_APIS | ENABLE_METRICS_APIS | [API_CONFIG] | False | Enables download-metrics endpoints; requires a separate setup of the metrics populator | OPTIONAL |
| HDX_SITE | HDX_SITE | [HDX] | 'demo' | HDX site to point to; demo site by default, use prod for production | CONDITIONAL |
| HDX_API_KEY | HDX_API_KEY | [HDX] | None | Your API secret key for HDX upload; it must have write access and is compulsory if ENABLE_HDX_EXPORTS is True | CONDITIONAL |
| HDX_OWNER_ORG | HDX_OWNER_ORG | [HDX] | None | Your HDX organization ID | CONDITIONAL |
| HDX_MAINTAINER | HDX_MAINTAINER | [HDX] | None | Your HDX maintainer ID | CONDITIONAL |
| DUCK_DB_MEMORY_LIMIT | DUCK_DB_MEMORY_LIMIT | [API_CONFIG] | None | DuckDB maximum memory limit; 80% of your RAM, e.g. '5GB' | CONDITIONAL |
| DUCK_DB_THREAD_LIMIT | DUCK_DB_THREAD_LIMIT | [API_CONFIG] | None | DuckDB maximum thread limit; number of your cores, e.g. 2 | CONDITIONAL |
| HDX_SOFT_TASK_LIMIT | HDX_SOFT_TASK_LIMIT | [HDX] | 18000 | Soft task time limit for Celery workers, in seconds. It gently reminds Celery to finish up the task and terminate. Defaults to 5 hours | OPTIONAL |
| HDX_HARD_TASK_LIMIT | HDX_HARD_TASK_LIMIT | [HDX] | 21600 | Hard task time limit for Celery workers, in seconds. It immediately kills the Celery task. Defaults to 6 hours | OPTIONAL |
| PROCESS_SINGLE_CATEGORY_IN_POSTGRES | PROCESS_SINGLE_CATEGORY_IN_POSTGRES | [HDX] | False | Recommended for workers with low memory or CPU; processes single-category requests (e.g. buildings only, roads only) in PostgreSQL itself and avoids extraction from DuckDB | OPTIONAL |
| PARALLEL_PROCESSING_CATEGORIES | PARALLEL_PROCESSING_CATEGORIES | [HDX] | True | Enables parallel processing for multiple categories and export formats; disable this if you have a single CPU and limited RAM. Enabled by default | OPTIONAL |
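Options marked CONDITIONAL only matter once the related feature is switched on. The sketch below shows one way to check them before starting a service. The helper name `missing_conditional` is hypothetical. That the S3 options are needed only when FILE_UPLOAD_METHOD is s3 is an assumption, as is treating HDX_OWNER_ORG and HDX_MAINTAINER like HDX_API_KEY.

```python
def missing_conditional(settings):
    """List CONDITIONAL options that are unset although the feature needing them is on.

    `settings` is a plain dict of option name -> string value.
    """
    missing = []

    # Assumption: the S3 options are needed only when FILE_UPLOAD_METHOD is "s3".
    if settings.get("FILE_UPLOAD_METHOD", "disk").lower() == "s3":
        for key in ("BUCKET_NAME", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
            if not settings.get(key):
                missing.append(key)

    # The table states HDX_API_KEY is compulsory when ENABLE_HDX_EXPORTS is True;
    # treating the owner org and maintainer the same way is an assumption.
    if settings.get("ENABLE_HDX_EXPORTS", "False").lower() == "true":
        for key in ("HDX_API_KEY", "HDX_OWNER_ORG", "HDX_MAINTAINER"):
            if not settings.get(key):
                missing.append(key)

    return missing


print(missing_conditional({"FILE_UPLOAD_METHOD": "disk"}))  # []
print(missing_conditional({"FILE_UPLOAD_METHOD": "s3", "BUCKET_NAME": "my-bucket"}))
# ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']
```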

Note : HDX_API_KEY

To generate an HDX_API_KEY, you need to be logged in to https://data.humdata.org/. Navigate as follows to generate a token:

  • Your profile section > User settings > API Tokens

API tokens have an expiry date. It is important to update the API token manually each year for a hosted API service!

Which Service uses which settings?

| Parameter | Config Section | API | Worker |
|---|---|---|---|
| PGHOST | [DB] | Yes | Yes |
| PGPORT | [DB] | Yes | Yes |
| PGUSER | [DB] | Yes | Yes |
| PGPASSWORD | [DB] | Yes | Yes |
| PGDATABASE | [DB] | Yes | Yes |
| OSM_CLIENT_ID | [OAUTH] | Yes | No |
| OSM_CLIENT_SECRET | [OAUTH] | Yes | No |
| OSM_PERMISSION_SCOPE | [OAUTH] | Yes | No |
| LOGIN_REDIRECT_URI | [OAUTH] | Yes | No |
| APP_SECRET_KEY | [OAUTH] | Yes | No |
| OSM_URL | [OAUTH] | Yes | No |
| LOG_LEVEL | [API_CONFIG] | Yes | Yes |
| RATE_LIMITER_STORAGE_URI | [API_CONFIG] | Yes | No |
| RATE_LIMIT_PER_MIN | [API_CONFIG] | Yes | No |
| EXPORT_PATH | [API_CONFIG] | Yes (Not needed for upload_s3) | Yes |
| EXPORT_MAX_AREA_SQKM | [API_CONFIG] | Yes | No |
| USE_CONNECTION_POOLING | [API_CONFIG] | Yes | Yes |
| ENABLE_TILES | [API_CONFIG] | Yes | Yes |
| ENABLE_SOZIP | [API_CONFIG] | Yes | Yes |
| ALLOW_BIND_ZIP_FILTER | [API_CONFIG] | Yes | Yes |
| EXTRA_README_TXT | [API_CONFIG] | No | Yes |
| INDEX_THRESHOLD | [API_CONFIG] | No | Yes |
| MAX_WORKERS | [API_CONFIG] | No | Yes |
| DEFAULT_QUEUE_NAME | [API_CONFIG] | Yes | No |
| ONDEMAND_QUEUE_NAME | [API_CONFIG] | Yes | No |
| ENABLE_POLYGON_STATISTICS_ENDPOINTS | [API_CONFIG] | Yes | Yes |
| POLYGON_STATISTICS_API_URL | [API_CONFIG] | Yes | Yes |
| POLYGON_STATISTICS_API_RATE_LIMIT | [API_CONFIG] | Yes | No |
| DEFAULT_SOFT_TASK_LIMIT | [API_CONFIG] | No | Yes |
| DEFAULT_HARD_TASK_LIMIT | [API_CONFIG] | No | Yes |
| USE_DUCK_DB_FOR_CUSTOM_EXPORTS | [API_CONFIG] | Yes | Yes |
| DUCK_DB_MEMORY_LIMIT | [API_CONFIG] | Yes | Yes |
| DUCK_DB_THREAD_LIMIT | [API_CONFIG] | Yes | Yes |
| ENABLE_CUSTOM_EXPORTS | [API_CONFIG] | Yes | Yes |
| ENABLE_METRICS_APIS | [API_CONFIG] | Yes | No |
| CELERY_BROKER_URL | [CELERY] | Yes | Yes |
| CELERY_RESULT_BACKEND | [CELERY] | Yes | Yes |
| WORKER_PREFETCH_MULTIPLIER | [CELERY] | Yes | Yes |
| FILE_UPLOAD_METHOD | [EXPORT_UPLOAD] | Yes | Yes |
| BUCKET_NAME | [EXPORT_UPLOAD] | Yes | Yes |
| AWS_ACCESS_KEY_ID | [EXPORT_UPLOAD] | Yes | Yes |
| AWS_SECRET_ACCESS_KEY | [EXPORT_UPLOAD] | Yes | Yes |
| SENTRY_DSN | [SENTRY] | Yes | No |
| SENTRY_RATE | [SENTRY] | Yes | No |
| ENABLE_HDX_EXPORTS | [HDX] | Yes | Yes |
| HDX_SITE | [HDX] | Yes | Yes |
| HDX_API_KEY | [HDX] | Yes | Yes |
| HDX_OWNER_ORG | [HDX] | Yes | Yes |
| HDX_MAINTAINER | [HDX] | Yes | Yes |
| HDX_SOFT_TASK_LIMIT | [HDX] | No | Yes |
| HDX_HARD_TASK_LIMIT | [HDX] | No | Yes |
| PROCESS_SINGLE_CATEGORY_IN_POSTGRES | [HDX] | No | Yes |
| PARALLEL_PROCESSING_CATEGORIES | [HDX] | No | Yes |

Compulsory Configuration

Create config.txt inside the root directory.

It should be in the same place as config.txt.sample.

Prepare your OSM Snapshot Data

Initialize rawdata from here, or create a database named "raw" in your local PostgreSQL and load the sample dump from

/tests/fixtures/pokhara.sql
psql -U postgres -h localhost raw < pokhara.sql

Put your database credentials in the [DB] block:

[DB]
PGHOST=localhost
PGUSER=postgres
PGPASSWORD=admin
PGDATABASE=raw
PGPORT=5432
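To see how the [DB] values fit together, they can be combined into a single PostgreSQL connection URI, which is handy for testing the credentials with a client. This is only an illustrative sketch: the helper name `build_dsn` is hypothetical, and the API itself may connect in a different way.

```python
from configparser import ConfigParser
from urllib.parse import quote

DB_BLOCK = """
[DB]
PGHOST=localhost
PGUSER=postgres
PGPASSWORD=admin
PGDATABASE=raw
PGPORT=5432
"""


def build_dsn(config_text):
    """Build a postgresql:// connection URI from the [DB] block."""
    # interpolation=None so that a "%" in the password is read literally.
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    db = parser["DB"]
    # Percent-encode user and password so characters such as "@" or ":" stay valid in a URI.
    user = quote(db["PGUSER"], safe="")
    password = quote(db["PGPASSWORD"], safe="")
    port = db.get("PGPORT", "5432")  # 5432 is the documented default
    return f"postgresql://{user}:{password}@{db['PGHOST']}:{port}/{db['PGDATABASE']}"


print(build_dsn(DB_BLOCK))  # postgresql://postgres:admin@localhost:5432/raw
```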

Set up OAuth for authentication

Log in to OSM, click on My Settings, and register your local galaxy app under OAuth 2 applications.


Check the "read user preferences" permission and enter the redirect URI as follows:

http://127.0.0.1:8000/v1/auth/callback/

Grab the Client ID and Client Secret and put them inside config.txt in the [OAUTH] block. You can generate the secret key for your application yourself.

[OAUTH]
OSM_CLIENT_ID= your client id
OSM_CLIENT_SECRET= your client secret
OSM_URL=https://www.openstreetmap.org
OSM_PERMISSION_SCOPE=read_prefs
LOGIN_REDIRECT_URI=http://127.0.0.1:8000/v1/auth/callback/
APP_SECRET_KEY=your generated secret key
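One way to generate a high-entropy value for APP_SECRET_KEY is Python's standard `secrets` module. This is a sketch of one reasonable approach, not a format the API requires; the helper name `generate_secret_key` and the choice of 32 random bytes are assumptions.

```python
import secrets


def generate_secret_key(num_bytes=32):
    """Return a URL-safe random string built from `num_bytes` bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


key = generate_secret_key()
# Paste the printed line into the [OAUTH] block of config.txt.
print(f"APP_SECRET_KEY={key}")
```

The output differs on every run, so generate the key once and keep it stable for a given deployment.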

Configure Celery and Redis

The API uses Celery 5 and Redis 6 for task queue management, currently implemented for the Rawdata endpoint. 6379 is the default Redis port. If you are running Redis on the same machine, your broker could be redis://localhost:6379/. You can change the port according to your configuration. For the current Docker Compose setup, use the following:

[CELERY]
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
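The parts of a Redis URL (host, port, database number) can be inspected with the standard library before handing the URL to Celery. The sketch below is illustrative only: the helper name `describe_redis_url` is hypothetical, and it handles only redis:// URLs, not a db+postgresql:// result backend.

```python
from urllib.parse import urlsplit


def describe_redis_url(url):
    """Split a redis:// URL into host, port and database number."""
    parts = urlsplit(url)
    if parts.scheme != "redis":
        raise ValueError(f"expected a redis:// URL, got {url!r}")
    # 6379 is the default Redis port; an empty path is treated as database 0 here.
    db = parts.path.lstrip("/") or "0"
    return {"host": parts.hostname, "port": parts.port or 6379, "db": int(db)}


print(describe_redis_url("redis://redis:6379/0"))
# {'host': 'redis', 'port': 6379, 'db': 0}
print(describe_redis_url("redis://localhost:6379/"))
# {'host': 'localhost', 'port': 6379, 'db': 0}
```

In the Docker Compose case the host is the service name `redis`, while a Redis instance on the same machine is reached through `localhost`.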

Finalizing config.txt

Insert your config blocks with the credentials of the database where you have underpass, insight and rawdata, along with the OAuth block.

Summary of commands:

Assuming you have a PostgreSQL/PostGIS setup with user postgres, host localhost, port 5432, and password admin:

  export PGPASSWORD='admin';
  psql -U postgres -h localhost -p 5432 -c "CREATE DATABASE raw;"

  cd tests/fixtures/
  psql -U postgres -h localhost -p 5432 raw  < pokhara.sql

Your config.txt will look like this

[DB]
PGHOST=localhost
PGUSER=postgres
PGPASSWORD=admin
PGDATABASE=raw
PGPORT=5432

[OAUTH]
OSM_CLIENT_ID= your client id
OSM_CLIENT_SECRET= your client secret
OSM_URL=https://www.openstreetmap.org
OSM_PERMISSION_SCOPE=read_prefs
LOGIN_REDIRECT_URI=http://127.0.0.1:8000/v1/auth/callback/
APP_SECRET_KEY=jnfdsjkfndsjkfnsdkjfnskfn

[API_CONFIG]
LOG_LEVEL=debug
RATE_LIMITER_STORAGE_URI=redis://redis:6379

[CELERY]
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0

Tip: Refer to .github/workflows/unit-test if you have any confusion about how the config file is set up.

Optional Configuration [ You can skip this part for basic installation ]

You can further customize the API with the [API_CONFIG] block:

[API_CONFIG]
# path used to store exports
EXPORT_PATH=exports
# max area in sq. km to support for rawdata input
EXPORT_MAX_AREA_SQKM=100000
# connection pooling is off by default; set to True to use it for psycopg2 connections
USE_CONNECTION_POOLING=True
# options are info, debug, warning, error
LOG_LEVEL=info
# zip export output; by default all output is zipped
ALLOW_BIND_ZIP_FILTER=true
# the API uses Redis as the backend for rate limiting
RATE_LIMITER_STORAGE_URI=redis://localhost:6379
# value in sq. km above which the grid/country index filter is applied
INDEX_THRESHOLD=5000
# number of requests per minute; default is 5
RATE_LIMIT_PER_MIN=5
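Values in an ini file are plain strings, so numeric and boolean options need converting. The sketch below assumes the file is read with Python's standard `configparser`, which may not match how the API loads it. Note that `configparser` keeps a `#` comment that follows a value on the same line as part of that value unless `inline_comment_prefixes` is set, so the sketch sets it explicitly.

```python
from configparser import ConfigParser

API_BLOCK = """
[API_CONFIG]
EXPORT_MAX_AREA_SQKM=100000 # max area to support for rawdata input
USE_CONNECTION_POOLING=True
ALLOW_BIND_ZIP_FILTER=true
RATE_LIMIT_PER_MIN=5
"""

# Without inline_comment_prefixes, the first value would be read as
# "100000 # max area to support for rawdata input" and getint() would fail.
parser = ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(API_BLOCK)
api = parser["API_CONFIG"]

# Fallbacks mirror the documented defaults.
max_area = api.getint("EXPORT_MAX_AREA_SQKM", fallback=100000)
use_pooling = api.getboolean("USE_CONNECTION_POOLING", fallback=False)
zip_exports = api.getboolean("ALLOW_BIND_ZIP_FILTER", fallback=True)
rate_limit = api.getint("RATE_LIMIT_PER_MIN", fallback=5)

print(max_area, use_pooling, zip_exports, rate_limit)  # 100000 True True 5
```

`getboolean` accepts `true`, `True`, `yes`, `on` and `1` (case-insensitively) as true values, which is why both spellings used in this document parse the same way.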

Depending on your requirements, you can also customize the rawdata export parameters using the [EXPORT_UPLOAD] block:

[EXPORT_UPLOAD]
# options are s3, disk; default is disk
FILE_UPLOAD_METHOD=disk
AWS_ACCESS_KEY_ID= your id
AWS_SECRET_ACCESS_KEY= yourkey
BUCKET_NAME= your bucket name

Sentry Config :

[SENTRY]
SENTRY_DSN=
SENTRY_RATE=
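SENTRY_RATE is a fraction between 0 and 1, not a percentage. A small validation step such as the one sketched below can catch a value like 50 entered by mistake. The helper name `parse_sentry_rate` is hypothetical, and falling back to the documented default of 1.0 when the value is left empty is an assumption.

```python
def parse_sentry_rate(raw, default=1.0):
    """Convert SENTRY_RATE to a float and check it lies between 0 and 1 inclusive."""
    if raw is None or raw.strip() == "":
        return default  # documented default: 1.0, i.e. 100% of errors
    rate = float(raw)
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"SENTRY_RATE must be between 0 and 1, got {rate}")
    return rate


print(parse_sentry_rate("0.25"))  # 0.25
print(parse_sentry_rate(""))      # 1.0
```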

You can also export the config variables, without the section blocks, as system environment variables.