README.org

July 19, 2019 · View on GitHub

Confluence to ElasticSearch

A collection of business processes used to mine Confluence, discover information, intelligently index it and expose all of it as a lightning fast search.

*** Prerequisites

Git (of course).
JRE 8 must exist in PATH (JRE 7 is fine too, but might be deprecated).
Boot 2.x: installation instructions are avaliable [[https://github.com/boot-clj/boot#install][here]].

*** Configuration

There are 3 files: a standard ~~resources/log4j.properties~~ and SeazMe specific ~~config.edn~~ as well as ~~mapping.edn~~ which need to be configured properly.

*** Usage

After making sure that software from Prerequisites is installed, clone this repo and bring up REPL with ~~boot repl~~ command (it might take a few minutes to download all dependencies when run for the first time).

Examples: #+BEGIN_EXAMPLE export BOOT_JVM_OPTIONS="-Xmx12g -XX:+UseSerialGC"

./ -a reinit -c context-demo-twiki -d elasticsearch-prod ./ -a scan -c context-demo-twiki -d elasticsearch-prod -s twiki-demo-prod ./ -a scan -c context-demo-confluence -d confluence-demo-cache -s confluence-demo-prod ./ -a scan -c context-demo-confluence -d elasticsearch-prod -s confluence-demo-cache

./ -a scan -c context-demo-twiki -d datahub-prod -s twiki-demo-prod ./ -a scan -c context-demo-confluence -d datahub-prod -s confluence-demo-prod

./ -a update -c context-demo-confluence -d datahub-prod -s confluence-demo-prod

./ -a update -c context-hbase -d elasticsearch-prod ./ -a update -c context-hbase

./ -a patch -c context-demo-jira -d datahub-prod -s jira-demo-prod -p "key in (DATAHUB-1,DATAHUB-2)"

#+END_EXAMPLE

Where ~~executable~~ is either ~~sources~~ or ~java -jar ~~sources-.jar~~ build with ~~build~~. Add ~~-Dlog4j.configuration=file:resources/log4j.properties~~ to java if logging is desired.

Once above is completed (should take a few minutes), a subdirectory db is created. It contains both copy of Confluence and other data sources.

*** crontab example

~~runme~~:

#+BEGIN_SRC #!/bin/bash cd ${0%/*} java -Dlog4j.configuration=file:log4j.properties -jar sources.jar -a update -c context-12h #+END_SRC

#+BEGIN_SRC SHELL=/bin/bash PATH=/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin

50 * * * * /usr/bin/flock -n /tmp/sources.lockfile seazme-sources/runme &>> runme.log #+END_SRC