ArtSearch
May 17, 2025
A local search system that uses Elasticsearch to index and retrieve Wikipedia data.
Features
- Multi-language support for Wikipedia data
- Elasticsearch-powered search backend
- CLI interface for index management and queries
- Configurable search parameters
Prerequisites
- Python 3.12
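- Java: not needed by default, because the Elasticsearch 8.x archive bundles its own JDK; Java 17+ if you point ES_JAVA_HOME at your own JDK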
- 30GB+ free disk space (for data storage)
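The prerequisites above can be checked before downloading anything. The following is a minimal sketch, not part of this repository: `check_prerequisites` is a hypothetical helper, and it assumes that "Python 3.12" means 3.12 or newer and that "30GB" is measured in GiB on the drive holding the current directory.

```python
import shutil
import sys


def check_prerequisites(path=".", min_free_gb=30.0, min_python=(3, 12),
                        version=None, free_bytes=None):
    """Return a list of human-readable problems (empty if everything looks fine).

    `version` and `free_bytes` can be passed explicitly for testing; by default
    they are read from the running interpreter and from the disk holding `path`.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if version < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, found {version[0]}.{version[1]}"
        )
    if free_bytes < min_free_gb * 1024 ** 3:
        problems.append(
            f"{min_free_gb:.0f} GiB free disk space expected, found {free_bytes / 1024 ** 3:.1f} GiB"
        )
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current machine; an empty list means both checks passed.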
Installation
1. Download Elasticsearch Engine
# Create data directory
mkdir -p data && cd data
# Download and extract Elasticsearch
wget -O elasticsearch-8.17.3.tar.gz \
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.17.3-linux-x86_64.tar.gz
tar zxvf elasticsearch-8.17.3.tar.gz
rm elasticsearch-8.17.3.tar.gz
cd ..
2. Download Wikipedia Data
Download the Wikipedia dataset for a specific language:
# Default English dataset (November 2023)
modelscope download --dataset wikimedia/wikipedia \
--include '20231101.en/*' \
--local_dir ./data/wikipedia
Example data structure:
{
  "id": "1",
  "url": "https://simple.wikipedia.org/wiki/April",
  "title": "April",
  "text": "April is the fourth month..."
}
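A record with this shape could be loaded into a small typed structure before indexing. The sketch below is illustrative only: `WikiRecord` and `parse_record` are hypothetical names that do not come from this repository, and it assumes each record is available as a JSON object with the four fields shown above (the downloaded files themselves may use a different on-disk format).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiRecord:
    """One Wikipedia article, mirroring the example record (hypothetical helper)."""
    id: str
    url: str
    title: str
    text: str


def parse_record(raw: str) -> WikiRecord:
    """Parse one JSON object and raise ValueError if a field is missing."""
    obj = json.loads(raw)
    missing = [key for key in ("id", "url", "title", "text") if key not in obj]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return WikiRecord(id=str(obj["id"]), url=obj["url"],
                      title=obj["title"], text=obj["text"])


sample = (
    '{"id": "1", "url": "https://simple.wikipedia.org/wiki/April", '
    '"title": "April", "text": "April is the fourth month..."}'
)
record = parse_record(sample)
print(record.title)
```

Validating the fields up front means a malformed record fails at load time rather than partway through an indexing run.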
Usage
Folder Structure
├── data                          # Data folder
│   ├── elasticsearch-8.17.3      # Elasticsearch engine
│   └── wikipedia                 # Wikipedia data folder
│       ├── 20231101.en           # English data
│       ├── 20231101.zh           # Chinese data
│       ...                       # more language data
│
├── es_wiki_build.py              # Script that builds the wiki index
├── es_wiki_test.py               # Unit test for Elasticsearch
├── README.md
├── requirements.txt
└── wiki_searcher.py              # Search client for wiki data
Building Index
# Build index for default language (en)
python es_wiki_build.py
# Build index for specific language (e.g., French)
python es_wiki_build.py --language fr
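For readers curious what an index build involves, the sketch below shows one way records could be turned into bulk-indexing actions. It is not taken from `es_wiki_build.py`: `to_bulk_actions` and the `wiki_<language>` index naming scheme are assumptions made for illustration. The action layout (`_index`, `_id`, `_source`) is the one accepted by the bulk helpers of the official Elasticsearch Python client.

```python
from typing import Dict, Iterable, Iterator


def to_bulk_actions(records: Iterable[Dict[str, str]],
                    language: str = "en") -> Iterator[dict]:
    """Yield one bulk-index action per record.

    The index name `wiki_<language>` is a hypothetical naming scheme,
    not necessarily the one this project uses.
    """
    index_name = f"wiki_{language}"
    for rec in records:
        yield {
            "_index": index_name,
            "_id": rec["id"],  # reuse the article id as the document id
            "_source": {
                "url": rec["url"],
                "title": rec["title"],
                "text": rec["text"],
            },
        }
```

With the official client installed, such a generator could be passed to `elasticsearch.helpers.bulk(client, to_bulk_actions(records, "fr"))`, which sends the documents in batches instead of one request per article.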
Performing Searches
# Run with the default search settings
python es_wiki_test.py
# Direct query execution
python es_wiki_test.py \
--language en \
--query "Paris 2024 Olympic Games"
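A direct query like the one above might translate into an Elasticsearch request body along the following lines. This is a sketch under stated assumptions, not the query that `es_wiki_test.py` or `wiki_searcher.py` actually sends: `build_query` is a hypothetical helper, the `title` and `text` field names come from the example record, and the `title^2` boost and default result count are arbitrary illustrative choices.

```python
def build_query(query: str, size: int = 5) -> dict:
    """Build a search body that matches `query` against title and text.

    Uses the Elasticsearch `multi_match` query; `title^2` weights title
    matches twice as heavily as body matches (an illustrative choice).
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                "fields": ["title^2", "text"],
            }
        },
    }


body = build_query("Paris 2024 Olympic Games")
print(body["query"]["multi_match"]["fields"])
```

With the 8.x Python client, the two parts of this body could be passed as `client.search(index=..., query=body["query"], size=body["size"])`.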
Configuration
Set the Elasticsearch password through an environment variable:
export ELASTIC_PASSWORD="changeme"
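On the Python side, the password could be read from the environment roughly as follows. This is a minimal sketch rather than the project's actual code: `es_connection_settings` is a hypothetical helper, `ES_HOST` is an invented optional variable, and the `elastic` user name and `https://localhost:9200` address are assumed defaults for a local Elasticsearch 8.x install.

```python
import os


def es_connection_settings(env=None) -> dict:
    """Collect connection settings from environment variables.

    ELASTIC_PASSWORD is required. ES_HOST is a hypothetical optional
    override for the server address.
    """
    env = os.environ if env is None else env
    password = env.get("ELASTIC_PASSWORD")
    if not password:
        raise RuntimeError("ELASTIC_PASSWORD is not set")
    return {
        "hosts": env.get("ES_HOST", "https://localhost:9200"),
        "basic_auth": ("elastic", password),  # assumed built-in superuser
    }
```

The returned dictionary uses keyword names accepted by the 8.x Python client, so it could be unpacked as `Elasticsearch(**es_connection_settings())`. A local install with a self-signed certificate would likely also need a `ca_certs` path or similar TLS setting.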
Development
# Install dependencies
pip install -r requirements.txt
Contributing
We welcome contributions! Please follow these steps:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.