README.rdoc
July 11, 2010 · View on GitHub
= KMeans
Attempting to build a fast, memory efficient K-Means program.
== Install
gem sources -a http://rubygems.org sudo gem install k_means
== How To Use require 'rubygems' require 'k_means'
data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]] kmeans = KMeans.new(data, :centroids => 2) kmeans.inspect # Use kmeans.view to get hold of the un-inspected array => [[3, 4], [0, 1, 2]]
== Custom Centroids require 'rubygems' require 'k_means'
Your custom centroid needs to have #position and #reposition methods
class CustomCentroid attr_accessor :position def initialize(position); @position = position; end def reposition(nodes, centroid_positions); end end
custom_centroids = [] 2.times { custom_centroids << CustomCentroid.new([1,1]) }
data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]] kmeans = KMeans.new(data, :custom_centroids => custom_centroids)
== Distance Measurements
KMeans uses the Distance Measures Gem (http://github.com/reddavis/Distance-Measures) so we get quite a range of distance measurements.
The measurements currently available are:
euclidean_distance
cosine_similarity
jaccard_index
jaccard_distance
binary_jaccard_index
binary_jaccard_distance
tanimoto_coefficient
To specify a particular one to use in the KMeans algorithm, just provide it as an option:
KMeans.new(@data, :distance_measure => :jaccard_index) KMeans.new(@data, :distance_measure => :cosine_similarity) KMeans.new(@data, :distance_measure => :tanimoto_coefficient)
You get the idea...
== Benchmarks
1000 records with 50 dimensions
data = Array.new(1000) {Array.new(50) {rand(10)}} ai4r_data = Ai4r::Data::DataSet.new(:data_items=> data)
Clustering can happen in magical ways
so lets do it over multiple times
n = 5
Benchmark.bm do |x| x.report('KMeans') do n.times { KMeans.new(data) } end x.report("Ai4R") do n.times do b = Ai4r::Clusterers::KMeans.new b.build(ai4r_data, 4) end end end user system total real KMeans 15.960000 0.030000 15.990000 ( 16.062639) Ai4R 70.230000 0.180000 70.410000 ( 70.704843)
== Thanks
- David Richards - For his code reviews and all round helpfulness. - http://github.com/davidrichards
== Copyright
Copyright (c) 2009 Red Davis. See LICENSE for details.