BOW

October 3, 2018

Bags of identifiers generated from the 140,000 most-starred projects on GitHub as of October 2016; about 112,000 repositories remain after deduplication.

Example:

from sourced.ml.models import BOW
bow = BOW().load("1e3da42a-28b6-4b33-94a2-a5671f4102f4")
print("Number of documents:", len(bow))
print("Number of tokens:", len(bow.tokens))
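
For intuition about what a bag of identifiers holds, here is a minimal sketch that splits identifiers into lowercase (sub)tokens and counts them for a single repository. It is an illustration only: the helper names `split_identifier` and `bag_of_identifiers`, the splitting rules, and the sample identifiers are assumptions made for this example, not the extraction pipeline that produced this model.

```python
import re
from collections import Counter


def split_identifier(identifier):
    """Split an identifier into lowercase subtokens.

    Hypothetical helper: splits on non-alphanumeric characters
    (e.g. underscores) and on camelCase boundaries.
    """
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # Keep acronym runs together ("HTTP"), then ordinary words, then digits.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [part.lower() for part in parts if part]


def bag_of_identifiers(identifiers):
    """Count subtoken occurrences over all identifiers of one repository."""
    bag = Counter()
    for identifier in identifiers:
        bag.update(split_identifier(identifier))
    return bag


# Made-up identifiers standing in for one repository's source code.
repo_identifiers = ["readFile", "read_config", "HTTPServer", "file_path"]
bag = bag_of_identifiers(repo_identifiers)
print("Number of distinct subtokens:", len(bag))
print("Most frequent subtokens:", bag.most_common(3))
```

In this sketch each repository corresponds to one document (one bag), and the union of all subtokens across repositories forms the vocabulary.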

References

ID: 1e3da42a-28b6-4b33-94a2-a5671f4102f4
Uploaded: 2017-06-19 09:16:08.942880
Version: 1.0.0
File: https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fbow%2F1e3da42a-28b6-4b33-94a2-a5671f4102f4.asdf
Size: 380.8 MB
Data collection date: October 2016
Number of (sub)tokens: 999,424
Number of repositories: 112,273
License

Dependencies