"Bad" Data Exhibition

August 20, 2024

This document is an exhibition of the so-called "bad" data that we found in diverse datasets while processing them with Data-Juicer. The motivations for this exhibition include:

  • It helps users better understand how each OP in Data-Juicer finds these "bad" data in order to improve the "quality" of datasets.
  • There might be non-negligible differences between datasets, so some OPs work well on some datasets but might be useless on others.
  • No matter how high-quality people consider a dataset to be (e.g. Wikipedia, Books, ...), there are always some "bad" data hidden in it.

Involved OPs

| OP | Datasets |
| --- | --- |
| alphanumeric_filter | LCS-558K, Wikipedia, Books |
| average_line_length_filter | Github Code |
| character_repetition_filter | LCS-558K, Wikipedia, Books, Stack Exchange |
| flagged_words_filter | LCS-558K |
| image_aspect_ratio_filter | LCS-558K, MMC4 |
| image_deduplicator | LCS-558K |
| image_shape_filter | LCS-558K |
| image_size_filter | LCS-558K, MMC4 |
| image_text_matching_filter | LCS-558K, MMC4 |
| language_id_score_filter | Books |
| perplexity_filter | LCS-558K, Books, ArXiv |
| special_characters_filter | Wikipedia, Books |
| text_length_filter | Wikipedia, ArXiv, Github Code |
| words_num_filter | Stack Exchange |
| word_repetition_filter | MMC4, Wikipedia |

  • Everyone in the community is welcome to keep adding examples to this table.

Multimodal Datasets

LCS-558K

The pretraining dataset of LLaVA-1.5


| item | value |
| --- | --- |
| from OP | image_aspect_ratio_filter |
| id | 004436823 |
| aspect_ratio | 36.2857142857 |
| caption | the timber in honey amber |
| image | [image] |
| comments | The image and caption contents are not aligned |

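The aspect-ratio check behind the sample above can be sketched roughly as follows. This is a minimal illustration that assumes the ratio is defined as width divided by height; the function names and the threshold values are hypothetical and are not taken from Data-Juicer.

```python
def aspect_ratio(width: int, height: int) -> float:
    """Return width / height for an image of the given pixel size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width / height


def keep_by_aspect_ratio(width: int, height: int,
                         min_ratio: float = 0.333,
                         max_ratio: float = 3.0) -> bool:
    """Keep the image only if its ratio lies in [min_ratio, max_ratio].

    The default bounds are illustrative values, not Data-Juicer defaults.
    """
    return min_ratio <= aspect_ratio(width, height) <= max_ratio


# A hypothetical 254x7 banner has a ratio of about 36.29.
print(round(aspect_ratio(254, 7), 4))   # 36.2857
print(keep_by_aspect_ratio(254, 7))     # False
print(keep_by_aspect_ratio(640, 480))   # True
```
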

| item | value |
| --- | --- |
| from OP | image_shape_filter |
| id | 005568917 |
| image_height | 5177 |
| caption | the us coast guard's top five most popular aircrafts infographic |
| image | [image] |
| comments | Images with a very large height or width will lose too much information after being processed as model input |

| item | value |
| --- | --- |
| from OP | image_shape_filter |
| id | 003301613 |
| image_width | 3469 |
| caption | color the circle by number pages for children to learn colors |
| image | [image] |
| comments | Images with a very large height or width will lose too much information after being processed as model input |

| item | value |
| --- | --- |
| from OP | image_size_filter |
| id | 002642925 |
| image_size | 2,391 bytes |
| caption | pink gold opal and diamond ring |
| image | [image] |
| comments | Images with a very small file size might be invalid placeholders without meaningful content |

| item | value |
| --- | --- |
| from OP | image_text_matching_filter |
| id | 001365521 |
| image_text_matching_score | 0.0008432278991676867 |
| caption | a black bmw m140i sports hatch from a dealer's garage |
| image | [image] |
| comments | Images with a very low image-text matching score might be invalid placeholders without meaningful content |

| item | value |
| --- | --- |
| from OP | alphanumeric_filter |
| id | 001135426 |
| alnum_ratio | 0.429825 |
| caption | dakin - laptopruck » » » » » » » » » » » » » » » » |
| image | [image] |
| comments | Texts with a very low alnum ratio might contain unexpected, meaningless extra tokens |

| item | value |
| --- | --- |
| from OP | character_repetition_filter |
| id | 004292597 |
| char_rep_ratio | 0.720207 |
| caption | harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden harden |
| image | [image] |
| comments | Texts with a very high character repetition ratio might contain repeated content (captions of LCS-558K are generated by a BLIP model) |

| item | value |
| --- | --- |
| from OP | flagged_words_filter |
| id | 001013268 |
| flagged_words_ratio | 0.263158 |
| caption | porn photo video porn cartoon porn porn pictures online for adult porn |
| image | Won't display this image |
| comments | Texts with a non-zero flagged words ratio might contain NSFW content |

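A flagged-words ratio like the one reported above can be approximated with the sketch below. It assumes plain whitespace tokenization and a hypothetical word list, so it will generally not reproduce the exact value shown in the table, which depends on the tokenizer and word list the OP actually uses.

```python
def flagged_words_ratio(text: str, flagged_words: set[str]) -> float:
    """Fraction of whitespace-separated words that appear in flagged_words."""
    words = text.lower().split()
    if not words:
        return 0.0
    flagged = sum(1 for w in words if w in flagged_words)
    return flagged / len(words)


# Hypothetical word list, for illustration only.
FLAGGED = {"spamword", "badword"}

# One of the six words is flagged, so the ratio is 1/6.
print(flagged_words_ratio("a badword in a short caption", FLAGGED))
print(flagged_words_ratio("a clean caption", FLAGGED))  # 0.0
```
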

| item | value |
| --- | --- |
| from OP | perplexity_filter |
| id | 002606088 |
| perplexity | 19789.9 |
| caption | real white pearl stud earrings sterling 925 925 9250 9210 9240 9210 9280 stud |
| image | [image] |
| comments | Texts with a very high perplexity might contain meaningless content |

| item | value |
| --- | --- |
| from OP | image_deduplicator |
| ids | 004559803, 003716167, 005659131 |
| image | [image] |
| comments | Some images are duplicates stored under different names |

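To illustrate how duplicates stored under different names can be detected, here is a minimal sketch that groups images by a hash of their raw bytes. Exact byte hashing is a stand-in technique chosen for simplicity; the method used by `image_deduplicator` may differ (for example, it may rely on perceptual hashing), and the function name and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_images(images: dict[str, bytes]) -> list[list[str]]:
    """Group image ids whose raw bytes are identical."""
    groups = defaultdict(list)
    for image_id, content in images.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(image_id)
    # Only groups with more than one id are duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Fake byte strings standing in for image files.
images = {
    "a.jpg": b"\x89fake-image-bytes-1",
    "b.jpg": b"\x89fake-image-bytes-2",
    "c.jpg": b"\x89fake-image-bytes-1",
}
print(find_duplicate_images(images))  # [['a.jpg', 'c.jpg']]
```
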

MMC4

| item | value |
| --- | --- |
| from OP | image_aspect_ratio_filter |
| aspect_ratios | [1.6, 10.4651162791] |
| corresponding text | "We found that kahweol acetate and cafestol inhibited growth of cancer cells in mice, but the combination seemed to work synergistically, leading to a significantly slower tumour growth than in untreated mice," said lead author Hiroaki Iwamoto. |
| image | [image] |
| comments | The image and text contents are not aligned. Images with a very large aspect ratio might lose too much information after being processed as model input |

| item | value |
| --- | --- |
| from OP | image_size_filter |
| image_sizes | [453, 198343] |
| corresponding text | If you're in InfoSec, you are well aware of how this flies in the face of security team demographics. |
| image | [image] |
| comments | The image and text contents are not aligned. Images with a very small file size might contain only simple, meaningless content |

| item | value |
| --- | --- |
| from OP | image_size_filter |
| image_sizes | [481, 517, 532, 482] |
| corresponding text | ["Level Up Coin (LUC) is a cryptocurrency token and operates on the Ethereum platform.", "Level Up Coin has a current supply of 1,298,120,000 LUC with 996,923,370 LUC in circulation.", "The last known price of Level Up Coin is 0.000257 USD and is up 23.24% over the last 24 hours.", "More information can be found at https://play2live.io."] |
| image | [image] |
| comments | Images with a very small file size might be QR codes that contain sensitive content |

| item | value |
| --- | --- |
| from OP | image_text_matching_filter |
| image_text_matching_score | [0.0012427607] |
| corresponding text | Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi. |
| image | [image] |
| comments | The image and text contents are not aligned. Some ad images might be mistakenly regarded as part of the sample. |

| item | value |
| --- | --- |
| from OP | word_repetition_filter |
| word_rep_ratio | 0.917219 |

text:
Congratulations on your great designs ladies!
Please feel free to grab out “Top 3” blog badge from the right side bar and display it proudly.
For even more great designs, jump on back and check out all of the great entries to Sketch #169 here.
We love Wednesdays because we get to share with you our Design Team Creations!
Here is what our fantastic design team came up with for Sketch #170.
Congratulations on your great designs ladies!
Please feel free to grab out “Top 3” blog badge from the right side bar and display it proudly.
For even more great designs, jump on back and check out all of the great entries to Sketch #168 here.
We love Wednesdays because we get to share with you our Design Team Creations!
Here is what our fantastic design team came up with for Sketch #169.
Congratulations on your great designs ladies!
Please feel free to grab out “Top 3” blog badge from the right side bar and display it proudly.
For even more great designs, jump on back and check out all of the great entries to Sketch #167 here.
We love Wednesdays because we get to share with you our Design Team Creations!
Here is what our fantastic design team came up with for Sketch #168.
Top 3 for Sketch 166 and Design Team projects for Sketch 167.
Congratulations on your great designs ladies!
Please feel free to grab out “Top 3” blog badge from the right side bar and display it proudly.
For even more great designs, jump on back and check out all of the great entries to Sketch #166 here.
We love Wednesdays because we get to share with you our Design Team Creations!
Here is what our fantastic design team came up with for Sketch #167.
comments: Texts with a very high word repetition ratio might be lists of similar, repeated, but not identical contents

Text-only Datasets

Wikipedia

| item | value |
| --- | --- |
| from OP | alphanumeric_filter |
| wiki page | link |
| alnum_ratio | 0.262965 |

text:
Хаос
|
Уран + Гея Тартар,Ерос,Ереб,Нюкта
| |
Тифон, Питон, Ехидна, Хемера,Ефир
циклопи, хекатонхейри

еринии, Понт, Хипнос, хеспериди,

Мом, Немезида, Ирида, Танатос, Морфей
| | | | ...
|Нерей Форкис Кето
| | ||
|Нереиди |
| /горгони, греи /
|т и т а н и
| | | | | | | | | | | | |
Кронос+Рея Диона Океан +Тетия Япет Мнемозина Криос Хиперион + Тея Койос + Феба Темида
|
| + |
_| + + | || ||
+
| | /океаниди/ | | | | | | | | | |
| | Метида Климена | | Палант Астрей Еос Селена Хелиос Астерия Лето |
| | | |
| | | ||_ | + |
| | | | | | | Нике | | | Хеката | |
| | | Епиметей | Борей Зефир Нот | |
| | | Прометей | | |
| | | Менетей | | |
| | | Атлас | | |
| | | | | | | | |
| | | Мая | | Артемида Аполон |
| | | | |
_ | | |
| | | | | | | Асклепий |
| | | | музи | | |
|
| | | |__ | | | | | |
| | | | | | | Хигия Панацея Подалирий |
| | | Хермес | | | |
| | | | | | | | | | | |
| | | Пан | | | | мойри ори харити Астрея
| | |
| | | |
| | | | | | | |
| | Атина | | | | |
| | | | | | |
| | | | | | | |
| Афродита | | | | | |
| + | | | | | |
| | |
|||||
| | богове ||||||
||+

| | | | | | |
Посейдон | Хера Зевс Деметра Хадес Хестия
| | ||+|| |
| Хефест | | | | | |
| Хеба Арес Илифия | Персефона |
|_| | + |
| | | | |
|
Фобос Деймос Хармония |
| |
Семела |
||
|
Дионис

Древногръцки богове
Родословни дървета
comments: Texts with a very low alnum ratio might contain only structural content, which might be hard to learn from
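As a rough illustration of the statistic behind this sample, the sketch below computes a character-level alphanumeric ratio. It assumes the ratio is the number of alphanumeric characters divided by the total number of characters; the function name is hypothetical, and the real OP may count tokens or normalize the text differently.

```python
def alnum_ratio(text: str) -> float:
    """Fraction of characters for which str.isalnum() is true."""
    if not text:
        return 0.0
    return sum(ch.isalnum() for ch in text) / len(text)


# 3 of the 9 characters are alphanumeric.
print(alnum_ratio("abc | | |"))
# Only structural characters: the ratio is 0.0.
print(alnum_ratio("# #"))
# Non-Latin letters count as alphanumeric, so this prints 1.0.
print(alnum_ratio("Хаос"))
```

Because `str.isalnum()` is Unicode-aware, a low ratio in a sample like the one above comes from the pipes and whitespace of the layout, not from the Cyrillic text itself.
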

| item | value |
| --- | --- |
| from OP | character_repetition_filter |
| wiki page | link |
| char_rep_ratio | 0.818624 |

text:
{| class="wikitable" style="font-size: 95%; float:center"
|-
! Predajnik
! RTS 1
! RTS 2
! Radio Beograd 1
! Radio Beograd 2
! Radio Beograd 3 (202)
! RTV 1
! RTV 2
! RNS 1
! RNS 2
! RNS 3
! Pink TV
!
!
!
!
!
!
|-
| Avala
| align="center" | 6
| align="center" | 22
| align="center" | 95.3
| align="center" | 97.6
| align="center" | 104.0
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | 45
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Deli Jovan
| align="center" | 43
| align="center" | 23
| align="center" | 87.7
| align="center" | 94.9
| align="center" | 98.9
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Kopaonik, Gobelja
| align="center" | 3
| align="center" | 41
| align="center" | 90.9
| align="center" | 93.7
| align="center" | 102.1
| align="center" |
| align="center" | 58
| align="center" | 39
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Jastrebac
| align="center" | 5
| align="center" | 27
| align="center" | 96.9
| align="center" | 89.3
| align="center" | 103.5
| align="center" |
| align="center" | 21
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Crni Vrh, Jagodina
| align="center" | 11
| align="center" | 35
| align="center" | 89.7
| align="center" | 99.3
| align="center" | 101.0
| align="center" |
| align="center" |
| align="center" | 55
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Vršac, Breg
| align="center" | 11
| align="center" | 39
| align="center" | 95.7
| align="center" | 98.1
| align="center" | 103.0
| align="center" |
| align="center" |
| align="center" | 52
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Ovčar
| align="center" | 8
| align="center" | 42
| align="center" | 88.1
| align="center" | 90.1
| align="center" | 101.6
| align="center" | 51
| align="center" | 67
| align="center" | 25
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Subotica
| align="center" | 5
| align="center" | 61
| align="center" | 88.9
| align="center" | 101.1
| align="center" | 98.5
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Tupižnica
| align="center" | 10
| align="center" | 25
| align="center" | 92.5
| align="center" | 96.1
| align="center" | 100.4
| align="center" |
| align="center" | 44
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Hotel Venac
| align="center" | 10
| align="center" | -
| align="center" | -
| align="center" | 96.5
| align="center" | 101.8
| align="center" |
| align="center" |
| align="center" | 12
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Iriški Venac
| align="center" | -
| align="center" | 24
| align="center" | 94.5
| align="center" | -
| align="center" | -
| align="center" | 41
| align="center" | 48
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Novi Beograd, Geneks
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | 59
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Beograd, Kanarevo Brdo
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | 35
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Beograd, Stojčino brdo
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | 52
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Kosmaj
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | /
| align="center" | 62
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Avala
| align="center" |
| align="center" | 98.5
| align="center" | 101.4
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Tupižnica
| align="center" |
| align="center" | 94.1
| align="center" |
| align="center" | 105.3
| align="center" |
| align="center" |
| align="center" |
| align="center" | 34
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Šutenovačko Brdo
| align="center" |
| align="center" | 101.2
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 43
| align="center" | 21
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Kladovo
| align="center" |
| align="center" | 96.5
| align="center" | 102.0
| align="center" | 94.5
| align="center" |
| align="center" |
| align="center" | 53
| align="center" | 35
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Venac
| align="center" |
| align="center" | 97.2
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Jadovnik
| align="center" |
| align="center" | 104.7
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Crni Vrh, Jagodina
| align="center" |
| align="center" | 106.8
| align="center" | 91.1
| align="center" | 102.9
| align="center" |
| align="center" |
| align="center" | 38
| align="center" | 58
| align="center" | 61
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Vlaina
| align="center" |
| align="center" |
| align="center" | 106.5
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Vršac
| align="center" |
| align="center" |
| align="center" | 93.0
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Zrenjanin
| align="center" |
| align="center" |
| align="center" | 88.7
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Maljen
| align="center" |
| align="center" |
| align="center" | 107.2
| align="center" |
| align="center" |
| align="center" |
| align="center" | 32
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Ada
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 46
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Bešenovački Prnjavor
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 61
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Šid
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 60
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Kula, Silos
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 68
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Zrenjanin, Titelski breg
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 61
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Cer
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 52
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Besna Kobila
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 59
| align="center" | 47
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Tornik
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 59
| align="center" |
| align="center" | 63
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Beograd, Kanarevo brdo
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 32
| align="center" | 61
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Užice, Zabučje
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 26
| align="center" | 36
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Raška, Gradac
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 38
| align="center" | 65
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Beograd, Stojčino brdo
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 25
| align="center" | 8
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Priboj, Bić
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 43
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Prijepolje, Gradina
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 42
| align="center" | 51
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Nova Varoš
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 34
| align="center" | 69
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Valjevo, Pećina
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 58
| align="center" | 21
| align="center" | 49
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
| Bajina Bašta
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" | 39
| align="center" | 30
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
comments: Texts with a very high character repetition ratio might contain the same style code repeated for every cell of a table
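The effect seen in this sample can be reproduced with a simplified character n-gram statistic. The sketch below measures the share of n-gram occurrences that belong to n-grams appearing more than once; this is a simplified variant written for illustration, and both the formula and the default `n = 10` are assumptions, not necessarily what `character_repetition_filter` computes.

```python
from collections import Counter


def char_repetition_ratio(text: str, n: int = 10) -> float:
    """Share of character n-gram occurrences that belong to repeated n-grams."""
    ngrams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# The same cell markup repeated many times: every n-gram recurs.
table_like = '| align="center" |\n' * 20
print(char_repetition_ratio(table_like))          # 1.0
# No 10-character window occurs twice here.
print(char_repetition_ratio("abcdefghijklmnop"))  # 0.0
```
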

| item | value |
| --- | --- |
| from OP | special_characters_filter |
| wiki page | link |
| special_char_ratio | 0.861592 |

text:
This page indexes the individual year in politics pages.

20th century

1960s: 1960 - 1961 - 1962 - 1963 - 1964 - 1965 - 1966 - 1967 - 1968 - 1969
1970s: 1970 - 1971 - 1972 - 1973 - 1974 - 1975 - 1976 - 1977 - 1978 - 1979
1980s: 1980 - 1981 - 1982 - 1983 - 1984 - 1985 - 1986 - 1987 - 1988 - 1989
1990s: 1990 - 1991 - 1992 - 1993 - 1994 - 1995 - 1996 - 1997 - 1998 - 1999

21st century
2000s: 2000 - 2001 - 2002 - 2003 - 2004 - 2005 - 2006 - 2007 - 2008 - 2009
2010s: 2010 - 2011 - 2012 - 2013 - 2014 - 2015 - 2016 - 2017 - 2018 - 2019
2020s: 2020 - 2021 - 2022 - 2023

Politics
comments: Texts with too many special characters might be lists of other pages
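A special-character ratio can be sketched as follows. The sketch assumes that "special characters" cover punctuation, digits, and whitespace, which would be consistent with a digit-heavy list of years scoring so high; the exact character set used by the OP is an assumption here, and the names are hypothetical.

```python
import string

# Assumed definition of "special": punctuation, digits, and whitespace.
SPECIAL_CHARS = set(string.punctuation + string.digits + string.whitespace)


def special_char_ratio(text: str) -> float:
    """Fraction of characters that belong to SPECIAL_CHARS."""
    if not text:
        return 0.0
    return sum(ch in SPECIAL_CHARS for ch in text) / len(text)


# Digits, spaces, and hyphens only: every character is "special".
print(special_char_ratio("1960 - 1961 - 1962"))  # 1.0
# Letters only.
print(special_char_ratio("Politics"))            # 0.0
```
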

| item | value |
| --- | --- |
| from OP | text_length_filter |
| wiki page | link |
| text_len | 9 |
| text | Canon law |
| comments | Texts that are too short might come from an empty page |

| item | value |
| --- | --- |
| from OP | word_repetition_filter |
| wiki page | link |
| word_rep_ratio | 0.965517 |

text:


Seasons
1862–63 Barnes F.C. season
1863–64 Barnes F.C. season
1864–65 Barnes F.C. season
1865–66 Barnes F.C. season
1866–67 Barnes F.C. season
1867–68 Barnes F.C. season
1868–69 Barnes F.C. season
1869–70 Barnes F.C. season
1870–71 Barnes F.C. season
1871–72 Barnes F.C. season
1872–73 Barnes F.C. season
1873–74 Barnes F.C. season


Barnes
comments: Texts with a very high word repetition ratio might be lists of related, repeated, but not identical contents
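The word-level counterpart of the repetition statistic can be sketched in the same spirit. The version below splits on whitespace and measures the share of word n-gram occurrences that belong to repeated n-grams; the formula, the tokenization, and the default `n = 3` are simplifying assumptions, so the values will not match the `word_rep_ratio` reported above.

```python
from collections import Counter


def word_repetition_ratio(text: str, n: int = 3) -> float:
    """Share of word n-gram occurrences that belong to repeated n-grams."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Every trigram of this looping text occurs at least twice.
print(word_repetition_ratio("a b c a b c a b c"))       # 1.0
# All trigrams are distinct.
print(word_repetition_ratio("one two three four five"))  # 0.0
```
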

Books

| item | value |
| --- | --- |
| from OP | alphanumeric_filter |
| alnum_ratio | 0 |

text:
#
#

#
#

#
#

#
#

#

comments: Texts with a very low alnum ratio might contain only meaningless tokens

| item | value |
| --- | --- |
| from OP | character_repetition_filter |
| char_rep_ratio | 0.86 |

text:

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|

|
|
|

www.stmartins.com

MacmillanSpecialMarkets@macmillan.com

mollysuberthorpe.com

@mollysuberthorpe
|
|
|

|
|
|
|
|

Preface

CHAPTER ONE

Methods for Effective Practice

Principles of Modern Calligraphy Styles

Letter & Grid Construction

A Guide to Supplies

Help Desk

CHAPTER TWO

Warm-up Exercises & Drills

Pencil Warm-ups

Stroke Drills

CHAPTER THREE

Modern Calligraphy, Letter by Letter

The Building Blocks

Arrow Key

A Modern Basic Alphabet

Modern Variations: Letters, Numbers & Symbols

Modern Ligatures

CHAPTER FOUR

From Letters to Words

One Change at a Time

Word Explorations

CHAPTER FIVE

Five Complete Alphabets

Quip

Nautica

Cream Soda

Dalliance

Blackboard Monoline

CHAPTER SIX

Majuscules & Monograms

Borealis

Chubby Bunny

Vice Versa

CHAPTER SEVEN

Small Layouts

Layout Practice

Writing on Curves & Slants

Envelope Addressing

All-caps Styles

CHAPTER EIGHT

Flourishes & Borders

Flourish Drills

Decorative Flourishes

Illustrative Flourishes

Wreaths & Laurels

APPENDIX I

Grids & Guides

APPENDIX II

Glossary

APPENDIX III

Additional Resources

About the Author

Acknowledgments

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
|
|
|
|
|
|

......
comments: Texts with a very high character repetition ratio might contain lots of repeated content

| item | value |
| --- | --- |
| from OP | perplexity_filter |
| perplexity | 380817.4 |
| text | ISBN 978-1448-18265-7 |
| comments | Texts with a very high perplexity might contain hard-to-understand content (e.g. an ISBN) |

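For readers unfamiliar with the metric, the sketch below shows the generic definition of perplexity as the exponential of the negative mean log-probability per token. The per-token log-probabilities are made-up numbers rather than the output of any model, and the language model and normalization used by `perplexity_filter` are not reproduced here.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities (natural log), not model output.
fluent = [math.log(0.25)] * 4   # each token is fairly predictable
odd = [math.log(1e-5)] * 4      # each token is very surprising

print(perplexity(fluent))  # close to 4
print(perplexity(odd))     # close to 100000
```

A string such as an ISBN consists of tokens a language model rarely sees in sequence, which is why its perplexity is so much higher than that of ordinary prose.
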

| item | value |
| --- | --- |
| from OP | language_id_score_filter |
| lang_score | 0.057 |
| lang | en |

text:
......
vUtyh ¼vatkWl½] eq>s fo'okl gh ugha gksrk fd esjh igyh lUrku] og uUgh cPph ftlds fy, geus mlds tUe ls eghuksa igys brus mRlkg ls I;kjk lk dejk ltk;k Fkk] vkt nqfu;k Hkj esa ?kwe pqdh gS vkSj Q+fuZpj fMt+kbu djus esa viuh txg cuk pqdh gSA geas irk py tkuk pkfg, Fkk fd rqe bl {ks= esa dke djksxh D;ksafd tc rqe cPph gh Fkh rc Hkh LVkbfy'k diM+ksa ds izfr rqEgkjk izse] ckjhd fooj.kksa dks Hkkaius okyh rqEgkjh fuxkg vkSj bl ckr ij tksj nsuk fd rqEgkjs vkl&ikl dh phtsa ,dne oSlh gh gksa tSlh rqe pkgrh gks] gesa nax dj nsrk FkkA bl ds lkFk gh rqe viuh xSj ljdkjh laLFkk ds izfr viuh opuc)rk dks Hkh iwjh rjg fuHkkrh gksA rqEgkjs fnu dk ,d&,d iy O;Lr gksrk gSA dHkh&dHkh eSa rqEgsa Fkdk gqvk ns[krh gwa ysfdu fQj lksprh gwa fd ;g rqEgkjk esgur djus dk le; gSA ;g rks rqEgkjh 'kq:vkr gSA

vkjrh ¼vkVZl~½] esjh nwljh cPph] rqEgsa irk ugha gS fd eSa bl ckr dk fdruk bUrt+kj dj jgh gwa fd rqe tku yks fd dkuwu gh rqEgkjs fy, lgh jkg gSA eSa mEehn djrh gwa fd dkuwu dh fMxzh dh i<+kbZ ds fy, U;w;kWdZ esa fcrk, tkus okys rhu lky rqEgsa ;g le>kus esa l{ke gksaxs fd rqEgkjh eka tks dj jgh gS ml dke esa Hkh vPNkbZ gSA dHkh&dHkh eSa izkFkZuk djrh gwa fd rqe esjs lkFk dke djks ysfdu ;fn rqe dksbZ vkSj dke [kkstks tks rqEgsa dkuwu ls Hkh T+;knk ilUn vk,] rks Hkh eSa rqEgkjs fy, [+kq'k gksÅaxhA

vfnrh ¼vnwcsu½] esjh lcls NksVh cPph] rqEgkjs firk dks ;d+hu gS fd rqe gekjh lcls T+;knk izfrHkk'kkyh csVh gks] ftls ,d ckj ;g irk py tk, fd mldk fny D;k pkgrk gS] rks og viuh cguksa ls cgqr T+;knk vkxs tk,xhA rks rqEgsa gekjh lcls Åaph mEehnkas ij [kjk mrjuk gS!
......
comments: Texts with a very low language score might contain unreadable text

| item | value |
| --- | --- |
| from OP | special_characters_filter |
| special_char_ratio | 0.999 |

text:


## PageList

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.
75.
76.
77.
78.
79.
80.
81.
82.
83.
84.
85.
86.
87.
88.
89.
90.
91.
92.
93.
94.
95.
96.
97.
98.
99.
100.
101.
102.
103.
104.
105.
106.
107.
108.
109.
110.
111.
112.
113.
114.
115.
116.
117.
118.
119.
120.
121.
122.
123.
124.
125.
126.
127.
128.
129.
130.
131.
132.
133.
134.
135.
136.
137.
138.
139.
140.
141.
142.
143.
144.
145.
146.
147.
148.
149.
150.
151.
152.
153.
154.
155.
156.
157.
158.
159.
160.
161.
162.
163.
164.
165.
166.
167.
168.
169.
170.
171.
172.
173.
174.
175.
176.
177.
178.
179.
180.
181.
182.
183.
184.
185.
186.
187.
188.
189.
190.
191.
192.
193.
194.
195.
196.
197.
198.
199.
200.
201.
202.
203.
204.
205.
206.
207.
208.
209.
210.
211.
212.
213.
214.
215.
216.
217.
218.
219.
220.
221.
222.
223.
224.
225.
226.
227.
228.
229.
230.
231.
232.
233.
234.
235.
236.
237.
238.
239.
240.
241.
242.
243.
244.
245.
246.
247.
248.
249.
250.
251.
252.
253.
254.
255.
256.
257.
258.
259.
260.
261.
262.
263.
264.
265.
266.
267.
268.
269.
270.
271.
272.
273.
274.
275.
276.
277.
278.
279.
280.
281.
282.
283.
284.
285.
286.
287.
288.
289.
290.
291.
292.
293.
294.
295.
296.
297.
298.
299.
300.
301.
302.
303.
304.
305.
306.
307.
308.
309.
310.
311.
312.
313.
314.
315.
316.
317.
318.
319.
320.
321.
322.
323.
324.
325.
326.
327.
328.
329.
330.
331.
332.
333.
334.
335.
336.
337.
338.
339.
340.
341.
342.
343.
344.
345.
346.
347.
348.
349.
350.
351.
352.
353.
354.
355.
356.
357.
358.
359.
360.
361.
362.
363.
364.
365.
366.
367.
368.
369.
370.
371.
372.
373.
374.
375.
376.
377.
378.
379.
380.
381.
382.
383.
384.
385.
386.
387.
388.
389.
390.
391.
392.
393.
394.
395.
396.
397.
398.
399.
400.
401.
402.
403.
404.
405.
406.
407.
408.
409.
410.
411.
412.
413.
414.
415.
416.
417.
418.
419.
420.
421.
422.
423.
424.
425.
426.
427.
428.
429.
430.
431.
432.
433.
434.
435.
436.
437.
438.
439.
440.
441.
442.
443.
444.
445.
446.
447.
448.
449.
450.
451.
452.
453.
454.
455.
456.
457.
458.
459.
460.
461.
462.
463.
464.
465.
466.
467.
468.
469.
470.
471.
472.
473.
474.
475.
476.
477.
478.
479.
480.
481.
482.
483.
484.
485.
486.
487.
488.
489.
490.
491.
492.
493.
494.
495.
496.
497.
498.
499.
500.
501.
502.
503.
504.
505.
506.
507.
508.
509.
510.
511.
512.
513.
514.
515.
516.
517.
518.
519.
520.
521.
522.
523.
524.
525.
526.
527.
528.
529.
530.
531.
532.
533.
534.
535.
536.
537.
538.
539.
540.
541.
542.
543.
544.
545.
546.
547.
548.
549.
550.
551.
552.
553.
554.
555.
556.
557.
558.
559.
560.
561.
562.
563.
564.
565.
566.
567.
568.
569.
570.
571.
572.
573.
574.
575.
576.
577.
578.
579.
580.
581.
582.
583.
584.
585.
586.
587.
588.
589.
590.
591.
592.
593.
594.
595.
596.
597.
598.
599.
600.
601.
602.
603.
604.
605.
606.
607.
608.
609.
610.
611.
612.
613.
614.
615.
616.
617.
618.
619.
620.
621.
622.
623.
624.
625.
626.
627.
628.
629.
630.
631.
632.
633.
634.
635.
636.
637.
638.
639.
640.
641.
642.
643.
644.
645.
646.
647.
648.
649.
650.
651.
652.
653.
654.
655.
656.
657.
658.
659.
660.
661.
662.
663.
664.
665.
666.
667.
668.
669.
670.
671.
672.
673.
674.
675.
676.
677.
678.
679.
680.
681.
682.
683.
684.
685.
686.
687.
688.
689.
690.
691.
692.
693.
694.
695.
696.
697.
698.
699.
700.
701.
702.
703.
704.
705.
706.
707.
708.
709.
710.
711.
712.
713.
714.
715.
716.
717.
718.
719.
720.
721.
722.
723.
724.
725.
726.
727.
728.
729.
730.
731.
732.
733.
734.
735.
736.
737.
738.
739.
740.
741.
742.

comments: Texts with too high a special character ratio might contain meaningless content
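
To make the statistic behind this kind of example more concrete, here is a minimal sketch of a special character ratio. It assumes that "special" means any character that is neither alphanumeric nor a plain space; the function names and the `max_ratio` threshold are hypothetical and do not reproduce the actual Data-Juicer implementation.

```python
def special_char_ratio(text: str) -> float:
    """Share of characters that are neither letters/digits nor plain spaces.

    Assumption: this definition of "special" is illustrative only; the real
    OP may rely on a different character set.
    """
    if not text:
        return 0.0
    special = sum(1 for ch in text if not ch.isalnum() and ch != " ")
    return special / len(text)


def keep_by_special_chars(text: str, max_ratio: float = 0.25) -> bool:
    # max_ratio is a hypothetical threshold chosen for this example.
    return special_char_ratio(text) <= max_ratio
```

Under this definition, a text made mostly of punctuation and line breaks scores high and would be dropped, while ordinary prose stays well below the threshold.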

Stack Exchange

| item | value |
|---|---|
| from OP | character_repetition_filter |
| char_rep_ratio | 0.969099481 |

text:
Q: Silent sound data uri? Does anyone know of a way to set a data uri to a valid silent sound? I'd be really curious to see if anything exists like that! Thank you.
A: A 0-second WAVE file:
data:audio/wav;base64,UklGRjIAAABXQVZFZm10IBIAAAABAAEAQB8AAEAfAAABAAgAAABmYWN0BAAAAAAAAABkYXRhAAAAAA==
A: i love data uri s so i did this:
http://doiop.com/silent[DOT]mp3
which points to
http://tinyurl[DOT]com/silentmp3
which points to
data:audio/x-wav;
base64,UklGRooWAABXQVZFZm10IBAAAAABAAEAIlYAAESsAAACABAAZGF0YWYWAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
comments: Texts with too high a character repetition ratio might contain base64-encoded data (here, an audio file)
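
As a rough illustration of why a long base64 payload scores so high, the sketch below computes a simplified character repetition ratio: the share of character n-grams that occur more than once. This is an assumption made for illustration, not the exact formula used by `character_repetition_filter`; the function name and the default `n` are hypothetical.

```python
from collections import Counter


def char_repetition_ratio(text: str, n: int = 10) -> float:
    """Fraction of character n-grams that belong to a repeated n-gram.

    Simplified, illustrative definition; the real OP may weight or select
    n-grams differently.
    """
    if len(text) < n:
        return 0.0
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    # Count every occurrence of any n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)
```

A long run of the same character, such as the padding in an encoded silent audio file, makes nearly every n-gram a repeat, so the ratio approaches 1.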

| item | value |
|---|---|
| from OP | words_num_filter |
| num_words | 2 |
| text | `Q: Append` |
| comments | Texts with too few words might be missing content |
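
A word-count check of this kind can be sketched as follows. It assumes plain whitespace splitting and hypothetical `min_num`/`max_num` bounds; the real OP may tokenize differently.

```python
def num_words(text: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    return len(text.split())


def passes_words_num(text: str, min_num: int = 10, max_num: int = 10000) -> bool:
    # min_num and max_num are hypothetical bounds chosen for this example.
    return min_num <= num_words(text) <= max_num


sample = "Q: Append"
print(num_words(sample))         # 2
print(passes_words_num(sample))  # False
```

With whitespace splitting, the sample above yields 2 words, which agrees with the `num_words` value in the table.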

ArXiv

| item | value |
|---|---|
| from OP | text_length_filter |
| text_len | 7 |
| text | `\part*{` |
| comments | Texts that are too short might be missing content |
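
The length check itself is simple. A minimal sketch, assuming length is counted in characters and using hypothetical `min_len`/`max_len` bounds:

```python
def passes_text_length(text: str, min_len: int = 10, max_len: int = 10000) -> bool:
    # min_len and max_len are hypothetical bounds chosen for this example.
    return min_len <= len(text) <= max_len


sample = r"\part*{"
print(len(sample))                 # 7
print(passes_text_length(sample))  # False
```

The 7-character sample falls below the assumed lower bound, so it would be filtered out.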

| item | value |
|---|---|
| from OP | perplexity_filter |
| perplexity | 244697 |

text:
\section{Information about various village networks}
\begingroup
\squeezetable
\begin{table*}
\caption{Summary details of the
contact network properties for the villages considered in this study.}
\begin{ruledtabular}
\begin{tabular}{c c c c c c c c c c}

SlNo&Nodes&LCC&$<k>$&Modules&Q&$\mu_{sz}$&$\sigma_{sz}$&LMS&r\\ \hline
1&843&825&8.2&24&0.6782&34.375&19.6783&72&0.02\
2&877&810&7.2123&23&0.7396&35.2174&22.446&91&0.0149\
3&1380&1318&7.9841&25&0.7312&52.72&38.6539&189&0.0161\
4&1025&957&7.2403&25&0.7315&38.28&25.6117&99&0.0155\
5&650&641&7.869&25&0.7477&25.64&20.268&70&0.0154\
6&451&434&6.9954&24&0.7036&18.0833&15.4809&63&0.0193\
7&732&719&9.0668&23&0.7078&31.2609&25.1555&83&0.0195\
8&444&440&8.2818&22&0.7051&20.0&20.3693&75&0.0222\
9&928&914&8.4497&22&0.703&41.5455&33.3517&141&0.0212\
10&354&346&8.8353&19&0.6614&18.2105&15.2784&48&0.0294\
11&605&589&7.8727&23&0.6986&25.6087&23.814&92&0.023\
12&794&760&7.7132&23&0.7225&33.0435&20.0531&74&0.0167\
13&-&-&-&-&-&-&-&-&-\
14&675&645&8.1054&23&0.7453&28.0435&21.1115&91&0.0155\
15&853&852&8.9648&23&0.6954&37.0435&24.5985&90&0.0204\
16&712&693&9.3059&22&0.7152&31.5&24.7694&99&0.0201\
17&879&850&8.6494&25&0.7284&34.0&28.1539&123&0.0177\
18&1146&1140&9.157&24&0.7522&47.5&34.87&127&0.0149\
19&1134&1118&9.2844&24&0.7484&46.5833&31.0979&131&0.014\
20&714&633&8.6477&23&0.744&27.5217&18.5117&66&0.0139\
21&1046&1011&8.6311&25&0.7347&40.44&32.0063&116&0.0164\
22&-&-&-&-&-&-&-&-&-\
23&1252&1186&8.6636&24&0.7572&49.4167&37.5732&125&0.0144\
24&835&820&9.3098&25&0.7222&32.8&22.735&79&0.0165\
25&1313&1286&9.4619&25&0.7751&51.44&32.2404&124&0.0116\
26&674&666&9.0405&21&0.7576&31.7143&28.3198&127&0.0152\
27&708&682&7.4091&25&0.7408&27.28&18.3794&67&0.0144\
28&1612&1570&9.5822&25&0.7812&62.8&39.8628&163&0.0112\
29&1337&1270&7.8276&25&0.7651&50.8&26.9741&115&0.0113\
30&689&675&8.8607&23&0.7489&29.3478&25.1441&91&0.0165\
31&851&819&9.0317&25&0.7893&32.76&22.0023&97&0.0106\
32&1181&1136&9.6514&25&0.7296&45.44&30.0281&102&0.0157\
33&843&824&7.7415&25&0.7279&32.96&29.0799&101&0.0174\
34&692&628&7.1385&24&0.7702&26.1667&26.1465&115&0.0128\
35&806&756&7.2302&25&0.758&30.24&27.388&106&0.0133\
36&1214&1168&8.8733&24&0.7147&48.6667&53.5698&197&0.0214\
37&500&482&7.7759&16&0.6593&30.125&24.2922&91&0.0324\
38&736&726&8.1267&24&0.7723&30.25&21.9037&82&0.0124\
39&1339&1294&9.1376&25&0.7666&51.76&31.9466&134&0.012\
40&1097&1064&8.0442&25&0.7713&42.56&42.0562&168&0.0135\
41&724&703&7.8862&20&0.7163&35.15&26.3615&76&0.0203\
42&853&805&8.0807&25&0.7587&32.2&32.3394&117&0.0153\
43&875&861&8.295&24&0.7331&35.875&30.5195&121&0.016\
44&978&965&8.8518&25&0.7212&38.6&40.4366&136&0.0202\
45&1073&1044&8.2356&25&0.7782&41.76&29.4405&113&0.0116\
46&1257&1216&7.8544&24&0.7683&50.6667&34.6671&155&0.0125\
47&680&660&8.5848&24&0.719&27.5&20.4185&70&0.0174\
48&808&794&8.9232&24&0.6998&33.0833&31.5686&116&0.0225\
49&766&689&8.7083&22&0.6439&31.3182&35.6987&140&0.0362\
50&999&937&8.9883&25&0.6994&37.48&43.5175&145&0.0235\
51&1061&1015&10.6591&21&0.6734&48.3333&55.2943&187&0.0308\
52&1525&1497&10.4369&25&0.7339&59.88&69.595&276&0.0192\
53&642&630&9.0683&23&0.6573&27.3913&32.5746&110&0.0322\
54&467&458&10.1528&20&0.6636&22.9&28.0141&111&0.0326\
55&1180&1151&7.9644&24&0.8192&47.9583&33.5478&127&0.0087\
56&573&553&7.8807&23&0.7223&24.0435&20.6955&76&0.0189\
57&948&919&8.2459&25&0.7222&36.76&29.2278&130&0.0174\
58&914&905&8.8541&25&0.744&36.2&27.0289&110&0.0153\
59&1599&1552&8.5393&24&0.7936&64.6667&46.4091&182&0.0108\
60&1775&1729&8.7953&25&0.789&69.16&42.0692&215&0.0105\
61&591&572&9.3776&20&0.7299&28.6&25.4213&104&0.0191\
62&994&980&9.1816&24&0.7676&40.8333&30.4872&131&0.013\
63&786&774&8.0284&23&0.7833&33.6522&23.392&91&0.0119\
64&1286&1265&8.7763&25&0.7926&50.6&32.3988&115&0.0103\
65&1331&1301&9.3766&24&0.7101&54.2083&42.7444&149&0.0194\
66&814&790&7.8848&23&0.7353&34.3478&29.078&102&0.0173\
67&893&885&8.5401&22&0.7311&40.2273&36.4778&128&0.0192\
68&663&655&8.1389&22&0.7014&29.7727&26.6218&90&0.0225\
69&875&866&10.4688&23&0.6739&37.6522&42.5457&148&0.0281\
70&899&891&9.3547&20&0.6634&44.55&37.4639&129&0.028\
71&1387&1345&8.3836&24&0.7556&56.0417&44.8046&151&0.0144\
72&999&977&8.6192&25&0.6973&39.08&33.5511&147&0.0217\
73&870&858&9.4767&22&0.7207&39.0&31.7905&101&0.0199\
74&743&724&8.2735&23&0.7707&31.4783&22.9307&89&0.0131\
75&831&815&10.0454&24&0.7199&33.9583&43.6305&170&0.0236\
76&1154&1126&8.3064&25&0.7878&45.04&34.5861&128&0.0114\
77&707&671&8.2355&22&0.7551&30.5&24.5352&82&0.015\
\end{tabular}
\end{ruledtabular}
\label{table1}
\end{table*}
\endgroup

{
comments: Texts with very high perplexity might be table markup in LaTeX code
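
For intuition about why a block of numbers and `&` separators scores so high, perplexity can be written as the exponential of the negative mean log-probability that a language model assigns to the tokens. The sketch below assumes natural-log token probabilities are already available from some model; it does not show which model or normalization `perplexity_filter` actually uses.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp(-mean(log p)) over the tokens of a text.

    Assumption: token_log_probs are natural-log probabilities produced by
    some language model (not specified here).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Tokens that a model trained on natural language finds very unlikely, such as long rows of numeric table cells, have strongly negative log-probabilities and push the perplexity up.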

Github Code

| item | value |
|---|---|
| from OP | text_length_filter |
| text_len | 10 |
| text | `new Date<A` |
| comments | Code that is too short might be incomplete or meaningless |

| item | value |
|---|---|
| from OP | average_line_length_filter |
| avg_line_length | 4.8571428571 |
| text | This<br>should<br>result<br>in<br>an<br>error |
| comments | Code with a very short average line length might be "bad" code |
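
An average line length can be sketched as the mean number of characters per line. This is an assumption for illustration: the actual OP may count newline characters or blank lines differently, so the value it reports for a given sample can differ from what this sketch returns.

```python
def average_line_length(text: str) -> float:
    """Mean number of characters per line, excluding line terminators.

    Illustrative definition only; the real OP may count differently.
    """
    lines = text.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)
```

Text with one word per line produces a very small average, which is unusual for real source code and is why such samples are flagged.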