Progress

June 15, 2022

Datasets supported by the SDK, with their task schema, normalization state, and related notes.

Datasets	Updated Date	Task Schema	Normalized State	Comments	Constructor
govreport	2022-02-01	Summarization	Done	Current definition: `text`, `summary`	yixinliu
duorc	2022-02-03	QuestionAnsweringExtractive	Pending	different ids (`plot_id`, `q_id`) should be unified	jinlanfu
wiki_hop	2022-02-03	QuestionAnsweringExtractive	Pending	Two new fields: `candidates`, `annotations`	jinlanfu
hotpot_qa	2022-02-03	QuestionAnsweringHotpot	Pending	Many new fields (e.g. `supporting_facts`); `context` will be JSON containing a list of sentences.	jinlanfu
ropes	2022-02-03	QuestionAnsweringExtractive	Pending	Many new fields (e.g. `situation`).	jinlanfu
squad_adversarial	2022-02-03	QuestionAnsweringExtractive	Done	Current definition:`question`,`context`,`answers`.	jinlanfu
quoref	2022-02-03	QuestionAnsweringExtractive	Done	Current definition:`question`,`context`,`answers`.	jinlanfu
spider	2022-02-03	SemanticParsing	Pending	Current definition: `question`, `query`	jinlanfu
atis	2022-02-05	TextClassification	Done	Current definition:`text`,`label`	weizhe
cr	2022-02-06	TextClassification	Done	Current definition:`text`,`label`	weizhe
mr	2022-02-06	TextClassification	Done	Current definition:`text`,`label`	weizhe
qc	2022-02-06	TextClassification	Done	Current definition:`text`,`label`	weizhe
subj	2022-02-06	TextClassification	Done	Current definition:`text`,`label`	weizhe
afqmc	2022-02-06	TextMatching	Done	Current definition:`text1`,`text2`, `label`	zhengfu
sst2	2022-02-07	TextClassification	Done	Current definition:`text`,`label`	weizhe
race	2022-02-07	QuestionAnsweringMultipleChoices	Done	Current definition:`questions`,`context`,`options`,`answers`. Note that (1) some datasets have a context and some do not; (2) there is an additional `example id` field.	jinlanfu
drop	2022-02-08	QuestionAnsweringAbstractive	Pending	(1) Abstractive QA; (2) the `answers` field has a new feature named `types`.	jinlanfu
fb15k_237	2022-02-09	KGLinkPrediction	Done	Current definition:`head`,`link`, `tail`	Pengfei
restaurant14	2022-02-09	AspectBasedSentimentClassification	Done	Current definition:`aspect`,`text`,`label`	weizhe
sst5	2022-02-10	TextClassification	Done	Current definition:`text`,`label`	weizhe
restaurant16	2022-02-10	AspectBasedSentimentClassification	Done	Current definition:`aspect`,`text`,`label`	weizhe
openbookqa	2022-02-11	QuestionAnsweringMultipleChoicesWithoutContext	Pending	(1) Current fields: `question`, `options`, `answers`: `text`, `option_idx`; (2) `answers.text` and `answers.option_idx` are of type `String`, not `List`.	jinlanfu
commonsense_qa	2022-02-11	QuestionAnsweringMultipleChoicesWithoutContext	Pending	(1) Current fields: `question`, `options`, `answers`: `text`, `option_idx`; (2) `answers.text` and `answers.option_idx` are of type `String`, not `List`. (3) The test set does not provide annotated answers.	jinlanfu
winogrande	2022-02-11	QuestionAnsweringMultipleChoicesWithoutContext	Pending	(1) Current fields: `question`, `options`, `answers`: `text`, `option_idx`; (2) `answers.text` and `answers.option_idx` are of type `String`, not `List`. (3) The test set does not provide annotated answers.	jinlanfu
laptop14	2022-02-11	AspectBasedSentimentClassification	Done	Current definition:`aspect`,`text`,`label`	weizhe
twitter	2022-02-11	AspectBasedSentimentClassification	Done	Current definition:`aspect`,`text`,`label`	weizhe
natural_questions	2022-02-12	QuestionAnsweringAbstractiveNQ	Pending	(1) Current fields: `question`, `context`, `answers`. Unlike extractive or abstractive QA, `natural_questions` has a complex structure (see the NQ schema definition). (2) The dataset is very large, occupying 135 GB of disk storage.	jinlanfu
ai2_arc	2022-02-17	QuestionAnsweringMultipleChoicesWithoutContext	Done	(1) current field: `question`, `options`, `answers`: `text`, `option_idx`;	jinlanfu
social_i_qa	2022-02-17	QuestionAnsweringMultipleChoices	Done	(1) Current definition:`questions`,`context`,`options`,`answers`.	jinlanfu
piqa	2022-02-17	QuestionAnsweringMultipleChoicesWithoutContext	Done	(1) current field: `question`, `options`, `answers`: `text`, `option_idx`;	jinlanfu
codah	2022-02-17	QuestionAnsweringMultipleChoicesWithoutContext	Pending	(1) current field: `question`, `options`, `answers`: `text`, `option_idx`; (2) There is a new but important field `question_category`.	jinlanfu
qasc	2022-02-17	QuestionAnsweringMultipleChoicesQASC	Pending	(1) Current definition:`questions`,`context`,`options`,`answers`. (2) The test set has no labeled answers. (3) `context` is a dictionary with fields `fact1`, `fact2` and `combinedfact`. (4) `qasc` has a new field named `formatted_question`.	jinlanfu
wikihow	2022-02-17	Summarization	Done	Current definition: `text`, `summary`	yixinliu
wikisum	2022-02-17	Summarization	Done	Current definition: `text`, `summary`	yixinliu
reddit_tifu	2022-02-17	Summarization	Done	Current definition: `text`, `summary`	yixinliu
bigpatent	2022-02-17	Summarization	Done	Current definition: `text`, `summary`	yixinliu
multi_xscience	2022-02-17	Summarization, MultiDocSummarization	Done	Current definition: (1) Summarization: `text`, `summary`, (2) MultiDocSummarization: `texts`, `summary`	yixinliu
multinews	2022-02-17	Summarization, MultiDocSummarization	Done	Current definition: (1) Summarization: `text`, `summary`, (2) MultiDocSummarization: `texts`, `summary`	yixinliu
dialogsum	2022-02-17	Summarization, DialogSummarization	Done	Current definition: (1) Summarization: `text`, `summary`, (2) DialogSummarization: `dialogue: {"speaker": List[str], "text": List[str]}`, `summary: List[str]`	yixinliu
samsum	2022-02-17	Summarization, DialogSummarization	Done	Current definition: (1) Summarization: `text`, `summary`, (2) DialogSummarization: `dialogue: {"speaker": List[str], "text": List[str]}`, `summary: List[str]`	yixinliu
qmsum	2022-02-17	Summarization, QuerySummarization	Done	Current definition: (1) Summarization: `text`, `summary`, (2) QuerySummarization: `text`, `summary`, `query`	yixinliu
tydiqa	2022-03-08	QuestionAnsweringExtractive	Done	Multilingual QA dataset covering 9 languages. Current definition:`question`,`context`,`answers`.	jinlanfu
mlqa	2022-03-08	QuestionAnsweringExtractive	Done	Multilingual QA dataset covering 7 languages; it has no training set. Current definition:`question`,`context`,`answers`.	jinlanfu
dcqa	2022-03-08	QuestionAnsweringDCQA	Pending	The context is a sequence of sentences with ordinal numbers. Current definition:`question`,`context: SentenceID, text`,`answers: SentenceID, text`.	jinlanfu
waimai	2022-04-29	TextClassification	Done	This dataset has only one training set and no test set. Current definition:`label`,`text`.	zihanzhu
chnsenticorp_hotel	2022-05-06	TextClassification	Done	This dataset has only one training set and no test set. Current definition:`label`,`text`.	zihanzhu
onlineshopping	2022-05-06	TextClassification	Done	This dataset has only one training set and no test set. Current definition:`label`,`category`,`text`.	zihanzhu
tnews	2022-05-06	TextClassification	Done	This dataset has training set and validation set. The test set has no label, so we have not added the test set for now. Current definition:`label`,`text`,`keywords`.	zihanzhu
weibo_4moods	2022-05-06	TextClassification	Done	This dataset has only one training set and no test set. Current definition:`label`,`text`.	zihanzhu
weibo_senti	2022-05-06	TextClassification	Done	This dataset has only one training set and no test set. Current definition:`label`,`text`.	zihanzhu
asap_sent	2022-05-07	TextClassification	Done	This dataset has training set and validation set. The test set has no label, so we have not added the test set for now. Current definition:`label`,`text`.	zihanzhu
nlpcc14_sc	2022-05-07	TextClassification	Done	This dataset has training set. The test set has no label, so we have not added the test set for now. No related papers found. Current definition:`label`,`text`.	zihanzhu
se_absa16_came	2022-05-07	TextClassification	Done	The test set has no label, so we have not added the test set for now. In addition to `label` and `text`, the dataset also has an "evaluation object" field, which we have not added for now. Current definition:`label`,`text`.	zihanzhu
se_absa16_phns	2022-05-07	TextClassification	Done	The test set has no label, so we have not added the test set for now. In addition to `label` and `text`, the dataset also has an "evaluation object" field, which we have not added for now. Current definition:`label`,`text`.	zihanzhu
bq_corpus	2022-05-09	TextMatching	Done	This dataset has training set and validation set. The test set has no label. Current definition:`text1`,`text2`, `label`	zihanzhu
douban_movie	2022-05-09	TextClassification	Done	There is only a train set. Current definition:`label`,`text`.	zihanzhu
lcqmc	2022-05-09	TextMatching	Done	This dataset has training set, validation set and test set. Current definition:`text1`,`text2`, `label`	zihanzhu
paws	2022-05-09	TextMatching	Done	This dataset has training set and validation set. The test set has no label. Current definition:`text1`,`text2`, `label`	zihanzhu
thucnews	2022-05-09	TextClassification	Done	This dataset has training set, validation set and test set. Current definition:`text`,`label`.	zihanzhu
yf_amazon	2022-05-10	TextClassification	Done	There is only a train set. Current definition:`user_id`, `product_id`, `rating`, `timestamp`, `title`, `comment`.	zihanzhu
yf_dianping	2022-05-10	TextClassification	Done	There is only a train set. Current definition:`user_id`, `restaurant_id`, `rating`, `rating_env`, `rating_flavor`, `rating_service`, `timestamp`, `comment`.	zihanzhu
ocnli	2022-05-10	TextMatching	Done	This dataset has training set and validation set. The test set has no label. Current definition:`text1`,`text2`, `label`	zihanzhu
cinlid	2022-05-10	TextMatching	Done	This dataset has training set. The test set has no label. Current definition:`text1`,`text2`, `label`	zihanzhu
iflytek	2022-05-10	TextClassification	Done	This dataset has training set and validation set. The test set has no label. Current definition:`text`,`label`.	zihanzhu
eprstmt	2022-05-11	TextClassification	Done	This dataset has training set, validation set and test set. Current definition:`text`,`label`.	zihanzhu
csldcp	2022-05-11	TextClassification	Done	This dataset has training set, validation set and test set. Current definition:`text`,`label`.	zihanzhu
bustm	2022-05-11	TextMatching	Done	This dataset has training set, validation set and test set. Current definition:`text1`,`text2`, `label`	zihanzhu
cmnli	2022-05-11	TextMatching	Done	This dataset has training set and validation set. The test set has no label. Current definition:`text1`,`text2`, `label`	zihanzhu
ths2021	2022-05-11	TextMatching	Done	The test set has no label. Current definition:`text1`,`text2`, `label`	zihanzhu
bdci2019	2022-05-16	SentimentClassification	Done	The test set has no label. Current definition:`text`,`label`.	zihanzhu
cmid	2022-05-16	IntentClassification	Done	The test set has no label. Current definition: `text`,`label`,`entities`,`seg_result`	zihanzhu
cnsd	2022-05-16	NaturalLanguageInference	Done	This dataset has training set, validation set and test set. Current definition:`text1`,`text2`,`label`.	zihanzhu
cote_bd	2022-05-16	OpinionTargetExtraction	Done	Current definition: `opinion`, `target`	zihanzhu
sohu2021a_ll	2022-05-16	NaturalLanguageInference	Done	The test set has no label. Current definition:`text1`,`text2`,`label`.	zihanzhu
sohu2021a_sl	2022-05-16	NaturalLanguageInference	Done	The test set has no label. Current definition:`text1`,`text2`,`label`.	zihanzhu
sohu2021a_ss	2022-05-16	NaturalLanguageInference	Done	The test set has no label. Current definition:`text1`,`text2`,`label`.	zihanzhu
sohu2021b_ll	2022-05-16	NaturalLanguageInference	Done	The test set has no label. Current definition:`text1`,`text2`,`label`.	zihanzhu
sohu2021b_sl	2022-05-16	NaturalLanguageInference	Done	The test set has no label. Current definition:`text1`,`text2`,`label`.	zihanzhu
sohu2021b_ss	2022-05-16	NaturalLanguageInference	Done	The test set has no label. Current definition:`text1`,`text2`,`label`.	zihanzhu
lcsts	2022-05-17	Summarization	Done	Current definition: `text`, `summary`	zihanzhu
cote_dp	2022-05-17	OpinionTargetExtraction	Done	Current definition: `opinion`, `target`	zihanzhu
cote_mfw	2022-05-17	OpinionTargetExtraction	Done	Current definition: `opinion`, `target`	zihanzhu
advertise_gen	2022-05-18	ConditionalGeneration	Done	Current definition: `source`, `reference`	zihanzhu
dureader_qg	2022-05-18	GuidedConditionalGeneration	Done	Current definition: (1) GuidedConditionalGeneration: `source`, `guidance`, `reference`, (2) QuestionAnswering: `content`, `question`, `answer`	zihanzhu
dureader_robust	2022-05-18	QuestionAnswering	Done	Current definition: `content`, `question`, `answer`	zihanzhu
csl	2022-05-18	KeywordRecognition	Done	Current definition: `text1`, `text2`, `label`	zihanzhu
dureader_checklist	2022-05-18	QuestionAnsweringExtractive	Done	Current definition: `id`, `question`, `context`, `title`, `answers`, `is_impossible`, `type`	zihanzhu
cluewsc2020	2022-05-18	CoreferenceResolution	Done	Current definition: `text`, `pronoun`, `pronoun_idx`, `quote`, `quote_idx`, `label`	zihanzhu
cmrc2018	2022-05-23	QuestionAnsweringExtractive	Done	Current definition: `id`, `context`, `title`, `question`, `answers`	zihanzhu
drcd	2022-05-23	QuestionAnsweringExtractive	Done	Current definition: `id`, `context`, `title`, `question`, `answers`	zihanzhu
chid	2022-05-23	QuestionAnsweringMultipleChoiceWithoutContext	Done	Current definition: `content`, `options`, `answers`	zihanzhu
c3_d	2022-05-23	QuestionAnsweringMultipleChoiceC3	Done	Current definition: `id`, `context`, `question`, `options`, `answers`	zihanzhu
c3_m	2022-05-23	QuestionAnsweringMultipleChoiceC3	Done	Current definition: `id`, `context`, `question`, `options`, `answers`	zihanzhu
finre	2022-05-30	KGLinkTailPrediction	Done	Current definition: `text`, `span1`, `span2`, `relation`	zihanzhu
sanwen	2022-05-30	KGLinkTailPrediction	Done	Current definition: `text`, `span1`, `span2`, `relation`	zihanzhu
cail2019	2022-05-30	QuestionAnsweringMultipleChoiceWithoutContext	Done	Current definition: `question`, `options`, `answers`	zihanzhu
ccpm	2022-05-31	QuestionAnsweringMultipleChoiceWithoutContext	Done	The test set has no label. Current definition: `question`, `options`, `answers`	zihanzhu
cnse	2022-05-31	TextPairClassification	Done	Current definition: `text1`,`text2`,`title1`,`title2`,`keywords1`,`keywords2`, `main_keywords1`,`main_keywords2`,`ner_keywords1`,`ner_keywords2`,`ner1`,`ner2`,`label`	zihanzhu
cnss	2022-05-31	TextPairClassification	Done	Current definition: `text1`,`text2`,`title1`,`title2`,`keywords1`,`keywords2`, `main_keywords1`,`main_keywords2`,`ner_keywords1`,`ner_keywords2`,`ner1`,`ner2`,`label`	zihanzhu
ccks2019_fin	2022-06-01	EventEntityExtraction	Done	The test set has no label. Current definition: `text`, `event_type`, `event_entity`	zihanzhu
ccks2020_fin_ea	2022-06-01	EventArgumentsExtraction	Done	The test set has no label. Current definition: `text`, `event_type`, `arguments`	zihanzhu
ccks2020_fin_ee	2022-06-01	EventEntityExtraction	Done	The test set has no label. Current definition: `text`, `event_type`, `event_entity`	zihanzhu
ccks2021_fin_ea	2022-06-01	EventArgumentsExtraction	Done	The test set has no label. Current definition: `text`, `event_type`, `arguments`	zihanzhu
ccks2021_fin_re	2022-06-01	EventRelationExtractionCausality	Done	The test set has no label. Current definition: `text`, `relation`	zihanzhu
chip2019_qm	2022-06-07	TextPairClassification	Done	This dataset has only a training set. Current definition: `question1`, `question2`, `category`, `label`	zihanzhu
ckbqa	2022-06-07	QuestionAnsweringOpenDomain	Done	This dataset has training, validation and test sets. Current definition: (1) QuestionAnsweringOpenDomain: `question`, `answers`, (2) TextToSql: `question`, `query`.	zihanzhu
coqa	2022-06-07	QuestionAnsweringOpenDomain	Done	This dataset has training, validation and test sets. Current definition: (1) QuestionAnsweringOpenDomain: `question`, `answers`, (2) TextToSql: `question`, `query`.	zihanzhu
nlpcc2017_dbqa	2022-06-07	QuestionAnsweringClassification	Done	This dataset has training, validation and test set. Current definition: `question`, `answer`, `label`	zihanzhu
cmrc2019	2022-06-08	ClozeMultipleChoice	Done	This dataset has training, validation and test set. Current definition: `question_mark`, `context`, `options`, `answers`	zihanzhu
dureader_zhidao	2022-06-08	QuestionAnsweringExtractive	Done	This dataset has training set and validation set. The test set has no label. Current definition: `documents`, `answers`, `segmented_answers`, `fake_answers`, `answer_spans`, `question`, `segmented_question`, `question_type`, `fact_or_opinion`, `question_id`, `match_scores`, `answer_docs`	zihanzhu
dureader_search	2022-06-08	QuestionAnsweringExtractive	Done	This dataset has training set and validation set. The test set has no label. Current definition: `documents`, `answers`, `segmented_answers`, `fake_answers`, `answer_spans`, `question`, `segmented_question`, `question_type`, `fact_or_opinion`, `question_id`, `match_scores`, `answer_docs`	zihanzhu
dureader_yesno	2022-06-08	QuestionAnsweringExtractive	Done	This dataset has training set and validation set. The test set has no label. Current definition: `documents`, `question`, `answers`	zihanzhu
children_fairy_tale	2022-06-14	ClozeDocuments	Done	This dataset has training set and test set. Current definition: `documents`, `documents_tokens`, `question`, `question_tokens`, `answers`	zihanzhu
people_daily_rc	2022-06-14	ClozeDocuments	Done	This dataset has training set, validation set and test set. Current definition: `documents`, `documents_tokens`, `question`, `question_tokens`, `answers`	zihanzhu
matinf	2022-06-14	QuestionAnsweringClassification	Done	This dataset has training set, validation set and test set. Current definition: `text`, `label`	zihanzhu
nlpec	2022-06-14	QuestionAnsweringMultipleChoiceNLPEC	Done	This dataset has only the training set. Current definition: `question_type`, `question`, `question_s`, `options`, `options_s`, `context`, `context_s`, `answers`	zihanzhu
cspider	2022-06-15	TexttoSQL	Done	This dataset has a training set and a validation set. Current definition: `question`, `query`, `database_id`	zihanzhu
dusql	2022-06-15	TexttoSQL	Done	This dataset has a training set and a validation set. Current definition: `question`, `query`, `database_id`	zihanzhu
nl2sql	2022-06-15	TexttoSQL	Done	This dataset has a training set and a validation set. Current definition: `question`, `query`, `database_id`	zihanzhu
duee	2022-06-15	EventArgumentsExtraction	Done	This dataset has the training, validation and test set. Current definition: `text`, `event_type`, `trigger`, `trigger_start_index`, `arguments`	zihanzhu
duie	2022-06-15	EntityRelationExtraction	Done	This dataset has the training, validation and test set. Current definition: `text`, `relation`	zihanzhu
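
Several Pending rows above (for example `openbookqa`, `commonsense_qa`, `winogrande`) note that `answers.text` and `answers.option_idx` are strings rather than lists. The following is a minimal sketch of how such a record could be brought into the list-valued shape. The helper name `normalize_answers` and the sample record are hypothetical and only illustrate the idea; they are not part of the SDK.

```python
from typing import Any, Dict, List


def normalize_answers(answers: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Wrap scalar `text` / `option_idx` values in single-element lists.

    Hypothetical helper: values that are already lists or tuples are
    copied into a list; any other value is wrapped in a one-element list.
    """
    normalized: Dict[str, List[Any]] = {}
    for key in ("text", "option_idx"):
        value = answers[key]
        if isinstance(value, (list, tuple)):
            normalized[key] = list(value)
        else:
            normalized[key] = [value]
    return normalized


# Made-up record in the shape described for the multiple-choice rows:
# `question`, `options`, and `answers` with string-valued sub-fields.
raw = {
    "question": "Which object conducts electricity?",
    "options": ["a copper wire", "a rubber band", "a wooden spoon"],
    "answers": {"text": "a copper wire", "option_idx": "0"},
}

sample = {**raw, "answers": normalize_answers(raw["answers"])}
print(sample["answers"])
```

The sketch leaves `option_idx` values as strings and only changes the container type; whether the indices should also be cast to integers depends on the target schema and is not decided here.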