Progress

June 15, 2022

Datasets supported by the SDK, with their task schema, normalization state, and schema notes.

| Datasets | Updated Date | Task Schema | Normalized State | Comments | Constructor |
| --- | --- | --- | --- | --- | --- |
| govreport | 2022-02-01 | Summarization | Done | Current definition: text, summary | yixinliu |
| duorc | 2022-02-03 | QuestionAnsweringExtractive | Pending | Different ids (plot_id, q_id) should be unified. | jinlanfu |
| wiki_hop | 2022-02-03 | QuestionAnsweringExtractive | Pending | Two new fields: candidates, annotations | jinlanfu |
| hotpot_qa | 2022-02-03 | QuestionAnsweringHotpot | Pending | Many new fields (supporting_facts); context will be a JSON with a list of sentences. | jinlanfu |
| ropes | 2022-02-03 | QuestionAnsweringExtractive | Pending | Many new fields (situation). | jinlanfu |
| squad_adversarial | 2022-02-03 | QuestionAnsweringExtractive | Done | Current definition: question, context, answers. | jinlanfu |
| quoref | 2022-02-03 | QuestionAnsweringExtractive | Done | Current definition: question, context, answers. | jinlanfu |
| spider | 2022-02-03 | SemanticParsing | Pending | Current definition: question, query | jinlanfu |
| atis | 2022-02-05 | TextClassification | Done | Current definition: text, label | weizhe |
| cr | 2022-02-06 | TextClassification | Done | Current definition: text, label | weizhe |
| mr | 2022-02-06 | TextClassification | Done | Current definition: text, label | weizhe |
| qc | 2022-02-06 | TextClassification | Done | Current definition: text, label | weizhe |
| subj | 2022-02-06 | TextClassification | Done | Current definition: text, label | weizhe |
| afqmc | 2022-02-06 | TextMatching | Done | Current definition: text1, text2, label | zhengfu |
| sst2 | 2022-02-07 | TextClassification | Done | Current definition: text, label | weizhe |
| race | 2022-02-07 | QuestionAnsweringMultipleChoices | Done | Current definition: questions, context, options, answers. Note: (1) some datasets come with context and some without; (2) there is an additional example id. | jinlanfu |
| drop | 2022-02-08 | QuestionAnsweringAbstractive | Pending | (1) Abstractive QA; (2) the answers field has a new feature named types. | jinlanfu |
| fb15k_237 | 2022-02-09 | KGLinkPrediction | Done | Current definition: head, link, tail | Pengfei |
| restaurant14 | 2022-02-09 | AspectBasedSentimentClassification | Done | Current definition: aspect, text, label | weizhe |
| sst5 | 2022-02-10 | TextClassification | Done | Current definition: text, label | weizhe |
| restaurant16 | 2022-02-10 | AspectBasedSentimentClassification | Done | Current definition: aspect, text, label | weizhe |
| openbookqa | 2022-02-11 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) Current fields: question, options, answers: text, option_idx; (2) the types of answers.text and answers.option_idx are String, not List. | jinlanfu |
| commonsense_qa | 2022-02-11 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) Current fields: question, options, answers: text, option_idx; (2) the types of answers.text and answers.option_idx are String, not List; (3) the test set does not provide annotated answers. | jinlanfu |
| winogrande | 2022-02-11 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) Current fields: question, options, answers: text, option_idx; (2) the types of answers.text and answers.option_idx are String, not List; (3) the test set does not provide annotated answers. | jinlanfu |
| laptop14 | 2022-02-11 | AspectBasedSentimentClassification | Done | Current definition: aspect, text, label | weizhe |
| twitter | 2022-02-11 | AspectBasedSentimentClassification | Done | Current definition: aspect, text, label | weizhe |
| natural_questions | 2022-02-12 | QuestionAnsweringAbstractiveNQ | Pending | (1) Current fields: question, context, answers. Unlike extractive QA or abstractive QA, natural_questions has a complex structure (see the NQ schema definition). (2) The dataset is very large, occupying 135G of disk storage. | jinlanfu |
| ai2_arc | 2022-02-17 | QuestionAnsweringMultipleChoicesWithoutContext | Done | Current fields: question, options, answers: text, option_idx | jinlanfu |
| social_i_qa | 2022-02-17 | QuestionAnsweringMultipleChoices | Done | Current definition: questions, context, options, answers. | jinlanfu |
| piqa | 2022-02-17 | QuestionAnsweringMultipleChoicesWithoutContext | Done | Current fields: question, options, answers: text, option_idx | jinlanfu |
| codah | 2022-02-17 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) Current fields: question, options, answers: text, option_idx; (2) there is a new but important field, question_category. | jinlanfu |
| qasc | 2022-02-17 | QuestionAnsweringMultipleChoicesQASC | Pending | (1) Current definition: questions, context, options, answers. (2) The test set has no labeled answers. (3) context is a dictionary with fields fact1, fact2 and combinedfact. (4) qasc has a new field named formatted_question. | jinlanfu |
| wikihow | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| wikisum | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| reddit_tifu | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| bigpatent | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| multi_xscience | 2022-02-17 | Summarization, MultiDocSummarization | Done | Current definition: (1) Summarization: text, summary; (2) MultiDocSummarization: texts, summary | yixinliu |
| multinews | 2022-02-17 | Summarization, MultiDocSummarization | Done | Current definition: (1) Summarization: text, summary; (2) MultiDocSummarization: texts, summary | yixinliu |
| dialogsum | 2022-02-17 | Summarization, DialogSummarization | Done | Current definition: (1) Summarization: text, summary; (2) DialogSummarization: dialogue: {"speaker": List[str], "text": List[str]}, summary: List[str] | yixinliu |
| samsum | 2022-02-17 | Summarization, DialogSummarization | Done | Current definition: (1) Summarization: text, summary; (2) DialogSummarization: dialogue: {"speaker": List[str], "text": List[str]}, summary: List[str] | yixinliu |
| qmsum | 2022-02-17 | Summarization, QuerySummarization | Done | Current definition: (1) Summarization: text, summary; (2) QuerySummarization: text, summary, query | yixinliu |
| tydiqa | 2022-03-08 | QuestionAnsweringExtractive | Done | Multilingual QA dataset covering 9 languages. Current definition: question, context, answers. | jinlanfu |
| mlqa | 2022-03-08 | QuestionAnsweringExtractive | Done | Multilingual QA dataset covering 7 languages, without a training set. Current definition: question, context, answers. | jinlanfu |
| dcqa | 2022-03-08 | QuestionAnsweringDCQA | Pending | The context is a sequence of sentences with ordinal numbers. Current definition: question; context: SentenceID, text; answers: SentenceID, text. | jinlanfu |
| waimai | 2022-04-29 | TextClassification | Done | This dataset has only a training set and no test set. Current definition: label, text. | zihanzhu |
| chnsenticorp_hotel | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition: label, text. | zihanzhu |
| onlineshopping | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition: label, category, text. | zihanzhu |
| tnews | 2022-05-06 | TextClassification | Done | This dataset has a training set and a validation set. The test set has no labels, so it has not been added yet. Current definition: label, text, keywords. | zihanzhu |
| weibo_4moods | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition: label, text. | zihanzhu |
| weibo_senti | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition: label, text. | zihanzhu |
| asap_sent | 2022-05-07 | TextClassification | Done | This dataset has a training set and a validation set. The test set has no labels, so it has not been added yet. Current definition: label, text. | zihanzhu |
| nlpcc14_sc | 2022-05-07 | TextClassification | Done | This dataset has a training set. The test set has no labels, so it has not been added yet. No related papers found. Current definition: label, text. | zihanzhu |
| se_absa16_came | 2022-05-07 | TextClassification | Done | The test set has no labels, so it has not been added yet. In addition to label and text, the dataset has a variable called "evaluation object", which has not been added yet. Current definition: label, text. | zihanzhu |
| se_absa16_phns | 2022-05-07 | TextClassification | Done | The test set has no labels, so it has not been added yet. In addition to label and text, the dataset has a variable called "evaluation object", which has not been added yet. Current definition: label, text. | zihanzhu |
| bq_corpus | 2022-05-09 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: text1, text2, label | zihanzhu |
| douban_movie | 2022-05-09 | TextClassification | Done | There is only a training set. Current definition: label, text. | zihanzhu |
| lcqmc | 2022-05-09 | TextMatching | Done | This dataset has a training set, a validation set and a test set. Current definition: text1, text2, label | zihanzhu |
| paws | 2022-05-09 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: text1, text2, label | zihanzhu |
| thucnews | 2022-05-09 | TextClassification | Done | This dataset has a training set, a validation set and a test set. Current definition: text, label. | zihanzhu |
| yf_amazon | 2022-05-10 | TextClassification | Done | There is only a training set. Current definition: user_id, product_id, rating, timestamp, title, comment. | zihanzhu |
| yf_dianping | 2022-05-10 | TextClassification | Done | There is only a training set. Current definition: user_id, restaurant_id, rating, rating_env, rating_flavor, rating_service, timestamp, comment. | zihanzhu |
| ocnli | 2022-05-10 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: text1, text2, label | zihanzhu |
| cinlid | 2022-05-10 | TextMatching | Done | This dataset has a training set. The test set has no labels. Current definition: text1, text2, label | zihanzhu |
| iflytek | 2022-05-10 | TextClassification | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: text, label. | zihanzhu |
| eprstmt | 2022-05-11 | TextClassification | Done | This dataset has a training set, a validation set and a test set. Current definition: text, label. | zihanzhu |
| csldcp | 2022-05-11 | TextClassification | Done | This dataset has a training set, a validation set and a test set. Current definition: text, label. | zihanzhu |
| bustm | 2022-05-11 | TextMatching | Done | This dataset has a training set, a validation set and a test set. Current definition: text1, text2, label | zihanzhu |
| cmnli | 2022-05-11 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: text1, text2, label | zihanzhu |
| ths2021 | 2022-05-11 | TextMatching | Done | The test set has no labels. Current definition: text1, text2, label | zihanzhu |
| bdci2019 | 2022-05-16 | SentimentClassification | Done | The test set has no labels. Current definition: text, label. | zihanzhu |
| cmid | 2022-05-16 | IntentClassification | Done | The test set has no labels. Current definition: text, label, entities, seg_result | zihanzhu |
| cnsd | 2022-05-16 | NaturalLanguageInference | Done | This dataset has a training set, a validation set and a test set. Current definition: text1, text2, label. | zihanzhu |
| cote_bd | 2022-05-16 | OpinionTargetExtraction | Done | Current definition: opinion, target | zihanzhu |
| sohu2021a_ll | 2022-05-16 | NaturalLanguageInference | Done | The test set has no labels. Current definition: text1, text2, label. | zihanzhu |
| sohu2021a_sl | 2022-05-16 | NaturalLanguageInference | Done | The test set has no labels. Current definition: text1, text2, label. | zihanzhu |
| sohu2021a_ss | 2022-05-16 | NaturalLanguageInference | Done | The test set has no labels. Current definition: text1, text2, label. | zihanzhu |
| sohu2021b_ll | 2022-05-16 | NaturalLanguageInference | Done | The test set has no labels. Current definition: text1, text2, label. | zihanzhu |
| sohu2021b_sl | 2022-05-16 | NaturalLanguageInference | Done | The test set has no labels. Current definition: text1, text2, label. | zihanzhu |
| sohu2021b_ss | 2022-05-16 | NaturalLanguageInference | Done | The test set has no labels. Current definition: text1, text2, label. | zihanzhu |
| lcsts | 2022-05-17 | Summarization | Done | Current definition: text, summary | zihanzhu |
| cote_dp | 2022-05-17 | OpinionTargetExtraction | Done | Current definition: opinion, target | zihanzhu |
| cote_mfw | 2022-05-17 | OpinionTargetExtraction | Done | Current definition: opinion, target | zihanzhu |
| advertise_gen | 2022-05-18 | ConditionalGeneration | Done | Current definition: source, reference | zihanzhu |
| dureader_qg | 2022-05-18 | GuidedConditionalGeneration | Done | Current definition: (1) GuidedConditionalGeneration: source, guidance, reference; (2) QuestionAnswering: content, question, answer | zihanzhu |
| dureader_robust | 2022-05-18 | QuestionAnswering | Done | Current definition: content, question, answer | zihanzhu |
| csl | 2022-05-18 | KeywordRecognition | Done | Current definition: text1, text2, label | zihanzhu |
| dureader_checklist | 2022-05-18 | QuestionAnsweringExtractive | Done | Current definition: id, question, context, title, answers, is_impossible, type | zihanzhu |
| cluewsc2020 | 2022-05-18 | CoreferenceResolution | Done | Current definition: text, pronoun, pronoun_idx, quote, quote_idx, label | zihanzhu |
| cmrc2018 | 2022-05-23 | QuestionAnsweringExtractive | Done | Current definition: id, context, title, question, answers | zihanzhu |
| drcd | 2022-05-23 | QuestionAnsweringExtractive | Done | Current definition: id, context, title, question, answers | zihanzhu |
| chid | 2022-05-23 | QuestionAnsweringMultipleChoiceWithoutContext | Done | Current definition: content, options, answers | zihanzhu |
| c3_d | 2022-05-23 | QuestionAnsweringMultipleChoiceC3 | Done | Current definition: id, context, question, options, answers | zihanzhu |
| c3_m | 2022-05-23 | QuestionAnsweringMultipleChoiceC3 | Done | Current definition: id, context, question, options, answers | zihanzhu |
| finre | 2022-05-30 | KGLinkTailPrediction | Done | Current definition: text, span1, span2, relation | zihanzhu |
| sanwen | 2022-05-30 | KGLinkTailPrediction | Done | Current definition: text, span1, span2, relation | zihanzhu |
| cail2019 | 2022-05-30 | QuestionAnsweringMultipleChoiceWithoutContext | Done | Current definition: question, options, answers | zihanzhu |
| ccpm | 2022-05-31 | QuestionAnsweringMultipleChoiceWithoutContext | Done | The test set has no labels. Current definition: question, options, answers | zihanzhu |
| cnse | 2022-05-31 | TextPairClassification | Done | Current definition: text1, text2, title1, title2, keywords1, keywords2, main_keywords1, main_keywords2, ner_keywords1, ner_keywords2, ner1, ner2, label | zihanzhu |
| cnss | 2022-05-31 | TextPairClassification | Done | Current definition: text1, text2, title1, title2, keywords1, keywords2, main_keywords1, main_keywords2, ner_keywords1, ner_keywords2, ner1, ner2, label | zihanzhu |
| ccks2019_fin | 2022-06-01 | EventEntityExtraction | Done | The test set has no labels. Current definition: text, event_type, event_entity | zihanzhu |
| ccks2020_fin_ea | 2022-06-01 | EventArgumentsExtraction | Done | The test set has no labels. Current definition: text, event_type, arguments | zihanzhu |
| ccks2020_fin_ee | 2022-06-01 | EventEntityExtraction | Done | The test set has no labels. Current definition: text, event_type, event_entity | zihanzhu |
| ccks2021_fin_ea | 2022-06-01 | EventArgumentsExtraction | Done | The test set has no labels. Current definition: text, event_type, arguments | zihanzhu |
| ccks2021_fin_re | 2022-06-01 | EventRelationExtractionCausality | Done | The test set has no labels. Current definition: text, relation | zihanzhu |
| chip2019_qm | 2022-06-07 | TextPairClassification | Done | This dataset has only a training set. Current definition: question1, question2, category, label | zihanzhu |
| ckbqa | 2022-06-07 | QuestionAnsweringOpenDomain | Done | This dataset has training, validation and test sets. Current definition: (1) QuestionAnsweringOpenDomain: question, answers; (2) TextToSql: question, query. | zihanzhu |
| coqa | 2022-06-07 | QuestionAnsweringOpenDomain | Done | This dataset has training, validation and test sets. Current definition: (1) QuestionAnsweringOpenDomain: question, answers; (2) TextToSql: question, query. | zihanzhu |
| nlpcc2017_dbqa | 2022-06-07 | QuestionAnsweringClassification | Done | This dataset has training, validation and test sets. Current definition: question, answer, label | zihanzhu |
| cmrc2019 | 2022-06-08 | ClozeMultipleChoice | Done | This dataset has training, validation and test sets. Current definition: question_mark, context, options, answers | zihanzhu |
| dureader_zhidao | 2022-06-08 | QuestionAnsweringExtractive | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: documents, answers, segmented_answers, fake_answers, answer_spans, question, segmented_question, question_type, fact_or_opinion, question_id, match_scores, answer_docs | zihanzhu |
| dureader_search | 2022-06-08 | QuestionAnsweringExtractive | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: documents, answers, segmented_answers, fake_answers, answer_spans, question, segmented_question, question_type, fact_or_opinion, question_id, match_scores, answer_docs | zihanzhu |
| dureader_yesno | 2022-06-08 | QuestionAnsweringExtractive | Done | This dataset has a training set and a validation set. The test set has no labels. Current definition: documents, question, answers | zihanzhu |
| children_fairy_tale | 2022-06-14 | ClozeDocuments | Done | This dataset has a training set and a test set. Current definition: documents, documents_tokens, question, question_tokens, answers | zihanzhu |
| people_daily_rc | 2022-06-14 | ClozeDocuments | Done | This dataset has a training set, a validation set and a test set. Current definition: documents, documents_tokens, question, question_tokens, answers | zihanzhu |
| matinf | 2022-06-14 | QuestionAnsweringClassification | Done | This dataset has a training set, a validation set and a test set. Current definition: text, label | zihanzhu |
| nlpec | 2022-06-14 | QuestionAnsweringMultipleChoiceNLPEC | Done | This dataset has only a training set. Current definition: question_type, question, question_s, options, options_s, context, context_s, answers | zihanzhu |
| cspider | 2022-06-15 | TexttoSQL | Done | This dataset has a training set and a validation set. Current definition: question, query, database_id | zihanzhu |
| dusql | 2022-06-15 | TexttoSQL | Done | This dataset has a training set and a validation set. Current definition: question, query, database_id | zihanzhu |
| nl2sql | 2022-06-15 | TexttoSQL | Done | This dataset has a training set and a validation set. Current definition: question, query, database_id | zihanzhu |
| duee | 2022-06-15 | EventArgumentsExtraction | Done | This dataset has training, validation and test sets. Current definition: text, event_type, trigger, trigger_start_index, arguments | zihanzhu |
| duie | 2022-06-15 | EntityRelationExtraction | Done | This dataset has training, validation and test sets. Current definition: text, relation | zihanzhu |
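
The "Current definition" notes in the listing above name the fields a normalized record is expected to carry for each task schema. As a rough illustration only, the sketch below checks a record against a few of those field lists. The `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical names introduced here and are not part of the SDK; the field sets are copied from the comments for Summarization, TextClassification, TextMatching, and QuestionAnsweringExtractive.

```python
from typing import Dict, List, Set

# Hypothetical mapping (not an SDK API): task schema -> field names,
# taken from the "Current definition" comments in the listing above.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "Summarization": {"text", "summary"},
    "TextClassification": {"text", "label"},
    "TextMatching": {"text1", "text2", "label"},
    "QuestionAnsweringExtractive": {"question", "context", "answers"},
}


def missing_fields(task_schema: str, record: dict) -> List[str]:
    """Return the expected fields that `record` lacks, sorted by name."""
    required = REQUIRED_FIELDS[task_schema]
    return sorted(required - set(record))


# A made-up record, used only to exercise the helper.
sample = {"text": "The screen is sharp.", "label": "positive"}
print(missing_fields("TextClassification", sample))  # []
print(missing_fields("TextMatching", sample))  # ['text1', 'text2']
```

A check along these lines could be one way to decide whether a dataset counts as "Done" or still "Pending" for a given schema, though the criteria the maintainers actually use are not stated here.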