Progress
June 15, 2022
The following table lists the datasets supported by the SDK, along with their task schemas and related information.
| Datasets | Updated Date | Task Schema | Normalized State | Comments | Constructor |
|---|---|---|---|---|---|
| govreport | 2022-02-01 | Summarization | Done | Current definition: text, summary | yixinliu |
| duorc | 2022-02-03 | QuestionAnsweringExtractive | Pending | different ids (plot_id, q_id) should be unified | jinlanfu |
| wiki_hop | 2022-02-03 | QuestionAnsweringExtractive | Pending | Two new fields: candidates, annotations | jinlanfu |
| hotpot_qa | 2022-02-03 | QuestionAnsweringHotpot | Pending | Many new fields (supporting_facts); context will be a JSON object with a list of sentences. | jinlanfu |
| ropes | 2022-02-03 | QuestionAnsweringExtractive | Pending | Many new fields (situation). | jinlanfu |
| squad_adversarial | 2022-02-03 | QuestionAnsweringExtractive | Done | Current definition:question,context,answers. | jinlanfu |
| quoref | 2022-02-03 | QuestionAnsweringExtractive | Done | Current definition:question,context,answers. | jinlanfu |
| spider | 2022-02-03 | SemanticParsing | Pending | Current definition: question, query | jinlanfu |
| atis | 2022-02-05 | TextClassification | Done | Current definition:text,label | weizhe |
| cr | 2022-02-06 | TextClassification | Done | Current definition:text,label | weizhe |
| mr | 2022-02-06 | TextClassification | Done | Current definition:text,label | weizhe |
| qc | 2022-02-06 | TextClassification | Done | Current definition:text,label | weizhe |
| subj | 2022-02-06 | TextClassification | Done | Current definition:text,label | weizhe |
| afqmc | 2022-02-06 | TextMatching | Done | Current definition:text1,text2, label | zhengfu |
| sst2 | 2022-02-07 | TextClassification | Done | Current definition:text,label | weizhe |
| race | 2022-02-07 | QuestionAnsweringMultipleChoices | Done | Current definition:questions,context,options,answers. Note that (1) some datasets have a context and some do not; (2) there is an additional example id. | jinlanfu |
| drop | 2022-02-08 | QuestionAnsweringAbstractive | Pending | (1) Abstractive QA; (2) answers field has a new feature named (types). | jinlanfu |
| fb15k_237 | 2022-02-09 | KGLinkPrediction | Done | Current definition:head,link, tail | Pengfei |
| restaurant14 | 2022-02-09 | AspectBasedSentimentClassification | Done | Current definition:aspect,text,label | weizhe |
| sst5 | 2022-02-10 | TextClassification | Done | Current definition:text,label | weizhe |
| restaurant16 | 2022-02-10 | AspectBasedSentimentClassification | Done | Current definition:aspect,text,label | weizhe |
| openbookqa | 2022-02-11 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) current field: question, options, answers: text, option_idx; (2) answers.text and answers.option_idx are of type String, not List. | jinlanfu |
| commonsense_qa | 2022-02-11 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) current field: question, options, answers: text, option_idx; (2) answers.text and answers.option_idx are of type String, not List. (3) The test set does not provide annotated answers. | jinlanfu |
| winogrande | 2022-02-11 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) current field: question, options, answers: text, option_idx; (2) answers.text and answers.option_idx are of type String, not List. (3) The test set does not provide annotated answers. | jinlanfu |
| laptop14 | 2022-02-11 | AspectBasedSentimentClassification | Done | Current definition:aspect,text,label | weizhe |
| | 2022-02-11 | AspectBasedSentimentClassification | Done | Current definition:aspect,text,label | weizhe |
| natural_questions | 2022-02-12 | QuestionAnsweringAbstractiveNQ | Pending | (1) current field: question, context, answers. Unlike extractive or abstractive QA datasets, natural_questions has a complex structure (see the NQ schema definition). (2) The dataset is very large, occupying 135 GB of disk storage. | jinlanfu |
| ai2_arc | 2022-02-17 | QuestionAnsweringMultipleChoicesWithoutContext | Done | (1) current field: question, options, answers: text, option_idx; | jinlanfu |
| social_i_qa | 2022-02-17 | QuestionAnsweringMultipleChoices | Done | (1) Current definition:questions,context,options,answers. | jinlanfu |
| piqa | 2022-02-17 | QuestionAnsweringMultipleChoicesWithoutContext | Done | (1) current field: question, options, answers: text, option_idx; | jinlanfu |
| codah | 2022-02-17 | QuestionAnsweringMultipleChoicesWithoutContext | Pending | (1) current field: question, options, answers: text, option_idx; (2) There is a new but important field question_category. | jinlanfu |
| qasc | 2022-02-17 | QuestionAnsweringMultipleChoicesQASC | Pending | (1) Current definition:questions,context,options,answers. (2) The test set has no labeled answers. (3) context is a dictionary with the fields fact1, fact2 and combinedfact. (4) qasc has a new field named formatted_question. | jinlanfu |
| wikihow | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| wikisum | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| reddit_tifu | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| bigpatent | 2022-02-17 | Summarization | Done | Current definition: text, summary | yixinliu |
| multi_xscience | 2022-02-17 | Summarization, MultiDocSummarization | Done | Current definition: (1) Summarization: text, summary, (2) MultiDocSummarization: texts, summary | yixinliu |
| multinews | 2022-02-17 | Summarization, MultiDocSummarization | Done | Current definition: (1) Summarization: text, summary, (2) MultiDocSummarization: texts, summary | yixinliu |
| dialogsum | 2022-02-17 | Summarization, DialogSummarization | Done | Current definition: (1) Summarization: text, summary, (2) DialogSummarization: dialogue: {"speaker": List[str], "text": List[str]}, summary: List[str] | yixinliu |
| samsum | 2022-02-17 | Summarization, DialogSummarization | Done | Current definition: (1) Summarization: text, summary, (2) DialogSummarization: dialogue: {"speaker": List[str], "text": List[str]}, summary: List[str] | yixinliu |
| qmsum | 2022-02-17 | Summarization, QuerySummarization | Done | Current definition: (1) Summarization: text, summary, (2) QuerySummarization: text, summary, query | yixinliu |
| tydiqa | 2022-03-08 | QuestionAnsweringExtractive | Done | Multilingual QA dataset covering 9 languages. Current definition:question,context,answers. | jinlanfu |
| mlqa | 2022-03-08 | QuestionAnsweringExtractive | Done | Multilingual QA dataset covering 7 languages, with no training set. Current definition:question,context,answers. | jinlanfu |
| dcqa | 2022-03-08 | QuestionAnsweringDCQA | Pending | The context is a sequence of sentences with ordinal numbers. Current definition:question,context: SentenceID, text,answers: SentenceID, text. | jinlanfu |
| waimai | 2022-04-29 | TextClassification | Done | This dataset has only a training set and no test set. Current definition:label,text. | zihanzhu |
| chnsenticorp_hotel | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition:label,text. | zihanzhu |
| onlineshopping | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition:label,category,text. | zihanzhu |
| tnews | 2022-05-06 | TextClassification | Done | This dataset has a training set and a validation set. The test set has no label, so it has not been added yet. Current definition:label,text,keywords. | zihanzhu |
| weibo_4moods | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition:label,text. | zihanzhu |
| weibo_senti | 2022-05-06 | TextClassification | Done | This dataset has only a training set and no test set. Current definition:label,text. | zihanzhu |
| asap_sent | 2022-05-07 | TextClassification | Done | This dataset has a training set and a validation set. The test set has no label, so it has not been added yet. Current definition:label,text. | zihanzhu |
| nlpcc14_sc | 2022-05-07 | TextClassification | Done | This dataset has a training set. The test set has no label, so it has not been added yet. No related papers were found. Current definition:label,text. | zihanzhu |
| se_absa16_came | 2022-05-07 | TextClassification | Done | The test set has no label, so it has not been added yet. Besides label and text, the dataset also contains a field called "evaluation object", which has not been added yet. Current definition:label,text. | zihanzhu |
| se_absa16_phns | 2022-05-07 | TextClassification | Done | The test set has no label, so it has not been added yet. Besides label and text, the dataset also contains a field called "evaluation object", which has not been added yet. Current definition:label,text. | zihanzhu |
| bq_corpus | 2022-05-09 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no label. Current definition:text1,text2, label | zihanzhu |
| douban_movie | 2022-05-09 | TextClassification | Done | There is only a training set. Current definition:label,text. | zihanzhu |
| lcqmc | 2022-05-09 | TextMatching | Done | This dataset has a training set, a validation set and a test set. Current definition:text1,text2, label | zihanzhu |
| paws | 2022-05-09 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no label. Current definition:text1,text2, label | zihanzhu |
| thucnews | 2022-05-09 | TextClassification | Done | This dataset has a training set, a validation set and a test set. Current definition:text,label. | zihanzhu |
| yf_amazon | 2022-05-10 | TextClassification | Done | There is only a training set. Current definition:user_id, product_id, rating, timestamp, title, comment. | zihanzhu |
| yf_dianping | 2022-05-10 | TextClassification | Done | There is only a training set. Current definition:user_id, restaurant_id, rating, rating_env, rating_flavor, rating_service, timestamp, comment. | zihanzhu |
| ocnli | 2022-05-10 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no label. Current definition:text1,text2, label | zihanzhu |
| cinlid | 2022-05-10 | TextMatching | Done | This dataset has a training set. The test set has no label. Current definition:text1,text2, label | zihanzhu |
| iflytek | 2022-05-10 | TextClassification | Done | This dataset has a training set and a validation set. The test set has no label. Current definition:text,label. | zihanzhu |
| eprstmt | 2022-05-11 | TextClassification | Done | This dataset has a training set, a validation set and a test set. Current definition:text,label. | zihanzhu |
| csldcp | 2022-05-11 | TextClassification | Done | This dataset has a training set, a validation set and a test set. Current definition:text,label. | zihanzhu |
| bustm | 2022-05-11 | TextMatching | Done | This dataset has a training set, a validation set and a test set. Current definition:text1,text2, label | zihanzhu |
| cmnli | 2022-05-11 | TextMatching | Done | This dataset has a training set and a validation set. The test set has no label. Current definition:text1,text2, label | zihanzhu |
| ths2021 | 2022-05-11 | TextMatching | Done | The test set has no label. Current definition:text1,text2, label | zihanzhu |
| bdci2019 | 2022-05-16 | SentimentClassification | Done | The test set has no label. Current definition:text,label. | zihanzhu |
| cmid | 2022-05-16 | IntentClassification | Done | The test set has no label. Current definition: text,label,entities,seg_result | zihanzhu |
| cnsd | 2022-05-16 | NaturalLanguageInference | Done | This dataset has a training set, a validation set and a test set. Current definition:text1,text2,label. | zihanzhu |
| cote_bd | 2022-05-16 | OpinionTargetExtraction | Done | Current definition: opinion, target | zihanzhu |
| sohu2021a_ll | 2022-05-16 | NaturalLanguageInference | Done | The test set has no label. Current definition:text1,text2,label. | zihanzhu |
| sohu2021a_sl | 2022-05-16 | NaturalLanguageInference | Done | The test set has no label. Current definition:text1,text2,label. | zihanzhu |
| sohu2021a_ss | 2022-05-16 | NaturalLanguageInference | Done | The test set has no label. Current definition:text1,text2,label. | zihanzhu |
| sohu2021b_ll | 2022-05-16 | NaturalLanguageInference | Done | The test set has no label. Current definition:text1,text2,label. | zihanzhu |
| sohu2021b_sl | 2022-05-16 | NaturalLanguageInference | Done | The test set has no label. Current definition:text1,text2,label. | zihanzhu |
| sohu2021b_ss | 2022-05-16 | NaturalLanguageInference | Done | The test set has no label. Current definition:text1,text2,label. | zihanzhu |
| lcsts | 2022-05-17 | Summarization | Done | Current definition: text, summary | zihanzhu |
| cote_dp | 2022-05-17 | OpinionTargetExtraction | Done | Current definition: opinion, target | zihanzhu |
| cote_mfw | 2022-05-17 | OpinionTargetExtraction | Done | Current definition: opinion, target | zihanzhu |
| advertise_gen | 2022-05-18 | ConditionalGeneration | Done | Current definition: source, reference | zihanzhu |
| dureader_qg | 2022-05-18 | GuidedConditionalGeneration | Done | Current definition: (1) GuidedConditionalGeneration: source, guidance, reference, (2) QuestionAnswering: content, question, answer | zihanzhu |
| dureader_robust | 2022-05-18 | QuestionAnswering | Done | Current definition: content, question, answer | zihanzhu |
| csl | 2022-05-18 | KeywordRecognition | Done | Current definition: text1, text2, label | zihanzhu |
| dureader_checklist | 2022-05-18 | QuestionAnsweringExtractive | Done | Current definition: id, question, context, title, answers, is_impossible, type | zihanzhu |
| cluewsc2020 | 2022-05-18 | CoreferenceResolution | Done | Current definition: text, pronoun, pronoun_idx, quote, quote_idx, label | zihanzhu |
| cmrc2018 | 2022-05-23 | QuestionAnsweringExtractive | Done | Current definition: id, context, title, question, answers | zihanzhu |
| drcd | 2022-05-23 | QuestionAnsweringExtractive | Done | Current definition: id, context, title, question, answers | zihanzhu |
| chid | 2022-05-23 | QuestionAnsweringMultipleChoiceWithoutContext | Done | Current definition: content, options, answers | zihanzhu |
| c3_d | 2022-05-23 | QuestionAnsweringMultipleChoiceC3 | Done | Current definition: id, context, question, options, answers | zihanzhu |
| c3_m | 2022-05-23 | QuestionAnsweringMultipleChoiceC3 | Done | Current definition: id, context, question, options, answers | zihanzhu |
| finre | 2022-05-30 | KGLinkTailPrediction | Done | Current definition: text, span1, span2, relation | zihanzhu |
| sanwen | 2022-05-30 | KGLinkTailPrediction | Done | Current definition: text, span1, span2, relation | zihanzhu |
| cail2019 | 2022-05-30 | QuestionAnsweringMultipleChoiceWithoutContext | Done | Current definition: question, options, answers | zihanzhu |
| ccpm | 2022-05-31 | QuestionAnsweringMultipleChoiceWithoutContext | Done | The test set has no label. Current definition: question, options, answers | zihanzhu |
| cnse | 2022-05-31 | TextPairClassification | Done | Current definition: text1,text2,title1,title2,keywords1,keywords2, main_keywords1,main_keywords2,ner_keywords1,ner_keywords2,ner1,ner2,label | zihanzhu |
| cnss | 2022-05-31 | TextPairClassification | Done | Current definition: text1,text2,title1,title2,keywords1,keywords2, main_keywords1,main_keywords2,ner_keywords1,ner_keywords2,ner1,ner2,label | zihanzhu |
| ccks2019_fin | 2022-06-01 | EventEntityExtraction | Done | The test set has no label. Current definition: text, event_type, event_entity | zihanzhu |
| ccks2020_fin_ea | 2022-06-01 | EventArgumentsExtraction | Done | The test set has no label. Current definition: text, event_type, arguments | zihanzhu |
| ccks2020_fin_ee | 2022-06-01 | EventEntityExtraction | Done | The test set has no label. Current definition: text, event_type, event_entity | zihanzhu |
| ccks2021_fin_ea | 2022-06-01 | EventArgumentsExtraction | Done | The test set has no label. Current definition: text, event_type, arguments | zihanzhu |
| ccks2021_fin_re | 2022-06-01 | EventRelationExtractionCausality | Done | The test set has no label. Current definition: text, relation | zihanzhu |
| chip2019_qm | 2022-06-07 | TextPairClassification | Done | This dataset has only a training set. Current definition: question1, question2, category, label | zihanzhu |
| ckbqa | 2022-06-07 | QuestionAnsweringOpenDomain | Done | This dataset has training, validation and test set. (1)QuestionAnsweringOpenDomain: Current definition: question,answers. (2)TextToSql: Current definition: question,query. | zihanzhu |
| coqa | 2022-06-07 | QuestionAnsweringOpenDomain | Done | This dataset has training, validation and test set. (1)QuestionAnsweringOpenDomain: Current definition: question,answers. (2)TextToSql: Current definition: question,query. | zihanzhu |
| nlpcc2017_dbqa | 2022-06-07 | QuestionAnsweringClassification | Done | This dataset has training, validation and test set. Current definition: question, answer, label | zihanzhu |
| cmrc2019 | 2022-06-08 | ClozeMultipleChoice | Done | This dataset has training, validation and test set. Current definition: question_mark, context, options, answers | zihanzhu |
| dureader_zhidao | 2022-06-08 | QuestionAnsweringExtractive | Done | This dataset has a training set and a validation set. The test set has no label. Current definition: documents, answers, segmented_answers, fake_answers, answer_spans, question, segmented_question, question_type, fact_or_opinion, question_id, match_scores, answer_docs | zihanzhu |
| dureader_search | 2022-06-08 | QuestionAnsweringExtractive | Done | This dataset has a training set and a validation set. The test set has no label. Current definition: documents, answers, segmented_answers, fake_answers, answer_spans, question, segmented_question, question_type, fact_or_opinion, question_id, match_scores, answer_docs | zihanzhu |
| dureader_yesno | 2022-06-08 | QuestionAnsweringExtractive | Done | This dataset has a training set and a validation set. The test set has no label. Current definition: documents, question, answers | zihanzhu |
| children_fairy_tale | 2022-06-14 | ClozeDocuments | Done | This dataset has a training set and a test set. Current definition: documents, documents_tokens, question, question_tokens, answers | zihanzhu |
| people_daily_rc | 2022-06-14 | ClozeDocuments | Done | This dataset has a training set, a validation set and a test set. Current definition: documents, documents_tokens, question, question_tokens, answers | zihanzhu |
| matinf | 2022-06-14 | QuestionAnsweringClassification | Done | This dataset has a training set, a validation set and a test set. Current definition: text, label | zihanzhu |
| nlpec | 2022-06-14 | QuestionAnsweringMultipleChoiceNLPEC | Done | This dataset has only the training set. Current definition: question_type, question, question_s, options, options_s, context, context_s, answers | zihanzhu |
| cspider | 2022-06-15 | TexttoSQL | Done | This dataset has a training set and a validation set. Current definition: question, query, database_id | zihanzhu |
| dusql | 2022-06-15 | TexttoSQL | Done | This dataset has a training set and a validation set. Current definition: question, query, database_id | zihanzhu |
| nl2sql | 2022-06-15 | TexttoSQL | Done | This dataset has a training set and a validation set. Current definition: question, query, database_id | zihanzhu |
| duee | 2022-06-15 | EventArgumentsExtraction | Done | This dataset has the training, validation and test set. Current definition: text, event_type, trigger, trigger_start_index, arguments | zihanzhu |
| duie | 2022-06-15 | EntityRelationExtraction | Done | This dataset has the training, validation and test set. Current definition: text, relation | zihanzhu |
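
Several rows above note that `answers.text` and `answers.option_idx` are stored as String rather than List (for example openbookqa, commonsense_qa and winogrande), and the dialogsum and samsum rows give the DialogSummarization layout. The sketch below is illustrative only and is not the SDK's own code: the class name `DialogSummarizationExample`, the helpers `as_list` and `normalize_answers`, and the sample values are all hypothetical. It shows one way the dialogue layout could be typed and how String-valued answers might be wrapped into lists.

```python
from typing import Any, Dict, List, TypedDict


class DialogSummarizationExample(TypedDict):
    """Hypothetical type mirroring the DialogSummarization row in the table."""

    dialogue: Dict[str, List[str]]  # {"speaker": [...], "text": [...]}
    summary: List[str]


def as_list(value: Any) -> List[Any]:
    """Wrap a bare value in a list; return lists unchanged."""
    return value if isinstance(value, list) else [value]


def normalize_answers(answers: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Return a copy of `answers` whose `text` and `option_idx` are lists.

    Assumption: `answers` has exactly the two keys named in the table.
    """
    return {
        "text": as_list(answers["text"]),
        "option_idx": as_list(answers["option_idx"]),
    }


# Made-up sample values, for illustration only.
raw_answers = {"text": "the sun", "option_idx": "2"}
normalized = normalize_answers(raw_answers)

dialog: DialogSummarizationExample = {
    "dialogue": {"speaker": ["A", "B"], "text": ["Hi.", "Hello."]},
    "summary": ["A and B greet each other."],
}
```

Wrapping the values instead of changing their element type keeps the original strings intact, so a dataset that already stores lists passes through unchanged.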