Word Salad — data

Word Salad: Relating Food Prices and Descriptions
Victor Chahuneau, Kevin Gimpel, Bryan R. Routledge, Lily Scherlis, and Noah A. Smith

The restaurant menus dataset is divided into three parts for running the experiments:

Training data = 7,401 restaurants / 733,360 items: train.json.gz
Development data = 934 restaurants / 90,917 items: dev.json.gz
Evaluation data = 944 restaurants / 91,697 items: test.json.gz

Python code to train a baseline model is available on GitHub.

Each line of these files is one JSON object representing a single restaurant menu, with the following schema (angle brackets denote value types):


{
    "id": <str>, 
    "name": <str>,
    "city": <str>,
    "address": <str>,
    "neighborhoods": [<str>],
    "categories": [<str>],
    "avg_rating": <float>,
    "latitude": <float>,
    "longitude": <float>,
    "info": {<str>: <str>},
    "items": [{
        "name": <str>,
        "description": <str>,
        "price": <float>
    }]
}
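
As a minimal sketch of how these files might be read, the Python snippet below streams one menu per line from a gzipped file and counts restaurants and items. It is not the baseline code mentioned above. The helper names iter_menus, menu_items, and count_items are hypothetical, and the snippet assumes the files are UTF-8 encoded and that "items" holds either a list of item objects or a single item object.

```python
import gzip
import json


def iter_menus(path):
    """Yield one restaurant menu (a dict) per non-empty line of a gzipped file."""
    # Assumption: the files are UTF-8 encoded text, one JSON object per line.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def menu_items(menu):
    """Return the menu's items as a list of dicts."""
    # Assumption: "items" is a list of item objects; a single object is
    # wrapped in a list so callers can always iterate.
    items = menu.get("items", [])
    if isinstance(items, dict):
        return [items]
    return list(items)


def count_items(path):
    """Return (number of restaurants, number of items) for one data file."""
    n_restaurants = 0
    n_items = 0
    for menu in iter_menus(path):
        n_restaurants += 1
        n_items += len(menu_items(menu))
    return n_restaurants, n_items


# Example usage (requires the file to be present locally):
# restaurants, items = count_items("dev.json.gz")
# print(restaurants, items)
```

Streaming line by line avoids loading a whole file into memory, which may matter for the training file in particular.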