Processing data
===============

You can chain the `filter-df` command after the `all-data` or `from-file` command to apply various kinds of processing to the data obtained through your query. For example, the following invocation gives you the data for the best run in each of the sweeps `5pfpcetn` and `abcd12345`, ranked by the metric `best_validation_fixed_f1`, where higher is better.

.. code-block:: console

   $ wandb-utils \
       -e username_or_team \
       -p project_name \
       all-data \
       --filters "{\"sweep\":{\"\$in\":[\"5pfpcetn\", \"abcd12345\"]}}" \
       filter-df -f sweep_name -f sweep -f run -f path -f test_fixed_f1 -f best_validation_fixed_f1 -i path \
       --query "df.sort_values('best_validation_fixed_f1', ascending=False).drop_duplicates(['sweep_name'])" \
       print

You can also do fairly complex things using `--pd-eval`, which uses the `pandas.eval` function. For instance, the following command performs four processing steps:

1. `--pd-eval "test_CMAP=rmax(df.test_MAP_max_n, df.test_MAP_min_n)"` creates a new column named `test_CMAP` by taking the max of two existing columns.
2. The second step creates a new column `_model` by extracting a string from the `tags` column.
3. The third step creates a new column `_dataset` in the same way.
4. The last step, `--pd-eval "df.groupby(['_model', '_dataset'], as_index=False).mean()"`, performs a groupby followed by taking the mean of each group.

.. code-block:: console

   $ wandb-utils -e USERNAME -p PROJECT \
       all-data \
       filter-df --pd-eval "test_CMAP=rmax(df.test_MAP_max_n, df.test_MAP_min_n)" \
       filter-df --pd-eval "_model=df.tags.str.extract(r'model@([^\|]+)',expand=False)" \
       filter-df --pd-eval "_dataset=df.tags.str.extract(r'dataset@([^\|]+)',expand=False)" \
       filter-df --pd-eval "df.groupby(['_model', '_dataset'], as_index=False).mean()" \
       filter-df -f test_MAP -f test_CMAP -f test_constraint_violation -f _model -f _dataset \
       print

There are even more general and powerful options, `--python-exec` and `--python-eval`, which process the dataframe using Python's built-in `exec()` and `eval()` functions, respectively, and therefore allow executing arbitrary Python code.
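
Before passing an expression to `--query`, `--pd-eval` or `--python-eval`, it can help to prototype it in plain pandas. The following is a minimal sketch of the two transformations shown above, run on toy dataframes. The data values, the model and dataset names, the helper names `best_run_per_sweep` and `summarize`, and the pipe-separated format of `tags` are all made up for illustration. The row-wise maximum is only a stand-in for `rmax`, on the assumption that this is what it computes; none of this is the tool's own implementation.

```python
import pandas as pd

# Toy stand-in for the runs returned by `all-data` (values are made up).
runs = pd.DataFrame(
    {
        "sweep_name": ["sweep_a", "sweep_a", "sweep_b", "sweep_b"],
        "run": ["r1", "r2", "r3", "r4"],
        "best_validation_fixed_f1": [0.71, 0.78, 0.65, 0.60],
        "test_fixed_f1": [0.70, 0.75, 0.66, 0.58],
    }
)


def best_run_per_sweep(df, metric="best_validation_fixed_f1"):
    """Keep the row with the highest `metric` for each sweep (hypothetical helper)."""
    # Sorting in descending order puts the best run of every sweep first;
    # drop_duplicates then keeps only that first row per sweep_name.
    return df.sort_values(metric, ascending=False).drop_duplicates(["sweep_name"])


# Toy dataframe with a pipe-separated `tags` string (an assumed format).
tagged = pd.DataFrame(
    {
        "tags": [
            "model@dvn|dataset@bibtex",
            "model@dvn|dataset@bibtex",
            "model@mlp|dataset@bibtex",
        ],
        "test_MAP_max_n": [0.5, 0.7, 0.4],
        "test_MAP_min_n": [0.6, 0.3, 0.2],
    }
)


def summarize(df):
    """Mirror the four --pd-eval steps in plain pandas (hypothetical helper)."""
    out = df.copy()
    # Assumption: rmax(...) is a row-wise maximum of the two columns.
    out["test_CMAP"] = out[["test_MAP_max_n", "test_MAP_min_n"]].max(axis=1)
    # Pull the text after "model@" / "dataset@" up to the next "|".
    out["_model"] = out["tags"].str.extract(r"model@([^\|]+)", expand=False)
    out["_dataset"] = out["tags"].str.extract(r"dataset@([^\|]+)", expand=False)
    # numeric_only=True skips the string `tags` column when averaging.
    return out.groupby(["_model", "_dataset"], as_index=False).mean(numeric_only=True)


best = best_run_per_sweep(runs)
summary = summarize(tagged)

if __name__ == "__main__":
    print(best)
    print(summary)
```

Once an expression behaves as expected on a small dataframe like this, it can be pasted into the corresponding command-line option.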