Release v0.0.1 · Significant-Gravitas/Auto-GPT-Benchmarks

What's Changed

First commit for AutoGPT Benchmarks by @dschonholtz in #1
Typo in README.md by @ambujpawar in #2
Remove the submodule, reference OpenAI directly rather than running it on the command line, fix logging by @dschonholtz in #16
Update README.md by @dschonholtz in #17
Graphs for evals by @rihp in #20
windows docs make workspace if not there by @dschonholtz in #25
EvalNames with dates for the eval run filename and compatibility with 0.3.0 by @dschonholtz in #26
init first challenge template by @ScarletPan in #34
start fixtures, types, challenge creation, mock run (stable by @SilenNaihin in #37
Add automatic regression markers by @SilenNaihin in #38
MockManager, mock_func in data.json by @SilenNaihin in #39
addition of basic challenges, easier challenge creation, --mock flag, adding mini-agi by @SilenNaihin in #40
Update README.md by @SilenNaihin in #41
adding hook to integrate agnostically by @SilenNaihin in #42
Integrate one challenge to auto gpt by @merwanehamadi in #44
Add static linters ci by @merwanehamadi in #45
Run regression tests on push to master and stable by @merwanehamadi in #46
Integrate with gpt engineer by @merwanehamadi in #47
Integrate smol developer with agbenchmark by @merwanehamadi in #48
Explain how to benchmark new agents by @merwanehamadi in #49
local runs, home_path config, submodule miniagi by @SilenNaihin in #50
Add retrieval challenge test + run tests on CI pipeline by @merwanehamadi in #51
Add pr template by @merwanehamadi in #52
Add information retrieval 3 by @merwanehamadi in #54
Change test dependencies by @merwanehamadi in #55
dynamic workspace path by @SilenNaihin in #56
Add basic memory challenge by @merwanehamadi in #57
Rename '--reg' flag to '--maintain' by @merwanehamadi in #58
Add 'Remember multiple ids' memory challenge by @merwanehamadi in #59
added caching based on file key by @SilenNaihin in #62
Add 'remember ids with noise' challenge by @merwanehamadi in #61
Add 'remember phrases with noise' challenge by @merwanehamadi in #63
fix home_path, local mini-agi run works by @SilenNaihin in #64
Add 'Debug simple typo with guidance' challenge by @merwanehamadi in #65
Add "Debug code without guidance" challenge by @merwanehamadi in #66
Get rid of get file path by using the data.json convention to store the challenge information by @merwanehamadi in #67
Print out all of stdout on each process poll. by @erik-megarad in #69
Add .txt to memory challenges by @merwanehamadi in #70
Fix memory challenge 2 by @merwanehamadi in #71
Use artifacts out instead of python code by @merwanehamadi in #72
i/o workspace, adding superagi by @SilenNaihin in #60
fixing the incorrect addition of superagi by @SilenNaihin in #73
quality of life improvements & fixes by @SilenNaihin in #75
Fix debug code challenge by @merwanehamadi in #76
Add gpt engineer to ci by @merwanehamadi in #78
just json, no test files by @SilenNaihin in #77
Combine all agents into one ci.yml by @merwanehamadi in #79
adding search interface challenge and cleaning repo by @SilenNaihin in #80
Add Helicone by @merwanehamadi in #81
Add "Simple web server" challenge by @merwanehamadi in #74
added --test, consolidate files, reports working by @SilenNaihin in #83
Fix tests ci by @merwanehamadi in #82
All Agents log to helicone automatically by @merwanehamadi in #85
Fix Auto-GPT integration by adding python module as entrypoint by @merwanehamadi in #86
Fix Auto-GPT looping forever by @merwanehamadi in #87
Add custom properties to Helicone by @merwanehamadi in #91
Enable cache again by @merwanehamadi in #92
fixing backslashes, adding basic metrics by @SilenNaihin in #89
Fix Smol developer and gpt engineer by @merwanehamadi in #93
Remove dependencies cache by @merwanehamadi in #94
Remove dependencies if a specific test is asked by the user by @merwanehamadi in #95
Update submodules and upload artifacts by @merwanehamadi in #97
Add basic code generation challenge by @merwanehamadi in #98
Replace hidden files with custom python by @merwanehamadi in #99
Start showing benchmark results by @merwanehamadi in #100
Show Auto-GPT results by @merwanehamadi in #102
Display smol-developer-results by @merwanehamadi in #103
Display results per category by @merwanehamadi in #104
Update auto gpt to current version of master by @merwanehamadi in #105
Update Auto-GPT score by @merwanehamadi in #106
Clean up workspace between each test by @erik-megarad in #109
Add three sum challenge by @merwanehamadi in #108
Fix ci by @merwanehamadi in #110
Remove cache true on pr by @merwanehamadi in #111
Dynamic cutoff and other quality of life by @SilenNaihin in #101
Allow change location of reports by @merwanehamadi in #115
Fix cutoff errors by @merwanehamadi in #116
Fix pipes issue by @merwanehamadi in #117
Update reports when pushing to master by @merwanehamadi in #162
dynamic home path for runs by @SilenNaihin in #119
internal_info.json dynamic changes by @SilenNaihin in #163
file naming when --test by @SilenNaihin in #164
Use report location by @merwanehamadi in #165
fixing memory challenges, naming, testing mini-agi, smooth retrieval scaling by @SilenNaihin in #166
Push reports to google drive by @merwanehamadi in #167
Integrate Beebot by @merwanehamadi in #169
Change beebot submodule by @merwanehamadi in #170
Disable cache by @merwanehamadi in #174
Kill subprocesses when test ends by @erik-megarad in #172
Update beebot submodule by @merwanehamadi in #175
Update submodules by @merwanehamadi in #176
integrate baby-agi by @SilenNaihin in #168
Publish pypi package by @merwanehamadi in #179
Update publish_package.yml by @merwanehamadi in #180
Make spreadsheet dynamic by @merwanehamadi in #181
Update Helicone mitm to pin to a specific version by @merwanehamadi in #182
Update permission package by @merwanehamadi in #183
Change package version by @merwanehamadi in #184

New Contributors

@dschonholtz made their first contribution in #1
@ambujpawar made their first contribution in #2
@rihp made their first contribution in #20
@ScarletPan made their first contribution in #34
@SilenNaihin made their first contribution in #37
@erik-megarad made their first contribution in #69

Full Changelog: https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks/commits/v0.0.1
