Meeting Notes - arc-community/arc GitHub Wiki
arc meetup Oct 05
(minuted by @asteriskCat)
andreaskoepf: Working hard for 1 month trying to complete project - memetic search plus gradient. Memetic - random sample, gradient, cross/mutate, neural network as proxy for transforms so it's differentiable. Hold network constant after training. Looking for team members.
Andreyz4k: Some progress, but nothing to report right now.
asteriskCat:
Looked at implementations of optimal transport loss function metrics
Wasserstein, Sinkhorn, Monge-Kantorovich, etc., are related optimal transport loss functions that give a smooth gradient even if the proposed image from the ARC solver is close, but offset so nothing lines up.
WGAN uses some tricks to avoid directly computing optimal transport, and so is unusable for our purposes.
The GeomLoss package ("Geometric Loss functions between sampled measures, images and volumes") has several appropriate types of loss, and demos of using them for various purposes, including computing the distance between color photos. Because of our "color is a 1-hot attribute" property, some tweaking will be needed, but it shouldn't be bad: I think it might amount to breaking the algorithm into pieces, running the parts once for each color used, then assembling the pieces for all color combinations, if I can find a kernel or dynamic programming solution. If push comes to shove, the color image distances could be used "as is," but they wouldn't give accurate results if the color palettes differed between the proposed solution and the target. GeomLoss is designed to be used with the PyTorch framework.
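A minimal sketch of the "once per color" idea, for illustration only. It does not use GeomLoss: it is a plain entropy-regularized Sinkhorn iteration in NumPy, run on the one-hot mask of each color. The function names, `eps`, the iteration count, and the flat penalty for a color that appears on only one board are all assumptions, and the dense kernel is only practical for small boards (larger ones would need a log-domain or kernelized version).

```python
import numpy as np

def grid_cost_matrix(h, w):
    """Squared Euclidean distance between every pair of cells on an h x w board."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sinkhorn_cost(a, b, cost, eps=0.5, n_iters=200):
    """Transport cost of the entropy-regularized plan between histograms a and b."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost))

def per_color_ot(pred, target, n_colors=10, eps=0.5):
    """Sum of per-color Sinkhorn costs between two same-shape boards."""
    pred, target = np.asarray(pred), np.asarray(target)
    cost = grid_cost_matrix(*target.shape)
    total = 0.0
    for color in range(n_colors):
        p = (pred == color).ravel().astype(float)
        t = (target == color).ravel().astype(float)
        if p.sum() == 0 and t.sum() == 0:
            continue  # color unused on both boards
        if p.sum() == 0 or t.sum() == 0:
            total += max(p.sum(), t.sum())  # hypothetical palette-mismatch penalty
            continue
        total += sinkhorn_cost(p / p.sum(), t / t.sum(), cost, eps)
    return total
```

Unlike per-pixel cross-entropy, this cost grows with how far an object is displaced, so a prediction that is "close, but offset" scores better than one that is far off.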
futureisold: Working on gym. Endeavoring to use unique combinations based on group arguments to minimize training time. Discussion of example solutions vs riddle solutions. One board at a time uses less memory, but there are instances where multiple examples are needed to resolve ambiguities. Memory riddles require all the boards, too. This is a good place to start.
jammy3417: Implemented an infinite data generator and made it available. Interested in handling OOD using method of OBJECT-CENTRIC COMPOSITIONAL IMAGINATION FOR VISUAL ABSTRACT REASONING https://openreview.net/forum?id=rCzfIruU5x5 to improve generalization.
parapraxis: Working with theory that if system is trained on simple tasks, a language model (here, longT5 on custom riddle generator) will generalize on tasks that need to compose them. Experiments showed this wasn't completely true, although there was some evidence of compositional generalization. A mix of simple and composed tasks looks best. Discussion w AK re: equivariance for object detection.
XMaster96: Issues with getting old checkpoints giving the same good results as earlier.
Mohamed Osman: Continuing working on his method after vacation, including the "new riddle" of 20220921.
Yannic Kilcher: Implemented a DSL, ZickZack approved. Hierarchical scene decomposition into typed objects (in the scene composition sense, but also classes) with properties. Typed transforms are classes and also have properties. DSL is hierarchical. Once object properties are defined, the object can be rendered. Can also yield a fitness score, and will need to parse. Priors (e.g., gravity) will have properties as well (e.g., gravity direction). Parsing consists of broadcasting a request to all modules that could potentially recognize pieces of object. Implemented very simple DSL of lines and translations, parsing with heuristics. Combinatorics are limited by requirements for type matching - any renderable object can be translated.
Discussion of reversible functions (which have limitations, such as noise is not reversible - you can't determine where removed noise was once it's removed).
Discussion of how to resolve ambiguities - fit score and Occam's razor. AK: randomly select 20 or so riddles, construct graphs for representing them. YK: This is the next step I intend to do. AK: "Waterfall" models - look at them as an instance for a relatively complicated graph. YK: It may not be able to solve a lot of problems, but it should generate good riddles, more human aligned. It should work at an object level rather than pixel level. AK: It's important that the riddles contain enough information to solve it. aC: That's about the same as solving ARC challenge. MO: You need to be able to incorporate priors. The rotation problem with 2 examples almost worked, but didn't have enough information w/o priors to solve. YK: That's what the current year problem is - how to inject arbitrary priors into systems. AK: Your line looks ambiguous. YK: It's definitely ambiguous, but the pixel level complexity is high. Ambiguity is resolved by fit and complexity scores. More elements, width or depth is more complex. Will use heuristics a lot of the time. aC: Are you thinking like microservices or some similar dispatching? YK: Exactly, some parsing will be complex. "Can anything in this scene give me a direction for gravity?" If you can't find a property, the system defers to another round until someone can fill it in.
YK: Scene composition has to work while still looking like the same graph. AK: What about 2 objects v 1 object? YK: There would be 2 graphs, but they would be similar. It constructs objects with defined properties, but some of those properties might be graphs.
Generating a graph from the input board, and then looking at similarities between the graphs and subgraphs, should give you clues as to what you need to pay attention to. Similarly the output. Self-similarity (find all the stars) is also something that should be extracted as well. Once one graph is parsed, look for similar things in the other boards.
The parsing should be unique, but the chunking of pieces of graphs is not.
gabbo: Has anyone tried metalearning?
MO: Metalearning has been discussed many times, it's a good, if not exact, fit.
gabbo: AK, you talked about splitting into groups.
AK: Described memetic system sketch w onenote. The magic is that gradient can be used for search.
Open discussion.
arc meetup Sep 28
(minuted by @asteriskCat)
ZickZack: Looking into fuzzy logic. It's mostly about parsing. Transformations can be weakened. Fuzzy logic operates differently from a Bayesian system (BS). A BS has regions where things are thought to be relevant, and you can move the search to other places. Fuzzy logic builds hard constraints to limit the search. The two kinds of algorithm work very differently; fuzzy logic works better for control.
parapraxis: Transformers are consistently solving some types of problems. Illustrated some examples in stream. Some discussion of loss functions. Noise augmentation, any example can also be used as a test, etc.
Jammy: Trying to train models w infinite data.
gabbo: New. 1 year ago tried to reproduce neural reasoning paper, not much success. Some discussion of paper, importance of autoencoder.
asteriskCat:
- Looking at fuzzed-up testing grammar generation as a form of compression: some research I've been doing into using fuzzified testing as a generator. See [The Fuzzing Book](https://www.fuzzingbook.org/), where such grammars are used to generate valid tests, and also [uds-se/fuzzingbook: Project page for "The Fuzzing Book"](https://github.com/uds-se/fuzzingbook). They show how to use fuzzing from grammars to generate valid tests, and how to add constraints (such as ports must be in a certain numerical range) for those things a grammar cannot generate.
- Number-wall representations of sequences: this approach uses the number wall, essentially a very efficient way of computing multiple determinants by reusing the determinant components. It will generate any linearly iterated sequence, including Fibonacci, modulo, polynomial, and rational. If the initial elements of the sequence are integers, I think we always wind up with integers as well, both in the tableau and in the output sequence. https://www.youtube.com/watch?v=NO1_-qptr6c
- Sparse matrix representations for boards as compression. Assume the most common tile is "0" (and store the color translation), then store the other data as a sparse matrix using common scipy etc. utilities (block sparse row matrix).
- Collective tilings as compression https://arxiv.org/pdf/1902.02861.pdf Compression/tilings/2019 - Discovering Descriptive Tile Trees.pdf (1D only)
- Better loss functions: cross-entropy on a per-pixel basis is not ideal, since it can't give information when the output is close but not matching. Alternatives are Wasserstein/Sinkhorn and others, e.g. the sum of the k peaks of the cross-correlation with a discount factor $\delta^k$ (which implies the data are Gaussianized for accuracy), the Costanza (absolute value) version of the same, etc.
- Representation of geometry of boards by sequences. It seems a reasonable representation is as follows:
- Board x dimension
- Board y dimension
- Sequence of data, scanned per some fixed protocol (raster (toroid), snake, space-filling, etc.)
- If there are too many characters in the sequence for the board space, truncate
- If there are too few characters in the sequence for the board space, repeat the last cell contents. These last two rules fix the problem found in many generative systems where it is possible to generate bad data. Note that having x, y and length(seq) is only slightly redundant, but allows fixing bad generated sequences, and getting a score for them that yields feedback for weight updates or other learning.
Notice that we probably want the board size and sequences generated differently, since the board alphabet is very much smaller (10) than the maximum board size (90).
- Geometry-aware transformers
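The board-as-sequence representation above (x dimension, y dimension, scanned data, truncate if too long, repeat the last cell if too short) can be sketched as follows. The function name, the `scan` parameter, the fill color used for an empty sequence, and returning the length error as a feedback signal are assumptions for illustration.

```python
def decode_board(width, height, seq, scan="raster", fill=0):
    """Rebuild a board from (width, height, seq), repairing bad sequence lengths.

    Returns (rows, length_error), where length_error = len(seq) - width * height
    can serve as a penalty signal for a generator that emitted a bad length.
    """
    n_cells = width * height
    length_error = len(seq) - n_cells
    cells = list(seq[:n_cells])                      # too many symbols: truncate
    if not cells:
        cells = [fill]                               # assumption: empty input -> fill color
    cells += [cells[-1]] * (n_cells - len(cells))    # too few: repeat last cell
    rows = [cells[r * width:(r + 1) * width] for r in range(height)]
    if scan == "snake":
        # Snake scan: every other row was written right-to-left.
        rows = [row[::-1] if r % 2 else row for r, row in enumerate(rows)]
    return rows, length_error
```

Every generated triple decodes to a valid board, so a bad sequence still yields a board that can be scored rather than being rejected outright.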
Andreyz4k: DreamCoder work. AK: Does DC have a mechanism that prunes out junk to make a more efficient transformation set? A4k: There is a complexity penalty. AK: How do you seed such a system? A4k: On each epoch, you solve several puzzles, then you try to solve the entire corpus of tasks. There is a timeout on puzzles (20 sec or so).
andreaskoepf: UViM - inspired approach using oracle and guide code. Description of approach using whiteboard illustrations.
- a model used to learn a codebook representation of a transformation from riddle examples
- a model that given an input and a codebook representation gives you the correct output corresponding to that transformation
- a model that is able to chain codebook representations in order to produce more complex compositions of transformations (this is essentially a search model over compositions of transformations but in the space of codebooks). Matteo: Could do UVIM training using Kaggle transformation tags. ViT completely generalized to the task of "copy symbol, and color according to the small square."
0x000FF4: Working on automata. Now working on approximations on sphere. (reading the book: Approximations of Harmonic functions on spheres and balls)
arc meetup Sep 21
- andreaskoepf: wrote variation-generators for two riddles from the public ARC evaluation set (code in prototyping/riddle_script in arc-research repo), conducted ViT "inpainting" experiments: ViT can solve unseen variants of e_009d5c81 after training on ~1k examples when the model was pretrained on a synthetic rigid-plus dataset.
- Andreyz4k: spent the last two weeks working on his DreamCoder extensions, reported on the approach he took ~1y ago with a manual DSL
- asteriskCat: Reviewed image saccade entropy measures. Found subseq sequence estimators; public repos for GPT and subseq (wavelets + Burrows-Wheeler transform compression) are available.
- futureisold: conceptual work, planning an RL approach, wants to create an OpenAI gym env for ARC, e.g. with actions to select and set pixels; reward might be based on correct pixels, maybe MuZero as model for agent; suggested adding existing notebooks (e.g. the analysis done by pa) to the arc-research repo in a new folder, he will take care of this.
- Mohamed Osman: worked on solving a simple 3x3 rotation riddle with only two training examples by fine-tuning a model with batch size 1 with augmentations, contrastive loss (to prevent memorization of outputs), high spectral norm (was helpful) and several tricks; now tries to solve the e_009d5c81 riddle, also with very few training examples; explained reversible test-time augmentation
- pa: thought about riddle generators: suggests creating basic riddles as demonstrations for the priors described by Chollet, which would be more feasible than creating variation-generators for all ARC riddles, e.g. teach basic concepts to models (e.g. counting).
- jammy3417: worked on a variation generator for the "connected compartments dungeon" to see how long it takes to write a generator for a complex riddle, will probably continue with simpler riddles
- parapraxis: (showed pa's ARC data analysis notebook), trained a discriminator to classify synthetic vs original riddles, which allows ranking/classifying synthetic riddles; he selected the ~400 more human-like ones from the synthetic set, removed copy-augmentation from his training dataset; his current best model solves 5 items from the held-out test set: can do color and object counting, seems to get basic concepts in multiple other riddles but fails to fully solve them; currently using LongT5 (base model ~200M params) (good: can ingest all of the items, no filtering on size required), also tried LongFormer (~240M params), e.g. for the synth/real discriminator.
- XMaster96: trained variant of Magma without adapters from random init on large synthetic dataset: solves 23 training, 5 test riddles exactly; some model overview: grids->image, CNN image encoder, 144 token-embeddings -> fed to GPT-2 language model -> decode output autoregressively; next week: training improved model (bi-directional embedding on "tokens" coming from model, different tokenizer, bug fixed)
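One way to read the reversible test-time augmentation that Mohamed Osman explained: transform the input with an invertible augmentation, run the model, undo the augmentation on the prediction, and vote across views. The sketch below uses the four 90-degree rotations and a per-pixel majority vote; `model` is a hypothetical callable that returns a board of the same shape as its input, and nothing here is taken from his actual code.

```python
import numpy as np

def tta_predict(model, board, n_colors=10):
    """Majority vote over predictions made on rotated views of the board."""
    board = np.asarray(board)
    votes = []
    for k in range(4):
        pred = model(np.rot90(board, k))      # predict in the augmented frame
        votes.append(np.rot90(pred, -k))      # map the prediction back
    stacked = np.stack(votes)                 # shape (4, H, W)
    counts = np.stack([(stacked == c).sum(axis=0) for c in range(n_colors)])
    return counts.argmax(axis=0)              # most frequent color per pixel
```

This only makes sense for riddles whose transformation commutes with the chosen augmentation; for a riddle that depends on absolute orientation, the rotated views would vote for the wrong answer.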
arc meetup Sep 07
Reports
andreaskoepf: Vision transformer test on simple problems seems to work well; vision transformer probably won't work well for complex problems, but it's worth seeing what it can do. Presents all the boards at the same time, every board is 1 token, colors are compressed to 3 channels (RGB encoded).
Weight sharing for each layer
Weights and biases, posted soon
Partially inspired by Visual Prompting via Image Inpainting.
10 x 10 boards.
Andreyz4k: Working on a (mostly) Julia approach, implementing a Julia version of the original OCaml approach. Transforming tasks with DreamCoder; more focused on DreamCoder than fleshing out transforms. Talked to DreamCoder author Karen. AK suggested approaching them about a paper.
asteriskCat: Working through papers of Ricardo Menezes Campello de Souza. I intend to use this to construct finite field transformations for similarity and oracular metrics.
1971 - The Fast Fourier Transform in a FInite Field.pdf
1998 - The Hartley Transform in a Finite Field.pdf
1998 - Trigonometry in Finite Fields and a New Hartley Transform.pdf
2000 - The Complex Finite Field Hartley Transform.pdf
2001 - Hartley number theoretic transforms.pdf
2001 - On Fast Finite Field Hartley Transform Algorithms.pdf
2004 - The Discrete Cosine Transform over Prime Finite Fields.pdf
2005 - Infinite Sequences, Series Convergence and the Discrete Time Fourier Transform over Finite Fields.pdf
2010 - The finite field fractional Fourier transform.pdf
2015 - Fast Finite Field Hartley Transforms Based on Hadamard Decomposition.pdf
2015 - Fourier Codes and Hartley Codes.pdf
2015 - Fourier Codes.pdf
2015 - Introducing an Analysis in Finite Fields.pdf
2015 - The Z Transform over Finite Fields.pdf
futureisold: Putting together ideas to discuss with group members. Mapping between ARC and the similar context of rate control for bit compression. The original paper had 2 agents; this year's paper avoids the 2 players, but it doesn't really correspond to ARC. Goal is self-competition on boards that are matched. Working on larger boards (30 x 30). Basic idea is a reward system on the pixel that it changes, hoping it will learn rules and be invariant to color, hopefully for at least 3 transformations. Thinking of this as a game, at first we just randomly change a pixel. The most challenging part is making the environment. Hopeful that with a depth of 3 and 200 examples it should find rules like MuZero. Will write up for discussion.
jammy3417: Working on training a differentiable cellular automaton (CA). It's only training from existing examples. It needs to figure out the rules implicitly. For problems that need to run a lot of steps, it has difficulty learning. Maybe algorithmic regularization would work better than functional regularization. Using a neighborhood of 3 x 3. If anyone has neurosymbolic methods or grokking experience, help would be appreciated. It was recommended that spectral/bi-Lipschitz regularization be tried. Described a sparse global problem and how it could be solved, followed by discussion about approaches to solving it. It was suggested that it might do better somehow knowing about multiple boards, and asked whether knowledge from other puzzles would help.
futureisold suggested using an attention block such as the one used in Stable Diffusion to get a better global response: https://github.com/CompVis/stable-diffusion/blob/ce05de28194041e030ccfc70c635fe3707cdfc30/ldm/modules/diffusionmodules/model.py#L150
Mohamed Osman: Video-based model. Spectral diffusion didn't work. Intra-riddle learner - using video transformer, working towards metalearning. Wants to try some other ideas using least amount of data and finding best regularization. No autoregressive decoding yet, but may try this later. Focusing on pure transformations. Has board processor, and interframe (communication between boards) attention as inductive biases. It seems video models are useful. The board transformer is in the transformer. Questioning whether an autoencoder (AE) is a good idea - it optimizes for compression, which may not generate useful latents, and the boards are different every time. The representations might be something the AE can't represent.
XMaster96:  Generated 20M synthetic puzzles of different depth.  Pretrained MAGMA models, multimodal, with the idea that it will already understand a lot about boards, languages, etc.  Can't give architectural specifics for commercial models, of course.
Relatively poor performance at this time; a frequently occurring failure mode is that the model issues the stop token after 2 pixels or so.
AK: Most of these puzzles are no longer human solvable (use smear, etc.). 10 x 10 boards, constant size, constant token number, etc., trying to reduce complexity as much as possible, then will try increasing complexity until it fails and try to figure out how to patch. Wants to try curriculum approach.
XM: The MAGMA model should be at least as good as icecuber.  The MAGMA adapters are where all the information is stored, and are frozen, so wants to try unfreezing last 5-6 layers.  Code base is hard to work with.  Wants to train all parameters, unchaining trained model from language model.  Don't have compute for 10B parameters available very often.  Several issues with configuration in experiments uncovered.  Now feels a lot more data is needed; trying to stay below 1 epoch.  Estimated 200M riddles needed, trying to use their usual batch size.  Trained w Pipeline Parallelism, which trains different pieces of the network on different GPUs (a PyTorch implementation is at https://pytorch.org/docs/stable/pipeline.html)
This is how the number of required examples was estimated; perhaps 20 days of compute time will be needed.
0x000FF4: Asking about others working on similar approaches.
Open discussion
AK: Ideas on how to set up a proper experiment for seeing results of curriculum training. Depth of 2 failed, fell back to a very restricted transform set. Rigid data set, Rigid+color data set worked @ 97% accuracy, increasing the number of transform types and depth.
XM: Adapters are now improving faster than luminaire space models. Very early on training.
(minuted by @asteriskCat)
arc meetup aug 31
parapraxis: more experiments, progress with BART, has a system ready to submit to Kaggle to test on the private test set, tested with BART-large and BART-base. Tried several things to get the correct sequence length, e.g. predicting grid size. Discovered the LongT5 model, which allows longer input sequences of up to 16k tokens with relative pos embeddings, local and global attention. Tried different combinations of models - large model ~800M params (comparable to BART-large). LongT5 results are better than BART: per-token accuracy BART: 35%, LongT5: 58%. Presents his result table. Using trained model output as additional input to a newly trained model -> new model augmentation. Training code for LongT5 works with many other models from huggingface. Maybe try next: auto-regressive model that combines answers of earlier models, basically a form of error correction.
Mohamed Osman: began implementation of a hypernet model for some experiment. Lipschitz regularization method looked promising. Goal: learn transformation instead of memorization. Spectral regularization -> NAR paper read again: meta-learner produces instructions + transformer does single board transformation. Tried to implement the transformer model of the NAR paper. Plans to read the DNC paper. Working further on a NAR. "Weight updates in the eval loop are an important part"
dmiles/logicmoo: plans to create a video of the workflow in his Prolog approach
asteriskCat: looking into transfer entropy; Reed-Solomon coding seems interesting; looking at similarity of input-output boards, based on transformations seen in the training examples.
Andreyz4k: has ~2Y experience on ARC. Replicating how a human solves the tasks: bi-directional program search with some pattern matching. Does not want to write DSL functions by hand, therefore looking at and extending DreamCoder, rewriting it in Julia. Improving heuristics for search in DreamCoder. Expert on DreamCoder.
0x000ff4: Ran into a problem with auto-encoder function discrimination.
pa: read a lot about spectral regularization (harder form of weight decay); the memory aspect might be less important (intuition from reading other papers like the differentiable neural computer paper). NAR paper lacks ablation experiments: e.g. how well would it work without memory, without spectral regularization, etc. NAR has many moving parts. Idea: Start smaller and build model from the model. Together with Avelina: BERT is now done in TF, now training, testing this week. Added to arc-mentation: Noise augmentations, still needs to be merged. Proposes merging arc-mentation into arc (not to use a sub-repo). May work on other riddles.
Yannic: Working on high-level DSL.
arc meetup aug 3
- andreaskoepf
- First version of perceiver setup, 80% accuracy on multi label classification
- Look into different positional encodings
 
- Asterisk cat
- Reviewed papers on neural turing machines and group theory for kernel methods
- Z order scan might be interesting for locality preserving hashing
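For reference, a Z-order (Morton) scan interleaves the bits of the x and y coordinates, so cells that are close on the board tend to be close in the 1-D sequence. A small sketch, with the function names and the choice of 5 bits per coordinate (enough for boards up to 30 x 30) as assumptions:

```python
def z_order_index(x, y, bits=5):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    idx = 0
    for i in range(bits):
        idx |= ((x >> i) & 1) << (2 * i)
        idx |= ((y >> i) & 1) << (2 * i + 1)
    return idx

def z_order_scan(board):
    """Flatten a board (list of rows) in Z-order instead of raster order."""
    height, width = len(board), len(board[0])
    coords = sorted(
        (z_order_index(x, y), x, y) for y in range(height) for x in range(width)
    )
    return [board[y][x] for _, x, y in coords]
```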
 
- Futurisold
- Looked into muzero
- Specifically paper on video compression, Agent self-competes. Muzero general
- Framed arc into gym env
- For now same-shape pairs
- Action: switch pixels
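A rough sketch of what such an environment could look like for same-shape pairs. It follows the gym `reset`/`step` convention but does not import gym; the class name, the `(row, col, color)` action format, and the reward (change in the number of correct pixels) are assumptions, not futurisold's design.

```python
import numpy as np

class ArcPixelEnv:
    """Gym-style environment for one same-shape input/target pair."""

    def __init__(self, input_board, target_board):
        self.input_board = np.asarray(input_board)
        self.target = np.asarray(target_board)
        if self.input_board.shape != self.target.shape:
            raise ValueError("this sketch only handles same-shape pairs")
        self.board = self.input_board.copy()

    def reset(self):
        """Start a new episode from the riddle's input board."""
        self.board = self.input_board.copy()
        return self.board.copy()

    def step(self, action):
        """Set one pixel; reward is the change in correctly colored pixels."""
        row, col, color = action
        before = int((self.board == self.target).sum())
        self.board[row, col] = color
        after = int((self.board == self.target).sum())
        done = after == self.target.size
        return self.board.copy(), after - before, bool(done), {}
```

Usage: `env = ArcPixelEnv(x, y); obs = env.reset(); obs, reward, done, info = env.step((0, 1, 3))`.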
 
- Matteo
- Worked on data augmentation system everyone can use
- What is the extreme Version of data augmentation? Apply the full DSL Operations
- Evaluation on Andreas' synthetic sets with augmentation and got about 15% correct with a ViT based on a 3D convnet
- 30x30 output with extra padding class
- Cost function seems to be important
 
- Mohammed
- Arcmentation library
- Torch vision style, extendable
- PA
- Wants to replicate dream coder or neural abstract reasoner
 
- Parapraxis
- Experimenting with LMs e.g. BART
- Accuracy declines non linearly with length of sequence compared
- Explored batch sizes, best batch size is 1
- used filtered dataset of samples that fit into 1024 characters, results in 290ish examples for training, 50ish for validation
- reaches eval accuracies between 14%-50% depending on split
- accuracy a bit cheated, because limits output to length of correct answer
 
- Jammy
- Coded cellular automata on samples that are suitable
- Maps local neighborhood (3x3) to value in center, implement as convnet, recurrently applied, Gumbel softmax sampling
- Single step automata work well, but multiple steps harder
- Curriculum learning building up steps seems to help
- Regularization via L1
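To illustrate the structure only (a local 3x3 rule mapped to the center cell and applied recurrently), here is a non-learned sketch in NumPy. Jammy's version learns the rule as a convnet with Gumbel-softmax sampling; the hand-written `grow_rule` below is a hypothetical stand-in for that learned rule, and zero padding at the border is an assumption.

```python
import numpy as np

def ca_step(board, rule):
    """Apply a 3x3-neighborhood rule to every cell of the board once."""
    height, width = board.shape
    padded = np.pad(board, 1, constant_values=0)   # background outside the board
    out = np.empty_like(board)
    for r in range(height):
        for c in range(width):
            out[r, c] = rule(padded[r:r + 3, c:c + 3])
    return out

def run_ca(board, rule, n_steps):
    """Apply the same local rule recurrently for n_steps."""
    for _ in range(n_steps):
        board = ca_step(board, rule)
    return board

def grow_rule(patch):
    """Example rule: a background cell takes the color of a colored neighbor."""
    center = patch[1, 1]
    if center != 0:
        return center
    colored = patch[patch != 0]
    return colored.max() if colored.size else 0
```

A riddle that needs many steps corresponds to a large `n_steps`, which is where the notes report that learning gets harder.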
 
arc meetup july 27
- andreaskoepf
- perceiver based approach, on vacation
 
- asteriskCat
- http://www.mattmahoney.net/dc/text.html
- compression benchmarking problem (hutter prize: 500k euros to compress wikipedia)
 

- X_i and Y_i are input and output board i for riddle
- X_query is also an input board, Y_answer the network’s prediction
- process description specifies how to transform input into output
- backprop loss through process description extraction to learn process description
- BeatriceBernardo
- found a bug: https://volotat.github.io/ARC-Game/?task=evaluation%2F4852f2fa.json
- example ID: 4852F2FA
- doesn’t seem to work with correct solution
- the online game doesn’t seem to have the correct answer
- the raw data seems to have the correct answer
 
- Matteo
- converted ice cuber solution to python to visualize
 
- Mohamed Osman
- working on hyperlearner
- hyperlearner: cross-validation, split dataset, cross-validate
- here dataset means a single riddle, and the samples are the data
 
- creating hyperparameter-tuners that choose augmentations, learning rates, etc. that improve cross-validation metrics
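A toy sketch of cross-validating within a single riddle: each example pair is held out in turn, a candidate learner is fitted on the remaining pairs, and candidates are ranked by how often they reproduce the held-out output. The two candidate learners (`fit_identity`, `fit_color_map`) are hypothetical placeholders for real choices such as augmentations or learning rates.

```python
def leave_one_out(pairs):
    """Yield (train_pairs, held_out_pair) splits of one riddle's examples."""
    for i in range(len(pairs)):
        yield pairs[:i] + pairs[i + 1:], pairs[i]

def fit_identity(train_pairs):
    """Baseline learner: predict the input unchanged."""
    return lambda board: [row[:] for row in board]

def fit_color_map(train_pairs):
    """Learn a per-color substitution from aligned input/output cells."""
    mapping = {}
    for inp, out in train_pairs:
        for row_in, row_out in zip(inp, out):
            for a, b in zip(row_in, row_out):
                mapping[a] = b
    return lambda board: [[mapping.get(v, v) for v in row] for row in board]

def cv_score(fit, pairs):
    """Fraction of held-out examples solved exactly."""
    hits = sum(fit(train)(x) == y for train, (x, y) in leave_one_out(pairs))
    return hits / len(pairs)

def select_learner(candidates, pairs):
    """Pick the candidate name with the best cross-validation score."""
    return max(candidates, key=lambda name: cv_score(candidates[name], pairs))
```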
 
- parapraxis
- took andreas’ dataset of simple transformations and fine-tuned BART & T5 models
- added simple data augmentations
- results: not great, best 2/1000 completely correct
- learned more about representing boards as strings
- number symbols seem to be better than letters, maybe due to tokenizers
 
- copying a grid task did not work well either
- looked at the correctness of the first N characters
- big difference between low and high N (low N ≤ 20 very good, high N bad)
 
- wants to look into connection of board representation to language
- also tried full text descriptions where a human describes the grid, works almost, but not entirely
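The string representation and the "correctness of the first N characters" measurement could look roughly like this. The row separator, the optional cell separator, and counting missing characters as wrong are assumptions; parapraxis' actual format is not given in the notes.

```python
def board_to_text(board, sep=""):
    """Serialize a board as digit characters, one line per row."""
    return "\n".join(sep.join(str(v) for v in row) for row in board)

def prefix_accuracy(predicted, expected, n):
    """Fraction of the first n expected characters that the prediction matches.

    Characters missing from a too-short prediction count as wrong.
    """
    n = min(n, len(expected))
    if n == 0:
        return 1.0
    hits = sum(1 for p, e in zip(predicted[:n], expected[:n]) if p == e)
    return hits / n
```

Plotting `prefix_accuracy` against `n` is one way to see the reported gap between low N (up to about 20) and high N.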
 
- futurisold
- working on POET
 
- Jammy3417
- working on integrating Cellular Automata into DSL
 
- Open Discussion
- Neural Turing Machines
- are they an option?
- https://arxiv.org/pdf/1807.03819.pdf maybe related
- why have they gone out of fashion
 
- MuZero
- discovers “rules” of game itself
- what should be “one move”?
 
- GFlowNets
- distributional matching over graphs / discrete operations
- could be used to generate outputs? dsl applications?
 
- Ideas for data generation
- just sampling ice cuber makes riddles that are way out of distribution
- we could slightly modify or combine existing samples where we know the dsl solution
- c++ code uses heuristics to normalize colors
 
 
arc meetup july 20
- andreaskoepf
- Completed first version of riddle generator, based on ice cuber dsl, set of functions included in brute-force search
- generate random graphs based on these functions
- 3 types:
- unary operations: 1 image input 1 output
- binary: 2 image input 1 output
- others: work with lists of images
 
- also generate random input candidates based on augmented versions of arc samples
- retry until no “outliers”
- synthetic examples so far are limited to tiny subset of functions as a starting point
- w/ Matteo figured out which of the ice cuber solution solves which ID
- started on implementing first neural approach
- opinion: we need to get the functions correct, the rest is solvable by RL / DL
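A heavily simplified sketch of this kind of generator: sample a random program, then apply it to random input boards to get example pairs. It only chains unary operations (no binary or list operations, no graph structure, no outlier rejection), and the four NumPy operations are placeholders, not the ice cuber DSL functions.

```python
import random
import numpy as np

# Placeholder unary operations: one board in, one board out.
UNARY_OPS = {
    "flip_lr": np.fliplr,
    "flip_ud": np.flipud,
    "rot90": np.rot90,
    "transpose": np.transpose,
}

def sample_program(rng, depth):
    """Pick a random chain of unary operation names."""
    return [rng.choice(sorted(UNARY_OPS)) for _ in range(depth)]

def run_program(program, board):
    """Apply the operations in order."""
    for name in program:
        board = UNARY_OPS[name](board)
    return board

def make_riddle(seed, depth=2, n_examples=3, size=4, n_colors=10):
    """Return (program, [(input, output), ...]) for one synthetic riddle."""
    rng = random.Random(seed)
    program = sample_program(rng, depth)
    pairs = []
    for _ in range(n_examples):
        cells = [[rng.randrange(n_colors) for _ in range(size)] for _ in range(size)]
        board = np.array(cells)
        pairs.append((board, run_program(program, board)))
    return program, pairs
```

Keeping the sampled program alongside the pairs means every synthetic riddle comes with a known DSL solution.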
 
- asteriskCat
- looked at BWMD metric github
- BWMDs: embeddings dependent on IT quantities related to compression, euclidean distance between embeddings reflects probability of similarity in some space
- “compressing two things together to see how different they are”
- Idea: can guide MCTS by observing whether the test sample has been solved “in the same way” (information theoretically) as the train samples, or whether a particular MCTS branch is on the right track
- they used first-order statistics, but that’s probably not enough
- tunneling BWT can compress further
- assumption: we should handle transformations / statistics on strings like convolutions,
- extended BWT can handle transformation of multiple strings simultaneously
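"Compressing two things together to see how different they are" can be illustrated with the normalized compression distance (NCD). This is a stand-in using zlib, not the BWMD metric discussed above, and the byte encoding of boards is an assumption.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data at maximum compression level."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 otherwise."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def board_bytes(board) -> bytes:
    """Flatten a board (list of rows of colors 0-9) into bytes, raster order."""
    return bytes(v for row in board for v in row)
```

ARC boards are tiny, so compressor overhead dominates on a single board; concatenating all boards of a riddle before comparing is probably needed for the distance to be informative.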
 
- dmiles
- has tried working with arc in prolog
- “GOFAI” approach, implement rules & inference engine
- has transformations (like ice cuber) & feature generators
 
- plan: try aleph program induction, allows us to take a model and create the prolog source code that would make that model work
- opinion: if we add more ice-cuber-like functions, we can probably solve many of the tasks, but not really fruitful because it’s just hand-crafting
 
- grey
- continues work on the fast DSL executor
- idea: could try "skill discovery" on the OEIS, the On-Line Encyclopedia of Integer Sequences. The DSL is already mostly there, only need a good way to "find skills." -- this way would then be transferable
 
- futurisold
- pypy in tests is 3 times faster than normal python
- neuralhash: model by apple to match images, tried on arc to determine sample distances
- did not work out well
- maybe related to the need to upscale images when using neuralhash
- tried on rotations / crops / etc. but neuralhash did not work to see whether images are similar for arc
 
- looked into open-endedness: keep inventing new complex tasks as you solve them
- Kenneth Stanley’s POET is foundational work in this
- new paper by Stanley: Evolution through large models https://arxiv.org/abs/2206.08896
- looking into python library for genetic programming, see if we can adapt ice cuber dsl to it, and continuously evolve more interesting or more “common sense” riddles, maybe also evolve new functions
 
 
- Matteo (via text)
- I rewrote IceCuber solution into python (still needs to be debugged) to run search, record all steps in the graph and allow us to combine with other DSLs.
- I ran a depth 3 search in cpp over all functions and recorded the results (I can't control the order because of how optimised the code is, but I can control which functions are in the search). The results are in the github as a csv. I'll be adding the results to wiki pages when I get a chance, showing which combinations can lead to a solution.
- I'm in the middle of going through all the outputs in python and plotting how each step in the transformation looks. Will add the results to the wiki when I get a chance
- I tracked down derek larson and emailed him about collabing
 
- Mohamed Osman
- took week off
- idea: hyperlearner: cross-validation plus auto-parameter tuner
- opinion: we have to look for end-to-end differentiable ideas
 
- parapraxis
- explores use of language models to see how far we can get solving arc
- tried fine-tuning different models
- different gpt-models (openai) → costly
- work ok
 
- t5
- tended to make same-color board of dominant color
 
- bart
- not too bad
 
- gpt-j
 
- important: how to format the input
- numbers, letters, with and without spaces
- seems like letters or numbers without spaces might be the best
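
A minimal sketch of the formatting variants mentioned above (numbers vs. letters, with and without spaces). The function name `serialize_grid`, its parameters, and the digit-to-letter mapping (0 → a, 1 → b, ...) are assumptions for illustration, not the format actually used in the experiments.

```python
from string import ascii_lowercase


def serialize_grid(grid, use_letters=False, spaces=False):
    """Turn a board (list of rows of ints 0-9) into text, one line per row.

    use_letters: map colors 0..9 to 'a'..'j' instead of digits (assumed mapping).
    spaces: separate cells within a row by a single space.
    """
    sep = " " if spaces else ""

    def symbol(cell):
        return ascii_lowercase[cell] if use_letters else str(cell)

    return "\n".join(sep.join(symbol(c) for c in row) for row in grid)


board = [[0, 3, 0], [3, 3, 3]]
for letters in (False, True):
    for spaced in (False, True):
        print(repr(serialize_grid(board, use_letters=letters, spaces=spaced)))
```

Keeping the variants behind two flags makes it easy to fine-tune the same model on each format and compare.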
 
- data augmentation seems to help
- rotations, etc.
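
A rough sketch of rotation-based augmentation, assuming boards are plain lists of rows. The helper names are hypothetical, and mirroring is our own addition for the "etc."; the notes do not say which transforms were actually used. The same transform is applied to the input and the output board so the pair stays consistent, which only preserves the riddle for tasks that do not depend on absolute orientation.

```python
def rotate90(grid):
    """Rotate a board clockwise by 90 degrees."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid):
    """Mirror a board left-to-right."""
    return [list(row[::-1]) for row in grid]


def augment_pair(inp, out):
    """Return 8 variants of an (input, output) pair:
    4 rotations, each with and without a horizontal mirror."""
    variants = []
    for flip in (False, True):
        a = flip_horizontal(inp) if flip else [list(r) for r in inp]
        b = flip_horizontal(out) if flip else [list(r) for r in out]
        for _ in range(4):
            variants.append((a, b))
            a, b = rotate90(a), rotate90(b)
    return variants


pairs = augment_pair([[1, 2], [3, 4]], [[4, 3], [2, 1]])
print(len(pairs))
```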
 
- to try next: train on synthetic data from andreas and see whether that can solve any of the “real” arc samples
 
- sayhisam1
- is looking into gflownets
 
- XMaster96
- idea: fine-tune a multi-modal system
- MAGMA: pick pre-trained language model, add adapters and image encoders & train on image-captioning tasks
- adds multi-modal understanding to language model, hopefully also things like objectness
- works for the company that created MAGMA and that company has larger models
- has done experiments on the larger models, but doesn’t have clearance to talk yet
- spoilers:
- having the MAGMA additions makes a huge difference
- andreas’ synthetic data makes a huge difference
 
 
 
14 Jul 2022 Overview of current efforts/working directions
- @AlexK looks for ways to learn “intuitive physics”
- @parapraxis is evaluating and finetuning existing NLP models for zero-shot reasoning on ARC
- @XMaster96 is considering fine-tuning a large image-captioning model, maybe leveraging short video data
- @0x000FF4 is working on extracting features with an auto encoder to guide an evolutionary based program search
- @futurisold recently tried creating graphs based on board decompositions; he wants to evaluate the performance gain of using pypy when working with the ice-dsl
- @GRey is documenting his train of thought while solving riddles and started a library to expose the native CPP icecuber dsl to Python
- @asteriskCat evaluates compression as an indirect distance/similarity metric between transformation-candidate outputs and ground truth, or to generate additional input features from training pairs
- @Yannic Kilcher worked on completing the documentation of the human priors mentioned in François Chollet’s measure of intelligence paper
- @dmiles builds his own ARC DSL with feature extraction and transformations in Prolog; for example, he works on riddles that contain superpixels (rectangular areas of the same color)
- @Matteo looked into the icecuber cpp implementation and its limitations
- @Jammy3417 proposed solution strategies for riddles that were not solved with icecuber depth 4 brute force search, like using cellular automata (e.g. to solve maze/shortest path riddles) and mirroring with transparency for “masked inpainting riddles” (mostly symmetric patterns)
- @Awesome_Ruler_007 tried out some things with NLP and generative models, proposed a logo generated with DALLE-2
- @andreaskoepf created a first prototype of a synthetic dataset generator and ran the icecuber search on all public ARC training and evaluation examples (documented as an additional column in the wiki of our arc repository).
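
To illustrate the cellular-automata idea for maze/shortest-path riddles listed above, here is a minimal sketch of a wavefront CA: every free cell repeatedly takes the minimum of its own distance and its neighbours' distance plus one, until nothing changes. This is not @Jammy3417's implementation; the encoding (`WALL = 1`, free cells `0`) and the function names are assumptions.

```python
WALL = 1  # assumed encoding: 1 = wall, 0 = free cell
INF = float("inf")


def ca_step(dist, grid):
    """One synchronous CA update using only 4-neighbour local information."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in dist]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == WALL:
                continue  # walls never receive a distance
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] + 1 < new[y][x]:
                    new[y][x] = dist[ny][nx] + 1
    return new


def distance_field(grid, start):
    """Iterate the CA to a fixed point; returns steps from start to each cell."""
    dist = [[INF] * len(grid[0]) for _ in grid]
    sy, sx = start
    dist[sy][sx] = 0
    while True:
        new = ca_step(dist, grid)
        if new == dist:
            return dist
        dist = new


maze = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
for row in distance_field(maze, (0, 0)):
    print(row)
```

A shortest path can then be read off by walking from the goal cell downhill along decreasing distances.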
Wed 13 Jul 2022 MN
(notes taken by futurisold)
- review last week
- ideas log
- ff4 - autoencoder and maybe move to transformers
- andreas
- diffusion models on the boards?
- RL solution?
 
- dmiles
- his own DSL in Prolog
- correlation: take the manually written solution and compare it against the feature set through the feature detector (was there any recoloring, etc.)
 
- jammy
- CA solution
- DSL lacking features
- CA has issues with superpixels
 
 
- asterisk
- use compression to compute a vector that serves as an embedding (like a kernel, which gives access to the tools of LA) - in a NS we assign a score to each branch => gets a score without any training
- augmentation
- pretraining
 
 
- alexk
- video prediction style?
 
 
- others
- did Chollet put some priors into the riddles that even he didn't know about?
- more discussion on the concerns about the space complexity and the limitations of handcrafting rules