Modeling - liniribeiro/machine_learning GitHub Wiki
Model-centric vs. data-centric AI development
Make sure you are feeding your model quality data. It's not just about collecting data, but about using tools to improve the data in the most efficient way.
Key challenges
AI systems = Code(algorithm/model) + Data
In many projects the model already does well, and improving the data is what yields more accurate predictions.
Model development is an iterative process
It is important to make good choices while iterating in the development loop: how to modify the data, the model, or the hyperparameters.
When building a model there are 3 key milestones that most projects should try to accomplish:
- Doing well on training set (usually measured by average training error)
- Doing well on dev/test sets
- Doing well on business metrics/projects goals
Just doing well on the test sets isn't enough for many applications
Establish baseline of performance
Ways to establish baseline:
- Human Level Performance (HLP)
- Literature search - see what performance other systems report, to use as a baseline
- Quick-and-dirty implementation - to give you a sense of what's possible
- Performance of older system
Baselines help indicate what might be possible.
Getting started on modeling
- Literature search to see what's possible (courses, blogs, open-source projects)
- Find an open-source implementation if available
- A reasonable algorithm with good data will often outperform a great algorithm with not-so-good data.
Sanity check for code and algorithm
- Try to overfit a small training dataset before training on a large one
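The overfitting sanity check above can be sketched in a few lines. The idea: a model with enough capacity should drive training error to nearly zero on a handful of examples; if it cannot, suspect a bug in the code or data pipeline before scaling up. This sketch uses scikit-learn with a hypothetical toy dataset, but the same check works in any framework.

```python
# Sanity check: overfit a tiny training set before training on a large one.
from sklearn.tree import DecisionTreeClassifier

# A tiny, deliberately easy training set (hypothetical toy data).
X_small = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3]]
y_small = [0, 0, 0, 1, 1, 1]

model = DecisionTreeClassifier()
model.fit(X_small, y_small)

# With only 6 examples, training accuracy should reach 1.0.
train_acc = model.score(X_small, y_small)
print(f"training accuracy on tiny set: {train_acc:.2f}")
assert train_acc == 1.0, "model cannot overfit 6 examples -- check the pipeline"
```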
Error analysis and performance auditing
Examine misclassified examples and tag them -> propose tags.
Visual inspection example:
- specific class label (scratch, dent, etc.)
- Image properties (blurry, dark, background)
- other meta-data: phone model, factory
Product recommendations example:
- user demographics
- product features/category
Useful metrics for tags:
- What fraction of errors has that tag?
- Of all the data with that tag, what fraction is misclassified?
- What fraction of all data has that tag?
- How much room for improvement is there on data with that tag?
Segment the data into different categories and use questions like the above to decide what to prioritize.
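The tag metrics above are simple fractions over the labeled error-analysis data. A minimal sketch with pandas (the column names and toy values are hypothetical; each row is one example with its tag and whether the model misclassified it):

```python
# Compute the per-tag error-analysis fractions described above.
import pandas as pd

df = pd.DataFrame({
    "tag_blurry": [True, True, False, False, True, False],
    "misclassified": [True, False, False, True, True, False],
})

# What fraction of errors has that tag?
errors = df[df["misclassified"]]
frac_errors_with_tag = errors["tag_blurry"].mean()

# Of all the data with that tag, what fraction is misclassified?
tagged = df[df["tag_blurry"]]
frac_tagged_misclassified = tagged["misclassified"].mean()

# What fraction of all data has that tag?
frac_data_with_tag = df["tag_blurry"].mean()

print(frac_errors_with_tag, frac_tagged_misclassified, frac_data_with_tag)
```

Running the same three questions over every tag gives a quick table for deciding which category to work on first.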
Prioritizing what to work on
Decide on most important categories to work on based on:
- how much room for improvement there is
- how frequently that category appears
- how easy it is to improve accuracy in that category
- how important it is to improve in that category
Adding/improving data for specific categories
For categories that you want to prioritize:
- Collect more data
- Use data augmentation to get more data
- Improve label accuracy/data quality
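Data augmentation can be as simple as generating plausible variants of examples in the category you want to improve. A minimal sketch with NumPy, using horizontal flips and small Gaussian noise on tiny grayscale "images" (array shapes and noise level are illustrative assumptions):

```python
# Minimal data-augmentation sketch: flips + Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)

def augment(images: np.ndarray, noise_std: float = 0.01) -> np.ndarray:
    """Return the original images plus flipped and noisy variants."""
    flipped = images[:, :, ::-1]                          # mirror along width
    noisy = images + rng.normal(0.0, noise_std, images.shape)
    return np.concatenate([images, flipped, noisy], axis=0)

batch = rng.random((4, 8, 8))        # 4 tiny grayscale "images" (toy data)
augmented = augment(batch)
print(augmented.shape)               # 3x the examples for that category
```

The key constraint is that augmented examples must still be realistic and keep their labels correct; otherwise augmentation adds noise instead of signal.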
Skewed datasets
With skewed datasets, raw accuracy can be misleading. Combining precision and recall into the F1 score is a common way to compare which model performs best.
F1 can also help you identify where to focus your work.
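F1 is the harmonic mean of precision and recall, so it penalizes models that trade one heavily for the other. A quick sketch from raw confusion counts (the counts are hypothetical; in practice `sklearn.metrics.f1_score` does this for you):

```python
# F1 from true positives (tp), false positives (fp), false negatives (fn).
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for two models on a rare class:
balanced = f1_score(tp=30, fp=10, fn=10)   # precision = recall = 0.75
lopsided = f1_score(tp=38, fp=40, fn=2)    # high recall, low precision
print(balanced, lopsided)
```

The second model finds more of the rare class but floods the output with false positives, and its F1 reflects that, which plain accuracy on a skewed dataset would not.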
Performance auditing
Performance auditing before pushing a model to production can help catch problems.
Auditing framework
Check for fairness, accuracy and other problems.
- Brainstorm the ways the system might go wrong
- Performance on subset of data
- how common are the errors
- Performance on rare classes
- Establish metrics to assess performance against these issues on appropriate slices of data
- Get business/product owner buy-in.
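Evaluating performance on slices of data can be sketched with a pandas groupby: compute the metric per subgroup and flag slices that fall well below the overall number (the column names, toy values, and the 0.05 threshold are assumptions).

```python
# Slice-based audit sketch: flag subgroups with below-average accuracy.
import pandas as pd

df = pd.DataFrame({
    "factory": ["A", "A", "A", "B", "B", "B"],
    "correct": [True, True, True, True, False, False],
})

overall = df["correct"].mean()
per_slice = df.groupby("factory")["correct"].mean()
flagged = per_slice[per_slice < overall - 0.05]   # slices needing attention
print(flagged)
```

Here factory B's accuracy is far below the overall average, exactly the kind of gap an audit should surface before deployment.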
ps: see TensorFlow Model Analysis (TFMA) for tooling that supports this kind of sliced evaluation.