SQL Related Topics - madBeavis/PimpMyAtv GitHub Wiki

Foreward

The MAD wiki says to use MariaDB, so please do such. MariaDB is a fork of mysql that shares many similarities, but there are some differences you may encounter once you dig deeper into things.

Interacting with the database through the command line is not for everyone. Some experienced folks may like it, but it is definitely not ideal for newcomers or the lazy folks like me.

This write up is not about specifics, but will point you in the right direction via some googling to get things running efficiently for you.

You should know which database server that you use to run MAD on, i.e. MariaDB. Your google searches will have to be adjusted accordingly. I will assume you have a running database that MAD can communicate with.

Setups

Most running MAD utilize a VPS for scanning. I use MAD locally on a laptop running Mint as an operating system and replicate data up to a VPS for maps. I don't worry about security on my laptop, I have big dogs and proficient with toys that go boom, so my scanning laptop is probably the safest MAD scanning machine.

For your database, you will want to have it setup on a SSD whether it be on a VPS or locally. Prefer a fast NVME over standard 2.5in SATA ssd. The classic sign of a slow database is seeing "MITM workers falling behind" in the MAD session log, which is a bit nebulous as it says one has a problem, but not much on how to resolve. Read the section at the bottom of this page for some pointers on how to remedy your setup.

Replication is an advanced topic that took me many months to get running in a reliable manner. There were many issues to address combined with me learning along the way. It is not something you want to do unless you can't avoid it. I do it only because I have crappy internet that won't allow a VPS based MAD setup with the amount of devices I wanted to run from different ip addresses for my MAD scanning machine. So I google, struggled, learned and now have a setup that works well.

Programs

I mainly use a Windows 10 laptop to manage my MAD laptop. I connect to the db on my MAD laptop with a program called HeidiSQL. Get it off their website. It works rather well and who doesn't appreciate someone that writes programs in Delphi.

Since I do some management on my MAD laptop, researched free options but settled on running HeidiSQL utilizing wine. Depending on your distro, wine may come with it. I didn't have much luck with old wine that came with my distro, so I had to upgrade to latest. Due to the variances in distros, you will have to google install latest wine ubuntu 20.04 (substitute your distro name accordingly).

For fancier queries utilizing joins, I have to use another program besides HeidiSQL as it doesn't do visual design for queries. I use Microsoft Access as I already own it. I setup the odbc connection, link the tables and start designing queries. There are other paid options such as NaviCat (I use the sqlite version and it is nice). If you don't own anything suitable, you can investigate openoffice or equivalent as alternatives.

Security

I am going to assume that you have run the mysql_secure_installation command properly. There are many things to consider, so do through investigation via googling.

I will also assume that you have looked into a firewall, whether it be iptables, ufw or and equivalent. UFW is easy to use and there is plenty of help on google for getting things setup properly.

Out of the box, MariaDB only allows connections from localhost. Get it setup by googling something along the lines of allow remote mysql connections. You will need to edit some config files and setup your username permissions properly.

Take your time reading how to harden your setup, you never know who will try to hack your VPS. Leaving your standard mysql port of 3306 open to the public while allowing the user 'root' access from any ip would not be a good idea. You can setup users in mysql to only work from certain ip addresses, which is fine if you are accessing your vps from a static ip address at home or work. For a dirty workaround on using a floating ip is as follows: normally use ufw to block 3306, disable it to do your edits and then re-enable ufw.

Using HeidiSQL examples

Being a GUI, things are simpler to accomplish than through the command line. Do take into consideration that HeidiSQL will allow you to do some stupid and tragic things that will cause harm to your database, such as dropping tables with just a few clicks. I will assume that you have researched something before typing them by hand in the MySQL console. Look at what you are intending to do and subsequently click prior to doing it.

Let us start with a simple example. Say you want to clear quests for a rescan after an event starts. One will assume that your walkers are setup to handle rescan at this time, i.e. the end time for quests isn't something like 0600. Stop MAD. Navigate to the trs_quest table, right click on it and choose "Empty Table(s)" to delete all the records in it. I also do the same for trs_status table, which will force MAD to go through the walkers. Restart MAD and you get the new quests.

You will still have to write some sql for MAD tasks, but things are simpler. Say you want to clean up your pokemon table and delete all pokemon older than a day. For those new to sql, I would recommend using a select query first to inspect the data and then deleting from the grid even though it will be slower than a straight delete query. It makes sense to build up to delete queries, with select first you can view the data and make sure the data is indeed what you want to delete prior to committing to the deleting of data.

Make sure you are on the correct database in the gui and switch to the "Query" tab. Start by typing "delete from pokemon where pokemon". Next type a period ("."). After a pause and along the same lines as intellisense in visual studio, HeidiSQL will popup a list of the fields in the pokemon table (you can also make the list show by using ctrl-spacebar). Start typing "disappear_time" and it will narrow down options and press enter to complete it. Type " < now", press ctrl-spacebar and it will show the correct form of the command, i.e. "now()". You can also consider using the date_sub or date_add function to adjust for your reference to utc. Since I use replication (too many deletes at a time clogs it up) or for just viewing a few records I use the "limit" command. So type " limit 200000;", which will only delete 200000 records. So you should end up with the command DELETE FROM pokemon WHERE pokemon.disappear_time < NOW() LIMIT 200000 (adjust now with a offset to compensate for your timezone). To execute it, right click on the query window and choose "Run". Assuming all goes well you should end up with something along the lines of "/* Affected rows: 200000 Found rows: 0 Warnings: 0 Duration for 1 query: 3.720 sec. */" in the logs. If you had a fat finger and caused an invalid query, you will get an alert box that may or may not help you diagnose where the error is. I repeat until affected records gets to zero.

Backing up your database

Always backup your database, you don't want to lose all your hard work. Monster spawns, raids and quests are transient by nature and will be found again, but all your settings and spawnpoint information are important, so do backups unless you like starting from scratch. Google backup mysql database using mysqldump and find one you like. linuxize appears to have a good set of example usages. You can do manual backups or utilize cron jobs to help automate the tasks, again, google is your friend. HeidiSQL does offer the ability to export the database, but I have not had much luck doing full db restores with it, so stick with mysqldump for that task.

Optimizing your database

I can't cover all the issues of how it happens (google!), but after large deletes or if queries and/or inserts have slowed down (for instance due to indexes), you should optimize your database. Google optimize mysql database mysqlcheck to find examples on how to accomplish it. You can also use HeidiSQL to do it, from the menu "Tools -> Maintenance" and then select the db or tables you want to optimize. Either way, probably best to shutdown your MAD instance(s) while you do it to avoid mitm backups.

MITM falling behind

MAD is simply taking in more data than can be written to the database. I don't see this problem as my scanning is not done in a data dense area and my system is on real hardware with a quality NVME ssd and has plenty of ram. VM users, VPS users and those with spinning drives are the usual candidates asking for help, so I compiled this guide for reference. The following is collected from discord and the web to at least provide a more cohesive source of data than searching discord or hoping one of the MAD wizards is online when you need help. I am no database expert, but I can google enough to get things running well enough to not have issues and are willing to collect the information so that others can benefit.

Have you rebooted your server or restarted MAD lately? Even my kids and wife generally know to not ask for tech support if they haven't rebooted their device... Does the problem start immediately after reboot? Does it manifest after a certain amount of time?

The first question you might be asked is do you have Redis up and running. Redis acts as a cache between MAD and the database to minimize writes to the database. Basically, MAD asks redis if the data it wants to write is in the cache, if so it doesn't do the write to the db and if it doesn't exist, the write data command gets issued. If you do not have redis running the process is very simple. So begin with this post to get it going. Read about how to set memory and key retention policies in this post - as always, don't mess with stuff if you don't know the ramifications and consequences. Once setup and running, you can verify that redis is indeed caching data by issuing the following command in terminal redis-cli monitor and watching the loads of caching goodness scroll by at a rapid pace.

The second question you will probably be asked on Discord is "when was the last time you cleaned your pokemon table". Without maintenance, this table invariably collects a lot of data. It will eventually slow down your system if left to its own devices. So delete out what you don't need/want to keep and optimize your database afterwards. Optimize using HeidiSQL or from the terminal using mysqlcheck -o --all-databases -u username -p after stopping MAD.

Another question you should answer affirmatively is have you updated mad recently. In June of 2021, some work was done to eliminate some connection issues that slowed down inserts. Don't sweat the details and keep looking at shiny things that don't help your setup run well, just keep MAD up to date.

With the introduction of nearby mons in June of 2021, more stats data is being written. There appears to be one record for each mon seen, so it has the potential to bog down your database in the same manner as the pokemon table. So keep this one cleaned also. This table is called 'trs_stats_detect_seen_type'.

If you are running raw stats, turn it off and see if the workers still keep falling behind. Try disabling regular stats. See if the problem goes away and adjust accordingly. If you do decide to run stats, you should index the tables as outlined on discord. But don't go around indexing random data unless you are capable of determining the consequences of doing such.

RocketMAD has some helper functionality that you can enable to clean up your database. Your call on whether to utilize or not.

Make sure you have an SSD on your scanning machine (real, vm or vps), not a spinning drive. A quick and dirty benchmark can be found here. After the run, do delete the test file generated as it is 1gig in size. For reference, my cheap VPS used for maps does ~500mb/sec, my NVME WD Black on my scanning machine did ~1.5GB/sec and my new mini computer for scanning does ~1.4GB/sec.

If you have a shitty cheap vps, consider upgrading. A small one may suffice when you start, but consider upgraded processors and ram as your setup increases in size.

There are a couple options you can change in MAD config file, mitmreceiver_data_workers and db_poolsize, with both defaulting to 2. Per snoopy, set your mitmreceiver_data_workers to your count of devices. But of course there is some discussion if this is optimal, as all devices won't likely be writing data at the same time. The rule of thumb for db_poolsize used to be one less than data worker, but there has been discussion for an ideal value, so play with it. For both of the prior values in config, experiment and benchmark. You can also uncomment mitm_ignore_pre_boot to have mad discard data they have prior to the start of MAD, in case your MITM gets backed up right away.

The simplest way to tune your SQL is to run MySQLTuner script, which can be found on their github. Run your db for a few days, run the script, adjust settings, restart the db and repeat. It is designed to analyze typical db workloads, not MAD workloads, but is a reasonable option for the newcomer to db tuning.

MySQL makes a tool to hook into your db and do some monitoring. It is called MySQL Workbench. Download it and do some googling on how to interpret the data. However, it doesn't give many optimization tips or suggestions, just data. Look under 'Performance -> Dashboard' for the information.

Google and learn how to tune your db. Backup you config file (most likely /etc/mysql/my.cnf, but does depend on your distro) file before changing values. A few links on basic tuning are given below. I don't know crap, but tuning your db is an advanced subject that you can really bugger up your database with if you just start changing values without finding out the reasoning, implications and ramifications of said changes. Proceed with caution, don't make wholesale changes and record what you changed so you can revert if things go wonky.

If you are the "got to tinker with everything they touch" then come asking for help, please stop. Go find a 12 step program for such. Just quit it and get things running well enough - you can always revisit the tuning topic later to satisfy your urges once your skills evolve.

There are a few values that you can readily change without minimal downside if you choose sane values. innodb_io_capacity and innodb_io_capacity_max are set very low for SSD drives as outlined in the following Percona article. Try bumping them up a it, say 1000 and 4000 to start.

Another value that can be changed to a more reasonable value is innodb_buffer_pool_size. A simplistic description of the function of the value is that you can cache records in memory as opposed to requiring read/write for data on the disk drive, which obviously speeds things up. By reading the MySQL documentation, it is set to a very small default value. While you are reading the documentation, visit the page of how MySQL uses memory. The rules of thumb outlined seem reasonable, there is no point of assigning 75% of RAM to be used for the buffer pool if you never let your db get bigger than 2gb like I do, as I don't keep more than a few weeks of spawns and don't give a damn about stats beyond is the scanner working properly. Be sure to get a VPS or machine with enough RAM to be useful, i.e. don't cheap out with a stupid small 2gb machine if you are serious about doing things right with a fair amount of devices. Do some googling, there are many posts on how to size the value for your use case, such as the first answer on this page. Another article on determining the value is here.

Another value to consider modifying is skip-name-resolve, read more here.