When MySQL is Slow: Replacing MySQL with PrestoDB 0.246

By Stephen Krings

Former Security Operations Engineer at Cloudflare, Inc.

github.com/kringz
twitter.com/stephenkrings
email: stephen <at> gigashock *dot* com


Intro


I found it fairly straightforward to move data from MySQL to PrestoDB. PrestoDB is excellent for read-only queries with large datasets. PrestoDB differs from Starburst Presto. If you're using Starburst Presto, see this post.


Why in the world do I need PrestoDB?


The truth is: I probably don’t need to use a query engine this powerful. PrestoDB is for clients that have terabytes of data with billions of rows distributed across hundreds of servers.


This project is quite small in comparison; the database I'm using is only ~11 gigabytes with ~18 million rows and is stored on one server.


The dateset is, however, large enough to make my out-of-the-box MySQL database choke under the pressure of a "LIKE '%value%'" query. Running these queries in MySQL takes about 3x longer compared to PrestoDB and fully consumes the capacity of my server's hardware in the meantime.


PrestoDB quickly handles the same "LIKE '%value%'" query out-of-the-box and with ease, on the same hardware I am using with MySQL.


Tutorial

This tutorial will provide instructions for installing and
configuring the following software on Ubuntu 20.04.01. Recommended:
PrestoDB 0.246
Minio Python3 s3cmd-2.0.1 openjdk-11-jre-headless

Optional: vim screen

This tutorial assumes the existence of a MySQL database on the same system.

$ sudo apt update 

Install Python3:

$ sudo apt install python3

$ sudo apt-get install python-dateutil

$ sudo ln -s /usr/bin/python3 /usr/bin/python


I'm using vim as a text editor:

$ sudo apt-get install vim


Install screen to make things easier:

$ sudo apt-get install screen


Install Java:

$ sudo apt install openjdk-11-jre-headless


$ mkdir ~/Desktop/source
$ cd ~/Desktop/source


Download and Install Minio

$ wget https://dl.min.io/server/minio/release/linux-amd64/archive/minio_0.0.20210116021944_amd64.deb
$ sudo dpkg -i minio_0.0.20210116021944_amd64.deb
$ sudo mkdir /mnt/data
$ sudo chown -R user /mnt/data && sudo chmod u+rxw /mnt/data
Note: 'user' is my username in Linux.


Edit ~/.bashrc

$ vim ~/.bashrc

Add the following lines:

MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
$ source ~/.bashrc


Run Screen
$ screen

Press ENTER


Run Minio
$ minio server /mnt/data


Detach your screen (ctrl + a + d)


Download and Configure s3cmd:

$ wget https://sourceforge.net/projects/s3tools/files/s3cmd/2.0.1/s3cmd-2.0.1.tar.gz

$ tar xzf s3cmd-2.0.1.tar.gz
$ touch ~/.s3cfg
$ vim ~/.s3cfg
# Setup endpoint
host_base = localhost:9000
host_bucket = localhost:9000
bucket_location = us-east-1
use_https = False
# access keys
access_key = minioadmin
secret_key = minioadmin
# enable s3 signature apis
signature_v2 = False
$ cd s3cmd-2.0.1
Create your bucket:
$ ./s3cmd mb s3://data
View your buckets:
$ ./s3cmd ls s3://
$ cd ~/Desktop/source/
Download the PrestoDB Server:
$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.246/presto-server-0.246.tar.gz
Download the PrestoDB CLI:
$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar
$ sudo chmod +x presto-cli-0.246-executable.jar
Unpack Presto:
$ tar xzf presto-server-0.246.tar.gz
Changing the filename here to simplify things:
$ mv presto-server-0.246 presto
$ cd presto 
$ mkdir etc
$ cd etc
$ mkdir catalog
Create config.properties:
$ touch config.properties
$ vim config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=3GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
Create jvm.config:
$ touch jvm.config
$ vim jvm.config
-server
-Xmx5G
-XX:+UseG1GC
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-Djdk.attach.allowAttachSelf=true
Create node.properties:
$ touch node.properties
$ vim node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/mnt/data
Create minio.properties:
$ cd catalog
$ touch minio.properties
$ vim minio.properties
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://data/
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
hive.s3.endpoint=http://localhost:9000
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.allow-drop-table=true

Create your MySQL config for Presto:
Use your MySQL credentials here.
$ touch mysql.properties
$ vim mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://127.0.0.1/?useSSL=false&allowPublicKeyRetrieval=true
connection-user=stephen
connection-password=pa55w0rd!awaeqewqeqweqwAAA

Open another screen:
$ screen
$ ~/Desktop/source/presto/bin/launcher run
Wait until you see ==== SERVER STARTED ====

Detach your screen (ctrl + a + d)
$ cd ~/Desktop/source
Run the presto CLI:
$ ./presto-cli-0.246-executable.jar
Create a schema in presto:
presto> create schema minio.books;
presto> use minio.books;

Import your MySQL table into Presto:
presto:books> create table minio.books.list as select * from mysql.books.book;
This will copy the MySQL table into a Presto table.

Test
presto:books> select count(*) from list;

Benchmarks

PrestoDB is 2.5x faster than MySQL in this test.

MySQL (27.80 seconds):
presto> select count(*) from book where BINARY title like '%Red%';

PrestoDB (11 seconds):
presto> select count(*) from list where title like '%Red%';

Possible Errors

Error:
presto> create table minio.books.author as select * from mysql.books.author;

Query 20210204_070906_00000_7d4wf failed: Unsupported Hive type: varchar(16777215). Supported VARCHAR types: VARCHAR(<=65535), VARCHAR.
Solution:
This means a column in your MySQL table is either 1) a datatype that is not supported by presto or 2) exceeds the maximum character count. I changed each of the columns in MySQL to VARCHAR.

____________

Error:
2021-02-03T01:48:57.973+0900 ERROR main com.facebook.presto.server.PrestoServer No factory for connector mysql java.lang.IllegalArgumentException: No factory for connector mysql at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
Solution:
This means you are missing the MySQL plugin located here:

Try deleting the ~/Desktop/source/presto folder and re-unpacking presto-server-0.246.tar.gz (tar xzf presto-server-0.246.tar.gz). Feel free to save the ~/Desktop/source/presto/etc folder so you don't have to re-create your configuration files.

__________

Error:

Exception in thread "main" java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object)'
Your issue is the incompatibility between the guava version and other binary versions you are using.
$ mv $HIVE_HOME/lib/guava-19.0.jar $HIVE_HOME
/lib/guava-19.0.jar.bak
$ cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/