Google IO & AlloyDB
Database
41
Male avatar

hicder viết ngày 13/05/2022

Bài này mình viết ở Viet Tech, xin đăng lại ở đây.

Google IO & AlloyDB

Google IO vừa mới diễn ra. Vì là iFan cứng nên mình không quan tâm đến nó lắm, trừ việc trong sự kiện này Google giới thiệu AlloyDB. Thế thì AlloyDB khác gì các transactional SQL database khác, như Spanner (cũng từ Google), CockroachDB, Yugabyte, Postgres và MySQL. AWS cũng có cả Aurora, nhưng mình chưa tìm hiểu về nó lắm.

Ở đây mình chỉ so sánh những transactional database, cụ thể là OLTP SQL, với nhau. Có rất nhiều loại database (OLTP vs OLAP vs HTAP). Và còn có SQL vs NoSQL nữa, nhưng chắc để topic này sau. Mình cũng xin phép chém tiếng Anh từ đây, tại viết tiếng Việt viết mãi không tìm được những từ chuyên môn :(

Traditional SQL DBMS model

For the longest time, SQL database stays mostly in 1 single server. Meaning, all tables of a database have to live in the same physical node. All the query processing and storage stay in that node. A lot of databases are built this way (even though some of them have evolved): OracleDB, Postgres, MySQL, SQL Server etc.

If developers want to split their data in multiple nodes, they need to handle it in the application layers. Meaning developers will need to be aware that their data lives in multiple "tables": "user_1", "user_2", ..., "users_10". If they want to count number of "users", they would have to construct 10 SQL queries, get the results and sum them up.

To be more specific, if you want:

SELECT COUNT(*) FROM user;

you need to manually build and execute all these queries:

SELECT COUNT(*) FROM user_1;
SELECT COUNT(*) FROM user_2;
...
SELECT COUNT(*) FROM user_10;

And then sum the results up. You can imagine this is a nightmare for more complicated queries with nested clauses. Also, cross-shard transactions are pretty much impossible.

Then comes Vitess

Because of this reason, folks from Youtube developed Vitess. Basically, it acts as a single MySQL database in front of all other MySQL databases, and will handle the distributed queries. Meaning, they will do the "construct 10 SQL queries, get the results and sum them up" for you. App developers no longer need to be aware of the distributed nature of their database, and can think of their database as a single server. All you need now is a single SQL query!

However, Vitess doesn't solve the "transactional" aspect of the database very well. They even discourage it. Recently, I talked with the head of databases at a large payment company that uses Vitess, and they actually disallow cross shard transaction, and only allow Vitess to act as a distributed read database. Transactional write (meaning INSERT) needs to be single shard.

Then comes Spanner and the likes

I'll group Spanner and its similar counterparts (CockroachDB and Yugabyte) in the same group. Essentially, they are full-fledged distributed SQL database. Meaning they will handle both read and cross-shard transactional writes for you, and you don't need to worry about the sharding aspects of it. Under the hood, they split the databases into query engine and storage engine. The storage engine uses a consensus algorithm as their replication mechanism (I believe Spanner uses Paxos or its vairant, and CockroachDB and Yugabyte use Raft). The query engine will translate the SQL query into multiple stages, some stage lives in the storage engine, and some lives in the query engine. On a high level, the query engine is aware of the distributed nature of the storage engine, and will construct the SQL plan to optimize for this scenario.

These also handle distributed transactions for you. They use a special clock to do that (in Spanner they call it TrueTime, using a special hardware. This is what you get for having a ton of money). In Yugabyte and Cockroach they use Hybrid Logical Clock, a software-based solution. What this means is, these databases can handle both distributed read and write transparently, and this is very nice for developers.

The downside: Spanner is expensive (according to some folks who have considered it) and GCP only. I don't know how well Cockroach and Yugabyte do, nor how expensive they are.

Which brings US AlloyDB

So what is AlloyDB? Think of AlloyDB as a single-server Postgres, with a ridiculously large "local" disk. I put local in quotation mark because it's not actually local, but network block device. However, from the point of view of the database, it doesn't know that. All it knows is that it has a disk, and it acts as a traditional SQL database with local disk, even though this "local" disk is actually on Google FS, a highly distributed storage engine, sharded and replicated in different places. So the main architectural difference between Spanner and AlloyDB is whether the query layer is aware of the distributed nature of the storage layer.

So what's exactly is the difference? Say, you want to run this query:

SELECT * FROM foo WHERE col=1;

In a database with locally-attached storage, it will push the col=1 filtering condition into the storage engine, and the query engine will only get back the rows where col = 1. In AlloyDB, because it doesn't know the distributed nature of the storage, it will just read the data block back to the query engine, and do the filtering there. There are a lot more data transfer this way.

Of course, the details on AlloyDB is very sparse, so my understanding might be wrong here. If someone in Viet Tech is from AlloyDB team and finds it inaccurate, I'm more than happy to be corrected. Let me know.

So why don't companies just use X (the latest and greatest)

Often times, transactional DB is the source of truth for your data. Migrating database while it's running is ridiculously ridiculously ridiculously difficult. Something goes wrong and you have corrupted data. So companies would rather pay a lot more money to the traditional vendor (cough Oracle cough) than to migrate.

Bình luận


White
{{ comment.user.name }}
Bỏ hay Hay
{{comment.like_count}}
Male avatar
{{ comment_error }}
Hủy
   

Hiển thị thử

Chỉnh sửa

Male avatar

hicder

1 bài viết.
5 người follow
Kipalog
{{userFollowed ? 'Following' : 'Follow'}}
Bài viết liên quan
White
2 0
Kotlin giờ là một từ khóa làm điên đảo cả giới lập trình viên đặc biệt là lập trình viên android. Kotlin hiện nay rất phổ biến và cực kì mạnh mẽ, c...
Aragami1408 viết hơn 3 năm trước
2 0
White
16 1
Chào các bạn, trong chuỗi nghiên cứu của mình về Blockchain, có một bài viết khá thú vị về Database mà mình nghĩ nên chia sẽ ở đây, mong nhận đc nh...
Phuoc-Thanh Do viết 3 năm trước
16 1
White
16 1
Hành trình đuổi bắt giáo sư Moriarty của thế giới bottleneck: database. Nếu bài viết trước bạn đã được đóng vai chàng thám tử Sherlock Holmes và c...
Minh Monmen viết hơn 1 năm trước
16 1
{{like_count}}

kipalog

{{ comment_count }}

bình luận

{{liked ? "Đã kipalog" : "Kipalog"}}


Male avatar
{{userFollowed ? 'Following' : 'Follow'}}
1 bài viết.
5 người follow

 Đầu mục bài viết

Vẫn còn nữa! x

Kipalog vẫn còn rất nhiều bài viết hay và chủ đề thú vị chờ bạn khám phá!