Friday, February 3. 2012
I do not yet have all of the hardware and networking gear set up on my network, so this is merely a description of what I'm looking to do, to gather some ideas and feedback, and then figure out how to proceed. If, after reading the details below, you want to join, or have suggestions, please let me know!
History / Ideas
I've been thinking about starting a "nerd net" for quite some time. I have many friends that use a Linux/*BSD machine at their border, and typically have some kind of services running inside the network. I'd like to link these networks together and share access to services. This opens up all kinds of possibilities...
To avoid a single point of failure, and saturation of any one network's bandwidth, we would avoid linking in a hub-and-spoke fashion. Instead, I propose that we maintain a list of active nodes (essentially, each person's gateway box) and try to maintain at least 3 active VPN connections at a time. In other words, each node on the network would have a VPN tunnel to at least 3 other nodes. It would be very useful to have control of a DNS zone for maintaining this list. Each node would have its own A record; say, mynode.domain.com. Each time a node wants to connect to the network, it would request the A record for something like connect.domain.com, which would hand out the A records of the registered nodes in round-robin fashion.
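A rough sketch of that bootstrap step in Python; the zone names come from the example above, while the helper function and the IP addresses are invented for illustration:

```python
import socket

def pick_peers(records, self_ip, want=3):
    """Pick up to `want` peers from the round-robin A records,
    skipping our own address and any duplicates."""
    peers = []
    for ip in records:
        if ip != self_ip and ip not in peers:
            peers.append(ip)
        if len(peers) == want:
            break
    return peers

# In practice the records would come from the round-robin name, e.g.:
#   _, _, records = socket.gethostbyname_ex("connect.domain.com")
records = ["198.51.100.7", "203.0.113.2", "198.51.100.7",
           "192.0.2.9", "203.0.113.50"]
peers = pick_peers(records, self_ip="192.0.2.9")
```

The node would then bring up a VPN tunnel to each address in `peers`.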
Authentication will be central in this; so some knowledge of SSL will be beneficial. I, and possibly a few key others, would maintain access to a CA signing key and the nodes would be authenticated to the network via certificates signed by that CA key. Any node that you attempt to connect to should trust you based on that certificate; and, based on the certificate presented, you should trust any connection attempt with a valid, signed certificate. I, and possibly others, can assist with any certificate-based configuration issues that you might have.
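A minimal sketch of that mutual-authentication idea using Python's ssl module; the function name is mine, and a real node would of course also load its own certificate and key (commented out below):

```python
import ssl

def make_node_context(cafile=None, server_side=True):
    # Both ends of a tunnel demand a certificate from the peer, so a
    # connection only comes up if each side holds a cert signed by our CA.
    proto = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    ctx = ssl.SSLContext(proto)
    ctx.verify_mode = ssl.CERT_REQUIRED      # no signed cert, no tunnel
    if cafile:
        ctx.load_verify_locations(cafile)    # trust only our own CA
        # ctx.load_cert_chain("node.crt", "node.key")  # this node's identity
    return ctx
```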
To allow for a private network that's going to be potentially changing topology on a regular basis, we'll need a routing protocol, such as RIPv2. For simplicity, each network would receive its own /24 of RFC1918 space, with the gateway box running the VPN software being the "node" on the network. Most likely, you'd want to set up split-tunneling on the gateway box so that any requests going to the private network route over the VPN and the rest of your traffic (web surfing, email, etc) goes out your normal internet connection.
With each node on the network being connected to at least 3 other nodes at any given time, we need to keep track of the various routes from one network to another. Instead of trying to track this by hand, we could easily set up RIPv2 and announce the routes we "know". I imagine using netblocks in the 172.16.0.0/12 range; with each network having a /24 and being multi-homed (connected to 3+ other nodes), there could potentially be several routes from one network to another. A lightweight, distance-vector routing protocol like RIPv2 seems to be a good fit: open-source implementations exist, and it's simple and proven. The route to any network from yours would be the one with the fewest intermediate hops.
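The fewest-hops metric that RIPv2 converges on amounts to a breadth-first walk over the tunnel graph. A toy illustration over an invented multi-homed topology (node names and links are hypothetical):

```python
from collections import deque

def hop_counts(links, start):
    """Fewest-hop distance from `start` to every reachable node; RIPv2
    converges on the same metric (and caps it at 15 hops)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Each node keeps at least 3 tunnels up, so most pairs have several paths.
links = {
    "alice": ["bob", "carol", "dave"],
    "bob":   ["alice", "carol", "erin", "frank"],
    "carol": ["alice", "bob", "dave"],
    "dave":  ["alice", "carol", "erin", "frank"],
    "erin":  ["bob", "dave", "frank"],
    "frank": ["bob", "dave", "erin"],
}
dist = hop_counts(links, "alice")
```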
Another, more complex, possibility would be using OSPF and OpenBSD's open-source implementation. Given that each node will have different bandwidth, this may be a good idea to try.
Services and other ideas
This would be the whole reason for the network!
I would be interested in providing several services to the network, but not publicly. For one, a Linux- or FreeBSD-based shell server with access to the internet, compilers/development tools, documentation, email (anyone remember pine+procmail?), etc. A (small-ish) public web space to let people know that you're part of the network; something like Apache's mod_userdir. A blog application accessible only from within the nerd-net.
I would also maintain the internal network's intranet site. This could be a site used to post updates of system maintenance, new services being offered/tested, and a way to maintain an up-to-date list of all of the nodes.
A private IRC server is definitely on the TODO list. Any and all bots would be allowed; if anyone would be interested in linking the IRC daemons, I'll likely be using Blackened or UltimateIRCd.
I could provide SMTP services for the network. If we find a zone to use, I can provide email services for that zone. Technically, unlimited user@zone email addresses. I would also be willing to host DNS services for the internal network; providing dynamic DNS and a "view" for those coming in via the network.
Another service I'd consider offering would be your own PostgreSQL database for development and testing. A big part of what I envision for this network is that it'll be a big collection of computer-savvy geeks; what better place to deploy and test code than a network inhabited solely by those with the clue to help you with debugging, vuln testing, etc.?
Other possibilities include shared-CPU time using tools like distcc(1). Maybe remote storage (NFS/iSCSI)?
The benefits of a private network are many, and extend beyond just the sharing of services. This is a chance to build a real-world, potentially large-scale network with changing topology. Any member wishing to add/configure new services or features is welcomed to; and the services could be advertised/listed on the internal network's intranet site.
Other benefits include the ability to policy-route certain traffic. For instance, I would be interested in routing my DNS traffic over the VPN to be routed out someone else's connection; it's low traffic, but my ISP mangles my DNS traffic to route to their servers regardless of what server I point to. This could even be set up so that my DNS traffic goes out through a different node's connection each time (i.e., a type of load balancing).
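The rotation itself could be as simple as cycling through the peers willing to forward DNS. The sketch below shows only the selection logic (addresses invented from the 172.16.0.0/12 range proposed above); the actual redirection would live in the gateway's firewall/routing rules:

```python
from itertools import cycle

# Hypothetical VPN-side addresses of peers willing to forward our DNS.
peer_resolvers = cycle(["172.16.1.1", "172.16.2.1", "172.16.3.1"])

def next_resolver():
    # Hand each lookup to the next peer in turn, spreading the
    # (admittedly small) DNS load across the network.
    return next(peer_resolvers)

first_three = [next_resolver() for _ in range(3)]
```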
Since the network is set up over a VPN, and is exactly what that acronym says, a Virtual Private Network, we should seek to keep it private. Configuring an Apache reverse proxy, or some type of port-forward from your external IP address into the network, would be frowned upon. Any need to create such access from the outside should be discussed with the entire group, and hopefully some sort of consensus reached.
It would also be the responsibility of each node's owner to keep the machine secure and up-to-date. Any breach of one of the nodes would lead to an open route to the network.
I will create the CA key used to sign all certificates used to access the VPN; but, as I do not wish to be the sole decider in who is allowed on the network and who is not, I plan to create a handful of sub-CA certificates to be distributed to trusted associates who may also sign certificates for potential users. If you've been given a sub-CA certificate, you're being trusted to know who you're letting into our private little clubhouse. :) Using multiple sub-CA certificates will also allow anyone to validate who provided access for a particular user/network, and make it possible to revoke access as necessary (hoping that it never becomes necessary).
A "nerd net" would be a fun project to gather a bunch of us geeks together and share services. This is the first time I'm really throwing the idea out there (beyond mentioning it to a few people here and there), so if anyone else thinks it would be fun, get ahold of me!
Friday, September 16. 2011
PostgreSQL 9.1 Released
Among the many new features, here's a snippet from the News page:
Advancing the State of the Art
Our community of contributors innovates with cutting-edge features. Version 9.1 includes several which are new to the database industry, such as:
SE-PostgreSQL looks to be particularly interesting. It allows you to use SELinux Mandatory Access Controls on PostgreSQL users and data. Neat. Particularly the SECURITY LABEL command.
Beware, however, that there are some incompatibilities with previous releases!
From the 9.1 Release Notes:
By default, backslashes are now ordinary characters in string literals, not escape characters. This change removes a long-standing incompatibility with the SQL standard. escape_string_warning has produced warnings about this usage for years. E'' strings are the proper way to embed backslash escapes in strings and are unaffected by this change.
The Release Notes also mention the addition of synchronous replication.
PostgreSQL streaming replication is asynchronous by default. If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover.
Synchronous replication offers the ability to confirm that all changes made by a transaction have been transferred to one synchronous standby server. This extends the standard level of durability offered by a transaction commit. This level of protection is referred to as 2-safe replication in computer science theory.
When requesting synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the transaction log on disk of both the primary and standby server. The only possibility that data can be lost is if both the primary and the standby suffer crashes at the same time. [...] It also necessarily increases the response time for the requesting transaction. The minimum wait time is the roundtrip time between primary and standby.
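For reference, a minimal primary-side postgresql.conf sketch for 9.1 synchronous replication, assuming a standby that connects with application_name 'standby1' (the name and values are illustrative):

```ini
# Wait for this named standby to confirm the commit record has been
# written to its transaction log before returning to the client.
synchronous_standby_names = 'standby1'

# Durability level; 'on' waits for the synchronous standby, while
# 'local' falls back to primary-only durability per transaction.
synchronous_commit = on
```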
There's also a What's New in PostgreSQL 9.1 wiki page that explains many of these new features in detail.
Friday, August 21. 2009
DOJ Approves $7.4B Oracle-Sun Deal
Oracle on Thursday said the U.S. Department of Justice (DOJ) has approved its $7.4 billion acquisition of Sun Microsystems, although the deal is subject to certain conditions and still needs the blessing of European regulators.
Oracle first announced its bid in April and Sun shareholders approved the acquisition on July 16.
The combined company will give Oracle an array of new assets, including a stake in the computer hardware market, the open-source MySQL database and stewardship of the Java programming language.
Oracle will undoubtedly cut a large portion of the lesser-performing sectors of the company. I'm afraid that this might be the death knell for SPARC-based processors, including Niagara and the UltraSPARC T2.
Sun certainly has it right with these processors; they boast very low power consumption and up to 64-way SMT across 8 cores per chip. Compare that to your four-core AMD64 Phenoms and the like.
I don't foresee [Open]Solaris going anywhere anytime soon. Solaris has long been the platform of choice for large Oracle installations, and I see the Solaris+Java combination as being the crown jewels for Oracle. Oracle has embraced open source to a pretty fair degree thus far, so I see no reason that they would try to close OpenSolaris or anything similar.
I could honestly not care less what becomes of MySQL. It's been a sub-standard RDBMS from the very get-go. PostgreSQL serves just fine for single-database solutions; and I'd recommend Oracle RAC for clustered/multi-master replication scenarios.
Monday, June 23. 2008
pl/pgSQL Programming Guide
With PL/pgSQL you can group a block of computation and a series of queries inside the database server, thus having the power of a procedural language and the ease of use of SQL, but with considerable savings of client/server communication overhead.
This can result in a considerable performance increase as compared to an application that does not use stored functions.
Also, with PL/pgSQL you can use all the data types, operators and functions of SQL.
This is a link to the PostgreSQL 8.3 documentation for the pl/pgSQL procedural programming language. You can greatly speed up application performance by moving much of the decision-making to the database.
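The round-trip-savings argument is easy to see in miniature. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL, comparing a client-side sum (every row shipped to the application) against the server-side aggregate (one row shipped); the table and values are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (amount REAL)")
con.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,), (30.0,)])

# Client-side: ship every row across the connection, sum in the app.
client_total = sum(row[0] for row in con.execute("SELECT amount FROM orders"))

# Server-side: the database does the arithmetic; one row crosses the wire.
(server_total,) = con.execute("SELECT SUM(amount) FROM orders").fetchone()
```

A PL/pgSQL function takes the same idea further, folding whole sequences of queries and decisions into a single server-side call.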
Wednesday, June 18. 2008
IBM May Open Source DB2
IBM is positive about the possibility of bringing out its DB2 database management software under an open source license.
While the computing giant has no immediate plans to open source DB2, market conditions may make it unavoidable, according to Chris Livesey, IBM's U.K. director of information management software.
"We have a light version of the product offered for free, which is a step towards exposing our core (DB2) technology," said Livesey. "Looking at IBM's heritage in contributing to the open source market, we've been particularly keen to lead that market. Open source is an interesting space, as a whole. As the future unfolds, and the economics become clearer, there's going to be more commitment to open source by everybody. We've made good steps towards that."
While this is speculation at this point, it would be nice to see an open-source DB2. I expect some pushback from the financial industry (banks are almost exclusively IBM hardware/OS/database shops), but everyone else should benefit from this, including IBM. I'd like to see the replication code from DB2 make its way into PostgreSQL, or eventually just switch to an open-source DB2.
Does anyone remember when "open source" became a verb?
Tuesday, April 15. 2008
GreenSQL: Open-Source Database Firewall Solution
GreenSQL is an Open Source database firewall used to protect databases from SQL injection attacks. GreenSQL works in a proxy mode and has built in support for MySQL. The logic is based on evaluation of SQL commands using a risk scoring matrix as well as blocking known db administrative commands (DROP, CREATE, etc).
This looks like a pretty neat tool. Unfortunately, it's MySQL only. As it is GPL, I'm going to see how difficult it might be to recode this for PostgreSQL; or at least add PgSQL support in addition to the existing MySQL support.
They even put their money where their mouth is. There's a SQL Injection Test Page where you can attempt to circumvent authentication via SQL injection. I only tried a few different values, but it caught everything I threw at it.
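The risk-scoring-matrix approach described above might look something like this toy sketch; the patterns, weights, and threshold are invented, and GreenSQL's real matrix is surely more thorough:

```python
import re

# Each suspicious pattern adds to the risk score; a known administrative
# command blocks the query outright, as the description above suggests.
RISKY = [
    (r"(?i)\bor\b\s+'?\d+'?\s*=\s*'?\d+'?", 5),  # tautologies like OR '1'='1'
    (r"--|/\*", 3),                              # comment tricks
    (r";", 2),                                   # stacked queries
    (r"(?i)\bunion\b\s+select\b", 5),            # UNION-based extraction
]
ADMIN = re.compile(r"(?i)\b(drop|create|alter|grant)\b")

def assess(query, threshold=5):
    if ADMIN.search(query):
        return "block"
    score = sum(weight for pattern, weight in RISKY if re.search(pattern, query))
    return "block" if score >= threshold else "pass"
```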
Wednesday, March 26. 2008
IBM Takes Stake in Open Source Database Vendor
For more than a decade, PostgreSQL has been a cornerstone of the open source database market. In recent years, EnterpriseDB has emerged as a leading vendor supporting and driving PostgreSQL forward.
It will now continue its efforts thanks in part to a little help from IBM. EnterpriseDB is also re-branding and expanding its PostgreSQL efforts to take even more direct aim at its rival MySQL on the open source side and Oracle on the proprietary side.
IBM is joining EnterpriseDB's C round of venture financing which in total raises $10 million for the open source database vendor. Andy Astor, CEO of EnterpriseDB, noted that IBM's was one of four groups participating (including Charles River Ventures, Fidelity Ventures, Valhalla Partners) in this current financing round. To date, EnterpriseDB has raised $37.5 million in venture financing.
I'm certainly happy to see corporate interest in PostgreSQL. PostgreSQL has been my RDBMS of choice since the 6.x days, but I'm no stranger to MySQL. MySQL, far inferior to PostgreSQL (in my opinion), gets all of the money and corporate attention, and PostgreSQL gets jack.
Astor told InternetNews.com that IBM called EnterpriseDB and the discussions went from there. Astor added that the discussions with IBM pre-dated the $1 billion acquisition of MySQL by Sun earlier this year.
It sure sounds fishy, but I believe them. It's entirely likely that both Sun and IBM were looking to take stake in the open source RDBMS world at around the same time.
Astor argued that EnterpriseDB has emerged as a leading direct competitor to Oracle's proprietary database and IBM wants a piece. IBM of course has its own proprietary database in DB2, which also competes against both Oracle and PostgreSQL.
This confuses me. I can see how Sun might be willing to "take sides" with MySQL (in the sense of aligning with an open source database) as they do not develop/sell database software. They do have a cozy ISV relationship with Oracle, but after all, they're merely using it to sell Sun systems and software. Here we have IBM giving money for development of PostgreSQL when they have their own established, enterprise-grade RDBMS.
Not that I'm complaining, of course.
Postgres Plus takes the core PostgreSQL database and bundles in additional components to make it easier to install and deploy. Postgres Plus Advanced Server adds in additional closed source Oracle compatibility extensions to PostgreSQL.
It appears that a lot of these add-ons are just pre-packaged third-party solutions; most are open-source. Here's a quick list of features:
Postgres Studio ("built on" the pgAdmin utility)
Geospatial functionality (looks to be the add-ons right out of $PG_SRC/contrib)
GridSQL Parallel Query
Slony-I replication (single-writer/many-readers)
The one thing to complain about: it doesn't appear that any of these enhancements by EnterpriseDB will make their way back upstream into the open-source, community version of PostgreSQL; if any will, there is little to no information saying so.
Monday, September 10. 2007
Greenplum Database is the first open source powered database software that can scale to support multi-terabyte data warehousing demands. Greenplum Database allows organizations to analyze vast amounts of business data 10 to 100 times faster than traditional data warehouse solutions at a fraction of the cost.
Greenplum Database's fundamental breakthrough is its ability to store and process terabytes of data using clusters of low-cost servers. Greenplum Database moves processing power as close as possible to the data, so processing always occurs in parallel, delivering a dramatic boost in query and load performance. In addition, Greenplum Database's Dynamic Provisioning technology makes it easy to add incremental data warehouse capacity when needed, avoiding costly appliance upgrades.
It looks like there might finally be a decent option for PostgreSQL scalability. I have not personally used this software yet, but I will be installing it shortly. I intend to run some benchmarks against a stand-alone PostgreSQL database and see how the numbers match up.
If this solution is half as good as the website touts it to be, then Oracle may end up losing some serious ground in the SMB (Small-to-Medium Business) market.
High-Level Architecture: A database in Greenplum is actually an array of individual databases, usually running on different servers or hosts, all working together to present a single database image. The Greenplum master is the primary entry point to the Greenplum Database System. It is the database instance where users connect to the database and execute SQL statements. The master coordinates the work amongst the other database instances in the system: the Greenplum segments, which is where the user data resides.
Mirroring and Fault Tolerance: When you deploy your Greenplum Database system, you have the option to configure mirror segments. Mirror segments allow database queries to fail over to a backup segment if the primary segment is unavailable.
Greenplum Database is able to detect when a host is unavailable or when a segment database server process is down. When this occurs the master will mark the primary segments on that host as out-of-service and immediately switch over to the mirror segments so that the operation can continue.
Why is Greenplum Database better-suited to business intelligence and data warehousing than databases like Oracle?
Greenplum Database’s “shared-nothing” architecture is optimal for fast queries and loads because it places processors as close as possible to the data itself, and performs queries and other operations with the maximum degree of parallelism possible. “OLTP” architectures like Oracle’s were designed and built with an entirely different purpose and are not capable of the kinds of parallelism, or performance, that Greenplum Database delivers.
How is it that Greenplum can deliver a product like Greenplum Database, but no other company or organization has to date?
When the founders of Greenplum converged in 2003, they set out to change the game. They saw that enterprise software, and particularly database software, was far too expensive and performed badly. Our uniquely capable team includes some of the best minds in the industry, with experts from Oracle, Teradata, Sybase, Informix, Netezza, PostgreSQL, HPTi, CalTech, MIT, Stanford University, and other leading companies, organizations and institutions. Greenplum Database is the result of the confluence of Greenplum’s unique vision and vast experience in the midst of undeniable industry trends.
Monday, May 28. 2007
Adaptive Modeling in Brute-force Cracking Dictionaries
The size of the dictionary could be reduced, also the number of attemps, if an adaptive model is used against the guess. An adaptive model is really useful when a part of the guess (most of the times the password) is known.
It's obvious that the author of this article does not speak English as a first language, but it's informative nonetheless. An interesting peek into frequency analysis, typically used for breaking crypto, applied to SQL injections.
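One reading of that idea: rather than trying candidates in plain dictionary order, rank them by how well they match character-frequency statistics gathered from already-known passwords, so the likelier guesses go first. A toy sketch (the training string and candidate list are invented):

```python
from collections import Counter

# Invented frequency model: character counts from previously recovered
# passwords. A real model would be trained on actual leaked corpora.
model = Counter("password123password1admin2023letmein1")

def likelihood(candidate):
    # Sum of per-character frequencies: a higher total means the guess
    # "looks more like" passwords we've already seen, so try it earlier.
    return sum(model[ch] for ch in candidate)

guesses = ["zqxjkv", "pass12", "admin1"]
ordered = sorted(guesses, key=likelihood, reverse=True)
```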
Sunday, May 20. 2007
PostgreSQL Performance Tuning Howtos
Performance Tuning PostgreSQL:
There are several postmaster options that can be set that drastically affect performance; below is a list of the most commonly used and how they affect performance:
It's always good to have plenty of shared buffers. Most/all database systems use them to share data between processes (IPC).
If you skip fsync(2) on anything I/O-sensitive, you always risk data corruption. You've really got to want the performance to turn it off. fsync(2) is a pretty decent performance hit, but a slow[er] database server is a fair trade for ensuring that its data is consistent.
Note that many of these options consume shared memory and it will probably be necessary to increase the amount of shared memory allowed on your system to get the most out of these options.
Turning up the amount of shared memory at the OS level is always good for database workloads.
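For reference, the two settings discussed above as they'd appear in postgresql.conf; the values are illustrative, not recommendations:

```ini
# Shared buffer pool; the OS cache does much of the heavy lifting,
# so this typically stays small relative to total RAM.
shared_buffers = 128MB

# Turning fsync off trades crash safety for speed; leave it on unless
# you can afford to rebuild the database from scratch.
fsync = on
```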
Tuning PostgreSQL for Performance:
PostgreSQL counts a lot on the OS to cache data files and hence does not bother with duplicating its file caching effort. The shared buffers parameter assumes that OS is going to cache a lot of files and hence it is generally very low compared with system RAM. Even for a dataset in excess of 20GB, a setting of 128MB may be too much, if you have only 1GB RAM and an aggressive-at-caching OS like Linux.
There is one way to decide what is best for you. Set a high value of this parameter and run the database for typical usage. Watch usage of shared memory using ipcs(8) or similar tools. A recommended figure would be between 1.2 to 2 times peak shared memory usage.
This howto also contains a link to the Annotated postgresql.conf and Global User Configuration Guide.
PostgreSQL 8.0 Performance Checklist
Literally, a performance checklist for PostgreSQL. Shorter read than the others, but it's to the point.
Thursday, June 29. 2006
MySQL Denial of Service
Thanks to Kanatoko for discovering this.
A query such as "select str_to_date( 1, NULL );" will result in a crash of the MySQL daemon.
It appears that several versions from the 4.1, 5.0, and 5.1 branches are vulnerable.