Unofficial empeg BBS

#43598 - 24/10/2001 11:07 Way OT: Ok, off topic
eternalsun
Pooh-Bah

Registered: 09/09/1999
Posts: 1721
Loc: San Jose, CA
Looking at some of the posts from everyone, I just want to post a couple of off topic questions just for opinions or answers.

Suppose you have a remote server (5 seconds to respond per request) that holds half the data needed in a response. This server speaks XML. You also have a front-end system closely coupled to a database that can access the other half of the data fairly quickly. The information on the remote server has a lifetime of around 30 minutes before it changes. Customer requests go through the fast server, but every request requires a response containing the other half of the relevant data from the far, slow server. What's a nice, efficient way of sending back mostly fast, consistent responses? Ideas? I've heard of reverse proxy caching, but don't know much about it.

The other question has to do with large data stores. We have a situation where a few million records are touched (inserted, deleted, updated, etc.) per day. Ideally, the database would hold about 400 million records, of which a few million will change daily. The changes arrive in spits and spurts via flat file, and take half a day to load. However, requests for this data are constant. What's a good architecture to allow uninterrupted read access to those tables without the loads and the reads disrupting one another? Clues?

Calvin


#43599 - 24/10/2001 11:28 Re: Way OT: Ok, off topic [Re: eternalsun]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31600
Loc: Seattle, WA
More details on the systems in question would be nice. Are you using Oracle, SQL server, something else...?

I won't be able to answer any question related to that stuff, I just read the message and realized that any answers will be system-specific, so anyone who answers will need that information before giving any advice.

_________________________
Tony Fabris

#43600 - 24/10/2001 13:38 Re: Way OT: Ok, off topic [Re: eternalsun]
fvgestel
old hand

Registered: 12/08/2000
Posts: 702
Loc: Netherlands
These implementations sound very specific, and I think you should do a lot of research into the available options. One oversight at this level could cause major performance problems in the future.

I've heard of reverse proxy caching, but don't know much about it.

A reverse proxy works just like a normal proxy; it just hides the remote machine from the user. Suppose the remote machine name is remote.acme.com: you could register a local machine name to mask the remote machine. You would need a caching proxy to speed things up in this situation, though. There are a lot of commercial products that offer this functionality; if you want to play with it, you can use the proxy module in Apache to try things out. If you want to speed things up even more, you could think of some round-robin DNS configuration with multiple proxy caches serving data; the DNS servers will return one of several IP addresses for one hostname. If you've got some decent network equipment, there's probably some way to get the same result using the switches/routers.
The database could also benefit from round-robin DNS, in combination with some good replication design. A Sun Fire could also help.
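
To illustrate the caching idea in isolation (this is not how Apache's proxy module is configured, just a minimal sketch of what a caching layer does), here is a small Python example. It assumes the 30-minute data lifetime mentioned in the original question; `TTLCache` and `slow_remote_lookup` are hypothetical names invented for the illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache answers from a slow source and reuse them until they expire."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 30 * 60,  # assumption: data changes about every 30 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the slow round trip
        value = self._fetch(key)  # missing or stale: ask the slow source
        self._entries[key] = (now, value)
        return value


def slow_remote_lookup(key: str) -> str:
    """Hypothetical stand-in for the 5-second remote XML server."""
    return f"<data key='{key}'/>"


cache = TTLCache(slow_remote_lookup)
first = cache.get("flight-123")   # goes to the slow source
second = cache.get("flight-123")  # served from the cache
```

Only the first request for a given key within the lifetime pays the slow round trip; the trade-off is that an answer can be up to one lifetime old.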



_________________________
Frank van Gestel

#43601 - 24/10/2001 17:53 Re: Way OT: Ok, off topic [Re: eternalsun]
tanstaafl.
carpal tunnel

Registered: 08/07/1999
Posts: 5549
Loc: Ajijic, Mexico
Ideally, the database would hold about 400 million records, of which a few million will change daily

This is the Bugzilla database for the empeg and emplode bug listings, right?



tanstaafl.

_________________________
"There Ain't No Such Thing As A Free Lunch"

#43602 - 25/10/2001 11:33 Re: Way OT: Ok, off topic [Re: tfabris]
eternalsun
Pooh-Bah

Registered: 09/09/1999
Posts: 1721
Loc: San Jose, CA
We've got some Oracle databases doing the heavy hitting. I'm not looking for a solution, rather, hints, tips, ideas, far fetched or not. It's for a massive new product we're building.

Perhaps at the next empeg meet I'll hand out a few evaluation copies of our software. My company does real time tracking and analysis of aircraft. We already have the problem of "where is every single plane in the sky right now? how fast? how high? what's its course? is it off course? when will it get there?" (see screen shot).

Now we're moving on to something much bigger, but if it performs like a dog, it may be scrapped.

Calvin


Attachments
42605-us_all_flights_small.gif (98 downloads)


#43603 - 25/10/2001 11:37 Re: Way OT: Ok, off topic [Re: fvgestel]
eternalsun
Pooh-Bah

Registered: 09/09/1999
Posts: 1721
Loc: San Jose, CA
Frank,

If the remote machine to be proxied is an XML-based source that functions via HTTP POST, it can be proxied by Apache, correct? I'll look into it and see.

I'll also check out the possibility of round-robin, and see if there's anything that could benefit.

Thanks,
Calvin

#43604 - 25/10/2001 12:25 Re: Way OT: Ok, off topic [Re: eternalsun]
fvgestel
old hand

Registered: 12/08/2000
Posts: 702
Loc: Netherlands
I don't have any direct experience with XML, other than a WAP experiment with my Nokia and a freeware WAP server, but I suppose the XML documents carry a normal expiration tag, just like anything else served over HTTP. As Apache can function as a robust and extensible proxy server, I don't doubt it can handle the task.
_________________________
Frank van Gestel

#43605 - 25/10/2001 15:48 Re: Way OT: Ok, off topic [Re: eternalsun]
bonzi
pooh-bah

Registered: 13/09/1999
Posts: 2401
Loc: Croatia
About that large data store:
- Do updates/inserts/deletes come all day? How time-critical are they? Are they independent of each other, or do they come in large transactions?
- How many indices do queries use? Can the table be effectively partitioned on all of them? Some of them?
- Does your particular version of Oracle support read-only replicas?
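
The read-only replica idea can be sketched outside any database. The Python below is only an illustration of the pattern (readers keep using a published snapshot while a load builds the next one, then the two are swapped), not Oracle functionality; `ReplicatedTable` and the `(op, key, value)` change format are hypothetical. Copying the whole table, as done here for simplicity, would not be practical at 400 million records.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical change record: (operation, key, value). "delete" ignores the value.
Change = Tuple[str, int, Optional[str]]


class ReplicatedTable:
    """Readers see a stable snapshot while a batch load builds the next one."""

    def __init__(self) -> None:
        self._snapshot: Dict[int, str] = {}
        self._load_lock = threading.Lock()  # only one batch load at a time

    def read(self, key: int) -> Optional[str]:
        # Reads never take the load lock, so a long load does not block them.
        return self._snapshot.get(key)

    def load_batch(self, changes: Iterable[Change]) -> None:
        with self._load_lock:
            working = dict(self._snapshot)  # private copy to apply changes to
            for op, key, value in changes:
                if op == "delete":
                    working.pop(key, None)
                elif value is not None:
                    working[key] = value  # insert or update
            # Publish the finished copy in one step; readers switch over here.
            self._snapshot = working


table = ReplicatedTable()
table.load_batch([("upsert", 1, "AA100"), ("upsert", 2, "UA200")])
table.load_batch([("delete", 1, None), ("upsert", 2, "UA201")])
```

Each batch becomes visible all at once, which matches treating a chunk as one transaction; readers never observe a half-loaded batch.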
_________________________
Dragi "Bonzi" Raos Q#5196 MkII #080000376, 18GB green MkIIa #040103247, 60GB blue

#43606 - 26/10/2001 13:06 Re: Way OT: Ok, off topic [Re: bonzi]
eternalsun
Pooh-Bah

Registered: 09/09/1999
Posts: 1721
Loc: San Jose, CA
The updates/inserts/deletes of concern come in large chunks far, far apart. Each chunk could be considered one transaction and is independent afaik.

Are you suggesting a read-only replica with a lot of indices, fed from a write-only table with no indices?

I'll look into the other stuff today and see.

Calvin

#43607 - 27/10/2001 06:13 Re: Way OT: Ok, off topic [Re: eternalsun]
schofiel
carpal tunnel

Registered: 25/06/1999
Posts: 2993
Loc: Wareham, Dorset, UK
I'd be looking at placing an ORB on each machine serving up data. Then place a fast server in the system, either stand-alone or virtualised across several physical machines. On this machine you could put up a custom CGI app written in either Java or C++, using CORBA to pull the data sources under a common access point. With the info distributed like this you can look at parallelism, load balancing, etc.

Alternatively, run up JSPs on a server, with servlets on the different data source machines providing access to each data source. The JSP applications can access the servlets according to required content, or serve up pages to the clients that interrogate the servlets separately.

The biggest problem you seem to have is in the different delivery rates from different sources: you could handle that by composing the UI delivered to the user of servlet clients; each client contacts a specific servlet, registers an interest, and then waits for updates to be pushed back to it ("Server Push" rather than "Client Pull"). Works very well if you get it right.

With the data source integration issues, you are touching on two problems, and you should analyse them separately: constant-load external access, and non-intrusive data integration. Both have pretty standard solutions provided off the shelf by the major DB engine manufacturers. Multi-source updating was something I had to deal with for a telephone info system where around 100 agents were continually accessing the DB as it was updated. We handled it with buffered processes and a common access point for all requests. It meant that the agents had to wait and got lower performance accessing the DB, but it ensured the updating/integration took priority with no loss, which was our design criterion.
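
One possible reading of "a common access point where updates take priority" is a single queue that every request goes through, with updates always served ahead of reads. The post does not describe the actual implementation, so the Python below is only a guess at the general shape; `AccessPoint`, `UPDATE`, and `READ` are hypothetical names.

```python
import itertools
import queue
from typing import Callable, List, Tuple

UPDATE, READ = 0, 1  # lower number is served first


class AccessPoint:
    """Single entry point: buffers all requests, serves updates before reads."""

    def __init__(self) -> None:
        self._queue: "queue.PriorityQueue[Tuple[int, int, Callable[[], None]]]" = (
            queue.PriorityQueue()
        )
        # The sequence number keeps arrival order within each kind of request
        # and guarantees the job callables themselves are never compared.
        self._order = itertools.count()

    def submit(self, kind: int, job: Callable[[], None]) -> None:
        self._queue.put((kind, next(self._order), job))

    def drain(self) -> None:
        """Run everything currently buffered, updates first."""
        while not self._queue.empty():
            _, _, job = self._queue.get()
            job()


log: List[str] = []
access = AccessPoint()
access.submit(READ, lambda: log.append("read-1"))
access.submit(UPDATE, lambda: log.append("update-1"))
access.submit(READ, lambda: log.append("read-2"))
access.submit(UPDATE, lambda: log.append("update-2"))
access.drain()
```

Readers wait whenever updates are pending, which is the trade-off described above: lower read performance in exchange for updates that are never delayed or lost.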
_________________________
One of the few remaining Mk1 owners... #00015

#43608 - 29/10/2001 18:25 Re: Way OT: Ok, off topic [Re: schofiel]
eternalsun
Pooh-Bah

Registered: 09/09/1999
Posts: 1721
Loc: San Jose, CA
We have a single (customer) point of access at this point, but have not considered "choking" the rate at that point as updates are taking place. I'll have to look into this.

What do you mean by buffered processes?

I think you broke it down very well in saying there are two distinct problems here. The more impactful of the two is the slow source problem. The feed out of there requires multiple requests to satisfy, and each request can take on the order of 5 seconds, which can be problematic with hundreds of requests needed to compose a response. This wouldn't ordinarily be an issue because we can throw multithreaded requests at it, except the source tends to slow down even more when that's done. So for this reason, we're looking at a reverse proxy of some type. I'm thinking of squid operating in httpd-accelerate mode perhaps.

The type of responses we ship out are complete "answers" that aren't easy to break down in terms of the speed of the individual sources. So if a response requires Sources A, B, C, D, E, and the response times are 1, 2, 3, 4, 5 respectively, the response has to be complete and whole, and will end up taking as long as the slowest response. Distributing the sources (as opposed to collating them into a single large db) doesn't make a lot of sense because the sources ultimately originate outside our systems. But I can see what you're getting at: if the sources are partitioned, then the updates can be uninterrupted, and access to data in the process of being updated is controlled by each source independently. Correct me here if I'm misinterpreting or not reading through correctly.
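
The "as long as the slowest response" point can be shown with a small Python sketch. It assumes the five sources can be queried concurrently and independently; the latencies are hypothetical and scaled down from the 1 to 5 figures above so the example runs quickly, and `fetch` and `compose_response` are names invented for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Hypothetical latencies in seconds, scaled down from 1, 2, 3, 4, 5.
SOURCE_LATENCY: Dict[str, float] = {
    "A": 0.02, "B": 0.04, "C": 0.06, "D": 0.08, "E": 0.10,
}


def fetch(source: str) -> str:
    """Stand-in for one request to one source."""
    time.sleep(SOURCE_LATENCY[source])
    return f"data from {source}"


def compose_response(sources: Dict[str, float]) -> Dict[str, str]:
    """Query every source at once and return only when all have answered."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fetch, name) for name in sources}
        return {name: future.result() for name, future in futures.items()}


start = time.perf_counter()
response = compose_response(SOURCE_LATENCY)
elapsed = time.perf_counter() - start
# elapsed is close to the slowest source (E), not the sum of all five.
```

This only helps when the sources tolerate concurrent requests; for a source that slows down further under parallel load, as described above, a cache in front of it is the more likely fit.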

Calvin
