Disk and I/O Sizing for MOSS2007 – Part 2
This is part two of a two-part article that discusses techniques for sizing your disk requirements for SharePoint. It is one of a myriad of tasks that you have to perform as part of the planning process, but it is one task that you ideally want to get right the first time, as storage technology can be a potentially high capital cost. I first performed a comprehensive analysis of disk space and performance scalability requirements for a MOSS07 farm back in February 07.
The first article examines some of the tools and techniques you can utilise to help you fill out Microsoft’s sizing worksheet. Part two builds upon this with a real-world example. So, to recap from Part 1, here are the stats we captured.
MYSERVER stats taken 10am – 30/1/2007
- System Uptime: 551 hours
- Total Logons since reboot: 927794
- Logons per minute – 927794 / (551 * 60): 28
- Total Opened Files since reboot: 114518692
- Files opened per minute – 114518692 / (551 * 60) : 3463
- Average Disk Queue length: 2-4
- Reported open files in computer management: 1206
- Number of unique users listed with opened files: 201
- Number of open files for APPS: 664
- Number remaining not APPS: 542
- % of open files to DATA shares: 45%
- Of non APPS, number of open files (as opposed to folders): 314
- % of open files versus active listed files (314/1206): 26%
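If you want to sanity-check those derived figures, here is a minimal sketch (in Python) of the arithmetic behind them. The counter values are the raw numbers captured from MYSERVER above; the rest is simple division.

```python
# A minimal sketch of the arithmetic behind the derived MYSERVER figures.
uptime_hours = 551
total_logons = 927794
total_opened_files = 114518692
open_files_reported = 1206   # open files reported in Computer Management
open_files_apps = 664        # of those, files open under the APPS share
open_data_files = 314        # non-APPS items that are actual files (not folders)

uptime_minutes = uptime_hours * 60

logons_per_minute = total_logons / uptime_minutes              # ~28
files_opened_per_minute = total_opened_files / uptime_minutes  # ~3463
pct_to_data_shares = (open_files_reported - open_files_apps) / open_files_reported  # ~45%
pct_files_vs_listed = open_data_files / open_files_reported    # ~26%

print(int(logons_per_minute), int(files_opened_per_minute),
      f"{pct_to_data_shares:.0%}", f"{pct_files_vs_listed:.0%}")
```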
So first we will estimate I/O requirements and finish off with disk space.
Required I/O Throughput Estimation
Throughput is the number of operations that a server farm can perform per second and is measured in requests per second (RPS). Ideally the number of operations that are requested per second is lower than the number that can be performed. If the number of operations that is requested exceeds the number that can be performed, user actions and other operations take longer to complete.
From Microsoft’s capacity documentation: “RPS measurements can be converted to the total number of users by using a model of typical end-user behaviour. Like many human behaviours, there is a broad range of “typical” behaviour. The user model for Windows SharePoint Services 3.0 has the following two variables:”
- Concurrency – The percentage of users that are using the system simultaneously.
- Request rate — The average number of requests per hour that an active user generates.
So, referring to the figures above from examining MYSERVER:
- Number of unique users listed with opened files: 219
- Number of staff in Head Office: 600
This equates to an estimated base user concurrency of 33%.
So now that we have a user concurrency rate, let's examine the rate at which files are opened on the server and apply a weighting.
Now, when I did this examination, the “files opened per minute” figure I reached was 3463. This seemed a very high number, and I believed it misrepresented the true rate at which files were being opened. As it happened, this server hosted an application share called APPS, where applications were run directly from the server (F:\). It was very difficult to determine how much the APPS share contributed to this figure, because the nature of the applications meant that files were accessed/opened/executed potentially far more often than static data files. Unfortunately for me, the APPS and DATA shares were on the same logical disk partition, so I had no easy way to split the I/O into quantifiable counters. So this is one estimate where I applied a best guess (see the 5%-20% weighting below).
- Files opened per minute – 114518692 / (551 * 60) : 3463
- Of non F:\APPS, number of open files (as opposed to folders): 314
- % of open files versus active listed files (314/1206): 26%
- Non APPS share I/O weighting: (5%-20%)
So now, I used the following methodology to estimate current load.
I took the “number of files opened per minute” figure, multiplied it by the “% of open files versus active listed files”, and called the result “relevant files”.
- 3463 * 26% = 900 Relevant files opened per minute
I then took the “relevant files” figure and divided it by the number of concurrent users.
Relevant Files / Concurrent Users = Requests Per User Per Minute
- 900 / 219 = 4.1
Next, I applied the “non-APPS share I/O weighting” figures:
- 4.1 * 5% = 0.2 requests per user per minute
- 4.1 * 20% = 0.82 requests per user per minute
So, my result was between 0.2 and 0.82 file requests per user per minute. If we examine Microsoft’s published throughput table for estimating load, we can see where we fit.
| Load | Request rate | Supported users |
| --- | --- | --- |
| Light | 20 requests per hour. An active user will generate a request every 180 seconds. | Each response per second of throughput supports 180 simultaneous users and 1,800 total users. |
| Typical | 36 requests per hour. An active user will generate a request every 100 seconds. | Each response per second of throughput supports 100 simultaneous users and 1,000 total users. |
| Heavy | 60 requests per hour. An active user will generate a request every 60 seconds. | Each response per second of throughput supports 60 simultaneous users and 600 total users. |
| Extreme | 120 requests per hour. An active user will generate a request every 30 seconds. | Each response per second of throughput supports 30 simultaneous users and 300 total users. |
- 0.2 * 60 = 12 requests per user per hour
- 0.82 * 60 = 49.2 requests per user per hour
According to the table above, this represents anywhere from “Light” up to just under “Heavy” load. Given that the server was already under disk I/O strain (disk queue length > 2), and the fact that I applied a fudge factor when estimating the overhead of non-data files (APPS), I recommended that we size the server to accommodate “Heavy” load.
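To tie the steps together, here is a minimal sketch of the whole estimation chain. The inputs are the MYSERVER figures above, and the 5%-20% weighting range is the best guess discussed earlier.

```python
# Sketch of the load-estimation chain: relevant file opens per minute,
# spread across the concurrent users, then scaled by the non-APPS I/O
# weighting range.
files_opened_per_minute = 3463
pct_files_vs_listed = 0.26   # % of open files versus active listed files
concurrent_users = 219       # unique users with open files at sample time

relevant_files_per_minute = files_opened_per_minute * pct_files_vs_listed   # ~900
requests_per_user_per_minute = relevant_files_per_minute / concurrent_users # ~4.1

for weighting in (0.05, 0.20):
    per_minute = requests_per_user_per_minute * weighting
    per_hour = per_minute * 60   # Microsoft's load table works in requests per hour
    print(f"{per_minute:.2f} requests/user/minute -> {per_hour:.0f} requests/user/hour")
# prints roughly 0.21 -> 12 at the 5% weighting and 0.82 -> 49 at 20%
```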
Phew! That was a lot of variables and formulas.. and unfortunately we are not there yet!
Complex vs. Common Operations and Daily Peak Factor
Microsoft identify that different types of operations performed in SharePoint have different load requirements.
- Common Operations typically include tasks like browsing categories and lists and hitting the home page.
- Complex operations are document collaboration functions (including document check-in and document upload). Microsoft suggests that these operations carry a higher weighting when estimating total operations per day (3 × common operations).
The next step is to estimate complex versus common operations. As I mentioned at the very start of this post, one of the key requirements of SharePoint here was for document collaboration. Thus I introduced my second “best guess” estimate. That estimate was that 75% of operations were common, and 25% of operations were complex.
Now astute readers of this post may have noticed that thus far, I had not applied any weighting calculations to time of day factors. Fear not, I did not forget about it! 🙂
So far, our estimations of usage have been based on a sample taken from the file server at 10am on 30/1/2007. We do not know for sure that this represents peak usage, nor do we expect the same level of load at 10PM as at 10AM. So we make the next two assumptions:
- Peak Factor – an approximate number that estimates the ratio of peak site throughput to its average throughput. Microsoft recommends a value of 1-4.
- Number of hours per day – the hours of the day during which we would expect to experience average to peak load.
- Complex operations: 25%
- Common operations: 75%
- Peak Factor: 2
- Number of hours per day: 10
Previously we estimated 0.2 to 0.82 requests per user per minute. Utilising 0.8, this equates to 1152 theoretical requests per user per day (0.8 × 60 minutes × 24 hours).
Here is the calculation:
- Total Operations Per user Per Day: 1152
- Common Operations (75%): 864
- Complex Operations (25%, weighted × 3): 288 × 3 = 864
- Peak Factor: 2
- Number of hours per day: 10
Now we can use Microsoft’s “Estimate Peak Throughput Worksheet” to estimate capacity requirements to accommodate all of these factors.
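To make that step concrete, below is a rough sketch of how these inputs might combine into a peak requests-per-second figure. The way the pieces are combined (weighted operations per day, compressed into the working hours and scaled by the peak factor) is my interpretation rather than a reproduction of Microsoft’s worksheet, so treat the structure and the resulting number as an assumption.

```python
# Rough sketch of a peak-throughput estimate from the inputs above. The way
# the pieces are combined here is an assumption about the worksheet's intent,
# not a copy of it.
requests_per_user_per_minute = 0.8   # upper estimate from earlier
concurrent_users = 219               # from the MYSERVER sample
peak_factor = 2
hours_per_day = 10
common_pct, complex_pct, complex_weight = 0.75, 0.25, 3

ops_per_user_per_day = requests_per_user_per_minute * 60 * 24              # 1152
common_ops = ops_per_user_per_day * common_pct                             # 864
complex_ops_weighted = ops_per_user_per_day * complex_pct * complex_weight # 288 * 3 = 864
weighted_ops_per_user_per_day = common_ops + complex_ops_weighted          # 1728

# Compress the weighted daily operations into the working hours and apply the peak factor.
peak_rps = (weighted_ops_per_user_per_day * concurrent_users * peak_factor) / (hours_per_day * 3600)
print(f"~{peak_rps:.0f} peak requests per second with these inputs")
```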
Estimate User Response Time
Response times are categorized in the following way:
- Slow (3-5 seconds) – user response times can slow to this rate without issue.
- Recommended (1-2 seconds) – the average user response time target.
- Fast (<1 second) – for organizations whose businesses demand speed.
The following Microsoft table lists throughput targets based on user response times.
| Total users | Slow (RPS) | Recommended (RPS) | Fast (RPS) |
| --- | --- | --- | --- |
| 500 | .4 | .5 | .7 |
| 1,000 | .7 | 1.0 | 1.2 |
| 5,000 | 4.0 | 5.0 | 6.0 |
There were 600+ users in the head office, so the estimated RPS to achieve “Fast” is around 0.875 RPS. This is a simple interpolation between 0.7 RPS for 500 users and 1.2 RPS for 1,000 users.
The following Microsoft table lists throughput targets in RPS at various concurrency rates
| Total users | 5% concurrency rate | 10% | 15% | 25% | 50% | 75% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | .25 | .5 | .75 | 1.25 | 2.5 | 3.75 | 5.0 |
| 1,000 | .5 | 1.0 | 1.5 | 2.5 | 5.0 | 7.5 | 10.0 |
| 5,000 | 2.5 | 5.0 | 7.5 | 12.5 | 25.0 | 37.5 | 50.0 |
So, we previously determined that there were 600+ users in head office, where the file server we examined resides. The concurrency factor for the sample period was 33%, so let's double the concurrency rate to accommodate periods when a higher percentage of staff are active. In this case, the estimated RPS to achieve 66% concurrency is 3.125.
As we saw in the earlier table illustrating the RPS figures required to accommodate “Typical” versus “Heavy” load, the RPS required to accommodate 462 simultaneous users under heavy to extreme load is between 7.2 and 14.5 RPS. Therefore, sizing to this figure should more than accommodate the response time and concurrency estimates, which are considerably lower.
Estimating Disk Space Requirements
To explain how I estimated future disk growth, I have to describe the activities of the client. They are a project-oriented service delivery company. Prior to commencing the pilot, I asked several key project stakeholders to broadly categorise the different project types across some basic time, concurrency and disk space criteria.
| Activity Type | Number active | Disk Size per Project | Duration |
| --- | --- | --- | --- |
| Projects (small) | 4-6 | 50GB | 1 Year |
| Projects (medium) | 3-4 | 150GB | 2 Years |
| Projects (large) | 1-3 | 350GB | 3 Years |
| Detailed Feasibility Studies | 3-4 | 5GB | 6-9 months |
| Pre-Feasibility Studies | 2-3 | 5GB | 3-6 months |
| Scoping Studies | 2-3 | 2GB | 2-3 months |
| Proposals | 3-4 | 2GB | 1-2 months |
As I mentioned in Part 1, a limited pilot was conducted over a 3 month period. The pilot used a detailed feasibility study. At the time of the commencement of the Pilot, the study was 70% complete. The original data copied to the SharePoint web application was:
3,725 files, 2.97GB
Eight weeks later, using the Windows Explorer view against the document libraries, the total files/disk space was:
6,432 files, 3.83GB
Note: Windows Explorer View only reports the latest version of a file, and does not report recycle bin data. Therefore, we can use it to estimate organic file growth without SharePoint versioning and recycle bin implications.
So, the aforementioned study was around 70% complete at the start of the pilot and almost 90% complete by the time the study period officially concluded. Therefore over the period of 2 months, the data grew by around 942MB. Based on this, we can assume that the final size of the study would probably have been around 4.16GB.
This figure was derived from the 942MB in 2 months, extrapolated to 1413 megabytes (1.4GB) for the quarter (942 * 1.5). This was added to the 2.97GB originally loaded in the pilot. What gives me some assurance that this estimate is reasonable is that the final figure of 4.16GB fits closely with the 5GB the stakeholders originally estimated for a detailed feasibility study.
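Here is that extrapolation as a small sketch, assuming linear growth across the quarter.

```python
# Sketch of the growth extrapolation, assuming linear growth across the quarter.
observed_growth_mb = 942    # growth measured over the two-month observation window

growth_per_quarter_mb = observed_growth_mb * 1.5   # two months -> three months
print(f"~{growth_per_quarter_mb:.0f}MB (~{growth_per_quarter_mb / 1024:.1f}GB) of organic growth per quarter")
```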
| Activity Type | Number active | Disk Size per Project | Duration |
| --- | --- | --- | --- |
| Detailed Feasibility Studies | 3-4 | 5GB | 6-9 months |
Now, again referring to the list of project types, examine the “number active” column. Let's assume that we are at full capacity and have the upper estimate of projects active. From this we can derive a quarterly growth rate in organic disk usage.
| Activity Type | Growth Rate per quarter |
| --- | --- |
| Projects (small) | 12.5GB |
| Projects (medium) | 18.75GB |
| Projects (large) | 30GB |
| Detailed Feasibility Studies | 1.6GB |
| Pre-Feasibility Studies | 2.5GB |
| Scoping Studies | 2GB |
| Proposals | 1GB |
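These quarterly figures come roughly from dividing each project size by its duration in quarters, as in the sketch below. A few of the values in the table were rounded or adjusted along the way, so treat this as a reconstruction of the method rather than the exact working.

```python
# Quarterly organic growth per project: size divided by duration in quarters
# (upper end of the duration range). Output will not match the table exactly
# where values were rounded or adjusted.
project_types = {
    # name: (disk size in GB per project, duration in quarters)
    "Projects (small)":             (50, 4),
    "Projects (medium)":            (150, 8),
    "Projects (large)":             (350, 12),
    "Detailed Feasibility Studies": (5, 3),
    "Pre-Feasibility Studies":      (5, 2),
    "Scoping Studies":              (2, 1),
    "Proposals":                    (2, 2 / 3),
}

for name, (size_gb, quarters) in project_types.items():
    print(f"{name}: ~{size_gb / quarters:.2f}GB per quarter per project")
```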
Now, here is the fun bit! The pilot database size was approximately 5GB after the data was initially copied into SharePoint. Remember, this was when the project was 70% complete. However, the SQL database size at the conclusion of the pilot (90% completion of the project) was 8.6GB.
This is 3.6GB of growth in two months, which projects to 5.4GB for the quarter. That is over three times the disk growth seen prior to SharePoint! Microsoft says to plan for twice the data you will experience; I suggest this estimate may be a little low.
So, the next step is to revisit the above tables and recalculate assuming this figure seen in the pilot was actually the case going forward.
| Activity Type | Number active | Disk Size | Duration |
| --- | --- | --- | --- |
| Projects (small) | 4-6 | 169GB | 1 Year |
| Projects (medium) | 3-4 | 506GB | 2 Years |
| Projects (large) | 1-3 | 1181GB | 3 Years |
| Detailed Feasibility Studies | 3-4 | 17GB | 6-9 months |
| Pre-Feasibility Studies | 2-3 | 17GB | 3-6 months |
| Scoping Studies | 2-3 | 7GB | 2-3 months |
| Proposals | 3-4 | 7GB | 1-2 months |
| Activity Type | Growth Rate per quarter in SharePoint |
| --- | --- |
| Projects (small) | 42GB |
| Projects (medium) | 63GB |
| Projects (large) | 98GB |
| Detailed Feasibility Studies | 5.6GB |
| Pre-Feasibility Studies | 8.5GB |
| Scoping Studies | 7GB |
| Proposals | 4.6GB |
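The revised figures above are essentially the original per-project sizes multiplied by the growth factor observed during the pilot. The multiplier used in the sketch below (~3.37) is inferred from the revised tables rather than stated anywhere explicitly, so treat it as approximate.

```python
# Sketch of the recalculation: apply the growth factor observed in the pilot
# to the original per-project sizes. The 3.37 multiplier is inferred from the
# revised tables (e.g. 169GB / 50GB), so treat it as approximate.
sharepoint_factor = 3.37

original_sizes_gb = {
    "Projects (small)": 50,
    "Projects (medium)": 150,
    "Projects (large)": 350,
    "Detailed Feasibility Studies": 5,
    "Pre-Feasibility Studies": 5,
    "Scoping Studies": 2,
    "Proposals": 2,
}

for name, size_gb in original_sizes_gb.items():
    print(f"{name}: ~{size_gb * sharepoint_factor:.0f}GB in SharePoint")
# roughly 169GB, 506GB, 1180GB, 17GB, 17GB, 7GB and 7GB respectively
```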
So now that we have some figures to work with, let's look at Microsoft’s general recommendations for SQL Server.
| Component | Recommendation |
| --- | --- |
| Database log files | Disk space for log files will vary based on log settings and the number of databases. For more information, see Physical Database Storage Design (http://go.microsoft.com/fwlink/?LinkId=78853&clcid=0x409). |
| Configuration database | 1.5 GB. The configuration database will not grow past this size. |
| Free space | Leave at least 25% free space on each hard disk or volume. |
Microsoft has recommended that you allow twice as much disk space for content to allow for versioning. Based on the figures in the previous section, the figure derived here was more than three times the disk space of the content.
So the next step was to estimate disk usage two years out. We were not going to immediately implement SharePoint for all existing projects. Instead, new projects would use SharePoint as they came on stream. This meant there would be a ramp-up phase before we hit our ‘optimum’ usage.
So to perform this estimate, I made the following assumptions.
- It assumes that we are running the largest estimate of concurrent projects and studies
- It assumes the length of these projects and studies are at the upper end of the estimation
- It assumes an evenly varied spread of the timing of studies and projects.
- For example: Of the 6 active small projects, 2 have recently started, 2 are halfway through and 2 are almost complete
So below is the consolidated table listing these assumptions, as well as incorporating Microsoft’s SQL recommendations.
Activity | Point in time | Disk Space Used |
--- | --- | --- |
Projects (small) * 2 | start | 84GB |
Projects (small) * 2 | middle | 216GB |
Projects (small) * 2 | end | 338GB |
Projects (medium) * 2 | start | 126GB |
Projects (medium) * 1 | middle | 252GB |
Projects (medium) * 1 | end | 506GB |
Projects (large) * 1 | start | 392GB |
Projects (large) * 1 | middle | 784GB |
Projects (large) * 1 | end | 1181GB |
Detailed Feasibility Study * 2 | start | 11.2GB |
Detailed Feasibility Study * 1 | middle | 11.2GB |
Detailed Feasibility Study * 1 | end | 16.8GB |
Pre Feasibility Study * 2 | start | 17GB |
Pre Feasibility Study * 1 | end | 17GB |
Scoping Study * 4 | | 28GB |
Proposal * 4 | | 18.4GB |
TOTAL DISK SPACE | | 3998.6GB |
Free Space Requirement (25%) | | 999.65GB |
Database Log Files (10% of content) | | 500GB |
TOTAL SQL Server Disk Space | | 5.5TB |
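If you want to sanity-check that total, here is the same consolidation as a quick sketch.

```python
# Sanity check of the consolidated total: sum the per-activity estimates,
# add 25% free space per Microsoft's guidance, and the 500GB log file
# allowance used in the table above.
content_gb = sum([
    84, 216, 338,       # small projects (start / middle / end)
    126, 252, 506,      # medium projects
    392, 784, 1181,     # large projects
    11.2, 11.2, 16.8,   # detailed feasibility studies
    17, 17,             # pre-feasibility studies
    28,                 # scoping studies
    18.4,               # proposals
])                       # 3998.6GB

free_space_gb = content_gb * 0.25   # 999.65GB
log_files_gb = 500                  # log file allowance from the table

total_gb = content_gb + free_space_gb + log_files_gb
print(f"{content_gb:.1f}GB of content -> {total_gb:,.0f}GB (~5.5TB) of SQL Server disk")
```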
So, even excluding log files, we have a total potential disk space requirement of over 5 terabytes! Holy crap!! (collective intake of breath). That is before we even look at disk space for SQL backups and indexing!! For backups, I personally like to have at least 3-4 days’ worth of disk space, so I can hold more than 24 hours’ worth of full SQL backups before having to go to tape. However, if you do the maths, the numbers get really scary!
On top of that, indexing is supposed to take somewhere from 10-50% of the total disk space of the content. Let's go with 20% for now. Thus, in two years we would be talking an additional 1TB for the index, plus the mandatory 25% free disk space.
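To put rough numbers on that, here is a quick sketch. The four-day backup retention and the 20% index figure are the assumptions, and it treats a full backup as roughly the size of the content databases.

```python
# Rough numbers for the backup and index overhead, assuming four days of
# full backups kept on disk and an index at 20% of content size.
content_gb = 3998.6
backup_days_on_disk = 4
backup_gb = content_gb * backup_days_on_disk   # ~16,000GB of backup staging (the scary number)
index_gb = content_gb * 0.20                   # ~800GB, call it ~1TB once 25% free space is added

print(f"backups: ~{backup_gb / 1000:.0f}TB, index: ~{index_gb:.0f}GB")
```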
The Recommendation
I am not going to divulge the full details of the recommendation here, but suffice it to say that anybody who takes these figures and waltzes into IT management suggesting that you need 5TB+ for one application, for what used to be 1.2TB of file storage, is likely to be laughed out of the office. Quite rightly so. This is an estimate at the end of the day and it all depends on your assumptions (garbage in, garbage out).
Change a percentage here and there and the result may have been totally different.
But what this does tell you is that there is the *potential* for this sort of growth, and even if the real-world figure is half this rate, you are still talking about a more accelerated growth rate than ever before, putting more pressure on your disk and backup infrastructure to cope. I simply asked the client to examine my methodology and draw their own conclusions.
So, irrespective of your assumptions and results, a detailed review of your storage infrastructure is a must, and a SAN is likely for any SharePoint farm in a medium to large organisation. A prudent organisation will start planning for that eventuality now, rather than spending stupid amounts of cash up front.
So the company did not buy a 5TB SAN, and I never suggested they do :-), but they did opt to review both their existing SAN and backup infrastructure as a result, and found that both were sub-optimally set up and incapable of scaling to even a fraction of the estimates. Thus, when they approached vendors, they had clear, well-defined requirements. (SharePoint was not the only consideration either – Oracle/ERP also weighed heavily into this process.)
Eventually a new, fully redundant SAN was purchased and installed with 1.5TB of disk and that allowed for SQL Backups as well. The old backup technology was replaced by a new architecture and considerable time and effort was spent on disk array/logical volume design and performing redundancy and performance testing.
As a side note… the factor-of-three increase in disk consumption observed during the pilot phase continued into full-scale production 🙂
Now many, many consultants will be asked to perform this sort of work as part of the planning phase of a large SharePoint implementation. If you find that these two articles helped you in your endeavours, please let me know 🙂
Thanks for reading!