背景

我是计算机科学一年级的学生,我在我爸爸的小公司兼职。我没有任何实际应用程序开发的经验。我用Python写过脚本,用C写过一些课程,但没有像这样的。

我爸爸有一家小型培训公司,目前所有的课程都是通过外部网络应用程序安排、录制和跟踪的。有一个导出/“报告”功能,但它是非常通用的,我们需要特定的报告。我们无法访问实际的数据库来运行查询。我被要求建立一个自定义报告系统。

我的想法是每天晚上创建通用的CSV导出,并将它们导入(可能使用Python)到办公室托管的MySQL数据库中,从那里我可以运行所需的特定查询。我没有数据库方面的经验,但了解最基本的知识。我读了一些关于数据库创建和标准表单的知识。

我们可能很快就会有国际客户,所以我希望数据库不会爆炸,如果/当这种情况发生时。我们目前也有一些大公司作为客户,他们有不同的部门(例如ACME母公司、ACME医疗保健部门、ACME身体护理部门)。

我想到的模式如下:

From the client perspective: Clients is the main table Clients are linked to the department they work for Departments can be scattered around a country: HR in London, Marketing in Swansea, etc. Departments are linked to the division of a company Divisions are linked to the parent company From the classes perspective: Sessions is the main table A teacher is linked to each session A statusid is given to each session. E.g. 0 - Completed, 1 - Cancelled Sessions are grouped into "packs" of an arbitrary size Each packs is assigned to a client

我在一张纸上“设计”(更像是乱写)模式,试图将其正常化到第三种形式。然后我把它插入MySQL工作台,它使它对我来说都很漂亮:(点击这里查看全尺寸图形)

(来源:maian.org)

我将运行的示例查询

哪些仍有信用的客户是不活跃的(未来没有安排课程的客户) 每个客户/部门/部门的出勤率是多少(以每次会议的状态id来衡量) 一个老师一个月上几节课 标记出勤率低的客户 人力资源部门的自定义报告,包括其部门人员的出勤率

问题(s)

这是设计过度还是我的方向正确? 对于大多数查询,需要连接多个表是否会导致很大的性能损失? 我给客户端添加了一个“lastsession”列,因为这可能是一个常见的查询。这是一个好主意,还是我应该保持数据库严格规范化?

谢谢你的宝贵时间


当前回答

以下是关于你问题的更多答案:

1)对于第一次遇到这种问题的人来说,你说得很对。我认为到目前为止,其他人在这个问题上的建议几乎涵盖了这个问题。好工作!

2 & 3)性能的下降很大程度上取决于为特定的查询/过程设置和优化正确的索引,更重要的是记录的数量。除非您的主表中有超过一百万条记录,否则您似乎已经走上了一条足够主流的设计道路,在合理的硬件上,性能不会成为问题。

That said, and this relates to your question 3, with the start you have you probably shouldn't really be overly worried about performance or hyper-sensitivity to normalization orthodoxy here. This is a reporting server you are building, not a transaction based application backend, which would have a much different profile with respect to the importance of performance or normalization. A database backing a live signup and scheduling application has to be mindful of queries that take seconds to return data. Not only does a report server function have more tolerance for complex and lengthy queries, but the strategies to improve performance are much different.

For example, in a transaction based application environment your performance improvement options might include refactoring your stored procedures and table structures to the nth degree, or developing a caching strategy for small amounts of commonly requested data. In a reporting environment you can certainly do this but you can have an even greater impact on performance by introducing a snapshot mechanism where a scheduled process runs and stores pre-configured reports and your users access the snapshot data with no stress on your db tier on a per request basis.

所有这些都是为了说明根据所创建的db的角色不同,所采用的设计原则和技巧可能会有所不同。我希望这对你们有帮助。

其他回答

顺便说一句,值得注意的是,如果您已经在生成csv并希望将它们加载到mySQL数据库中,load DATA LOCAL INFILE是您最好的朋友:http://dev.mysql.com/doc/refman/5.1/en/load-data.html。Mysqlimport也值得一看,它是一个命令行工具,基本上是一个装载数据文件的漂亮包装器。

我在培训/学校领域工作过,我想我要指出的是,在你所说的“会话”(给定课程的实例)和课程本身之间通常存在M:1的关系。换句话说,你的目录提供了这门课程(“西班牙语101”之类的),但你可能在一个学期中有两个不同的实例(Tu-Th由Smith教授,wed - friday由Jones教授)。

除此之外,这看起来是个不错的开始。我敢打赌,您会发现客户端域(指向“客户端”的图形)比您所建模的要复杂得多,但在获得一些实际数据来指导您之前,不要过于复杂。

你说得对。但是,您可以清理它,并删除一些映射(有*)表。

你能做的就是在部门表中添加CityId和DivisionId。

除此之外,我觉得一切都很好……

以下是关于你问题的更多答案:

1)对于第一次遇到这种问题的人来说,你说得很对。我认为到目前为止,其他人在这个问题上的建议几乎涵盖了这个问题。好工作!

2 & 3)性能的下降很大程度上取决于为特定的查询/过程设置和优化正确的索引,更重要的是记录的数量。除非您的主表中有超过一百万条记录,否则您似乎已经走上了一条足够主流的设计道路,在合理的硬件上,性能不会成为问题。

That said, and this relates to your question 3, with the start you have you probably shouldn't really be overly worried about performance or hyper-sensitivity to normalization orthodoxy here. This is a reporting server you are building, not a transaction based application backend, which would have a much different profile with respect to the importance of performance or normalization. A database backing a live signup and scheduling application has to be mindful of queries that take seconds to return data. Not only does a report server function have more tolerance for complex and lengthy queries, but the strategies to improve performance are much different.

For example, in a transaction based application environment your performance improvement options might include refactoring your stored procedures and table structures to the nth degree, or developing a caching strategy for small amounts of commonly requested data. In a reporting environment you can certainly do this but you can have an even greater impact on performance by introducing a snapshot mechanism where a scheduled process runs and stores pre-configured reports and your users access the snapshot data with no stress on your db tier on a per request basis.

所有这些都是为了说明根据所创建的db的角色不同,所采用的设计原则和技巧可能会有所不同。我希望这对你们有帮助。

不。看起来你设计得很细致。

我认为在你的设计中,国家和公司实际上是同一个实体,城市和部门也是如此。我将去掉国家和城市表(以及Cities_Has_Departments),并在必要时向公司表添加一个布尔标志IsPublicSector(如果除了简单的Private Sector / PublicSector之外还有其他选择,则添加CompanyType列)。

Also, I think there's an error in your usage of the Departments table. It looks like the Departments table serves as a reference to the various kinds of departments that each customer division can have. If so, it should be called DepartmentTypes. But your clients (who are, I assume, attendees) do not belong to a department TYPE, they belong to an actual department instance in a company. As it stands now, you will know that a given client belongs to an HR department somewhere, but not which one!

换句话说,客户机应该链接到您称为Divisions_Has_Departments的表(但是我将简单地称之为Departments)。如果是这样,那么如果您想在数据库中使用标准引用完整性,就必须将Cities分解为division,就像上面讨论的那样。