Apache Iceberg
Apache Iceberg is an open-source table format for large-scale data processing frameworks like Apache Spark and Apache Hive. It ensures consistent and efficient data management by providing features such as atomic commits, schema evolution, and time travel capabilities. Iceberg simplifies data warehousing and enables reliable analytics at scale.
Features
- Atomic commits: transactions commit fully or not at all.
- Schema evolution: seamless changes to table schemas without disrupting availability.
- Time travel: query historical data for audit, debugging and analysis.
- Partitioning: enables efficient data organisation and improves query performance.
- Data deduplication: reduces storage and improves query performance.
- Metadata management: manage table metadata and get a unified view.
- Data consistency
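As a rough illustration of how atomic commits and time travel fit together, the sketch below models a table as a list of immutable snapshots plus a pointer to the current one. It is a toy model only: `ToyTable`, `commit`, `read` and the `validate` parameter are hypothetical names invented for this example and are not part of the Apache Iceberg API.

```python
import copy


class ToyTable:
    """Toy snapshot-based table (hypothetical; not the Apache Iceberg API)."""

    def __init__(self):
        self._snapshots = [[]]  # snapshot 0 is the empty table
        self._current = 0       # index of the current snapshot

    def commit(self, new_rows, validate=lambda row: True):
        """Append rows as one all-or-nothing commit and return the new snapshot id."""
        # Build the candidate snapshot off to the side; readers never see it.
        candidate = copy.deepcopy(self._snapshots[self._current])
        for row in new_rows:
            if not validate(row):
                # Nothing has been published yet, so the table is unchanged.
                raise ValueError(f"rejected row: {row!r}")
            candidate.append(dict(row))
        # Publish the whole commit by recording the snapshot and moving the pointer.
        self._snapshots.append(candidate)
        self._current = len(self._snapshots) - 1
        return self._current

    def read(self, snapshot_id=None):
        """Read the current snapshot, or an older one for time travel."""
        sid = self._current if snapshot_id is None else snapshot_id
        return copy.deepcopy(self._snapshots[sid])


table = ToyTable()
first = table.commit([{"id": 1}, {"id": 2}])
second = table.commit([{"id": 3}])
print(table.read(first))  # the table as it was after the first commit
print(table.read())       # the current table
```

A commit that fails validation part-way through leaves the table untouched, and every earlier snapshot stays readable by its id. A real deployment would rely on the table format's own metadata and catalog rather than an in-memory list.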
Benefits
- Data consistency from atomic commits
- Schema evolution with no downtime or data migrations
- Improved query performance
- Incremental processing that minimises processing and reduces operational complexity
- Time Travel by enabling querying of past data snapshots
Pricing
£700 a unit a day
Framework
G-Cloud 14
Service ID
473335024997741
Contact
Millersoft Ltd
Gerry Conaghan
Telephone: 0131 376 7114
Email: gerry@millersoftltd.com
Service scope
- Software add-on or extension
- Yes, but can also be used as a standalone service
- What software services is the service an extension to
-
Sheetloom www.sheetloom.com
Can be integrated into most ETL and CMS software applications.
- Cloud deployment model
-
- Public cloud
- Private cloud
- Hybrid cloud
- Service constraints
- None
- System requirements
-
- Supports file systems with in-place write capabilities.
- Requires seekable read operations for data files.
- Must handle file deletions on the storage system.
- Compatible with major object storage solutions like Amazon S3.
- No specific software licenses required for base operation.
- Optimized for integration with big data computing engines.
- Works with existing SQL and NoSQL database systems.
- Secure handling of metadata for consistent state management.
- Scalable architecture without central metadata bottlenecks.
- Supports schema evolution for future-proof data management.
User support
- Email or online ticketing support
- Email or online ticketing
- Support response times
- Depends on SLA, normally within 4 hours
- User can manage status and priority of support tickets
- No
- Phone support
- Yes
- Phone support availability
- 9 to 5 (UK time), Monday to Friday
- Web chat support
- No
- Onsite support
- Yes, at extra cost
- Support levels
-
L1: Tier/Level 1 (T1/L1). Initial support level responsible for basic customer issues: gathering information to determine the issue by analysing the symptoms and figuring out the underlying problem.
L2: Tier/Level 2 (T2/L2). A more in-depth technical support level than Tier 1, staffed by experienced personnel with greater knowledge of a particular product or service.
L3: Tier/Level 3 (T3/L3). Individuals are experts in their fields and are responsible not only for assisting both Tier 1 and Tier 2 personnel, but also for the research and development of solutions to new or unknown issues.
Severity definitions:
1- Critical: Proven Error of the Product in a production environment. The Product Software is unusable, resulting in a critical impact on the operation. No workaround is available.
2- Serious: The Product will operate but, due to an Error, its operation is severely restricted. No workaround is available.
3- Moderate: The Product will operate with limitations due to an Error that is not critical to the overall operation. For example, a workaround forces a user and/or a systems operator to use a time-consuming procedure to operate the system, or removes a nonessential feature.
4- Due to an Error, the Product can be used with only slight inconvenience.
- Support available to third parties
- Yes
Onboarding and offboarding
- Getting started
-
We provide training and support on how to access and use the data.
Documentation on using the system.
- Service documentation
- Yes
- Documentation formats
-
- HTML
- End-of-contract data extraction
- We can export all data we hold back to the client
- End-of-contract process
- The service will be switched off and the Buyer will not have any access to it. This is done as part of the contract. There is no cancellation cost
Using the service
- Web browser interface
- No
- Application to install
- No
- Designed for use on mobile devices
- No
- Service interface
- No
- User support accessibility
- None or don’t know
- API
- No
- Customisation available
- Yes
- Description of customisation
-
Output format
Source inputs
Output destination (e.g. cloud storage, email)
Users can integrate the service into existing systems
Users can customise via liaison with the Supplier
The Supplier customises from requirements supplied by the Buyer
Scaling
- Independence of resources
- Separate hardware for each client- each client has their own bespoke virtual server e.g. AWS EC2.
Analytics
- Service usage metrics
- No
Resellers
- Supplier type
- Not a reseller
Staff security
- Staff security clearance
- Other security clearance
- Government security clearance
- Up to Developed Vetting (DV)
Asset protection
- Knowledge of data storage and processing locations
- Yes
- Data storage and processing locations
-
- United Kingdom
- European Economic Area (EEA)
- User control over data storage and processing locations
- Yes
- Datacentre security standards
- Managed by a third party
- Penetration testing frequency
- At least once a year
- Penetration testing approach
- In-house
- Protecting data at rest
-
- Encryption of all physical media
- Scale, obfuscating techniques, or data storage sharding
- Data sanitisation process
- Yes
- Data sanitisation type
- Deleted data can’t be directly accessed
- Equipment disposal approach
- Complying with a recognised standard, for example CSA CCM v3.0, CAS (Sanitisation) or ISO/IEC 27001
Data importing and exporting
- Data export approach
- This is a customisable option. For example, exports can run from an overnight batch job, be triggered from a frontend UI, or run on a schedule. These can be configured through liaison with the Supplier.
- Data export formats
-
- CSV
- ODF
- Other
- Other data export formats
- JSON
- Data import formats
-
- CSV
- ODF
- Other
- Other data import formats
-
- JSON
- Parquet
- XML
Data-in-transit protection
- Data protection between buyer and supplier networks
-
- TLS (version 1.2 or above)
- IPsec or TLS VPN gateway
- Legacy SSL and TLS (under version 1.2)
- Data protection within supplier network
-
- TLS (version 1.2 or above)
- IPsec or TLS VPN gateway
- Legacy SSL and TLS (under version 1.2)
Availability and resilience
- Guaranteed availability
- AWS guarantees a 99.999% uptime service. AWS services are delivered from multiple datacentres worldwide. When deploying customer services to AWS, Iceberg can be configured such that services span multiple availability zones (data centres) to ensure service availability.
- Approach to resilience
- The data centre is provided by AWS who comply with the strictest of resiliency standards. Further information is available on request.
- Outage reporting
- Email alerts
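To put an availability percentage into practical terms, the short sketch below converts it into the downtime it permits over a period. The function name `allowed_downtime_minutes` and the default 365-day year are assumptions made for this illustration. The 99.999% input is the figure quoted in this listing, not a statement of any particular provider's SLA terms.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Maximum downtime, in minutes, consistent with an availability target over `days`."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days * 24 * 60


# 99.999% over a 365-day year allows roughly 5.26 minutes of downtime.
print(round(allowed_downtime_minutes(99.999), 2))
# 99.9% over the same period allows roughly 525.6 minutes (about 8.8 hours).
print(round(allowed_downtime_minutes(99.9), 1))
```

The same calculation can be run for a 30-day month by passing `days=30`, which is useful when an SLA is measured monthly rather than annually.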
Identity and authentication
- User authentication needed
- Yes
- User authentication
-
- 2-factor authentication
- Dedicated link (for example VPN)
- Username or password
- Access restrictions in management interfaces and support channels
- This is all handled through the cloud provider's IAM (Identity and Access Management) service, which provides granular control to restrict roles and users.
- Access restriction testing frequency
- At least every 6 months
- Management access authentication
-
- 2-factor authentication
- Dedicated link (for example VPN)
- Username or password
Audit information for users
- Access to user activity audit information
- You control when users can access audit information
- How long user audit data is stored for
- User-defined
- Access to supplier activity audit information
- You control when users can access audit information
- How long supplier audit data is stored for
- User-defined
- How long system logs are stored for
- At least 12 months
Standards and certifications
- ISO/IEC 27001 certification
- No
- ISO 28000:2007 certification
- No
- CSA STAR certification
- No
- PCI certification
- No
- Cyber essentials
- Yes
- Cyber essentials plus
- No
- Other security certifications
- No
Security governance
- Named board-level person responsible for service security
- Yes
- Security governance certified
- Yes
- Security governance standards
- Other
- Other security governance standards
- Cyber Essentials
- Information security policies and processes
- Millersoft follows AWS best practice on security https://aws.amazon.com/security/. We have a range of technical and organisational measures to ensure data security and protection. These cover Access, Roles and Responsibilities, Resource/asset management, Access Control & Authentication, Workstation & Device Security, Network/Communications Security, Back-up, mobile/portable device security, and physical security of our premises. Staff training and awareness is ongoing, staff / contractors must sign confidentiality and privacy statements and read and sign company security policy. Sanctions are applicable for non-compliance. Our reporting structure if a security breach happens or is suspected: staff are trained to and required to immediately flag to DPO and CEO and lock down or isolate the breach where feasible; DPO/CEO will take immediate action including isolation or lock down of affected systems, notification to affected parties, implementation of business continuity and disaster recovery. Risk impact reviews are conducted when a new data category is processed, or system implemented, and security measures adapted as necessary. Category logs, training logs, access logs, and breach logs are maintained, reviewed and signed off periodically by the assigned DPO and CEO.
Operational security
- Configuration and change management standard
- Supplier-defined controls
- Configuration and change management approach
-
All code is under version control using Git.
An automated test framework is used for integration testing.
Changes are tracked via Jira.
- Vulnerability management type
- Supplier-defined controls
- Vulnerability management approach
- Apache Iceberg emphasizes security through proactive measures including regular security audits, community engagement for vulnerability reporting, and prompt release of patches. It employs automated security testing within its CI/CD pipeline, offers detailed security documentation, and maintains a responsible disclosure policy. Organizations are advised to integrate Iceberg with existing security tools for enhanced monitoring and to follow the project's security advisories for timely updates. These practices ensure ongoing prioritization of security, safeguarding user data and maintaining the integrity of the data platform.
- Protective monitoring type
- Supplier-defined controls
- Protective monitoring approach
-
All logs go to AWS CloudWatch for auditing, monitoring and alerting.
Real-time monitoring detects unauthorized access.
Audit trails are maintained for all schema and data modifications.
Strict access controls are enforced and integrated with enterprise authentication.
Encryption is used for data both at rest and in transit.
Anomaly detection identifies and alerts on unusual activities.
Regular security audits and compliance checks are conducted.
Robust backup and disaster recovery protocols are established.
Security patches and updates are applied promptly.
Ongoing security awareness training is provided for all users.
- Incident management type
- Supplier-defined controls
- Incident management approach
-
Detection and Reporting: Monitoring systems detect anomalies and issues are reported by users or automated systems.
Response: A dedicated team assesses the incident to determine its impact and urgency.
Analysis and Investigation: The team investigates to identify the root cause and extent of the incident.
Resolution and Recovery: Steps are taken to resolve the issue and restore service to normal operations.
Post-Incident Review: Analyze the incident to improve future response and prevent recurrence.
Secure development
- Approach to secure software development best practice
- Supplier-defined process
Public sector networks
- Connection to public sector networks
- No
Social Value
- Social Value
-
Tackling economic inequality
We believe that our social mission to assist young people into employment is compatible with the guidelines laid out in the Government's Social Value theme of tackling economic inequality (MAC 2.2). Wherever it has the opportunity to do so, Millersoft has offered, and continues to offer, placements, internships and employment to technology students from the deprived local area studying in local colleges and universities with whom we hold relations. Our method is to provide initial training and inductions to suitable interns before assigning them to live projects, where they are monitored, supported, challenged, and encouraged by experienced senior consultants and developers. As an organisation that values fresh and radical ideas to find new products and solutions to solve existing problems, we also encourage interns to share their thoughts and ideas in a stimulating and collaborative environment, and often ask them to implement, test and deploy them into real-world projects. Regular development reviews are held with interns and progress objectives adapted accordingly. Interns, as is the case with all staff, receive regular training in the latest technologies, which may cover cloud technologies (staff are trained to be Amazon Web Services engineers and architects), data processing tools, database management, project management and security. In most cases interns become full-time employees at Millersoft once they graduate and are already well equipped to take on more responsibility and autonomy within the company.
Pricing
- Price
- £700 a unit a day
- Discount for educational organisations
- No
- Free trial available
- No